# Company World Model — Build Spec (generalizable agent handoff)

> **Reader: this is for the implementing agent.** Hand it to your coding agent and point it at your
> repo. Whatever you're building — a company memory, an assistant, a knowledge or answer layer — if it
> has to answer questions from a changing body of truth and be *trusted*, this is the architecture and
> the guardrails.
>
> It is **stack-agnostic**. The model, the invariants, and the gates apply to any language or database.
> The worked reference throughout is one concrete build (.NET 10 + PostgreSQL), so the code is real and
> copyable — substitute your own stack and keep the columns and the invariants.
>
> **How to use:** read §0–§3 before writing code. Build in the slices of §5, smallest first; a slice is
> not done until its gate (§3) passes **and can be made to fail**. Port the cited functions (§4)
> faithfully — they're proven; don't re-invent them.

---

## 0. What you're building, and how to land it on an existing system

**The one move.** Stop storing "the truth." Store **claims** — each one carrying *how you know it*.
A claim is a fact plus its **epistemic status**: `observed` (a source showed up this way), `inferred`
(synthesis derived it), or `asserted` (a person or agent set it, superseding, with who/when/why). A
fact's current value is **`fold(accepted claims for that fact)`**, never a mutable field.

**The mapping pattern (how this lands on a system you already have):**

- If you already have an **append-only ingest store**, that becomes the substrate behind **observed**
  claims — the claim points at the ingested row as its evidence. You don't replace it.
- **Synthesis** (an extractor / LLM step) writes **inferred** claims, carrying their reasoning + a
  confidence.
- A **person or agent** writes **asserted** claims — superseding, attributed. The author is also part of
  the permission model.
- **Answers, exports, and curated views become projections** of the accepted claim-set, each verified in
  its own currency.
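
The first bullet can be sketched as a small adapter. This is a minimal illustration, not the reference implementation: `IngestRow`, `ObservedClaim`, `ObservedClaimAdapter`, and the basis label `source_event_time` are hypothetical names chosen for the example, and your ingest columns will differ.

```csharp
using System;

// Hypothetical shapes for illustration only.
public sealed record IngestRow(
    string Source, string SourceNativeId, DateTimeOffset EventTime, string PayloadJson);

public sealed record ObservedClaim(
    string FactId, string Object, string EpistemicStatus, bool Grounded,
    string SourceChannel, string SourceRef,
    DateTimeOffset RecordedAt, DateTimeOffset ValidFrom, string ValidFromBasis);

public static class ObservedClaimAdapter
{
    // Lifts one ingested row into an observed claim. The ingest row is neither copied
    // nor replaced: the claim points at it through (SourceChannel, SourceRef).
    public static ObservedClaim Lift(IngestRow row, string factId, DateTimeOffset now) =>
        new ObservedClaim(
            FactId: factId,
            Object: row.PayloadJson,
            EpistemicStatus: "observed",
            Grounded: true,                        // tied to a re-checkable source row
            SourceChannel: row.Source,
            SourceRef: row.SourceNativeId,
            RecordedAt: now,                       // when we learned it
            ValidFrom: row.EventTime,              // when it was true in the world
            ValidFromBasis: "source_event_time");  // assumed label: state the basis, never default it
}
```

The adapter sets no confidence value; that stays the job of the computed function described in §2.2.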

**Worked reference (DE's company-memory service).** It already has the substrate this plugs into: an
append-only `raw_content` ingest archive (identity `(source, source_native_id)`, version-guarded on event
time) → the **observed** substrate; a four-tier access core (`everyone < account_managers < leadership <
admin`, fail-closed `Unknown`, `min(token, roster)`, on-behalf-of) → the permission seam; an append-only
`audit_log`; and a `client_intelligence_staging` table that *already* carries `provenance`,
`source_provenance`, and `superseded_by` columns — a half-built gesture at exactly this model. The claim
layer is the missing first-class piece over that substrate.

**The floor you must not weaken.** If your system has an access/permission model, the claim layer **adds
to** it — it never relaxes it. (DE's proven, stack-agnostic contract: `de-memory`'s
`behavior-acceptance-spec.md` §1–§7 — identity/on-behalf-of, `min(ceiling, roster)`, unknown→deny,
in-query tier filtering, citation-at-write, append-only audit, hash-only tokens. Treat those *kinds* of
guarantees as non-negotiable wherever they exist.)
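
As a rough illustration of two of those guarantees, `min(ceiling, roster)` and unknown→deny, the arithmetic might look like the sketch below. The `Tier` enum and `AccessFloor` class are hypothetical names; the tier order is the one quoted in the worked reference. This only shows the comparison logic: the actual filter still belongs in the query, as G-PERMISSION requires.

```csharp
#nullable enable
using System;

// Tier order from the worked reference; Unknown is the fail-closed value.
public enum Tier { Unknown = 0, Everyone = 1, AccountManagers = 2, Leadership = 3, Admin = 4 }

public static class AccessFloor
{
    // Effective tier = min(token ceiling, roster tier). Unknown on either side stays Unknown.
    public static Tier Effective(Tier tokenCeiling, Tier rosterTier) =>
        tokenCeiling == Tier.Unknown || rosterTier == Tier.Unknown
            ? Tier.Unknown
            : (Tier)Math.Min((int)tokenCeiling, (int)rosterTier);

    // Readable only when both tiers are known and the caller's tier is at least the claim's.
    public static bool CanRead(Tier effective, Tier claimTier) =>
        effective != Tier.Unknown && claimTier != Tier.Unknown && effective >= claimTier;

    // Any unrecognized or missing label parses to Unknown, which denies.
    public static Tier Parse(string? raw) => raw switch
    {
        "everyone" => Tier.Everyone,
        "account_managers" => Tier.AccountManagers,
        "leadership" => Tier.Leadership,
        "admin" => Tier.Admin,
        _ => Tier.Unknown,
    };
}
```

For example, an `admin` token used on behalf of someone the roster lists as `everyone` resolves to `everyone`, and a misspelled tier label resolves to `Unknown` and reads nothing.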

---

## 1. The architecture (the model you are building)

Three epistemic statuses (canonical — see DE's `genome-living-provenance.md`, "THE CRYSTALLIZED MODEL"):

| Status | Means | Confidence base |
|---|---|---|
| `observed` | "the source showed up this way" — an ingested record. Certain *as an observation*. | 0.80 |
| `inferred` | "this observation contains X" — synthesis derived it, with reasoning. | 0.55 |
| `asserted` | "X is now Y" — a person/agent set it, superseding, with who/when/why. **The author is the permission.** | 0.30 |

Everything follows from that:

- **Truth is a fold, not a field.** Current value = the latest accepted assertion on a lineage back to
  the original observation. Append-only; you never overwrite.
- **Confidence is computed, never stored as an independent literal** — a deterministic monotonic function
  of `(epistemic_status, grounded)`, fail-closed to 0 on anything unknown.
- **Two clocks** — `recorded_at` (when you learned it) and `valid_from` (when it was true in the world),
  the latter carrying an explicit, never-guessed basis label.
- **Every surface is a projection, each verified in its own currency** — an answer by *meaning*
  (citation/entailment), a structured export by *fields*, a curated mirror by *its own consistency*. No
  surface is a second source of truth.
- **Substrate is a flat append-only log, NOT a graph database.** Relationships are claims + a thin entity
  layer (§2.4, §7).

---

## 2. The data model — reference implementation (.NET 10 + PostgreSQL)

> One concrete rendering. The model is stack-agnostic: substitute your ORM/DB, keep the columns and the
> invariants. Ported from DE's genome claim-record (`neuron : scripts/gene-record/atom.ts`).

### 2.1 `claims` — the claim-log (the "gene"). Append-only.

```csharp
// A fact's current value is fold(accepted claims for fact_id), NEVER a stored mutable field.
using System;

public class Claim
{
    public Guid Id { get; set; }

    // IDENTITY — what it is + which fact in the cascade.
    public string ValueType { get; set; } = "";   // 'telephone' | 'mrr' | 'status' | 'policy' | ...
    public string FactId    { get; set; } = "";   // stable fact key, e.g. 'client:acme#mrr'

    // The assertion: subject–predicate–object, polarity.
    public string SubjectClass { get; set; } = ""; // OPEN registry: 'client' | 'person' | 'agency'...
    public string SubjectId    { get; set; } = ""; // 'client:acme'
    public string Predicate    { get; set; } = ""; // 'has_mrr' | 'has_status' | ...
    public string Object       { get; set; } = ""; // jsonb — the claimed value
    public string Polarity     { get; set; } = "asserts"; // 'asserts' | 'denies'

    // HOW WE KNOW IT.
    public string EpistemicStatus { get; set; } = ""; // 'observed' | 'inferred' | 'asserted'
    public bool   Grounded        { get; set; }       // tied to a re-checkable source row?
    public double Confidence      { get; set; }       // == Confidence.Compute(status, grounded) — GATED

    // WHO + WHERE-FROM (keep source PURE evidence, never identity).
    public string  ActorId       { get; set; } = ""; // who asserted/observed — DRIVES permission
    public string  ActorKind     { get; set; } = ""; // 'person' | 'agent' | 'system'
    public string  SourceChannel { get; set; } = ""; // evidence channel: 'stripe' | 'slack' | 'vault'
    public string? SourceRef     { get; set; }       // ingest-row id / receipt id / doc#anchor

    // TWO CLOCKS (bitemporal).
    public DateTimeOffset  RecordedAt     { get; set; } // transaction time — when you learned it (STRICT)
    public DateTimeOffset  ValidFrom      { get; set; } // valid time — when true in the world
    public string          ValidFromBasis { get; set; } = ""; // STATED basis, NEVER silently defaulted
    public DateTimeOffset? RetractedAt    { get; set; }
    public DateTimeOffset? ValidTo        { get; set; }

    // PERMISSION + lineage.
    public string AccessTier  { get; set; } = ""; // your tier vocabulary
    public string Visibility  { get; set; } = "team";
    public string Disposition { get; set; } = "live"; // 'live' | 'retracted' | 'obsolete'
    public Guid?  Supersedes  { get; set; }
    public Guid?  Invalidates { get; set; }
    public Guid?  Reinstates  { get; set; }

    public DateTimeOffset CreatedAt { get; set; }
}
```

**Append-only is a role grant, not a convention.** The app role gets `INSERT, SELECT` on `claims` —
**no `UPDATE`, no `DELETE`**. Supersession and retraction are *new rows*, never mutations.
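
One way to carry the grant in a migration and check it afterwards is sketched below. The role name `claims_app` and the class `AppendOnlyGrants` are assumptions made for the example. The privilege listing uses PostgreSQL's `information_schema.role_table_grants` view. Note that a table's owner keeps full privileges regardless of grants, so this sketch assumes the application role does not own `claims`.

```csharp
using System;
using System.Collections.Generic;

public static class AppendOnlyGrants
{
    // PostgreSQL DDL for a hypothetical application role named claims_app.
    public const string GrantSql = """
        REVOKE ALL ON TABLE claims FROM claims_app;
        GRANT SELECT, INSERT ON TABLE claims TO claims_app;
        """;

    // Lists what the role can actually do on the table.
    public const string ListPrivilegesSql = """
        SELECT privilege_type
        FROM   information_schema.role_table_grants
        WHERE  table_name = 'claims' AND grantee = 'claims_app';
        """;

    // True only when the privilege set is exactly {INSERT, SELECT}.
    public static bool IsAppendOnly(IEnumerable<string> privileges)
    {
        var set = new HashSet<string>(privileges, StringComparer.OrdinalIgnoreCase);
        return set.SetEquals(new[] { "INSERT", "SELECT" });
    }
}
```

Feeding the rows returned by `ListPrivilegesSql` into `IsAppendOnly` gives a check that goes red the moment someone grants `UPDATE` or `DELETE`.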

### 2.2 `Confidence.Compute` — port verbatim (do not invent the numbers)

Faithful port of `neuron : scripts/gene-record/confidence.ts`. The *ordering* is the invariant; the
constants are calibration placeholders behind a stable call site.

```csharp
using System;
using System.Collections.Generic;

public static class Confidence
{
    private static readonly IReadOnlyDictionary<string, double> Base =
        new Dictionary<string, double>(StringComparer.Ordinal)
        { ["observed"] = 0.80, ["inferred"] = 0.55, ["asserted"] = 0.30 };
    private const double GroundedBump = 0.15;

    // Deterministic, monotonic, total, FAIL-CLOSED (unknown -> 0; no false green).
    public static double Compute(string epistemic, bool grounded)
    {
        if (!Base.TryGetValue(epistemic, out var b)) return 0.0;
        var raw = b + (grounded ? GroundedBump : 0.0);
        // Possible results (grounded/ungrounded): observed 0.95/0.80, inferred 0.70/0.55, asserted 0.45/0.30.
        return Math.Clamp(Math.Round(raw, 2), 0.0, 1.0);
    }
}
```

`Confidence` is stored only as a denormalized cache; **G-CONF recomputes and asserts equality** — a
hand-set or drifted value fails the build.
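
A sketch of what such a check could look like follows. It is not the ported `g9-confidence` gate; `StoredConfidence` and `GConf` are hypothetical names, and the function under test is passed in as a delegate so the sketch stays self-contained. The ordering check assumes the six values are meant to form one strictly descending chain, as the constants above imply.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record StoredConfidence(Guid ClaimId, string EpistemicStatus, bool Grounded, double Stored);

public static class GConf
{
    private const double Tolerance = 1e-9;

    // Ids of claims whose stored confidence differs from the recomputed value.
    // `compute` is the function under test, e.g. Confidence.Compute.
    public static IReadOnlyList<Guid> DriftedClaims(
        IEnumerable<StoredConfidence> rows, Func<string, bool, double> compute) =>
        rows.Where(r => Math.Abs(r.Stored - compute(r.EpistemicStatus, r.Grounded)) > Tolerance)
            .Select(r => r.ClaimId)
            .ToList();

    // Strict descending order over the six (status, grounded) combinations, and unknown -> 0.
    public static bool OrderingHolds(Func<string, bool, double> compute)
    {
        var ordered = new[]
        {
            compute("observed", true), compute("observed", false),
            compute("inferred", true), compute("inferred", false),
            compute("asserted", true), compute("asserted", false),
        };
        var strictlyDescending = ordered.Zip(ordered.Skip(1), (hi, lo) => hi > lo).All(ok => ok);
        return strictlyDescending && compute("not-a-status", true) == 0.0;
    }
}
```

A constant or reversed function fails `OrderingHolds`, and any hand-edited row shows up in `DriftedClaims`, which is the degenerate control the gate needs.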

### 2.3 The fold — current value as a read model (never a field)

```sql
-- CURRENT accepted value of every fact = latest valid, non-retracted, non-superseded assertion.
-- A READ MODEL (view / materialized view), never a stored mutable column.
create view fact_current as
select distinct on (c.fact_id)
       c.fact_id, c.value_type, c.object, c.epistemic_status, c.confidence,
       c.actor_id, c.access_tier, c.valid_from, c.recorded_at, c.id as claim_id
from   claims c
where  c.disposition = 'live'
  and  c.retracted_at is null
  and  not exists (select 1 from claims s where s.supersedes = c.id)
order  by c.fact_id, c.valid_from desc, c.recorded_at desc;

-- AS-OF (bitemporal): what you BELIEVED about a fact at transaction-time :as_of.
-- (Add a valid-time predicate to ask what was TRUE-in-world then — the two axes are independent.)
select distinct on (c.fact_id) c.fact_id, c.object, c.confidence
from   claims c
where  c.recorded_at <= :as_of
  and  (c.retracted_at is null or c.retracted_at > :as_of)
  and  not exists (select 1 from claims s where s.supersedes = c.id and s.recorded_at <= :as_of)
order  by c.fact_id, c.valid_from desc, c.recorded_at desc;
```

Port the matrices from `neuron : scripts/gene-record/gates/{g3-pin-the-fold, g4-as-of-matrix}.ts`.
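
For gate fixtures it can help to have the same fold in memory. The sketch below mirrors the two queries above over a list of claims; it is an illustration under assumed names (`FoldClaim`, `Fold`), not the ported gate code, and it keeps only the columns the fold reads.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal claim shape for the fold: a subset of the Claim columns.
public sealed record FoldClaim(
    Guid Id, string FactId, string Object, string Disposition,
    DateTimeOffset RecordedAt, DateTimeOffset ValidFrom,
    DateTimeOffset? RetractedAt = null, Guid? Supersedes = null);

public static class Fold
{
    // Mirrors fact_current: live, not retracted, not superseded; latest valid_from, then recorded_at.
    public static IReadOnlyDictionary<string, FoldClaim> Current(IReadOnlyCollection<FoldClaim> claims)
    {
        var superseded = SupersededIds(claims);
        return Latest(claims.Where(c =>
            c.Disposition == "live" && c.RetractedAt is null && !superseded.Contains(c.Id)));
    }

    // Mirrors the as-of query: only claims, retractions and supersessions recorded by asOf count.
    public static IReadOnlyDictionary<string, FoldClaim> AsOf(
        IReadOnlyCollection<FoldClaim> claims, DateTimeOffset asOf)
    {
        var known = claims.Where(c => c.RecordedAt <= asOf).ToList();
        var superseded = SupersededIds(known);
        return Latest(known.Where(c =>
            (c.RetractedAt is null || c.RetractedAt > asOf) && !superseded.Contains(c.Id)));
    }

    private static HashSet<Guid> SupersededIds(IEnumerable<FoldClaim> claims)
    {
        var ids = new HashSet<Guid>();
        foreach (var c in claims)
            if (c.Supersedes is Guid prior) ids.Add(prior);
        return ids;
    }

    private static IReadOnlyDictionary<string, FoldClaim> Latest(IEnumerable<FoldClaim> accepted) =>
        accepted.GroupBy(c => c.FactId)
                .ToDictionary(
                    g => g.Key,
                    g => g.OrderByDescending(c => c.ValidFrom).ThenByDescending(c => c.RecordedAt).First());
}
```

With an original claim and a later superseding one, `Current` returns the newer value while `AsOf` at a time before the supersession was recorded still returns the original, which is the leak G-ASOF is meant to catch.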

### 2.4 The entity layer — relationships without a graph DB

Relationships and merges are **claims**, not edges in a graph engine
(`genome-living-provenance.md`, "Provenance & entities on an edit"):

- **Identity by referent, default-separate** — two facts sharing a value are **never** auto-merged.
- **A merge is an evidence-backed, reversible, attributed claim** (`predicate: 'same_as'`, with actor +
  source + a reversible supersede). No silent coalescing.
- **Use vs mention** — editing an entity cascades to its *uses*; it never rewrites a *quote* of the old
  value.
- Typed relationships (`account_manager_of`, `depends_on_vendor`, `decided_by`) are claims with subject +
  object = entity ids, carrying their own provenance and validity. Multi-hop is a recursive query over the
  flat log, not a graph-traversal engine.
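
The last bullet can be illustrated in memory. In PostgreSQL the same walk would typically be a recursive query over the claims table; the sketch below shows only the logic, over accepted relationship claims already read from the fold. `RelationClaim` and `RelationWalk` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A relationship claim reduced to the three fields the traversal needs.
public sealed record RelationClaim(string SubjectId, string Predicate, string ObjectId);

public static class RelationWalk
{
    // Breadth-first walk over accepted relationship claims, bounded by maxHops.
    // Returns every entity id reachable from `start` via `predicate`, excluding `start`.
    public static IReadOnlySet<string> Reachable(
        IEnumerable<RelationClaim> accepted, string start, string predicate, int maxHops)
    {
        var edges = accepted.Where(c => c.Predicate == predicate)
                            .ToLookup(c => c.SubjectId, c => c.ObjectId);
        var seen = new HashSet<string> { start };
        var frontier = new List<string> { start };
        for (var hop = 0; hop < maxHops && frontier.Count > 0; hop++)
        {
            var next = new List<string>();
            foreach (var id in frontier)
                foreach (var target in edges[id])      // a missing key yields an empty sequence
                    if (seen.Add(target)) next.Add(target);
            frontier = next;
        }
        seen.Remove(start);
        return seen;
    }
}
```

The hop bound and the `seen` set keep the walk finite even when relationship claims form a cycle.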

### 2.5 Projections, each verified in its own currency

| Projection | Built from | Verified by (its currency) |
|---|---|---|
| An answer (chat / agent) | fold + retrieval | **meaning** — every load-bearing claim cites a source that was shown, and the span ENTAILS it |
| A structured export / API | fold | **fields** — typed field correctness |
| A curated index/mirror | accepted claims | **its own consistency** (e.g. a content-hash reconciliation) |

No projection is editable independently; truth changes by a new claim, then projections re-derive.
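
As one possible reading of "content-hash reconciliation" for a curated mirror, the sketch below hashes the fold's values and the mirror's values the same way and compares digests. `MirrorReconciliation` is a hypothetical name, and SHA-256 over sorted, length-prefixed entries is an assumption of this example, not a format the source prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class MirrorReconciliation
{
    // Order-independent digest of (factId, value) pairs: sort by key, then hash the joined lines.
    // Length prefixes keep "a=b" + "c" from colliding with "a" + "b=c".
    public static string Digest(IEnumerable<KeyValuePair<string, string>> entries)
    {
        var lines = entries.OrderBy(e => e.Key, StringComparer.Ordinal)
                           .Select(e => $"{e.Key.Length}:{e.Key}={e.Value.Length}:{e.Value}");
        var bytes = Encoding.UTF8.GetBytes(string.Join("\n", lines));
        return Convert.ToHexString(SHA256.HashData(bytes));
    }

    // The mirror is consistent only when it hashes to the same digest as the fold it derives from.
    public static bool IsConsistent(
        IReadOnlyDictionary<string, string> foldValues,
        IReadOnlyDictionary<string, string> mirrorValues) =>
        Digest(foldValues) == Digest(mirrorValues);
}
```

A mirror that was edited by hand, or that missed a re-derivation, produces a different digest and fails the check.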

---

## 3. The invariants, as gates (red/green — each MUST be able to FAIL)

A gate that cannot be made to go red on a deliberate break is vacuous and itself fails. Degenerate
controls are mandatory.
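
The rule can be sketched as a tiny harness that runs each gate twice, once on a real fixture and once on its degenerate control. `GateHarness` and `GateResult` are hypothetical names for this illustration; the ported gates in §4 carry their own controls.

```csharp
using System;

public sealed record GateResult(bool Passed, string Detail);

public static class GateHarness
{
    // A gate is accepted only if it is green on the real fixture AND red on the degenerate control.
    public static GateResult Run<TFixture>(
        string gateName, Func<TFixture, bool> gate, TFixture real, TFixture degenerate)
    {
        if (!SafeEval(gate, real))
            return new GateResult(false, $"{gateName}: red on the real fixture");
        if (SafeEval(gate, degenerate))
            return new GateResult(false, $"{gateName}: vacuous, degenerate control stayed green");
        return new GateResult(true, $"{gateName}: green, and red on the degenerate control");
    }

    // Fail-closed: a gate that throws counts as red.
    private static bool SafeEval<TFixture>(Func<TFixture, bool> gate, TFixture fixture)
    {
        try { return gate(fixture); }
        catch (Exception) { return false; }
    }
}
```

A gate that always returns `true` is reported as vacuous here, which is the failure mode the paragraph above describes.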

| Gate | Asserts | Degenerate control (must FAIL) |
|---|---|---|
| **G-BORN** | Every claim validates: status ∈ {observed,inferred,asserted}, grounded is bool, both clocks present, identity present. Fail-closed validator, never throws. | A claim missing `epistemic_status` / `valid_from_basis` is REJECTED at insert. |
| **G-CONF** | `confidence == Confidence.Compute(status, grounded)`; strict ordering across all 6 `(status, grounded)` combinations; unknown → 0. | A hand-set/constant/reversed confidence FAILS. |
| **G-CLOCKS** | Both clocks present; `valid_from_basis` non-empty; an unlabeled `valid_from = recorded_at` collapse FAILS. | A claim faking two clocks from one (no basis) FAILS. |
| **G-FOLD** | `fact_current` = the latest accepted assertion; an edit is a NEW asserted claim that supersedes; the prior row is untouched. | An UPDATE-in-place of a value FAILS (no UPDATE grant); a fold returning a superseded value FAILS. |
| **G-ASOF** | "As of T" reconstructs the accepted set as it stood at T. | An as-of query leaking a later supersession FAILS. |
| **G-SUPERSEDE** | Superseding/retracting never deletes; superseded claims remain queryable as `obsolete`. | A hard delete of a superseded claim FAILS the retained-traceable check. |
| **G-PERMISSION** | Reads gated in the query by `access_tier` + actor; `min(token, roster)`; unknown → deny; the model never receives out-of-tier claims. | A low-tier caller receiving a higher-tier claim FAILS. |
| **G-CITE** | Every inferred/asserted claim cites evidence; cited ⊆ shown; cited resolves; zero-cite FAILS; **the span ENTAILS the claim** (not substring containment); fail-closed. | A claim citing an id never shown, OR a "source says NOT X" cited for X, FAILS. |
| **G-AUDIT** | Append-only; every read (ok/denied/errored) audited; attribution = the person, mediator recorded separately. | An audit UPDATE/DELETE FAILS; a service masquerading as the person FAILS. |
| **G-NO-GRAPHDB** | Relationships live as claims + entity layer on the flat log; no triple-store/graph engine introduced. | A new RDF/graph dependency or an edges table outside the claim-log FAILS review. |
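
To make the G-BORN and G-CLOCKS rows concrete, here is a minimal sketch of a fail-closed validator that returns problems instead of throwing. It is not the ported `atom.ts` validator: `ClaimCandidate` and `ClaimValidator` are hypothetical names, and the field list is a subset chosen for the example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Untyped candidate, as it might arrive from an extractor or API before it becomes a claim row.
public sealed record ClaimCandidate(
    string? FactId, string? SubjectId, string? Predicate, string? EpistemicStatus,
    bool? Grounded, DateTimeOffset? RecordedAt, DateTimeOffset? ValidFrom, string? ValidFromBasis);

public static class ClaimValidator
{
    private static readonly HashSet<string> Statuses =
        new(StringComparer.Ordinal) { "observed", "inferred", "asserted" };

    // Returns the list of problems; an empty list means the claim may be inserted.
    // Never throws: any unexpected failure is reported as a rejection (fail-closed).
    public static IReadOnlyList<string> Validate(ClaimCandidate? c)
    {
        try
        {
            if (c is null) return new[] { "claim is null" };
            var errors = new List<string>();
            if (string.IsNullOrWhiteSpace(c.FactId)) errors.Add("fact_id missing");
            if (string.IsNullOrWhiteSpace(c.SubjectId)) errors.Add("subject_id missing");
            if (string.IsNullOrWhiteSpace(c.Predicate)) errors.Add("predicate missing");
            if (c.EpistemicStatus is null || !Statuses.Contains(c.EpistemicStatus))
                errors.Add("epistemic_status missing or unknown");
            if (c.Grounded is null) errors.Add("grounded missing");
            if (c.RecordedAt is null) errors.Add("recorded_at missing");
            if (c.ValidFrom is null) errors.Add("valid_from missing");
            // An unlabeled valid_from is treated as a collapsed clock, even if a value is present.
            if (string.IsNullOrWhiteSpace(c.ValidFromBasis)) errors.Add("valid_from_basis missing");
            return errors;
        }
        catch (Exception ex)
        {
            return new[] { "validator error: " + ex.GetType().Name };
        }
    }
}
```

The degenerate controls from the table map directly: drop `EpistemicStatus` or `ValidFromBasis` from an otherwise valid candidate and the result must be non-empty.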

---

## 4. Reuse map — port these, don't re-invent (DE's proven assets)

**From the genome (`Digital-Empathy/neuron`):**

- `scripts/gene-record/atom.ts` — the claim-record (three roles + epistemic status + two clocks + fold +
  supersession) and its fail-closed validator. The shape of `Claim` and G-BORN.
- `scripts/gene-record/confidence.ts` — `computeConfidence`. Port verbatim (§2.2).
- `scripts/gene-record/gates/{g1-two-clocks, g3-pin-the-fold, g4-as-of-matrix, g7-obsolescence,
  g9-confidence}.ts` — the gate assertions + their degenerate controls (§3).
- `docs/ai/genome-design-laws.md` — the four laws: capture-don't-reconstruct (lossless); **the model is
  the surface, consumers adapt** (never bend the claim model to a consumer's current shape); one
  coordinate system + fail-closed verify; one source of truth, every surface a projection.
- `docs/ai/genome-living-provenance.md` — the claim-log model, the three traps (§6), the frontier
  citations (§7).

**Provenance + freshness + abstention (DE's assistant world model — the pattern, adapt inline):**

- Provenance is *how-known*, with freshness + supersession on every entry:
  `{ status, source, observedAt, staleAfterMs (null = doesn't age), supersedes, tags }`. Carry
  `staleAfterMs` as a per-claim freshness clock; hedge with age at read.
- **Entailment-checked writes:** a stored supporting *span* must ENTAIL the claim, judged adversarially —
  stronger than "the cited id was shown." This is the upgrade to G-CITE.
- **Abstention + scheduled hunt:** when no accepted claim answers a query, return an honest decline AND
  persist a hunt task `{aim, askedAt, sourceHint}` — ignorance becomes scheduled repair, not a guess.
- **Bounded symbolic retrieval** into model context (tags-first, capped) — no embedding store required for
  the structured-claim path; embeddings stay a selective, separate axis.
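
The entailment-checked write can be sketched as a gate with the judge injected. Everything named here is hypothetical (`Citation`, `CitationGate`, the `resolves` and `entails` delegates); in practice the judge would be an adversarial model or NLI check, which this sketch does not implement. It also assumes every cited span must entail the claim, which is a stricter reading than the source states.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Citation(string SourceId, string Span);

public static class CitationGate
{
    // resolves(sourceId): does the cited id still point at a real source row?
    // entails(span, claimText): injected judge; substring containment is NOT enough.
    public static bool Passes(
        string claimText,
        IReadOnlyCollection<Citation> citations,
        IReadOnlySet<string> shownSourceIds,
        Func<string, bool> resolves,
        Func<string, string, bool> entails)
    {
        try
        {
            if (citations.Count == 0) return false;                                      // zero-cite fails
            if (citations.Any(c => !shownSourceIds.Contains(c.SourceId))) return false;  // cited must be a subset of shown
            if (citations.Any(c => !resolves(c.SourceId))) return false;                 // cited must resolve
            return citations.All(c => entails(c.Span, claimText));                       // every span must entail
        }
        catch (Exception)
        {
            return false;                                                                // fail-closed
        }
    }
}
```

Because the judge is a parameter, the degenerate control is easy to stage: cite a span that mentions the value but negates it, and a correct judge makes the gate go red.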

---

## 5. Build increments (slices — smallest first, each independently shippable)

Per slice: spec the delta → one independent adversarial review (reviewer ≠ author) → build disk-first,
small diff → run the slice's gate (prove it can fail) → integrate on green. Any access/permission floor
stays green throughout.

1. **Foundation.** `claims` table + its ORM entity (`Claim`) + role grants (insert/select only) + `Confidence.Compute` +
   the fail-closed validator. **Gates: G-BORN, G-CONF.** No behavior change — pure substrate.
2. **Time.** Two clocks + `fact_current` view + the as-of query. **Gates: G-CLOCKS, G-ASOF.**
3. **Fold + lineage.** Supersession/retraction as new rows; `fact_current` is the only value source. An
   adapter lifts existing ingest rows into **observed** claims for ONE pilot fact-type, to prove the path
   end-to-end on real data. **Gates: G-FOLD, G-SUPERSEDE.**
4. **Projections + provenance-honest answers.** Wire your answer/index path as a projection of accepted
   claims; every answer carries how-we-know; extend citation validation to **entailment**. **Gate: G-CITE.**
5. **Permission seam.** `actor_id` + `access_tier` gating on claim reads, reusing your identity/tier model;
   full audit on the claim path. **Gates: G-PERMISSION, G-AUDIT.**
6. **Entity layer.** Default-separate, merge-as-evidence-backed-claim, use-vs-mention, typed relationships
   as claims. **Gate: G-NO-GRAPHDB** (plus a relationship-fold test).
7. **Freshness + abstention** (adopt from the assistant world model): per-claim `staleAfterMs` + read-time
   age hedge; abstention + the scheduled hunt task. No new machinery — seed-content + retrieval rules.
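
Slice 7 can be sketched in a few lines. The record fields follow the shapes quoted in §4 (`observedAt`, `staleAfterMs`, and the hunt task `{aim, askedAt, sourceHint}`); the type names, the in-memory queue, and the wording of the hedge and the decline are assumptions made for this example.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed record FreshClaim(string Value, DateTimeOffset ObservedAt, long? StaleAfterMs);
public sealed record HuntTask(string Aim, DateTimeOffset AskedAt, string? SourceHint);
public sealed record Answer(string Text, bool Abstained);

public static class FreshnessAndAbstention
{
    // A null StaleAfterMs means the claim does not age.
    public static bool IsStale(FreshClaim c, DateTimeOffset now) =>
        c.StaleAfterMs is long ms && (now - c.ObservedAt).TotalMilliseconds > ms;

    // Answers from the accepted claim, hedging with age when stale. With no claim,
    // declines and records a hunt task instead of guessing.
    public static Answer Respond(
        string aim, FreshClaim? claim, DateTimeOffset now,
        ICollection<HuntTask> huntQueue, string? sourceHint = null)
    {
        if (claim is null)
        {
            huntQueue.Add(new HuntTask(aim, now, sourceHint));
            return new Answer("No accepted claim answers this yet.", Abstained: true);
        }
        if (!IsStale(claim, now)) return new Answer(claim.Value, Abstained: false);
        var days = (int)(now - claim.ObservedAt).TotalDays;
        return new Answer($"{claim.Value} (observed {days} days ago; may be out of date)", Abstained: false);
    }
}
```

In a real build the hunt queue would be persisted rather than held in a list, so that the scheduled repair survives a restart.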

---

## 6. The three traps to avoid (genome frontier scan — `genome-living-provenance.md`)

- **Property Sourcing (anchor-too-low).** Log business-meaningful **claims** with epistemic status — not
  field- or byte-level diffs. The log is immutable, so you can never add the intent back. Build claims rich
  from line one.
- **Input-addressed ≠ output-verified (verification theater).** Gates assert the **emitted output matches
  the accepted claim-set**, not "the pipeline ran." Keep checks content-addressed.
- **Custody break.** Once you assert provenance, an **undocumented** transition is worse than having no provenance at all. The log
  is append-only; an unattributed mutation fails closed; rewriting history (a force-push, a row UPDATE) is
  the cardinal sin.

---

## 7. Research & citations (the evidence behind each choice)

Each choice rests on a field default, vetted in DE's `genome-living-provenance.md` "Frontier validation,"
whose verdict was **take the mental model, reject the substrate.**

- **Event sourcing + CQRS** — a fact is `fold(events)`; serve from read models, never a live log scan. →
  the claim-log spine + `fact_current`.
- **Bitemporal data: valid time vs transaction time** (Snodgrass; SQL:2011 application-time vs system-time).
  → the two clocks.
- **Datomic** — a rich time-travel fact store needs only a flat **accretion-only log**, no graph store. →
  the no-graph-DB substrate.
- **Git2PROV** — commits→activities, files→entities, committers→agents; a content-addressed Merkle log
  *is* a provenance store. → append-only + attribution + custody.
- **ANSI/SPARC three-schema** (external / conceptual / internal, read here as value / lexical / encoding; cf. Parquet logical/physical/encoding). → the
  identity-vs-serialization separation.
- **Wikidata three-rank** (preferred / normal / deprecated; never delete). → supersession-not-deletion.
- **Rejected as substrate** (the field's own anti-patterns): triple-store / RDF / nanopublication /
  argumentation frameworks — triple-explosion, reification blow-up, AGM exponential space. Mental model
  only; storage stays a flat log.
- **Agent-memory frame** — provenance-attributed entity store + symbolic retrieval into context, with
  freshness decay and abstention; rejecting embedding-store-as-default and LLM-as-memory.

---

## 8. Definition of done

The world-model layer is done when, on real data through the production path:

1. every claim carries how-we-know: status, grounded, two clocks, actor, source (**G-BORN, G-CLOCKS**);
2. confidence is computed and ordered, fail-closed (**G-CONF**);
3. truth folds to the latest accepted assertion and as-of queries reconstruct the past (**G-FOLD, G-ASOF**);
4. supersession never deletes (**G-SUPERSEDE**);
5. reads are permission-gated **in the query** and fully audited (**G-PERMISSION, G-AUDIT**);
6. every inferred/asserted claim is citation- **and entailment**-checked, fail-closed (**G-CITE**);
7. relationships live as claims on the flat log — no graph DB (**G-NO-GRAPHDB**);
8. every gate is falsifiable — its degenerate control FAILS;
9. and any pre-existing access/permission contract stays green: **the security model never weakened in
   translation.**

*Not part of "done": embeddings/vector tuning (a selective, separate axis), source adapters (per-system),
report formatting. Build the claim-log; the surfaces are projections.*
