Company World Model — Build Spec

Reader: this is for the implementing agent. Hand it to your coding agent and point it at your repo. Whatever you're building — a company memory, an assistant, a knowledge or answer layer — if it has to answer questions from a changing body of truth and be trusted, this is the architecture and the guardrails.

It is stack-agnostic. The model, the invariants, and the gates apply to any language or database. The worked reference throughout is one concrete build (.NET 10 + PostgreSQL), so the code is real and copyable — substitute your own stack and keep the columns and the invariants.

How to use: read §0–§3 before writing code. Build in the slices of §5, smallest first; a slice is not done until its gate (§3) passes and can be made to fail. Port the cited functions (§4) faithfully — they're proven; don't re-invent them.

0. What you're building, and how to land it on an existing system

The one move. Stop storing "the truth." Store claims — each one carrying how you know it. A claim is a fact plus its epistemic status: observed (a source showed up this way), inferred (synthesis derived it), or asserted (a person or agent set it, superseding, with who/when/why). A fact's current value is fold(accepted claims for that fact), never a mutable field.

The mapping pattern (how this lands on a system you already have):

If you already have an append-only ingest store, that becomes the substrate behind observed claims — the claim points at the ingested row as its evidence. You don't replace it.
Synthesis (an extractor / LLM step) writes inferred claims, carrying their reasoning + a confidence.
A person or agent writes asserted claims — superseding, attributed. The author is also part of the permission model.
Answers, exports, and curated views become projections of the accepted claim-set, each verified in its own currency.
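
The first mapping step (an ingest row becoming the evidence behind an observed claim) can be sketched roughly as below. This is an illustration under assumptions, not DE's adapter: `IngestRow`, `ObservedClaim`, `IngestAdapter.Lift`, the `source:native_id` reference format, and the `"source_event_time"` basis label are all hypothetical names chosen for the example.

```csharp
using System;

// Hypothetical shape of an ingested row; only the fields this sketch needs.
public sealed record IngestRow(string Source, string SourceNativeId, DateTimeOffset EventTime, string Payload);

// Hypothetical, trimmed-down claim; the full column set is in §2.1.
public sealed record ObservedClaim(
    string FactId, string Object, string EpistemicStatus, bool Grounded,
    string SourceChannel, string SourceRef,
    DateTimeOffset RecordedAt, DateTimeOffset ValidFrom, string ValidFromBasis);

public static class IngestAdapter
{
    // Lifts one ingest row into an observed claim. The row is neither copied nor replaced:
    // the claim only points at it through SourceRef.
    public static ObservedClaim Lift(IngestRow row, string factId, DateTimeOffset now) =>
        new(factId, row.Payload, "observed", Grounded: true,
            SourceChannel: row.Source,
            SourceRef: $"{row.Source}:{row.SourceNativeId}", // assumed reference format
            RecordedAt: now,                                  // when we learned it
            ValidFrom: row.EventTime,                         // when it was true in the world
            ValidFromBasis: "source_event_time");             // hypothetical basis label, stated explicitly
}
```

The point of the shape is that the two clocks come from different places (`now` versus the row's event time) and the basis for `ValidFrom` is written down rather than implied.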

Worked reference (DE's company-memory service). It already has the substrate this plugs into:

An append-only raw_content ingest archive (identity (source, source_native_id), version-guarded on event time) → the observed substrate.
A four-tier access core (everyone < account_managers < leadership < admin, fail-closed Unknown, min(token, roster), on-behalf-of) → the permission seam.
An append-only audit_log.
A client_intelligence_staging table that already carries provenance, source_provenance, and superseded_by columns: a half-built gesture at exactly this model.

The claim layer is the missing first-class piece over that substrate.

The floor you must not weaken. If your system has an access/permission model, the claim layer adds to it — it never relaxes it. (DE's proven, stack-agnostic contract: de-memory's behavior-acceptance-spec.md §1–§7 — identity/on-behalf-of, min(ceiling, roster), unknown→deny, in-query tier filtering, citation-at-write, append-only audit, hash-only tokens. Treat those kinds of guarantees as non-negotiable wherever they exist.)
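
Three of the guarantees named above (the min rule, unknown→deny, and in-query tier filtering) can be sketched in a few lines. This is a minimal illustration, not DE's access core: the `Tier` enum, `TieredClaim`, and `AccessFloor` are hypothetical names, and in a real build the filter would be a predicate in the database query rather than an in-memory `Where`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tier enum. Unknown sorts below every real tier, so it can never win a comparison.
public enum Tier { Unknown = 0, Everyone = 1, AccountManagers = 2, Leadership = 3, Admin = 4 }

public sealed record TieredClaim(string FactId, string Object, Tier AccessTier);

public static class AccessFloor
{
    // The caller gets the LOWER of what the token allows and what the roster says.
    // If either side is Unknown, the result is Unknown.
    public static Tier Effective(Tier tokenCeiling, Tier rosterTier) =>
        (Tier)Math.Min((int)tokenCeiling, (int)rosterTier);

    // Filter before anything reaches the model: out-of-tier claims are never returned.
    public static IReadOnlyList<TieredClaim> Visible(IEnumerable<TieredClaim> claims, Tier effective)
    {
        if (effective == Tier.Unknown) return Array.Empty<TieredClaim>(); // unknown -> deny
        return claims
            .Where(c => c.AccessTier != Tier.Unknown && c.AccessTier <= effective)
            .ToList();
    }
}
```

A claim whose own tier is Unknown is also withheld, so a mislabeled row fails closed instead of leaking.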

1. The architecture (the model you are building)

Three epistemic statuses (canonical — see DE's genome-living-provenance.md, "THE CRYSTALLIZED MODEL"):

Status	Means	Confidence base
`observed`	"the source showed up this way" — an ingested record. Certain as an observation.	0.80
`inferred`	"this observation contains X" — synthesis derived it, with reasoning.	0.55
`asserted`	"X is now Y" — a person/agent set it, superseding, with who/when/why. The author is the permission.	0.30

Everything follows from that:

Truth is a fold, not a field. Current value = the latest accepted assertion on a lineage back to the original observation. Append-only; you never overwrite.
Confidence is computed, never stored as an independent literal — a deterministic monotonic function of (epistemic_status, grounded), fail-closed to 0 on anything unknown.
Two clocks — recorded_at (when you learned it) and valid_from (when it was true in the world), the latter carrying an explicit, never-guessed basis label.
Every surface is a projection, each verified in its own currency — an answer by meaning (citation/entailment), a structured export by fields, a curated mirror by its own consistency. No surface is a second source of truth.
Substrate is a flat append-only log, NOT a graph database. Relationships are claims + a thin entity layer (§2.4, §7).

2. The data model — reference implementation (.NET 10 + PostgreSQL)

One concrete rendering. The model is stack-agnostic: substitute your ORM/DB, keep the columns and the invariants. Ported from DE's genome claim-record (neuron : scripts/gene-record/atom.ts).

2.1 `claims` — the claim-log (the "gene"). Append-only.

using System;

// A fact's current value is fold(accepted claims for fact_id), NEVER a stored mutable field.
public class Claim
{
    public Guid Id { get; set; }

    // IDENTITY — what it is + which fact in the cascade.
    public string ValueType { get; set; } = "";   // 'telephone' | 'mrr' | 'status' | 'policy' | ...
    public string FactId    { get; set; } = "";   // stable fact key, e.g. 'client:acme#mrr'

    // The assertion: subject–predicate–object, polarity.
    public string SubjectClass { get; set; } = ""; // OPEN registry: 'client' | 'person' | 'agency'...
    public string SubjectId    { get; set; } = ""; // 'client:acme'
    public string Predicate    { get; set; } = ""; // 'has_mrr' | 'has_status' | ...
    public string Object       { get; set; } = ""; // jsonb — the claimed value
    public string Polarity     { get; set; } = "asserts"; // 'asserts' | 'denies'

    // HOW WE KNOW IT.
    public string EpistemicStatus { get; set; } = ""; // 'observed' | 'inferred' | 'asserted'
    public bool   Grounded        { get; set; }       // tied to a re-checkable source row?
    public double Confidence      { get; set; }       // == Confidence.Compute(status, grounded) — GATED

    // WHO + WHERE-FROM (keep source PURE evidence, never identity).
    public string  ActorId       { get; set; } = ""; // who asserted/observed — DRIVES permission
    public string  ActorKind     { get; set; } = ""; // 'person' | 'agent' | 'system'
    public string  SourceChannel { get; set; } = ""; // evidence channel: 'stripe' | 'slack' | 'vault'
    public string? SourceRef     { get; set; }       // ingest-row id / receipt id / doc#anchor

    // TWO CLOCKS (bitemporal).
    public DateTimeOffset  RecordedAt     { get; set; } // transaction time — when you learned it (STRICT)
    public DateTimeOffset  ValidFrom      { get; set; } // valid time — when true in the world
    public string          ValidFromBasis { get; set; } = ""; // STATED basis, NEVER silently defaulted
    public DateTimeOffset? RetractedAt    { get; set; }
    public DateTimeOffset? ValidTo        { get; set; }

    // PERMISSION + lineage.
    public string AccessTier  { get; set; } = ""; // your tier vocabulary
    public string Visibility  { get; set; } = "team";
    public string Disposition { get; set; } = "live"; // 'live' | 'retracted' | 'obsolete'
    public Guid?  Supersedes  { get; set; }
    public Guid?  Invalidates { get; set; }
    public Guid?  Reinstates  { get; set; }

    public DateTimeOffset CreatedAt { get; set; }
}

Append-only is a role grant, not a convention. The app role gets INSERT, SELECT on claims — no UPDATE, no DELETE. Supersession and retraction are new rows, never mutations.
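
The database grant is what actually enforces this. Purely to illustrate the shape of "a new row, never a mutation", here is a small in-memory stand-in; `AppendOnlyClaimLog` and `LogClaim` are hypothetical names and are not part of the reference build.

```csharp
using System;
using System.Collections.Generic;

public sealed record LogClaim(Guid Id, string FactId, string Object, Guid? Supersedes);

// Hypothetical stand-in for the claims table: it exposes Append and read access only,
// and deliberately has no Update or Delete member (mirroring INSERT, SELECT grants).
public sealed class AppendOnlyClaimLog
{
    private readonly List<LogClaim> _rows = new();

    public IReadOnlyList<LogClaim> Rows => _rows;

    // An edit is expressed by appending a row whose Supersedes points at the prior claim.
    public LogClaim Append(string factId, string obj, Guid? supersedes = null)
    {
        var row = new LogClaim(Guid.NewGuid(), factId, obj, supersedes);
        _rows.Add(row);
        return row;
    }
}
```

Changing a client's MRR from 1200 to 1500 is then two rows: the original, and a second row with `Supersedes` set to the first row's id. The first row is still there, unchanged.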

2.2 `Confidence.Compute` — port verbatim (do not invent the numbers)

Faithful port of neuron : scripts/gene-record/confidence.ts. The ordering is the invariant; the constants are calibration placeholders behind a stable call site.

using System;
using System.Collections.Generic;

public static class Confidence
{
    private static readonly IReadOnlyDictionary<string, double> Base =
        new Dictionary<string, double>(StringComparer.Ordinal)
        { ["observed"] = 0.80, ["inferred"] = 0.55, ["asserted"] = 0.30 };
    private const double GroundedBump = 0.15;

    // Deterministic, monotonic, total, FAIL-CLOSED (unknown -> 0; no false green).
    public static double Compute(string epistemic, bool grounded)
    {
        if (!Base.TryGetValue(epistemic, out var b)) return 0.0;
        var raw = b + (grounded ? GroundedBump : 0.0);
        return Math.Clamp(Math.Round(raw, 2), 0.0, 1.0); // observed 0.95/0.80, inferred 0.70/0.55, asserted 0.45/0.30 (grounded/ungrounded)
    }
    }
}

Confidence is stored only as a denormalized cache; G-CONF recomputes and asserts equality — a hand-set or drifted value fails the build.
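
One way the G-CONF recomputation might look, as a sketch rather than a port of DE's g9-confidence gate. `ConfidenceGate` and `StoredConfidence` are hypothetical names; the compute function is passed in as a delegate so the sketch stays self-contained, and in practice it would be `Confidence.Compute` from this section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record StoredConfidence(string EpistemicStatus, bool Grounded, double Confidence);

public static class ConfidenceGate
{
    // Rows whose stored (cached) confidence differs from the recomputed value.
    // Exact equality is intended: both sides come from the same rounded function.
    public static IReadOnlyList<StoredConfidence> Drifted(
        IEnumerable<StoredConfidence> rows, Func<string, bool, double> compute) =>
        rows.Where(r => r.Confidence != compute(r.EpistemicStatus, r.Grounded)).ToList();

    // Strict ordering across the six (status, grounded) pairs, plus unknown -> 0.
    public static bool OrderingHolds(Func<string, bool, double> compute)
    {
        var ladder = new[]
        {
            compute("observed", true), compute("observed", false),
            compute("inferred", true), compute("inferred", false),
            compute("asserted", true), compute("asserted", false),
        };
        var strictlyDescending = ladder.Zip(ladder.Skip(1), (hi, lo) => hi > lo).All(ok => ok);
        return strictlyDescending && compute("not-a-status", true) == 0.0;
    }
}
```

The degenerate control is to feed `OrderingHolds` a constant function, or `Drifted` a hand-set value: if the gate stays green on either, the gate itself is broken.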

2.3 The fold — current value as a read model (never a field)

-- CURRENT accepted value of every fact = latest valid, non-retracted, non-superseded assertion.
-- A READ MODEL (view / materialized view), never a stored mutable column.
create view fact_current as
select distinct on (c.fact_id)
       c.fact_id, c.value_type, c.object, c.epistemic_status, c.confidence,
       c.actor_id, c.access_tier, c.valid_from, c.recorded_at, c.id as claim_id
from   claims c
where  c.disposition = 'live'
  and  c.retracted_at is null
  and  not exists (select 1 from claims s where s.supersedes = c.id)
order  by c.fact_id, c.valid_from desc, c.recorded_at desc;

-- AS-OF (bitemporal): what you BELIEVED about a fact at transaction-time :as_of.
-- (Add a valid-time predicate to ask what was TRUE-in-world then — the two axes are independent.)
select distinct on (c.fact_id) c.fact_id, c.object, c.confidence
from   claims c
where  c.recorded_at <= :as_of
  and  (c.retracted_at is null or c.retracted_at > :as_of)
  and  not exists (select 1 from claims s where s.supersedes = c.id and s.recorded_at <= :as_of)
order  by c.fact_id, c.valid_from desc, c.recorded_at desc;

Port the matrices from neuron : scripts/gene-record/gates/{g3-pin-the-fold, g4-as-of-matrix}.ts.
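
For unit-testing the fold without a database, the same two queries can be mirrored in memory. The sketch below is an assumption-laden illustration (`FoldClaim` and `Fold` are hypothetical names, and only the columns the queries touch are modeled); it does not replace porting the g3/g4 matrices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record FoldClaim(
    Guid Id, string FactId, string Object,
    DateTimeOffset RecordedAt, DateTimeOffset ValidFrom,
    DateTimeOffset? RetractedAt = null, Guid? Supersedes = null, string Disposition = "live");

public static class Fold
{
    // Mirrors fact_current: live, non-retracted, non-superseded; latest valid_from, then recorded_at.
    public static IReadOnlyDictionary<string, FoldClaim> Current(IReadOnlyCollection<FoldClaim> log)
    {
        var superseded = log.Select(c => c.Supersedes).OfType<Guid>().ToHashSet();
        return Pick(log.Where(c => c.Disposition == "live"
                                && c.RetractedAt is null
                                && !superseded.Contains(c.Id)));
    }

    // Mirrors the as-of query: only rows (and supersessions) recorded at or before asOf count.
    public static IReadOnlyDictionary<string, FoldClaim> AsOf(
        IReadOnlyCollection<FoldClaim> log, DateTimeOffset asOf)
    {
        var known = log.Where(c => c.RecordedAt <= asOf).ToList();
        var superseded = known.Select(c => c.Supersedes).OfType<Guid>().ToHashSet();
        return Pick(known.Where(c => (c.RetractedAt is null || c.RetractedAt > asOf)
                                  && !superseded.Contains(c.Id)));
    }

    // One winner per fact_id, like DISTINCT ON with the ORDER BY above.
    private static IReadOnlyDictionary<string, FoldClaim> Pick(IEnumerable<FoldClaim> accepted) =>
        accepted.GroupBy(c => c.FactId)
                .ToDictionary(g => g.Key,
                              g => g.OrderByDescending(c => c.ValidFrom)
                                    .ThenByDescending(c => c.RecordedAt)
                                    .First());
}
```

An as-of read taken between two claims returns the earlier value even though a later row supersedes it, which is the behavior G-ASOF checks.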

2.4 The entity layer — relationships without a graph DB

Relationships and merges are claims, not edges in a graph engine (genome-living-provenance.md, "Provenance & entities on an edit"):

Identity by referent, default-separate — two facts sharing a value are never auto-merged.
A merge is an evidence-backed, reversible, attributed claim (predicate: 'same_as', with actor + source + a reversible supersede). No silent coalescing.
Use vs mention — editing an entity cascades to its uses; it never rewrites a quote of the old value.
Typed relationships (account_manager_of, depends_on_vendor, decided_by) are claims with subject + object = entity ids, carrying their own provenance and validity. Multi-hop is a recursive query over the flat log, not a graph-traversal engine.
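
Multi-hop over a flat log needs nothing more than a bounded walk. The sketch below shows the idea in memory, assuming the input is already the accepted (folded) relationship claims; `RelationClaim` and `Relations.Reachable` are hypothetical names. In PostgreSQL the equivalent would typically be a `WITH RECURSIVE` query over the claims table.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record RelationClaim(string SubjectId, string Predicate, string ObjectId);

public static class Relations
{
    // Breadth-first walk over accepted relationship claims. Returns every entity reachable
    // from `start` via `predicate` within `maxHops`. The `seen` set guards against cycles.
    public static IReadOnlySet<string> Reachable(
        IEnumerable<RelationClaim> accepted, string start, string predicate, int maxHops)
    {
        var edges = accepted.Where(c => c.Predicate == predicate)
                            .ToLookup(c => c.SubjectId, c => c.ObjectId);
        var seen = new HashSet<string> { start };
        var frontier = new List<string> { start };
        for (var hop = 0; hop < maxHops && frontier.Count > 0; hop++)
        {
            // seen.Add returns false for already-visited ids, so they drop out of the next frontier.
            frontier = frontier.SelectMany(id => edges[id]).Where(seen.Add).ToList();
        }
        seen.Remove(start);
        return seen;
    }
}
```

Because each hop is just a filter over claims, every step of a path keeps its own provenance and validity; nothing is flattened into an anonymous edge.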

2.5 Projections, each verified in its own currency

Projection	Built from	Verified by (its currency)
An answer (chat / agent)	fold + retrieval	meaning — every load-bearing claim cites a source that was shown, and the span ENTAILS it
A structured export / API	fold	fields — typed field correctness
A curated index/mirror	accepted claims	its own consistency (e.g. a content-hash reconciliation)

No projection is editable independently; truth changes by a new claim, then projections re-derive.
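
For the curated mirror, a content-hash reconciliation could look something like the following. This is a sketch under assumptions: `MirrorCheck` is a hypothetical name, values are treated as plain strings, and the tab/newline canonical form is only safe if keys and values cannot contain those characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;

public static class MirrorCheck
{
    // Canonical digest of a fact -> value map. Sorting by key makes the hash order-independent.
    public static string Digest(IReadOnlyDictionary<string, string> factValues)
    {
        var canonical = string.Join("\n",
            factValues.OrderBy(kv => kv.Key, StringComparer.Ordinal)
                      .Select(kv => $"{kv.Key}\t{kv.Value}"));
        return Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonical)));
    }

    // The mirror is consistent when its digest equals the digest of the fold it was derived from.
    public static bool InSync(
        IReadOnlyDictionary<string, string> fold, IReadOnlyDictionary<string, string> mirror) =>
        Digest(fold) == Digest(mirror);
}
```

When the digests differ, the repair is to re-derive the mirror from the accepted claims, never to edit the mirror by hand.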

3. The invariants, as gates (red/green — each MUST be able to FAIL)

A gate that cannot be made to go red on a deliberate break is vacuous and itself fails. Degenerate controls are mandatory.

Gate	Asserts	Degenerate control (must FAIL)
G-BORN	Every claim validates: status ∈ {observed,inferred,asserted}, grounded is bool, both clocks present, identity present. Fail-closed validator, never throws.	A claim missing `epistemic_status` / `valid_from_basis` is REJECTED at insert.
G-CONF	`confidence == Confidence.Compute(status, grounded)`; strict ordering across all 6 pairs; unknown → 0.	A hand-set/constant/reversed confidence FAILS.
G-CLOCKS	Both clocks present; `valid_from_basis` non-empty; an unlabeled `valid_from = recorded_at` collapse FAILS.	A claim faking two clocks from one (no basis) FAILS.
G-FOLD	`fact_current` = the latest accepted assertion; an edit is a NEW asserted claim that supersedes; the prior row is untouched.	An UPDATE-in-place of a value FAILS (no UPDATE grant); a fold returning a superseded value FAILS.
G-ASOF	"As of T" reconstructs the accepted set as it stood at T.	An as-of query leaking a later supersession FAILS.
G-SUPERSEDE	Superseding/retracting never deletes; superseded claims remain queryable as `obsolete`.	A hard delete of a superseded claim FAILS the retained-traceable check.
G-PERMISSION	Reads gated in the query by `access_tier` + actor; `min(token, roster)`; unknown → deny; the model never receives out-of-tier claims.	A low-tier caller receiving a higher-tier claim FAILS.
G-CITE	Every inferred/asserted claim cites evidence; cited ⊆ shown; cited resolves; zero-cite FAILS; the span ENTAILS the claim (not substring containment); fail-closed.	A claim citing an id never shown, OR a "source says NOT X" cited for X, FAILS.
G-AUDIT	Append-only; every read (ok/denied/errored) audited; attribution = the person, mediator recorded separately.	An audit UPDATE/DELETE FAILS; a service masquerading as the person FAILS.
G-NO-GRAPHDB	Relationships live as claims + entity layer on the flat log; no triple-store/graph engine introduced.	A new RDF/graph dependency or an edges table outside the claim-log FAILS review.
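
To make the "fail-closed validator, never throws" wording of G-BORN and the basis requirement of G-CLOCKS concrete, here is a minimal sketch. It is not the validator from atom.ts: `ClaimDraft` and `BornGate` are hypothetical names, and only the fields named in those two gate rows are checked.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical insert candidate: every field is nullable so a missing value is representable.
public sealed record ClaimDraft(
    string? FactId, string? EpistemicStatus, bool? Grounded,
    DateTimeOffset? RecordedAt, DateTimeOffset? ValidFrom, string? ValidFromBasis);

public static class BornGate
{
    private static readonly HashSet<string> Statuses =
        new(StringComparer.Ordinal) { "observed", "inferred", "asserted" };

    // Returns the list of violations; an empty list means the draft may be inserted.
    // Never throws: a null draft is itself reported as a violation.
    public static IReadOnlyList<string> Validate(ClaimDraft? d)
    {
        var errors = new List<string>();
        if (d is null) { errors.Add("claim is null"); return errors; }
        if (string.IsNullOrWhiteSpace(d.FactId)) errors.Add("identity missing");
        if (d.EpistemicStatus is null || !Statuses.Contains(d.EpistemicStatus))
            errors.Add("unknown epistemic_status");
        if (d.Grounded is null) errors.Add("grounded missing");
        if (d.RecordedAt is null) errors.Add("recorded_at missing");
        if (d.ValidFrom is null) errors.Add("valid_from missing");
        if (string.IsNullOrWhiteSpace(d.ValidFromBasis)) errors.Add("valid_from_basis missing");
        return errors;
    }
}
```

The degenerate control is a draft with `EpistemicStatus` or `ValidFromBasis` removed: the gate is only trustworthy if that draft comes back with violations.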

4. Reuse map — port these, don't re-invent (DE's proven assets)

From the genome (Digital-Empathy/neuron):

scripts/gene-record/atom.ts — the claim-record (three roles + epistemic status + two clocks + fold + supersession) and its fail-closed validator. The shape of Claim and G-BORN.
scripts/gene-record/confidence.ts — computeConfidence. Port verbatim (§2.2).
scripts/gene-record/gates/{g1-two-clocks, g3-pin-the-fold, g4-as-of-matrix, g7-obsolescence, g9-confidence}.ts — the gate assertions + their degenerate controls (§3).
docs/ai/genome-design-laws.md — the four laws: capture-don't-reconstruct (lossless); the model is the surface, consumers adapt (never bend the claim model to a consumer's current shape); one coordinate system + fail-closed verify; one source of truth, every surface a projection.
docs/ai/genome-living-provenance.md — the claim-log model, the three traps (§6), the frontier citations (§7).

Provenance + freshness + abstention (DE's assistant world model — the pattern, adapt inline):

Provenance is how-known, with freshness + supersession on every entry: { status, source, observedAt, staleAfterMs (null = doesn't age), supersedes, tags }. Carry staleAfterMs as a per-claim freshness clock; hedge with age at read.
Entailment-checked writes: a stored supporting span must ENTAIL the claim, judged adversarially — stronger than "the cited id was shown." This is the upgrade to G-CITE.
Abstention + scheduled hunt: when no accepted claim answers a query, return an honest decline AND persist a hunt task {aim, askedAt, sourceHint} — ignorance becomes scheduled repair, not a guess.
Bounded symbolic retrieval into model context (tags-first, capped) — no embedding store required for the structured-claim path; embeddings stay a selective, separate axis.
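
The freshness clock and the abstention path fit together roughly as follows. This is a sketch of the pattern, not the assistant world model's code: `FreshClaim`, `HuntTask`, `Answer`, and `Answering` are hypothetical names, and "persisting" the hunt is reduced to appending it to a caller-supplied collection.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

public sealed record FreshClaim(string Object, DateTimeOffset ObservedAt, long? StaleAfterMs);
public sealed record HuntTask(string Aim, DateTimeOffset AskedAt, string? SourceHint);
public sealed record Answer(string? Value, bool Stale, HuntTask? Hunt);

public static class Answering
{
    // StaleAfterMs == null means the claim does not age.
    public static bool IsStale(FreshClaim c, DateTimeOffset now) =>
        c.StaleAfterMs is long ms && (now - c.ObservedAt).TotalMilliseconds > ms;

    // With no accepted claim: an honest decline plus a recorded hunt task, never a guess.
    // With a claim: the value, flagged stale so the caller can hedge with its age.
    public static Answer Respond(FreshClaim? accepted, string aim, DateTimeOffset now,
                                 ICollection<HuntTask> hunts, string? sourceHint = null)
    {
        if (accepted is null)
        {
            var hunt = new HuntTask(aim, now, sourceHint);
            hunts.Add(hunt);
            return new Answer(null, false, hunt);
        }
        return new Answer(accepted.Object, IsStale(accepted, now), null);
    }
}
```

Staleness is evaluated at read time from `ObservedAt` and `StaleAfterMs`, so nothing in the log has to be rewritten as a claim ages.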

5. Build increments (slices — smallest first, each independently shippable)

Per slice: spec the delta → one independent adversarial review (reviewer ≠ author) → build disk-first, small diff → run the slice's gate (prove it can fail) → integrate on green. Any access/permission floor stays green throughout.

Foundation. claims table + entity + role grants (insert/select only) + Confidence.Compute + the fail-closed validator. Gates: G-BORN, G-CONF. No behavior change — pure substrate.
Time. Two clocks + fact_current view + the as-of query. Gates: G-CLOCKS, G-ASOF.
Fold + lineage. Supersession/retraction as new rows; fact_current is the only value source. An adapter lifts existing ingest rows into observed claims for ONE pilot fact-type, to prove the path end-to-end on real data. Gates: G-FOLD, G-SUPERSEDE.
Projections + provenance-honest answers. Wire your answer/index path as a projection of accepted claims; every answer carries how-we-know; extend citation validation to entailment. Gate: G-CITE.
Permission seam. actor_id + access_tier gating on claim reads, reusing your identity/tier model; full audit on the claim path. Gates: G-PERMISSION, G-AUDIT.
Entity layer. Default-separate, merge-as-evidence-backed-claim, use-vs-mention, typed relationships as claims. Gate: G-NO-GRAPHDB (plus a relationship-fold test).
Freshness + abstention (adopt from the assistant world model): per-claim staleAfterMs + read-time age hedge; abstention + the scheduled hunt task. No new machinery — seed-content + retrieval rules.

6. The three traps to avoid (genome frontier scan — `genome-living-provenance.md`)

Property Sourcing (anchor-too-low). Log business-meaningful claims with epistemic status — not field- or byte-level diffs. The log is immutable, so you can never add the intent back. Build claims rich from line one.
Input-addressed ≠ output-verified (verification theater). Gates assert the emitted output matches the accepted claim-set, not "the pipeline ran." Keep checks content-addressed.
Custody break. Once you assert provenance, an undocumented transition is worse than none. The log is append-only; an unattributed mutation fails closed; rewriting history (a force-push, a row UPDATE) is the cardinal sin.

7. Research & citations (the evidence behind each choice)

Each choice rests on a field default, vetted in DE's genome-living-provenance.md "Frontier validation," whose verdict was: take the mental model, reject the substrate.

Event sourcing + CQRS — a fact is fold(events); serve from read models, never a live log scan. → the claim-log spine + fact_current.
Bitemporal data: valid time vs transaction time (Snodgrass; SQL:2011 application-time vs system-time, in that order). → the two clocks.
Datomic — a rich time-travel fact store needs only a flat accretion-only log, no graph store. → the no-graph-DB substrate.
Git2PROV — commits→activities, files→entities, committers→agents; a content-addressed Merkle log is a provenance store. → append-only + attribution + custody.
ANSI/SPARC three-schema architecture (external / conceptual / internal), read here as value / lexical / encoding; cf. Parquet logical/physical/encoding. → the identity-vs-serialization separation.
Wikidata three-rank (preferred / normal / deprecated; never delete). → supersession-not-deletion.
Rejected as substrate (the field's own anti-patterns): triple-store / RDF / nanopublication / argumentation frameworks — triple-explosion, reification blow-up, AGM exponential space. Mental model only; storage stays a flat log.
Agent-memory frame — provenance-attributed entity store + symbolic retrieval into context, with freshness decay and abstention; rejecting embedding-store-as-default and LLM-as-memory.

8. Definition of done

The world-model layer is done when, on real data through the production path:

Every claim carries how-we-know — status, grounded, two clocks, actor, source (G-BORN, G-CLOCKS);
confidence is computed and ordered, fail-closed (G-CONF);
truth folds to the latest accepted assertion and as-of queries reconstruct the past (G-FOLD, G-ASOF);
supersession never deletes (G-SUPERSEDE);
reads are permission-gated in the query and fully audited (G-PERMISSION, G-AUDIT);
every inferred/asserted claim is citation- and entailment-checked, fail-closed (G-CITE);
relationships live as claims on the flat log — no graph DB (G-NO-GRAPHDB);
every gate is falsifiable — its degenerate control FAILS;
and any pre-existing access/permission contract stays green: the security model never weakened in translation.

Not part of "done": embeddings/vector tuning (a selective, separate axis), source adapters (per-system), report formatting. Build the claim-log; the surfaces are projections.