Reader: this is for the implementing agent. Hand it to your coding agent and point it at your repo. Whatever you're building — a company memory, an assistant, a knowledge or answer layer — if it has to answer questions from a changing body of truth and be trusted, this is the architecture and the guardrails.
It is stack-agnostic. The model, the invariants, and the gates apply to any language or database. The worked reference throughout is one concrete build (.NET 10 + PostgreSQL), so the code is real and copyable — substitute your own stack and keep the columns and the invariants.
How to use: read §0–§3 before writing code. Build in the slices of §5, smallest first; a slice is not done until its gate (§3) passes and can be made to fail. Port the cited functions (§4) faithfully — they're proven; don't re-invent them.
0. What you're building, and how to land it on an existing system
The one move. Stop storing "the truth." Store claims — each one carrying how you know it.
A claim is a fact plus its epistemic status: observed (a source showed up this way), inferred
(synthesis derived it), or asserted (a person or agent set it, superseding, with who/when/why). A
fact's current value is fold(accepted claims for that fact), never a mutable field.
The mapping pattern (how this lands on a system you already have):
- If you already have an append-only ingest store, that becomes the substrate behind observed claims — the claim points at the ingested row as its evidence. You don't replace it.
- Synthesis (an extractor / LLM step) writes inferred claims, carrying their reasoning + a confidence.
- A person or agent writes asserted claims — superseding, attributed. The author is also part of the permission model.
- Answers, exports, and curated views become projections of the accepted claim-set, each verified in its own currency.
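The first mapping rule (an existing ingest row becomes the evidence behind an observed claim) can be sketched as a small adapter. This is a minimal Python sketch under stated assumptions, not DE's adapter: `IngestRow`, `lift_to_observed`, the `"source_event_time"` basis label, and the sample values are hypothetical; only the claim column names come from the data model in §2.1.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class IngestRow:
    """Hypothetical shape of one append-only ingest row."""
    source: str
    source_native_id: str
    event_time: datetime
    payload: dict


def lift_to_observed(row, fact_id, predicate, value, now):
    """Build an observed claim that points at the ingest row as its evidence."""
    return {
        "fact_id": fact_id,
        "predicate": predicate,
        "object": value,
        "epistemic_status": "observed",
        "grounded": True,                          # tied to a re-checkable source row
        "source_channel": row.source,
        "source_ref": f"{row.source}:{row.source_native_id}",
        "recorded_at": now,                        # when we learned it
        "valid_from": row.event_time,              # when it was true in the world
        "valid_from_basis": "source_event_time",   # stated, never guessed (assumed label)
    }
```

The ingest row is left untouched; the claim only references it, which is what keeps the existing archive as the substrate rather than replacing it.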
Worked reference (DE's company-memory service). It already has the substrate this plugs into: an
append-only raw_content ingest archive (identity (source, source_native_id), version-guarded on event
time) → the observed substrate; a four-tier access core (everyone < account_managers < leadership < admin, fail-closed Unknown, min(token, roster), on-behalf-of) → the permission seam; an append-only
audit_log; and a client_intelligence_staging table that already carries provenance,
source_provenance, and superseded_by columns — a half-built gesture at exactly this model. The claim
layer is the missing first-class piece over that substrate.
The floor you must not weaken. If your system has an access/permission model, the claim layer adds
to it — it never relaxes it. (DE's proven, stack-agnostic contract: de-memory's
behavior-acceptance-spec.md §1–§7 — identity/on-behalf-of, min(ceiling, roster), unknown→deny,
in-query tier filtering, citation-at-write, append-only audit, hash-only tokens. Treat those kinds of
guarantees as non-negotiable wherever they exist.)
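Two of the floor guarantees named above, min(ceiling, roster) and unknown→deny, are easy to state in code. The following is a minimal Python sketch, assuming the four-tier vocabulary of the worked reference; `effective_tier` and `can_read` are hypothetical names, and a real system would apply the same rule inside the query rather than in application memory.

```python
from typing import Optional

# Ascending tier order, taken from the worked reference.
TIERS = ["everyone", "account_managers", "leadership", "admin"]


def effective_tier(token_tier: str, roster_tier: str) -> Optional[str]:
    """min(token, roster); an unknown tier on either side yields None (deny)."""
    if token_tier not in TIERS or roster_tier not in TIERS:
        return None
    return TIERS[min(TIERS.index(token_tier), TIERS.index(roster_tier))]


def can_read(claim_tier: str, token_tier: str, roster_tier: str) -> bool:
    """Fail-closed read check: any unknown value denies."""
    eff = effective_tier(token_tier, roster_tier)
    if eff is None or claim_tier not in TIERS:
        return False
    return TIERS.index(claim_tier) <= TIERS.index(eff)
```

Note that the caller's token can only lower the effective tier, never raise it above what the roster grants.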
1. The architecture (the model you are building)
Three epistemic statuses (canonical — see DE's genome-living-provenance.md, "THE CRYSTALLIZED MODEL"):
| Status | Means | Confidence base |
|---|---|---|
| observed | "the source showed up this way" — an ingested record. Certain as an observation. | 0.80 |
| inferred | "this observation contains X" — synthesis derived it, with reasoning. | 0.55 |
| asserted | "X is now Y" — a person/agent set it, superseding, with who/when/why. The author is the permission. | 0.30 |
Everything follows from that:
- Truth is a fold, not a field. Current value = the latest accepted assertion on a lineage back to the original observation. Append-only; you never overwrite.
- Confidence is computed, never stored as an independent literal — a deterministic monotonic function of (epistemic_status, grounded), fail-closed to 0 on anything unknown.
- Two clocks — recorded_at (when you learned it) and valid_from (when it was true in the world), the latter carrying an explicit, never-guessed basis label.
- Every surface is a projection, each verified in its own currency — an answer by meaning (citation/entailment), a structured export by fields, a curated mirror by its own consistency. No surface is a second source of truth.
- Substrate is a flat append-only log, NOT a graph database. Relationships are claims + a thin entity layer (§2.4, §7).
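The two-clocks rule in the list above is the one most often faked, so a small check may help. This is a minimal Python sketch, assuming claims are plain dicts keyed by the column names used later in §2.1; `check_clocks` is a hypothetical helper, not part of the genome code.

```python
from datetime import datetime, timezone


def check_clocks(claim):
    """Return a list of problems; an empty list means both clocks are acceptable."""
    problems = []
    recorded = claim.get("recorded_at")
    valid = claim.get("valid_from")
    basis = claim.get("valid_from_basis")
    if not isinstance(recorded, datetime):
        problems.append("recorded_at missing")
    if not isinstance(valid, datetime):
        problems.append("valid_from missing")
    # Without a stated basis, valid_from is a guess (often a silent copy of recorded_at).
    if not (isinstance(basis, str) and basis.strip()):
        problems.append("valid_from_basis missing")
    return problems
```

A claim whose valid_from equals recorded_at is acceptable only when the basis says so explicitly; the check rejects the unlabeled case because it cannot tell a real coincidence from a default.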
2. The data model — reference implementation (.NET 10 + PostgreSQL)
One concrete rendering. The model is stack-agnostic: substitute your ORM/DB, keep the columns and the invariants. Ported from DE's genome claim-record (neuron : scripts/gene-record/atom.ts).
2.1 claims — the claim-log (the "gene"). Append-only.
// A fact's current value is fold(accepted claims for fact_id), NEVER a stored mutable field.
public class Claim
{
    public Guid Id { get; set; }

    // IDENTITY — what it is + which fact in the cascade.
    public string ValueType { get; set; } = ""; // 'telephone' | 'mrr' | 'status' | 'policy' | ...
    public string FactId { get; set; } = ""; // stable fact key, e.g. 'client:acme#mrr'

    // The assertion: subject–predicate–object, polarity.
    public string SubjectClass { get; set; } = ""; // OPEN registry: 'client' | 'person' | 'agency'...
    public string SubjectId { get; set; } = ""; // 'client:acme'
    public string Predicate { get; set; } = ""; // 'has_mrr' | 'has_status' | ...
    public string Object { get; set; } = ""; // jsonb — the claimed value
    public string Polarity { get; set; } = "asserts"; // 'asserts' | 'denies'

    // HOW WE KNOW IT.
    public string EpistemicStatus { get; set; } = ""; // 'observed' | 'inferred' | 'asserted'
    public bool Grounded { get; set; } // tied to a re-checkable source row?
    public double Confidence { get; set; } // == Confidence.Compute(status, grounded) — GATED

    // WHO + WHERE-FROM (keep source PURE evidence, never identity).
    public string ActorId { get; set; } = ""; // who asserted/observed — DRIVES permission
    public string ActorKind { get; set; } = ""; // 'person' | 'agent' | 'system'
    public string SourceChannel { get; set; } = ""; // evidence channel: 'stripe' | 'slack' | 'vault'
    public string? SourceRef { get; set; } // ingest-row id / receipt id / doc#anchor

    // TWO CLOCKS (bitemporal).
    public DateTimeOffset RecordedAt { get; set; } // transaction time — when you learned it (STRICT)
    public DateTimeOffset ValidFrom { get; set; } // valid time — when true in the world
    public string ValidFromBasis { get; set; } = ""; // STATED basis, NEVER silently defaulted
    public DateTimeOffset? RetractedAt { get; set; }
    public DateTimeOffset? ValidTo { get; set; }

    // PERMISSION + lineage.
    public string AccessTier { get; set; } = ""; // your tier vocabulary
    public string Visibility { get; set; } = "team";
    public string Disposition { get; set; } = "live"; // 'live' | 'retracted' | 'obsolete'
    public Guid? Supersedes { get; set; }
    public Guid? Invalidates { get; set; }
    public Guid? Reinstates { get; set; }
    public DateTimeOffset CreatedAt { get; set; }
}
Append-only is a role grant, not a convention. The app role gets INSERT, SELECT on claims —
no UPDATE, no DELETE. Supersession and retraction are new rows, never mutations.
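The same discipline can be mirrored in application code, so that a mutation is impossible to express rather than merely discouraged. The following is a minimal in-memory Python sketch; `Row` and `ClaimLog` are hypothetical names, and it stands in for the database grant rather than replacing it.

```python
import itertools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen: a row cannot be changed after it is created
class Row:
    id: int
    fact_id: str
    value: object
    supersedes: Optional[int] = None


class ClaimLog:
    """Exposes append and read only, mirroring INSERT + SELECT grants."""

    def __init__(self):
        self._rows = []
        self._ids = itertools.count(1)

    def append(self, fact_id, value, supersedes=None) -> Row:
        """An edit is a new row that names the row it supersedes."""
        row = Row(next(self._ids), fact_id, value, supersedes)
        self._rows.append(row)
        return row

    def rows(self) -> tuple:
        """Read-only snapshot of the whole log."""
        return tuple(self._rows)
```

For example, changing a value from 1000 to 1200 is `log.append("client:acme#mrr", 1200, supersedes=first.id)`; the original row stays in the log with its value intact.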
2.2 Confidence.Compute — port verbatim (do not invent the numbers)
Faithful port of neuron : scripts/gene-record/confidence.ts. The ordering is the invariant; the
constants are calibration placeholders behind a stable call site.
using System;
using System.Collections.Generic;

public static class Confidence
{
    // Base confidence per epistemic status; the ordering is the invariant.
    private static readonly IReadOnlyDictionary<string, double> Base =
        new Dictionary<string, double>(StringComparer.Ordinal)
        { ["observed"] = 0.80, ["inferred"] = 0.55, ["asserted"] = 0.30 };

    private const double GroundedBump = 0.15;

    // Deterministic, monotonic, total, FAIL-CLOSED (unknown -> 0; no false green).
    public static double Compute(string epistemic, bool grounded)
    {
        if (!Base.TryGetValue(epistemic, out var b)) return 0.0;
        var raw = b + (grounded ? GroundedBump : 0.0);
        return Math.Clamp(Math.Round(raw, 2), 0.0, 1.0); // 0.95/0.80/0.70/0.55/0.45/0.30
    }
}
Confidence is stored only as a denormalized cache; G-CONF recomputes and asserts equality — a
hand-set or drifted value fails the build.
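To make the recompute-and-compare idea concrete, here is a minimal Python sketch of a G-CONF style check. The constants are copied from the C# block above; `compute_confidence` and `g_conf` are hypothetical names for illustration and are not the genome's own gate code.

```python
BASE = {"observed": 0.80, "inferred": 0.55, "asserted": 0.30}
GROUNDED_BUMP = 0.15


def compute_confidence(status, grounded):
    """Deterministic and fail-closed: any unknown status yields 0.0."""
    if not isinstance(status, str) or status not in BASE:
        return 0.0
    raw = BASE[status] + (GROUNDED_BUMP if grounded else 0.0)
    return min(1.0, max(0.0, round(raw, 2)))


def g_conf(claims):
    """Return ids of claims whose stored confidence differs from the recomputed value."""
    return [
        c["id"]
        for c in claims
        if c["confidence"] != compute_confidence(c["epistemic_status"], c["grounded"])
    ]
```

An empty result means the cache is consistent; any id in the result is a hand-set or drifted value and should fail the build.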
2.3 The fold — current value as a read model (never a field)
-- CURRENT accepted value of every fact = latest valid, non-retracted, non-superseded assertion.
-- A READ MODEL (view / materialized view), never a stored mutable column.
create view fact_current as
select distinct on (c.fact_id)
c.fact_id, c.value_type, c.object, c.epistemic_status, c.confidence,
c.actor_id, c.access_tier, c.valid_from, c.recorded_at, c.id as claim_id
from claims c
where c.disposition = 'live'
and c.retracted_at is null
and not exists (select 1 from claims s where s.supersedes = c.id)
order by c.fact_id, c.valid_from desc, c.recorded_at desc;
-- AS-OF (bitemporal): what you BELIEVED about a fact at transaction-time :as_of.
-- (Add a valid-time predicate to ask what was TRUE-in-world then — the two axes are independent.)
select distinct on (c.fact_id) c.fact_id, c.object, c.confidence
from claims c
where c.recorded_at <= :as_of
and (c.retracted_at is null or c.retracted_at > :as_of)
and not exists (select 1 from claims s where s.supersedes = c.id and s.recorded_at <= :as_of)
order by c.fact_id, c.valid_from desc, c.recorded_at desc;
Port the matrices from neuron : scripts/gene-record/gates/{g3-pin-the-fold, g4-as-of-matrix}.ts.
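For unit tests that should not need a database, the two queries above can be mirrored by one in-memory function. This is a minimal Python sketch under assumptions: claims are dicts with the column names used in the SQL, timestamps are any mutually comparable values, and the `disposition` filter of the view is omitted for brevity. `fold` is a hypothetical name.

```python
def fold(claims, as_of=None):
    """Current value per fact_id, or the value believed at transaction time `as_of`."""

    def visible(c):
        if as_of is None:
            return c.get("retracted_at") is None
        if c["recorded_at"] > as_of:
            return False  # not yet learned at as_of
        return c.get("retracted_at") is None or c["retracted_at"] > as_of

    # Only supersessions already recorded at as_of may hide an earlier claim.
    known = [c for c in claims if as_of is None or c["recorded_at"] <= as_of]
    superseded = {c["supersedes"] for c in known if c.get("supersedes") is not None}

    best = {}
    for c in claims:
        if not visible(c) or c["id"] in superseded:
            continue
        cur = best.get(c["fact_id"])
        key = (c["valid_from"], c["recorded_at"])
        if cur is None or key > (cur["valid_from"], cur["recorded_at"]):
            best[c["fact_id"]] = c
    return {fact_id: c["object"] for fact_id, c in best.items()}
```

The degenerate control for G-ASOF is then a one-liner: an as-of fold taken before a supersession was recorded must still return the older value.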
2.4 The entity layer — relationships without a graph DB
Relationships and merges are claims, not edges in a graph engine (genome-living-provenance.md, "Provenance & entities on an edit"):
- Identity by referent, default-separate — two facts sharing a value are never auto-merged.
- A merge is an evidence-backed, reversible, attributed claim (predicate: 'same_as', with actor + source + a reversible supersede). No silent coalescing.
- Use vs mention — editing an entity cascades to its uses; it never rewrites a quote of the old value.
- Typed relationships (account_manager_of, depends_on_vendor, decided_by) are claims with subject + object = entity ids, carrying their own provenance and validity. Multi-hop is a recursive query over the flat log, not a graph-traversal engine.
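Multi-hop traversal over relationship claims needs nothing beyond the flat log. The following is a minimal Python sketch of that idea as a breadth-first walk; in PostgreSQL the equivalent would be a recursive query. `reachable` is a hypothetical helper and the claim dicts carry only the fields the walk needs.

```python
from collections import deque


def reachable(claims, start, predicate):
    """Entity ids reachable from `start` through non-superseded claims with `predicate`."""
    superseded = {c["supersedes"] for c in claims if c.get("supersedes") is not None}

    # Build adjacency from the log on the fly; nothing is stored as an edge.
    edges = {}
    for c in claims:
        if c["predicate"] == predicate and c["id"] not in superseded:
            edges.setdefault(c["subject_id"], []).append(c["object"])

    found, visited, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in visited:  # guards against cycles
                visited.add(nxt)
                found.append(nxt)
                queue.append(nxt)
    return found
```

Because the adjacency is derived from claims at read time, superseding a relationship claim changes the traversal result without any edge table to keep in sync.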
2.5 Projections, each verified in its own currency
| Projection | Built from | Verified by (its currency) |
|---|---|---|
| An answer (chat / agent) | fold + retrieval | meaning — every load-bearing claim cites a source that was shown, and the span ENTAILS it |
| A structured export / API | fold | fields — typed field correctness |
| A curated index/mirror | accepted claims | its own consistency (e.g. a content-hash reconciliation) |
No projection is editable independently; truth changes by a new claim, then projections re-derive.
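The "meaning" currency for answers can be sketched as a fail-closed citation check. This is a minimal Python sketch under assumptions: `resolve` and `entails` are hypothetical callables supplied by the caller (the entailment judge in particular is outside the scope of this sketch), `cited_ids` and `statement` are illustrative field names, and the sketch takes the conservative reading that every cited span must entail the claim.

```python
def check_citations(claim, shown_ids, resolve, entails):
    """True only if the claim cites at least one id and every cited id was shown,
    resolves to a span, and that span entails the claim's statement.

    resolve(cited_id) -> span text or None   (hypothetical)
    entails(span, statement) -> bool         (hypothetical judge)
    """
    cited = claim.get("cited_ids") or []
    if not cited:
        return False  # zero-cite fails
    for cited_id in cited:
        if cited_id not in shown_ids:
            return False  # cited must be a subset of shown
        span = resolve(cited_id)
        if span is None:
            return False  # the citation must resolve
        try:
            if not entails(span, claim["statement"]):
                return False
        except Exception:
            return False  # a judge error is a failure, never a pass
    return True
```

Substring containment would accept a span that says "NOT X" as support for X, which is why the judge is a separate callable rather than a string match.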
3. The invariants, as gates (red/green — each MUST be able to FAIL)
A gate that cannot be made to go red on a deliberate break is vacuous and itself fails. Degenerate controls are mandatory.
| Gate | Asserts | Degenerate control (must FAIL) |
|---|---|---|
| G-BORN | Every claim validates: status ∈ {observed,inferred,asserted}, grounded is bool, both clocks present, identity present. Fail-closed validator, never throws. | A claim missing epistemic_status / valid_from_basis is REJECTED at insert. |
| G-CONF | confidence == Confidence.Compute(status, grounded); strict ordering across all 6 pairs; unknown → 0. | A hand-set/constant/reversed confidence FAILS. |
| G-CLOCKS | Both clocks present; valid_from_basis non-empty; an unlabeled valid_from = recorded_at collapse FAILS. | A claim faking two clocks from one (no basis) FAILS. |
| G-FOLD | fact_current = the latest accepted assertion; an edit is a NEW asserted claim that supersedes; the prior row is untouched. | An UPDATE-in-place of a value FAILS (no UPDATE grant); a fold returning a superseded value FAILS. |
| G-ASOF | "As of T" reconstructs the accepted set as it stood at T. | An as-of query leaking a later supersession FAILS. |
| G-SUPERSEDE | Superseding/retracting never deletes; superseded claims remain queryable as obsolete. | A hard delete of a superseded claim FAILS the retained-traceable check. |
| G-PERMISSION | Reads gated in the query by access_tier + actor; min(token, roster); unknown → deny; the model never receives out-of-tier claims. | A low-tier caller receiving a higher-tier claim FAILS. |
| G-CITE | Every inferred/asserted claim cites evidence; cited ⊆ shown; cited resolves; zero-cite FAILS; the span ENTAILS the claim (not substring containment); fail-closed. | A claim citing an id never shown, OR a "source says NOT X" cited for X, FAILS. |
| G-AUDIT | Append-only; every read (ok/denied/errored) audited; attribution = the person, mediator recorded separately. | An audit UPDATE/DELETE FAILS; a service masquerading as the person FAILS. |
| G-NO-GRAPHDB | Relationships live as claims + entity layer on the flat log; no triple-store/graph engine introduced. | A new RDF/graph dependency or an edges table outside the claim-log FAILS review. |
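The rule that a gate must be able to fail can itself be enforced by the harness. Below is a minimal Python sketch, assuming each gate is a boolean check over one input; `run_gate` and `is_born_valid` are hypothetical names, and the validator covers only a subset of what G-BORN asserts.

```python
STATUSES = {"observed", "inferred", "asserted"}


def is_born_valid(claim) -> bool:
    """Fail-closed validator: returns False instead of raising on malformed input."""
    try:
        return (
            claim["epistemic_status"] in STATUSES
            and isinstance(claim["grounded"], bool)
            and bool(claim["valid_from_basis"])
            and claim["recorded_at"] is not None
            and claim["valid_from"] is not None
        )
    except Exception:
        return False


def run_gate(name, check, good, broken):
    """Green only if the check passes the good input AND rejects the degenerate control."""
    if not check(good):
        return f"{name}: RED (good input failed)"
    if check(broken):
        return f"{name}: RED (vacuous: degenerate control passed)"
    return f"{name}: GREEN"
```

A check that always returns True passes the good input but is reported red, because it cannot catch the deliberate break.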
4. Reuse map — port these, don't re-invent (DE's proven assets)
From the genome (Digital-Empathy/neuron):
- scripts/gene-record/atom.ts — the claim-record (three roles + epistemic status + two clocks + fold + supersession) and its fail-closed validator. The shape of Claim and G-BORN.
- scripts/gene-record/confidence.ts — computeConfidence. Port verbatim (§2.2).
- scripts/gene-record/gates/{g1-two-clocks, g3-pin-the-fold, g4-as-of-matrix, g7-obsolescence, g9-confidence}.ts — the gate assertions + their degenerate controls (§3).
- docs/ai/genome-design-laws.md — the four laws: capture-don't-reconstruct (lossless); the model is the surface, consumers adapt (never bend the claim model to a consumer's current shape); one coordinate system + fail-closed verify; one source of truth, every surface a projection.
- docs/ai/genome-living-provenance.md — the claim-log model, the three traps (§6), the frontier citations (§7).
Provenance + freshness + abstention (DE's assistant world model — the pattern, adapt inline):
- Provenance is how-known, with freshness + supersession on every entry: { status, source, observedAt, staleAfterMs (null = doesn't age), supersedes, tags }. Carry staleAfterMs as a per-claim freshness clock; hedge with age at read.
- Entailment-checked writes: a stored supporting span must ENTAIL the claim, judged adversarially — stronger than "the cited id was shown." This is the upgrade to G-CITE.
- Abstention + scheduled hunt: when no accepted claim answers a query, return an honest decline AND persist a hunt task {aim, askedAt, sourceHint} — ignorance becomes scheduled repair, not a guess.
- Bounded symbolic retrieval into model context (tags-first, capped) — no embedding store required for the structured-claim path; embeddings stay a selective, separate axis.
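The freshness hedge and the abstention-plus-hunt pattern fit in one read function. This is a minimal Python sketch under assumptions: entries use the field names listed above plus a hypothetical `value` field, retrieval is a plain tag overlap, and `hunts` is any list-like store. `answer` is a hypothetical name, not the assistant's actual API.

```python
def answer(query_tags, entries, now_ms, hunts):
    """Answer from stored entries with an age hedge, or decline and persist a hunt task."""
    hits = [e for e in entries if set(query_tags) & set(e["tags"])]
    if not hits:
        # Ignorance becomes scheduled repair: record what was asked and when.
        hunts.append({"aim": sorted(query_tags), "askedAt": now_ms, "sourceHint": None})
        return {"answered": False, "text": "No accepted claim answers this yet."}

    entry = max(hits, key=lambda e: e["observedAt"])  # freshest matching entry
    age_ms = now_ms - entry["observedAt"]
    # staleAfterMs of None means the entry does not age.
    stale = entry["staleAfterMs"] is not None and age_ms > entry["staleAfterMs"]
    return {"answered": True, "value": entry["value"], "ageMs": age_ms, "stale": stale}
```

A caller would render `stale` and `ageMs` as a hedge in the answer text instead of dropping the entry, so the reader sees both the value and how old it is.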
5. Build increments (slices — smallest first, each independently shippable)
Per slice: spec the delta → one independent adversarial review (reviewer ≠ author) → build disk-first, small diff → run the slice's gate (prove it can fail) → integrate on green. Any access/permission floor stays green throughout.
- Foundation. claims table + entity + role grants (insert/select only) + Confidence.Compute + the fail-closed validator. Gates: G-BORN, G-CONF. No behavior change — pure substrate.
- Time. Two clocks + fact_current view + the as-of query. Gates: G-CLOCKS, G-ASOF.
- Fold + lineage. Supersession/retraction as new rows; fact_current is the only value source. An adapter lifts existing ingest rows into observed claims for ONE pilot fact-type, to prove the path end-to-end on real data. Gates: G-FOLD, G-SUPERSEDE.
- Projections + provenance-honest answers. Wire your answer/index path as a projection of accepted claims; every answer carries how-we-know; extend citation validation to entailment. Gate: G-CITE.
- Permission seam. actor_id + access_tier gating on claim reads, reusing your identity/tier model; full audit on the claim path. Gates: G-PERMISSION, G-AUDIT.
- Entity layer. Default-separate, merge-as-evidence-backed-claim, use-vs-mention, typed relationships as claims. Gate: G-NO-GRAPHDB (plus a relationship-fold test).
- Freshness + abstention (adopt from the assistant world model): per-claim staleAfterMs + read-time age hedge; abstention + the scheduled hunt task. No new machinery — seed-content + retrieval rules.
6. The three traps to avoid (genome frontier scan — genome-living-provenance.md)
- Property Sourcing (anchor-too-low). Log business-meaningful claims with epistemic status — not field- or byte-level diffs. The log is immutable, so you can never add the intent back. Build claims rich from line one.
- Input-addressed ≠ output-verified (verification theater). Gates assert the emitted output matches the accepted claim-set, not "the pipeline ran." Keep checks content-addressed.
- Custody break. Once you assert provenance, an undocumented transition is worse than none. The log is append-only; an unattributed mutation fails closed; rewriting history (a force-push, a row UPDATE) is the cardinal sin.
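The second trap is avoided by comparing what was emitted against the accepted claim-set by content, not by checking that a job ran. Here is a minimal Python sketch of one way to do that, assuming both sides can be expressed as JSON-serializable mappings; `content_hash` and `verify_projection` are hypothetical names, and SHA-256 over canonical JSON is an illustrative choice rather than anything the source prescribes.

```python
import hashlib
import json


def content_hash(obj) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_projection(emitted: dict, accepted: dict) -> bool:
    """True only if the emitted output matches the accepted claim-set by content."""
    return content_hash(emitted) == content_hash(accepted)
```

Key order does not affect the hash, so two projections built in a different order still reconcile, while any changed value does not.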
7. Research & citations (the evidence behind each choice)
Each choice rests on a field default, vetted in DE's genome-living-provenance.md "Frontier validation,"
whose verdict was take the mental model, reject the substrate.
- Event sourcing + CQRS — a fact is fold(events); serve from read models, never a live log scan. → the claim-log spine + fact_current.
- Bitemporal data — valid time vs transaction time (Snodgrass; SQL:2011 system- vs application-time). → the two clocks.
- Datomic — a rich time-travel fact store needs only a flat accretion-only log, no graph store. → the no-graph-DB substrate.
- Git2PROV — commits→activities, files→entities, committers→agents; a content-addressed Merkle log is a provenance store. → append-only + attribution + custody.
- ANSI/SPARC three-schema (value / lexical / encoding; cf. Parquet logical/physical/encoding). → the identity-vs-serialization separation.
- Wikidata three-rank (preferred / normal / deprecated; never delete). → supersession-not-deletion.
- Rejected as substrate (the field's own anti-patterns): triple-store / RDF / nanopublication / argumentation frameworks — triple-explosion, reification blow-up, AGM exponential space. Mental model only; storage stays a flat log.
- Agent-memory frame — provenance-attributed entity store + symbolic retrieval into context, with freshness decay and abstention; rejecting embedding-store-as-default and LLM-as-memory.
8. Definition of done
The world-model layer is done when, on real data through the production path:
- Every claim carries how-we-know — status, grounded, two clocks, actor, source (G-BORN, G-CLOCKS);
- confidence is computed and ordered, fail-closed (G-CONF);
- truth folds to the latest accepted assertion and as-of queries reconstruct the past (G-FOLD, G-ASOF);
- supersession never deletes (G-SUPERSEDE);
- reads are permission-gated in the query and fully audited (G-PERMISSION, G-AUDIT);
- every inferred/asserted claim is citation- and entailment-checked, fail-closed (G-CITE);
- relationships live as claims on the flat log — no graph DB (G-NO-GRAPHDB);
- every gate is falsifiable — its degenerate control FAILS;
- and any pre-existing access/permission contract stays green: the security model never weakened in translation.
Not part of "done": embeddings/vector tuning (a selective, separate axis), source adapters (per-system), report formatting. Build the claim-log; the surfaces are projections.