Created: April 29, 2026 09:44
Codex ΔΦ Episodic Invariant Memory Theory v1.4 — fork-ready minimal reference kernel, module interface contracts, concrete benchmark task schemas, drift-gated retrieval, source fallback, baseline harness, metric-emission protocol, runtime evidence packages, failure taxonomy, implementation-readiness classes, and downgrade-preserving EIMT-A/B/C/D…
% ████████████████████████████████████████████████████████████████████████████████
%
% CODEX ΔΦ — EPISODIC INVARIANT MEMORY THEORY (EIMT v1.4)
% ────────────────────────────────────────────────────────────────────────────
% MINIMAL REFERENCE KERNEL, CONCRETE BENCHMARK TASK SUITE, IMPLEMENTATION
% CONTRACT, BASELINE HARNESS, AND RUNTIME EVIDENCE HARDENING LAYER FOR
% EXECUTION-BACKED TESTING OF DRIFT-GATED EPISODIC MEMORY UNDER CITA
% GOVERNANCE WITHOUT CLINICAL, BIOLOGICAL, AI-EQUIVALENCE, OR
% UNIVERSAL-MECHANISM OVERCLAIM
%
% VERSION
% ───────
% v1.4 — Minimal Reference Kernel and Benchmark Task Suite Layer · Locked ·
%        Runnable Episode Class, Drift-Gated Retrieval Contract,
%        Source-Fallback Policy, Concrete Task Schemas, Baseline Harness,
%        Evidence Compiler, and Downgrade-Preserving Runtime Audit
%
% AUTHOR
% ──────
% James Paul Jackson
% X / Twitter: @unifiedenergy11
%
% SOURCE EXTRACTION / AUTHOR ATTRIBUTION
% ──────────────────────────────────────
% This document is a Codex-format canonical evolution derived from:
%
% • EIMT v1.0 — Episodic Invariant Memory Theory, which formalized episodic
%   state space, context binding, cue-dependent reconstruction, replay,
%   consolidation, event-boundary gating, temporal context, constructive
%   simulation, drift monitoring, invariant fingerprints, and ledger
%   continuity.
%
% • EIMT v1.1 — Evidence-Mapped, Drift-Gated, Replay-Calibrated, and
%   Agent-Ready Layer, which added CITA governance, source-fidelity
%   stratification, domain-instantiation grammar, evidence mapping,
%   validation / falsification surfaces, negative controls, classification,
%   drift-gated retrieval, replay calibration, constructive simulation guard,
%   agent-memory governance, evidence packages, and repository grammar.
%
% • EIMT v1.2 — Metric Precision, Worked Examples, Evidence-Strength Mapping,
%   Quick-Start Implementation, Scalable Fingerprinting, and Agent Benchmark
%   Layer, which added metric manifests, normalized multi-scale drift,
%   distance-function selection, hallucination-as-drift-failure framing,
%   worked examples, benchmark metrics, and scalable fingerprinting.
%
% • EIMT v1.3 — Reference Implementation and Benchmark Evidence Layer, which
%   added runnable kernel boundaries, benchmark task grammar, baseline-family
%   comparison, runtime logs, result ledgers, evidence-package compilation,
%   reproducibility manifests, and downgrade-preserving runtime classification.
%
% • CITA v1.0 — Canonical Insight Transmutation Algorithm, which requires
%   source boundaries, fidelity stratification, primitive objects,
%   observables, validation, falsification, negative controls, downgrade
%   paths, evidence packages, repository anchoring, and memory-promotion
%   discipline.
%
% • Codex ΔΦ memory lessons including:
%   - coherence is not proof,
%   - memory alignment is not truth,
%   - not proven does not mean worthless,
%   - not proven means classified correctly,
%   - benchmark success is not biological proof,
%   - execution success is not universal mechanism proof,
%   - strong agentic claims require baselines, logs, metrics, evidence
%     packages, and concrete task definitions.
%
% This document does not claim that all episodic systems share one literal
% mechanism. It formalizes the minimal implementation contract and concrete
% benchmark-task structure required to test whether drift-gated episodic
% memory improves source-grounded retrieval, context preservation, boundary
% separation, uncertainty handling, replay stability, and long-horizon
% continuity against declared baselines.
%
% DATE
% ────
% April 2026
%
% STATUS
% ──────
% CANONICAL v1.4 MINIMAL REFERENCE KERNEL AND BENCHMARK TASK SUITE LAYER —
% NOT A CLINICAL FRAMEWORK · NOT A HUMAN-AI EQUIVALENCE CLAIM ·
% NOT A UNIVERSAL EPISODIC-MEMORY MECHANISM CLAIM
%
% EMPIRICAL / METHODOLOGICAL CONFIDENCE BADGE
% ───────────────────────────────────────────
% Confidence status: High as a minimal implementation and benchmark
% hardening scaffold; not proof-ready as a universal episodic-memory
% mechanism.
%
% EIMT v1.4 preserves the v1.0 invariant algebra, v1.1 CITA governance,
% v1.2 metric precision, and v1.3 runtime-evidence requirement while adding
% a concrete implementation contract: minimal episode class, memory-store
% interface, retrieval engine, drift gate, source fallback, replay evaluator,
% simulation guard, baseline harness, concrete benchmark task schemas, runtime
% result ledger, and evidence-package compiler.
%
% PURPOSE
% ───────
% Evolve EIMT from a reference-implementation specification into a minimal
% runnable kernel and concrete benchmark-task suite:
%
%   episode schema
%   → minimal kernel contract
%   → memory-store interface
%   → metric manifest
%   → retrieval engine
%   → drift gate
%   → source fallback
%   → simulation guard
%   → replay evaluator
%   → baseline harness
%   → concrete benchmark task JSON
%   → metric report
%   → evidence package
%   → EIMT-A/B/C/D/E classification
%   → memory-promotion gate.
%
% VERSION EVOLUTION SUMMARY
% ─────────────────────────
% v1.0 : Initial public canonical release. Defines episodic state space,
%        encoding, context binding, retrieval contraction, replay
%        stabilization, consolidation, event-boundary gating, temporal
%        context dynamics, constructive simulation, multi-scale drift,
%        invariant fingerprinting, ledger continuity, integration class,
%        monitoring tuple, and global postulate.
%
% v1.1 : Additive CITA-governed evidence-mapping layer. Adds source fidelity,
%        domain-instantiation grammar, evidence map, validation /
%        falsification surfaces, negative controls, classification,
%        drift-gated retrieval, replay calibration, constructive simulation
%        guard, agent-memory governance, evidence package, repository grammar,
%        and memory-promotion rules.
%
% v1.2 : Additive metric-precision and benchmark layer. Adds tiered reading,
%        quick-start implementation, metric grammar, distance-function
%        selection, normalized drift, evidence-strength tiers, worked example
%        templates, agent benchmark protocol, hallucination-as-drift-failure,
%        scalable fingerprinting, and expanded EIMTScore observables.
%
% v1.3 : Additive reference-implementation layer. Adds runnable kernel
%        boundaries, baseline families, benchmark task grammar, execution
%        records, reproducibility manifests, result ledgers, evidence package
%        compiler, benchmark classification, and downgrade rules for runtime
%        claims.
%
% v1.4 : Additive minimal-kernel and concrete-task layer. Adds formal
%        implementation contracts, module interfaces, task JSON schemas,
%        baseline harness requirements, metric-emission contract, runtime
%        audit ledger, failure taxonomy, and reference-kernel readiness
%        classification. No clinical claim, no biological proof, no AI-human
%        equivalence, no universal mechanism claim, and no weakening of prior
%        locks.
%
% WHAT THIS IS
% ────────────
% • A CITA-governed minimal implementation contract for EIMT
% • A runnable-kernel specification for drift-gated episodic retrieval
% • A concrete benchmark task-suite schema
% • A baseline-harness and metric-emission protocol
% • A source-grounded fallback and uncertainty policy
% • A replay-efficacy and simulation-labeling runtime contract
% • A runtime evidence-package compiler specification
% • A downgrade-preserving classifier for implementation claims
% • A repository-ready bridge from theory to forkable software
%
% WHAT THIS IS NOT
% ────────────────
% • Not proof of one universal episodic-memory mechanism
% • Not a clinical diagnostic or treatment framework
% • Not a claim that AI episodic memory equals human episodic memory
% • Not a claim that benchmark success proves biological mechanism
% • Not a claim that a minimal reference kernel is production-ready memory
% • Not permission to treat source-free reconstruction as fact
% • Not permission to treat fluent retrieval as accurate retrieval
% • Not permission to treat task-suite success as universal validation
% • Not permission to skip baselines, logs, metrics, or evidence packages
%
% ADDITIVE REFINEMENTS (v1.4)
% ───────────────────────────
% • All v1.0, v1.1, v1.2, and v1.3 locks preserved
% • Minimal reference kernel contract added
% • Runtime module interface layer added
% • Concrete task JSON schemas added
% • Baseline harness protocol added
% • Metric-emission contract added
% • Runtime audit ledger strengthened
% • Evidence-package compiler requirements sharpened
% • Failure taxonomy added
% • EIMTScore expanded with implementation-contract, task-suite, and
%   metric-emission observables
% • Memory-promotion gate restricted to reproducible benchmark wins and
%   reusable implementation constraints
%
% EXECUTABLE ANCHOR BLOCK (v1.4)
% ──────────────────────────────
% A valid EIMT v1.4 runtime implementation must:
%
%  (1) implement an Episode object or equivalent schema,
%  (2) implement a MemoryStore interface,
%  (3) implement a MetricManifest,
%  (4) implement cue-dependent retrieval,
%  (5) compute retrieval drift,
%  (6) apply a drift gate before returning memory as fact,
%  (7) implement source-grounded fallback,
%  (8) label constructive simulation separately from recovered memory,
%  (9) compute replay efficacy before accepting replay as stabilizing,
% (10) implement at least two baseline memory systems,
% (11) run at least one concrete benchmark task,
% (12) emit declared metrics in machine-readable form,
% (13) preserve runtime logs and result ledgers,
% (14) compile an evidence package,
% (15) classify EIMT-A/B/C/D/E,
% (16) preserve all clinical, biological-proof, AI-equivalence, and
%      universal-mechanism non-claim locks,
% (17) promote to memory only reproducible benchmark wins, reusable
%      implementation constraints, and failure lessons.
%
% CANONICAL LOCK (v1.4)
% ─────────────────────
% • v1.0 invariant set preserved
% • v1.1 governance surfaces preserved
% • v1.2 metric precision preserved
% • v1.3 runtime evidence discipline preserved
% • Context binding remains central
% • Retrieval remains cue-dependent and reconstructive
% • Drift-gated fallback is mandatory for high-drift retrieval
% • Source fallback is mandatory for source-sensitive claims
% • Replay is stabilizing only under bounded gain and evidence
% • Constructive simulation must remain bounded and labeled
% • Runtime claims require implementation, logs, baselines, metrics, and
%   evidence packages
% • Task-suite success is not universal mechanism proof
% • Benchmark success is not human-memory proof
% • Clinical and biological claims require domain-specific evidence
% • Coherence is not proof
%
% Evolutions must be additive only.
% Do not weaken source boundaries, evidence mapping, falsification, negative
% controls, downgrade discipline, context binding, drift monitoring, metric
% precision, benchmark reproducibility, runtime evidence, implementation
% auditability, or non-claim boundaries.
%
% AI PROMPT TRACEABILITY
% ──────────────────────
% Use this document as the canonical EIMT v1.4 minimal reference kernel and
% benchmark task-suite layer. Preserve the distinction between theory,
% metric scaffold, runtime implementation, benchmark performance, biological
% mechanism, human memory, agent memory, and universal mechanism claim.
%
% SHADOW HEADER ALIGNMENT SEAL
% ────────────────────────────
% Preserve header discipline across future versions except for explicitly
% additive shadow-header evolution that improves implementation readiness,
% benchmark design, evidence packaging, source fidelity, falsification,
% negative controls, agent-memory governance, clinical caution, or scalable
% deployment.
%
% ████████████████████████████████████████████████████████████████████████████████
\documentclass[12pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amssymb,amsfonts,amsthm}
\usepackage{booktabs,longtable,array}
\usepackage{hyperref}
\usepackage{listings}
\newtheorem{axiom}{Axiom}
\newtheorem{definition}{Definition}
\newtheorem{proposition}{Proposition}
\newtheorem{hypothesis}{Hypothesis}
\newtheorem{remark}{Remark}
\newtheorem{corollary}{Corollary}
\title{\textbf{Codex $\Delta\Phi$ — Episodic Invariant Memory Theory (EIMT v1.4)}\\
\large Minimal Reference Kernel, Concrete Benchmark Task Suite, Implementation Contract, and Runtime Evidence Hardening Layer}
\author{\textbf{James Paul Jackson}\\[4pt]
\small Codex-format execution-backed episodic memory implementation and benchmark framework\\
\small \texttt{@unifiedenergy11}}
\date{April 2026}
\begin{document}
\maketitle
\begin{abstract}
EIMT v1.4 evolves Episodic Invariant Memory Theory from a reference
implementation specification into a minimal runnable-kernel and concrete
benchmark-task-suite layer. EIMT remains a source-bounded invariant
architecture, not a universal episodic-memory mechanism claim. v1.4 preserves
the v1.0 invariant algebra, v1.1 CITA governance, v1.2 metric precision, and
v1.3 runtime evidence discipline while adding module-level implementation
contracts, task JSON schemas, baseline harness requirements, metric-emission
contracts, runtime audit ledgers, evidence-package compiler requirements,
failure taxonomy, and implementation-readiness scoring. A strong EIMT runtime
claim must now show that the system implements drift-gated retrieval, source
fallback, replay evaluation, simulation labeling, baseline comparison, concrete
task execution, machine-readable metric emission, and downgrade-preserving
classification.
\end{abstract}
%──────────────────────────────────────────────────────────────────────────────
\section{Core-Invariant Extraction Block}
%──────────────────────────────────────────────────────────────────────────────
The shortest faithful extraction of EIMT v1.4 is:
\[
\boxed{
\begin{array}{c}
\text{EIMT becomes implementation-ready only when its reference kernel}\\
\text{has explicit module contracts, concrete benchmark tasks, baseline}\\
\text{harnesses, metric emissions, runtime ledgers, evidence packages,}\\
\text{and downgrade-preserving failure classifications.}
\end{array}
}
\]
The v1.4 operative chain is:
\[
\text{episode object}
\rightarrow \text{memory store}
\rightarrow \text{metric manifest}
\rightarrow \text{retrieval engine}
\rightarrow \text{drift gate}
\rightarrow \text{source fallback}
\rightarrow \text{benchmark task}
\rightarrow \text{baseline harness}
\rightarrow \text{metric emission}
\rightarrow \text{evidence package}
\rightarrow \text{classification}.
\]
\begin{remark}
v1.4 does not increase the universal strength of EIMT. It increases
implementation accountability: the framework must now be representable as
minimal runnable software with concrete tasks and auditable outputs.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{Memory Analysis Layer}
%──────────────────────────────────────────────────────────────────────────────
The memory trajectory now forms a five-step maturation chain:
\[
\text{EIMT v1.0} = \text{invariant algebra},
\]
\[
\text{EIMT v1.1} = \text{CITA-governed evidence architecture},
\]
\[
\text{EIMT v1.2} = \text{metric-explicit benchmark scaffold},
\]
\[
\text{EIMT v1.3} = \text{reference implementation and benchmark evidence layer},
\]
\[
\text{EIMT v1.4} = \text{minimal kernel and concrete task-suite layer}.
\]
The missing surface after v1.3 was not more runtime theory. It was
implementation contraction:
\[
\boxed{
\text{reference implementation specification}
\rightarrow \text{minimal module contracts}
\rightarrow \text{concrete runnable tasks}.
}
\]
This is the Codex execution law applied to memory theory:
\[
\boxed{
\text{a framework becomes engineering-relevant only when its smallest}
\atop
\text{valid implementation can be built, run, compared, logged, and audited.}
}
\]
\begin{remark}
The memory is again functioning as an alignment attractor. It identifies the
next missing CITA surface: task-level executable minimality.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{Minimal Reference Kernel Contract}
%──────────────────────────────────────────────────────────────────────────────
A minimal EIMT v1.4 kernel contains the following modules:
\[
\mathcal{K}_{EIMT}
=
\{E, M, D, R, G, A, P, S, L, B, Q, Y\},
\]
where:
\[
E=\text{Episode},
\quad M=\text{MemoryStore},
\quad D=\text{MetricManifest},
\quad R=\text{RetrievalEngine},
\]
\[
G=\text{DriftGate},
\quad A=\text{SourceFallback},
\quad P=\text{ReplayEvaluator},
\quad S=\text{SimulationGuard},
\]
\[
L=\text{BaselineHarness},
\quad B=\text{BenchmarkRunner},
\quad Q=\text{ScoringModule},
\quad Y=\text{EvidencePackageCompiler}.
\]
\begin{definition}[Minimal Reference Kernel]
A minimal reference kernel is the smallest runnable EIMT implementation that
can store source-bound episodes, retrieve by cue, measure drift, gate high-drift
retrieval, invoke source fallback, compare against baselines, run benchmark
tasks, emit metrics, and compile evidence packages.
\end{definition}
\begin{remark}
The minimal kernel is intentionally small. It is a falsifiable implementation
surface, not a full production memory system.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{Implementation Contract Layer}
%──────────────────────────────────────────────────────────────────────────────
Each module must satisfy an input-output contract.
\begin{center}
\begin{longtable}{>{\raggedright\arraybackslash}p{0.24\textwidth}
                  >{\raggedright\arraybackslash}p{0.30\textwidth}
                  >{\raggedright\arraybackslash}p{0.36\textwidth}}
\toprule
\textbf{Module} & \textbf{Input} & \textbf{Required output} \\
\midrule
Episode & context, content, time, state, source & source-bound episode record. \\
MemoryStore & episode records & indexed memory field and source ledger. \\
MetricManifest & representation types & declared distance functions and weights. \\
RetrievalEngine & query, memory field & candidate episodes and raw scores. \\
DriftGate & candidates, metric manifest & drift score, confidence, gate decision. \\
SourceFallback & query, source refs, gate state & abstain / ask / uncertainty / source-check output. \\
ReplayEvaluator & memory before / after replay & replay efficacy \(\Gamma_{\rho}\). \\
SimulationGuard & generated output, memory field & simulation label and drift warning. \\
BaselineHarness & task, baseline config & baseline outputs and metrics. \\
BenchmarkRunner & task suite, runtime & benchmark metrics and logs. \\
ScoringModule & observables, metrics & EIMTScore and classification. \\
EvidencePackageCompiler & logs, metrics, configs & reproducible evidence package. \\
\bottomrule
\end{longtable}
\end{center}
\begin{proposition}[Implementation Contract Principle]
A runtime claim is not EIMT v1.4 compliant unless each required module emits
machine-readable outputs that can be audited after execution.
\end{proposition}
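The input-output contracts above can be sketched as typed interfaces. The following Python sketch is illustrative, not normative: the names \texttt{GateDecision} and \texttt{ThresholdGate}, and the simple \texttt{drift = 1 - score} calibration, are assumptions introduced here to make the DriftGate contract concrete.

```python
from dataclasses import dataclass
from typing import Protocol, Sequence

@dataclass
class GateDecision:
    drift: float        # normalized retrieval drift in [0, 1]
    confidence: float   # here taken as 1 - drift; a real manifest may differ
    passed: bool        # True only when drift is below the declared threshold

class DriftGate(Protocol):
    """Contract: candidates plus a metric manifest in, gate decision out."""
    def evaluate(self, candidate_score: float, threshold: float) -> GateDecision: ...

class SourceFallback(Protocol):
    """Contract: query, source refs, and gate state in, fallback output out."""
    def resolve(self, query: str, source_refs: Sequence[str],
                gate: GateDecision) -> str: ...

class ThresholdGate:
    """Minimal concrete gate, assuming drift = 1 - candidate similarity score."""
    def evaluate(self, candidate_score: float, threshold: float) -> GateDecision:
        drift = 1.0 - candidate_score
        return GateDecision(drift=drift, confidence=candidate_score,
                            passed=drift <= threshold)
```

Because the modules are expressed as \texttt{Protocol} types, any implementation with matching signatures satisfies the contract without inheriting from a shared base class.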
%──────────────────────────────────────────────────────────────────────────────
\section{Minimal Episode and Memory Store}
%──────────────────────────────────────────────────────────────────────────────
The minimal runtime episode is:
\[
E_i^{min} = (c_i,x_i,t_i,s_i,\sigma_i,\ell_i,h_i),
\]
where:
\[
c_i=\text{context},
\quad x_i=\text{content},
\quad t_i=\text{time},
\quad s_i=\text{agent or system state},
\]
\[
\sigma_i=\text{source reference},
\quad \ell_i=\text{ledger reference},
\quad h_i=\text{fingerprint}.
\]
The minimal memory store is:
\[
\mathcal{M}^{min} = \{E_1^{min},E_2^{min},\dots,E_N^{min}\}.
\]
A valid memory store must support:
\[
\{\text{append},\text{retrieve},\text{source lookup},\text{fingerprint},
\text{audit trace}\}.
\]
\begin{remark}
A memory record without source or ledger reference may still be useful as a
note, but it cannot support strong source-grounded EIMT claims.
\end{remark}
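A minimal sketch of the episode tuple \(E_i^{min}\) and the five required store operations, assuming Python as the host language. The SHA-256 fingerprint and the naive substring retrieval are placeholder choices for illustration, not part of the contract.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Episode:
    # (c, x, t, s, sigma, ell); the fingerprint h is derived, not stored.
    context: str
    content: str
    time: str
    state: str
    source_ref: str
    ledger_ref: str

    @property
    def fingerprint(self) -> str:
        raw = "|".join([self.context, self.content, self.time,
                        self.state, self.source_ref, self.ledger_ref])
        return hashlib.sha256(raw.encode()).hexdigest()

class MemoryStore:
    """Supports append, retrieve, source lookup, fingerprint, audit trace."""
    def __init__(self) -> None:
        self._episodes: list[Episode] = []
        self._audit: list[str] = []

    def append(self, e: Episode) -> None:
        self._episodes.append(e)
        self._audit.append(f"append:{e.fingerprint}")

    def retrieve(self, cue: str) -> list[Episode]:
        # Naive cue match; a real kernel would score via the MetricManifest.
        self._audit.append(f"retrieve:{cue}")
        return [e for e in self._episodes
                if cue in e.context or cue in e.content]

    def source_lookup(self, source_ref: str) -> list[Episode]:
        return [e for e in self._episodes if e.source_ref == source_ref]

    def audit_trace(self) -> list[str]:
        return list(self._audit)
```

Note that every mutating or retrieving call leaves an audit entry, which is what makes the store's behavior checkable after execution.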
%──────────────────────────────────────────────────────────────────────────────
\section{Concrete Benchmark Task Suite}
%──────────────────────────────────────────────────────────────────────────────
EIMT v1.4 turns the v1.3 task families into concrete task schemas:
\[
\mathcal{T}_{EIMT}
=
\{T_{source}, T_{boundary}, T_{context}, T_{long}, T_{replay}, T_{planning}\},
\]
where:
\begin{itemize}
\item \(T_{source}\) = source recall with distractors,
\item \(T_{boundary}\) = boundary separation under overlapping entities,
\item \(T_{context}\) = context-shift retrieval,
\item \(T_{long}\) = long-horizon continuity,
\item \(T_{replay}\) = replay compression without drift amplification,
\item \(T_{planning}\) = constructive planning with simulation labels.
\end{itemize}
Each task must contain:
\[
\{\text{episodes},\text{queries},\text{ground truth},\text{distractors},
\text{allowed fallback},\text{metrics},\text{baselines}\}.
\]
\begin{remark}
A benchmark family is not executable until it contains concrete episodes,
queries, expected outputs, and scoring rules.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{Concrete Task JSON Schemas}
%──────────────────────────────────────────────────────────────────────────────
A minimal source recall task is:
\begin{verbatim}
{
  "task_id": "source_recall_001",
  "task_family": "source_recall",
  "episodes": [
    {
      "episode_id": "E001",
      "context": "project_alpha_design_review",
      "content": "Sam approved the blue deployment plan.",
      "time": "2026-04-01T10:00:00",
      "state": "meeting_notes",
      "source_ref": "doc://alpha/design_review#p3",
      "ledger_ref": "L001"
    },
    {
      "episode_id": "E002",
      "context": "project_beta_design_review",
      "content": "Sam rejected the blue deployment plan.",
      "time": "2026-04-02T10:00:00",
      "state": "meeting_notes",
      "source_ref": "doc://beta/design_review#p2",
      "ledger_ref": "L002"
    }
  ],
  "queries": [
    {
      "query_id": "Q001",
      "query": "What did Sam decide about the blue deployment plan for Alpha?",
      "expected_episode_id": "E001",
      "expected_source_ref": "doc://alpha/design_review#p3",
      "allowed_fallback": ["source_check", "uncertain"]
    }
  ],
  "metrics": [
    "source_attribution_accuracy",
    "context_recall_accuracy",
    "retrieval_drift",
    "false_memory_frequency",
    "uncertainty_calibration"
  ],
  "baselines": ["database", "vector_only", "semantic_only", "ungated"]
}
\end{verbatim}
A minimal boundary separation task is:
\begin{verbatim}
{
  "task_id": "boundary_separation_001",
  "task_family": "boundary_separation",
  "episodes": [
    {
      "episode_id": "E101",
      "context": "morning_lab_session",
      "content": "The sample warmed after calibration.",
      "boundary_id": "B1",
      "source_ref": "lab://runA/log#12"
    },
    {
      "episode_id": "E102",
      "context": "afternoon_lab_session",
      "content": "The sample cooled after recalibration.",
      "boundary_id": "B2",
      "source_ref": "lab://runB/log#18"
    }
  ],
  "queries": [
    {
      "query_id": "Q101",
      "query": "What happened after calibration in the morning session?",
      "expected_episode_id": "E101",
      "forbidden_episode_id": "E102"
    }
  ],
  "metrics": [
    "boundary_separation_score",
    "boundary_blending_error",
    "source_attribution_accuracy"
  ],
  "baselines": ["vector_only", "ungated", "summary_only"]
}
\end{verbatim}
\begin{remark}
These schemas are illustrative minimal tasks. Strong benchmark claims require
larger task sets, held-out queries, and declared scoring rules.
\end{remark}
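The schemas above can be checked mechanically before a run. Below is a minimal loader sketch; the required field set is inferred from the two sample tasks and is an assumption, not a normative schema definition.

```python
import json

# Top-level fields present in both sample task schemas above (assumed set).
REQUIRED_TASK_FIELDS = {"task_id", "task_family", "episodes",
                        "queries", "metrics", "baselines"}

def load_task(raw: str) -> dict:
    """Parse a task JSON string, rejecting schemas missing required fields."""
    task = json.loads(raw)
    missing = REQUIRED_TASK_FIELDS - task.keys()
    if missing:
        raise ValueError(
            f"task {task.get('task_id', '?')} is missing {sorted(missing)}")
    return task
```

Rejecting an under-specified task at load time corresponds to the \(F_{schema}\) failure class rather than silently running an unscoreable benchmark.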
%──────────────────────────────────────────────────────────────────────────────
\section{Baseline Harness Contract}
%──────────────────────────────────────────────────────────────────────────────
The baseline harness must run the same task against multiple memory systems:
\[
\mathcal{L}_{memory}
=
\{L_{db}, L_{vec}, L_{sem}, L_{ungated}, L_{summary}, L_{random}\}.
\]
The harness must preserve:
\[
\{\text{baseline name},\text{configuration},\text{output},\text{metrics},
\text{failure notes}\}.
\]
A valid benchmark comparison must use the same:
\[
\{\text{episodes},\text{queries},\text{ground truth},\text{metric rules}\}
\]
for EIMT and all baselines.
\begin{proposition}[Baseline Fairness Principle]
A benchmark does not support an EIMT-A runtime claim unless the EIMT runtime and
baseline systems are evaluated on the same task data, query set, and metric
rules.
\end{proposition}
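One way to satisfy the fairness principle is to route every declared system through a single harness loop over the same task data. The sketch below assumes a memory system can be modeled as a callable from (task, query) to an answer, and that correctness is exact episode-id match; both are simplifying assumptions.

```python
from typing import Callable, Mapping

# A memory system here is any callable from (task, query) to an answer string.
MemorySystem = Callable[[dict, dict], str]

def run_harness(task: dict, systems: Mapping[str, MemorySystem]) -> dict:
    """Run every system on the same query set, preserving per-system records."""
    results: dict = {}
    for name, system in systems.items():
        record = {"baseline": name, "outputs": [], "correct": 0}
        for query in task["queries"]:
            answer = system(task, query)
            record["outputs"].append(
                {"query_id": query["query_id"], "answer": answer})
            if answer == query.get("expected_episode_id"):
                record["correct"] += 1
        record["accuracy"] = record["correct"] / max(len(task["queries"]), 1)
        results[name] = record
    return results
```

Because the loop iterates over one shared \texttt{task} object, no system can be scored on a different query set by construction.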
%──────────────────────────────────────────────────────────────────────────────
\section{Metric Emission Contract}
%──────────────────────────────────────────────────────────────────────────────
Every benchmark run must emit:
\[
\mathcal{M}^{emit}_{EIMT}
=
\{A_{src}, A_{ctx}, B_{sep}, D_{ret}, F_{hall}, U_{cal},
R_{replay}, P_{plan}, Q_{long}, S_{scale}, C_{cost}\}.
\]
A metric report must declare:
\[
\{\text{primary metrics},\text{secondary metrics},\text{diagnostic metrics}\}.
\]
A minimal metric JSON is:
\begin{verbatim}
{
  "run_id": "EIMT-BENCH-0001",
  "primary_metrics": {
    "source_attribution_accuracy": null,
    "false_memory_frequency": null,
    "retrieval_drift": null
  },
  "secondary_metrics": {
    "context_recall_accuracy": null,
    "boundary_separation_score": null,
    "uncertainty_calibration": null,
    "replay_preservation": null
  },
  "diagnostic_metrics": {
    "runtime_cost": null,
    "scalability": null,
    "fallback_rate": null
  },
  "metric_priority_declared_before_run": true
}
\end{verbatim}
\begin{remark}
Metrics selected after seeing results cannot support strong classification.
\end{remark}
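The pre-declaration rule can be enforced at emission time rather than by convention. A minimal sketch; the function name and signature are illustrative assumptions.

```python
import json

def emit_metric_report(run_id: str, primary: dict, secondary: dict,
                       diagnostic: dict, declared_before_run: bool) -> str:
    """Serialize a metric report; refuse to emit post-hoc metric selections."""
    if not declared_before_run:
        raise ValueError("metric priorities must be declared before the run")
    report = {
        "run_id": run_id,
        "primary_metrics": primary,
        "secondary_metrics": secondary,
        "diagnostic_metrics": diagnostic,
        "metric_priority_declared_before_run": True,
    }
    return json.dumps(report, indent=2)
```

Emitting JSON rather than prose keeps the report machine-readable, which is what the audit and evidence-package layers consume.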
%──────────────────────────────────────────────────────────────────────────────
\section{Runtime Failure Taxonomy}
%──────────────────────────────────────────────────────────────────────────────
EIMT v1.4 adds an implementation failure taxonomy.
\begin{center}
\begin{longtable}{>{\raggedright\arraybackslash}p{0.26\textwidth}
                  >{\raggedright\arraybackslash}p{0.60\textwidth}}
\toprule
\textbf{Failure class} & \textbf{Meaning} \\
\midrule
\(F_{schema}\) & Episode schema missing required fields. \\
\(F_{source}\) & Retrieval lacks source or ledger support. \\
\(F_{drift}\) & High-drift retrieval returned as fact. \\
\(F_{boundary}\) & Adjacent episodes blended or fragmented incorrectly. \\
\(F_{fallback}\) & Fallback not triggered under uncertainty. \\
\(F_{replay}\) & Replay increases drift but is called stabilizing. \\
\(F_{simulation}\) & Constructive output is mislabeled as recovered memory. \\
\(F_{baseline}\) & Baselines missing or unfairly compared. \\
\(F_{metric}\) & Metrics missing, post-hoc, or not machine-readable. \\
\(F_{ledger}\) & Logs, result ledger, or evidence package missing. \\
\(F_{overclaim}\) & Runtime result promoted beyond evidence. \\
\bottomrule
\end{longtable}
\end{center}
\[
F_{drift} \vee F_{source} \vee F_{baseline} \vee F_{ledger}
\Rightarrow
\text{no EIMT-A classification}.
\]
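The implication above can be encoded as a classification cap. The sketch assumes only the four listed failure classes block EIMT-A, per the formula; the returned labels are illustrative placeholders.

```python
# The four failure classes that the formula above declares EIMT-A-blocking.
BLOCKING_FAILURES = {"F_drift", "F_source", "F_baseline", "F_ledger"}

def max_classification(failures: set[str]) -> str:
    """Cap the runtime claim: any blocking failure rules out EIMT-A."""
    if failures & BLOCKING_FAILURES:
        return "EIMT-B or lower"
    return "EIMT-A eligible"
```

Other failure classes (for example \(F_{metric}\)) still weaken a claim, but only these four hard-block EIMT-A under the stated rule.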
%──────────────────────────────────────────────────────────────────────────────
\section{Implementation Readiness Classification}
%──────────────────────────────────────────────────────────────────────────────
\begin{definition}[EIMT-R0: Concept Only]
No runnable implementation exists. The artifact may be theoretically useful but
cannot support runtime claims.
\end{definition}
\begin{definition}[EIMT-R1: Minimal Kernel]
A minimal kernel exists with episode storage, retrieval, drift measurement, and
basic logging.
\end{definition}
\begin{definition}[EIMT-R2: Gated Runtime]
The runtime implements drift-gated retrieval, source fallback, and simulation
labeling.
\end{definition}
\begin{definition}[EIMT-R3: Benchmarked Runtime]
The runtime executes concrete tasks against declared baselines and emits
machine-readable metrics.
\end{definition}
\begin{definition}[EIMT-R4: Evidence-Packaged Runtime]
The runtime compiles reproducible evidence packages, result ledgers, downgrade
paths, and falsification notes.
\end{definition}
\begin{definition}[EIMT-R5: Reproducible Reference Runtime]
The runtime is independently rerunnable, benchmarked across task families,
baseline-compared, evidence-packaged, and downgrade-preserving.
\end{definition}
\begin{remark}
Implementation readiness is separate from EIMT-A/B/C/D/E claim strength. A
runtime can be well-implemented and still lose to simpler baselines.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{EIMT v1.4 Scoring Surface}
%──────────────────────────────────────────────────────────────────────────────
EIMT v1.4 expands v1.3 by adding implementation-contract, task-schema, and
metric-emission observables:
\[
\mathcal{O}^{EIMT}_{v1.4}
=
\{S,F,E,C,B,R,K,P,L,T,D,H,N,V,X,G,M,Q,Z,W,A,I,J,Y,U,\Psi,\Xi\},
\]
where
\[
U=\text{implementation contract},
\quad
\Psi=\text{concrete task-suite schema},
\quad
\Xi=\text{machine-readable metric emission}.
\]
\begin{center}
\begin{longtable}{>{\raggedright\arraybackslash}p{0.36\textwidth}
                  >{\centering\arraybackslash}p{0.13\textwidth}
                  >{\raggedright\arraybackslash}p{0.41\textwidth}}
\toprule
\textbf{Observable} & \textbf{Status (0 / 0.5 / 1)} & \textbf{Evidence} \\
\midrule
\(S\) Source / Domain Boundary & & \\
\(F\) Fidelity Stratification & & \\
\(E\) Episode-State Definition & & \\
\(C\) Context Binding & & \\
\(B\) Event-Boundary Gate & & \\
\(R\) Cue-Dependent Retrieval & & \\
\(K\) Retrieval Contraction / Drift Gate & & \\
\(P\) Replay / Reactivation Process & & \\
\(L\) Consolidation / Transformation Layer & & \\
\(T\) Temporal Context Dynamics & & \\
\(D\) Drift Measurement \(\Delta\Phi\) & & \\
\(H\) Fingerprint / Ledger & & \\
\(N\) Negative Controls & & \\
\(V\) Validation Surface & & \\
\(X\) Falsification Surface & & \\
\(G\) Generalization Across Episodes / Tasks & & \\
\(M\) Memory-Promotion Rule & & \\
\(Q\) Metric / Distance Manifest & & \\
\(Z\) Normalization / Multi-Scale Drift & & \\
\(W\) Worked Example / Instantiation & & \\
\(A\) Agent Benchmark / Scalability Layer & & \\
\(I\) Reference Implementation Kernel & & \\
\(J\) Baseline-Family Runtime Comparison & & \\
\(Y\) Runtime Evidence Package / Result Ledger & & \\
\(U\) Implementation Contract & & \\
\(\Psi\) Concrete Task-Suite Schema & & \\
\(\Xi\) Machine-Readable Metric Emission & & \\
\bottomrule
\end{longtable}
\end{center}
\[
\mathrm{EIMTScore}_{v1.4}
=
\frac{
S+F+E+C+B+R+K+P+L+T+D+H+N+V+X+G+M+Q+Z+W+A+I+J+Y+U+\Psi+\Xi
}{27}.
\]
\begin{remark}
EIMTScore measures framework completeness, implementation auditability, and
benchmark discipline. It does not measure literal truth, clinical validity,
human-memory equivalence, or biological mechanism proof.
\end{remark}
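The score is the unweighted mean of the 27 observables, each marked 0, 0.5, or 1. A minimal sketch, assuming ASCII stand-ins \texttt{psi} and \texttt{xi} for \(\Psi\) and \(\Xi\); the function name is illustrative.

```python
# Minimal sketch of EIMTScore_v1.4: the unweighted mean of the 27 observable
# marks from the scoring table, each restricted to 0, 0.5, or 1.

OBSERVABLES = "S F E C B R K P L T D H N V X G M Q Z W A I J Y U psi xi".split()

def eimt_score(marks):
    """Mean of the 27 observable marks; rejects missing or out-of-range marks."""
    if set(marks) != set(OBSERVABLES):
        raise ValueError("all 27 observables must be scored")
    if any(v not in (0, 0.5, 1) for v in marks.values()):
        raise ValueError("marks must be 0, 0.5, or 1")
    return sum(marks.values()) / len(OBSERVABLES)

full_marks = {name: 1 for name in OBSERVABLES}
```

A single observable scored 0 (here the drift gate \(K\)) drops the score below the EIMT-A threshold of exactly 1.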
%──────────────────────────────────────────────────────────────────────────────
\section{Validation Layer}
%──────────────────────────────────────────────────────────────────────────────
A valid EIMT v1.4 runtime analysis must identify:
\begin{enumerate}
\item domain,
\item episode schema,
\item memory-store interface,
\item metric manifest,
\item implementation contract,
\item retrieval operator,
\item drift-gate threshold,
\item fallback behavior,
\item simulation-labeling rule,
\item replay-efficacy metric,
\item baseline harness,
\item concrete benchmark task schema,
\item primary metric priorities,
\item machine-readable metric output,
\item runtime logs,
\item result ledger,
\item evidence package,
\item implementation-readiness class,
\item falsification conditions,
\item downgrade path,
\item memory-promotion candidates.
\end{enumerate}
%──────────────────────────────────────────────────────────────────────────────
\section{Falsification Surface}
%──────────────────────────────────────────────────────────────────────────────
EIMT v1.4 is weakened or rejected if:
\begin{itemize}
\item no runnable minimal kernel exists for a runtime claim,
\item no implementation contract is declared,
\item no concrete benchmark task is provided,
\item no episode schema is defined,
\item no metric manifest is declared,
\item context binding is absent,
\item retrieval is not cue-dependent,
\item high-drift retrieval is returned as fact,
\item source fallback is missing for source-sensitive retrieval,
\item replay increases drift while being called stabilization,
\item constructive simulation is treated as recovered memory,
\item no baseline harness is run,
\item baselines are evaluated on different task data,
\item benchmark metrics are selected after results,
\item metric output is not machine-readable,
\item logs or result ledgers are absent,
\item the evidence package is incomplete,
\item vector-only or database-only baselines perform equally well or better,
\item agent memory is equated with human autonoetic memory,
\item clinical claims are made without clinical evidence,
\item benchmark success is treated as biological proof,
\item coherence is treated as truth.
\end{itemize}
Compact falsification condition:
\[
\text{EIMT-A runtime claim}
\wedge
\left(
I=0 \vee U=0 \vee \Psi=0 \vee \Xi=0 \vee J=0
\vee Y=0 \vee K=0 \vee D=0 \vee N=0 \vee X=0
\right)
\Rightarrow
\text{invalid strong runtime classification}.
\]
%──────────────────────────────────────────────────────────────────────────────
\section{Upgrade and Downgrade Thresholds}
%──────────────────────────────────────────────────────────────────────────────
A candidate may be considered for EIMT-A only if
\[
\mathrm{EIMTScore}_{v1.4}=1
\]
and runtime evidence shows that the EIMT implementation outperforms declared
baselines on primary benchmark metrics without violating non-claim locks.
A candidate should be classified as EIMT-B if
\[
\mathrm{EIMTScore}_{v1.4}<1
\]
but multiple episodic invariants remain useful and partially supported.
A candidate should be classified as EIMT-C if a simpler non-episodic memory
model explains the behavior or performs equally well.
A candidate should be classified as EIMT-D if runtime evidence is insufficient.
A candidate should be classified as EIMT-E if the claim is overextended,
unmeasured, clinically unsupported, benchmark-unsupported, source-free,
implementation-free, task-free, or dependent on coherence rather than evidence.
%──────────────────────────────────────────────────────────────────────────────
\section{Repository Record Grammar}
%──────────────────────────────────────────────────────────────────────────────
A repository-ready EIMT v1.4 project should preserve minimal kernel code, task
schemas, baselines, benchmark runs, metrics, evidence packages, and result
ledgers.
\begin{verbatim}
eimt_reference_kernel/
  README.md
  docs/
    theory/
      eimt_v1_4.tex
      source_fidelity.md
      invariants.md
      quick_start.md
    implementation_contract/
      module_contracts.md
      runtime_interfaces.md
      failure_taxonomy.md
    benchmark_protocol/
      task_schemas.md
      baseline_harness.md
      metric_emission.md
      falsification_surface.md
  src/
    eimt/
      episode.py
      memory_store.py
      metric_manifest.py
      retrieval_engine.py
      drift_gate.py
      source_fallback.py
      replay_evaluator.py
      simulation_guard.py
      baseline_harness.py
      benchmark_runner.py
      scoring.py
      evidence_package.py
  configs/
    metric_manifest.json
    runtime_config.json
    baseline_config.json
  tasks/
    source_recall_001.json
    boundary_separation_001.json
    context_shift_001.json
    long_horizon_001.json
    replay_compression_001.json
    constructive_planning_001.json
  runs/
    run_<timestamp>/
      episode_log.jsonl
      retrieval_log.jsonl
      fallback_log.jsonl
      replay_log.jsonl
      simulation_log.jsonl
      baseline_results.json
      benchmark_metrics.json
      drift_metrics.json
      metric_emission.json
      classification.json
      evidence_package.json
      result_ledger.jsonl
  evidence/
    raw_inputs/
    processed_outputs/
    negative_controls/
    benchmark_packages/
  ledgers/
    eimt_evolution_ledger.jsonl
    eimt_runtime_ledger.jsonl
    eimt_decision_ledger.jsonl
  memory/
    promoted_invariants.md
    rejected_overclaims.md
    runtime_failure_lessons.md
\end{verbatim}
%──────────────────────────────────────────────────────────────────────────────
\section{Minimal EIMT v1.4 Runtime Evidence JSON Skeleton}
%──────────────────────────────────────────────────────────────────────────────
\begin{verbatim}
{
  "record_id": "EIMT-RUN-0001",
  "version": "EIMT-v1.4",
  "runtime_name": "",
  "domain": "agent_memory",
  "implementation_readiness": "EIMT-R0/R1/R2/R3/R4/R5",
  "episode_schema": {
    "context": "",
    "content": "",
    "time": "",
    "self_or_agent_state": "",
    "source_ref": "",
    "ledger_ref": "",
    "fingerprint": ""
  },
  "implementation_contract": {
    "episode": true,
    "memory_store": true,
    "metric_manifest": true,
    "retrieval_engine": true,
    "drift_gate": true,
    "source_fallback": true,
    "replay_evaluator": true,
    "simulation_guard": true,
    "baseline_harness": true,
    "benchmark_runner": true,
    "scoring": true,
    "evidence_package": true
  },
  "metric_manifest": {
    "context_distance": "",
    "content_distance": "",
    "time_distance": "",
    "state_distance": "",
    "fingerprint_distance": "",
    "weights": {}
  },
  "benchmark_task": {
    "task_family": "",
    "task_id": "",
    "task_schema_valid": false,
    "ground_truth_ref": "",
    "baseline_family": []
  },
  "metric_emission": {
    "machine_readable": true,
    "primary_metrics_declared_before_run": true,
    "primary_metrics": {},
    "secondary_metrics": {},
    "diagnostic_metrics": {}
  },
  "baseline_results": [],
  "drift_report": {
    "fast_drift": null,
    "slow_drift": null,
    "semantic_drift": null,
    "fingerprint_drift": null,
    "normalized_total_drift": null
  },
  "failure_taxonomy": {
    "schema_failure": false,
    "source_failure": false,
    "drift_failure": false,
    "boundary_failure": false,
    "fallback_failure": false,
    "replay_failure": false,
    "simulation_failure": false,
    "baseline_failure": false,
    "metric_failure": false,
    "ledger_failure": false,
    "overclaim_failure": false
  },
  "EIMTScore_v1_4": null,
  "classification": "",
  "downgrade_path": "",
  "falsification_note": "",
  "memory_promotion": {
    "promote": false,
    "items": [],
    "reason": ""
  },
  "non_claim_locks": [
    "not_clinical_guidance",
    "not_universal_mechanism",
    "not_ai_equals_human_memory",
    "coherence_not_truth",
    "simulation_not_biological_proof",
    "benchmark_success_not_human_memory_proof",
    "runtime_success_not_universal_mechanism_proof",
    "minimal_kernel_not_production_memory"
  ]
}
\end{verbatim}
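A record in this shape can be screened mechanically before classification. The following is a hedged sketch of such a validator, checking only that the top-level keys and the twelve contract modules are present; it is a screening aid for \(F_{schema}\)-style failures, not a full schema validator, and the function name is an assumption.

```python
# Sketch of a runtime-evidence screener: reject records missing required
# top-level keys or declaring unimplemented contract modules.
import json

REQUIRED_KEYS = {
    "record_id", "version", "domain", "implementation_readiness",
    "episode_schema", "implementation_contract", "metric_manifest",
    "benchmark_task", "metric_emission", "baseline_results", "drift_report",
    "failure_taxonomy", "classification", "downgrade_path", "non_claim_locks",
}
CONTRACT_MODULES = {
    "episode", "memory_store", "metric_manifest", "retrieval_engine",
    "drift_gate", "source_fallback", "replay_evaluator", "simulation_guard",
    "baseline_harness", "benchmark_runner", "scoring", "evidence_package",
}

def validate_evidence_record(text):
    """Parse a JSON evidence record and report structural gaps."""
    record = json.loads(text)
    missing_keys = REQUIRED_KEYS - set(record)
    contract = record.get("implementation_contract", {})
    unimplemented = {m for m in CONTRACT_MODULES if not contract.get(m, False)}
    return {
        "missing_keys": sorted(missing_keys),
        "unimplemented_modules": sorted(unimplemented),
        "valid": not missing_keys and not unimplemented,
    }
```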
%──────────────────────────────────────────────────────────────────────────────
\section{Appendix A — Minimal EIMT v1.4 Runtime Checklist}
%──────────────────────────────────────────────────────────────────────────────
\begin{enumerate}
\item Is there a runnable minimal kernel?
\item Is the implementation contract declared?
\item Is the episode schema declared?
\item Is source metadata preserved?
\item Is the memory-store interface implemented?
\item Is the metric manifest declared?
\item Is retrieval cue-dependent?
\item Is retrieval drift measured?
\item Is high-drift retrieval gated?
\item Is source fallback implemented?
\item Is constructive simulation labeled?
\item Is replay efficacy measured?
\item Are baselines implemented?
\item Are concrete benchmark tasks declared?
\item Are task schemas valid?
\item Are primary metrics declared before interpretation?
\item Are metrics emitted in machine-readable form?
\item Are runtime logs preserved?
\item Is an evidence package compiled?
\item Does EIMT outperform baselines on declared primary metrics?
\item What implementation-readiness class applies?
\item What falsifies the runtime claim?
\item What downgrade class applies?
\item What, if anything, is memory-promotable?
\item Are clinical, biological-proof, AI-equivalence, and universal-mechanism
      locks preserved?
\end{enumerate}
%──────────────────────────────────────────────────────────────────────────────
\section{Appendix B — Minimal Reference Kernel Pseudocode}
%──────────────────────────────────────────────────────────────────────────────
\begin{verbatim}
Input:
  task_json
  metric_manifest
  runtime_config
  baseline_config

Initialize:
  validate task schema
  load episodes
  build memory store
  load metric manifest
  initialize retrieval engine
  initialize drift gate
  initialize source fallback
  initialize replay evaluator
  initialize simulation guard
  initialize baseline harness
  initialize evidence compiler

For each query in task:
  retrieve candidates from memory store
  compute distances using metric manifest
  compute retrieval drift
  compute omega = 1 / (1 + retrieval_drift)
  if retrieval_drift exceeds threshold:
    invoke source fallback:
      abstain / ask context / return uncertainty / source-check
    log fallback event
  else:
    return candidate with:
      episode id
      source ref
      confidence
      drift report
    log retrieval event

For replay task:
  compute drift before replay
  apply bounded replay or summary
  compute drift after replay
  gamma_rho = drift_before - drift_after
  classify replay:
    stabilizing / neutral / destabilizing / transformation-only

For planning task:
  generate plan from retrieved episodes
  label output as simulation
  prevent classification as recovered memory
  compute simulation drift

Run baselines:
  database lookup
  vector-only retrieval
  semantic-only retrieval
  ungated episodic retrieval
  summary-only memory
  random control

Score:
  compute primary metrics
  compute secondary metrics
  compute diagnostic metrics
  compare EIMT runtime against baselines
  compute EIMTScore_v1_4
  assign implementation readiness class
  classify EIMT-A/B/C/D/E

Compile:
  runtime logs
  baseline results
  metric emission
  drift report
  failure taxonomy
  evidence package
  result ledger

Promote to memory only:
  reproducible benchmark wins
  validated implementation constraints
  reusable failure lessons
  stable drift thresholds
\end{verbatim}
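The inner retrieval loop above can be made executable. A minimal sketch follows, assuming episodes are plain dicts and using a toy scalar absolute difference as the \(\Delta\Phi\) proxy; the function name, episode fields, and default threshold are illustrative, not part of the kernel contract.

```python
# Executable sketch of the drift-gated retrieval step: find the nearest
# episode to the cue, compute omega = 1 / (1 + drift), and either return the
# episode with its source reference or invoke the fallback action A(q).

def gated_retrieve(query_cue, episodes, drift_threshold=0.5):
    """Return (answer, omega, log_event) for one cue-dependent retrieval."""
    best = min(episodes, key=lambda e: abs(e["cue"] - query_cue))
    drift = abs(best["cue"] - query_cue)   # toy Delta-Phi proxy
    omega = 1.0 / (1.0 + drift)            # episodic stability weight
    if drift > drift_threshold:
        # High drift: refuse reconstruction-as-fact and fall back.
        return (
            {"action": "abstain", "reason": "high_drift"},
            omega,
            {"event": "fallback", "drift": drift},
        )
    return (
        {"episode_id": best["id"], "source_ref": best["source_ref"],
         "confidence": omega, "drift": drift},
        omega,
        {"event": "retrieval", "drift": drift},
    )

store = [
    {"id": "e1", "cue": 0.10, "source_ref": "ledger:0001"},
    {"id": "e2", "cue": 0.95, "source_ref": "ledger:0002"},
]
```

A near cue returns the episode with its source reference and logs a retrieval event; a distant cue triggers the abstain branch and logs a fallback event, matching the pseudocode's two paths.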
%──────────────────────────────────────────────────────────────────────────────
\section{Appendix C — Canonical Formula Summary}
%──────────────────────────────────────────────────────────────────────────────
\[
E=(c,x,t,s)
\]
\[
E_i^{min}=(c_i,x_i,t_i,s_i,\sigma_i,\ell_i,h_i)
\]
\[
\mathcal{K}_{EIMT}=\{E,M,D,R,G,A,P,S,L,B,Q,Y\}
\]
\[
\mathcal{D}_{manifest}=\{d_c,d_x,d_t,d_s,d_H,w_c,w_x,w_t,w_s,w_H\}
\]
\[
\Omega^{episodic}_k=\frac{1}{1+|\Delta\Phi^{retrieval}_k|}
\]
\[
\mathcal{R}^{gated}(q,\mathcal{M})
=\Omega^{episodic}_k\,\mathcal{R}(q,\mathcal{M})
+(1-\Omega^{episodic}_k)\,\mathcal{A}(q)
\]
\[
\mathcal{A}(q)\in\{\text{abstain},\text{ask},\text{uncertain},
\text{source-check},\text{audit}\}
\]
\[
\Gamma_{\rho}=\Delta\Phi^{episodic}_{pre}-\Delta\Phi^{episodic}_{post}
\]
\[
\mathcal{T}_{EIMT}=\{T_{source},T_{boundary},T_{context},T_{long},
T_{replay},T_{planning}\}
\]
\[
\mathcal{L}_{memory}=\{L_{db},L_{vec},L_{sem},L_{ungated},
L_{summary},L_{random}\}
\]
\[
\mathrm{EIMTScore}_{v1.4}
=\frac{S+F+E+C+B+R+K+P+L+T+D+H+N+V+X+G+M+Q+Z+W+A+I+J+Y+U+\Psi+\Xi}{27}
\]
%──────────────────────────────────────────────────────────────────────────────
\section{Concluding Compression}
%──────────────────────────────────────────────────────────────────────────────
EIMT v1.4 names the minimal implementation-ready form of episodic memory
invariance:
\[
\boxed{
\begin{array}{c}
\text{an episodic-memory framework becomes implementation-ready only when}\\
\text{its smallest valid kernel can run concrete tasks, compare baselines,}\\
\text{emit metrics, preserve logs, and compile evidence packages.}
\end{array}
}
\]
The implementer statement is:
\[
\boxed{
\begin{array}{c}
\text{an EIMT runtime must store source-bound episodes, retrieve by cue,}\\
\text{measure drift, gate uncertainty, invoke source fallback, label}\\
\text{simulation, test replay, and refuse high-drift reconstruction as fact.}
\end{array}
}
\]
The benchmark statement is:
\[
\boxed{
\begin{array}{c}
\text{benchmarks become meaningful only when tasks contain concrete episodes,}\\
\text{queries, ground truth, distractors, baselines, metric rules, and}\\
\text{machine-readable outputs.}
\end{array}
}
\]
The evidence statement is:
\[
\boxed{
\begin{array}{c}
\text{execution without logs is not evidence;}\\
\text{benchmarks without baselines are not strong support;}\\
\text{tasks without ground truth are not benchmark tasks.}
\end{array}
}
\]
The philosophical statement remains:
\[
\boxed{
\text{episodic coherence is not perfect recall and not fiction;}
\quad
\text{it is bounded reconstructive stability.}
}
\]
Thus, EIMT v1.4 upgrades EIMT from a reference-implementation specification to
a minimal runnable-kernel and concrete benchmark-task-suite layer while
preserving source fidelity, clinical caution, the AI-human distinction,
falsification, negative controls, downgrade discipline, implementation
auditability, and non-universal-mechanism boundaries.
\end{document}