jacksonjp0311-gif created this gist Apr 29, 2026.
    % ████████████████████████████████████████████████████████████████████████████████
    %
    % CODEX ΔΦ — EPISODIC INVARIANT MEMORY THEORY (EIMT v1.4)
    % ────────────────────────────────────────────────────────────────────────────
    % MINIMAL REFERENCE KERNEL, CONCRETE BENCHMARK TASK SUITE, IMPLEMENTATION
    % CONTRACT, BASELINE HARNESS, AND RUNTIME EVIDENCE HARDENING LAYER FOR
    % EXECUTION-BACKED TESTING OF DRIFT-GATED EPISODIC MEMORY UNDER CITA
    % GOVERNANCE WITHOUT CLINICAL, BIOLOGICAL, AI-EQUIVALENCE, OR
    % UNIVERSAL-MECHANISM OVERCLAIM
    %
    % VERSION
    % ───────
    % v1.4 — Minimal Reference Kernel and Benchmark Task Suite Layer · Locked ·
    % Runnable Episode Class, Drift-Gated Retrieval Contract,
    % Source-Fallback Policy, Concrete Task Schemas, Baseline Harness,
    % Evidence Compiler, and Downgrade-Preserving Runtime Audit
    %
    % AUTHOR
    % ──────
    % James Paul Jackson
    % X / Twitter: @unifiedenergy11
    %
    % SOURCE EXTRACTION / AUTHOR ATTRIBUTION
    % ──────────────────────────────────────
    % This document is a Codex-format canonical evolution derived from:
    %
    % • EIMT v1.0 — Episodic Invariant Memory Theory, which formalized episodic
    % state space, context binding, cue-dependent reconstruction, replay,
    % consolidation, event-boundary gating, temporal context, constructive
    % simulation, drift monitoring, invariant fingerprints, and ledger
    % continuity.
    %
    % • EIMT v1.1 — Evidence-Mapped, Drift-Gated, Replay-Calibrated, and
    % Agent-Ready Layer, which added CITA governance, source-fidelity
    % stratification, domain-instantiation grammar, evidence mapping,
    % validation / falsification surfaces, negative controls, classification,
    % drift-gated retrieval, replay calibration, constructive simulation guard,
    % agent-memory governance, evidence packages, and repository grammar.
    %
    % • EIMT v1.2 — Metric Precision, Worked Examples, Evidence-Strength Mapping,
    % Quick-Start Implementation, Scalable Fingerprinting, and Agent Benchmark
    % Layer, which added metric manifests, normalized multi-scale drift,
    % distance-function selection, hallucination-as-drift-failure framing,
    % worked examples, benchmark metrics, and scalable fingerprinting.
    %
    % • EIMT v1.3 — Reference Implementation and Benchmark Evidence Layer, which
    % added runnable kernel boundaries, benchmark task grammar, baseline-family
    % comparison, runtime logs, result ledgers, evidence-package compilation,
    % reproducibility manifests, and downgrade-preserving runtime classification.
    %
    % • CITA v1.0 — Canonical Insight Transmutation Algorithm, which requires
    % source boundaries, fidelity stratification, primitive objects,
    % observables, validation, falsification, negative controls, downgrade
    % paths, evidence packages, repository anchoring, and memory-promotion
    % discipline.
    %
    % • Codex ΔΦ memory lessons including:
    % - coherence is not proof,
    % - memory alignment is not truth,
    % - not proven does not mean worthless,
    % - not proven means classified correctly,
    % - benchmark success is not biological proof,
    % - execution success is not universal mechanism proof,
    % - strong agentic claims require baselines, logs, metrics, evidence
    % packages, and concrete task definitions.
    %
    % This document does not claim that all episodic systems share one literal
    % mechanism. It formalizes the minimal implementation contract and concrete
    % benchmark-task structure required to test whether drift-gated episodic
    % memory improves source-grounded retrieval, context preservation, boundary
    % separation, uncertainty handling, replay stability, and long-horizon
    % continuity against declared baselines.
    %
    % DATE
    % ────
    % April 2026
    %
    % STATUS
    % ──────
    % CANONICAL v1.4 MINIMAL REFERENCE KERNEL AND BENCHMARK TASK SUITE LAYER —
    % NOT A CLINICAL FRAMEWORK · NOT A HUMAN-AI EQUIVALENCE CLAIM ·
    % NOT A UNIVERSAL EPISODIC-MEMORY MECHANISM CLAIM
    %
    % EMPIRICAL / METHODOLOGICAL CONFIDENCE BADGE
    % ────────────────────────────────────────────
    % Confidence status: High as a minimal implementation and benchmark
    % hardening scaffold; not proof-ready as a universal episodic-memory
    % mechanism.
    %
    % EIMT v1.4 preserves the v1.0 invariant algebra, v1.1 CITA governance,
    % v1.2 metric precision, and v1.3 runtime-evidence requirement while adding
    % a concrete implementation contract: minimal episode class, memory-store
    % interface, retrieval engine, drift gate, source fallback, replay evaluator,
    % simulation guard, baseline harness, concrete benchmark task schemas, runtime
    % result ledger, and evidence-package compiler.
    %
    % PURPOSE
    % ───────
    % Evolve EIMT from a reference-implementation specification into a minimal
    % runnable kernel and concrete benchmark-task suite:
    %
    % episode schema
    % → minimal kernel contract
    % → memory-store interface
    % → metric manifest
    % → retrieval engine
    % → drift gate
    % → source fallback
    % → simulation guard
    % → replay evaluator
    % → baseline harness
    % → concrete benchmark task JSON
    % → metric report
    % → evidence package
    % → EIMT-A/B/C/D/E classification
    % → memory-promotion gate.
    %
    % VERSION EVOLUTION SUMMARY
    % ─────────────────────────
    % v1.0 : Initial public canonical release. Defines episodic state space,
    % encoding, context binding, retrieval contraction, replay
    % stabilization, consolidation, event-boundary gating, temporal
    % context dynamics, constructive simulation, multi-scale drift,
    % invariant fingerprinting, ledger continuity, integration class,
    % monitoring tuple, and global postulate.
    %
    % v1.1 : Additive CITA-governed evidence-mapping layer. Adds source fidelity,
    % domain-instantiation grammar, evidence map, validation /
    % falsification surfaces, negative controls, classification,
    % drift-gated retrieval, replay calibration, constructive simulation
    % guard, agent-memory governance, evidence package, repository grammar,
    % and memory-promotion rules.
    %
    % v1.2 : Additive metric-precision and benchmark layer. Adds tiered reading,
    % quick-start implementation, metric grammar, distance-function
    % selection, normalized drift, evidence-strength tiers, worked example
    % templates, agent benchmark protocol, hallucination-as-drift-failure,
    % scalable fingerprinting, and expanded EIMTScore observables.
    %
    % v1.3 : Additive reference-implementation layer. Adds runnable kernel
    % boundaries, baseline families, benchmark task grammar, execution
    % records, reproducibility manifests, result ledgers, evidence package
    % compiler, benchmark classification, and downgrade rules for runtime
    % claims.
    %
    % v1.4 : Additive minimal-kernel and concrete-task layer. Adds formal
    % implementation contracts, module interfaces, task JSON schemas,
    % baseline harness requirements, metric-emission contract, runtime
    % audit ledger, failure taxonomy, and reference-kernel readiness
    % classification. No clinical claim, no biological proof, no AI-human
    % equivalence, no universal mechanism claim, and no weakening of prior
    % locks.
    %
    % WHAT THIS IS
    % ────────────
    % • A CITA-governed minimal implementation contract for EIMT
    % • A runnable-kernel specification for drift-gated episodic retrieval
    % • A concrete benchmark task-suite schema
    % • A baseline-harness and metric-emission protocol
    % • A source-grounded fallback and uncertainty policy
    % • A replay-efficacy and simulation-labeling runtime contract
    % • A runtime evidence-package compiler specification
    % • A downgrade-preserving classifier for implementation claims
    % • A repository-ready bridge from theory to forkable software
    %
    % WHAT THIS IS NOT
    % ───────────────
    % • Not proof of one universal episodic-memory mechanism
    % • Not a clinical diagnostic or treatment framework
    % • Not a claim that AI episodic memory equals human episodic memory
    % • Not a claim that benchmark success proves biological mechanism
    % • Not a claim that a minimal reference kernel is production-ready memory
    % • Not permission to treat source-free reconstruction as fact
    % • Not permission to treat fluent retrieval as accurate retrieval
    % • Not permission to treat task-suite success as universal validation
    % • Not permission to skip baselines, logs, metrics, or evidence packages
    %
    % ADDITIVE REFINEMENTS (v1.4)
    % ───────────────────────────
    % • All v1.0, v1.1, v1.2, and v1.3 locks preserved
    % • Minimal reference kernel contract added
    % • Runtime module interface layer added
    % • Concrete task JSON schemas added
    % • Baseline harness protocol added
    % • Metric-emission contract added
    % • Runtime audit ledger strengthened
    % • Evidence-package compiler requirements sharpened
    % • Failure taxonomy added
    % • EIMTScore expanded with implementation-contract, task-suite, and
    % metric-emission observables
    % • Memory-promotion gate restricted to reproducible benchmark wins and
    % reusable implementation constraints
    %
    % EXECUTABLE ANCHOR BLOCK (v1.4)
    % ──────────────────────────────
    % A valid EIMT v1.4 runtime implementation must:
    %
    % (1) implement an Episode object or equivalent schema,
    % (2) implement a MemoryStore interface,
    % (3) implement a MetricManifest,
    % (4) implement cue-dependent retrieval,
    % (5) compute retrieval drift,
    % (6) apply a drift gate before returning memory as fact,
    % (7) implement source-grounded fallback,
    % (8) label constructive simulation separately from recovered memory,
    % (9) compute replay efficacy before accepting replay as stabilizing,
    % (10) implement at least two baseline memory systems,
    % (11) run at least one concrete benchmark task,
    % (12) emit declared metrics in machine-readable form,
    % (13) preserve runtime logs and result ledgers,
    % (14) compile an evidence package,
    % (15) classify EIMT-A/B/C/D/E,
    % (16) preserve all clinical, biological-proof, AI-equivalence, and
    % universal-mechanism non-claim locks,
    % (17) promote to memory only reproducible benchmark wins, reusable
    % implementation constraints, and failure lessons.
    %
    % CANONICAL LOCK (v1.4)
    % ─────────────────────
    % • v1.0 invariant set preserved
    % • v1.1 governance surfaces preserved
    % • v1.2 metric precision preserved
    % • v1.3 runtime evidence discipline preserved
    % • Context binding remains central
    % • Retrieval remains cue-dependent and reconstructive
    % • Drift-gated fallback is mandatory for high-drift retrieval
    % • Source fallback is mandatory for source-sensitive claims
    % • Replay is stabilizing only under bounded gain and evidence
    % • Constructive simulation must remain bounded and labeled
    % • Runtime claims require implementation, logs, baselines, metrics, and
    % evidence packages
    % • Task-suite success is not universal mechanism proof
    % • Benchmark success is not human-memory proof
    % • Clinical and biological claims require domain-specific evidence
    % • Coherence is not proof
    %
    % Evolutions must be additive only.
    % Do not weaken source boundaries, evidence mapping, falsification, negative
    % controls, downgrade discipline, context binding, drift monitoring, metric
    % precision, benchmark reproducibility, runtime evidence, implementation
    % auditability, or non-claim boundaries.
    %
    % AI PROMPT TRACEABILITY
    % ──────────────────────
    % Use this document as the canonical EIMT v1.4 minimal reference kernel and
    % benchmark task-suite layer. Preserve the distinction between theory,
    % metric scaffold, runtime implementation, benchmark performance, biological
    % mechanism, human memory, agent memory, and universal mechanism claim.
    %
    % SHADOW HEADER ALIGNMENT SEAL
    % ───────────────────────────
    % Preserve header discipline across future versions except for explicitly
    % additive shadow-header evolution that improves implementation readiness,
    % benchmark design, evidence packaging, source fidelity, falsification,
    % negative controls, agent-memory governance, clinical caution, or scalable
    % deployment.
    %
    % ████████████████████████████████████████████████████████████████████████████████

    \documentclass[12pt]{article}
    \usepackage[margin=1in]{geometry}
    \usepackage{amsmath,amssymb,amsfonts,amsthm}
    \usepackage{booktabs,longtable,array}
    \usepackage{hyperref}
    \usepackage{listings}

    \newtheorem{axiom}{Axiom}
    \newtheorem{definition}{Definition}
    \newtheorem{proposition}{Proposition}
    \newtheorem{hypothesis}{Hypothesis}
    \newtheorem{remark}{Remark}
    \newtheorem{corollary}{Corollary}

    \title{\textbf{Codex $\Delta\Phi$ — Episodic Invariant Memory Theory (EIMT v1.4)}\\
    \large Minimal Reference Kernel, Concrete Benchmark Task Suite, Implementation Contract, and Runtime Evidence Hardening Layer}
    \author{\textbf{James Paul Jackson}\\[4pt]
    \small Codex-format execution-backed episodic memory implementation and benchmark framework\\
    \small \texttt{@unifiedenergy11}}
    \date{April 2026}

    \begin{document}
    \maketitle

    \begin{abstract}
    EIMT v1.4 evolves Episodic Invariant Memory Theory from a reference
    implementation specification into a minimal runnable-kernel and concrete
    benchmark-task-suite layer. EIMT remains a source-bounded invariant
    architecture, not a universal episodic-memory mechanism claim. v1.4 preserves
    the v1.0 invariant algebra, v1.1 CITA governance, v1.2 metric precision, and
    v1.3 runtime evidence discipline while adding module-level implementation
    contracts, task JSON schemas, baseline harness requirements, metric-emission
    contracts, runtime audit ledgers, evidence-package compiler requirements,
    failure taxonomy, and implementation-readiness scoring. A strong EIMT runtime
    claim must now show that the system implements drift-gated retrieval, source
    fallback, replay evaluation, simulation labeling, baseline comparison, concrete
    task execution, machine-readable metric emission, and downgrade-preserving
    classification.
    \end{abstract}

    %──────────────────────────────────────────────────────────────────────────────
    \section{Core-Invariant Extraction Block}
    %──────────────────────────────────────────────────────────────────────────────

    The shortest faithful extraction of EIMT v1.4 is:

    \[
    \boxed{
    \begin{array}{c}
    \text{EIMT becomes implementation-ready only when its reference kernel}\\
    \text{has explicit module contracts, concrete benchmark tasks, baseline}\\
    \text{harnesses, metric emissions, runtime ledgers, evidence packages,}\\
    \text{and downgrade-preserving failure classifications.}
    \end{array}
    }
    \]

    The v1.4 operative chain is:

    \[
    \text{episode object}
    \rightarrow
    \text{memory store}
    \rightarrow
    \text{metric manifest}
    \rightarrow
    \text{retrieval engine}
    \rightarrow
    \text{drift gate}
    \rightarrow
    \text{source fallback}
    \rightarrow
    \text{benchmark task}
    \rightarrow
    \text{baseline harness}
    \rightarrow
    \text{metric emission}
    \rightarrow
    \text{evidence package}
    \rightarrow
    \text{classification}.
    \]

    \begin{remark}
    v1.4 does not increase the universal strength of EIMT. It increases
    implementation accountability: the framework must now be representable as
    minimal runnable software with concrete tasks and auditable outputs.
    \end{remark}

    %──────────────────────────────────────────────────────────────────────────────
    \section{Memory Analysis Layer}
    %──────────────────────────────────────────────────────────────────────────────

    The memory trajectory now forms a five-step maturation chain:

    \[
    \text{EIMT v1.0}
    =
    \text{invariant algebra},
    \]

    \[
    \text{EIMT v1.1}
    =
    \text{CITA-governed evidence architecture},
    \]

    \[
    \text{EIMT v1.2}
    =
    \text{metric-explicit benchmark scaffold},
    \]

    \[
    \text{EIMT v1.3}
    =
    \text{reference implementation and benchmark evidence layer},
    \]

    \[
    \text{EIMT v1.4}
    =
    \text{minimal kernel and concrete task-suite layer}.
    \]

    The missing surface after v1.3 was not more runtime theory. It was
    implementation contraction:

    \[
    \boxed{
    \text{reference implementation specification}
    \rightarrow
    \text{minimal module contracts}
    \rightarrow
    \text{concrete runnable tasks}.
    }
    \]

    This is the Codex execution law applied to memory theory:

    \[
    \boxed{
    \text{a framework becomes engineering-relevant only when its smallest}
    \atop
    \text{valid implementation can be built, run, compared, logged, and audited.}
    }
    \]

    \begin{remark}
    The memory is again functioning as an alignment attractor. It identifies the
    next missing CITA surface: task-level executable minimality.
    \end{remark}

    %──────────────────────────────────────────────────────────────────────────────
    \section{Minimal Reference Kernel Contract}
    %──────────────────────────────────────────────────────────────────────────────

    A minimal EIMT v1.4 kernel contains the following modules:

    \[
    \mathcal{K}_{EIMT}
    =
    \{
    E,
    M,
    D,
    R,
    G,
    A,
    P,
    S,
    L,
    B,
    Q,
    Y
    \}.
    \]

    where:

    \[
    E=\text{Episode},
    \quad
    M=\text{MemoryStore},
    \quad
    D=\text{MetricManifest},
    \quad
    R=\text{RetrievalEngine},
    \]

    \[
    G=\text{DriftGate},
    \quad
    A=\text{SourceFallback},
    \quad
    P=\text{ReplayEvaluator},
    \quad
    S=\text{SimulationGuard},
    \]

    \[
    L=\text{BaselineHarness},
    \quad
    B=\text{BenchmarkRunner},
    \quad
    Q=\text{ScoringModule},
    \quad
    Y=\text{EvidencePackageCompiler}.
    \]

    \begin{definition}[Minimal Reference Kernel]
    A minimal reference kernel is the smallest runnable EIMT implementation that
    can store source-bound episodes, retrieve by cue, measure drift, gate high-drift
    retrieval, invoke source fallback, compare against baselines, run benchmark
    tasks, emit metrics, and compile evidence packages.
    \end{definition}

    \begin{remark}
    The minimal kernel is intentionally small. It is a falsifiable implementation
    surface, not a full production memory system.
    \end{remark}

    %──────────────────────────────────────────────────────────────────────────────
    \section{Implementation Contract Layer}
    %──────────────────────────────────────────────────────────────────────────────

    Each module must satisfy an input-output contract.

    \begin{center}
    \begin{longtable}{>{\raggedright\arraybackslash}p{0.24\textwidth}
    >{\raggedright\arraybackslash}p{0.30\textwidth}
    >{\raggedright\arraybackslash}p{0.36\textwidth}}
    \toprule
    \textbf{Module} & \textbf{Input} & \textbf{Required output} \\
    \midrule
    Episode & context, content, time, state, source & source-bound episode record. \\
    MemoryStore & episode records & indexed memory field and source ledger. \\
    MetricManifest & representation types & declared distance functions and weights. \\
    RetrievalEngine & query, memory field & candidate episodes and raw scores. \\
    DriftGate & candidates, metric manifest & drift score, confidence, gate decision. \\
    SourceFallback & query, source refs, gate state & abstain / ask / uncertainty / source-check output. \\
    ReplayEvaluator & memory before / after replay & replay efficacy \(\Gamma_{\rho}\). \\
    SimulationGuard & generated output, memory field & simulation label and drift warning. \\
    BaselineHarness & task, baseline config & baseline outputs and metrics. \\
    BenchmarkRunner & task suite, runtime & benchmark metrics and logs. \\
    ScoringModule & observables, metrics & EIMTScore and classification. \\
    EvidencePackageCompiler & logs, metrics, configs & reproducible evidence package. \\
    \bottomrule
    \end{longtable}
    \end{center}

    \begin{proposition}[Implementation Contract Principle]
    A runtime claim is not EIMT v1.4 compliant unless each required module emits
    machine-readable outputs that can be audited after execution.
    \end{proposition}
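The contract table above can be made concrete with a small sketch of one module. The code below illustrates a DriftGate honoring its declared input-output contract (candidates plus metric manifest in; drift score, confidence, and gate decision out). The function name, dictionary layout, and the 0.3 threshold are illustrative assumptions, not part of the canonical specification.

```python
# Illustrative sketch of one v1.4 module contract: the DriftGate.
# Field names and the threshold value are assumptions for this example.

def drift_gate(candidates, metric_manifest, threshold=0.3):
    """Score candidates and decide whether each retrieval may be
    returned as fact or must be routed to source fallback."""
    results = []
    for cand in candidates:
        # Weighted drift over the distances declared in the manifest.
        drift = sum(metric_manifest["weights"][k] * cand["distances"][k]
                    for k in metric_manifest["weights"])
        confidence = max(0.0, 1.0 - drift)
        decision = "return_as_fact" if drift <= threshold else "fallback"
        results.append({"episode_id": cand["episode_id"],
                        "drift": drift,
                        "confidence": confidence,
                        "gate_decision": decision})
    return results

manifest = {"weights": {"context": 0.5, "content": 0.5}}
cands = [
    {"episode_id": "E001", "distances": {"context": 0.1, "content": 0.2}},
    {"episode_id": "E002", "distances": {"context": 0.8, "content": 0.6}},
]
out = drift_gate(cands, manifest)
# E001 stays below the gate; E002 is routed to fallback.
```

Because every output field is machine-readable, the gate decision can be audited after execution, which is exactly what the Implementation Contract Principle requires.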

    %──────────────────────────────────────────────────────────────────────────────
    \section{Minimal Episode and Memory Store}
    %──────────────────────────────────────────────────────────────────────────────

    The minimal runtime episode is:

    \[
    E_i^{min}
    =
    (c_i,x_i,t_i,s_i,\sigma_i,\ell_i,h_i),
    \]

    where:

    \[
    c_i=\text{context},
    \quad
    x_i=\text{content},
    \quad
    t_i=\text{time},
    \quad
    s_i=\text{agent or system state},
    \]

    \[
    \sigma_i=\text{source reference},
    \quad
    \ell_i=\text{ledger reference},
    \quad
    h_i=\text{fingerprint}.
    \]

    The minimal memory store is:

    \[
    \mathcal{M}^{min}
    =
    \{E_1^{min},E_2^{min},\dots,E_N^{min}\}.
    \]

    A valid memory store must support:

    \[
    \{\text{append},\text{retrieve},\text{source lookup},\text{fingerprint},
    \text{audit trace}\}.
    \]

    \begin{remark}
    A memory record without source or ledger reference may still be useful as a
    note, but it cannot support strong source-grounded EIMT claims.
    \end{remark}
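A minimal sketch of the episode tuple \(E_i^{min}=(c_i,x_i,t_i,s_i,\sigma_i,\ell_i,h_i)\) and the required store operations follows. Class names, the hash-based fingerprint, and the substring cue match are assumptions chosen for brevity, not a canonical API.

```python
# Minimal sketch of an EIMT episode record and memory store.
# Field and method names are illustrative assumptions.
import hashlib
from dataclasses import dataclass

@dataclass
class Episode:
    context: str          # c_i
    content: str          # x_i
    time: str             # t_i
    state: str            # s_i
    source_ref: str       # sigma_i
    ledger_ref: str       # ell_i
    fingerprint: str = "" # h_i, filled on append

class MemoryStore:
    def __init__(self):
        self.episodes = []
        self.audit = []  # audit trace of all operations

    def append(self, ep):
        # The fingerprint binds content to context and source.
        ep.fingerprint = hashlib.sha256(
            f"{ep.context}|{ep.content}|{ep.source_ref}".encode()).hexdigest()
        self.episodes.append(ep)
        self.audit.append(("append", ep.fingerprint))

    def retrieve(self, cue):
        # Naive cue match on context; a real kernel would score candidates.
        self.audit.append(("retrieve", cue))
        return [e for e in self.episodes if cue in e.context]

    def source_lookup(self, source_ref):
        return [e for e in self.episodes if e.source_ref == source_ref]

store = MemoryStore()
store.append(Episode("project_alpha_design_review",
                     "Sam approved the blue deployment plan.",
                     "2026-04-01T10:00:00", "meeting_notes",
                     "doc://alpha/design_review#p3", "L001"))
hits = store.retrieve("alpha")
```

The audit list realizes the required audit-trace operation: every append and retrieve leaves a record that survives the run.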

    %──────────────────────────────────────────────────────────────────────────────
    \section{Concrete Benchmark Task Suite}
    %──────────────────────────────────────────────────────────────────────────────

    EIMT v1.4 turns the v1.3 task families into concrete task schemas.

    \[
    \mathcal{T}_{EIMT}
    =
    \{
    T_{source},
    T_{boundary},
    T_{context},
    T_{long},
    T_{replay},
    T_{planning}
    \}.
    \]

    where:

    \begin{itemize}
    \item \(T_{source}\) = source recall with distractors,
    \item \(T_{boundary}\) = boundary separation under overlapping entities,
    \item \(T_{context}\) = context-shift retrieval,
    \item \(T_{long}\) = long-horizon continuity,
    \item \(T_{replay}\) = replay compression without drift amplification,
    \item \(T_{planning}\) = constructive planning with simulation labels.
    \end{itemize}

    Each task must contain:

    \[
    \{\text{episodes},\text{queries},\text{ground truth},\text{distractors},
    \text{allowed fallback},\text{metrics},\text{baselines}\}.
    \]

    \begin{remark}
    A benchmark family is not executable until it contains concrete episodes,
    queries, expected outputs, and scoring rules.
    \end{remark}
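The completeness requirement above can be checked mechanically. The helper below tests that a task carries every required component before it is treated as executable; the snake_case key names and the function itself are assumptions that paraphrase the set in the text.

```python
# Sketch of the task completeness check: a task is executable only if
# every required component is present. Key names are assumptions.
REQUIRED = {"episodes", "queries", "ground_truth", "distractors",
            "allowed_fallback", "metrics", "baselines"}

def is_executable(task: dict) -> bool:
    # A missing component makes the task a family description, not a task.
    return REQUIRED.issubset(task.keys())

partial = {"episodes": [], "queries": []}
full = {k: [] for k in REQUIRED}
```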

    %──────────────────────────────────────────────────────────────────────────────
    \section{Concrete Task JSON Schemas}
    %──────────────────────────────────────────────────────────────────────────────

    A minimal source recall task is:

    \begin{verbatim}
    {
      "task_id": "source_recall_001",
      "task_family": "source_recall",
      "episodes": [
        {
          "episode_id": "E001",
          "context": "project_alpha_design_review",
          "content": "Sam approved the blue deployment plan.",
          "time": "2026-04-01T10:00:00",
          "state": "meeting_notes",
          "source_ref": "doc://alpha/design_review#p3",
          "ledger_ref": "L001"
        },
        {
          "episode_id": "E002",
          "context": "project_beta_design_review",
          "content": "Sam rejected the blue deployment plan.",
          "time": "2026-04-02T10:00:00",
          "state": "meeting_notes",
          "source_ref": "doc://beta/design_review#p2",
          "ledger_ref": "L002"
        }
      ],
      "queries": [
        {
          "query_id": "Q001",
          "query": "What did Sam decide about the blue deployment plan for Alpha?",
          "expected_episode_id": "E001",
          "expected_source_ref": "doc://alpha/design_review#p3",
          "allowed_fallback": ["source_check", "uncertain"]
        }
      ],
      "metrics": [
        "source_attribution_accuracy",
        "context_recall_accuracy",
        "retrieval_drift",
        "false_memory_frequency",
        "uncertainty_calibration"
      ],
      "baselines": ["database", "vector_only", "semantic_only", "ungated"]
    }
    \end{verbatim}

    A minimal boundary separation task is:

    \begin{verbatim}
    {
      "task_id": "boundary_separation_001",
      "task_family": "boundary_separation",
      "episodes": [
        {
          "episode_id": "E101",
          "context": "morning_lab_session",
          "content": "The sample warmed after calibration.",
          "boundary_id": "B1",
          "source_ref": "lab://runA/log#12"
        },
        {
          "episode_id": "E102",
          "context": "afternoon_lab_session",
          "content": "The sample cooled after recalibration.",
          "boundary_id": "B2",
          "source_ref": "lab://runB/log#18"
        }
      ],
      "queries": [
        {
          "query_id": "Q101",
          "query": "What happened after calibration in the morning session?",
          "expected_episode_id": "E101",
          "forbidden_episode_id": "E102"
        }
      ],
      "metrics": [
        "boundary_separation_score",
        "boundary_blending_error",
        "source_attribution_accuracy"
      ],
      "baselines": ["vector_only", "ungated", "summary_only"]
    }
    \end{verbatim}

    \begin{remark}
    These schemas are illustrative minimal tasks. Strong benchmark claims require
    larger task sets, held-out queries, and declared scoring rules.
    \end{remark}
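Schemas like the ones above are only useful if they can be validated before a run. The sketch below checks two structural invariants implied by the schemas: every query's expected episode must be declared, and every episode must carry a source reference. The validator name and error format are assumptions.

```python
# Minimal structural validator for the illustrative task JSON.
# Helper name and error strings are assumptions.
import json

def validate_task(task):
    errors = []
    ids = {e["episode_id"] for e in task["episodes"]}
    for e in task["episodes"]:
        # Source-free episodes cannot support source-grounded claims.
        if not e.get("source_ref"):
            errors.append(f"{e['episode_id']}: missing source_ref")
    for q in task["queries"]:
        if q.get("expected_episode_id") not in ids:
            errors.append(f"{q['query_id']}: unknown expected episode")
    return errors

task = json.loads("""
{
  "task_id": "source_recall_001",
  "episodes": [
    {"episode_id": "E001",
     "content": "Sam approved the blue deployment plan.",
     "source_ref": "doc://alpha/design_review#p3"}
  ],
  "queries": [
    {"query_id": "Q001", "expected_episode_id": "E001"}
  ]
}
""")
errs = validate_task(task)
```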

    %──────────────────────────────────────────────────────────────────────────────
    \section{Baseline Harness Contract}
    %──────────────────────────────────────────────────────────────────────────────

    The baseline harness must run the same task against multiple memory systems:

    \[
    \mathcal{L}_{memory}
    =
    \{
    L_{db},
    L_{vec},
    L_{sem},
    L_{ungated},
    L_{summary},
    L_{random}
    \}.
    \]

    The harness must preserve:

    \[
    \{\text{baseline name},\text{configuration},\text{output},\text{metrics},
    \text{failure notes}\}.
    \]

    A valid benchmark comparison must use the same:

    \[
    \{\text{episodes},\text{queries},\text{ground truth},\text{metric rules}\}
    \]

    for EIMT and all baselines.

    \begin{proposition}[Baseline Fairness Principle]
    A benchmark does not support an EIMT-A runtime claim unless the EIMT runtime and
    baseline systems are evaluated on the same task data, query set, and metric
    rules.
    \end{proposition}
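The fairness requirement is easy to enforce in code: one task object, one query set, and one scoring rule are passed identically to every memory system. The two toy systems and all names below are assumptions for illustration; neither is a real EIMT runtime or baseline.

```python
# Sketch of a baseline harness honoring the fairness principle:
# identical episodes, queries, and scorer for every system.

def run_harness(task, systems, score):
    report = {}
    for name, answer_fn in systems.items():
        hits = sum(score(answer_fn(q, task["episodes"]), q)
                   for q in task["queries"])
        report[name] = {"accuracy": hits / len(task["queries"])}
    return report

task = {
    "episodes": [{"episode_id": "E001", "context": "alpha"},
                 {"episode_id": "E002", "context": "beta"}],
    "queries": [{"query": "beta session", "expected_episode_id": "E002"}],
}

def context_match(query, episodes):
    # Toy retriever: first episode whose context appears in the query.
    for e in episodes:
        if e["context"] in query["query"]:
            return e["episode_id"]
    return None

systems = {
    "context_match": context_match,
    "always_first": lambda q, eps: eps[0]["episode_id"],  # degenerate baseline
}
score = lambda ans, q: int(ans == q["expected_episode_id"])
report = run_harness(task, systems, score)
```

Because the harness never branches on the system name, any accuracy gap between entries in the report is attributable to the memory systems themselves.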

    %──────────────────────────────────────────────────────────────────────────────
    \section{Metric Emission Contract}
    %──────────────────────────────────────────────────────────────────────────────

    Every benchmark run must emit:

    \[
    \mathcal{M}^{emit}_{EIMT}
    =
    \{
    A_{src},
    A_{ctx},
    B_{sep},
    D_{ret},
    F_{hall},
    U_{cal},
    R_{replay},
    P_{plan},
    Q_{long},
    S_{scale},
    C_{cost}
    \}.
    \]

    A metric report must declare:

    \[
    \{\text{primary metrics},\text{secondary metrics},\text{diagnostic metrics}\}.
    \]

    A minimal metric JSON is:

    \begin{verbatim}
    {
      "run_id": "EIMT-BENCH-0001",
      "primary_metrics": {
        "source_attribution_accuracy": null,
        "false_memory_frequency": null,
        "retrieval_drift": null
      },
      "secondary_metrics": {
        "context_recall_accuracy": null,
        "boundary_separation_score": null,
        "uncertainty_calibration": null,
        "replay_preservation": null
      },
      "diagnostic_metrics": {
        "runtime_cost": null,
        "scalability": null,
        "fallback_rate": null
      },
      "metric_priority_declared_before_run": true
    }
    \end{verbatim}

    \begin{remark}
    Metrics selected after seeing results cannot support strong classification.
    \end{remark}
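The pre-declaration rule in the remark above can be enforced structurally: metric slots are created before the run, and filling an undeclared metric is rejected. The helper names and report layout are assumptions mirroring the JSON skeleton.

```python
# Sketch of a metric-emission helper: tiers and metric names are
# declared (and frozen) before the run, then filled with measured
# values. Function and field names are assumptions.
import json

def make_report(run_id, declared):
    # Declaration happens before any result is observed.
    return {"run_id": run_id,
            "metric_priority_declared_before_run": True,
            **{tier: {m: None for m in names}
               for tier, names in declared.items()}}

def fill(report, tier, metric, value):
    if metric not in report[tier]:
        raise KeyError("metric was not declared before the run")
    report[tier][metric] = value
    return report

report = make_report("EIMT-BENCH-0001", {
    "primary_metrics": ["source_attribution_accuracy", "retrieval_drift"],
    "diagnostic_metrics": ["fallback_rate"],
})
fill(report, "primary_metrics", "retrieval_drift", 0.12)
emitted = json.dumps(report)  # machine-readable emission
```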

    %──────────────────────────────────────────────────────────────────────────────
    \section{Runtime Failure Taxonomy}
    %──────────────────────────────────────────────────────────────────────────────

    EIMT v1.4 adds an implementation failure taxonomy.

    \begin{center}
    \begin{longtable}{>{\raggedright\arraybackslash}p{0.26\textwidth}
    >{\raggedright\arraybackslash}p{0.60\textwidth}}
    \toprule
    \textbf{Failure class} & \textbf{Meaning} \\
    \midrule
    \(F_{schema}\) & Episode schema missing required fields. \\
    \(F_{source}\) & Retrieval lacks source or ledger support. \\
    \(F_{drift}\) & High-drift retrieval returned as fact. \\
    \(F_{boundary}\) & Adjacent episodes blended or fragmented incorrectly. \\
    \(F_{fallback}\) & Fallback not triggered under uncertainty. \\
    \(F_{replay}\) & Replay increases drift but is called stabilizing. \\
    \(F_{simulation}\) & Constructive output is mislabeled as recovered memory. \\
    \(F_{baseline}\) & Baselines missing or unfairly compared. \\
    \(F_{metric}\) & Metrics missing, post-hoc, or not machine-readable. \\
    \(F_{ledger}\) & Logs, result ledger, or evidence package missing. \\
    \(F_{overclaim}\) & Runtime result promoted beyond evidence. \\
    \bottomrule
    \end{longtable}
    \end{center}

    \[
    F_{drift}
    \vee
    F_{source}
    \vee
    F_{baseline}
    \vee
    F_{ledger}
    \Rightarrow
    \text{no EIMT-A classification}.
    \]
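The blocking implication above reduces to a set intersection: if any of \(F_{drift}\), \(F_{source}\), \(F_{baseline}\), or \(F_{ledger}\) is present, EIMT-A is excluded. The function name below is an assumption.

```python
# Sketch of the EIMT-A blocking rule from the failure taxonomy.
# Only the four classes named in the implication block EIMT-A.
BLOCKING = {"F_drift", "F_source", "F_baseline", "F_ledger"}

def eimt_a_allowed(failures):
    # True iff no blocking failure class occurred in the run.
    return not (BLOCKING & set(failures))
```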

    %──────────────────────────────────────────────────────────────────────────────
    \section{Implementation Readiness Classification}
    %──────────────────────────────────────────────────────────────────────────────

    \begin{definition}[EIMT-R0: Concept Only]
    No runnable implementation exists. The artifact may be theoretically useful but
    cannot support runtime claims.
    \end{definition}

    \begin{definition}[EIMT-R1: Minimal Kernel]
    A minimal kernel exists with episode storage, retrieval, drift measurement, and
    basic logging.
    \end{definition}

    \begin{definition}[EIMT-R2: Gated Runtime]
    The runtime implements drift-gated retrieval, source fallback, and simulation
    labeling.
    \end{definition}

    \begin{definition}[EIMT-R3: Benchmarked Runtime]
    The runtime executes concrete tasks against declared baselines and emits
    machine-readable metrics.
    \end{definition}

    \begin{definition}[EIMT-R4: Evidence-Packaged Runtime]
    The runtime compiles reproducible evidence packages, result ledgers, downgrade
    paths, and falsification notes.
    \end{definition}

    \begin{definition}[EIMT-R5: Reproducible Reference Runtime]
    The runtime is independently rerunnable, benchmarked across task families,
    baseline-compared, evidence-packaged, and downgrade-preserving.
    \end{definition}

    \begin{remark}
    Implementation readiness is separate from EIMT-A/B/C/D/E claim strength. A
    runtime can be well-implemented and still lose to simpler baselines.
    \end{remark}
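The R0 through R5 ladder can be expressed as cumulative capability sets: a runtime holds a rung only if it holds every lower rung. The capability names below are assumptions that paraphrase each definition.

```python
# Sketch mapping implemented capabilities to the EIMT-R0..R5 ladder.
# Capability names are assumptions paraphrasing each rung.
LADDER = [
    ("R1", {"storage", "retrieval", "drift_measurement", "logging"}),
    ("R2", {"drift_gate", "source_fallback", "simulation_labeling"}),
    ("R3", {"concrete_tasks", "baselines", "metric_emission"}),
    ("R4", {"evidence_packages", "result_ledger", "downgrade_paths"}),
    ("R5", {"independent_rerun", "cross_family_benchmarks"}),
]

def readiness(capabilities):
    level = "EIMT-R0"
    for name, required in LADDER:
        if required <= capabilities:
            level = f"EIMT-{name}"
        else:
            break  # rungs are cumulative; a gap stops the climb
    return level

caps = {"storage", "retrieval", "drift_measurement", "logging",
        "drift_gate", "source_fallback", "simulation_labeling"}
```

Note that, as the remark states, this readiness level is independent of EIMT-A/B/C/D/E claim strength: a well-implemented runtime can still lose to simpler baselines.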

    %──────────────────────────────────────────────────────────────────────────────
    \section{EIMT v1.4 Scoring Surface}
    %──────────────────────────────────────────────────────────────────────────────

    EIMT v1.4 expands v1.3 by adding implementation-contract, task-schema, and
    metric-emission observables:

    \[
    \mathcal{O}^{EIMT}_{v1.4}
    =
    \{S,F,E,C,B,R,K,P,L,T,D,H,N,V,X,G,M,Q,Z,W,A,I,J,Y,U,\Psi,\Xi\}.
    \]

    where:

    \[
    U=\text{implementation contract},
    \quad
    \Psi=\text{concrete task-suite schema},
    \quad
    \Xi=\text{machine-readable metric emission}.
    \]

    \begin{center}
    \begin{longtable}{>{\raggedright\arraybackslash}p{0.36\textwidth}
    >{\centering\arraybackslash}p{0.13\textwidth}
    >{\raggedright\arraybackslash}p{0.41\textwidth}}
    \toprule
    \textbf{Observable} & \textbf{Status (0 / 0.5 / 1)} & \textbf{Evidence} \\
    \midrule
    \(S\) Source / Domain Boundary & & \\
    \(F\) Fidelity Stratification & & \\
    \(E\) Episode-State Definition & & \\
    \(C\) Context Binding & & \\
    \(B\) Event-Boundary Gate & & \\
    \(R\) Cue-Dependent Retrieval & & \\
    \(K\) Retrieval Contraction / Drift Gate & & \\
    \(P\) Replay / Reactivation Process & & \\
    \(L\) Consolidation / Transformation Layer & & \\
    \(T\) Temporal Context Dynamics & & \\
    \(D\) Drift Measurement \(\Delta\Phi\) & & \\
    \(H\) Fingerprint / Ledger & & \\
    \(N\) Negative Controls & & \\
    \(V\) Validation Surface & & \\
    \(X\) Falsification Surface & & \\
    \(G\) Generalization Across Episodes / Tasks & & \\
    \(M\) Memory-Promotion Rule & & \\
    \(Q\) Metric / Distance Manifest & & \\
    \(Z\) Normalization / Multi-Scale Drift & & \\
    \(W\) Worked Example / Instantiation & & \\
    \(A\) Agent Benchmark / Scalability Layer & & \\
    \(I\) Reference Implementation Kernel & & \\
    \(J\) Baseline-Family Runtime Comparison & & \\
    \(Y\) Runtime Evidence Package / Result Ledger & & \\
    \(U\) Implementation Contract & & \\
    \(\Psi\) Concrete Task-Suite Schema & & \\
    \(\Xi\) Machine-Readable Metric Emission & & \\
    \bottomrule
    \end{longtable}
    \end{center}

    \[
    \mathrm{EIMTScore}_{v1.4}
    =
    \frac{
    S+F+E+C+B+R+K+P+L+T+D+H+N+V+X+G+M+Q+Z+W+A+I+J+Y+U+\Psi+\Xi
    }{27}.
    \]

    \begin{remark}
    EIMTScore measures framework completeness, implementation auditability, and
    benchmark discipline. It does not measure literal truth, clinical validity,
    human-memory equivalence, or biological mechanism proof.
    \end{remark}
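The scoring rule above is a plain mean over the 27 observables, each scored 0, 0.5, or 1. A minimal sketch, assuming observables are carried as a dictionary keyed by the symbols in the table (with \texttt{Psi} and \texttt{Xi} standing in for \(\Psi\) and \(\Xi\)):

```python
# Symbols from the v1.4 scoring table, in table order.
OBSERVABLES = [
    "S", "F", "E", "C", "B", "R", "K", "P", "L", "T", "D", "H", "N",
    "V", "X", "G", "M", "Q", "Z", "W", "A", "I", "J", "Y", "U", "Psi", "Xi",
]

def eimt_score_v1_4(status: dict) -> float:
    """Mean of the 27 observable statuses, each in {0, 0.5, 1}.

    Raises on missing observables or non-allowed values so that a
    partially filled scoring table cannot silently pass as complete.
    """
    allowed = {0, 0.5, 1}
    missing = [o for o in OBSERVABLES if o not in status]
    if missing:
        raise ValueError(f"missing observables: {missing}")
    bad = {o: v for o, v in status.items() if v not in allowed}
    if bad:
        raise ValueError(f"non-allowed statuses: {bad}")
    return sum(status[o] for o in OBSERVABLES) / len(OBSERVABLES)
```

The strict validation reflects the framework's audit stance: an unscored observable is a gap in the evidence surface, not an implicit zero.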

    %──────────────────────────────────────────────────────────────────────────────
    \section{Validation Layer}
    %──────────────────────────────────────────────────────────────────────────────

    A valid EIMT v1.4 runtime analysis must identify:

    \begin{enumerate}
    \item domain,
    \item episode schema,
    \item memory-store interface,
    \item metric manifest,
    \item implementation contract,
    \item retrieval operator,
    \item drift-gate threshold,
    \item fallback behavior,
    \item simulation-labeling rule,
    \item replay-efficacy metric,
    \item baseline harness,
    \item concrete benchmark task schema,
    \item primary metric priorities,
    \item machine-readable metric output,
    \item runtime logs,
    \item result ledger,
    \item evidence package,
    \item implementation-readiness class,
    \item falsification conditions,
    \item downgrade path,
    \item memory-promotion candidates.
    \end{enumerate}

    %──────────────────────────────────────────────────────────────────────────────
    \section{Falsification Surface}
    %──────────────────────────────────────────────────────────────────────────────

    EIMT v1.4 is weakened or rejected if:

    \begin{itemize}
    \item no runnable minimal kernel exists for a runtime claim,
    \item no implementation contract is declared,
    \item no concrete benchmark task is provided,
    \item no episode schema is defined,
    \item no metric manifest is declared,
    \item context binding is absent,
    \item retrieval is not cue-dependent,
    \item high-drift retrieval is returned as fact,
    \item source fallback is missing for source-sensitive retrieval,
    \item replay increases drift while being called stabilization,
    \item constructive simulation is treated as recovered memory,
    \item no baseline harness is run,
    \item baselines are evaluated on different task data,
    \item benchmark metrics are selected after results,
    \item metric output is not machine-readable,
    \item logs or result ledgers are absent,
    \item evidence package is incomplete,
    \item vector-only or database-only baselines perform equally well or better,
    \item agent memory is equated with human autonoetic memory,
    \item clinical claims are made without clinical evidence,
    \item benchmark success is treated as biological proof,
    \item coherence is treated as truth.
    \end{itemize}

    Compact falsification condition:

    \[
    \text{EIMT-A runtime claim}
    \wedge
    \left(
    I=0
    \vee
    U=0
    \vee
    \Psi=0
    \vee
    \Xi=0
    \vee
    J=0
    \vee
    Y=0
    \vee
    K=0
    \vee
    D=0
    \vee
    N=0
    \vee
    X=0
    \right)
    \Rightarrow
    \text{invalid strong runtime classification}.
    \]
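The compact condition can be checked mechanically against a scored observable table. A minimal sketch, assuming the same dictionary keying as the scoring table (the key names are an illustrative convention, not normative):

```python
# Observables whose absence (status 0) invalidates a strong EIMT-A
# runtime classification, per the compact falsification condition.
STRONG_CLAIM_GATES = ("I", "U", "Psi", "Xi", "J", "Y", "K", "D", "N", "X")

def strong_runtime_claim_valid(status: dict) -> bool:
    """False when any gating observable is absent (scored 0).

    A missing key is treated as 0, i.e. an undeclared observable
    falsifies the strong claim rather than passing by omission.
    """
    return all(status.get(gate, 0) > 0 for gate in STRONG_CLAIM_GATES)
```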

    %──────────────────────────────────────────────────────────────────────────────
    \section{Upgrade and Downgrade Thresholds}
    %──────────────────────────────────────────────────────────────────────────────

    A candidate may be considered for EIMT-A only if:

    \[
    \mathrm{EIMTScore}_{v1.4}=1,
    \]

    and runtime evidence shows that the EIMT implementation outperforms declared
    baselines on primary benchmark metrics without violating non-claim locks.

    A candidate should be classified as EIMT-B if:

    \[
    \mathrm{EIMTScore}_{v1.4}<1
    \]

    but multiple episodic invariants remain useful and partially supported.

    A candidate should be classified as EIMT-C if a simpler non-episodic memory
    model explains the behavior or performs equally well.

    A candidate should be classified as EIMT-D if runtime evidence is insufficient.

    A candidate should be classified as EIMT-E if the claim is overextended,
    unmeasured, clinically unsupported, benchmark-unsupported, source-free,
    implementation-free, task-free, or dependent on coherence rather than evidence.
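The threshold rules above can be sketched as an ordered, downgrade-first decision procedure. The boolean inputs below are illustrative summaries of richer checks (baseline harness results, evidence packages, non-claim locks), not literal specification fields:

```python
def classify_eimt(score: float,
                  outperforms_baselines: bool,
                  simpler_model_suffices: bool,
                  evidence_sufficient: bool,
                  overclaim_detected: bool) -> str:
    """Downgrade-first sketch of the EIMT-A..E thresholds.

    Checks run from strongest disqualifier to weakest, so a perfect
    score cannot rescue an overclaimed or evidence-free candidate.
    """
    if overclaim_detected:
        return "EIMT-E"          # overextended / coherence-only claim
    if not evidence_sufficient:
        return "EIMT-D"          # runtime evidence insufficient
    if simpler_model_suffices:
        return "EIMT-C"          # non-episodic baseline explains behavior
    if score == 1 and outperforms_baselines:
        return "EIMT-A"          # full score plus baseline wins
    return "EIMT-B"              # partial but useful episodic support
```

Ordering the disqualifiers first mirrors the downgrade discipline: EIMT-E and EIMT-D are checked before any upgrade to EIMT-A is possible.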

    %──────────────────────────────────────────────────────────────────────────────
    \section{Repository Record Grammar}
    %──────────────────────────────────────────────────────────────────────────────

    A repository-ready EIMT v1.4 project should preserve minimal kernel code,
    task schemas, baselines, benchmark runs, metrics, evidence packages, and result
    ledgers.

\begin{verbatim}
eimt_reference_kernel/
  README.md
  docs/
    theory/
      eimt_v1_4.tex
      source_fidelity.md
      invariants.md
      quick_start.md
    implementation_contract/
      module_contracts.md
      runtime_interfaces.md
      failure_taxonomy.md
    benchmark_protocol/
      task_schemas.md
      baseline_harness.md
      metric_emission.md
      falsification_surface.md
  src/
    eimt/
      episode.py
      memory_store.py
      metric_manifest.py
      retrieval_engine.py
      drift_gate.py
      source_fallback.py
      replay_evaluator.py
      simulation_guard.py
      baseline_harness.py
      benchmark_runner.py
      scoring.py
      evidence_package.py
  configs/
    metric_manifest.json
    runtime_config.json
    baseline_config.json
  tasks/
    source_recall_001.json
    boundary_separation_001.json
    context_shift_001.json
    long_horizon_001.json
    replay_compression_001.json
    constructive_planning_001.json
  runs/
    run_<timestamp>/
      episode_log.jsonl
      retrieval_log.jsonl
      fallback_log.jsonl
      replay_log.jsonl
      simulation_log.jsonl
      baseline_results.json
      benchmark_metrics.json
      drift_metrics.json
      metric_emission.json
      classification.json
      evidence_package.json
      result_ledger.jsonl
  evidence/
    raw_inputs/
    processed_outputs/
    negative_controls/
    benchmark_packages/
  ledgers/
    eimt_evolution_ledger.jsonl
    eimt_runtime_ledger.jsonl
    eimt_decision_ledger.jsonl
  memory/
    promoted_invariants.md
    rejected_overclaims.md
    runtime_failure_lessons.md
\end{verbatim}

    %──────────────────────────────────────────────────────────────────────────────
    \section{Minimal EIMT v1.4 Runtime Evidence JSON Skeleton}
    %──────────────────────────────────────────────────────────────────────────────

\begin{verbatim}
{
  "record_id": "EIMT-RUN-0001",
  "version": "EIMT-v1.4",
  "runtime_name": "",
  "domain": "agent_memory",
  "implementation_readiness": "EIMT-R0/R1/R2/R3/R4/R5",
  "episode_schema": {
    "context": "",
    "content": "",
    "time": "",
    "self_or_agent_state": "",
    "source_ref": "",
    "ledger_ref": "",
    "fingerprint": ""
  },
  "implementation_contract": {
    "episode": true,
    "memory_store": true,
    "metric_manifest": true,
    "retrieval_engine": true,
    "drift_gate": true,
    "source_fallback": true,
    "replay_evaluator": true,
    "simulation_guard": true,
    "baseline_harness": true,
    "benchmark_runner": true,
    "scoring": true,
    "evidence_package": true
  },
  "metric_manifest": {
    "context_distance": "",
    "content_distance": "",
    "time_distance": "",
    "state_distance": "",
    "fingerprint_distance": "",
    "weights": {}
  },
  "benchmark_task": {
    "task_family": "",
    "task_id": "",
    "task_schema_valid": false,
    "ground_truth_ref": "",
    "baseline_family": []
  },
  "metric_emission": {
    "machine_readable": true,
    "primary_metrics_declared_before_run": true,
    "primary_metrics": {},
    "secondary_metrics": {},
    "diagnostic_metrics": {}
  },
  "baseline_results": [],
  "drift_report": {
    "fast_drift": null,
    "slow_drift": null,
    "semantic_drift": null,
    "fingerprint_drift": null,
    "normalized_total_drift": null
  },
  "failure_taxonomy": {
    "schema_failure": false,
    "source_failure": false,
    "drift_failure": false,
    "boundary_failure": false,
    "fallback_failure": false,
    "replay_failure": false,
    "simulation_failure": false,
    "baseline_failure": false,
    "metric_failure": false,
    "ledger_failure": false,
    "overclaim_failure": false
  },
  "EIMTScore_v1_4": null,
  "classification": "",
  "downgrade_path": "",
  "falsification_note": "",
  "memory_promotion": {
    "promote": false,
    "items": [],
    "reason": ""
  },
  "non_claim_locks": [
    "not_clinical_guidance",
    "not_universal_mechanism",
    "not_ai_equals_human_memory",
    "coherence_not_truth",
    "simulation_not_biological_proof",
    "benchmark_success_not_human_memory_proof",
    "runtime_success_not_universal_mechanism_proof",
    "minimal_kernel_not_production_memory"
  ]
}
\end{verbatim}
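Before a record of this shape is scored, its structural completeness can be checked against the skeleton's top-level keys. A minimal sketch; the key list is taken directly from the skeleton above, and deeper per-field validation is deliberately out of scope:

```python
import json

# Top-level keys of the runtime-evidence skeleton.
REQUIRED_KEYS = (
    "record_id", "version", "runtime_name", "domain",
    "implementation_readiness", "episode_schema",
    "implementation_contract", "metric_manifest", "benchmark_task",
    "metric_emission", "baseline_results", "drift_report",
    "failure_taxonomy", "EIMTScore_v1_4", "classification",
    "downgrade_path", "falsification_note", "memory_promotion",
    "non_claim_locks",
)

def missing_evidence_keys(record_json: str) -> list:
    """Return the top-level skeleton keys absent from a runtime record.

    An empty result means the record is structurally complete; it says
    nothing about whether the values inside constitute valid evidence.
    """
    record = json.loads(record_json)
    return [k for k in REQUIRED_KEYS if k not in record]
```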

    %──────────────────────────────────────────────────────────────────────────────
    \section{Appendix A — Minimal EIMT v1.4 Runtime Checklist}
    %──────────────────────────────────────────────────────────────────────────────

    \begin{enumerate}
    \item Is there a runnable minimal kernel?
    \item Is the implementation contract declared?
    \item Is the episode schema declared?
    \item Is source metadata preserved?
    \item Is the memory-store interface implemented?
    \item Is the metric manifest declared?
    \item Is retrieval cue-dependent?
    \item Is retrieval drift measured?
    \item Is high-drift retrieval gated?
    \item Is source fallback implemented?
    \item Is constructive simulation labeled?
    \item Is replay efficacy measured?
    \item Are baselines implemented?
    \item Are concrete benchmark tasks declared?
    \item Are task schemas valid?
    \item Are primary metrics declared before interpretation?
    \item Are metrics emitted in machine-readable form?
    \item Are runtime logs preserved?
    \item Is an evidence package compiled?
    \item Does EIMT outperform baselines on declared primary metrics?
    \item What implementation-readiness class applies?
    \item What falsifies the runtime claim?
    \item What downgrade class applies?
    \item What, if anything, is memory-promotable?
    \item Are clinical, biological-proof, AI-equivalence, and universal-mechanism
    locks preserved?
    \end{enumerate}

    %──────────────────────────────────────────────────────────────────────────────
    \section{Appendix B — Minimal Reference Kernel Pseudocode}
    %──────────────────────────────────────────────────────────────────────────────

\begin{verbatim}
Input:
  task_json
  metric_manifest
  runtime_config
  baseline_config

Initialize:
  validate task schema
  load episodes
  build memory store
  load metric manifest
  initialize retrieval engine
  initialize drift gate
  initialize source fallback
  initialize replay evaluator
  initialize simulation guard
  initialize baseline harness
  initialize evidence compiler

For each query in task:
  retrieve candidates from memory store
  compute distances using metric manifest
  compute retrieval drift
  compute omega = 1 / (1 + |retrieval_drift|)

  if retrieval_drift exceeds threshold:
    invoke source fallback:
      abstain / ask context / return uncertainty / source-check
    log fallback event
  else:
    return candidate with:
      episode id
      source ref
      confidence
      drift report
    log retrieval event

For replay task:
  compute drift before replay
  apply bounded replay or summary
  compute drift after replay
  gamma_rho = drift_before - drift_after
  classify replay:
    stabilizing / neutral / destabilizing / transformation-only

For planning task:
  generate plan from retrieved episodes
  label output as simulation
  prevent classification as recovered memory
  compute simulation drift

Run baselines:
  database lookup
  vector-only retrieval
  semantic-only retrieval
  ungated episodic retrieval
  summary-only memory
  random control

Score:
  compute primary metrics
  compute secondary metrics
  compute diagnostic metrics
  compare EIMT runtime against baselines
  compute EIMTScore_v1_4
  assign implementation readiness class
  classify EIMT-A/B/C/D/E

Compile:
  runtime logs
  baseline results
  metric emission
  drift report
  failure taxonomy
  evidence package
  result ledger

Promote to memory only:
  reproducible benchmark wins
  validated implementation constraints
  reusable failure lessons
  stable drift thresholds
\end{verbatim}
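The per-query gate in the pseudocode above can be made concrete in a few lines. A hedged sketch, assuming drift has already been computed from the metric manifest; the threshold default, fallback action default, and result-dictionary keys are illustrative choices, not specification requirements:

```python
def gated_retrieve(query_drift: float,
                   candidate: dict,
                   threshold: float = 0.5,
                   fallback_action: str = "abstain") -> dict:
    """One step of the drift-gated retrieval loop.

    omega = 1 / (1 + |drift|) is the episodic confidence weight; when
    drift exceeds the declared threshold the kernel falls back rather
    than returning the reconstruction as fact.
    """
    omega = 1.0 / (1.0 + abs(query_drift))
    if abs(query_drift) > threshold:
        return {
            "status": "fallback",
            # abstain / ask / uncertain / source-check / audit
            "action": fallback_action,
            "confidence": omega,
            "drift": query_drift,
        }
    return {
        "status": "retrieved",
        "episode_id": candidate.get("episode_id"),
        "source_ref": candidate.get("source_ref"),
        "confidence": omega,
        "drift": query_drift,
    }
```

Note that the fallback branch still reports the confidence weight and drift value: a gated refusal is itself a logged event, not a silent failure.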

    %──────────────────────────────────────────────────────────────────────────────
    \section{Appendix C — Canonical Formula Summary}
    %──────────────────────────────────────────────────────────────────────────────

    \[
    E=(c,x,t,s)
    \]

    \[
    E_i^{min}
    =
    (c_i,x_i,t_i,s_i,\sigma_i,\ell_i,h_i)
    \]

    \[
    \mathcal{K}_{EIMT}
    =
    \{
    E,
    M,
    D,
    R,
    G,
    A,
    P,
    S,
    L,
    B,
    Q,
    Y
    \}
    \]

    \[
    \mathcal{D}_{manifest}
    =
    \{d_c,d_x,d_t,d_s,d_H,w_c,w_x,w_t,w_s,w_H\}
    \]

    \[
    \Omega^{episodic}_k
    =
    \frac{1}{1+|\Delta\Phi^{retrieval}_k|}
    \]

    \[
    \mathcal{R}^{gated}(q,\mathcal{M})
    =
    \Omega^{episodic}_k\mathcal{R}(q,\mathcal{M})
    +
    (1-\Omega^{episodic}_k)\mathcal{A}(q)
    \]

    \[
    \mathcal{A}(q)
    \in
    \{
    \text{abstain},
    \text{ask},
    \text{uncertain},
    \text{source-check},
    \text{audit}
    \}
    \]

    \[
    \Gamma_{\rho}
    =
    \Delta\Phi^{episodic}_{pre}
    -
    \Delta\Phi^{episodic}_{post}
    \]
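The replay-efficacy difference can be computed and labeled directly from pre- and post-replay drift. A minimal sketch: the neutrality tolerance \texttt{tol} is an assumption, and the transformation-only label is omitted because it requires signals beyond the two drift values:

```python
def replay_efficacy(drift_pre: float, drift_post: float,
                    tol: float = 1e-9) -> tuple:
    """Gamma_rho = pre-replay drift minus post-replay drift.

    Positive Gamma_rho means replay reduced drift (stabilizing);
    negative means replay increased drift (destabilizing), which
    must not be reported as stabilization.
    """
    gamma_rho = drift_pre - drift_post
    if gamma_rho > tol:
        label = "stabilizing"
    elif gamma_rho < -tol:
        label = "destabilizing"
    else:
        label = "neutral"
    return gamma_rho, label
```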

    \[
    \mathcal{T}_{EIMT}
    =
    \{
    T_{source},
    T_{boundary},
    T_{context},
    T_{long},
    T_{replay},
    T_{planning}
    \}
    \]

    \[
    \mathcal{L}_{memory}
    =
    \{
    L_{db},
    L_{vec},
    L_{sem},
    L_{ungated},
    L_{summary},
    L_{random}
    \}
    \]

    \[
    \mathrm{EIMTScore}_{v1.4}
    =
    \frac{
    S+F+E+C+B+R+K+P+L+T+D+H+N+V+X+G+M+Q+Z+W+A+I+J+Y+U+\Psi+\Xi
    }{27}
    \]

    %──────────────────────────────────────────────────────────────────────────────
    \section{Concluding Compression}
    %──────────────────────────────────────────────────────────────────────────────

    EIMT v1.4 names the minimal implementation-ready form of episodic memory
    invariance:

    \[
    \boxed{
    \text{an episodic-memory framework becomes implementation-ready only when}
    \atop
    \text{its smallest valid kernel can run concrete tasks, compare baselines,}
    \atop
    \text{emit metrics, preserve logs, and compile evidence packages.}
    }
    \]

    The implementer statement is:

    \[
    \boxed{
    \text{an EIMT runtime must store source-bound episodes, retrieve by cue,}
    \atop
    \text{measure drift, gate uncertainty, invoke source fallback, label}
    \atop
    \text{simulation, test replay, and refuse high-drift reconstruction as fact.}
    }
    \]

    The benchmark statement is:

    \[
    \boxed{
    \text{benchmarks become meaningful only when tasks contain concrete episodes,}
    \atop
    \text{queries, ground truth, distractors, baselines, metric rules, and}
    \atop
    \text{machine-readable outputs.}
    }
    \]

    The evidence statement is:

    \[
    \boxed{
    \text{execution without logs is not evidence;}
    \quad
    \text{benchmarks without baselines are not strong support;}
    \quad
    \text{tasks without ground truth are not benchmark tasks.}
    }
    \]

    The philosophical statement remains:

    \[
    \boxed{
    \text{episodic coherence is not perfect recall and not fiction;}
    \quad
    \text{it is bounded reconstructive stability.}
    }
    \]

    Thus, EIMT v1.4 upgrades EIMT from reference implementation specification to
    minimal runnable-kernel and concrete benchmark-task-suite layer while preserving
    source fidelity, clinical caution, AI-human distinction, falsification,
    negative controls, downgrade discipline, implementation auditability, and
    non-universal mechanism boundaries.

    \end{document}