Codex-format source-bounded extraction of “The Last Human-Written Paper: Agent-Native Research Artifacts” into a governance-ready ARA-X v1.0 scaffold preserving Storytelling Tax, Engineering Tax, four-layer ARA architecture, Live Research Manager, ARA Compiler, ARA Seal, evaluation results, limitations, and non-claim locks.
% ████████████████████████████████████████████████████████████████████████████████
%
% CODEX ΔΦ — AGENT-NATIVE RESEARCH ARTIFACT EXTRACTION (ARA-X v1.0)
% ────────────────────────────────────────────────────────────────────────────
% SOURCE-BOUNDED EXTRACTION OF “THE LAST HUMAN-WRITTEN PAPER:
% AGENT-NATIVE RESEARCH ARTIFACTS” INTO A CITA-STYLE GOVERNANCE FORMAT
% WITHOUT CLAIMING A NEW CODEX THEORY, UNIVERSAL RESEARCH LAW, OR
% UNSUPPORTED EXTENSION BEYOND THE PDF
%
% VERSION
% ───────
% v1.0 — Source-Fidelity Extraction Layer · Locked ·
% Storytelling-Tax, Engineering-Tax, Four-Layer ARA Protocol,
% Live Research Manager, ARA Compiler, ARA Seal, ARA-Native Review,
% Human+AI Research Network, Evaluation Results, Limitations, and
% Extraction-Safe Algorithmic Operators
%
% AUTHOR
% ──────
% James Paul Jackson
% X / Twitter: @unifiedenergy11
%
% SOURCE EXTRACTION / AUTHOR ATTRIBUTION
% ──────────────────────────────────────
% This document is a Codex-format extraction derived from the uploaded paper:
%
% Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu,
% Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai,
% Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye,
% Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang,
% Shangquan Sun, Maestro Harmon, John Dianzhuo Wang,
% Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou,
% Yuchen You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan,
% Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen,
% Alex Pentland, Ang Chen, Mosharaf Chowdhury, Zechen Zhang.
%
% “The Last Human-Written Paper: Agent-Native Research Artifacts,”
% arXiv:2604.24658v1 [cs.LG], dated April 27 / April 28, 2026.
%
% Source affiliations listed in the paper include:
%
% Orchestra Research, Stanford University, Cornell University,
% Ohio State University, MIT, Yale University, University of Michigan,
% Meta Superintelligence Labs, University of Chicago,
% Carnegie Mellon University, University of Washington,
% University of Toronto, NVIDIA, Meta,
% Nanyang Technological University, Harvard University, LinkedIn,
% UIUC, Arizona State University, Stony Brook University,
% University of Hong Kong, Boston College, Portland State University,
% National University of Singapore, and New York University.
%
% The paper argues that scientific publication compresses a rich, branching,
% iterative research process into a linear narrative, discarding most of what
% was discovered along the way. This compilation creates two structural costs:
%
% • Storytelling Tax:
% failed experiments, rejected hypotheses, dead ends, pivots, and
% branching exploration history are discarded to fit a linear story.
%
% • Engineering Tax:
% reviewer-sufficient prose fails to provide the agent-sufficient
% operational specification needed for reproduction and extension.
%
% The paper introduces the Agent-Native Research Artifact (ARA), a protocol
% that replaces the narrative paper as the primary research object with a
% machine-executable research package organized into four interlocking layers:
%
% (1) scientific logic,
% (2) executable code with full specifications,
% (3) an exploration graph preserving failed and branching research paths,
% (4) grounded evidence binding claims to raw outputs.
%
% The paper further introduces three ecosystem mechanisms:
%
% • Live Research Manager,
% • ARA Compiler,
% • ARA-native review / ARA Seal system.
%
% It also frames a broader (Human+AI)^2 research network in which researchers
% work through agents, ARAs become canonical artifacts, and downstream agents
% retrieve, fork, extend, verify, and render these artifacts into paper, video,
% slides, demos, or grounded dialogue.
%
% This extraction preserves the source claims without merging them into a
% broader Codex theory. Codex terminology is used only as formatting,
% compression, and governance scaffolding.
%
% DATE
% ────
% April 2026
%
% STATUS
% ──────
% SOURCE-BOUNDED EXTRACTION OF ARA PAPER —
% NOT A COMBINED THEORY · NOT A UNIVERSAL SCIENTIFIC LAW ·
% NOT A CLAIM THAT CODEX INVENTED ARA · NOT A CLAIM OF VALIDITY BEYOND
% THE PAPER’S STATED SCOPE
%
% EMPIRICAL / METHODOLOGICAL CONFIDENCE BADGE
% ────────────────────────────────────────────
% Confidence status: High as an extraction of the uploaded PDF’s structure;
% not independently validated beyond the paper’s reported experiments.
%
% The source paper reports:
%
% • PaperBench reproduction requirements:
% 8,921 expert-annotated requirements across 23 ICML 2024 papers.
% Only 45.4% are fully specified in the source PDF.
% Code development is the most underspecified category
% at 37.3% sufficient.
% Missing hyperparameters account for 26.2% of all gaps.
%
% • RE-Bench / METR trajectory analysis:
% 24,008 agent runs across 21 frontier models.
% Failed runs account for 90.2% of total dollar cost and
% 59.2% of tokens.
% Median failed-to-success token ratio is reported as 113×.
%
% • Understanding evaluation:
% ARA raises question-answering accuracy from 72.4% to 93.7%.
%
% • Reproduction evaluation:
% ARA raises difficulty-weighted reproduction success from
% 57.4% to 64.4%.
%
% • Extension evaluation:
% preserved failure traces can accelerate progress on open-ended
% RE-Bench tasks, but may also constrain a capable agent from stepping
% outside prior-run exploration boundaries depending on capability.
%
% PURPOSE
% ───────
% Extract the ARA paper into a reusable, source-faithful, audit-ready
% artifact skeleton:
%
% narrative research paper
% → lossy compilation diagnosis
% → Storytelling Tax
% → Engineering Tax
% → Knowledge over Narrative principle
% → ARA four-layer protocol
% → cross-layer forensic bindings
% → Live Research Manager
% → ARA Compiler
% → ARA Seal / review pipeline
% → (Human+AI)^2 research network
% → evaluation surfaces
% → limitations
% → extraction-safe algorithmic primitives.
%
% VERSION EVOLUTION SUMMARY
% ─────────────────────────
% v1.0 : First source-bounded Codex-format extraction of the ARA paper.
% Converts the paper’s claims into compact primitives, operators,
% file-system grammar, validation surfaces, falsification surfaces,
% non-claim locks, and extraction-safe algorithmic summaries without
% merging ARA into a new Codex theory.
%
% WHAT THIS IS
% ────────────
% • A source-bounded extraction of the uploaded ARA paper
% • A structured restatement of the paper’s core thesis
% • A compression of the ARA protocol into primitives and operators
% • A formal extraction of Storytelling Tax and Engineering Tax
% • A representation of the four ARA layers
% • A summary of the Knowledge over Narrative design principle
% • A summary of cross-layer forensic bindings
% • A summary of the Live Research Manager
% • A summary of the ARA Compiler
% • A summary of the ARA Seal and review pipeline
% • A summary of the (Human+AI)^2 research network
% • A source-faithful algorithmic skeleton for agent-native research artifacts
% • A downgrade-preserving note on limitations and failure-trace risk
%
% WHAT THIS IS NOT
% ───────────────
% • Not a combined Codex + ARA theory
% • Not ANTA
% • Not a claim that Codex authored ARA
% • Not a universal law of science
% • Not proof that all disciplines should use ARA
% • Not proof that ARA always outperforms papers
% • Not proof that failure traces always help
% • Not a claim that machine verification replaces human judgment
% • Not permission to treat executable artifacts as automatically true
% • Not permission to treat benchmark success as universal validation
% • Not permission to treat ARA Seal success as proof of significance
% • Not permission to treat reproduction checks as novelty checks
%
% ADDITIVE REFINEMENTS (v1.0)
% ───────────────────────────
% • Source-fidelity extraction boundary added
% • Full source-author attribution added
% • Storytelling Tax extracted
% • Engineering Tax extracted
% • Knowledge over Narrative principle extracted
% • Four-layer ARA protocol extracted
% • Forensic binding layer extracted
% • Live Research Manager extracted
% • ARA Compiler extracted
% • ARA Seal / review pipeline extracted
% • (Human+AI)^2 network extracted
% • Evaluation result surface extracted
% • Capability-relative sufficiency principle extracted
% • Failure-trace constraint warning extracted
% • Non-claim locks added
%
% EXECUTABLE ANCHOR BLOCK (v1.0)
% ──────────────────────────────
% A valid ARA-X extraction must:
%
% (1) preserve the paper’s source claims,
% (2) preserve full paper author attribution,
% (3) distinguish paper claims from Codex interpretation,
% (4) identify the Storytelling Tax,
% (5) identify the Engineering Tax,
% (6) preserve the Knowledge over Narrative principle,
% (7) preserve the four ARA layers,
% (8) preserve cross-layer forensic bindings,
% (9) preserve the Live Research Manager mechanism,
% (10) preserve the ARA Compiler mechanism,
% (11) preserve the ARA Seal / review mechanism,
% (12) preserve the (Human+AI)^2 network framing,
% (13) preserve reported benchmark results only as reported results,
% (14) preserve the limitation that failure traces can sometimes constrain,
% (15) avoid claiming universal validity,
% (16) avoid claiming this is a combined theory,
% (17) and classify all extensions beyond the PDF as speculative unless
% independently grounded.
%
% CANONICAL LOCK (v1.0)
% ─────────────────────
% • This is an extraction of the uploaded PDF
% • Source boundaries must be preserved
% • Full source-author attribution must be preserved
% • ARA must not be collapsed into Codex
% • Codex language may format the extraction but must not overwrite the source
% • Reported numbers remain paper-reported numbers
% • Failure traces are valuable but not always beneficial
% • Machine verification does not replace human judgment
% • Executability does not equal truth
% • Reproducibility checks do not prove significance
% • ARA Seal does not replace novelty, ethics, significance, or taste review
% • Extension beyond the PDF must be labeled as extension
%
% Evolutions must be additive only.
% Do not weaken source fidelity, author attribution, non-claim boundaries,
% evidence caution, benchmark caution, or the distinction between extraction
% and theory synthesis.
%
% AI PROMPT TRACEABILITY
% ──────────────────────
% Use this document as the source-bounded extraction of the uploaded ARA paper.
% Preserve the distinction between source claim, extraction, interpretation,
% algorithmic compression, benchmark result, and speculative extension.
%
% SHADOW HEADER ALIGNMENT SEAL
% ───────────────────────────
% Preserve header discipline across future versions except for explicitly
% additive refinements that improve source fidelity, author attribution,
% extraction clarity, limitation tracking, validation, falsification, or
% implementation grammar.
%
% ████████████████████████████████████████████████████████████████████████████████
\documentclass[12pt]{article}
\usepackage[margin=1in]{geometry}
\usepackage{amsmath,amssymb,amsfonts,amsthm}
\usepackage{booktabs,longtable,array}
\usepackage{hyperref}
\usepackage{listings}
\newtheorem{axiom}{Axiom}
\newtheorem{definition}{Definition}
\newtheorem{proposition}{Proposition}
\newtheorem{hypothesis}{Hypothesis}
\newtheorem{remark}{Remark}
\newtheorem{corollary}{Corollary}
\title{\textbf{Codex $\Delta\Phi$ — Agent-Native Research Artifact Extraction (ARA-X v1.0)}\\
\large Source-Bounded Extraction of ``The Last Human-Written Paper: Agent-Native Research Artifacts''}
\author{\textbf{James Paul Jackson}\\[4pt]
\small Codex-format source extraction layer\\
\small \texttt{@unifiedenergy11}}
\date{April 2026}
\begin{document}
\maketitle
\begin{abstract}
ARA-X v1.0 extracts the uploaded paper
\emph{The Last Human-Written Paper: Agent-Native Research Artifacts}
into a source-bounded Codex-format artifact. The paper argues that conventional
scientific papers impose two structural costs: the Storytelling Tax, which
erases failed experiments, rejected hypotheses, pivots, dead ends, and
branching exploration history; and the Engineering Tax, which leaves a gap
between reviewer-sufficient prose and agent-sufficient execution detail. The
paper introduces the Agent-Native Research Artifact (ARA), a machine-executable
research package organized around scientific logic, executable code with full
specifications, an exploration graph, and grounded evidence. This extraction
formalizes the paper's primitives, operators, architecture, review mechanisms,
evaluation claims, and limitations without merging ARA into a broader Codex
theory.
\end{abstract}
%──────────────────────────────────────────────────────────────────────────────
\section{Source Paper and Author Attribution}
%──────────────────────────────────────────────────────────────────────────────
The source paper is:
\[
\textit{The Last Human-Written Paper: Agent-Native Research Artifacts}.
\]
The author list preserved from the uploaded PDF is:
\begin{center}
\begin{minipage}{0.92\linewidth}
Jiachen Liu,
Jiaxin Pei,
Jintao Huang,
Chenglei Si,
Ao Qu,
Xiangru Tang,
Runyu Lu,
Lichang Chen,
Xiaoyan Bai,
Haizhong Zheng,
Carl Chen,
Zhiyang Chen,
Haojie Ye,
Yujuan Fu,
Zexue He,
Zijian Jin,
Zhenyu Zhang,
Shangquan Sun,
Maestro Harmon,
John Dianzhuo Wang,
Jianqiao Zeng,
Jiachen Sun,
Mingyuan Wu,
Baoyu Zhou,
Yuchen You,
Shijian Lu,
Yiming Qiu,
Fan Lai,
Yuan Yuan,
Yao Li,
Junyuan Hong,
Ruihao Zhu,
Beidi Chen,
Alex Pentland,
Ang Chen,
Mosharaf Chowdhury,
Zechen Zhang.
\end{minipage}
\end{center}
\begin{remark}
The extraction names all visible source authors from the paper title page.
This document's author field names the Codex-format extractor, not the source
paper's scientific authors.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{Core-Invariant Extraction Block}
%──────────────────────────────────────────────────────────────────────────────
The shortest faithful extraction of the paper is:
\[
\boxed{
\begin{array}{c}
\text{A research paper is a lossy compiled view of a richer branching}\\
\text{research object; ARA restores that object as a machine-executable}\\
\text{package containing logic, code, exploration history, and evidence.}
\end{array}
}
\]
The core source transformation is:
\[
\text{narrative paper}
\rightarrow
\text{agent-native research artifact}.
\]
Expanded:
\[
\text{research process}
\rightarrow
\text{scientific logic}
\rightarrow
\text{executable specification}
\rightarrow
\text{exploration graph}
\rightarrow
\text{evidence grounding}
\rightarrow
\text{operable artifact}.
\]
\begin{remark}
The source paper does not merely propose a new file format. It proposes that
the primary research object should shift from a linear human narrative to a
structured, executable, forkable, agent-operable knowledge package.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{Source Boundary and Non-Claim Layer}
%──────────────────────────────────────────────────────────────────────────────
This extraction preserves the following boundaries:
\begin{enumerate}
\item The source is the uploaded ARA paper.
\item The extraction is not a combined Codex theory.
\item The formulas below compress the paper's claims into operator form.
\item Any algorithmic names introduced here are extraction labels, not claims
from the authors unless explicitly present in the paper.
\item Reported benchmark numbers are treated as paper-reported results.
\item ARA is scoped by the paper primarily to computer science / machine
learning research artifacts where code, configurations, experiments, and
agentic reproduction are central.
\item The paper does not claim that ARA eliminates human judgment.
\item The paper explicitly reserves significance, novelty, taste, ethics,
problem formulation, and contested machine findings for human reviewers
after mechanical and empirical checks.
\end{enumerate}
%──────────────────────────────────────────────────────────────────────────────
\section{Problem Extraction: The Two Taxes}
%──────────────────────────────────────────────────────────────────────────────
The paper identifies two structural losses imposed by narrative compilation.
\subsection{Storytelling Tax}
\begin{definition}[Storytelling Tax]
The Storytelling Tax is the systematic erasure of branching research-process
knowledge when months of exploration are compressed into a polished linear
paper.
\end{definition}
It discards:
\[
\{\text{failed experiments},
\text{rejected hypotheses},
\text{abandoned approaches},
\text{dead ends},
\text{pivots},
\text{design alternatives},
\text{human judgment signals}\}.
\]
The paper's compression law may be extracted as:
\[
\mathcal{P}_{paper}
=
\mathrm{Compile}_{narrative}(\mathcal{R}_{branching}),
\]
where:
\[
\mathcal{R}_{branching}
=
\{\text{questions},\text{decisions},\text{experiments},
\text{failures},\text{pivots},\text{evidence},\text{judgment signals}\}.
\]
The loss is:
\[
\mathcal{L}_{story}
=
\mathcal{R}_{branching}
-
\mathcal{P}_{paper}.
\]
The source paper quantifies this cost through RE-Bench / METR trajectory data:
\[
\text{failed-run cost share}=90.2\%,
\]
\[
\text{failed-run token share}=59.2\%,
\]
\[
\text{median failed-to-success token ratio}=113\times.
\]
\begin{remark}
The paper frames failure traces as economically and epistemically valuable:
they prevent future agents from rediscovering the same dead ends and preserve
human judgment signals as structured supervision.
\end{remark}
\subsection{Engineering Tax}
\begin{definition}[Engineering Tax]
The Engineering Tax is the gap between prose sufficient to convince human
reviewers and operational specification sufficient for agents to reproduce or
extend the work.
\end{definition}
It includes missing or underspecified:
\[
\{\text{hyperparameters},
\text{environment},
\text{hardware},
\text{seeds},
\text{implementation details},
\text{baseline details},
\text{implicit assumptions},
\text{configuration choices},
\text{instrumentation requirements}\}.
\]
The paper's engineering-gap structure may be extracted as:
\[
\mathcal{L}_{eng}
=
\mathcal{S}_{agent}
-
\mathcal{S}_{paper},
\]
where:
\[
\mathcal{S}_{agent}
=
\text{agent-sufficient execution specification},
\]
and:
\[
\mathcal{S}_{paper}
=
\text{reviewer-sufficient narrative specification}.
\]
The source paper reports, from PaperBench requirements:
\[
\text{fully specified in PDF}=45.4\%,
\]
\[
\text{code-development sufficiency}=37.3\%,
\]
\[
\text{missing hyperparameter gap share}=26.2\%.
\]
\begin{remark}
The paper's core diagnosis is that human-readable publication conventions
discard precisely the information AI agents need for understanding,
reproduction, and extension.
\end{remark}
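The two losses can be made concrete in code. The following is a minimal,
extraction-level Python sketch (not from the paper) that treats
$\mathcal{R}_{branching}$ and $\mathcal{S}_{agent}$ as sets of knowledge items
and computes both taxes as set differences; every item name is a hypothetical
placeholder.
\begin{verbatim}
# Illustrative sketch only: both taxes as set differences.
# All item names are hypothetical placeholders, not paper data.
R_branching = {"question", "decision", "experiment",
               "failure", "pivot", "evidence", "judgment_signal"}
P_paper     = {"question", "experiment", "evidence"}  # survives compilation

S_agent = {"hyperparameters", "environment", "hardware",
           "seeds", "baseline_configs", "entrypoints"}
S_paper = {"hyperparameters"}                 # reviewer-sufficient prose

L_story = R_branching - P_paper  # Storytelling Tax: erased process knowledge
L_eng   = S_agent - S_paper      # Engineering Tax: missing execution detail

print("Storytelling Tax:", sorted(L_story))
print("Engineering Tax:", sorted(L_eng))
\end{verbatim}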
%──────────────────────────────────────────────────────────────────────────────
\section{Design Principle: Knowledge over Narrative}
%──────────────────────────────────────────────────────────────────────────────
The source paper grounds ARA in the principle:
\[
\boxed{
\text{Knowledge over Narrative.}
}
\]
Meaning:
\[
\text{organized evolving research knowledge}
>
\text{linear narrative paper}.
\]
The narrative paper becomes:
\[
\text{compiled view}
\]
of an underlying:
\[
\text{structured research object}.
\]
\begin{definition}[Knowledge over Narrative]
Knowledge over Narrative is the paper's design principle that the organized,
evolving knowledge produced during research is the primary scientific object,
while the paper is merely one rendered view of that object.
\end{definition}
%──────────────────────────────────────────────────────────────────────────────
\section{ARA Master Definition}
%──────────────────────────────────────────────────────────────────────────────
\begin{definition}[Agent-Native Research Artifact]
An Agent-Native Research Artifact is a machine-executable research package
whose primary function is not linear reading but agent operation: querying
claims, executing specifications, traversing exploration history, and verifying
claims against raw evidence.
\end{definition}
The master ARA operator is:
\[
\mathcal{ARA}:
\mathcal{R}_{branching}
\mapsto
\mathcal{K}_{agent},
\]
where:
\[
\mathcal{K}_{agent}
=
\{
\mathcal{L}_{logic},
\mathcal{L}_{physical},
\mathcal{G}_{trace},
\mathcal{E}_{evidence}
\}.
\]
Thus:
\[
\boxed{
\mathcal{ARA}
=
\text{logic}
+
\text{code}
+
\text{trace}
+
\text{evidence}.
}
\]
The agent-facing questions ARA materializes are:
\[
\{\text{why does it work},
\text{how is it implemented},
\text{what was tried},
\text{what are the numbers}\}.
\]
%──────────────────────────────────────────────────────────────────────────────
\section{ARA Four-Layer Architecture}
%──────────────────────────────────────────────────────────────────────────────
The source paper defines four ARA layers.
\subsection{Cognitive Layer}
\[
\mathcal{L}_{logic}
=
\{
\texttt{problem.md},
\texttt{solution/},
\texttt{claims.md},
\texttt{experiments.md},
\texttt{related\_work.md}
\}.
\]
Function:
\[
\text{what was done}
+
\text{why it works}
+
\text{which claims are falsifiable}
+
\text{which experiments verify them}.
\]
\begin{definition}[Cognitive Layer]
The Cognitive Layer is the structured scientific-logic layer that converts
conceptual contribution, claims, formal concepts, verification plans, and typed
related-work dependencies into queryable, agent-operable files.
\end{definition}
Within this layer:
\[
\texttt{claims.md}
\rightarrow
\{\text{statement},\text{status},\text{falsification criteria},\text{proof}\}.
\]
\[
\texttt{related\_work.md}
\rightarrow
\{\text{imports},\text{bounds},\text{baselines},\text{dependency graph}\}.
\]
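As a purely hypothetical illustration of the four-field structure (the entry
content below is invented, and the paper's appendix schema may differ), a
single \texttt{claims.md} entry could look like:
\begin{verbatim}
## C1: Sparse routing reduces training FLOPs by >30%  (hypothetical claim)
Statement:     Replacing dense attention with learned sparse routing
               reduces training FLOPs by more than 30% at matched loss.
Status:        verified
Falsification: FLOP reduction <= 30% at matched validation loss on the
               reference config, or loss degradation > 0.5%.
Proof:         experiments.md#X3 -> /evidence/results/x3_metrics.json
\end{verbatim}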
\subsection{Physical Layer}
\[
\mathcal{L}_{physical}
=
\{
\texttt{src/},
\texttt{configs/},
\texttt{environment.md},
\texttt{index.md}
\}.
\]
Function:
\[
\text{how the work is implemented}
+
\text{which code implements which claim}
+
\text{which dependencies, seeds, hardware, and parameters are required}.
\]
\begin{definition}[Physical Layer]
The Physical Layer is the executable-code and operational-specification layer.
It contains either a compact kernel for algorithmic contributions or a full
repository with an index mapping source files to ARA components.
\end{definition}
The source paper distinguishes:
\[
\text{kernel mode}
\quad
\text{for algorithmic contributions}
\]
and:
\[
\text{repository mode}
\quad
\text{for systems contributions}.
\]
\subsection{Exploration Graph}
\[
\mathcal{G}_{trace}
=
(V,E,\tau,\pi,\ell),
\]
where:
\[
V=\text{research nodes},
\quad
E=\text{parent/child and dependency edges},
\quad
\tau=\text{node type},
\]
\[
\pi=\text{provenance},
\quad
\ell=\text{lesson or rationale}.
\]
Node types:
\[
\tau
\in
\{
\text{question},
\text{decision},
\text{experiment},
\text{dead\_end},
\text{pivot}
\}.
\]
\begin{definition}[Exploration Graph]
The Exploration Graph is the branching research DAG that preserves questions,
decisions, experiments, dead ends, and pivots that narrative papers normally
discard.
\end{definition}
The source paper describes it as:
\[
\text{a git log for research}.
\]
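A minimal Python sketch of $\mathcal{G}_{trace}$ follows, assuming nothing
beyond the tuple $(V,E,\tau,\pi,\ell)$ defined above; the node names, field
names, and contents are illustrative extraction labels, not the paper's
schema.
\begin{verbatim}
# Minimal sketch of the exploration graph (V, E, tau, pi, ell).
# Node names and field contents are illustrative, not the paper's schema.
from dataclasses import dataclass, field

NODE_TYPES = {"question", "decision", "experiment", "dead_end", "pivot"}

@dataclass
class TraceNode:
    node_id: str
    node_type: str         # tau: one of NODE_TYPES
    provenance: str        # pi: e.g. "user", "ai-suggested"
    lesson: str            # ell: rationale or lesson learned
    parents: list = field(default_factory=list)  # E: parent edges

    def __post_init__(self):
        assert self.node_type in NODE_TYPES

graph = {}  # V: node_id -> TraceNode

def add_node(node: TraceNode):
    # Parents must already exist, so the trace stays a DAG.
    assert all(p in graph for p in node.parents)
    graph[node.node_id] = node

add_node(TraceNode("q1", "question", "user", "Can routing be sparse?"))
add_node(TraceNode("d1", "dead_end", "ai-executed",
                   "Random routing diverges; needs learned gates.", ["q1"]))
add_node(TraceNode("p1", "pivot", "user",
                   "Switch to learned top-k gating.", ["d1"]))
\end{verbatim}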
\subsection{Evidence Layer}
\[
\mathcal{E}_{evidence}
=
\{
\texttt{results/},
\texttt{logs/},
\texttt{metrics},
\texttt{raw outputs},
\texttt{diagnostics}
\}.
\]
Function:
\[
\text{ground every claim in raw empirical output}.
\]
\begin{definition}[Evidence Layer]
The Evidence Layer stores machine-readable outputs, metrics, logs, training
curves, resource usage, and diagnostics that ground claims through explicit
proof chains.
\end{definition}
The proof-chain pattern is:
\[
\texttt{claims.md}
\rightarrow
\texttt{experiments.md}
\rightarrow
\texttt{/evidence/}.
\]
The source paper also notes that withholding ground truth enables layered access
control:
\[
\text{experiment logic in /logic}
\quad
\text{and}
\quad
\text{exact results in /evidence}.
\]
%──────────────────────────────────────────────────────────────────────────────
\section{Forensic Binding Extraction}
%──────────────────────────────────────────────────────────────────────────────
The paper's cross-layer binding structure can be formalized as:
\[
C_i
\rightarrow
X_j
\rightarrow
K_m
\rightarrow
R_n,
\]
where:
\[
C_i=\text{claim},
\quad
X_j=\text{experiment},
\quad
K_m=\text{code or kernel},
\quad
R_n=\text{raw result}.
\]
A claim is ARA-grounded only if:
\[
\exists X_j,K_m,R_n
\quad
\text{such that}
\quad
C_i \leftrightarrow X_j \leftrightarrow K_m \leftrightarrow R_n.
\]
\begin{definition}[Forensic Binding]
A forensic binding is a traceable cross-layer link between a claim, the
experiment that tests it, the code that implements it, and the evidence that
grounds it.
\end{definition}
\begin{proposition}[Claim Operability Principle]
A claim becomes agent-operable only when it can be traversed downstream to
code and evidence and upstream to motivation, assumptions, and exploration
history.
\end{proposition}
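A hedged sketch of the groundedness test above: a claim counts as
ARA-grounded only when its binding resolves to an experiment, code, and a raw
result. The binding table and all identifiers are hypothetical.
\begin{verbatim}
# Sketch: check that C_i <-> X_j <-> K_m <-> R_n resolves.
# The binding table and all identifiers are hypothetical examples.
bindings = {
    "C1": {"experiment": "X3", "code": "src/kernel/route.py",
           "result": "evidence/results/x3_metrics.json"},
    "C2": {"experiment": "X5", "code": None, "result": None},
}

def is_grounded(claim_id: str) -> bool:
    b = bindings.get(claim_id, {})
    return all(b.get(k) for k in ("experiment", "code", "result"))

for cid in bindings:
    print(cid, "grounded" if is_grounded(cid) else "NOT grounded")
\end{verbatim}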
%──────────────────────────────────────────────────────────────────────────────
\section{Live Research Manager Extraction}
%──────────────────────────────────────────────────────────────────────────────
The Live Research Manager captures research process knowledge during ordinary
human-agent development.
\[
\mathrm{LRM}
=
\mathrm{MaturityTracker}
\circ
\mathrm{EventRouter}
\circ
\mathrm{ContextHarvester}.
\]
\subsection{Design Principles}
The paper gives three Live Research Manager principles:
\[
P_1=\text{silent, framework-independent integration},
\]
\[
P_2=\text{faithful epistemic provenance},
\]
\[
P_3=\text{comprehensive trajectory capture}.
\]
\subsection{Context Harvester}
\[
\mathrm{CH}:
\text{session record}
\rightarrow
\text{candidate research events}.
\]
Inputs include:
\[
\{\text{conversation},
\text{tool outputs},
\text{experiment results},
\text{code diffs},
\text{researcher confirmations}\}.
\]
\subsection{Event Router}
\[
\mathrm{ER}:
\text{candidate event}
\rightarrow
(\text{event type},\text{provenance},\text{ARA layer}).
\]
The paper identifies event types:
\[
\{
\text{decision},
\text{experiment},
\text{dead\_end},
\text{pivot},
\text{claim},
\text{heuristic},
\text{observation}
\}.
\]
Payloads:
\[
\text{decision}
=
\{\text{choice},\text{alternatives},\text{evidence}\},
\]
\[
\text{experiment}
=
\{\text{metrics},\text{claim linkage}\},
\]
\[
\text{dead\_end}
=
\{\text{hypothesis},\text{failure mode},\text{lesson}\},
\]
\[
\text{pivot}
=
\{\text{trigger},\text{rationale}\},
\]
\[
\text{claim}
=
\{\text{statement},\text{falsification criteria}\},
\]
\[
\text{heuristic}
=
\{\text{trick},\text{sensitivity},\text{bounds}\},
\]
\[
\text{observation}
=
\{\text{raw finding},\text{awaiting classification}\}.
\]
Provenance tags include:
\[
\{\text{user},\text{ai-suggested},\text{ai-executed},\text{user-revised}\}.
\]
An AI-suggested event does not auto-upgrade until researcher confirmation.
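A minimal sketch of the Event Router's typing step, assuming only the event
types and provenance tags listed above; the layer-routing rules and the
function itself are extraction-level illustrations, not the paper's
implementation.
\begin{verbatim}
# Sketch: route a candidate event to (type, provenance, ARA layer).
# Routing rules here are illustrative, not the paper's implementation.
EVENT_LAYERS = {
    "decision":    "trace", "experiment": "evidence",
    "dead_end":    "trace", "pivot":      "trace",
    "claim":       "logic", "heuristic":  "logic",
    "observation": "trace/staging",
}
PROVENANCE = {"user", "ai-suggested", "ai-executed", "user-revised"}

def route(event_type: str, provenance: str, payload: dict):
    assert event_type in EVENT_LAYERS and provenance in PROVENANCE
    # AI-suggested events stay staged until a researcher confirms them.
    staged = provenance == "ai-suggested"
    return {"type": event_type, "provenance": provenance,
            "layer": EVENT_LAYERS[event_type], "staged": staged,
            "payload": payload}

print(route("dead_end", "ai-suggested",
            {"hypothesis": "random routing", "failure_mode": "divergence",
             "lesson": "needs learned gates"}))
\end{verbatim}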
\subsection{Maturity Tracker}
\[
\mathrm{MT}:
\text{staged observations}
\rightarrow
\text{formal entries}.
\]
Closure signals include:
\[
\{\text{topic abandonment},
\text{explicit researcher affirmation},
\text{empirical resolution},
\text{artifact-level commitment}\}.
\]
\begin{remark}
The Live Research Manager is not described as replacing researcher judgment.
It captures and crystallizes research traces already produced during human-AI
collaboration.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{ARA Compiler Extraction}
%──────────────────────────────────────────────────────────────────────────────
The ARA Compiler translates legacy sources into ARA format.
\[
\mathrm{Compiler}:
\{
\text{PDF},
\text{repo},
\text{datasets},
\text{rubrics},
\text{trajectory logs}
\}
\mapsto
\mathcal{ARA}.
\]
The paper's compiler principles are:
\[
\text{universal input}
\rightarrow
\text{canonical output},
\]
\[
\text{high-fidelity preservation},
\]
\[
\text{knowledge lineage rather than flat extraction}.
\]
\subsection{Compiler Stages}
\[
\mathrm{Compiler}
=
\mathrm{ExplorationGraphExtraction}
\circ
\mathrm{PhysicalGrounding}
\circ
\mathrm{CognitiveMapping}
\circ
\mathrm{SemanticDeconstruction}.
\]
\subsubsection{Semantic Deconstruction}
\[
\text{narrative prose}
\rightarrow
\text{fact-dense research content}.
\]
Extracts:
\[
\{\text{formulations},\text{configs},\text{results},
\text{dependencies},\text{failed approaches}\}.
\]
\subsubsection{Cognitive Mapping}
\[
\text{research content}
\rightarrow
\texttt{/logic}.
\]
Creates:
\[
\{\text{motivation chain},
\text{claims},
\text{proof pointers},
\text{formal concepts},
\text{solution structure}\}.
\]
\subsubsection{Physical Grounding}
\[
\text{logic}
+
\text{repo/code}
\rightarrow
\texttt{/src}
+
\texttt{configs}
+
\texttt{environment.md}.
\]
Includes code-paper reconciliation and extraction of tacit knowledge.
\subsubsection{Exploration Graph Extraction}
\[
\text{trajectory evidence}
\rightarrow
\texttt{/trace/exploration\_tree.yaml}.
\]
Adds:
\[
\{\text{dead-end leaf nodes},
\text{failure modes},
\text{lessons},
\text{pivots}\}.
\]
\subsection{Compiler Validation Loop}
\[
\text{generate}
\rightarrow
\text{validate}
\rightarrow
\text{fix}
\rightarrow
\text{repeat}.
\]
The paper reports that the generate-validate-fix loop usually converges in:
\[
1\text{--}3
\]
passes for the evaluated artifacts, while the compiler uses only ARA Seal
Level 1 as an in-loop validation signal.
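A sketch of that loop, keeping ARA Seal Level 1 as the only in-loop signal as
the paper describes; \texttt{generate}, \texttt{validate\_level\_1}, and
\texttt{fix} are hypothetical stubs standing in for the compiler stages.
\begin{verbatim}
# Sketch of the compiler's generate -> validate -> fix loop.
# generate(), validate_level_1(), and fix() are hypothetical stubs.
def generate(sources):       # stub: build initial artifact from inputs
    return {"sources": sources, "issues": ["missing claims.md"]}

def validate_level_1(a):     # stub: Level 1 structural issues only
    return a["issues"]

def fix(a, issues):          # stub: resolve the reported issues
    return {**a, "issues": []}

def compile_ara(sources, max_passes=3):
    artifact = generate(sources)
    for _ in range(max_passes):  # paper reports convergence in 1-3 passes
        issues = validate_level_1(artifact)
        if not issues:
            break
        artifact = fix(artifact, issues)
    return artifact

print(compile_ara(["paper.pdf", "repo/"]))
\end{verbatim}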
\subsection{Source-Aware Enrichment}
Auxiliary sources route into the layers they most directly populate:
\[
\text{code repositories}
\rightarrow
\texttt{/src},
\]
\[
\text{expert rubrics}
\rightarrow
\texttt{/logic},
\]
\[
\text{trajectory logs}
\rightarrow
\texttt{/trace}.
\]
When a library of prior ARAs exists, the Compiler may perform collective
inference, adding inferred patterns as:
\[
\texttt{collective\_inference}
\]
rather than treating them as source-stated facts.
%──────────────────────────────────────────────────────────────────────────────
\section{ARA Seal and Review Extraction}
%──────────────────────────────────────────────────────────────────────────────
The paper defines the ARA Seal as a machine-verifiable research credential.
\[
\mathrm{Seal}(A)
=
\{L_1,L_2,L_3\}.
\]
\subsection{Level 1: Structural Integrity}
\[
L_1(A)
=
\text{schema conformance}
+
\text{directory ontology}
+
\text{cross-layer reference resolution}
+
\text{required field completeness}.
\]
Examples include:
\[
\texttt{claims.md}
\rightarrow
\{\text{Statement},\text{Status},\text{Falsification},\text{Proof}\},
\]
and:
\[
\texttt{heuristics}
\rightarrow
\{\text{Rationale},\text{Sensitivity},\text{Bounds}\}.
\]
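A minimal sketch of a Level 1 required-field completeness check over the
field sets above; the parsing is deliberately naive, and the entry text is
hypothetical.
\begin{verbatim}
# Sketch: Level 1 required-field completeness for claims and heuristics.
# Parsing is naive (substring checks); real schema validation would differ.
REQUIRED = {
    "claims":     {"Statement", "Status", "Falsification", "Proof"},
    "heuristics": {"Rationale", "Sensitivity", "Bounds"},
}

def missing_fields(kind: str, entry_text: str):
    return sorted(f for f in REQUIRED[kind] if f + ":" not in entry_text)

entry = "Statement: ...\nStatus: verified\nProof: experiments.md#X3"
print(missing_fields("claims", entry))   # -> ['Falsification']
\end{verbatim}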
\subsection{Level 2: Argumentative Rigor}
\[
L_2(A)
=
\text{rubric-grounded rigor audit}.
\]
The paper identifies dimensions including:
\[
\{\text{evidence relevance},
\text{falsifiability quality},
\text{methodological rigor},
\text{scope calibration},
\text{argument coherence},
\text{exploration integrity}\}.
\]
Findings are collected with:
\[
\{\text{severity},\text{verbatim evidence spans},\text{actionable suggestions}\}.
\]
Severity classes include:
\[
\{\text{critical},\text{major},\text{minor},\text{suggestion}\}.
\]
\subsection{Level 3: Execution Reproducibility}
\[
L_3(A)
=
\text{budget-aware directional reproduction of central claims}.
\]
The verifying agent is isolated from the evidence layer:
\[
\text{code kernel + algorithm description}
\quad
\text{without}
\quad
\text{reported numbers}.
\]
This prevents reproduction agents from copying expected values instead of
testing the claim directionally.
\begin{proposition}[Seal Separation Principle]
The ARA Seal separates mechanical verification and empirical checking from
human judgment. Machines check structure, rigor, and reproduction signals;
humans judge significance, novelty, ethics, problem formulation, and taste.
\end{proposition}
%──────────────────────────────────────────────────────────────────────────────
\section{ARA-Native Review Pipeline}
%──────────────────────────────────────────────────────────────────────────────
The review pipeline is:
\[
\text{Stage 1: Conceptual Verification}
\rightarrow
\text{Stage 2: Empirical Verification}
\rightarrow
\text{Stage 3: Human Review}.
\]
\subsection{Stage 1: Conceptual Verification}
Checks:
\[
L_1 + L_2.
\]
Outputs:
\[
\text{CI report}
+
\text{rigor report}.
\]
\subsection{Stage 2: Empirical Verification}
Checks:
\[
L_3.
\]
Outputs:
\[
\text{empirical review report}
=
\{\text{verified claims},
\text{failed claims},
\text{deferred claims},
\text{budget notes},
\text{experimental gaps}\}.
\]
\subsection{Stage 3: Human Review}
Humans evaluate:
\[
\{\text{significance},
\text{novelty},
\text{taste},
\text{problem formulation},
\text{ethical implications},
\text{contested machine findings}\}.
\]
\begin{remark}
The paper does not remove human review. It attempts to reserve human review for
judgment rather than mechanical checking.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{The \((Human+AI)^2\) Research Network}
%──────────────────────────────────────────────────────────────────────────────
The paper frames the full stack as:
\[
(Human+AI)^2.
\]
Producer side:
\[
\text{researcher}
+
\text{research agent}
\rightarrow
\text{ARA}.
\]
Consumer side:
\[
\text{reader}
+
\text{agent}
\rightarrow
\text{rendered surface}.
\]
The persistent scientific state is:
\[
\mathcal{ARA}_{canonical}.
\]
Possible rendered surfaces include:
\[
\{\text{paper},\text{video},\text{slides},\text{interactive demo},
\text{grounded dialogue}\}.
\]
Artifact operations include:
\[
\{\text{/submit},\text{/retrieve},\text{/fork}\}.
\]
\begin{definition}[\((Human+AI)^2\) Research Network]
The \((Human+AI)^2\) research network is the paper's proposed communication
structure in which humans on both production and consumption sides work through
agents, while ARAs become canonical structured artifacts that can be certified,
queried, forked, extended, rendered, and re-reviewed.
\end{definition}
\begin{remark}
This extraction preserves the network framing but does not claim the network
already exists at scale or that it is universally superior across all research
fields.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{Evaluation Surface Extraction}
%──────────────────────────────────────────────────────────────────────────────
The paper evaluates ARA across three layers:
\[
\mathcal{E}_{eval}
=
\{
\text{understanding},
\text{reproduction},
\text{extension}
\}.
\]
\subsection{Dataset Layer}
The paper uses:
\[
\text{PaperBench}
\]
for configuration-depth and reproduction tasks, and:
\[
\text{RE-Bench}
\]
for trajectory-depth and extension tasks.
PaperBench characteristics include:
\[
23
\quad
\text{peer-reviewed ML papers},
\]
\[
8,921
\quad
\text{expert-authored rubric requirements}.
\]
RE-Bench characteristics include:
\[
7
\quad
\text{R\&D hill-climbing tasks},
\]
\[
24,008
\quad
\text{agent runs},
\]
\[
46,303
\quad
\text{failure episodes}.
\]
\subsection{Understanding}
Question-answering over ARA vs conventional sources:
\[
n=450
\]
paired outcomes.
\[
\text{ARA accuracy}
=
93.7\%,
\]
\[
\text{baseline accuracy}
=
72.4\%.
\]
Category-level reported results include:
\[
\text{Category A fidelity: }95.6\% \text{ ARA vs. }80.8\% \text{ baseline},
\]
\[
\text{Category B detail: }92.6\% \text{ ARA vs. }67.8\% \text{ baseline},
\]
\[
\text{Category C failure knowledge: }81.4\% \text{ ARA vs. }15.7\% \text{ baseline}.
\]
\subsection{Reproduction}
Reproduction from ARA vs PDF + GitHub:
\[
\text{ARA success}
=
64.4\%,
\]
\[
\text{baseline success}
=
57.4\%.
\]
The paper reports:
\[
150
\quad
\text{subtasks},
\]
\[
1,743
\quad
\text{rubric requirements},
\]
and a win/tie/loss breakdown:
\[
8/5/2
\]
across papers.
Difficulty-stratified reproduction advantage is reported as:
\[
+4.9\%
\quad
\text{easy},
\]
\[
+5.6\%
\quad
\text{medium},
\]
\[
+8.5\%
\quad
\text{hard}.
\]
\subsection{Extension}
The extension layer evaluates whether preserved failure traces help agents
build beyond prior results.
The paper uses five RE-Bench tasks:
\[
\{
\texttt{triton\_cumsum},
\texttt{restricted\_mlm},
\texttt{fix\_embedding},
\texttt{nanogpt\_chat\_rl},
\texttt{rust\_codecontests}
\}.
\]
The reported qualitative pattern:
\[
\text{ARA accelerates early useful moves across all five tasks},
\]
but:
\[
\text{ARA does not always finish ahead}.
\]
The limitation is:
\[
\text{failure trace}
\not\Rightarrow
\text{always beneficial}.
\]
\begin{remark}
The evaluation results are preserved here as reported source-paper results,
not independently verified claims.
\end{remark}
%──────────────────────────────────────────────────────────────────────────────
\section{Capability-Relative Sufficiency}
%──────────────────────────────────────────────────────────────────────────────
The paper defines ARA sufficiency relative to agent capability.
\[
\mathrm{Sufficient}(A,\alpha)
=
\text{agent } \alpha \text{ can reproduce the core claim from artifact } A.
\]
Thus:
\[
\mathrm{Sufficiency}
\neq
\text{absolute property}.
\]
It is:
\[
\mathrm{Sufficiency}
=
f(\text{artifact completeness},\text{agent capability}).
\]
\begin{proposition}[Capability-Relative Artifact Principle]
An ARA can be complete relative to future agents even if current agents cannot
fully exploit it.
\end{proposition}
%──────────────────────────────────────────────────────────────────────────────
\section{Failure-Trace Constraint Extraction}
%──────────────────────────────────────────────────────────────────────────────
The paper's extension results imply a dual effect:
\[
\text{preserved failure traces}
\rightarrow
\text{accelerated progress}
\]
but also:
\[
\text{preserved failure traces}
\rightarrow
\text{possible exploration constraint}.
\]
This produces the extracted caution:
\[
\boxed{
\text{Failure memory is useful only when it prevents redundant dead ends}
\atop
\text{without trapping stronger agents inside prior exploration boundaries.}
}
\]
\begin{definition}[Failure-Trace Constraint]
A failure-trace constraint occurs when preserved dead ends bias an agent away
from potentially valid paths that were unavailable to weaker prior explorers.
\end{definition}
This is not a new claim beyond the paper; it is an extraction of the paper's
reported observation that preserved traces can help or constrain depending on
agent capability.
%──────────────────────────────────────────────────────────────────────────────
\section{Extracted Algorithms}
%──────────────────────────────────────────────────────────────────────────────
The following algorithm names are extraction labels. They compress the source
paper's mechanisms but should not be attributed as named algorithms in the
paper unless independently verified.
\subsection{Storytelling-Tax Recovery Algorithm}
\[
\mathrm{STRA}:
\mathcal{P}_{paper}
\rightarrow
\mathcal{G}_{trace}.
\]
Procedure:
\begin{enumerate}
\item remove narrative smoothing,
\item identify questions,
\item identify decisions,
\item recover experiments,
\item recover dead ends,
\item recover pivots,
\item attach lessons,
\item build exploration DAG.
\end{enumerate}
\subsection{Engineering-Tax Closure Algorithm}
\[
\mathrm{ETCA}:
(\text{paper},\text{repo},\text{configs},\text{env},\text{logs})
\rightarrow
\text{agent-sufficient specification}.
\]
Required closure fields:
\[
\{\text{hyperparameters},
\text{environment},
\text{hardware},
\text{seeds},
\text{data paths},
\text{baseline configs},
\text{code entrypoints},
\text{expected outputs},
\text{instrumentation}\}.
\]
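A sketch of the closure test, assuming the field list above: a specification
is closure-complete only when every required field is populated. The example
specification is hypothetical.
\begin{verbatim}
# Sketch: Engineering-Tax closure check over the required fields above.
# The example specification is hypothetical.
CLOSURE_FIELDS = ["hyperparameters", "environment", "hardware", "seeds",
                  "data_paths", "baseline_configs", "code_entrypoints",
                  "expected_outputs", "instrumentation"]

def closure_gaps(spec: dict):
    return [f for f in CLOSURE_FIELDS if not spec.get(f)]

spec = {"hyperparameters": {"lr": 3e-4}, "environment": "python==3.11",
        "seeds": [0, 1, 2]}
print("remaining Engineering Tax:", closure_gaps(spec))
\end{verbatim}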
\subsection{Forensic Binding Algorithm}
\[
\mathrm{FBA}:
C_i
\mapsto
(X_j,K_m,R_n,T_p).
\]
where:
\[
T_p=\text{trace node / provenance}.
\]
A valid binding satisfies:
\[
C_i
\leftrightarrow
X_j
\leftrightarrow
K_m
\leftrightarrow
R_n
\leftrightarrow
T_p.
\]
\subsection{Live Crystallization Algorithm}
\[
\mathrm{LCA}:
\text{session stream}
\rightarrow
\text{staging}
\rightarrow
\text{typed events}
\rightarrow
\text{mature ARA entries}.
\]
Promotion condition:
\[
\text{closure signal}
+
\text{evidence}
+
\text{provenance}
\Rightarrow
\text{crystallize}.
\]
\subsection{Seal-Gated Review Algorithm}
\[
\mathrm{SGRA}:
A
\rightarrow
L_1(A)
\rightarrow
L_2(A)
\rightarrow
L_3(A)
\rightarrow
\text{human judgment}.
\]
Reject the submission if:
\[
L_1(A)=0.
\]
Downgrade if:
\[
L_2 \text{ identifies critical under-support}
\]
or:
\[
L_3 \text{ fails central claim reproduction}.
\]
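A sketch wiring the reject and downgrade rules above into one seal-gated
flow; the level checks are hypothetical stubs returning illustrative values.
\begin{verbatim}
# Sketch of the seal-gated review flow (SGRA). Level checks are stubs.
def sgra(artifact):
    if not level_1(artifact):                 # structural integrity
        return "reject"
    verdict = "forward_to_human_review"
    if level_2_has_critical(artifact):        # argumentative rigor
        verdict = "downgrade"
    if not level_3(artifact):                 # central-claim reproduction
        verdict = "downgrade"
    return verdict

# Hypothetical stubs standing in for the three seal levels.
def level_1(a):              return a.get("structure_ok", False)
def level_2_has_critical(a): return a.get("critical_findings", 0) > 0
def level_3(a):              return a.get("reproduced", False)

print(sgra({"structure_ok": True, "critical_findings": 0,
            "reproduced": True}))             # -> forward_to_human_review
\end{verbatim}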
\subsection{Human-AI Research Network Operator}
\[
\mathrm{H}A^2:
(\text{human producer}+\text{agent})
\rightarrow
\mathcal{ARA}
\rightarrow
(\text{human consumer}+\text{agent}).
\]
The canonical artifact remains:
\[
\mathcal{ARA}_{canonical},
\]
while renderings are task-specific:
\[
\mathrm{Render}(\mathcal{ARA},u)
\in
\{\text{paper},\text{slides},\text{video},\text{demo},\text{dialogue}\}.
\]
%──────────────────────────────────────────────────────────────────────────────
\section{ARA File-System Grammar}
%──────────────────────────────────────────────────────────────────────────────
A minimal extracted ARA grammar is:
\begin{verbatim}
my-research-ara/
  PAPER.md
  logic/
    problem.md
    claims.md
    concepts.md
    experiments.md
    solution/
      architecture.md
      algorithm.md
      heuristics.md
      constraints.md
    related_work.md
  src/
    kernel/
    repo/
    configs/
    environment.md
    index.md
  trace/
    exploration_tree.yaml
    sessions/
    dead_ends.yaml
    pivots.yaml
    staging/
  evidence/
    results/
    logs/
    tables/
    metrics.json
  seal/
    level_1_structural_integrity.json
    level_2_argumentative_rigor.json
    level_3_execution_reproducibility.json
    seal_certificate.json
\end{verbatim}
\begin{remark}
This grammar is an extraction-level representation of the paper's ARA
architecture, not a complete reproduction of the paper's appendix schema.
\end{remark}
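For concreteness, a sketch that materializes the grammar above as an empty
directory skeleton; the paths follow this extraction-level grammar, not
necessarily the paper's appendix schema.
\begin{verbatim}
# Sketch: materialize the extracted ARA grammar as an empty skeleton.
from pathlib import Path

DIRS = ["logic/solution", "src/kernel", "src/repo", "src/configs",
        "trace/sessions", "trace/staging", "evidence/results",
        "evidence/logs", "evidence/tables", "seal"]
FILES = ["PAPER.md", "logic/problem.md", "logic/claims.md",
         "logic/concepts.md", "logic/experiments.md",
         "logic/related_work.md", "src/environment.md", "src/index.md",
         "trace/exploration_tree.yaml", "trace/dead_ends.yaml",
         "trace/pivots.yaml", "evidence/metrics.json"]

def scaffold(root="my-research-ara"):
    base = Path(root)
    for d in DIRS:
        (base / d).mkdir(parents=True, exist_ok=True)
    for f in FILES:
        (base / f).touch()

scaffold()
\end{verbatim}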
%──────────────────────────────────────────────────────────────────────────────
\section{ARA-X Scoring Surface}
%──────────────────────────────────────────────────────────────────────────────
ARA-X v1.0 defines an extraction-governance vector:
\[
\mathcal{O}^{ARA-X}
=
\{S,F,A,T,E,K,L,P,G,V,C,R,H,N,D\}.
\]
where:
\[
S=\text{source boundary},
\quad
F=\text{fidelity preservation},
\quad
A=\text{author attribution},
\]
\[
T=\text{Storytelling Tax extraction},
\quad
E=\text{Engineering Tax extraction},
\quad
K=\text{Knowledge over Narrative extraction},
\]
\[
L=\text{four-layer ARA architecture},
\quad
P=\text{provenance / forensic binding},
\quad
G=\text{exploration graph},
\]
\[
V=\text{verification / seal layer},
\quad
C=\text{compiler extraction},
\quad
R=\text{reported results preserved as reported},
\]
\[
H=\text{Human+AI network extraction},
\quad
N=\text{non-claim locks},
\quad
D=\text{downgrade / limitation surface}.
\]
\[
\mathrm{ARAXScore}_{v1.0}
=
\frac{
S+F+A+T+E+K+L+P+G+V+C+R+H+N+D
}{15}.
\]
\begin{remark}
ARAXScore measures extraction completeness. It does not measure whether ARA is
empirically correct beyond the source paper.
\end{remark}
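Operationally, the score is just the mean of the fifteen components; a
one-line sketch, assuming each component is scored in $[0,1]$ with invented
example values:
\begin{verbatim}
# Sketch: ARAXScore as the mean of the 15 governance components in [0, 1].
components = dict(S=1, F=1, A=1, T=1, E=1, K=1, L=1, P=1,
                  G=1, V=1, C=1, R=1, H=1, N=1, D=0.5)  # example values
arax_score = sum(components.values()) / len(components)
print(round(arax_score, 3))   # -> 0.967
\end{verbatim}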
%──────────────────────────────────────────────────────────────────────────────
\section{Validation Layer}
%──────────────────────────────────────────────────────────────────────────────
A valid ARA-X extraction must identify:
\begin{enumerate}
\item source paper,
\item source authors,
\item source scope,
\item core thesis,
\item Storytelling Tax,
\item Engineering Tax,
\item Knowledge over Narrative principle,
\item ARA four-layer structure,
\item forensic bindings,
\item Live Research Manager,
\item Live Research Manager event types and provenance tags,
\item ARA Compiler,
\item Compiler design principles,
\item Compiler stages,
\item ARA Seal,
\item three-stage review pipeline,
\item \((Human+AI)^2\) research network,
\item evaluation categories,
\item reported results,
\item limitations,
\item non-claims,
\item and extraction vs. speculation boundary.
\end{enumerate}
%──────────────────────────────────────────────────────────────────────────────
\section{Falsification Surface}
%──────────────────────────────────────────────────────────────────────────────
This extraction is weakened or rejected if it:
\begin{itemize}
\item omits visible source authors,
\item claims ARA is a Codex invention,
\item merges ARA into a broader theory without labeling the merge,
\item omits the Storytelling Tax,
\item omits the Engineering Tax,
\item omits the Knowledge over Narrative principle,
\item omits any of the four ARA layers,
\item omits the Live Research Manager,
\item omits the ARA Compiler,
\item omits the ARA Seal / review mechanism,
\item treats failure traces as always beneficial,
\item treats reported benchmark gains as independently verified,
\item claims machine review replaces human judgment,
\item treats ARA as universally valid across all disciplines,
\item or fails to distinguish source claim from extraction label.
\end{itemize}
Compact falsification condition:
\[
\text{ARA-X valid extraction}
\wedge
\left(
S=0
\vee
F=0
\vee
A=0
\vee
L=0
\vee
N=0
\vee
D=0
\right)
\Rightarrow
\text{invalid extraction}.
\]
%──────────────────────────────────────────────────────────────────────────────
\section{Minimal ARA-X Evidence JSON Skeleton}
%──────────────────────────────────────────────────────────────────────────────
\begin{verbatim}
{
  "record_id": "ARA-X-0001",
  "version": "ARA-X-v1.0",
  "source_paper": {
    "title": "The Last Human-Written Paper: Agent-Native Research Artifacts",
    "authors": [
      "Jiachen Liu",
      "Jiaxin Pei",
      "Jintao Huang",
      "Chenglei Si",
      "Ao Qu",
      "Xiangru Tang",
      "Runyu Lu",
      "Lichang Chen",
      "Xiaoyan Bai",
      "Haizhong Zheng",
      "Carl Chen",
      "Zhiyang Chen",
      "Haojie Ye",
      "Yujuan Fu",
      "Zexue He",
      "Zijian Jin",
      "Zhenyu Zhang",
      "Shangquan Sun",
      "Maestro Harmon",
      "John Dianzhuo Wang",
      "Jianqiao Zeng",
      "Jiachen Sun",
      "Mingyuan Wu",
      "Baoyu Zhou",
      "Yuchen You",
      "Shijian Lu",
      "Yiming Qiu",
      "Fan Lai",
      "Yuan Yuan",
      "Yao Li",
      "Junyuan Hong",
      "Ruihao Zhu",
      "Beidi Chen",
      "Alex Pentland",
      "Ang Chen",
      "Mosharaf Chowdhury",
      "Zechen Zhang"
    ],
    "date": "April 2026",
    "arxiv": "2604.24658v1",
    "source_type": "uploaded_pdf"
  },
  "core_thesis": "Scientific publication should shift from lossy narrative paper to agent-native research artifact.",
  "taxes": {
    "storytelling_tax": {
      "definition": "Loss of branching research process knowledge during narrative compilation.",
      "discarded_information": [
        "failed_experiments",
        "rejected_hypotheses",
        "dead_ends",
        "pivots",
        "design_alternatives",
        "human_judgment_signals"
      ]
    },
    "engineering_tax": {
      "definition": "Gap between reviewer-sufficient prose and agent-sufficient execution specification.",
      "missing_specification_types": [
        "hyperparameters",
        "environment",
        "hardware",
        "seeds",
        "implementation_details",
        "baseline_details",
        "implicit_assumptions",
        "configuration_choices",
        "instrumentation"
      ]
    }
  },
  "ara_layers": {
    "logic": [
      "problem.md",
      "solution/",
      "claims.md",
      "experiments.md",
      "related_work.md"
    ],
    "src": [
      "kernel/",
      "repo/",
      "configs/",
      "environment.md",
      "index.md"
    ],
    "trace": [
      "exploration_tree.yaml",
      "sessions/",
      "dead_ends.yaml",
      "pivots.yaml",
      "staging/"
    ],
    "evidence": [
      "results/",
      "logs/",
      "tables/",
      "metrics.json"
    ]
  },
  "mechanisms": {
    "live_research_manager": {
      "context_harvester": true,
      "event_router": true,
      "maturity_tracker": true,
      "event_types": [
        "decision",
        "experiment",
        "dead_end",
        "pivot",
        "claim",
        "heuristic",
        "observation"
      ],
      "provenance_tags": [
        "user",
        "ai_suggested",
        "ai_executed",
        "user_revised"
      ]
    },
    "ara_compiler": {
      "semantic_deconstruction": true,
      "cognitive_mapping": true,
      "physical_grounding": true,
      "exploration_graph_extraction": true,
      "level_1_validation_loop": true
    },
    "ara_seal": {
      "level_1_structural_integrity": true,
      "level_2_argumentative_rigor": true,
      "level_3_execution_reproducibility": true
    },
    "human_ai_squared_network": {
      "submit": true,
      "retrieve": true,
      "fork": true,
      "render_surfaces": [
        "paper",
        "slides",
        "video",
        "interactive_demo",
        "grounded_dialogue"
      ]
    }
  },
  "reported_results": {
    "paperbench_requirements": 8921,
    "paperbench_pdf_fully_specified": 0.454,
    "code_development_sufficiency": 0.373,
    "missing_hyperparameter_gap_share": 0.262,
    "rebench_agent_runs": 24008,
    "rebench_failure_episodes": 46303,
    "failed_run_dollar_cost_share": 0.902,
    "failed_run_token_share": 0.592,
    "median_failed_to_success_token_ratio": 113,
    "understanding_accuracy_ara": 0.937,
    "understanding_accuracy_baseline": 0.724,
    "reproduction_success_ara": 0.644,
    "reproduction_success_baseline": 0.574
  },
  "limitations": [
    "failure_traces_can_constrain_capable_agents",
    "results_are_paper_reported_not_independently_verified",
    "primary_scope_is_cs_ml_research_artifacts",
    "machine_review_does_not_replace_human_judgment",
    "ara_seal_does_not_prove_significance_or_novelty"
  ],
  "non_claim_locks": [
    "not_combined_codex_theory",
    "not_universal_research_law",
    "not_machine_review_replaces_human_judgment",
    "not_benchmark_success_universal_validation",
    "not_codex_authorship_claim",
    "not_ara_always_outperforms_papers"
  ],
  "classification": "source_bounded_extraction"
}
\end{verbatim}
%──────────────────────────────────────────────────────────────────────────────
\section{Appendix A — Minimal ARA-X Extraction Checklist}
%──────────────────────────────────────────────────────────────────────────────
\begin{enumerate}
\item What is the source paper?
\item Who are the source authors?
\item What is the core thesis?
\item What is the Storytelling Tax?
\item What is the Engineering Tax?
\item What is the Knowledge over Narrative principle?
\item What are the four ARA layers?
\item What are the cross-layer forensic bindings?
\item What does the Live Research Manager capture?
\item What event types are preserved?
\item What provenance tags are used?
\item What does the ARA Compiler transform?
\item What are the Compiler stages?
\item What does the ARA Seal verify?
\item What does Level 1 check?
\item What does Level 2 check?
\item What does Level 3 check?
\item What is left to human reviewers?
\item What is the \((Human+AI)^2\) network?
\item What benchmark categories are reported?
\item What numbers are reported?
\item What limitations are acknowledged?
\item What claims must not be made?
\end{enumerate}
%──────────────────────────────────────────────────────────────────────────────
\section{Appendix B — Minimal AI Collaboration Pseudocode}
%──────────────────────────────────────────────────────────────────────────────
\begin{verbatim}
Input: uploaded ARA paper PDF

Preserve:
  source title
  source authors
  source affiliations where needed
  source date
  source scope
  non-claim boundaries

Extract:
  core thesis
  Storytelling Tax
  Engineering Tax
  Knowledge over Narrative principle
  four ARA layers
  forensic bindings
  Live Research Manager
  event types
  provenance tags
  ARA Compiler
  Compiler stages
  ARA Seal
  review pipeline
  Human+AI squared network
  evaluation setup
  reported results
  limitations

For each extracted claim:
  classify as:
    source claim
    extraction compression
    interpretation
    speculative extension

Reject:
  claims that Codex invented ARA
  claims that ARA is universal law
  claims that machine review replaces human judgment
  claims that benchmark success proves general validity
  claims that failure traces always help
  claims that ARA Seal proves significance or novelty

Build:
  source-bounded extraction document
  file-system grammar
  operator summary
  evidence JSON
  validation checklist
  falsification surface
\end{verbatim}
%──────────────────────────────────────────────────────────────────────────────
\section{Appendix C — Canonical Formula Summary}
%──────────────────────────────────────────────────────────────────────────────
\[
\mathcal{P}_{paper}
=
\mathrm{Compile}_{narrative}(\mathcal{R}_{branching})
\]
\[
\mathcal{L}_{story}
=
\mathcal{R}_{branching}
-
\mathcal{P}_{paper}
\]
\[
\mathcal{L}_{eng}
=
\mathcal{S}_{agent}
-
\mathcal{S}_{paper}
\]
\[
\mathcal{ARA}
=
\text{logic}
+
\text{code}
+
\text{trace}
+
\text{evidence}
\]
\[
\mathcal{K}_{agent}
=
\{
\mathcal{L}_{logic},
\mathcal{L}_{physical},
\mathcal{G}_{trace},
\mathcal{E}_{evidence}
\}
\]
\[
\mathcal{G}_{trace}
=
(V,E,\tau,\pi,\ell)
\]
\[
\tau
\in
\{
\text{question},
\text{decision},
\text{experiment},
\text{dead\_end},
\text{pivot}
\}
\]
\[
C_i
\leftrightarrow
X_j
\leftrightarrow
K_m
\leftrightarrow
R_n
\leftrightarrow
T_p
\]
\[
\mathrm{LRM}
=
\mathrm{MaturityTracker}
\circ
\mathrm{EventRouter}
\circ
\mathrm{ContextHarvester}
\]
\[
\mathrm{Compiler}
=
\mathrm{ExplorationGraphExtraction}
\circ
\mathrm{PhysicalGrounding}
\circ
\mathrm{CognitiveMapping}
\circ
\mathrm{SemanticDeconstruction}
\]
\[
\mathrm{Seal}(A)
=
\{L_1,L_2,L_3\}
\]
\[
L_1=\text{structural integrity}
\]
\[
L_2=\text{argumentative rigor}
\]
\[
L_3=\text{execution reproducibility}
\]
\[
\mathrm{Sufficient}(A,\alpha)
=
\text{agent } \alpha \text{ can reproduce the core claim from artifact } A
\]
\[
\mathrm{H}A^2:
(\text{human producer}+\text{agent})
\rightarrow
\mathcal{ARA}
\rightarrow
(\text{human consumer}+\text{agent})
\]
\[
\mathrm{ARAXScore}_{v1.0}
=
\frac{
S+F+A+T+E+K+L+P+G+V+C+R+H+N+D
}{15}
\]
%──────────────────────────────────────────────────────────────────────────────
\section{Concluding Compression}
%──────────────────────────────────────────────────────────────────────────────
ARA-X v1.0 names the source-bounded extraction of the uploaded ARA paper:
\[
\boxed{
\text{the paper argues that research should become an agent-operable}
\atop
\text{artifact, not merely a human-readable narrative.}
}
\]
The tax statement is:
\[
\boxed{
\text{narrative publication loses branching process knowledge and}
\atop
\text{agent-sufficient engineering specification.}
}
\]
The artifact statement is:
\[
\boxed{
\text{ARA restores research as four linked layers: logic, executable code,}
\atop
\text{exploration graph, and grounded evidence.}
}
\]
The manager statement is:
\[
\boxed{
\text{the Live Research Manager crystallizes already-digital researcher-agent}
\atop
\text{sessions into typed events, staged observations, and mature ARA entries.}
}
\]
The compiler statement is:
\[
\boxed{
\text{the ARA Compiler translates legacy PDFs, repos, rubrics, datasets,}
\atop
\text{and trajectory logs into canonical ARA format through staged extraction.}
}
\]
The review statement is:
\[
\boxed{
\text{ARA-native review uses machines for structure, rigor, and reproduction}
\atop
\text{checks while preserving human judgment for significance and novelty.}
}
\]
The network statement is:
\[
\boxed{
\text{in the paper's }(Human+AI)^2\text{ network, ARAs become canonical}
\atop
\text{research objects that agents can submit, retrieve, fork, verify, and render.}
}
\]
The limitation statement is:
\[
\boxed{
\text{preserved failure traces can accelerate research, but may also constrain}
\atop
\text{agents when prior exploration becomes a boundary rather than a guide.}
}
\]
The source-fidelity statement is:
\[
\boxed{
\text{this document extracts the uploaded PDF only;}
\quad
\text{it does not merge ARA into a new Codex theory.}
}
\]
Thus, ARA-X v1.0 preserves the paper's central transmutation:
research publication shifts from lossy narrative compression toward structured,
executable, trace-preserving, evidence-grounded, agent-native knowledge
artifacts.
\end{document}