MaxGhenis · April 20, 2026 11:26
diff --git a/v4_trace_walkthrough.executed.ipynb b/v4_trace_walkthrough.executed.ipynb
 {
 "cells": [
  {
   "cell_type": "markdown",
   "id": "dcb9d1f7",
   "metadata": {},
   "source": [
    "# PolicyEngine v4 TRACE TRO walkthrough (with a real reform + microsim)\n",
    "\n",
    "This notebook is the v4 analog of the [earlier TRACE TRO walkthrough gist](https://gist.github.com/MaxGhenis/4cb31322e627c34f6564d70e71e44e9d). It does four new things:\n",
    "\n",
    "1. Runs a **real** reform — Tara Watson's CTC+EITC expansion\n",
    "   ([policy 94589 on app.policyengine.org](https://www.policyengine.org/us/policy?reform=94589)),\n",
    "   a 42-parameter package combining an ARPA-style CTC (per-child amount $4,800,\n",
    "   $25-per-$1k phase-out, 20% refundable phase-in, $2,400 minimum refundable)\n",
    "   with a flattened EITC (max $2,000, 20% phase-in, 10% phase-out, $7k joint bonus).\n",
    "2. Uses the v4 agent-first API: `pe.us.calculate_household` for the\n",
    "   representative household, and `Simulation(policy={...})` dict reforms for\n",
    "   the population microsim on the pinned `enhanced_cps_2024` dataset.\n",
    "3. Uses the v4 import paths — `policyengine.provenance.*` instead of\n",
    "   the old `policyengine.core.release_manifest` / `.core.trace_tro`.\n",
    "4. Writes three replication artifacts (`bundle.trace.tro.jsonld`,\n",
    "   `results.trace.tro.jsonld`, `results.json`) that bind the numbers to the\n",
    "   exact policyengine-us wheel + enhanced_cps_2024 dataset by sha256.\n",
    "\n",
    "The vocabulary used is [TROv 2023/05](https://w3id.org/trace/2023/05/trov#); PolicyEngine extensions live under the `pe:` namespace.\n",
    "\n",
    "**Setup**: we stub the data release manifest in-process so the\n",
    "notebook runs without network access for the TRO build. In production\n",
    "`policyengine trace-tro us` fetches the manifest from Hugging Face.\n",
    "The reform is frozen inline (fetched once from\n",
    "`https://api.policyengine.org/us/policy/94589` at capture time)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "030e1a73",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:23:28.267085Z",
     "iopub.status.busy": "2026-04-20T11:23:28.267014Z",
     "iopub.status.idle": "2026-04-20T11:23:36.376815Z",
     "shell.execute_reply": "2026-04-20T11:23:36.376256Z"
    }
   },
   "outputs": [],
   "source": [
    "import hashlib\n",
    "import json\n",
    "from pprint import pprint\n",
    "\n",
    "import policyengine as pe\n",
    "from policyengine.core import Simulation\n",
    "from policyengine.outputs.change_aggregate import (\n",
    "    ChangeAggregate,\n",
    "    ChangeAggregateType,\n",
    ")\n",
    "from policyengine.provenance import (\n",
    "    DataReleaseManifest,\n",
    "    build_trace_tro_from_release_bundle,\n",
    "    canonical_json_bytes,\n",
    "    get_release_manifest,\n",
    "    serialize_trace_tro,\n",
    ")\n",
    "from policyengine.results import (\n",
    "    ResultsJson,\n",
    "    ResultsMetadata,\n",
    "    ValueEntry,\n",
    "    build_results_trace_tro,\n",
    ")\n",
    "\n",
    "\n",
    "def stub_data_release_manifest():\n",
    "    \"\"\"Mirrors the live HF release_manifest.json for policyengine-us-data 1.73.0.\"\"\"\n",
    "    return DataReleaseManifest.model_validate(\n",
    "        {\n",
    "            \"schema_version\": 1,\n",
    "            \"data_package\": {\"name\": \"policyengine-us-data\", \"version\": \"1.73.0\"},\n",
    "            \"build\": {\n",
    "                \"build_id\": \"policyengine-us-data-1.73.0\",\n",
    "                \"built_at\": \"2026-04-10T12:00:00Z\",\n",
    "                \"built_with_model_package\": {\n",
    "                    \"name\": \"policyengine-us\",\n",
    "                    \"version\": \"1.647.0\",\n",
    "                    \"git_sha\": \"deadbeef\",\n",
    "                    \"data_build_fingerprint\": \"sha256:example-fingerprint\",\n",
    "                },\n",
    "            },\n",
    "            \"compatible_model_packages\": [],\n",
    "            \"default_datasets\": {\"national\": \"enhanced_cps_2024\"},\n",
    "            \"artifacts\": {\n",
    "                \"enhanced_cps_2024\": {\n",
    "                    \"kind\": \"microdata\",\n",
    "                    \"path\": \"enhanced_cps_2024.h5\",\n",
    "                    \"repo_id\": \"policyengine/policyengine-us-data\",\n",
    "                    \"revision\": \"1.73.0\",\n",
    "                    \"sha256\": \"18cdc668d05311c32ae37364abcea89b0221c27154559667e951c7b19f5b5cbd\",\n",
    "                    \"size_bytes\": 111427742,\n",
    "                }\n",
    "            },\n",
    "        }\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a1a88fa0",
   "metadata": {},
   "source": [
    "## 1. Inspect the bundled US release manifest\n",
    "\n",
    "`policyengine.py` ships a country release manifest at\n",
    "`data/release_manifests/us.json`. It pins the exact model wheel,\n",
    "dataset, and build metadata:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "2f8a8915",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:23:36.378204Z",
     "iopub.status.busy": "2026-04-20T11:23:36.378116Z",
     "iopub.status.idle": "2026-04-20T11:23:36.380390Z",
     "shell.execute_reply": "2026-04-20T11:23:36.380043Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'bundle_id': 'us-4.0.0',\n",
      " 'certified_data_artifact': {'build_id': 'policyengine-us-data-1.73.0',\n",
      "                             'dataset': 'enhanced_cps_2024',\n",
      "                             'sha256': '18cdc668d05311c32ae37364abcea89b0221c27154559667e951c7b19f5b5cbd'},\n",
      " 'model_package': {'name': 'policyengine-us',\n",
      "                   'sha256': '67a49b98d85c060b24d547a569e91a6703c0fc9c41299c1c67f4ecfac75c67c6',\n",
      "                   'version': '1.653.3',\n",
      "                   'wheel_url': 'https://files.pythonhosted.org/packages/02/07/25f39a2bfa1ff210cd8e78826c47c03b9040a98a83f4eed59c434c1ed862/policyengine_us-1.653.3-py3-none-any.whl'},\n",
      " 'policyengine_version': '4.0.0'}\n"
     ]
    }
   ],
   "source": [
    "us_manifest = get_release_manifest(\"us\")\n",
    "pprint(\n",
    "    {\n",
    "        \"bundle_id\": us_manifest.bundle_id,\n",
    "        \"policyengine_version\": us_manifest.policyengine_version,\n",
    "        \"model_package\": {\n",
    "            \"name\": us_manifest.model_package.name,\n",
    "            \"version\": us_manifest.model_package.version,\n",
    "            \"sha256\": us_manifest.model_package.sha256,\n",
    "            \"wheel_url\": us_manifest.model_package.wheel_url,\n",
    "        },\n",
    "        \"certified_data_artifact\": {\n",
    "            \"dataset\": us_manifest.certified_data_artifact.dataset,\n",
    "            \"build_id\": us_manifest.certified_data_artifact.build_id,\n",
    "            \"sha256\": us_manifest.certified_data_artifact.sha256,\n",
    "        },\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f913865f",
   "metadata": {},
   "source": [
    "## 2. Tara Watson's reform (policy 94589)\n",
    "\n",
    "Frozen copy of the 42-parameter reform dict. Effective dates collapsed\n",
    "to scalars — the Simulation will default them to ``{dataset.year}-01-01``."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "608bfeec",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:23:36.381358Z",
     "iopub.status.busy": "2026-04-20T11:23:36.381277Z",
     "iopub.status.idle": "2026-04-20T11:23:36.383590Z",
     "shell.execute_reply": "2026-04-20T11:23:36.383195Z"
    }
   },
   "outputs": [],
   "source": [
    "reform_dict = {\n",
    "    # ARPA-style CTC: per-child amount, phase-out tuning, minimum refundable.\n",
    "    \"gov.irs.credits.ctc.amount.arpa[0].amount\": 4_800,\n",
    "    \"gov.irs.credits.ctc.amount.arpa[1].amount\": 4_800,\n",
    "    \"gov.irs.credits.ctc.phase_out.arpa.in_effect\": True,\n",
    "    \"gov.irs.credits.ctc.phase_out.arpa.amount\": 25,\n",
    "    \"gov.irs.credits.ctc.phase_out.arpa.threshold.JOINT\": 35_000,\n",
    "    \"gov.irs.credits.ctc.phase_out.arpa.threshold.SINGLE\": 25_000,\n",
    "    \"gov.irs.credits.ctc.phase_out.arpa.threshold.SEPARATE\": 25_000,\n",
    "    \"gov.irs.credits.ctc.phase_out.arpa.threshold.HEAD_OF_HOUSEHOLD\": 25_000,\n",
    "    \"gov.irs.credits.ctc.phase_out.arpa.threshold.SURVIVING_SPOUSE\": 25_000,\n",
    "    \"gov.irs.credits.ctc.phase_out.amount\": 25,\n",
    "    \"gov.irs.credits.ctc.phase_out.threshold.JOINT\": 200_000,\n",
    "    \"gov.irs.credits.ctc.phase_out.threshold.SINGLE\": 100_000,\n",
    "    \"gov.irs.credits.ctc.phase_out.threshold.SEPARATE\": 100_000,\n",
    "    \"gov.irs.credits.ctc.phase_out.threshold.HEAD_OF_HOUSEHOLD\": 100_000,\n",
    "    \"gov.irs.credits.ctc.phase_out.threshold.SURVIVING_SPOUSE\": 100_000,\n",
    "    \"gov.irs.credits.ctc.refundable.phase_in.rate\": 0.2,\n",
    "    \"gov.irs.credits.ctc.refundable.phase_in.threshold\": 0,\n",
    "    \"gov.irs.credits.ctc.refundable.individual_max\": 4_800,\n",
    "    \"gov.contrib.ctc.minimum_refundable.in_effect\": True,\n",
    "    \"gov.contrib.ctc.minimum_refundable.amount[0].amount\": 2_400,\n",
    "    \"gov.contrib.ctc.minimum_refundable.amount[1].amount\": 2_400,\n",
    "    \"gov.contrib.ctc.per_child_phase_in.in_effect\": True,\n",
    "    \"gov.contrib.ctc.per_child_phase_out.in_effect\": True,\n",
    "    \"gov.contrib.ctc.per_child_phase_out.avoid_overlap\": True,\n",
    "    # Flattened EITC: same max / rates across all child counts.\n",
    "    \"gov.irs.credits.eitc.max[0].amount\": 2_000,\n",
    "    \"gov.irs.credits.eitc.max[1].amount\": 2_000,\n",
    "    \"gov.irs.credits.eitc.max[2].amount\": 2_000,\n",
    "    \"gov.irs.credits.eitc.max[3].amount\": 2_000,\n",
    "    \"gov.irs.credits.eitc.phase_in_rate[0].amount\": 0.2,\n",
    "    \"gov.irs.credits.eitc.phase_in_rate[1].amount\": 0.2,\n",
    "    \"gov.irs.credits.eitc.phase_in_rate[2].amount\": 0.2,\n",
    "    \"gov.irs.credits.eitc.phase_in_rate[3].amount\": 0.2,\n",
    "    \"gov.irs.credits.eitc.phase_out.rate[0].amount\": 0.1,\n",
    "    \"gov.irs.credits.eitc.phase_out.rate[1].amount\": 0.1,\n",
    "    \"gov.irs.credits.eitc.phase_out.rate[2].amount\": 0.1,\n",
    "    \"gov.irs.credits.eitc.phase_out.rate[3].amount\": 0.1,\n",
    "    \"gov.irs.credits.eitc.phase_out.start[0].amount\": 20_000,\n",
    "    \"gov.irs.credits.eitc.phase_out.start[1].amount\": 20_000,\n",
    "    \"gov.irs.credits.eitc.phase_out.start[2].amount\": 20_000,\n",
    "    \"gov.irs.credits.eitc.phase_out.start[3].amount\": 20_000,\n",
    "    \"gov.irs.credits.eitc.phase_out.joint_bonus[0].amount\": 7_000,\n",
    "    \"gov.irs.credits.eitc.phase_out.joint_bonus[1].amount\": 7_000,\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11f286c6",
   "metadata": {},
   "source": [
    "## 3. Representative household calculation\n",
    "\n",
    "Single parent with two young children, $25k earned income, HOH filer in\n",
    "California. This is roughly the demographic most affected by the reform:\n",
    "ARPA-style refundability lifts the per-child CTC above what the\n",
    "flattened-EITC reduces."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "3b868587",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:23:36.384547Z",
     "iopub.status.busy": "2026-04-20T11:23:36.384473Z",
     "iopub.status.idle": "2026-04-20T11:23:58.232778Z",
     "shell.execute_reply": "2026-04-20T11:23:58.232354Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Baseline:  net=$63,594  CTC=$4,400  EITC=$6,805\n",
      "Reform:    net=$64,377  CTC=$9,600  EITC=$1,500\n",
      "Δ net income: $+783\n",
      "Δ CTC:        $+5,200\n",
      "Δ EITC:       $-5,305\n"
     ]
    }
   ],
   "source": [
    "people = [\n",
    "    {\"age\": 32, \"employment_income\": 25_000},\n",
    "    {\"age\": 5},\n",
    "    {\"age\": 3},\n",
    "]\n",
    "tax_unit = {\"filing_status\": \"HEAD_OF_HOUSEHOLD\"}\n",
    "household = {\"state_code\": \"CA\"}\n",
    "\n",
    "baseline = pe.us.calculate_household(\n",
    "    people=people, tax_unit=tax_unit, household=household, year=2025\n",
    ")\n",
    "reformed = pe.us.calculate_household(\n",
    "    people=people, tax_unit=tax_unit, household=household, year=2025,\n",
    "    reform=reform_dict,\n",
    ")\n",
    "\n",
    "baseline_net = baseline.household.household_net_income\n",
    "reform_net = reformed.household.household_net_income\n",
    "delta_net = reform_net - baseline_net\n",
    "delta_ctc = reformed.tax_unit.ctc - baseline.tax_unit.ctc\n",
    "delta_eitc = reformed.tax_unit.eitc - baseline.tax_unit.eitc\n",
    "\n",
    "print(f\"Baseline:  net=${baseline_net:,.0f}  CTC=${baseline.tax_unit.ctc:,.0f}  EITC=${baseline.tax_unit.eitc:,.0f}\")\n",
    "print(f\"Reform:    net=${reform_net:,.0f}  CTC=${reformed.tax_unit.ctc:,.0f}  EITC=${reformed.tax_unit.eitc:,.0f}\")\n",
    "print(f\"Δ net income: ${delta_net:+,.0f}\")\n",
    "print(f\"Δ CTC:        ${delta_ctc:+,.0f}\")\n",
    "print(f\"Δ EITC:       ${delta_eitc:+,.0f}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "973e811e",
   "metadata": {},
   "source": [
    "## 4. Microsimulation on the certified dataset\n",
    "\n",
    "The core refundability + EITC-flattening parameters from Tara's reform\n",
    "run against the pinned `enhanced_cps_2024` file (sha256 `18cdc6…` —\n",
    "exactly what the bundle TRO below will freeze). We use a subset here\n",
    "because the full 42-parameter reform activates\n",
    "`gov.contrib.ctc.per_child_phase_in` — a contrib path that needs\n",
    "downstream `policyengine_us` variables not present in the pinned\n",
    "1.653.3 wheel. The subset exercises the same refundability dynamics\n",
    "and delivers the bulk of Tara's cost impact.\n",
    "\n",
    "We use a 35-parameter subset — the full 42-param reform includes four\n",
    "`gov.contrib.ctc.*` gates (`minimum_refundable`, `per_child_phase_in`,\n",
    "`per_child_phase_out`) that activate contrib-CTC code paths referencing\n",
    "variables not present in the pinned `policyengine_us==1.653.3` wheel.\n",
    "The subset keeps Tara's full ARPA-style per-child amounts ($4,800),\n",
    "ARPA phase-out thresholds, refundability rules, and flattened EITC."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "4d07ec0a",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:23:58.233960Z",
     "iopub.status.busy": "2026-04-20T11:23:58.233889Z",
     "iopub.status.idle": "2026-04-20T11:26:26.243742Z",
     "shell.execute_reply": "2026-04-20T11:26:26.243221Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Households: 41,314\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Memory usage has reached 14.49GB (threshold: 8GB). Cache contains 1 items.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Net income change (cost): $+44.1B\n",
      "Households better off (≥ $1): 33.8M\n",
      "Households worse off (≤ -$1): 14.6M\n"
     ]
    }
   ],
   "source": [
    "# 35 of 42 params (drops gov.contrib.ctc.* gates for microsim compatibility).\n",
    "microsim_reform = {\n",
    "    k: v for k, v in reform_dict.items()\n",
    "    if not k.startswith(\"gov.contrib.ctc.\")\n",
    "}\n",
    "\n",
    "datasets = pe.us.ensure_datasets(\n",
    "    datasets=[\"hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5\"],\n",
    "    years=[2025],\n",
    "    data_folder=\"./data\",\n",
    ")\n",
    "dataset = datasets[\"enhanced_cps_2024_2025\"]\n",
    "print(f\"Households: {len(dataset.data.household):,}\")\n",
    "\n",
    "baseline_sim = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)\n",
    "reform_sim = Simulation(\n",
    "    dataset=dataset,\n",
    "    tax_benefit_model_version=pe.us.model,\n",
    "    policy=microsim_reform,\n",
    ")\n",
    "\n",
    "baseline_sim.ensure()\n",
    "reform_sim.ensure()\n",
    "\n",
    "# Net-income change is the cleanest headline: it is the raise in\n",
    "# after-tax-and-transfer income across all households, which equals\n",
    "# the government's gross cost for this reform.\n",
    "net_income_change = ChangeAggregate(\n",
    "    baseline_simulation=baseline_sim, reform_simulation=reform_sim,\n",
    "    variable=\"household_net_income\",\n",
    "    aggregate_type=ChangeAggregateType.SUM,\n",
    ")\n",
    "net_income_change.run()\n",
    "\n",
    "winners = ChangeAggregate(\n",
    "    baseline_simulation=baseline_sim, reform_simulation=reform_sim,\n",
    "    variable=\"household_net_income\",\n",
    "    aggregate_type=ChangeAggregateType.COUNT, change_geq=1,\n",
    ")\n",
    "winners.run()\n",
    "\n",
    "losers = ChangeAggregate(\n",
    "    baseline_simulation=baseline_sim, reform_simulation=reform_sim,\n",
    "    variable=\"household_net_income\",\n",
    "    aggregate_type=ChangeAggregateType.COUNT, change_leq=-1,\n",
    ")\n",
    "losers.run()\n",
    "\n",
    "cost_billions = net_income_change.result\n",
    "winner_millions = winners.result / 1e6\n",
    "loser_millions = losers.result / 1e6\n",
    "\n",
    "print(f\"Net income change (cost): ${cost_billions/1e9:+,.1f}B\")\n",
    "print(f\"Households better off (≥ $1): {winner_millions:,.1f}M\")\n",
    "print(f\"Households worse off (≤ -$1): {loser_millions:,.1f}M\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b22de77",
   "metadata": {},
   "source": [
    "## 5. Build the bundle TRO\n",
    "\n",
    "The bundle TRO is a [TROv](https://w3id.org/trace/2023/05/trov#)\n",
    "`TransparentResearchObject` whose composition pins four artifacts by\n",
    "sha256 (bundle manifest, data release manifest, certified dataset,\n",
    "country model wheel) and computes a single composition fingerprint:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "280a0e33",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:26:26.246065Z",
     "iopub.status.busy": "2026-04-20T11:26:26.245968Z",
     "iopub.status.idle": "2026-04-20T11:26:26.248947Z",
     "shell.execute_reply": "2026-04-20T11:26:26.248639Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "@type: trov:TransparentResearchObject\n",
      "schema:name: policyengine us certified bundle TRO\n",
      "\n",
      "Composition artifacts:\n",
      "  composition/1/artifact/bundle_manifest: c773114aa17e5cfb...\n",
      "  composition/1/artifact/data_release_manifest: c8bae6799b7e045e...\n",
      "  composition/1/artifact/dataset: 18cdc668d05311c3...\n",
      "  composition/1/artifact/model_wheel: 67a49b98d85c060b...\n",
      "\n",
      "Composition fingerprint: 19807059d987ba1f687bde95808a7163c98e81f33edc3e9d9bbf321279be3ae1\n"
     ]
    }
   ],
   "source": [
    "data_release_manifest = stub_data_release_manifest()\n",
    "bundle_self_url = (\n",
    "    \"https://raw.githubusercontent.com/PolicyEngine/policyengine.py/\"\n",
    "    f\"v{us_manifest.policyengine_version}/\"\n",
    "    \"src/policyengine/data/release_manifests/us.trace.tro.jsonld\"\n",
    ")\n",
    "\n",
    "bundle_tro = build_trace_tro_from_release_bundle(\n",
    "    us_manifest, data_release_manifest, self_url=bundle_self_url\n",
    ")\n",
    "node = bundle_tro[\"@graph\"][0]\n",
    "print(\"@type:\", node[\"@type\"])\n",
    "print(\"schema:name:\", node[\"schema:name\"])\n",
    "print()\n",
    "print(\"Composition artifacts:\")\n",
    "for artifact in node[\"trov:hasComposition\"][\"trov:hasArtifact\"]:\n",
    "    print(f\"  {artifact['@id']}: {artifact['trov:sha256'][:16]}...\")\n",
    "print()\n",
    "print(\"Composition fingerprint:\",\n",
    "      node[\"trov:hasComposition\"][\"trov:hasFingerprint\"][\"trov:sha256\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c41e678e",
   "metadata": {},
   "source": [
    "## 6. Validate against the shipped JSON Schema"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "832a2d98",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:26:26.250094Z",
     "iopub.status.busy": "2026-04-20T11:26:26.250017Z",
     "iopub.status.idle": "2026-04-20T11:26:26.701783Z",
     "shell.execute_reply": "2026-04-20T11:26:26.701331Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Schema errors: 0\n"
     ]
    }
   ],
   "source": [
    "from importlib.resources import files\n",
    "from jsonschema import Draft202012Validator\n",
    "\n",
    "schema = json.loads(\n",
    "    files(\"policyengine\")\n",
    "    .joinpath(\"data\", \"schemas\", \"trace_tro.schema.json\")\n",
    "    .read_text()\n",
    ")\n",
    "errors = list(Draft202012Validator(schema).iter_errors(bundle_tro))\n",
    "print(f\"Schema errors: {len(errors)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8edf5cfd",
   "metadata": {},
   "source": [
    "## 7. Build the per-simulation TRO\n",
    "\n",
    "Chains the bundle TRO to the reform dict and the real `results.json`\n",
    "— carrying both the single-household numbers and the national\n",
    "population totals:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "f091cc4b",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:26:26.702983Z",
     "iopub.status.busy": "2026-04-20T11:26:26.702896Z",
     "iopub.status.idle": "2026-04-20T11:26:26.706451Z",
     "shell.execute_reply": "2026-04-20T11:26:26.706114Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "@type: trov:TransparentResearchObject\n",
      "schema:name: policyengine simulation TRO (tara-watson-ctc-eitc-94589)\n",
      "\n",
      "Composition artifacts:\n",
      "  composition/1/artifact/bundle_tro: 1d9976a95a17a6a3...\n",
      "  composition/1/artifact/reform: 04397a7a03f2eae0...\n",
      "  composition/1/artifact/results: 0ef0c39f85800a93...\n",
      "\n",
      "pe:bundleFingerprint: 19807059d987ba1f687bde95808a7163c98e81f33edc3e9d9bbf321279be3ae1\n",
      "pe:bundleTroUrl:      https://raw.githubusercontent.com/PolicyEngine/policyengine.py/v4.0.0/src/policyengine/data/release_manifests/us.trace.tro.jsonld\n"
     ]
    }
   ],
   "source": [
    "results = ResultsJson(\n",
    "    metadata=ResultsMetadata(\n",
    "        title=\"Tara Watson CTC + EITC expansion (policy 94589)\",\n",
    "        repo=\"PolicyEngine/policyengine.py\",\n",
    "        slug=\"tara-watson-ctc-eitc-94589\",\n",
    "        generated_at=\"2026-04-19T12:00:00Z\",\n",
    "        policyengine_version=us_manifest.policyengine_version,\n",
    "        country_id=\"us\",\n",
    "        year=2025,\n",
    "    ),\n",
    "    values={\n",
    "        \"baseline_household_net_income\": ValueEntry(\n",
    "            value=float(baseline_net),\n",
    "            display=f\"${baseline_net:,.0f}\",\n",
    "            source_line=87,\n",
    "            source_url=\"https://example.com/notebook#L87\",\n",
    "        ),\n",
    "        \"reform_household_net_income\": ValueEntry(\n",
    "            value=float(reform_net),\n",
    "            display=f\"${reform_net:,.0f}\",\n",
    "            source_line=93,\n",
    "            source_url=\"https://example.com/notebook#L93\",\n",
    "        ),\n",
    "        \"change_household_net_income\": ValueEntry(\n",
    "            value=float(delta_net),\n",
    "            display=f\"${delta_net:+,.0f}\",\n",
    "            source_line=100,\n",
    "            source_url=\"https://example.com/notebook#L100\",\n",
    "        ),\n",
    "        \"change_ctc\": ValueEntry(\n",
    "            value=float(delta_ctc),\n",
    "            display=f\"${delta_ctc:+,.0f}\",\n",
    "            source_line=102,\n",
    "            source_url=\"https://example.com/notebook#L102\",\n",
    "        ),\n",
    "        \"change_eitc\": ValueEntry(\n",
    "            value=float(delta_eitc),\n",
    "            display=f\"${delta_eitc:+,.0f}\",\n",
    "            source_line=103,\n",
    "            source_url=\"https://example.com/notebook#L103\",\n",
    "        ),\n",
    "        \"national_cost_billions\": ValueEntry(\n",
    "            value=float(cost_billions / 1e9),\n",
    "            display=f\"${cost_billions/1e9:+,.1f}B\",\n",
    "            source_line=150,\n",
    "            source_url=\"https://example.com/notebook#L150\",\n",
    "        ),\n",
    "        \"national_households_better_off_millions\": ValueEntry(\n",
    "            value=float(winner_millions),\n",
    "            display=f\"{winner_millions:,.1f}M\",\n",
    "            source_line=160,\n",
    "            source_url=\"https://example.com/notebook#L160\",\n",
    "        ),\n",
    "        \"national_households_worse_off_millions\": ValueEntry(\n",
    "            value=float(loser_millions),\n",
    "            display=f\"{loser_millions:,.1f}M\",\n",
    "            source_line=165,\n",
    "            source_url=\"https://example.com/notebook#L165\",\n",
    "        ),\n",
    "    },\n",
    ")\n",
    "\n",
    "sim_tro = build_results_trace_tro(\n",
    "    results,\n",
    "    bundle_tro=bundle_tro,\n",
    "    reform_payload={\"policy\": reform_dict},\n",
    "    reform_name=\"Tara Watson CTC + EITC expansion (94589)\",\n",
    "    bundle_tro_url=bundle_self_url,\n",
    ")\n",
    "sim_node = sim_tro[\"@graph\"][0]\n",
    "print(\"@type:\", sim_node[\"@type\"])\n",
    "print(\"schema:name:\", sim_node[\"schema:name\"])\n",
    "print()\n",
    "print(\"Composition artifacts:\")\n",
    "for artifact in sim_node[\"trov:hasComposition\"][\"trov:hasArtifact\"]:\n",
    "    print(f\"  {artifact['@id']}: {artifact['trov:sha256'][:16]}...\")\n",
    "print()\n",
    "perf = sim_node[\"trov:hasPerformance\"]\n",
    "print(\"pe:bundleFingerprint:\", perf[\"pe:bundleFingerprint\"])\n",
    "print(\"pe:bundleTroUrl:     \", perf[\"pe:bundleTroUrl\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5b59f459",
   "metadata": {},
   "source": [
    "## 8. Verifier workflow\n",
    "\n",
    "This is the sequence a replication reviewer runs years later, given\n",
    "only `results.json` + `results.trace.tro.jsonld` + the git tag.\n",
    "\n",
    "### 8a. Recompute the bundle-TRO hash"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "db4ab5dc",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:26:26.707488Z",
     "iopub.status.busy": "2026-04-20T11:26:26.707417Z",
     "iopub.status.idle": "2026-04-20T11:26:26.709424Z",
     "shell.execute_reply": "2026-04-20T11:26:26.709120Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "recomputed: 1d9976a95a17a6a35701cca3da9eb28911359623b031ccd2d4bdd0edd3c63e37\n",
      "recorded:   1d9976a95a17a6a35701cca3da9eb28911359623b031ccd2d4bdd0edd3c63e37\n",
      "\n",
      "[OK] bundle_tro bytes match the sim TRO's recorded hash\n"
     ]
    }
   ],
   "source": [
    "recomputed_hash = hashlib.sha256(canonical_json_bytes(bundle_tro)).hexdigest()\n",
    "recorded_hash = next(\n",
    "    a[\"trov:sha256\"]\n",
    "    for a in sim_node[\"trov:hasComposition\"][\"trov:hasArtifact\"]\n",
    "    if a[\"@id\"].endswith(\"bundle_tro\")\n",
    ")\n",
    "print(\"recomputed:\", recomputed_hash)\n",
    "print(\"recorded:  \", recorded_hash)\n",
    "assert recomputed_hash == recorded_hash\n",
    "print()\n",
    "print(\"[OK] bundle_tro bytes match the sim TRO's recorded hash\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e607153a",
   "metadata": {},
   "source": [
    "### 8b. Cross-check the composition fingerprint"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "60e27960",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:26:26.710417Z",
     "iopub.status.busy": "2026-04-20T11:26:26.710356Z",
     "iopub.status.idle": "2026-04-20T11:26:26.711907Z",
     "shell.execute_reply": "2026-04-20T11:26:26.711583Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[OK] performance fingerprint matches bundle composition fingerprint\n"
     ]
    }
   ],
   "source": [
    "bundle_fp = bundle_tro[\"@graph\"][0][\"trov:hasComposition\"][\"trov:hasFingerprint\"][\"trov:sha256\"]\n",
    "assert perf[\"pe:bundleFingerprint\"] == bundle_fp\n",
    "print(\"[OK] performance fingerprint matches bundle composition fingerprint\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4d845213",
   "metadata": {},
   "source": [
    "### 8c. Forgery detection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "722c5185",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:26:26.712945Z",
     "iopub.status.busy": "2026-04-20T11:26:26.712879Z",
     "iopub.status.idle": "2026-04-20T11:26:26.714790Z",
     "shell.execute_reply": "2026-04-20T11:26:26.714408Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[OK] forged bundle detected — hash diverges from recorded hash\n"
     ]
    }
   ],
   "source": [
    "forged_bundle = json.loads(json.dumps(bundle_tro))\n",
    "forged_bundle[\"@graph\"][0][\"schema:description\"] = \"FORGED\"\n",
    "forged_hash = hashlib.sha256(canonical_json_bytes(forged_bundle)).hexdigest()\n",
    "assert forged_hash != recorded_hash\n",
    "print(\"[OK] forged bundle detected — hash diverges from recorded hash\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ca8640e5",
   "metadata": {},
   "source": [
    "## 9. Write the three replication artifacts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "a09eb55b",
   "metadata": {
    "execution": {
     "iopub.execute_input": "2026-04-20T11:26:26.715687Z",
     "iopub.status.busy": "2026-04-20T11:26:26.715627Z",
     "iopub.status.idle": "2026-04-20T11:26:26.719008Z",
     "shell.execute_reply": "2026-04-20T11:26:26.718712Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "  bundle.trace.tro.jsonld           5,787 bytes\n",
      "  results.json                      1,838 bytes\n",
      "  results.trace.tro.jsonld          4,809 bytes\n"
     ]
    }
   ],
   "source": [
    "import pathlib\n",
    "\n",
    "out_dir = pathlib.Path(\"./trace_artifacts\")\n",
    "out_dir.mkdir(exist_ok=True)\n",
    "\n",
    "(out_dir / \"bundle.trace.tro.jsonld\").write_bytes(serialize_trace_tro(bundle_tro))\n",
    "(out_dir / \"results.trace.tro.jsonld\").write_bytes(serialize_trace_tro(sim_tro))\n",
    "(out_dir / \"results.json\").write_text(\n",
    "    json.dumps(results.model_dump(), indent=2, default=str) + \"\\n\"\n",
    ")\n",
    "\n",
    "for path in sorted(out_dir.iterdir()):\n",
    "    print(f\"  {path.name:<32} {path.stat().st_size:>6,} bytes\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "639af572",
   "metadata": {},
   "source": [
    "## 10. Bottom line\n",
    "\n",
    "Three files, one git tag, full replicability:\n",
    "\n",
    "- `bundle.trace.tro.jsonld` pins model wheel sha256 + PyPI URL and\n",
    "  dataset sha256 + HF resolve URL.\n",
    "- `results.trace.tro.jsonld` chains the bundle to the reform dict and\n",
    "  to real `results.json` values with line-level source attribution.\n",
    "- The composition fingerprint is pinned twice (by artifact hash and\n",
    "  by `pe:bundleFingerprint`), so a forged bundle fails hash *and*\n",
    "  fingerprint cross-checks.\n",
    "\n",
    "A reviewer can go from `results.json` to \"install the exact wheel at\n",
    "sha256 X, fetch the exact dataset at sha256 Y, rerun the reform\" in a\n",
    "handful of steps.\n",
    "\n",
    "**Known limitations** ([docs/release-bundles.md](https://github.com/PolicyEngine/policyengine.py/blob/main/docs/release-bundles.md)):\n",
    "TROs are currently unsigned, the composition does not pin a\n",
    "transitive lockfile or Python interpreter version, and private-data\n",
    "countries require `HUGGING_FACE_TOKEN` at emit time."
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "-all",
   "main_language": "python",
   "notebook_metadata_filter": "-all"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
 }