Last active
April 20, 2026 11:26
PolicyEngine v4 TRACE TRO walkthrough — real household calc + replicability chain
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "id": "dcb9d1f7", | |
| "metadata": {}, | |
| "source": [ | |
| "# PolicyEngine v4 TRACE TRO walkthrough (with a real reform + microsim)\n", | |
| "\n", | |
| "This notebook is the v4 analog of the [earlier TRACE TRO walkthrough gist](https://gist.github.com/MaxGhenis/4cb31322e627c34f6564d70e71e44e9d). It does four new things:\n", | |
| "\n", | |
| "1. Runs a **real** reform — Tara Watson's CTC+EITC expansion\n", | |
| " ([policy 94589 on app.policyengine.org](https://www.policyengine.org/us/policy?reform=94589)),\n", | |
| " a 42-parameter package combining an ARPA-style CTC (per-child amount $4,800,\n", | |
| " $25-per-$1k phase-out, 20% refundable phase-in, $2,400 minimum refundable)\n", | |
| " with a flattened EITC (max $2,000, 20% phase-in, 10% phase-out, $7k joint bonus).\n", | |
| "2. Uses the v4 agent-first API: `pe.us.calculate_household` for the\n", | |
| " representative household, and `Simulation(policy={...})` dict reforms for\n", | |
| " the population microsim on the pinned `enhanced_cps_2024` dataset.\n", | |
| "3. Uses the v4 import paths — `policyengine.provenance.*` instead of\n", | |
| " the old `policyengine.core.release_manifest` / `.core.trace_tro`.\n", | |
| "4. Writes three replication artifacts (`bundle.trace.tro.jsonld`,\n", | |
| " `results.trace.tro.jsonld`, `results.json`) that bind the numbers to the\n", | |
| " exact policyengine-us wheel + enhanced_cps_2024 dataset by sha256.\n", | |
| "\n", | |
| "The vocabulary used is [TROv 2023/05](https://w3id.org/trace/2023/05/trov#); PolicyEngine extensions live under the `pe:` namespace.\n", | |
| "\n", | |
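| "**Setup**: we stub the data release manifest in-process so the TRO\n", | |
| "build runs without network access (the microsim in section 4 still\n", | |
| "reads the dataset from an `hf://` path). In production,\n", | |
| "`policyengine trace-tro us` fetches the manifest from Hugging Face.\n", | |
| "The stub carries placeholder build metadata (`git_sha`,\n", | |
| "`data_build_fingerprint`), so the `data_release_manifest` hash and the\n", | |
| "composition fingerprint printed below should not be expected to match\n", | |
| "a production TRO.\n", | |
| "The reform is frozen inline (fetched once from\n", | |
| "`https://api.policyengine.org/us/policy/94589` at capture time)." | |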
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 1, | |
| "id": "030e1a73", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:23:28.267085Z", | |
| "iopub.status.busy": "2026-04-20T11:23:28.267014Z", | |
| "iopub.status.idle": "2026-04-20T11:23:36.376815Z", | |
| "shell.execute_reply": "2026-04-20T11:23:36.376256Z" | |
| } | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "import hashlib\n", | |
| "import json\n", | |
| "from pprint import pprint\n", | |
| "\n", | |
| "import policyengine as pe\n", | |
| "from policyengine.core import Simulation\n", | |
| "from policyengine.outputs.change_aggregate import (\n", | |
| " ChangeAggregate,\n", | |
| " ChangeAggregateType,\n", | |
| ")\n", | |
| "from policyengine.provenance import (\n", | |
| " DataReleaseManifest,\n", | |
| " build_trace_tro_from_release_bundle,\n", | |
| " canonical_json_bytes,\n", | |
| " get_release_manifest,\n", | |
| " serialize_trace_tro,\n", | |
| ")\n", | |
| "from policyengine.results import (\n", | |
| " ResultsJson,\n", | |
| " ResultsMetadata,\n", | |
| " ValueEntry,\n", | |
| " build_results_trace_tro,\n", | |
| ")\n", | |
| "\n", | |
| "\n", | |
| "def stub_data_release_manifest():\n", | |
| "    \"\"\"Stand-in for the live HF release_manifest.json for policyengine-us-data 1.73.0.\n", | |
| "\n", | |
| "    The git_sha and data_build_fingerprint values are placeholders.\n", | |
| "    \"\"\"\n", | |
| "    return DataReleaseManifest.model_validate(\n", | |
| "        {\n", | |
| "            \"schema_version\": 1,\n", | |
| "            \"data_package\": {\"name\": \"policyengine-us-data\", \"version\": \"1.73.0\"},\n", | |
| "            \"build\": {\n", | |
| "                \"build_id\": \"policyengine-us-data-1.73.0\",\n", | |
| "                \"built_at\": \"2026-04-10T12:00:00Z\",\n", | |
| "                \"built_with_model_package\": {\n", | |
| "                    \"name\": \"policyengine-us\",\n", | |
| "                    \"version\": \"1.647.0\",\n", | |
| "                    \"git_sha\": \"deadbeef\",\n", | |
| "                    \"data_build_fingerprint\": \"sha256:example-fingerprint\",\n", | |
| "                },\n", | |
| "            },\n", | |
| "            \"compatible_model_packages\": [],\n", | |
| "            \"default_datasets\": {\"national\": \"enhanced_cps_2024\"},\n", | |
| "            \"artifacts\": {\n", | |
| "                \"enhanced_cps_2024\": {\n", | |
| "                    \"kind\": \"microdata\",\n", | |
| "                    \"path\": \"enhanced_cps_2024.h5\",\n", | |
| "                    \"repo_id\": \"policyengine/policyengine-us-data\",\n", | |
| "                    \"revision\": \"1.73.0\",\n", | |
| "                    \"sha256\": \"18cdc668d05311c32ae37364abcea89b0221c27154559667e951c7b19f5b5cbd\",\n", | |
| "                    \"size_bytes\": 111427742,\n", | |
| "                }\n", | |
| "            },\n", | |
| "        }\n", | |
| "    )" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "a1a88fa0", | |
| "metadata": {}, | |
| "source": [ | |
| "## 1. Inspect the bundled US release manifest\n", | |
| "\n", | |
| "`policyengine.py` ships a country release manifest at\n", | |
| "`data/release_manifests/us.json`. It pins the exact model wheel,\n", | |
| "dataset, and build metadata:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 2, | |
| "id": "2f8a8915", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:23:36.378204Z", | |
| "iopub.status.busy": "2026-04-20T11:23:36.378116Z", | |
| "iopub.status.idle": "2026-04-20T11:23:36.380390Z", | |
| "shell.execute_reply": "2026-04-20T11:23:36.380043Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "{'bundle_id': 'us-4.0.0',\n", | |
| " 'certified_data_artifact': {'build_id': 'policyengine-us-data-1.73.0',\n", | |
| " 'dataset': 'enhanced_cps_2024',\n", | |
| " 'sha256': '18cdc668d05311c32ae37364abcea89b0221c27154559667e951c7b19f5b5cbd'},\n", | |
| " 'model_package': {'name': 'policyengine-us',\n", | |
| " 'sha256': '67a49b98d85c060b24d547a569e91a6703c0fc9c41299c1c67f4ecfac75c67c6',\n", | |
| " 'version': '1.653.3',\n", | |
| " 'wheel_url': 'https://files.pythonhosted.org/packages/02/07/25f39a2bfa1ff210cd8e78826c47c03b9040a98a83f4eed59c434c1ed862/policyengine_us-1.653.3-py3-none-any.whl'},\n", | |
| " 'policyengine_version': '4.0.0'}\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "us_manifest = get_release_manifest(\"us\")\n", | |
| "pprint(\n", | |
| " {\n", | |
| " \"bundle_id\": us_manifest.bundle_id,\n", | |
| " \"policyengine_version\": us_manifest.policyengine_version,\n", | |
| " \"model_package\": {\n", | |
| " \"name\": us_manifest.model_package.name,\n", | |
| " \"version\": us_manifest.model_package.version,\n", | |
| " \"sha256\": us_manifest.model_package.sha256,\n", | |
| " \"wheel_url\": us_manifest.model_package.wheel_url,\n", | |
| " },\n", | |
| " \"certified_data_artifact\": {\n", | |
| " \"dataset\": us_manifest.certified_data_artifact.dataset,\n", | |
| " \"build_id\": us_manifest.certified_data_artifact.build_id,\n", | |
| " \"sha256\": us_manifest.certified_data_artifact.sha256,\n", | |
| " },\n", | |
| " }\n", | |
| ")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "f913865f", | |
| "metadata": {}, | |
| "source": [ | |
| "## 2. Tara Watson's reform (policy 94589)\n", | |
| "\n", | |
| "A frozen copy of the 42-parameter reform dict. Each parameter is given\n", | |
| "as a scalar instead of a dated series; the Simulation applies it from\n", | |
| "`{dataset.year}-01-01` by default." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 3, | |
| "id": "608bfeec", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:23:36.381358Z", | |
| "iopub.status.busy": "2026-04-20T11:23:36.381277Z", | |
| "iopub.status.idle": "2026-04-20T11:23:36.383590Z", | |
| "shell.execute_reply": "2026-04-20T11:23:36.383195Z" | |
| } | |
| }, | |
| "outputs": [], | |
| "source": [ | |
| "reform_dict = {\n", | |
| " # ARPA-style CTC: per-child amount, phase-out tuning, minimum refundable.\n", | |
| " \"gov.irs.credits.ctc.amount.arpa[0].amount\": 4_800,\n", | |
| " \"gov.irs.credits.ctc.amount.arpa[1].amount\": 4_800,\n", | |
| " \"gov.irs.credits.ctc.phase_out.arpa.in_effect\": True,\n", | |
| " \"gov.irs.credits.ctc.phase_out.arpa.amount\": 25,\n", | |
| " \"gov.irs.credits.ctc.phase_out.arpa.threshold.JOINT\": 35_000,\n", | |
| " \"gov.irs.credits.ctc.phase_out.arpa.threshold.SINGLE\": 25_000,\n", | |
| " \"gov.irs.credits.ctc.phase_out.arpa.threshold.SEPARATE\": 25_000,\n", | |
| " \"gov.irs.credits.ctc.phase_out.arpa.threshold.HEAD_OF_HOUSEHOLD\": 25_000,\n", | |
| " \"gov.irs.credits.ctc.phase_out.arpa.threshold.SURVIVING_SPOUSE\": 25_000,\n", | |
| " \"gov.irs.credits.ctc.phase_out.amount\": 25,\n", | |
| " \"gov.irs.credits.ctc.phase_out.threshold.JOINT\": 200_000,\n", | |
| " \"gov.irs.credits.ctc.phase_out.threshold.SINGLE\": 100_000,\n", | |
| " \"gov.irs.credits.ctc.phase_out.threshold.SEPARATE\": 100_000,\n", | |
| " \"gov.irs.credits.ctc.phase_out.threshold.HEAD_OF_HOUSEHOLD\": 100_000,\n", | |
| " \"gov.irs.credits.ctc.phase_out.threshold.SURVIVING_SPOUSE\": 100_000,\n", | |
| " \"gov.irs.credits.ctc.refundable.phase_in.rate\": 0.2,\n", | |
| " \"gov.irs.credits.ctc.refundable.phase_in.threshold\": 0,\n", | |
| " \"gov.irs.credits.ctc.refundable.individual_max\": 4_800,\n", | |
| " \"gov.contrib.ctc.minimum_refundable.in_effect\": True,\n", | |
| " \"gov.contrib.ctc.minimum_refundable.amount[0].amount\": 2_400,\n", | |
| " \"gov.contrib.ctc.minimum_refundable.amount[1].amount\": 2_400,\n", | |
| " \"gov.contrib.ctc.per_child_phase_in.in_effect\": True,\n", | |
| " \"gov.contrib.ctc.per_child_phase_out.in_effect\": True,\n", | |
| " \"gov.contrib.ctc.per_child_phase_out.avoid_overlap\": True,\n", | |
| " # Flattened EITC: same max / rates across all child counts.\n", | |
| " \"gov.irs.credits.eitc.max[0].amount\": 2_000,\n", | |
| " \"gov.irs.credits.eitc.max[1].amount\": 2_000,\n", | |
| " \"gov.irs.credits.eitc.max[2].amount\": 2_000,\n", | |
| " \"gov.irs.credits.eitc.max[3].amount\": 2_000,\n", | |
| " \"gov.irs.credits.eitc.phase_in_rate[0].amount\": 0.2,\n", | |
| " \"gov.irs.credits.eitc.phase_in_rate[1].amount\": 0.2,\n", | |
| " \"gov.irs.credits.eitc.phase_in_rate[2].amount\": 0.2,\n", | |
| " \"gov.irs.credits.eitc.phase_in_rate[3].amount\": 0.2,\n", | |
| " \"gov.irs.credits.eitc.phase_out.rate[0].amount\": 0.1,\n", | |
| " \"gov.irs.credits.eitc.phase_out.rate[1].amount\": 0.1,\n", | |
| " \"gov.irs.credits.eitc.phase_out.rate[2].amount\": 0.1,\n", | |
| " \"gov.irs.credits.eitc.phase_out.rate[3].amount\": 0.1,\n", | |
| " \"gov.irs.credits.eitc.phase_out.start[0].amount\": 20_000,\n", | |
| " \"gov.irs.credits.eitc.phase_out.start[1].amount\": 20_000,\n", | |
| " \"gov.irs.credits.eitc.phase_out.start[2].amount\": 20_000,\n", | |
| " \"gov.irs.credits.eitc.phase_out.start[3].amount\": 20_000,\n", | |
| " \"gov.irs.credits.eitc.phase_out.joint_bonus[0].amount\": 7_000,\n", | |
| " \"gov.irs.credits.eitc.phase_out.joint_bonus[1].amount\": 7_000,\n", | |
| "}" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "11f286c6", | |
| "metadata": {}, | |
| "source": [ | |
| "## 3. Representative household calculation\n", | |
| "\n", | |
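| "A single parent with two young children and $25,000 of earned income,\n", | |
| "filing as head of household in California. This is roughly the\n", | |
| "demographic most affected by the reform. For this household the\n", | |
| "larger per-child CTC and the smaller flattened EITC nearly offset each\n", | |
| "other, so the change in net income also reflects interactions with the\n", | |
| "rest of the tax-benefit system." | |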
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 4, | |
| "id": "3b868587", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:23:36.384547Z", | |
| "iopub.status.busy": "2026-04-20T11:23:36.384473Z", | |
| "iopub.status.idle": "2026-04-20T11:23:58.232778Z", | |
| "shell.execute_reply": "2026-04-20T11:23:58.232354Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Baseline: net=$63,594 CTC=$4,400 EITC=$6,805\n", | |
| "Reform: net=$64,377 CTC=$9,600 EITC=$1,500\n", | |
| "Δ net income: $+783\n", | |
| "Δ CTC: $+5,200\n", | |
| "Δ EITC: $-5,305\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "people = [\n", | |
| " {\"age\": 32, \"employment_income\": 25_000},\n", | |
| " {\"age\": 5},\n", | |
| " {\"age\": 3},\n", | |
| "]\n", | |
| "tax_unit = {\"filing_status\": \"HEAD_OF_HOUSEHOLD\"}\n", | |
| "household = {\"state_code\": \"CA\"}\n", | |
| "\n", | |
| "baseline = pe.us.calculate_household(\n", | |
| " people=people, tax_unit=tax_unit, household=household, year=2025\n", | |
| ")\n", | |
| "reformed = pe.us.calculate_household(\n", | |
| " people=people, tax_unit=tax_unit, household=household, year=2025,\n", | |
| " reform=reform_dict,\n", | |
| ")\n", | |
| "\n", | |
| "baseline_net = baseline.household.household_net_income\n", | |
| "reform_net = reformed.household.household_net_income\n", | |
| "delta_net = reform_net - baseline_net\n", | |
| "delta_ctc = reformed.tax_unit.ctc - baseline.tax_unit.ctc\n", | |
| "delta_eitc = reformed.tax_unit.eitc - baseline.tax_unit.eitc\n", | |
| "\n", | |
| "print(f\"Baseline: net=${baseline_net:,.0f} CTC=${baseline.tax_unit.ctc:,.0f} EITC=${baseline.tax_unit.eitc:,.0f}\")\n", | |
| "print(f\"Reform: net=${reform_net:,.0f} CTC=${reformed.tax_unit.ctc:,.0f} EITC=${reformed.tax_unit.eitc:,.0f}\")\n", | |
| "print(f\"Δ net income: ${delta_net:+,.0f}\")\n", | |
| "print(f\"Δ CTC: ${delta_ctc:+,.0f}\")\n", | |
| "print(f\"Δ EITC: ${delta_eitc:+,.0f}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "973e811e", | |
| "metadata": {}, | |
| "source": [ | |
| "## 4. Microsimulation on the certified dataset\n", | |
| "\n", | |
| "The microsim runs a 36-parameter subset of Tara's reform against the\n", | |
| "pinned `enhanced_cps_2024` file (sha256 `18cdc6…`, exactly what the\n", | |
| "bundle TRO below will freeze).\n", | |
| "\n", | |
| "The subset drops the six `gov.contrib.ctc.*` parameters (under\n", | |
| "`minimum_refundable`, `per_child_phase_in`, and `per_child_phase_out`).\n", | |
| "They activate contrib-CTC code paths that reference variables not\n", | |
| "present in the pinned `policyengine_us==1.653.3` wheel.\n", | |
| "\n", | |
| "The subset keeps Tara's full ARPA-style per-child amounts ($4,800), the\n", | |
| "ARPA phase-out thresholds, the refundable phase-in rules, and the\n", | |
| "flattened EITC, so it exercises the same core refundability dynamics\n", | |
| "and is expected to capture the bulk of the reform's cost impact. The\n", | |
| "$2,400 minimum refundable amount is among the dropped parameters, so\n", | |
| "the national totals below describe the subset, not the full\n", | |
| "42-parameter package." | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 5, | |
| "id": "4d07ec0a", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:23:58.233960Z", | |
| "iopub.status.busy": "2026-04-20T11:23:58.233889Z", | |
| "iopub.status.idle": "2026-04-20T11:26:26.243742Z", | |
| "shell.execute_reply": "2026-04-20T11:26:26.243221Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Households: 41,314\n" | |
| ] | |
| }, | |
| { | |
| "name": "stderr", | |
| "output_type": "stream", | |
| "text": [ | |
| "Memory usage has reached 14.49GB (threshold: 8GB). Cache contains 1 items.\n" | |
| ] | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Net income change (cost): $+44.1B\n", | |
| "Households better off (≥ $1): 33.8M\n", | |
| "Households worse off (≤ -$1): 14.6M\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# 36 of 42 params: drops the six gov.contrib.ctc.* entries for microsim compatibility.\n", | |
| "microsim_reform = {\n", | |
| "    k: v for k, v in reform_dict.items()\n", | |
| "    if not k.startswith(\"gov.contrib.ctc.\")\n", | |
| "}\n", | |
| "\n", | |
| "datasets = pe.us.ensure_datasets(\n", | |
| " datasets=[\"hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5\"],\n", | |
| " years=[2025],\n", | |
| " data_folder=\"./data\",\n", | |
| ")\n", | |
| "dataset = datasets[\"enhanced_cps_2024_2025\"]\n", | |
| "print(f\"Households: {len(dataset.data.household):,}\")\n", | |
| "\n", | |
| "baseline_sim = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)\n", | |
| "reform_sim = Simulation(\n", | |
| " dataset=dataset,\n", | |
| " tax_benefit_model_version=pe.us.model,\n", | |
| " policy=microsim_reform,\n", | |
| ")\n", | |
| "\n", | |
| "baseline_sim.ensure()\n", | |
| "reform_sim.ensure()\n", | |
| "\n", | |
| "# Net-income change is the cleanest headline: it is the rise in\n", | |
| "# after-tax-and-transfer income across all households, which equals\n", | |
| "# the government's gross cost for this reform, absent behavioral responses.\n", | |
| "net_income_change = ChangeAggregate(\n", | |
| " baseline_simulation=baseline_sim, reform_simulation=reform_sim,\n", | |
| " variable=\"household_net_income\",\n", | |
| " aggregate_type=ChangeAggregateType.SUM,\n", | |
| ")\n", | |
| "net_income_change.run()\n", | |
| "\n", | |
| "winners = ChangeAggregate(\n", | |
| " baseline_simulation=baseline_sim, reform_simulation=reform_sim,\n", | |
| " variable=\"household_net_income\",\n", | |
| " aggregate_type=ChangeAggregateType.COUNT, change_geq=1,\n", | |
| ")\n", | |
| "winners.run()\n", | |
| "\n", | |
| "losers = ChangeAggregate(\n", | |
| " baseline_simulation=baseline_sim, reform_simulation=reform_sim,\n", | |
| " variable=\"household_net_income\",\n", | |
| " aggregate_type=ChangeAggregateType.COUNT, change_leq=-1,\n", | |
| ")\n", | |
| "losers.run()\n", | |
| "\n", | |
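| "# Despite its name, cost_billions holds dollars; it is divided by 1e9\n", | |
| "# wherever it is printed or recorded.\n", | |
| "cost_billions = net_income_change.result\n", | |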
| "winner_millions = winners.result / 1e6\n", | |
| "loser_millions = losers.result / 1e6\n", | |
| "\n", | |
| "print(f\"Net income change (cost): ${cost_billions/1e9:+,.1f}B\")\n", | |
| "print(f\"Households better off (≥ $1): {winner_millions:,.1f}M\")\n", | |
| "print(f\"Households worse off (≤ -$1): {loser_millions:,.1f}M\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "9b22de77", | |
| "metadata": {}, | |
| "source": [ | |
| "## 5. Build the bundle TRO\n", | |
| "\n", | |
| "The bundle TRO is a [TROv](https://w3id.org/trace/2023/05/trov#)\n", | |
| "`TransparentResearchObject` whose composition pins four artifacts by\n", | |
| "sha256 (bundle manifest, data release manifest, certified dataset,\n", | |
| "country model wheel) and computes a single composition fingerprint:" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 6, | |
| "id": "280a0e33", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:26:26.246065Z", | |
| "iopub.status.busy": "2026-04-20T11:26:26.245968Z", | |
| "iopub.status.idle": "2026-04-20T11:26:26.248947Z", | |
| "shell.execute_reply": "2026-04-20T11:26:26.248639Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "@type: trov:TransparentResearchObject\n", | |
| "schema:name: policyengine us certified bundle TRO\n", | |
| "\n", | |
| "Composition artifacts:\n", | |
| " composition/1/artifact/bundle_manifest: c773114aa17e5cfb...\n", | |
| " composition/1/artifact/data_release_manifest: c8bae6799b7e045e...\n", | |
| " composition/1/artifact/dataset: 18cdc668d05311c3...\n", | |
| " composition/1/artifact/model_wheel: 67a49b98d85c060b...\n", | |
| "\n", | |
| "Composition fingerprint: 19807059d987ba1f687bde95808a7163c98e81f33edc3e9d9bbf321279be3ae1\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "data_release_manifest = stub_data_release_manifest()\n", | |
| "bundle_self_url = (\n", | |
| " \"https://raw.githubusercontent.com/PolicyEngine/policyengine.py/\"\n", | |
| " f\"v{us_manifest.policyengine_version}/\"\n", | |
| " \"src/policyengine/data/release_manifests/us.trace.tro.jsonld\"\n", | |
| ")\n", | |
| "\n", | |
| "bundle_tro = build_trace_tro_from_release_bundle(\n", | |
| " us_manifest, data_release_manifest, self_url=bundle_self_url\n", | |
| ")\n", | |
| "node = bundle_tro[\"@graph\"][0]\n", | |
| "print(\"@type:\", node[\"@type\"])\n", | |
| "print(\"schema:name:\", node[\"schema:name\"])\n", | |
| "print()\n", | |
| "print(\"Composition artifacts:\")\n", | |
| "for artifact in node[\"trov:hasComposition\"][\"trov:hasArtifact\"]:\n", | |
| "    print(f\" {artifact['@id']}: {artifact['trov:sha256'][:16]}...\")\n", | |
| "print()\n", | |
| "print(\"Composition fingerprint:\",\n", | |
| " node[\"trov:hasComposition\"][\"trov:hasFingerprint\"][\"trov:sha256\"])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "c41e678e", | |
| "metadata": {}, | |
| "source": [ | |
| "## 6. Validate against the shipped JSON Schema" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 7, | |
| "id": "832a2d98", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:26:26.250094Z", | |
| "iopub.status.busy": "2026-04-20T11:26:26.250017Z", | |
| "iopub.status.idle": "2026-04-20T11:26:26.701783Z", | |
| "shell.execute_reply": "2026-04-20T11:26:26.701331Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Schema errors: 0\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "from importlib.resources import files\n", | |
| "from jsonschema import Draft202012Validator\n", | |
| "\n", | |
| "schema = json.loads(\n", | |
| " files(\"policyengine\")\n", | |
| " .joinpath(\"data\", \"schemas\", \"trace_tro.schema.json\")\n", | |
| " .read_text()\n", | |
| ")\n", | |
| "errors = list(Draft202012Validator(schema).iter_errors(bundle_tro))\n", | |
| "print(f\"Schema errors: {len(errors)}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "8edf5cfd", | |
| "metadata": {}, | |
| "source": [ | |
| "## 7. Build the per-simulation TRO\n", | |
| "\n", | |
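| "The per-simulation TRO chains the bundle TRO to the reform dict and to\n", | |
| "the real `results.json`, which carries both the single-household\n", | |
| "numbers and the national population totals. Note that the recorded\n", | |
| "reform payload is the full 42-parameter `reform_dict`, whereas the\n", | |
| "national totals in section 4 were computed with the `gov.contrib.ctc.*`\n", | |
| "parameters removed. The `source_line` and `source_url` values are\n", | |
| "`example.com` placeholders in this walkthrough:" | |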
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 8, | |
| "id": "f091cc4b", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:26:26.702983Z", | |
| "iopub.status.busy": "2026-04-20T11:26:26.702896Z", | |
| "iopub.status.idle": "2026-04-20T11:26:26.706451Z", | |
| "shell.execute_reply": "2026-04-20T11:26:26.706114Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "@type: trov:TransparentResearchObject\n", | |
| "schema:name: policyengine simulation TRO (tara-watson-ctc-eitc-94589)\n", | |
| "\n", | |
| "Composition artifacts:\n", | |
| " composition/1/artifact/bundle_tro: 1d9976a95a17a6a3...\n", | |
| " composition/1/artifact/reform: 04397a7a03f2eae0...\n", | |
| " composition/1/artifact/results: 0ef0c39f85800a93...\n", | |
| "\n", | |
| "pe:bundleFingerprint: 19807059d987ba1f687bde95808a7163c98e81f33edc3e9d9bbf321279be3ae1\n", | |
| "pe:bundleTroUrl: https://raw.githubusercontent.com/PolicyEngine/policyengine.py/v4.0.0/src/policyengine/data/release_manifests/us.trace.tro.jsonld\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "results = ResultsJson(\n", | |
| " metadata=ResultsMetadata(\n", | |
| " title=\"Tara Watson CTC + EITC expansion (policy 94589)\",\n", | |
| " repo=\"PolicyEngine/policyengine.py\",\n", | |
| " slug=\"tara-watson-ctc-eitc-94589\",\n", | |
| " generated_at=\"2026-04-19T12:00:00Z\",\n", | |
| " policyengine_version=us_manifest.policyengine_version,\n", | |
| " country_id=\"us\",\n", | |
| " year=2025,\n", | |
| " ),\n", | |
| " values={\n", | |
| " \"baseline_household_net_income\": ValueEntry(\n", | |
| " value=float(baseline_net),\n", | |
| " display=f\"${baseline_net:,.0f}\",\n", | |
| " source_line=87,\n", | |
| " source_url=\"https://example.com/notebook#L87\",\n", | |
| " ),\n", | |
| " \"reform_household_net_income\": ValueEntry(\n", | |
| " value=float(reform_net),\n", | |
| " display=f\"${reform_net:,.0f}\",\n", | |
| " source_line=93,\n", | |
| " source_url=\"https://example.com/notebook#L93\",\n", | |
| " ),\n", | |
| " \"change_household_net_income\": ValueEntry(\n", | |
| " value=float(delta_net),\n", | |
| " display=f\"${delta_net:+,.0f}\",\n", | |
| " source_line=100,\n", | |
| " source_url=\"https://example.com/notebook#L100\",\n", | |
| " ),\n", | |
| " \"change_ctc\": ValueEntry(\n", | |
| " value=float(delta_ctc),\n", | |
| " display=f\"${delta_ctc:+,.0f}\",\n", | |
| " source_line=102,\n", | |
| " source_url=\"https://example.com/notebook#L102\",\n", | |
| " ),\n", | |
| " \"change_eitc\": ValueEntry(\n", | |
| " value=float(delta_eitc),\n", | |
| " display=f\"${delta_eitc:+,.0f}\",\n", | |
| " source_line=103,\n", | |
| " source_url=\"https://example.com/notebook#L103\",\n", | |
| " ),\n", | |
| " \"national_cost_billions\": ValueEntry(\n", | |
| " value=float(cost_billions / 1e9),\n", | |
| " display=f\"${cost_billions/1e9:+,.1f}B\",\n", | |
| " source_line=150,\n", | |
| " source_url=\"https://example.com/notebook#L150\",\n", | |
| " ),\n", | |
| " \"national_households_better_off_millions\": ValueEntry(\n", | |
| " value=float(winner_millions),\n", | |
| " display=f\"{winner_millions:,.1f}M\",\n", | |
| " source_line=160,\n", | |
| " source_url=\"https://example.com/notebook#L160\",\n", | |
| " ),\n", | |
| " \"national_households_worse_off_millions\": ValueEntry(\n", | |
| " value=float(loser_millions),\n", | |
| " display=f\"{loser_millions:,.1f}M\",\n", | |
| " source_line=165,\n", | |
| " source_url=\"https://example.com/notebook#L165\",\n", | |
| " ),\n", | |
| " },\n", | |
| ")\n", | |
| "\n", | |
| "sim_tro = build_results_trace_tro(\n", | |
| " results,\n", | |
| " bundle_tro=bundle_tro,\n", | |
| " reform_payload={\"policy\": reform_dict},\n", | |
| " reform_name=\"Tara Watson CTC + EITC expansion (94589)\",\n", | |
| " bundle_tro_url=bundle_self_url,\n", | |
| ")\n", | |
| "sim_node = sim_tro[\"@graph\"][0]\n", | |
| "print(\"@type:\", sim_node[\"@type\"])\n", | |
| "print(\"schema:name:\", sim_node[\"schema:name\"])\n", | |
| "print()\n", | |
| "print(\"Composition artifacts:\")\n", | |
| "for artifact in sim_node[\"trov:hasComposition\"][\"trov:hasArtifact\"]:\n", | |
| "    print(f\" {artifact['@id']}: {artifact['trov:sha256'][:16]}...\")\n", | |
| "print()\n", | |
| "perf = sim_node[\"trov:hasPerformance\"]\n", | |
| "print(\"pe:bundleFingerprint:\", perf[\"pe:bundleFingerprint\"])\n", | |
| "print(\"pe:bundleTroUrl: \", perf[\"pe:bundleTroUrl\"])" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "5b59f459", | |
| "metadata": {}, | |
| "source": [ | |
| "## 8. Verifier workflow\n", | |
| "\n", | |
| "This is the sequence a replication reviewer runs years later, given\n", | |
| "only `results.json` + `results.trace.tro.jsonld` + the git tag.\n", | |
| "\n", | |
| "### 8a. Recompute the bundle-TRO hash" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 9, | |
| "id": "db4ab5dc", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:26:26.707488Z", | |
| "iopub.status.busy": "2026-04-20T11:26:26.707417Z", | |
| "iopub.status.idle": "2026-04-20T11:26:26.709424Z", | |
| "shell.execute_reply": "2026-04-20T11:26:26.709120Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "recomputed: 1d9976a95a17a6a35701cca3da9eb28911359623b031ccd2d4bdd0edd3c63e37\n", | |
| "recorded: 1d9976a95a17a6a35701cca3da9eb28911359623b031ccd2d4bdd0edd3c63e37\n", | |
| "\n", | |
| "[OK] bundle_tro bytes match the sim TRO's recorded hash\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "recomputed_hash = hashlib.sha256(canonical_json_bytes(bundle_tro)).hexdigest()\n", | |
| "recorded_hash = next(\n", | |
| " a[\"trov:sha256\"]\n", | |
| " for a in sim_node[\"trov:hasComposition\"][\"trov:hasArtifact\"]\n", | |
| " if a[\"@id\"].endswith(\"bundle_tro\")\n", | |
| ")\n", | |
| "print(\"recomputed:\", recomputed_hash)\n", | |
| "print(\"recorded: \", recorded_hash)\n", | |
| "assert recomputed_hash == recorded_hash\n", | |
| "print()\n", | |
| "print(\"[OK] bundle_tro bytes match the sim TRO's recorded hash\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "e607153a", | |
| "metadata": {}, | |
| "source": [ | |
| "### 8b. Cross-check the composition fingerprint" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 10, | |
| "id": "60e27960", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:26:26.710417Z", | |
| "iopub.status.busy": "2026-04-20T11:26:26.710356Z", | |
| "iopub.status.idle": "2026-04-20T11:26:26.711907Z", | |
| "shell.execute_reply": "2026-04-20T11:26:26.711583Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "[OK] performance fingerprint matches bundle composition fingerprint\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "bundle_fp = bundle_tro[\"@graph\"][0][\"trov:hasComposition\"][\"trov:hasFingerprint\"][\"trov:sha256\"]\n", | |
| "assert perf[\"pe:bundleFingerprint\"] == bundle_fp\n", | |
| "print(\"[OK] performance fingerprint matches bundle composition fingerprint\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "4d845213", | |
| "metadata": {}, | |
| "source": [ | |
| "### 8c. Forgery detection" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 11, | |
| "id": "722c5185", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:26:26.712945Z", | |
| "iopub.status.busy": "2026-04-20T11:26:26.712879Z", | |
| "iopub.status.idle": "2026-04-20T11:26:26.714790Z", | |
| "shell.execute_reply": "2026-04-20T11:26:26.714408Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "[OK] forged bundle detected — hash diverges from recorded hash\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "forged_bundle = json.loads(json.dumps(bundle_tro))\n", | |
| "forged_bundle[\"@graph\"][0][\"schema:description\"] = \"FORGED\"\n", | |
| "forged_hash = hashlib.sha256(canonical_json_bytes(forged_bundle)).hexdigest()\n", | |
| "assert forged_hash != recorded_hash\n", | |
| "print(\"[OK] forged bundle detected — hash diverges from recorded hash\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "ca8640e5", | |
| "metadata": {}, | |
| "source": [ | |
| "## 9. Write the three replication artifacts" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 12, | |
| "id": "a09eb55b", | |
| "metadata": { | |
| "execution": { | |
| "iopub.execute_input": "2026-04-20T11:26:26.715687Z", | |
| "iopub.status.busy": "2026-04-20T11:26:26.715627Z", | |
| "iopub.status.idle": "2026-04-20T11:26:26.719008Z", | |
| "shell.execute_reply": "2026-04-20T11:26:26.718712Z" | |
| } | |
| }, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| " bundle.trace.tro.jsonld 5,787 bytes\n", | |
| " results.json 1,838 bytes\n", | |
| " results.trace.tro.jsonld 4,809 bytes\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "import pathlib\n", | |
| "\n", | |
| "out_dir = pathlib.Path(\"./trace_artifacts\")\n", | |
| "out_dir.mkdir(exist_ok=True)\n", | |
| "\n", | |
| "(out_dir / \"bundle.trace.tro.jsonld\").write_bytes(serialize_trace_tro(bundle_tro))\n", | |
| "(out_dir / \"results.trace.tro.jsonld\").write_bytes(serialize_trace_tro(sim_tro))\n", | |
| "(out_dir / \"results.json\").write_text(\n", | |
| " json.dumps(results.model_dump(), indent=2, default=str) + \"\\n\"\n", | |
| ")\n", | |
| "\n", | |
| "for path in sorted(out_dir.iterdir()):\n", | |
| "    print(f\" {path.name:<32} {path.stat().st_size:>6,} bytes\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "id": "639af572", | |
| "metadata": {}, | |
| "source": [ | |
| "## 10. Bottom line\n", | |
| "\n", | |
| "Three files, one git tag, full replicability:\n", | |
| "\n", | |
| "- `bundle.trace.tro.jsonld` pins model wheel sha256 + PyPI URL and\n", | |
| " dataset sha256 + HF resolve URL.\n", | |
| "- `results.trace.tro.jsonld` chains the bundle to the reform dict and\n", | |
| " to real `results.json` values with line-level source attribution.\n", | |
| "- The bundle is pinned twice (by the bundle TRO's artifact hash and\n", | |
| "  by `pe:bundleFingerprint`), so a forged bundle fails the hash check,\n", | |
| "  and one with an altered composition fingerprint also fails the\n", | |
| "  fingerprint cross-check.\n", | |
| "\n", | |
| "A reviewer can go from `results.json` to \"install the exact wheel at\n", | |
| "sha256 X, fetch the exact dataset at sha256 Y, rerun the reform\" in a\n", | |
| "handful of steps.\n", | |
| "\n", | |
| "**Known limitations** ([docs/release-bundles.md](https://github.com/PolicyEngine/policyengine.py/blob/main/docs/release-bundles.md)):\n", | |
| "TROs are currently unsigned, the composition does not pin a\n", | |
| "transitive lockfile or Python interpreter version, and private-data\n", | |
| "countries require `HUGGING_FACE_TOKEN` at emit time." | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "jupytext": { | |
| "cell_metadata_filter": "-all", | |
| "main_language": "python", | |
| "notebook_metadata_filter": "-all" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.12.13" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 5 | |
| } |
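
As a hand check on the section 3 household output, the reform EITC can be reproduced from the flattened parameters in `reform_dict`. The sketch below is not PolicyEngine code. `flat_eitc` is a hypothetical helper that assumes the usual trapezoid schedule (phase in at the phase-in rate up to the maximum, then phase out above the start threshold), assumes the joint bonus raises the phase-out start for joint filers, and ignores the distinction between earnings and AGI, which coincide for this household.

```python
def flat_eitc(
    earnings: float,
    *,
    max_credit: float = 2_000.0,        # gov.irs.credits.eitc.max[*].amount
    phase_in_rate: float = 0.2,         # gov.irs.credits.eitc.phase_in_rate[*].amount
    phase_out_start: float = 20_000.0,  # gov.irs.credits.eitc.phase_out.start[*].amount
    phase_out_rate: float = 0.1,        # gov.irs.credits.eitc.phase_out.rate[*].amount
    joint: bool = False,
    joint_bonus: float = 7_000.0,       # gov.irs.credits.eitc.phase_out.joint_bonus[*].amount
) -> float:
    """Illustrative trapezoid EITC under the flattened reform parameters.

    Hypothetical helper for hand-checking one household; the real
    calculation lives in policyengine-us and handles many more cases.
    """
    # Phase-in: the credit grows with earnings until it reaches the maximum.
    phased_in = min(max_credit, phase_in_rate * earnings)
    # Assumption: joint filers start phasing out later by the joint bonus.
    start = phase_out_start + (joint_bonus if joint else 0.0)
    # Phase-out: the credit shrinks for each dollar above the start threshold.
    reduction = phase_out_rate * max(0.0, earnings - start)
    return max(0.0, phased_in - reduction)


if __name__ == "__main__":
    # Head-of-household filer with $25,000 of earnings, as in section 3.
    print(f"Reform EITC: ${flat_eitc(25_000):,.0f}")
```

With $25,000 of earnings the credit has fully phased in at $2,000 and loses 10% of the $5,000 above the $20,000 start, which gives the $1,500 reported for the reform in section 3.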