@MaxGhenis
Last active April 20, 2026 11:26
PolicyEngine v4 TRACE TRO walkthrough — real household calc + replicability chain
{
"cells": [
{
"cell_type": "markdown",
"id": "dcb9d1f7",
"metadata": {},
"source": [
"# PolicyEngine v4 TRACE TRO walkthrough (with a real reform + microsim)\n",
"\n",
"This notebook is the v4 analog of the [earlier TRACE TRO walkthrough gist](https://gist.github.com/MaxGhenis/4cb31322e627c34f6564d70e71e44e9d). It does four new things:\n",
"\n",
"1. Runs a **real** reform — Tara Watson's CTC+EITC expansion\n",
" ([policy 94589 on app.policyengine.org](https://www.policyengine.org/us/policy?reform=94589)),\n",
" a 42-parameter package combining an ARPA-style CTC (per-child amount $4,800,\n",
" $25-per-$1k phase-out, 20% refundable phase-in, $2,400 minimum refundable)\n",
" with a flattened EITC (max $2,000, 20% phase-in, 10% phase-out, $7k joint bonus).\n",
"2. Uses the v4 agent-first API: `pe.us.calculate_household` for the\n",
" representative household, and `Simulation(policy={...})` dict reforms for\n",
" the population microsim on the pinned `enhanced_cps_2024` dataset.\n",
"3. Uses the v4 import paths — `policyengine.provenance.*` instead of\n",
" the old `policyengine.core.release_manifest` / `.core.trace_tro`.\n",
"4. Writes three replication artifacts (`bundle.trace.tro.jsonld`,\n",
" `results.trace.tro.jsonld`, `results.json`) that bind the numbers to the\n",
" exact policyengine-us wheel + enhanced_cps_2024 dataset by sha256.\n",
"\n",
"The vocabulary used is [TROv 2023/05](https://w3id.org/trace/2023/05/trov#); PolicyEngine extensions live under the `pe:` namespace.\n",
"\n",
"**Setup**: we stub the data release manifest in-process so the\n",
"notebook runs without network access for the TRO build. In production\n",
"`policyengine trace-tro us` fetches the manifest from Hugging Face.\n",
"The reform is frozen inline (fetched once from\n",
"`https://api.policyengine.org/us/policy/94589` at capture time)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "030e1a73",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:23:28.267085Z",
"iopub.status.busy": "2026-04-20T11:23:28.267014Z",
"iopub.status.idle": "2026-04-20T11:23:36.376815Z",
"shell.execute_reply": "2026-04-20T11:23:36.376256Z"
}
},
"outputs": [],
"source": [
"import hashlib\n",
"import json\n",
"from pprint import pprint\n",
"\n",
"import policyengine as pe\n",
"from policyengine.core import Simulation\n",
"from policyengine.outputs.change_aggregate import (\n",
" ChangeAggregate,\n",
" ChangeAggregateType,\n",
")\n",
"from policyengine.provenance import (\n",
" DataReleaseManifest,\n",
" build_trace_tro_from_release_bundle,\n",
" canonical_json_bytes,\n",
" get_release_manifest,\n",
" serialize_trace_tro,\n",
")\n",
"from policyengine.results import (\n",
" ResultsJson,\n",
" ResultsMetadata,\n",
" ValueEntry,\n",
" build_results_trace_tro,\n",
")\n",
"\n",
"\n",
"def stub_data_release_manifest():\n",
" \"\"\"Mirrors the live HF release_manifest.json for policyengine-us-data 1.73.0.\"\"\"\n",
" return DataReleaseManifest.model_validate(\n",
" {\n",
" \"schema_version\": 1,\n",
" \"data_package\": {\"name\": \"policyengine-us-data\", \"version\": \"1.73.0\"},\n",
" \"build\": {\n",
" \"build_id\": \"policyengine-us-data-1.73.0\",\n",
" \"built_at\": \"2026-04-10T12:00:00Z\",\n",
" \"built_with_model_package\": {\n",
" \"name\": \"policyengine-us\",\n",
" \"version\": \"1.647.0\",\n",
" \"git_sha\": \"deadbeef\",\n",
" \"data_build_fingerprint\": \"sha256:example-fingerprint\",\n",
" },\n",
" },\n",
" \"compatible_model_packages\": [],\n",
" \"default_datasets\": {\"national\": \"enhanced_cps_2024\"},\n",
" \"artifacts\": {\n",
" \"enhanced_cps_2024\": {\n",
" \"kind\": \"microdata\",\n",
" \"path\": \"enhanced_cps_2024.h5\",\n",
" \"repo_id\": \"policyengine/policyengine-us-data\",\n",
" \"revision\": \"1.73.0\",\n",
" \"sha256\": \"18cdc668d05311c32ae37364abcea89b0221c27154559667e951c7b19f5b5cbd\",\n",
" \"size_bytes\": 111427742,\n",
" }\n",
" },\n",
" }\n",
" )"
]
},
{
"cell_type": "markdown",
"id": "a1a88fa0",
"metadata": {},
"source": [
"## 1. Inspect the bundled US release manifest\n",
"\n",
"`policyengine.py` ships a country release manifest at\n",
"`data/release_manifests/us.json`. It pins the exact model wheel,\n",
"dataset, and build metadata:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "2f8a8915",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:23:36.378204Z",
"iopub.status.busy": "2026-04-20T11:23:36.378116Z",
"iopub.status.idle": "2026-04-20T11:23:36.380390Z",
"shell.execute_reply": "2026-04-20T11:23:36.380043Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'bundle_id': 'us-4.0.0',\n",
" 'certified_data_artifact': {'build_id': 'policyengine-us-data-1.73.0',\n",
" 'dataset': 'enhanced_cps_2024',\n",
" 'sha256': '18cdc668d05311c32ae37364abcea89b0221c27154559667e951c7b19f5b5cbd'},\n",
" 'model_package': {'name': 'policyengine-us',\n",
" 'sha256': '67a49b98d85c060b24d547a569e91a6703c0fc9c41299c1c67f4ecfac75c67c6',\n",
" 'version': '1.653.3',\n",
" 'wheel_url': 'https://files.pythonhosted.org/packages/02/07/25f39a2bfa1ff210cd8e78826c47c03b9040a98a83f4eed59c434c1ed862/policyengine_us-1.653.3-py3-none-any.whl'},\n",
" 'policyengine_version': '4.0.0'}\n"
]
}
],
"source": [
"us_manifest = get_release_manifest(\"us\")\n",
"pprint(\n",
" {\n",
" \"bundle_id\": us_manifest.bundle_id,\n",
" \"policyengine_version\": us_manifest.policyengine_version,\n",
" \"model_package\": {\n",
" \"name\": us_manifest.model_package.name,\n",
" \"version\": us_manifest.model_package.version,\n",
" \"sha256\": us_manifest.model_package.sha256,\n",
" \"wheel_url\": us_manifest.model_package.wheel_url,\n",
" },\n",
" \"certified_data_artifact\": {\n",
" \"dataset\": us_manifest.certified_data_artifact.dataset,\n",
" \"build_id\": us_manifest.certified_data_artifact.build_id,\n",
" \"sha256\": us_manifest.certified_data_artifact.sha256,\n",
" },\n",
" }\n",
")"
]
},
{
"cell_type": "markdown",
"id": "f913865f",
"metadata": {},
"source": [
"## 2. Tara Watson's reform (policy 94589)\n",
"\n",
"Frozen copy of the 42-parameter reform dict. Effective dates collapsed\n",
"to scalars — the Simulation will default them to ``{dataset.year}-01-01``."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "608bfeec",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:23:36.381358Z",
"iopub.status.busy": "2026-04-20T11:23:36.381277Z",
"iopub.status.idle": "2026-04-20T11:23:36.383590Z",
"shell.execute_reply": "2026-04-20T11:23:36.383195Z"
}
},
"outputs": [],
"source": [
"reform_dict = {\n",
" # ARPA-style CTC: per-child amount, phase-out tuning, minimum refundable.\n",
" \"gov.irs.credits.ctc.amount.arpa[0].amount\": 4_800,\n",
" \"gov.irs.credits.ctc.amount.arpa[1].amount\": 4_800,\n",
" \"gov.irs.credits.ctc.phase_out.arpa.in_effect\": True,\n",
" \"gov.irs.credits.ctc.phase_out.arpa.amount\": 25,\n",
" \"gov.irs.credits.ctc.phase_out.arpa.threshold.JOINT\": 35_000,\n",
" \"gov.irs.credits.ctc.phase_out.arpa.threshold.SINGLE\": 25_000,\n",
" \"gov.irs.credits.ctc.phase_out.arpa.threshold.SEPARATE\": 25_000,\n",
" \"gov.irs.credits.ctc.phase_out.arpa.threshold.HEAD_OF_HOUSEHOLD\": 25_000,\n",
" \"gov.irs.credits.ctc.phase_out.arpa.threshold.SURVIVING_SPOUSE\": 25_000,\n",
" \"gov.irs.credits.ctc.phase_out.amount\": 25,\n",
" \"gov.irs.credits.ctc.phase_out.threshold.JOINT\": 200_000,\n",
" \"gov.irs.credits.ctc.phase_out.threshold.SINGLE\": 100_000,\n",
" \"gov.irs.credits.ctc.phase_out.threshold.SEPARATE\": 100_000,\n",
" \"gov.irs.credits.ctc.phase_out.threshold.HEAD_OF_HOUSEHOLD\": 100_000,\n",
" \"gov.irs.credits.ctc.phase_out.threshold.SURVIVING_SPOUSE\": 100_000,\n",
" \"gov.irs.credits.ctc.refundable.phase_in.rate\": 0.2,\n",
" \"gov.irs.credits.ctc.refundable.phase_in.threshold\": 0,\n",
" \"gov.irs.credits.ctc.refundable.individual_max\": 4_800,\n",
" \"gov.contrib.ctc.minimum_refundable.in_effect\": True,\n",
" \"gov.contrib.ctc.minimum_refundable.amount[0].amount\": 2_400,\n",
" \"gov.contrib.ctc.minimum_refundable.amount[1].amount\": 2_400,\n",
" \"gov.contrib.ctc.per_child_phase_in.in_effect\": True,\n",
" \"gov.contrib.ctc.per_child_phase_out.in_effect\": True,\n",
" \"gov.contrib.ctc.per_child_phase_out.avoid_overlap\": True,\n",
" # Flattened EITC: same max / rates across all child counts.\n",
" \"gov.irs.credits.eitc.max[0].amount\": 2_000,\n",
" \"gov.irs.credits.eitc.max[1].amount\": 2_000,\n",
" \"gov.irs.credits.eitc.max[2].amount\": 2_000,\n",
" \"gov.irs.credits.eitc.max[3].amount\": 2_000,\n",
" \"gov.irs.credits.eitc.phase_in_rate[0].amount\": 0.2,\n",
" \"gov.irs.credits.eitc.phase_in_rate[1].amount\": 0.2,\n",
" \"gov.irs.credits.eitc.phase_in_rate[2].amount\": 0.2,\n",
" \"gov.irs.credits.eitc.phase_in_rate[3].amount\": 0.2,\n",
" \"gov.irs.credits.eitc.phase_out.rate[0].amount\": 0.1,\n",
" \"gov.irs.credits.eitc.phase_out.rate[1].amount\": 0.1,\n",
" \"gov.irs.credits.eitc.phase_out.rate[2].amount\": 0.1,\n",
" \"gov.irs.credits.eitc.phase_out.rate[3].amount\": 0.1,\n",
" \"gov.irs.credits.eitc.phase_out.start[0].amount\": 20_000,\n",
" \"gov.irs.credits.eitc.phase_out.start[1].amount\": 20_000,\n",
" \"gov.irs.credits.eitc.phase_out.start[2].amount\": 20_000,\n",
" \"gov.irs.credits.eitc.phase_out.start[3].amount\": 20_000,\n",
" \"gov.irs.credits.eitc.phase_out.joint_bonus[0].amount\": 7_000,\n",
" \"gov.irs.credits.eitc.phase_out.joint_bonus[1].amount\": 7_000,\n",
"}"
]
},
{
"cell_type": "markdown",
"id": "11f286c6",
"metadata": {},
"source": [
"## 3. Representative household calculation\n",
"\n",
"Single parent with two young children, $25k earned income, HOH filer in\n",
"California. This is roughly the demographic most affected by the reform:\n",
"ARPA-style refundability lifts the per-child CTC above what the\n",
"flattened-EITC reduces."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "3b868587",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:23:36.384547Z",
"iopub.status.busy": "2026-04-20T11:23:36.384473Z",
"iopub.status.idle": "2026-04-20T11:23:58.232778Z",
"shell.execute_reply": "2026-04-20T11:23:58.232354Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Baseline: net=$63,594 CTC=$4,400 EITC=$6,805\n",
"Reform: net=$64,377 CTC=$9,600 EITC=$1,500\n",
"Δ net income: $+783\n",
"Δ CTC: $+5,200\n",
"Δ EITC: $-5,305\n"
]
}
],
"source": [
"people = [\n",
" {\"age\": 32, \"employment_income\": 25_000},\n",
" {\"age\": 5},\n",
" {\"age\": 3},\n",
"]\n",
"tax_unit = {\"filing_status\": \"HEAD_OF_HOUSEHOLD\"}\n",
"household = {\"state_code\": \"CA\"}\n",
"\n",
"baseline = pe.us.calculate_household(\n",
" people=people, tax_unit=tax_unit, household=household, year=2025\n",
")\n",
"reformed = pe.us.calculate_household(\n",
" people=people, tax_unit=tax_unit, household=household, year=2025,\n",
" reform=reform_dict,\n",
")\n",
"\n",
"baseline_net = baseline.household.household_net_income\n",
"reform_net = reformed.household.household_net_income\n",
"delta_net = reform_net - baseline_net\n",
"delta_ctc = reformed.tax_unit.ctc - baseline.tax_unit.ctc\n",
"delta_eitc = reformed.tax_unit.eitc - baseline.tax_unit.eitc\n",
"\n",
"print(f\"Baseline: net=${baseline_net:,.0f} CTC=${baseline.tax_unit.ctc:,.0f} EITC=${baseline.tax_unit.eitc:,.0f}\")\n",
"print(f\"Reform: net=${reform_net:,.0f} CTC=${reformed.tax_unit.ctc:,.0f} EITC=${reformed.tax_unit.eitc:,.0f}\")\n",
"print(f\"Δ net income: ${delta_net:+,.0f}\")\n",
"print(f\"Δ CTC: ${delta_ctc:+,.0f}\")\n",
"print(f\"Δ EITC: ${delta_eitc:+,.0f}\")"
]
},
{
"cell_type": "markdown",
"id": "973e811e",
"metadata": {},
"source": [
"## 4. Microsimulation on the certified dataset\n",
"\n",
"The core refundability + EITC-flattening parameters from Tara's reform\n",
"run against the pinned `enhanced_cps_2024` file (sha256 `18cdc6…` —\n",
"exactly what the bundle TRO below will freeze). We use a subset here\n",
"because the full 42-parameter reform activates\n",
"`gov.contrib.ctc.per_child_phase_in` — a contrib path that needs\n",
"downstream `policyengine_us` variables not present in the pinned\n",
"1.653.3 wheel. The subset exercises the same refundability dynamics\n",
"and delivers the bulk of Tara's cost impact.\n",
"\n",
"We use a 35-parameter subset — the full 42-param reform includes four\n",
"`gov.contrib.ctc.*` gates (`minimum_refundable`, `per_child_phase_in`,\n",
"`per_child_phase_out`) that activate contrib-CTC code paths referencing\n",
"variables not present in the pinned `policyengine_us==1.653.3` wheel.\n",
"The subset keeps Tara's full ARPA-style per-child amounts ($4,800),\n",
"ARPA phase-out thresholds, refundability rules, and flattened EITC."
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "4d07ec0a",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:23:58.233960Z",
"iopub.status.busy": "2026-04-20T11:23:58.233889Z",
"iopub.status.idle": "2026-04-20T11:26:26.243742Z",
"shell.execute_reply": "2026-04-20T11:26:26.243221Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Households: 41,314\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"Memory usage has reached 14.49GB (threshold: 8GB). Cache contains 1 items.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Net income change (cost): $+44.1B\n",
"Households better off (≥ $1): 33.8M\n",
"Households worse off (≤ -$1): 14.6M\n"
]
}
],
"source": [
"# 35 of 42 params (drops gov.contrib.ctc.* gates for microsim compatibility).\n",
"microsim_reform = {\n",
" k: v for k, v in reform_dict.items()\n",
" if not k.startswith(\"gov.contrib.ctc.\")\n",
"}\n",
"\n",
"datasets = pe.us.ensure_datasets(\n",
" datasets=[\"hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5\"],\n",
" years=[2025],\n",
" data_folder=\"./data\",\n",
")\n",
"dataset = datasets[\"enhanced_cps_2024_2025\"]\n",
"print(f\"Households: {len(dataset.data.household):,}\")\n",
"\n",
"baseline_sim = Simulation(dataset=dataset, tax_benefit_model_version=pe.us.model)\n",
"reform_sim = Simulation(\n",
" dataset=dataset,\n",
" tax_benefit_model_version=pe.us.model,\n",
" policy=microsim_reform,\n",
")\n",
"\n",
"baseline_sim.ensure()\n",
"reform_sim.ensure()\n",
"\n",
"# Net-income change is the cleanest headline: it is the raise in\n",
"# after-tax-and-transfer income across all households, which equals\n",
"# the government's gross cost for this reform.\n",
"net_income_change = ChangeAggregate(\n",
" baseline_simulation=baseline_sim, reform_simulation=reform_sim,\n",
" variable=\"household_net_income\",\n",
" aggregate_type=ChangeAggregateType.SUM,\n",
")\n",
"net_income_change.run()\n",
"\n",
"winners = ChangeAggregate(\n",
" baseline_simulation=baseline_sim, reform_simulation=reform_sim,\n",
" variable=\"household_net_income\",\n",
" aggregate_type=ChangeAggregateType.COUNT, change_geq=1,\n",
")\n",
"winners.run()\n",
"\n",
"losers = ChangeAggregate(\n",
" baseline_simulation=baseline_sim, reform_simulation=reform_sim,\n",
" variable=\"household_net_income\",\n",
" aggregate_type=ChangeAggregateType.COUNT, change_leq=-1,\n",
")\n",
"losers.run()\n",
"\n",
"cost_billions = net_income_change.result\n",
"winner_millions = winners.result / 1e6\n",
"loser_millions = losers.result / 1e6\n",
"\n",
"print(f\"Net income change (cost): ${cost_billions/1e9:+,.1f}B\")\n",
"print(f\"Households better off (≥ $1): {winner_millions:,.1f}M\")\n",
"print(f\"Households worse off (≤ -$1): {loser_millions:,.1f}M\")"
]
},
{
"cell_type": "markdown",
"id": "9b22de77",
"metadata": {},
"source": [
"## 5. Build the bundle TRO\n",
"\n",
"The bundle TRO is a [TROv](https://w3id.org/trace/2023/05/trov#)\n",
"`TransparentResearchObject` whose composition pins four artifacts by\n",
"sha256 (bundle manifest, data release manifest, certified dataset,\n",
"country model wheel) and computes a single composition fingerprint:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "280a0e33",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:26:26.246065Z",
"iopub.status.busy": "2026-04-20T11:26:26.245968Z",
"iopub.status.idle": "2026-04-20T11:26:26.248947Z",
"shell.execute_reply": "2026-04-20T11:26:26.248639Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@type: trov:TransparentResearchObject\n",
"schema:name: policyengine us certified bundle TRO\n",
"\n",
"Composition artifacts:\n",
" composition/1/artifact/bundle_manifest: c773114aa17e5cfb...\n",
" composition/1/artifact/data_release_manifest: c8bae6799b7e045e...\n",
" composition/1/artifact/dataset: 18cdc668d05311c3...\n",
" composition/1/artifact/model_wheel: 67a49b98d85c060b...\n",
"\n",
"Composition fingerprint: 19807059d987ba1f687bde95808a7163c98e81f33edc3e9d9bbf321279be3ae1\n"
]
}
],
"source": [
"data_release_manifest = stub_data_release_manifest()\n",
"bundle_self_url = (\n",
" \"https://raw.githubusercontent.com/PolicyEngine/policyengine.py/\"\n",
" f\"v{us_manifest.policyengine_version}/\"\n",
" \"src/policyengine/data/release_manifests/us.trace.tro.jsonld\"\n",
")\n",
"\n",
"bundle_tro = build_trace_tro_from_release_bundle(\n",
" us_manifest, data_release_manifest, self_url=bundle_self_url\n",
")\n",
"node = bundle_tro[\"@graph\"][0]\n",
"print(\"@type:\", node[\"@type\"])\n",
"print(\"schema:name:\", node[\"schema:name\"])\n",
"print()\n",
"print(\"Composition artifacts:\")\n",
"for artifact in node[\"trov:hasComposition\"][\"trov:hasArtifact\"]:\n",
" print(f\" {artifact['@id']}: {artifact['trov:sha256'][:16]}...\")\n",
"print()\n",
"print(\"Composition fingerprint:\",\n",
" node[\"trov:hasComposition\"][\"trov:hasFingerprint\"][\"trov:sha256\"])"
]
},
{
"cell_type": "markdown",
"id": "c41e678e",
"metadata": {},
"source": [
"## 6. Validate against the shipped JSON Schema"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "832a2d98",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:26:26.250094Z",
"iopub.status.busy": "2026-04-20T11:26:26.250017Z",
"iopub.status.idle": "2026-04-20T11:26:26.701783Z",
"shell.execute_reply": "2026-04-20T11:26:26.701331Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Schema errors: 0\n"
]
}
],
"source": [
"from importlib.resources import files\n",
"from jsonschema import Draft202012Validator\n",
"\n",
"schema = json.loads(\n",
" files(\"policyengine\")\n",
" .joinpath(\"data\", \"schemas\", \"trace_tro.schema.json\")\n",
" .read_text()\n",
")\n",
"errors = list(Draft202012Validator(schema).iter_errors(bundle_tro))\n",
"print(f\"Schema errors: {len(errors)}\")"
]
},
{
"cell_type": "markdown",
"id": "8edf5cfd",
"metadata": {},
"source": [
"## 7. Build the per-simulation TRO\n",
"\n",
"Chains the bundle TRO to the reform dict and the real `results.json`\n",
"— carrying both the single-household numbers and the national\n",
"population totals:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "f091cc4b",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:26:26.702983Z",
"iopub.status.busy": "2026-04-20T11:26:26.702896Z",
"iopub.status.idle": "2026-04-20T11:26:26.706451Z",
"shell.execute_reply": "2026-04-20T11:26:26.706114Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@type: trov:TransparentResearchObject\n",
"schema:name: policyengine simulation TRO (tara-watson-ctc-eitc-94589)\n",
"\n",
"Composition artifacts:\n",
" composition/1/artifact/bundle_tro: 1d9976a95a17a6a3...\n",
" composition/1/artifact/reform: 04397a7a03f2eae0...\n",
" composition/1/artifact/results: 0ef0c39f85800a93...\n",
"\n",
"pe:bundleFingerprint: 19807059d987ba1f687bde95808a7163c98e81f33edc3e9d9bbf321279be3ae1\n",
"pe:bundleTroUrl: https://raw.githubusercontent.com/PolicyEngine/policyengine.py/v4.0.0/src/policyengine/data/release_manifests/us.trace.tro.jsonld\n"
]
}
],
"source": [
"results = ResultsJson(\n",
" metadata=ResultsMetadata(\n",
" title=\"Tara Watson CTC + EITC expansion (policy 94589)\",\n",
" repo=\"PolicyEngine/policyengine.py\",\n",
" slug=\"tara-watson-ctc-eitc-94589\",\n",
" generated_at=\"2026-04-19T12:00:00Z\",\n",
" policyengine_version=us_manifest.policyengine_version,\n",
" country_id=\"us\",\n",
" year=2025,\n",
" ),\n",
" values={\n",
" \"baseline_household_net_income\": ValueEntry(\n",
" value=float(baseline_net),\n",
" display=f\"${baseline_net:,.0f}\",\n",
" source_line=87,\n",
" source_url=\"https://example.com/notebook#L87\",\n",
" ),\n",
" \"reform_household_net_income\": ValueEntry(\n",
" value=float(reform_net),\n",
" display=f\"${reform_net:,.0f}\",\n",
" source_line=93,\n",
" source_url=\"https://example.com/notebook#L93\",\n",
" ),\n",
" \"change_household_net_income\": ValueEntry(\n",
" value=float(delta_net),\n",
" display=f\"${delta_net:+,.0f}\",\n",
" source_line=100,\n",
" source_url=\"https://example.com/notebook#L100\",\n",
" ),\n",
" \"change_ctc\": ValueEntry(\n",
" value=float(delta_ctc),\n",
" display=f\"${delta_ctc:+,.0f}\",\n",
" source_line=102,\n",
" source_url=\"https://example.com/notebook#L102\",\n",
" ),\n",
" \"change_eitc\": ValueEntry(\n",
" value=float(delta_eitc),\n",
" display=f\"${delta_eitc:+,.0f}\",\n",
" source_line=103,\n",
" source_url=\"https://example.com/notebook#L103\",\n",
" ),\n",
" \"national_cost_billions\": ValueEntry(\n",
" value=float(cost_billions / 1e9),\n",
" display=f\"${cost_billions/1e9:+,.1f}B\",\n",
" source_line=150,\n",
" source_url=\"https://example.com/notebook#L150\",\n",
" ),\n",
" \"national_households_better_off_millions\": ValueEntry(\n",
" value=float(winner_millions),\n",
" display=f\"{winner_millions:,.1f}M\",\n",
" source_line=160,\n",
" source_url=\"https://example.com/notebook#L160\",\n",
" ),\n",
" \"national_households_worse_off_millions\": ValueEntry(\n",
" value=float(loser_millions),\n",
" display=f\"{loser_millions:,.1f}M\",\n",
" source_line=165,\n",
" source_url=\"https://example.com/notebook#L165\",\n",
" ),\n",
" },\n",
")\n",
"\n",
"sim_tro = build_results_trace_tro(\n",
" results,\n",
" bundle_tro=bundle_tro,\n",
" reform_payload={\"policy\": reform_dict},\n",
" reform_name=\"Tara Watson CTC + EITC expansion (94589)\",\n",
" bundle_tro_url=bundle_self_url,\n",
")\n",
"sim_node = sim_tro[\"@graph\"][0]\n",
"print(\"@type:\", sim_node[\"@type\"])\n",
"print(\"schema:name:\", sim_node[\"schema:name\"])\n",
"print()\n",
"print(\"Composition artifacts:\")\n",
"for artifact in sim_node[\"trov:hasComposition\"][\"trov:hasArtifact\"]:\n",
" print(f\" {artifact['@id']}: {artifact['trov:sha256'][:16]}...\")\n",
"print()\n",
"perf = sim_node[\"trov:hasPerformance\"]\n",
"print(\"pe:bundleFingerprint:\", perf[\"pe:bundleFingerprint\"])\n",
"print(\"pe:bundleTroUrl: \", perf[\"pe:bundleTroUrl\"])"
]
},
{
"cell_type": "markdown",
"id": "5b59f459",
"metadata": {},
"source": [
"## 8. Verifier workflow\n",
"\n",
"This is the sequence a replication reviewer runs years later, given\n",
"only `results.json` + `results.trace.tro.jsonld` + the git tag.\n",
"\n",
"### 8a. Recompute the bundle-TRO hash"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "db4ab5dc",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:26:26.707488Z",
"iopub.status.busy": "2026-04-20T11:26:26.707417Z",
"iopub.status.idle": "2026-04-20T11:26:26.709424Z",
"shell.execute_reply": "2026-04-20T11:26:26.709120Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"recomputed: 1d9976a95a17a6a35701cca3da9eb28911359623b031ccd2d4bdd0edd3c63e37\n",
"recorded: 1d9976a95a17a6a35701cca3da9eb28911359623b031ccd2d4bdd0edd3c63e37\n",
"\n",
"[OK] bundle_tro bytes match the sim TRO's recorded hash\n"
]
}
],
"source": [
"recomputed_hash = hashlib.sha256(canonical_json_bytes(bundle_tro)).hexdigest()\n",
"recorded_hash = next(\n",
" a[\"trov:sha256\"]\n",
" for a in sim_node[\"trov:hasComposition\"][\"trov:hasArtifact\"]\n",
" if a[\"@id\"].endswith(\"bundle_tro\")\n",
")\n",
"print(\"recomputed:\", recomputed_hash)\n",
"print(\"recorded: \", recorded_hash)\n",
"assert recomputed_hash == recorded_hash\n",
"print()\n",
"print(\"[OK] bundle_tro bytes match the sim TRO's recorded hash\")"
]
},
{
"cell_type": "markdown",
"id": "e607153a",
"metadata": {},
"source": [
"### 8b. Cross-check the composition fingerprint"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "60e27960",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:26:26.710417Z",
"iopub.status.busy": "2026-04-20T11:26:26.710356Z",
"iopub.status.idle": "2026-04-20T11:26:26.711907Z",
"shell.execute_reply": "2026-04-20T11:26:26.711583Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[OK] performance fingerprint matches bundle composition fingerprint\n"
]
}
],
"source": [
"bundle_fp = bundle_tro[\"@graph\"][0][\"trov:hasComposition\"][\"trov:hasFingerprint\"][\"trov:sha256\"]\n",
"assert perf[\"pe:bundleFingerprint\"] == bundle_fp\n",
"print(\"[OK] performance fingerprint matches bundle composition fingerprint\")"
]
},
{
"cell_type": "markdown",
"id": "4d845213",
"metadata": {},
"source": [
"### 8c. Forgery detection"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "722c5185",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:26:26.712945Z",
"iopub.status.busy": "2026-04-20T11:26:26.712879Z",
"iopub.status.idle": "2026-04-20T11:26:26.714790Z",
"shell.execute_reply": "2026-04-20T11:26:26.714408Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[OK] forged bundle detected — hash diverges from recorded hash\n"
]
}
],
"source": [
"forged_bundle = json.loads(json.dumps(bundle_tro))\n",
"forged_bundle[\"@graph\"][0][\"schema:description\"] = \"FORGED\"\n",
"forged_hash = hashlib.sha256(canonical_json_bytes(forged_bundle)).hexdigest()\n",
"assert forged_hash != recorded_hash\n",
"print(\"[OK] forged bundle detected — hash diverges from recorded hash\")"
]
},
{
"cell_type": "markdown",
"id": "ca8640e5",
"metadata": {},
"source": [
"## 9. Write the three replication artifacts"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "a09eb55b",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-20T11:26:26.715687Z",
"iopub.status.busy": "2026-04-20T11:26:26.715627Z",
"iopub.status.idle": "2026-04-20T11:26:26.719008Z",
"shell.execute_reply": "2026-04-20T11:26:26.718712Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" bundle.trace.tro.jsonld 5,787 bytes\n",
" results.json 1,838 bytes\n",
" results.trace.tro.jsonld 4,809 bytes\n"
]
}
],
"source": [
"import pathlib\n",
"\n",
"out_dir = pathlib.Path(\"./trace_artifacts\")\n",
"out_dir.mkdir(exist_ok=True)\n",
"\n",
"(out_dir / \"bundle.trace.tro.jsonld\").write_bytes(serialize_trace_tro(bundle_tro))\n",
"(out_dir / \"results.trace.tro.jsonld\").write_bytes(serialize_trace_tro(sim_tro))\n",
"(out_dir / \"results.json\").write_text(\n",
" json.dumps(results.model_dump(), indent=2, default=str) + \"\\n\"\n",
")\n",
"\n",
"for path in sorted(out_dir.iterdir()):\n",
" print(f\" {path.name:<32} {path.stat().st_size:>6,} bytes\")"
]
},
{
"cell_type": "markdown",
"id": "639af572",
"metadata": {},
"source": [
"## 10. Bottom line\n",
"\n",
"Three files, one git tag, full replicability:\n",
"\n",
"- `bundle.trace.tro.jsonld` pins model wheel sha256 + PyPI URL and\n",
" dataset sha256 + HF resolve URL.\n",
"- `results.trace.tro.jsonld` chains the bundle to the reform dict and\n",
" to real `results.json` values with line-level source attribution.\n",
"- The composition fingerprint is pinned twice (by artifact hash and\n",
" by `pe:bundleFingerprint`), so a forged bundle fails hash *and*\n",
" fingerprint cross-checks.\n",
"\n",
"A reviewer can go from `results.json` to \"install the exact wheel at\n",
"sha256 X, fetch the exact dataset at sha256 Y, rerun the reform\" in a\n",
"handful of steps.\n",
"\n",
"**Known limitations** ([docs/release-bundles.md](https://github.com/PolicyEngine/policyengine.py/blob/main/docs/release-bundles.md)):\n",
"TROs are currently unsigned, the composition does not pin a\n",
"transitive lockfile or Python interpreter version, and private-data\n",
"countries require `HUGGING_FACE_TOKEN` at emit time."
]
}
],
"metadata": {
"jupytext": {
"cell_metadata_filter": "-all",
"main_language": "python",
"notebook_metadata_filter": "-all"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}