@MaxGhenis
Last active April 18, 2026 15:47
PolicyEngine TRACE TRO walkthrough — executed notebook showing bundle TRO, per-simulation TRO chaining, and verifier workflow per TROv 2023/05 vocabulary (PolicyEngine/policyengine.py#237)
{
"cells": [
{
"cell_type": "markdown",
"id": "200f7baa",
"metadata": {
"papermill": {
"duration": 0.001654,
"end_time": "2026-04-18T15:47:39.758343+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:39.756689+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"# PolicyEngine TRACE TRO walkthrough\n",
"\n",
"This notebook demonstrates the TRACE Transparent Research Object (TRO)\n",
"pipeline that landed in [PolicyEngine/policyengine.py#237](https://github.com/PolicyEngine/policyengine.py/pull/237).\n",
"It shows the end-to-end flow a replication reviewer would follow:\n",
"\n",
"1. Build a **certified runtime bundle TRO** for a country, pinning the\n",
" model wheel + data package + dataset + bundle manifest by sha256.\n",
"2. Validate the TRO against the shipped JSON Schema.\n",
"3. Run a simulation and produce a **per-simulation TRO** chaining the\n",
" bundle TRO to a reform and a `results.json` payload.\n",
"4. Walk the **verifier workflow**: recompute hashes, detect a forged\n",
" bundle, and cross-check the composition fingerprint.\n",
"\n",
"The TROv vocabulary used is\n",
"[https://w3id.org/trace/2023/05/trov#](https://w3id.org/trace/2023/05/trov#),\n",
"the same one the public\n",
"[trov-demos](https://github.com/transparency-certified/trov-demos)\n",
"validate against. PolicyEngine extensions live under the `pe:` namespace.\n",
"\n",
"**Setup**: we stub the data release manifest in-process so the notebook\n",
"runs without network access. In production `policyengine trace-tro us`\n",
"fetches the manifest from Hugging Face.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "3d5e7f4f",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-18T15:47:39.761209Z",
"iopub.status.busy": "2026-04-18T15:47:39.761080Z",
"iopub.status.idle": "2026-04-18T15:47:40.654384Z",
"shell.execute_reply": "2026-04-18T15:47:40.653836Z"
},
"papermill": {
"duration": 0.895655,
"end_time": "2026-04-18T15:47:40.655044+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:39.759389+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [],
"source": [
"import hashlib\n",
"import json\n",
"from pprint import pprint\n",
"\n",
"from policyengine.core.release_manifest import (\n",
"    DataReleaseManifest,\n",
"    get_release_manifest,\n",
")\n",
"from policyengine.core.trace_tro import (\n",
"    build_trace_tro_from_release_bundle,\n",
"    build_simulation_trace_tro,\n",
"    canonical_json_bytes,\n",
"    extract_bundle_tro_reference,\n",
"    serialize_trace_tro,\n",
")\n",
"from policyengine.results import (\n",
"    ResultsJson,\n",
"    ResultsMetadata,\n",
"    ValueEntry,\n",
"    build_results_trace_tro,\n",
")\n",
"\n",
"\n",
"def stub_data_release_manifest():\n",
"    \"\"\"Stub modeled on the live HF release_manifest.json for policyengine-us-data 1.73.0.\n",
"\n",
"    The git_sha and data_build_fingerprint values are placeholders.\n",
"    \"\"\"\n",
"    return DataReleaseManifest.model_validate(\n",
"        {\n",
"            \"schema_version\": 1,\n",
"            \"data_package\": {\"name\": \"policyengine-us-data\", \"version\": \"1.73.0\"},\n",
"            \"build\": {\n",
"                \"build_id\": \"policyengine-us-data-1.73.0\",\n",
"                \"built_at\": \"2026-04-10T12:00:00Z\",\n",
"                \"built_with_model_package\": {\n",
"                    \"name\": \"policyengine-us\",\n",
"                    \"version\": \"1.647.0\",\n",
"                    \"git_sha\": \"deadbeef\",\n",
"                    \"data_build_fingerprint\": \"sha256:example-fingerprint\",\n",
"                },\n",
"            },\n",
"            \"compatible_model_packages\": [],\n",
"            \"default_datasets\": {\"national\": \"enhanced_cps_2024\"},\n",
"            \"artifacts\": {\n",
"                \"enhanced_cps_2024\": {\n",
"                    \"kind\": \"microdata\",\n",
"                    \"path\": \"enhanced_cps_2024.h5\",\n",
"                    \"repo_id\": \"policyengine/policyengine-us-data\",\n",
"                    \"revision\": \"1.73.0\",\n",
"                    \"sha256\": \"18cdc668d05311c32ae37364abcea89b0221c27154559667e951c7b19f5b5cbd\",\n",
"                    \"size_bytes\": 111427742,\n",
"                }\n",
"            },\n",
"        }\n",
"    )\n"
]
},
{
"cell_type": "markdown",
"id": "31306dc1",
"metadata": {
"papermill": {
"duration": 0.000984,
"end_time": "2026-04-18T15:47:40.657541+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:40.656557+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 1. Inspect the bundled US release manifest\n",
"\n",
"`policyengine.py` ships a country release manifest under\n",
"`data/release_manifests/us.json`. It pins the exact model package\n",
"version, data package version, certified dataset, and model wheel\n",
"sha256 + URL:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "bc112e69",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-18T15:47:40.660070Z",
"iopub.status.busy": "2026-04-18T15:47:40.659862Z",
"iopub.status.idle": "2026-04-18T15:47:40.663761Z",
"shell.execute_reply": "2026-04-18T15:47:40.663337Z"
},
"papermill": {
"duration": 0.005844,
"end_time": "2026-04-18T15:47:40.664202+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:40.658358+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'bundle_id': 'us-3.4.0',\n",
" 'certification': {'built_with_model_git_sha': None,\n",
"                   'built_with_model_version': '1.647.0',\n",
"                   'certified_by': 'policyengine.py bundled manifest',\n",
"                   'certified_for_model_version': '1.647.0',\n",
"                   'compatibility_basis': 'exact_build_model_version',\n",
"                   'data_build_fingerprint': None,\n",
"                   'data_build_id': 'policyengine-us-data-1.73.0'},\n",
" 'certified_data_artifact': {'build_id': 'policyengine-us-data-1.73.0',\n",
"                             'dataset': 'enhanced_cps_2024',\n",
"                             'sha256': '18cdc668d05311c32ae37364abcea89b0221c27154559667e951c7b19f5b5cbd'},\n",
" 'model_package': {'name': 'policyengine-us',\n",
"                   'sha256': '50e64bf910772b224cdc2b5af5a3414f976f68a9e1748107da7e1de6e325425c',\n",
"                   'version': '1.647.0',\n",
"                   'wheel_url': 'https://files.pythonhosted.org/packages/2a/96/4814f2630395350915d819452d7684f232c9b8df1d9ba5c279f3b6d02c17/policyengine_us-1.647.0-py3-none-any.whl'},\n",
" 'policyengine_version': '3.4.0'}\n"
]
}
],
"source": [
"us_manifest = get_release_manifest(\"us\")\n",
"bundle_fields = {\n",
"    \"bundle_id\": us_manifest.bundle_id,\n",
"    \"policyengine_version\": us_manifest.policyengine_version,\n",
"    \"model_package\": {\n",
"        \"name\": us_manifest.model_package.name,\n",
"        \"version\": us_manifest.model_package.version,\n",
"        \"sha256\": us_manifest.model_package.sha256,\n",
"        \"wheel_url\": us_manifest.model_package.wheel_url,\n",
"    },\n",
"    \"certified_data_artifact\": {\n",
"        \"dataset\": us_manifest.certified_data_artifact.dataset,\n",
"        \"build_id\": us_manifest.certified_data_artifact.build_id,\n",
"        \"sha256\": us_manifest.certified_data_artifact.sha256,\n",
"    },\n",
"    \"certification\": us_manifest.certification.model_dump(),\n",
"}\n",
"pprint(bundle_fields)\n"
]
},
{
"cell_type": "markdown",
"id": "939d00b2",
"metadata": {
"papermill": {
"duration": 0.000867,
"end_time": "2026-04-18T15:47:40.666214+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:40.665347+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 2. Build the bundle TRO\n",
"\n",
"The bundle TRO is a [TROv](https://w3id.org/trace/2023/05/trov#)\n",
"`TransparentResearchObject` whose composition pins four artifacts by\n",
"sha256: the bundle manifest, the data release manifest, the certified\n",
"dataset, and the country model wheel. A composition fingerprint binds\n",
"them together as a single immutable identifier."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "702bec0f",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-18T15:47:40.668453Z",
"iopub.status.busy": "2026-04-18T15:47:40.668358Z",
"iopub.status.idle": "2026-04-18T15:47:40.670708Z",
"shell.execute_reply": "2026-04-18T15:47:40.670328Z"
},
"papermill": {
"duration": 0.004219,
"end_time": "2026-04-18T15:47:40.671096+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:40.666877+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@type: trov:TransparentResearchObject\n",
"schema:name: policyengine us certified bundle TRO\n",
"\n",
"Composition artifacts:\n",
" composition/1/artifact/bundle_manifest: 943ceb2fc33a6b10...\n",
" composition/1/artifact/data_release_manifest: c8bae6799b7e045e...\n",
" composition/1/artifact/dataset: 18cdc668d05311c3...\n",
" composition/1/artifact/model_wheel: 50e64bf910772b22...\n",
"\n",
"Composition fingerprint: 4e6ec328f035eb9f7ac50c1d23efc1b3575cca522c3078d7fed692959ea32bb2\n"
]
}
],
"source": [
"data_release_manifest = stub_data_release_manifest()\n",
"\n",
"bundle_tro = build_trace_tro_from_release_bundle(\n",
"    us_manifest,\n",
"    data_release_manifest,\n",
"    self_url=(\n",
"        \"https://raw.githubusercontent.com/PolicyEngine/policyengine.py/\"\n",
"        \"v3.4.5/src/policyengine/data/release_manifests/us.trace.tro.jsonld\"\n",
"    ),\n",
")\n",
"\n",
"# Summary of what's in the TRO\n",
"node = bundle_tro[\"@graph\"][0]\n",
"print(\"@type:\", node[\"@type\"])\n",
"print(\"schema:name:\", node[\"schema:name\"])\n",
"print()\n",
"print(\"Composition artifacts:\")\n",
"for artifact in node[\"trov:hasComposition\"][\"trov:hasArtifact\"]:\n",
"    print(f\" {artifact['@id']}: {artifact['trov:sha256'][:16]}...\")\n",
"print()\n",
"print(\"Composition fingerprint:\",\n",
"      node[\"trov:hasComposition\"][\"trov:hasFingerprint\"][\"trov:sha256\"])\n"
]
},
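{
"cell_type": "markdown",
"id": "sketch-fingerprint",
"metadata": {
"tags": []
},
"source": [
"**Sketch: how a composition fingerprint can bind hashes.** The cell above\n",
"prints a fingerprint but not how it is derived. The snippet below is a\n",
"minimal sketch under one assumption: the fingerprint is the sha256 of the\n",
"sorted, concatenated artifact digests. `composition_fingerprint` is a\n",
"hypothetical helper, not a `policyengine` API, and the real derivation may\n",
"differ, so it is not expected to reproduce the value printed above.\n",
"\n",
"```python\n",
"import hashlib\n",
"\n",
"\n",
"def composition_fingerprint(artifact_hashes):\n",
"    \"\"\"Hypothetical helper: sha256 over sorted, concatenated hex digests.\"\"\"\n",
"    joined = \"\".join(sorted(artifact_hashes))\n",
"    return hashlib.sha256(joined.encode(\"utf-8\")).hexdigest()\n",
"\n",
"\n",
"# Toy digests standing in for the four pinned artifacts.\n",
"names = (\"bundle_manifest\", \"data_release_manifest\", \"dataset\", \"model_wheel\")\n",
"digests = [hashlib.sha256(name.encode(\"utf-8\")).hexdigest() for name in names]\n",
"\n",
"fingerprint = composition_fingerprint(digests)\n",
"# Artifact order does not matter because the digests are sorted first.\n",
"assert fingerprint == composition_fingerprint(list(reversed(digests)))\n",
"# Swapping any one artifact changes the fingerprint.\n",
"tampered = digests[:-1] + [hashlib.sha256(b\"other_wheel\").hexdigest()]\n",
"assert fingerprint != composition_fingerprint(tampered)\n",
"print(fingerprint)\n",
"```\n",
"\n",
"Sorting first makes the result independent of artifact order, while a\n",
"change to any single digest produces a different fingerprint."
]
},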
{
"cell_type": "markdown",
"id": "73916b86",
"metadata": {
"papermill": {
"duration": 0.000706,
"end_time": "2026-04-18T15:47:40.672806+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:40.672100+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"### 2a. Machine-readable certification\n",
"\n",
"Certification metadata lives under the `pe:` namespace on the\n",
"`TransparentResearchPerformance` node, so downstream tooling can read\n",
"it without parsing prose:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "cdea0837",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-18T15:47:40.675229Z",
"iopub.status.busy": "2026-04-18T15:47:40.675158Z",
"iopub.status.idle": "2026-04-18T15:47:40.676814Z",
"shell.execute_reply": "2026-04-18T15:47:40.676542Z"
},
"papermill": {
"duration": 0.003585,
"end_time": "2026-04-18T15:47:40.677296+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:40.673711+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"{'pe:builtWithModelVersion': '1.647.0',\n",
" 'pe:certifiedBy': 'policyengine.py bundled manifest',\n",
" 'pe:certifiedForModelVersion': '1.647.0',\n",
" 'pe:compatibilityBasis': 'exact_build_model_version',\n",
" 'pe:dataBuildId': 'policyengine-us-data-1.73.0',\n",
" 'pe:emittedIn': 'local'}\n"
]
}
],
"source": [
"performance = node[\"trov:hasPerformance\"]\n",
"pe_fields = {k: v for k, v in performance.items() if k.startswith(\"pe:\")}\n",
"pprint(pe_fields)\n"
]
},
{
"cell_type": "markdown",
"id": "48acdc82",
"metadata": {
"papermill": {
"duration": 0.000953,
"end_time": "2026-04-18T15:47:40.679414+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:40.678461+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"### 2b. Inspect a full JSON-LD artifact\n",
"\n",
"Every `trov:hasArtifactLocation` uses a dereferenceable HTTPS URL or a\n",
"well-known local path relative to the shipped wheel:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "b1a49836",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-18T15:47:40.682256Z",
"iopub.status.busy": "2026-04-18T15:47:40.682165Z",
"iopub.status.idle": "2026-04-18T15:47:40.683824Z",
"shell.execute_reply": "2026-04-18T15:47:40.683532Z"
},
"papermill": {
"duration": 0.003953,
"end_time": "2026-04-18T15:47:40.684291+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:40.680338+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"arrangement/1/location/bundle_manifest\n",
" -> data/release_manifests/us.json\n",
"arrangement/1/location/data_release_manifest\n",
" -> https://huggingface.co/policyengine/policyengine-us-data/resolve/1.73.0/release_manifest.json\n",
"arrangement/1/location/dataset\n",
" -> https://huggingface.co/policyengine/policyengine-us-data/resolve/1.73.0/enhanced_cps_2024.h5\n",
"arrangement/1/location/model_wheel\n",
" -> https://files.pythonhosted.org/packages/2a/96/4814f2630395350915d819452d7684f232c9b8df1d9ba5c279f3b6d02c17/policyengine_us-1.647.0-py3-none-any.whl\n"
]
}
],
"source": [
"for location in node[\"trov:hasArrangement\"][0][\"trov:hasArtifactLocation\"]:\n",
"    print(f\"{location['@id']}\")\n",
"    print(f\" -> {location['trov:hasLocation']}\")\n"
]
},
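{
"cell_type": "markdown",
"id": "sketch-file-hash",
"metadata": {
"tags": []
},
"source": [
"**Sketch: checking a downloaded artifact.** Each location above pairs with a\n",
"`trov:sha256` in the composition, so a reviewer can download an artifact and\n",
"compare digests. The snippet below is a minimal sketch of that comparison.\n",
"`sha256_of_file` is a hypothetical helper written for this walkthrough, and a\n",
"temporary file stands in for a real download.\n",
"\n",
"```python\n",
"import hashlib\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def sha256_of_file(path, chunk_size=1 << 20):\n",
"    \"\"\"Hypothetical helper: hash a file in chunks so large datasets fit in memory.\"\"\"\n",
"    digest = hashlib.sha256()\n",
"    with open(path, \"rb\") as handle:\n",
"        for chunk in iter(lambda: handle.read(chunk_size), b\"\"):\n",
"            digest.update(chunk)\n",
"    return digest.hexdigest()\n",
"\n",
"\n",
"payload = b\"example artifact bytes\"\n",
"recorded = hashlib.sha256(payload).hexdigest()  # stands in for trov:sha256\n",
"\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    sample = Path(tmp) / \"artifact.bin\"\n",
"    sample.write_bytes(payload)\n",
"    recomputed = sha256_of_file(sample)\n",
"\n",
"assert recomputed == recorded\n",
"print(recomputed)\n",
"```\n",
"\n",
"Reading in chunks keeps memory use flat, which matters for a dataset of\n",
"roughly 111 MB like `enhanced_cps_2024.h5`."
]
},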
{
"cell_type": "markdown",
"id": "6f37e14a",
"metadata": {
"papermill": {
"duration": 0.001014,
"end_time": "2026-04-18T15:47:40.686435+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:40.685421+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 3. Validate the TRO against the shipped JSON Schema\n",
"\n",
"The schema at\n",
"`policyengine/data/schemas/trace_tro.schema.json` checks structural\n",
"fields, canonical 64-char sha256s, required `pe:emittedIn`, and that\n",
"every `trov:hasLocation` is HTTPS or a well-known local path. A\n",
"`file:///` URL or path-traversal attempt would fail validation."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "92cfe4c6",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-18T15:47:40.689374Z",
"iopub.status.busy": "2026-04-18T15:47:40.689281Z",
"iopub.status.idle": "2026-04-18T15:47:41.089682Z",
"shell.execute_reply": "2026-04-18T15:47:41.089032Z"
},
"papermill": {
"duration": 0.402512,
"end_time": "2026-04-18T15:47:41.090223+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:40.687711+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Schema errors: 0\n"
]
}
],
"source": [
"from importlib.resources import files\n",
"\n",
"from jsonschema import Draft202012Validator\n",
"\n",
"schema = json.loads(\n",
"    files(\"policyengine\").joinpath(\"data\", \"schemas\", \"trace_tro.schema.json\").read_text()\n",
")\n",
"errors = list(Draft202012Validator(schema).iter_errors(bundle_tro))\n",
"print(f\"Schema errors: {len(errors)}\")\n"
]
},
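{
"cell_type": "markdown",
"id": "sketch-location-rule",
"metadata": {
"tags": []
},
"source": [
"**Sketch: the kind of location rule the schema enforces.** The actual pattern\n",
"in `trace_tro.schema.json` is not reproduced in this notebook. As an\n",
"illustration only, the hypothetical `is_allowed_location` below accepts HTTPS\n",
"URLs and relative paths without `..` segments, and rejects other URL schemes\n",
"and absolute paths.\n",
"\n",
"```python\n",
"def is_allowed_location(location):\n",
"    \"\"\"Hypothetical check; not the pattern shipped in trace_tro.schema.json.\"\"\"\n",
"    if not location:\n",
"        return False\n",
"    if location.startswith(\"https://\"):\n",
"        return True\n",
"    # Reject every other URL scheme (file://, http://, ...) and absolute paths.\n",
"    if \"://\" in location or location.startswith(\"/\"):\n",
"        return False\n",
"    # Reject path traversal in relative paths.\n",
"    return \"..\" not in location.split(\"/\")\n",
"\n",
"\n",
"candidates = [\n",
"    \"https://huggingface.co/policyengine/policyengine-us-data/resolve/1.73.0/release_manifest.json\",\n",
"    \"data/release_manifests/us.json\",\n",
"    \"file:///etc/passwd\",\n",
"    \"../../etc/passwd\",\n",
"]\n",
"for candidate in candidates:\n",
"    print(is_allowed_location(candidate), candidate)\n",
"```\n",
"\n",
"The first two candidates are locations printed in section 2b; the last two\n",
"are the rejected cases named in the paragraph above."
]
},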
{
"cell_type": "markdown",
"id": "e9fe85eb",
"metadata": {
"papermill": {
"duration": 0.000796,
"end_time": "2026-04-18T15:47:41.092606+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:41.091810+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 4. Build a per-simulation TRO\n",
"\n",
"Now simulate a reform and produce a per-simulation TRO that chains the\n",
"bundle to a reform specification and the `results.json` payload."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "266ef18c",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-18T15:47:41.095385Z",
"iopub.status.busy": "2026-04-18T15:47:41.095261Z",
"iopub.status.idle": "2026-04-18T15:47:41.098229Z",
"shell.execute_reply": "2026-04-18T15:47:41.097795Z"
},
"papermill": {
"duration": 0.005374,
"end_time": "2026-04-18T15:47:41.098824+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:41.093450+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"@type: trov:TransparentResearchObject\n",
"schema:name: policyengine simulation TRO (salt-cap-repeal)\n",
"\n",
"Composition artifacts:\n",
" composition/1/artifact/bundle_tro: b614f22621035a33...\n",
" composition/1/artifact/reform: d8a84a794df5bdd2...\n",
" composition/1/artifact/results: 9a0aa60c5b10457a...\n",
"\n",
"pe:bundleFingerprint: 4e6ec328f035eb9f7ac50c1d23efc1b3575cca522c3078d7fed692959ea32bb2\n",
"pe:bundleTroUrl: https://raw.githubusercontent.com/PolicyEngine/policyengine.py/v3.4.5/src/policyengine/data/release_manifests/us.trace.tro.jsonld\n"
]
}
],
"source": [
"# A synthetic reform + results payload. In practice these come from\n",
"# a PolicyEngine simulation run.\n",
"import numpy as np\n",
"\n",
"reform = {\"policy\": {\"gov.irs.deductions.salt.cap\": {\"2026-01-01\": np.inf}}}\n",
"\n",
"results = ResultsJson(\n",
"    metadata=ResultsMetadata(\n",
"        title=\"SALT cap repeal\",\n",
"        repo=\"PolicyEngine/analyses\",\n",
"        slug=\"salt-cap-repeal\",\n",
"        generated_at=\"2026-04-18T15:00:00Z\",\n",
"        policyengine_version=us_manifest.policyengine_version,\n",
"        country_id=\"us\",\n",
"        year=2026,\n",
"    ),\n",
"    values={\n",
"        \"budget_impact_billions\": ValueEntry(\n",
"            value=-15200000000,\n",
"            display=\"$15.2 billion\",\n",
"            source_line=47,\n",
"            source_url=\"https://github.com/PolicyEngine/analyses/blob/main/salt.py#L47\",\n",
"        ),\n",
"    },\n",
")\n",
"\n",
"bundle_url = (\n",
"    \"https://raw.githubusercontent.com/PolicyEngine/policyengine.py/\"\n",
"    \"v3.4.5/src/policyengine/data/release_manifests/us.trace.tro.jsonld\"\n",
")\n",
"\n",
"sim_tro = build_results_trace_tro(\n",
"    results,\n",
"    bundle_tro=bundle_tro,\n",
"    reform_payload=reform,\n",
"    reform_name=\"SALT cap repeal\",\n",
"    bundle_tro_url=bundle_url,\n",
")\n",
"\n",
"sim_node = sim_tro[\"@graph\"][0]\n",
"print(\"@type:\", sim_node[\"@type\"])\n",
"print(\"schema:name:\", sim_node[\"schema:name\"])\n",
"print()\n",
"print(\"Composition artifacts:\")\n",
"for artifact in sim_node[\"trov:hasComposition\"][\"trov:hasArtifact\"]:\n",
"    print(f\" {artifact['@id']}: {artifact['trov:sha256'][:16]}...\")\n",
"print()\n",
"perf = sim_node[\"trov:hasPerformance\"]\n",
"print(\"pe:bundleFingerprint:\", perf[\"pe:bundleFingerprint\"])\n",
"print(\"pe:bundleTroUrl: \", perf[\"pe:bundleTroUrl\"])\n"
]
},
{
"cell_type": "markdown",
"id": "82dcb998",
"metadata": {
"papermill": {
"duration": 0.001335,
"end_time": "2026-04-18T15:47:41.101319+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:41.099984+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 5. The verifier workflow\n",
"\n",
"This is the sequence a replication reviewer runs years later, given\n",
"only `results.json` + `results.trace.tro.jsonld` + the git tag.\n",
"\n",
"### 5a. Recompute the bundle TRO hash\n",
"\n",
"Fetch `pe:bundleTroUrl` (here, locally — in production over HTTPS),\n",
"recompute the sha256 over canonical JSON bytes, and compare against\n",
"the `bundle_tro` artifact hash in the simulation TRO."
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "339aa35a",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-18T15:47:41.104006Z",
"iopub.status.busy": "2026-04-18T15:47:41.103906Z",
"iopub.status.idle": "2026-04-18T15:47:41.106179Z",
"shell.execute_reply": "2026-04-18T15:47:41.105757Z"
},
"papermill": {
"duration": 0.004065,
"end_time": "2026-04-18T15:47:41.106598+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:41.102533+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"recomputed: b614f22621035a33234c4a4c9daeab804a63f2f451ccf3bd3557dc7e6f1ada9c\n",
"recorded: b614f22621035a33234c4a4c9daeab804a63f2f451ccf3bd3557dc7e6f1ada9c\n",
"\n",
"[OK] bundle_tro bytes match the sim TRO's recorded hash\n"
]
}
],
"source": [
"# In production: bundle_tro = json.loads(requests.get(perf[\"pe:bundleTroUrl\"]).content)\n",
"# Here we reuse the bundle_tro we built above; the hash is what matters.\n",
"recomputed_hash = hashlib.sha256(canonical_json_bytes(bundle_tro)).hexdigest()\n",
"\n",
"recorded_hash = next(\n",
"    a[\"trov:sha256\"]\n",
"    for a in sim_node[\"trov:hasComposition\"][\"trov:hasArtifact\"]\n",
"    if a[\"@id\"].endswith(\"bundle_tro\")\n",
")\n",
"\n",
"print(\"recomputed:\", recomputed_hash)\n",
"print(\"recorded: \", recorded_hash)\n",
"assert recomputed_hash == recorded_hash\n",
"print()\n",
"print(\"[OK] bundle_tro bytes match the sim TRO's recorded hash\")\n"
]
},
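{
"cell_type": "markdown",
"id": "sketch-canonical-json",
"metadata": {
"tags": []
},
"source": [
"**Sketch: why canonical bytes matter.** `canonical_json_bytes` is imported\n",
"from `policyengine.core.trace_tro` and its implementation is not shown in\n",
"this notebook. Assuming it behaves like sorted-key, compact UTF-8 JSON (an\n",
"assumption, not a confirmed detail), the hypothetical stand-in below shows\n",
"that key order and whitespace in a fetched file do not affect the hash.\n",
"\n",
"```python\n",
"import hashlib\n",
"import json\n",
"\n",
"\n",
"def canonical_bytes_sketch(obj):\n",
"    \"\"\"Hypothetical stand-in; the real canonical_json_bytes may differ.\"\"\"\n",
"    text = json.dumps(obj, sort_keys=True, separators=(\",\", \":\"), ensure_ascii=False)\n",
"    return text.encode(\"utf-8\")\n",
"\n",
"\n",
"# Two serializations of the same document: different key order and spacing.\n",
"first = json.loads('{\"b\": 1, \"a\": [1, 2]}')\n",
"second = json.loads('{ \"a\": [1, 2],   \"b\": 1 }')\n",
"\n",
"hash_first = hashlib.sha256(canonical_bytes_sketch(first)).hexdigest()\n",
"hash_second = hashlib.sha256(canonical_bytes_sketch(second)).hexdigest()\n",
"assert hash_first == hash_second\n",
"print(hash_first)\n",
"```\n",
"\n",
"Hashing the raw downloaded bytes instead would make the check sensitive to\n",
"formatting differences that carry no meaning."
]
},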
{
"cell_type": "markdown",
"id": "5a249c0d",
"metadata": {
"papermill": {
"duration": 0.000983,
"end_time": "2026-04-18T15:47:41.108510+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:41.107527+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"### 5b. Cross-check the composition fingerprint\n",
"\n",
"The simulation TRO also records the bundle's composition fingerprint as\n",
"`pe:bundleFingerprint` on its performance node. The fingerprint inside the\n",
"fetched bundle TRO must match it, which gives the verifier a second anchor\n",
"alongside the whole-file hash from 5a."
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "a1cf27f7",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-18T15:47:41.111053Z",
"iopub.status.busy": "2026-04-18T15:47:41.110962Z",
"iopub.status.idle": "2026-04-18T15:47:41.112629Z",
"shell.execute_reply": "2026-04-18T15:47:41.112356Z"
},
"papermill": {
"duration": 0.003626,
"end_time": "2026-04-18T15:47:41.113180+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:41.109554+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bundle fingerprint: 4e6ec328f035eb9f7ac50c1d23efc1b3575cca522c3078d7fed692959ea32bb2\n",
"pe:bundleFingerprint: 4e6ec328f035eb9f7ac50c1d23efc1b3575cca522c3078d7fed692959ea32bb2\n",
"\n",
"[OK] performance fingerprint matches bundle composition fingerprint\n"
]
}
],
"source": [
"bundle_fp = bundle_tro[\"@graph\"][0][\"trov:hasComposition\"][\"trov:hasFingerprint\"][\"trov:sha256\"]\n",
"print(\"bundle fingerprint: \", bundle_fp)\n",
"print(\"pe:bundleFingerprint: \", perf[\"pe:bundleFingerprint\"])\n",
"assert perf[\"pe:bundleFingerprint\"] == bundle_fp\n",
"print()\n",
"print(\"[OK] performance fingerprint matches bundle composition fingerprint\")\n"
]
},
{
"cell_type": "markdown",
"id": "fda8e874",
"metadata": {
"papermill": {
"duration": 0.001172,
"end_time": "2026-04-18T15:47:41.115745+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:41.114573+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"### 5c. Forgery detection\n",
"\n",
"If a paper author tampers with the bundle TRO (e.g., rewrites a\n",
"description or swaps an artifact), the recomputed hash no longer\n",
"matches the one recorded in the sim TRO. Step 5a catches it."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "174e9fa4",
"metadata": {
"execution": {
"iopub.execute_input": "2026-04-18T15:47:41.118925Z",
"iopub.status.busy": "2026-04-18T15:47:41.118828Z",
"iopub.status.idle": "2026-04-18T15:47:41.120923Z",
"shell.execute_reply": "2026-04-18T15:47:41.120490Z"
},
"papermill": {
"duration": 0.004306,
"end_time": "2026-04-18T15:47:41.121409+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:41.117103+00:00",
"status": "completed"
},
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"original: b614f22621035a33234c4a4c9daeab804a63f2f451ccf3bd3557dc7e6f1ada9c\n",
"forged: 7d06e52c63a785c8ea8baf7a560338311d548549c2bf464989d79e28ae849e98\n",
"\n",
"[OK] forged bundle detected — hash diverges from the sim TRO's recorded hash\n"
]
}
],
"source": [
"forged_bundle = json.loads(json.dumps(bundle_tro)) # deep copy\n",
"forged_bundle[\"@graph\"][0][\"schema:description\"] = \"FORGED\"\n",
"\n",
"forged_hash = hashlib.sha256(canonical_json_bytes(forged_bundle)).hexdigest()\n",
"print(\"original: \", recorded_hash)\n",
"print(\"forged: \", forged_hash)\n",
"assert forged_hash != recorded_hash\n",
"print()\n",
"print(\"[OK] forged bundle detected — hash diverges from the sim TRO's recorded hash\")\n"
]
},
{
"cell_type": "markdown",
"id": "fb7bf2a0",
"metadata": {
"papermill": {
"duration": 0.001008,
"end_time": "2026-04-18T15:47:41.123791+00:00",
"exception": false,
"start_time": "2026-04-18T15:47:41.122783+00:00",
"status": "completed"
},
"tags": []
},
"source": [
"## 6. Bottom line for replication review\n",
"\n",
"A published PolicyEngine simulation TRO carries, in a single\n",
"`results.trace.tro.jsonld` file:\n",
"\n",
"- The exact model wheel sha256 and its PyPI download URL\n",
"- The exact dataset sha256 and its Hugging Face resolve URL\n",
"- The certified bundle's composition fingerprint (pinned twice — by\n",
" artifact hash and by `pe:bundleFingerprint`)\n",
"- The immutable `pe:bundleTroUrl` so the verifier can fetch the bundle\n",
" independently of whoever wrote the simulation TRO\n",
"- Whether the TRO was emitted under CI or locally\n",
" (`pe:emittedIn`, plus `pe:ciRunUrl` / `pe:ciGitSha` when CI)\n",
"\n",
"A reviewer can go from `results.json` to \"install the exact wheel at\n",
"sha256 X, fetch the exact dataset at sha256 Y, rerun the reform\" in a\n",
"handful of steps. That closes the concrete gap flagged by Tara\n",
"Watson's experience with the fall-2025 PolicyEngine proposal.\n",
"\n",
"**Known limitations** (documented at\n",
"[docs/release-bundles.md](https://github.com/PolicyEngine/policyengine.py/blob/main/docs/release-bundles.md)):\n",
"TROs are currently unsigned, the composition does not yet pin a\n",
"transitive lockfile or Python interpreter version, and private-data\n",
"countries require `HUGGING_FACE_TOKEN` at emit time.\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.14.0"
},
"papermill": {
"default_parameters": {},
"duration": 2.403945,
"end_time": "2026-04-18T15:47:41.441554+00:00",
"environment_variables": {},
"exception": null,
"input_path": "trace_tro_walkthrough.ipynb",
"output_path": "trace_tro_walkthrough.executed.ipynb",
"parameters": {},
"start_time": "2026-04-18T15:47:39.037609+00:00",
"version": "2.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}