❯ The goal is to prevent developers on the team from accidentally forgetting to keep the Rust code compatible with both old and new contract versions. There are 3 scenarios: 1) contracts upgraded first with a breaking change before the Rust update, 2) contracts with a breaking change not upgraded yet, but Rust deploys, 3) contracts upgraded and the Rust update also deployed, but for Newton Policy / Newton Policy Data / Policy Clients (that extend the NewtonPolicyClient mixin), there is a grace period for developers to migrate/update their policy clients to the new newton policy or policy data. This means the Rust update that pairs with the breaking-change contract upgrade should support both versions during this grace period (until version enforcement happens). 1 and 2 are about preventing the protocol from literally failing, while 3 is about external developers' policy clients suddenly starting to fail without having time to become compatible. /clarify-plan THINK HARD
⏺ I'll work through this as a design problem. The three scenarios sort into two fundamentally different problem classes, which matters for what prevention mechanism fits each.
★ Insight ─────────────────────────────────────
- Scenarios 1+2 are atomic rollout problems (the AVS itself breaks during deploy transitions). Fix is deploy ordering + pre-deploy CI.
- Scenario 3 is a long-tail multi-version support problem (external-dev UX, not protocol integrity). Fix is dual-path code that Rust devs cannot forget to write.
- The hard part of #3 isn't "add tolerance code" — we know how to do that. It's making forgetting it structurally impossible (type system, CI, lint) rather than relying on reviewers remembering.
─────────────────────────────────────────────────
Problem framing
Mapping to enforcement mechanisms:
┌────────────────────────────────────────┬────────────────────┬─────────────┬──────────────────────────────────────────────────┐
│ Scenario                               │ Failure class      │ Who gets    │ Natural prevention                               │
│                                        │                    │ hurt        │                                                  │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 1 — contracts ahead of Rust            │ protocol-down      │ us (newton) │ deploy-pipeline ordering +                       │
│                                        │                    │             │ MIN_COMPATIBLE_VERSION                           │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 2 — Rust ahead of contracts            │ protocol-down      │ us (newton) │ deploy-pipeline ordering + pre-deploy selector   │
│                                        │                    │             │ probe                                            │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 3 — policy proxies stuck on old impl   │ client-policy-down │ external    │ dual-version Rust code paths that can't be       │
│ during grace period                    │                    │ devs        │ accidentally skipped                             │
└────────────────────────────────────────┴────────────────────┴─────────────┴──────────────────────────────────────────────────┘
The core hazard in #3: NewtonPolicy / NewtonPolicyData are initialize-once proxies. When we ship a new version, the impl on chain is new but all existing proxies still delegate to the old bytecode. External devs have to redeploy their policies to opt in. Rust must gracefully serve both until enforcement.
Clarifying questions (need answers before writing the plan)
Before I draft the actual implementation, I need to resolve ambiguity on these:
Q1 — Version detection. How does Rust know which version a given policy proxy is on?
- (a) NewtonPolicy.PROTOCOL_VERSION() selector (does this exist on old impls? if not, asking it causes the same selector-miss we're trying to avoid)
- (b) ERC-165 supportsInterface(bytes4) probe per new selector (robust but requires adding supportsInterface to NewtonPolicy in v0.3.0+)
- (c) Bytecode classification (cast code + known hash per impl) — works today with zero contract changes but brittle if impl is redeployed
- (d) PolicyClientRegistry stores declared version at registration time — clean but needs registry writes on every upgrade
Which direction do you want? (b) is my recommendation; (c) works as a stopgap without any contract work (see the sketch below).
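Purely as illustration, option (c) can be as small as hashing the fetched runtime bytecode and looking it up in a table of known impl hashes. A minimal sketch, assuming hypothetical names (PolicyImplVersion, classify_impl, the hash table); none of this is existing code:

```rust
use alloy::primitives::{keccak256, B256};

/// Known impl versions we can classify against. Purely illustrative.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PolicyImplVersion {
    V0_2,
    V0_3,
}

/// Option (c): hash the proxy impl's runtime code (e.g. fetched via eth_getCode)
/// and look it up among hashes of the impls we have shipped. Returns None for an
/// unknown impl, which is exactly the "brittle if impl is redeployed" caveat.
pub fn classify_impl(
    runtime_code: &[u8],
    known_hashes: &[(B256, PolicyImplVersion)],
) -> Option<PolicyImplVersion> {
    let hash = keccak256(runtime_code);
    known_hashes.iter().find(|(h, _)| *h == hash).map(|(_, v)| *v)
}
```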
Q2 — Grace period trigger. Is the end of grace period:
- (a) A fixed date baked into Rust at release time (grace_period_end: 2026-07-15)
- (b) An admin-flipped on-chain flag (e.g., MIN_POLICY_VERSION on PolicyClientRegistry)
- (c) Both — Rust has default, admin can extend/shorten
(b) is more flexible but means Rust must query it. (a) is rigid but simpler.
Q3 — Scope of the compat layer. Is this only for NewtonPolicy and NewtonPolicyData selectors called at task time? Or also:
- Policy deployment path (NewtonPolicyFactory.deployPolicy signature changes)?
- Events emitted by old vs new NewtonPolicy (events don't have selector-miss semantics but ABI changes matter)?
- NewtonPolicyClient mixin internals (which external devs compile into their own contracts)?
The mixin is the trickiest because it lives in external-dev bytecode — we can't upgrade it at all. External devs inherit the mixin version at compile time of their client.
Q4 — Policy version enumeration. Do we want active tracking of "which policy_client is on which impl"?
- Useful for: alerting external devs before grace expires, dashboards
- Requires: crawler that walks PolicyClientRegistry + classifies each proxy
- If yes, where does it live — gateway, a new service, CI job that publishes to a dashboard?
Q5 — Deprecation communication channel. When we detect a legacy policy client at task time, do we:
- Emit a structured log line only (devs may never see it)
- Emit a Prometheus metric legacy_policy_detections_total{policy_client, version, days_until_grace_end} (ops can alert)
- Return a deprecation warning header in the RPC response (external devs see it)
- Auto-create a Linear ticket per detected client
My instinct is all four at different cadences, but which are in scope?
Q6 — Existing tolerance patterns. The v0.3.0 incident surfaced getPolicyCodeHash as an example. Before we design the general system, can you confirm:
- Did we ship tolerance for that in Rust in a follow-up PR, or was the fix "redeploy all policies"?
- If tolerance exists, where? (It will be the seed pattern for the general BackwardCompat abstraction.)
Q7 — Release calendar visibility. How do we want grace periods to be discoverable across the team?
- (a) .claude/rules/lessons.md entry per release
- (b) docs/COMPATIBILITY_MATRIX.md — single source of truth listing every BackwardCompat, when it expires, which version introduced it
- (c) Linear project "Protocol Compatibility" with one ticket per open compat window
Draft plan (pending answers above)
Regardless of how Q1–Q7 resolve, the plan will have four tiers. The choices above determine how each tier is implemented.
Tier 1 — Atomic rollout (scenarios 1 + 2, protocol self-compat).
- Per-release: pre-deploy CI smoke test exercises gateway → operator → TaskManager on a fork against current production contract state. Fails before deploy if Rust can't talk to on-chain contracts.
- Per-contract-PR: require same-PR Rust change OR explicit "no Rust impact" label. Enforced via GitHub required-label check.
- Deploy pipeline: ordering contract — deploy-trigger.yml already does contracts-first on stagef; formalize prod ordering (contracts first, bake time, then Rust) in runbook.
- MIN_COMPATIBLE_VERSION already enforces the semver floor on-chain; keep it as the last line of defense.
Tier 2 — Typed dual-version abstraction (scenario 3, policy proxy grace period).
Introduce crates/core/src/compat/ module with:
pub struct BackwardCompat<T> {
    new_value: Option<T>,         // None = legacy proxy
    legacy_behavior: T,           // what we did pre-v0.X
    grace_end: chrono::NaiveDate,
    rollout_ticket: &'static str, // Linear ID for visibility
}
impl<T: Clone> BackwardCompat<T> {
    pub fn resolve(&self, now: chrono::NaiveDate) -> Result<T, CompatError> {
        match (&self.new_value, now >= self.grace_end) {
            (Some(v), _) => Ok(v.clone()),
            (None, false) => {
                metrics::inc_legacy_policy_detection(self.rollout_ticket);
                Ok(self.legacy_behavior.clone())
            }
            (None, true) => Err(CompatError::GraceExpired {
                // variant fields illustrative
                grace_end: self.grace_end,
                ticket: self.rollout_ticket,
            }),
        }
    }
}
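A hypothetical call site, just to show the shape (PolicyHandle, get_policy_code_hash_opt, and the ticket ID are illustrative, the grace date reuses the example from Q2, and this assumes the fields are public or there is a constructor):

```rust
async fn effective_code_hash(policy: &PolicyHandle) -> Result<[u8; 32], CompatError> {
    let compat = BackwardCompat {
        // None when the proxy's impl predates the selector (selector miss).
        new_value: policy.get_policy_code_hash_opt().await,
        // Pre-upgrade behavior: no code-hash binding enforced.
        legacy_behavior: [0u8; 32],
        grace_end: chrono::NaiveDate::from_ymd_opt(2026, 7, 15).unwrap(),
        rollout_ticket: "NEWT-XXX", // placeholder Linear ID
    };
    compat.resolve(chrono::Utc::now().date_naive())
}
```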
Every new required contract selector MUST go through this type. Enforced via:
- Custom clippy lint (or dylint/ast_grep-based CI check) that flags direct calls to a registered set of "new selectors" outside of the compat:: module.
- Codeowner rule: changes to crates/core/src/compat/ require Dennis + one more reviewer.
Tier 3 — CI enforcement of the compatibility lifecycle.
- just check-grace-periods: walks all BackwardCompat instances (via a registry macro), fails CI if any grace_end < today (see the sketch after this list). Forces the dev to either:
  - Remove the compat wrapper (use the new path directly) and prove all proxies migrated.
  - Extend the grace_end with justification.
- just test-legacy-compat: fixture test that deploys a snapshot of the previous policy impl and exercises Rust code paths against it. Ships as a required CI job on PRs that touch contracts or compat/.
- Scheduled CI job (weekly): queries PolicyClientRegistry on each network, classifies each proxy by impl, publishes dashboard + alerts when >0 clients are on impl version nearing grace expiry.
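A minimal sketch of the check-grace-periods walk, assuming a hand-rolled static registry rather than the registry macro (CompatEntry, COMPAT_REGISTRY, and the date format are illustrative, not existing code):

```rust
pub struct CompatEntry {
    pub feature: &'static str,
    pub grace_end: &'static str, // "YYYY-MM-DD"
    pub rollout_ticket: &'static str,
}

pub static COMPAT_REGISTRY: &[CompatEntry] = &[
    // Each BackwardCompat use registers one entry here (via the registry macro in the real plan).
];

fn main() {
    let today = chrono::Utc::now().date_naive();
    let mut expired = false;
    for entry in COMPAT_REGISTRY {
        let end = chrono::NaiveDate::parse_from_str(entry.grace_end, "%Y-%m-%d")
            .expect("grace_end must be YYYY-MM-DD");
        if end < today {
            eprintln!(
                "grace period expired: {} ({}) ended {}",
                entry.feature, entry.rollout_ticket, entry.grace_end
            );
            expired = true;
        }
    }
    if expired {
        // Fail CI: either delete the compat wrapper or extend grace_end with justification.
        std::process::exit(1);
    }
}
```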
Tier 4 — External-dev communication.
- Runtime: operator logs legacy_policy_detected at warn! with policy_client address + grace_end. Gateway also emits Prometheus metric legacy_policy_detections_total.
- RPC layer: gateway adds x-newton-policy-version header on task responses when legacy detected — surfaces to integrations without requiring log access.
- Ops runbook: when dashboard shows legacy clients within 30 days of grace, ops emails external devs + files Linear tickets.
Tradeoffs to flag
- Type-driven enforcement vs lint-driven. Type-driven (wrap in BackwardCompat) is iron-clad but adds ceremony. Lint-driven (custom clippy/ast_grep check) is lighter but requires keeping the selector list curated. I'd pick type-driven for clarity.
- Static grace dates vs admin-flipped on-chain flag. Static is simpler; admin-flipped is flexible (extend grace if external dev ecosystem isn't migrating fast enough). A hybrid — Rust has default static date, admin flag can extend only — captures both safely.
- Fixture-based legacy tests vs on-fork tests. Fixtures (snapshot old bytecode) are fast + deterministic but grow stale. On-fork tests (against live stagef) are realistic but slow + flaky. Do both: fixtures on every PR, on-fork nightly.
- Cost of supporting old impls forever. Every BackwardCompat is technical debt. If we don't actively retire them, the compat layer grows unbounded. The grace_end CI check is what prevents this — but only if we don't just keep bumping the date.
What I need from you to write the implementation plan
Answering Q1–Q3 is minimum viable. Q4–Q7 shape Tiers 3+4 but don't block starting Tiers 1+2. Preferred path:
1. You answer Q1–Q3 (version detection mechanism, grace trigger, scope).
2. I write a focused implementation plan for Tiers 1 and 2 (protocol self-compat + typed dual-version abstraction) + scaffolding for the Tier 3 CI check.
3. Tiers 3 and 4 details get fleshed out based on Q4–Q7 answers in a follow-up.
★ Insight ─────────────────────────────────────
- The real systemic win from this work is removing the judgement call from developers. Today, "do I need tolerance here?" is a reviewer's job and fails silently. With BackwardCompat<T> as a required wrapper, "tolerance was added" becomes a compile-time fact and "tolerance is still needed" becomes a CI-enforced fact. Reviewer discipline goes from required to optional — which is the only way discipline actually survives.
- The mixin case (NewtonPolicyClient that external devs compile into their own contracts) is the worst version of this problem because we can't upgrade it at all. Worth thinking separately about whether future NewtonPolicyClient changes should use interface-based dispatch rather than mixin inheritance, so external devs don't have to recompile their contracts to get updates. That's an architectural shift, not a compat layer — flag for separate discussion.
─────────────────────────────────────────────────
Which of Q1–Q7 should I pursue first?
❯ agree with your decisions
⏺ ★ Insight ─────────────────────────────────────
- This plan touches 4 distinct concerns (contract, Rust, CI, process) but only one is a "real" feature — the per-feature strictness flag pattern. Everything else is enforcement scaffolding around it. That ratio is right: ship a small primitive, then make the team's habits enforce it.
- Naming the flag strictPolicyCodeHash rather than enforceV0_3_0PolicyBinding matters more than it seems. Per-feature naming means the flag can be deleted when the feature is no longer optional, leaving no version-numbered cruft. Per-release naming accumulates dead state forever.
- Hard-gating the pre-deploy check for stagef/prod (decision 4) is the highest-leverage piece. It catches all future selector-miss incidents — not just policy-code-hash. The existing v0.3.0 incident would have been caught by this check before the deploy ever fired.
─────────────────────────────────────────────────
Implementation Plan: Contract↔Rust Compatibility Discipline
Summary
Ship five things, in order of foundational dependency:
- Phase 1: Per-feature strictness flag infrastructure on PolicyClientRegistry + admin CLI command.
- Phase 2: Rust dual-path wiring in data-provider for the first concrete flag (strictPolicyCodeHash).
- Phase 3: just pre-deploy-check selector smoke test (Justfile + selector probe binary).
- Phase 4: Process discipline (PR template, release checklist, lessons entry codifying the pattern).
- Phase 5: Telemetry (legacy detection metric + weekly cron crawler).
Total estimated effort: ~3 engineer-days across phases. Phases 1+2 are sequential; Phases 3, 4, 5 are parallelizable once 1+2 land.
Phase 1 — Per-feature strictness flag infrastructure
1A. Contract change: PolicyClientRegistry
File: contracts/src/PolicyClientRegistry.sol (verify path — may be under a subdir).
Add:
/// @notice Per-feature strictness flags for breaking-change rollouts.
/// @dev Each flag gates Rust-side enforcement of a corresponding selector.
/// Default false (permissive) on deploy; admin flips to true after
/// external policy clients have migrated.
mapping(bytes32 => bool) public strictnessFlags;
event StrictnessFlagUpdated(bytes32 indexed feature, bool enabled);
function setStrictnessFlag(bytes32 feature, bool enabled) external onlyOwner {
strictnessFlags[feature] = enabled;
emit StrictnessFlagUpdated(feature, enabled);
}
function isStrict(bytes32 feature) external view returns (bool) {
return strictnessFlags[feature];
}
Why a generic mapping rather than named booleans: each future breaking change adds a bytes32 constant FEATURE_X =
keccak256("featureX") rather than a new storage slot — avoids storage layout migrations. The setStrictnessFlag(bytes32, bool)
signature stays stable forever.
The first real flag will be keccak256("policy_code_hash"). Defined as a constant in PolicyClientRegistry.sol for selector-stable
reference.
1B. Storage layout safety
Add a storage gap in PolicyClientRegistry if not already present. Run just snapshot-storage-layouts before, just
check-storage-layouts after to confirm no conflicts.
1C. Admin CLI command
File: crates/cli/src/main.rs — add subcommand under existing admin group.
just admin --command set-strictness-flag --feature policy_code_hash --enabled true --env stagef
The CLI computes keccak256("policy_code_hash") and calls PolicyClientRegistry.setStrictnessFlag(...) via existing admin signer
plumbing.
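A rough sketch of the subcommand's argument surface, assuming a clap-derive CLI; the struct and field names are hypothetical, and the actual contract call reuses the existing admin signer plumbing (not shown):

```rust
use alloy::primitives::{keccak256, B256};
use clap::Args;

#[derive(Args)]
pub struct SetStrictnessFlagArgs {
    /// Human-readable feature name, e.g. "policy_code_hash"; hashed before the call.
    #[arg(long)]
    pub feature: String,
    #[arg(long)]
    pub enabled: bool,
    #[arg(long)]
    pub env: String,
}

impl SetStrictnessFlagArgs {
    /// The bytes32 key passed to PolicyClientRegistry.setStrictnessFlag.
    pub fn feature_key(&self) -> B256 {
        keccak256(self.feature.as_bytes())
    }
}
```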
1D. Rust reader
File: crates/chainio/src/avs/reader.rs (or equivalent) — add:
pub async fn is_strictness_flag_enabled(
&self, feature: &str, chain_id: u64,
) -> Result<bool, ChainIoError> {
let registry = self.policy_client_registry(chain_id)?;
let key = alloy::primitives::keccak256(feature.as_bytes());
registry.isStrict(key).call().await
.map(|r| r._0)
.map_err(ChainIoError::from)
}
Phase 2 — Rust dual-path wiring (first flag: strictPolicyCodeHash)
2A. Identify call sites
The selector that broke v0.3.0 is getPolicyCodeHash() on NewtonPolicy. Find every Rust call site:
- crates/data-provider/src/lib.rs — primary policy-code-hash verification
- crates/operator/src/handler.rs — task-time invocation
- Possibly crates/challenger/src/source.rs — challenge path verification
2B. Cached strictness state
Strictness flag is read once per RPC server startup, cached behind a RwLock. External policy migration is slow enough that per-task
refresh is wasteful.
pub struct StrictnessCache {
flags: Arc<RwLock<HashMap<(u64, &'static str), bool>>>,
reader: Arc<AvsReader>, // the chainio reader from Phase 1D (concrete type assumed)
}
impl StrictnessCache {
pub async fn is_strict(&self, chain_id: u64, feature: &'static str) -> bool {
if let Some(&v) = self.flags.read().await.get(&(chain_id, feature)) {
return v;
}
let v = self.reader.is_strictness_flag_enabled(feature, chain_id)
.await.unwrap_or(false); // Conservative: default permissive on RPC failure
self.flags.write().await.insert((chain_id, feature), v);
v
}
}
Refresh policy: on each operator startup + on a 5-minute background tick (so flag flips propagate without restart).
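A sketch of that background tick, assuming the cache fields above are reachable from the refresher (e.g. it lives in the same module); task spawning and the chain-ID set are left to the caller:

```rust
use std::{sync::Arc, time::Duration};

/// Periodically re-reads every cached (chain_id, feature) pair so an admin flag
/// flip propagates without restarting the operator. 300s matches the 5-minute
/// refresh policy above.
async fn refresh_strictness_loop(cache: Arc<StrictnessCache>) {
    let mut tick = tokio::time::interval(Duration::from_secs(300));
    loop {
        tick.tick().await;
        let keys: Vec<(u64, &'static str)> = cache.flags.read().await.keys().copied().collect();
        for (chain_id, feature) in keys {
            if let Ok(v) = cache.reader.is_strictness_flag_enabled(feature, chain_id).await {
                cache.flags.write().await.insert((chain_id, feature), v);
            }
            // On RPC failure the previously cached value is kept (no overwrite).
        }
    }
}
```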
2C. Dual-path selector handler
The actual call site becomes:
let strict = strictness.is_strict(chain_id, "policy_code_hash").await;
match policy.getPolicyCodeHash().call().await {
Ok(hash) => verify_policy_code_hash_binding(hash._0, policy_bytes)?,
Err(e) if is_selector_miss(&e) && !strict => {
warn!(policy = %policy_addr, chain_id, "legacy policy proxy detected, skipping code hash binding");
metrics::inc_legacy_policy_detection(chain_id, "policy_code_hash");
}
Err(e) if is_selector_miss(&e) && strict => {
return Err(LegacyPolicyNotSupported {
policy: policy_addr,
feature: "policy_code_hash",
hint: "Policy proxy is on legacy impl; redeploy to v0.3.0 NewtonPolicy via factory",
});
}
Err(e) => return Err(e.into()),
}
is_selector_miss already exists as part of the classify_batch_item_revert work in crates/chainio/src/avs/errors.rs. Reuse it.
2D. Tests
- Unit test: is_strict=true + selector miss → LegacyPolicyNotSupported
- Unit test: is_strict=false + selector miss → Ok with metric incremented
- Unit test: is_strict=true + selector returns hash → verifies binding
- E2E (optional): deploy old NewtonPolicy impl + run gateway in non-strict mode, verify task succeeds with legacy warning
Phase 3 — Pre-deploy selector smoke test
3A. New Justfile recipe
# Pre-deploy compatibility check: probes that the current Rust binary's required
# contract selectors are present on the on-chain target deployment.
# Hard gate for stagef/prod, advisory for local.
pre-deploy-check chain_id env:
    cargo run --release --bin newton-compat-check -- \
        --chain-id {{chain_id}} --env {{env}}
3B. New binary: newton-compat-check
File: bin/newton-compat-check/src/main.rs.
Logic:
- Load deployment JSON for (chain_id, env).
- For each (contract, required_selector) in a curated list, do eth_call via gateway's RPC.
- Classify response: success → ok; 0x revert → selector miss → fail.
- For policy contracts, also enumerate PolicyClientRegistry and probe each registered policy proxy for the required selectors (skips proxies registered as legacy).
- Exit 0 (all good) or 1 (incompatibility) with structured stderr explaining which selectors miss on which addresses (a minimal sketch of the probe loop follows below).
The "curated list" of required selectors is the file the user contributes — it's the most decision-heavy spot.
3C. CI integration
In .github/workflows/deploy-trigger.yml, add a job that runs just pre-deploy-check against the target chain before any deploy job.
For env=local, run in advisory mode (warn, don't block). For env=stagef|prod, hard fail blocks deploy.
Phase 4 — Process discipline
4A. PR template addition
File: .github/PULL_REQUEST_TEMPLATE.md — add checkbox section:
Compatibility checklist (for PRs touching contracts/src/ or crates/{chainio,data-provider})
- [ ] Does this PR add a new required contract selector? If yes, was a corresponding strictnessFlag added to PolicyClientRegistry AND a Rust dual-path wired?
- [ ] Does this PR remove or change semantics of an existing selector? If yes, link the migration plan / lockstep release coordination.
- [ ] Does this PR add a new selector to NewtonPolicy or NewtonPolicyData? If yes, was the new feature flag added to the crates/data-provider selector list?
- [ ] If a previous strictnessFlag is being removed (hard cutover complete), is the flag also removed from PolicyClientRegistry?
4B. Release checklist update
File: docs/RELEASE_CHECKLIST.md (create if not exists). Section "Hard-Cutover Discipline":
Per-release compat audit
For each strictnessFlag currently set to true in production:
- Confirm no telemetry alerts on legacy_policy_detections_total{feature=<name>} for >30 days
- Remove the dual-path branch from Rust (delete the Err(SelectorMiss) if !strict arm)
- Remove the flag definition from PolicyClientRegistry
- Document removal in CHANGELOG
For each new breaking selector introduced this release:
- Add bytes32 constant FEATURE_X = keccak256("feature_x") to PolicyClientRegistry
- Add Rust dual-path with strictness.is_strict(chain_id, "feature_x") gate
- Add metric legacy_policy_detections_total{feature=feature_x}
- Document in .claude/rules/lessons.md under "feature-flag-gated breaking changes"
4C. Lessons entry
File: .claude/rules/lessons.md — new entry under "Multichain & cross-chain" or new section "Compatibility & rollout":
Breaking changes to NewtonPolicy/NewtonPolicyData require feature-flag-gated dual paths
Initialize-once proxies mean external policy clients lag behind impl upgrades; Rust hard-requiring a new selector breaks every
legacy proxy silently (selector miss = 0x revert at task time, surfaces as QuorumNotReached). Pattern: add a bytes32
strictness flag to PolicyClientRegistry, ship Rust with a dual-path that tolerates selector misses when the flag is false,
admin flips the flag via just admin --command set-strictness-flag after external migration, then the next release deletes the
legacy branch and the flag (hard cutover, max 2 versions live).
Prevention: Every new required selector on NewtonPolicy/NewtonPolicyData must wire through this pattern. PR template +
release checklist enforce. Pre-deploy just pre-deploy-check hard-gates against any selector miss in current production.
Phase 5 — Telemetry
5A. Metric definition
File: crates/metrics/src/lib.rs:
pub fn inc_legacy_policy_detection(chain_id: u64, feature: &str) {
metrics::counter!(
"legacy_policy_detections_total",
"chain_id" => chain_id.to_string(),
"feature" => feature.to_string(),
).increment(1);
}
5B. Weekly crawler
New just check-legacy-policy-clients --env stagef recipe that:
- Walks PolicyClientRegistry.getAllPolicyClients() (or equivalent enumeration).
- For each registered policy, probes each known feature selector.
- Reports counts: "X clients on legacy policy_code_hash" per feature.
- Output feeds ops decision on when to flip strictness flag.
Run as a GitHub Actions cron (weekly) and post results to a Slack channel or Linear ticket.
Spots where I'd love your input (5-10 lines each)
The plan above sketches the structure, but four specific decisions shape behavior in ways I don't want to guess at. Each is a spot
where the user (you) writing the actual code captures a domain decision.
Contribution 1 — Selector list for pre-deploy check
File to create: bin/newton-compat-check/src/required_selectors.rs
This is the highest-impact decision in the whole plan. It enumerates which contract selectors are required for current Rust to
function. Wrong list = either false positives (deploy blocked unnecessarily) or false negatives (silent breakage like v0.3.0).
// TODO: enumerate the selectors current Rust requires from each contract.
// Format: (contract_name, selector_name, criticality)
// criticality: Critical (block deploy on miss), Warning (advisory only)
pub const REQUIRED_SELECTORS: &[(&str, &str, Criticality)] = &[
// YOU FILL IN — examples:
// ("NewtonPolicy", "getPolicyCodeHash()", Criticality::Critical),
// ("NewtonPolicyData", "expireAfter()", Criticality::Critical),
// ("PolicyClientRegistry", "isStrict(bytes32)", Criticality::Critical),
// ("EpochRegistry", "currentEpoch()", Criticality::Critical),
// ...
];
Trade-offs:
- Too narrow: misses real incompatibilities → silent breakage at deploy
- Too broad: blocks deploys on selectors that have legacy fallbacks → false alarm fatigue
- Per-binary list (gateway vs operator vs challenger) vs unified: unified is simpler but may over-block
Contribution 2 — Strictness flag default behavior on RPC failure
File: crates/data-provider/src/strictness.rs (new), inside is_strict:
let v = self.reader.is_strictness_flag_enabled(feature, chain_id)
.await
// TODO: what should happen if RPC fails to read the flag?
// Option A: .unwrap_or(false) — conservative, assume permissive
// Option B: .unwrap_or(true) — strict, fail closed
// Option C: return Err and propagate to caller
.???
Trade-offs:
- (A) permissive on RPC failure: if RPC is down during a strict-mode rollout, legacy proxies still work — but new strict enforcement is delayed
- (B) strict on RPC failure: new enforcement is reliable — but RPC outage during rollout breaks all legacy clients without warning
- (C) propagate error: explicit, but every caller has to handle yet another error class
Contribution 3 — Admin authorization for flag flips
File: contracts/src/PolicyClientRegistry.sol, setStrictnessFlag modifier:
// TODO: who should be able to flip strictness flags?
// Option A: onlyOwner — single-key control (existing pattern)
// Option B: onlyOwner with per-feature allowlist — admin can only flip features they're authorized for
// Option C: AccessControl with STRICTNESS_ADMIN role — separate from contract upgrade rights
function setStrictnessFlag(bytes32 feature, bool enabled) external ??? {
strictnessFlags[feature] = enabled;
emit StrictnessFlagUpdated(feature, enabled);
}
Trade-offs:
- Strictness flips affect external developers' policy clients globally — high blast radius. Should this require multisig once Safe is live?
- Per-feature allowlists add complexity but let you delegate "Alice can flip policy_code_hash, Bob can flip expire_after" if team grows
Contribution 4 — Telemetry alerting threshold
File: docs/observability/alerts/legacy_policy_detections.yml (or wherever Datadog/Prom alerts live):
TODO: when should ops be alerted about legacy policy detections?
Consider:
- Threshold: any non-zero count vs >X per minute
- Window: 1m / 5m / 1h
- Severity: warn (informational) vs page (action required)
- Per-feature different thresholds?
Trade-offs:
- "Any non-zero" alerts on every legacy client interaction — noisy if many clients are on old impl during normal grace period
- "Spike-above-baseline" is quieter but may miss slow-burn issues
- Per-feature is right but adds maintenance overhead
Open questions
- Does PolicyClientRegistry already have an enumeration method (e.g., getAllPolicyClients())? If not, Phase 5 weekly crawler needs a different mechanism — could fall back to indexing PolicyClientRegistered events.
- Is ChainIoError already plumbed for selector-miss classification at the data-provider layer, or is it currently only at batch-submitter layer? Phase 2 may need to lift the helper.
- Should the pre-deploy check also probe NewtonPolicy proxies enumerated via PolicyClientRegistry (more thorough but slower), or just probe contract impls (fast)? My instinct: impl-only by default, full enumeration via --full flag for major releases.
Recommended sequencing
Day 1: Phase 1A-1D (contract + admin CLI + Rust reader)
Day 2: Phase 2A-2C (dual-path wiring + cache)
Day 2.5: Phase 2D (tests)
Day 3: Phase 3A-3C (pre-deploy check) || Phase 4 (process docs) — parallel
Day 3.5: Phase 5 (telemetry)
Phases 1+2 must land together in a single PR (contract + Rust dual-path are atomic). Phase 3 can ship in parallel since it's a
separate binary. Phase 4 is documentation-only. Phase 5 builds on Phase 2's metric.
★ Insight ─────────────────────────────────────
- The "spots for your input" pattern (selector list, default behavior, admin auth, alert thresholds) intentionally pushes the
high-leverage decisions to you — not because the work is hard but because each one carries domain judgment that's wrong to guess
at. Selector list curation is especially important: it's the file that makes pre-deploy check actually work, and it grows with
every breaking change. - The Phase 1 contract change uses mapping(bytes32 => bool) rather than named storage slots specifically to make Phase 4's "remove
the flag" step trivial — deleting an entry via setStrictnessFlag(FEATURE_X, false) and removing the constant declaration is cheaper
than removing a named storage variable (which would need a storage layout migration). - The trickiest part operationally will not be writing the code — it's the discipline to actually delete the legacy branch when the
flag flips. That's why Phase 4B explicitly lists "Remove the dual-path branch" as a release checklist item, not a TODO comment in
code. TODO comments rot; checklist items get reviewed.
─────────────────────────────────────────────────
Want me to start with Phase 1 (contract + admin CLI + Rust reader as a single PR), or would you rather lock in Contributions 1–4
first so Phase 2's behavior is fully specified?
Preview: Protocol Version Compatibility
Topic: Design plan for preventing contract↔Rust version drift incidents (post-v0.3.0 getPolicyCodeHash retrospective). Plan
approved, no code shipped yet.
Key decisions (confirmed this session):
- Flag location: PolicyClientRegistry (mapping bytes32 => bool, not named slots — avoids storage layout migrations)
- Flag granularity: per-feature (strictPolicyCodeHash) not per-release (strictV0_3_0) — trivial deletion when feature is no longer optional
- Default value: false (permissive on deploy, admin flips after grace period)
- Pre-deploy smoke test: hard gate for stagef/prod, advisory for local
- NewtonPolicyClient mixin: hard-cutover only, no compat layer (lives in external-dev bytecode)
Higher-level commitments:
- Max 2 versions concurrent (N + N-1) — hard cutover at every breaking change
- Grace period only for NewtonPolicy / NewtonPolicyData (everything else lockstep)
- Admin-flipped on-chain flag via just admin (not date-based, not CI-gated)
- Version detection: try-the-selector-catch-the-miss (no ERC-165, no bytecode hash)
Plan (5 phases, ~3 engineer-days):
- Phase 1: PolicyClientRegistry strictness mapping + admin CLI + Rust reader
- Phase 2: Rust dual-path wiring in crates/data-provider + strictness cache (5-min refresh)
- Phase 3: just pre-deploy-check selector smoke test (Justfile + newton-compat-check binary)
- Phase 4: PR template + release checklist + .claude/rules/lessons.md entry
- Phase 5: legacy_policy_detections_total metric + weekly cron crawler
Phases 1+2 ship as one atomic PR (contract + Rust dual-path together). Phases 3, 4, 5 are parallelizable after.
Open questions awaiting input:
- Selector list for pre-deploy check (per-binary vs unified) — highest-leverage decision
- Strictness flag default behavior on RPC failure (permissive vs strict vs propagate)
- Admin authorization (onlyOwner vs role-based vs multisig)
- Telemetry alerting threshold (any non-zero vs spike-above-baseline)
Related context:
- Root incident: v0.3.0 getPolicyCodeHash selector miss on legacy NewtonPolicy proxies → QuorumNotReached errors
- Lessons.md entries: "Four-site commitment" (PolicyCodeHashMismatch), "Strict checks on previously-optional fields must roll out after migration"
- Error selectors: 0x5092caa0 (PolicyCodeHashMismatch), 0xc2da1b55 (InvalidPolicyCodeHash)
- Resolves the design question Dennis raised after the v0.3.0 deploy incident: "how can we prevent the Protocol version backward
compatibility between contracts vs Rust code (operators)?"
Anything to add, correct, or remove before I save?
continue
⏺ Good pushback. The short answer: CI E2E proves "this commit works end-to-end on a fresh deploy" — pre-deploy proves "the live
target is compatible with the binary we're about to ship right now." Different questions, different blast radii.
★ Insight ─────────────────────────────────────
- E2E tests in CI always deploy fresh contracts, so every proxy in the test environment is on the impl matching that commit — they can never observe a legacy proxy that's been sitting on stagef for 3 weeks.
- The "Rust new + contracts old" scenario (incident #2) is structurally untestable in PR CI because the workflow checks out one Git SHA — there is no version skew to test against.
- Pre-deploy checks probe the actual on-chain state of the deploy target at the moment of deploy, catching drift between when CI green-lit and when the workflow fires.
─────────────────────────────────────────────────
What CI E2E actually catches
just e2e and just e2e --scenario * deploy contracts to anvil from contracts/script/, then run gateway/operator/aggregator against
them. Every policy proxy in those tests is freshly minted — NewtonPolicyFactory.deployPolicy() runs at test setup, so they always
carry the impl from the current commit.
That's enough to verify: code compiles, BLS aggregation works, multichain routing works, attestation roundtrips work, challenge
paths work. Real value, but the policy proxies under test are never "old."
What CI E2E cannot catch (the v0.3.0 class)
- Legacy proxies on stagef/prod. External policy clients deployed against the old NewtonPolicy impl don't get re-initialized when we upgrade the impl — they keep responding to the old selector set. CI fixtures don't include "deploy old policy → upgrade impl → run gateway" scenarios.
- External (non-Newton) policy clients. Third-party developers deploy their own PolicyClient extensions. We don't enumerate them in fixtures because we don't know about all of them. Pre-deploy could (a weekly registry crawl).
- Scenario 2 of the incident matrix. "Rust deploys before contracts" is impossible to model in CI because PR E2E pins one SHA — by definition contracts and Rust agree.
- Drift between CI and deploy. PR CI runs at SHA X. Deploy might fire hours later from main after additional contract changes merged. The chain state at the moment of deploy can differ from what CI saw.
Two questions, two tools
┌───────────────────────────────────────────────────────────────────────────┬──────────────────┐
│ Question │ Tool │
├───────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ Does this Git commit work end-to-end? │ CI E2E │
├───────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ Is the live (chain_id, env) target compatible with this binary right now? │ pre-deploy check │
└───────────────────────────────────────────────────────────────────────────┴──────────────────┘
E2E is a code-quality gate. Pre-deploy is an environment-compatibility gate. They don't substitute.
Three options
A. Keep Phase 3 as proposed. Pre-deploy check enumerates required selectors per binary, calls each against real production proxies,
fails the deploy on selector miss. Catches legacy + external + drift.
B. Drop Phase 3, expand E2E with legacy fixtures. Add a CI scenario that deploys an old NewtonPolicy impl, registers a proxy, then
runs current Rust gateway against it. Cheap to add, catches "Rust new + contracts old" structurally — but still doesn't probe
actual production state at deploy time, and can't enumerate external policy clients.
C. Hybrid (recommended). Both. Expanded E2E catches the regression class in PR review (fast feedback). Pre-deploy catches the
actual production state at deploy time (last-mile safety). The two costs add to maybe a day of work and one extra CI scenario.
If we only pick one, I'd keep Phase 3 over expanded E2E — pre-deploy is the only mechanism that probes real production state.
Expanded E2E is good hygiene but doesn't close the deploy-time gap.
Which way do you want to go? Once you decide, I'll fold it into the saved memory and finish the /save-progress workflow.
✻ Worked for 32s
❯ what about just having the Rust deployment to have a pre-deploy test in ci for running a set of just e2e integration tests against
not against anvil, but against current stagef deployed contracts on sepolia/base sepolia/etc.? if that fails, Rust deploy is
blocked
⏺ Strong proposal — this is actually higher-fidelity than the selector smoke test. Let me work through it honestly.
★ Insight ─────────────────────────────────────
- E2E-against-stagef catches the v0.3.0 class the same way production catches it — by running the real code path against the real contracts. That's strictly higher fidelity than enumerating selectors.
- Bonus: it catches semantic regressions (same selector, changed behavior) that a selector-existence check would miss.
- The cost is operational complexity: real RPCs, real gas, real test isolation across concurrent PRs — orders of magnitude more
failure surface than a selector probe.
─────────────────────────────────────────────────
Why this is a better design than Phase 3 as proposed
The pre-deploy selector smoke test answers "do the selectors I depend on exist?" The E2E-against-stagef answers "does the full task
pipeline I'm shipping work against the actual contracts I'm shipping to?" The second strictly subsumes the first — you can't pass
full E2E without all required selectors existing.
It also eliminates two ongoing maintenance costs:
- No bin/newton-compat-check binary to build
- No required_selectors.rs curated list to keep in sync (which itself is a drift-prone artifact — forget to add a new selector and
the smoke test passes while production breaks)
And it kills a category of bug the smoke test can't see: a contract whose selector survived but whose semantics changed. A
selector-only check would happily green-light a deploy where getPolicyCodeHash() returns bytes32(0) because the impl was rolled
back.
Real concerns — these need answers before we adopt
- Operator topology. This is the trickiest part. To run E2E against stagef contracts, you need operators signing with the new Rust
binary. Three options:
┌──────────────────────┬──────────────────────────────────────────────────────┬───────────────────────────────────────────────┐
│ Option │ How │ Risk │
├──────────────────────┼──────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ A. Replace a stagef │ Stop one stagef operator, run new binary in its │ Real operator goes down; quorum may fail for │
│ operator │ place during test │ unrelated stagef traffic │
├──────────────────────┼──────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ B. Add a dedicated │ Register a separate operator slot in stagef ahead of │ Clean isolation, but needs registry setup + │
│ CI test operator │ time, dedicated to CI binary │ ongoing BLS key management │
├──────────────────────┼──────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ C. Spin up fresh │ CI runs new binary as N operators, registers them │ Most isolated, but registration churn │
│ operator set in CI │ against stagef contracts before tests, deregisters │ pollutes stagef registry, BLS APK stability │
│ │ after │ matters │
└──────────────────────┴──────────────────────────────────────────────────────┴───────────────────────────────────────────────┘
I'd push toward B: one or two pre-registered "CI operator" slots in stagef, dedicated to this purpose. The CI workflow brings them
online with the candidate binary, runs E2E, brings them down. No churn on the stagef registry, no impact on actual stagef
operators.
- Gas funding. Sepolia faucets are flaky and rate-limited. Base Sepolia is better but not infinite. Need a funded CI EOA per chain with auto-refill monitoring. Probably acceptable, but it's now a real piece of infra to own.
- Test parallelism. Two PRs running E2E simultaneously will interfere — they'll compete for the same operator nonces, race on task ID generation, contend for fixture state. Need either:
- Serialize E2E in CI (only one runs at a time) → developer experience hit, queue grows during busy periods
- Per-PR fixture isolation (CI deploys a fresh PolicyClient per run) → adds 30-60s of setup
- Accept some test flakiness from concurrency → debugging becomes painful
- Latency. Sepolia block time is 12s. Base Sepolia is 2s. A full E2E scenario (just e2e --scenario full: deploy policy, create
task, broadcast, aggregate, respond, settle) might take 5-15 min per chain on real testnets vs 30-60s on local anvil. Across 2-4
chains × N scenarios you could hit 30-60 min.
That's not unworkable but it's a real shift in PR feedback loop. Compare to the smoke test which is seconds.
- Infra fragility blocking deploys. Sepolia RPC outage → CI red → can't deploy → emergency hotfix requires a manual override path.
Need to define what "override" means and who can authorize it.
Recommended structure if we go this route
Keep most of the plan, swap Phase 3:
- Phase 1 (contract strictness flag + admin CLI + Rust reader) — unchanged, still needed for runtime dual-path behavior
- Phase 2 (Rust dual-path wiring + strictness cache) — unchanged, this is what catches legacy proxies at task-handling time, not deploy time
- Phase 3 NEW (E2E-against-stagef pre-deploy gate):
- CI workflow runs just e2e-stagef --scenarios basic,full,routing against pre-registered CI operator slot
- Runs as a required status check on Rust deploy workflow, not on every PR (too expensive for PR feedback)
- Probably runs on main push + on demand via workflow_dispatch
- Gas-funded CI EOA per target chain with monitoring
- Manual override for infra outages (documented: "if Sepolia is down, the lead can override with X")
- Phase 4 (PR template + release checklist + lessons.md) — unchanged
- Phase 5 (telemetry) — unchanged
What I'd want to think about more
A few open questions before I'd commit fully:
- PR-time vs deploy-time gating. Run E2E-against-stagef on every PR (slow but immediate signal) or only on deploy workflow (fast PRs, but bug discovered hours later when deploying)? My instinct: deploy-workflow only, with an opt-in label to run on PR for risky changes.
- Coverage of external policy clients. Stagef has Newton-deployed proxies in test fixtures — that catches incidents in our own contracts. External developers' policy clients (third-party PolicyClient extensions) wouldn't be in stagef test fixtures unless we explicitly include them. Is that gap acceptable? Probably yes — we own first-party correctness, third parties opt into compatibility risk by extending the mixin.
- What "failure" means. A failed E2E scenario could be: real incompatibility (good — block deploy), flaky test (bad — block deploy on noise), Sepolia outage (bad — block deploy on infra). Need clear classification + retry semantics.
Decision question
Three forks on the table:
A. Adopt E2E-against-stagef as Phase 3. Higher fidelity, more maintainable than selector list, but operationally heavier.
B. Keep selector smoke test as Phase 3. Cheaper, simpler, faster — but misses semantic regressions and depends on selector list
staying in sync.
C. Both. Smoke test gives fast PR-time signal, E2E-against-stagef gates deploys. Belt and suspenders.
My recommendation: A, with the operator topology question (option B from the table) figured out as part of the implementation. The
selector smoke test is a less-faithful approximation of what you actually want — and what you actually want is exactly
E2E-against-stagef.
Which fork do you want? Once you decide I'll fold this into the saved memory and resume /save-progress.
❯ that's the purpose of the stagef environment. it is a place for us to test because it's internal only so it can go down (although we don't want it to go down ever) more than prod. So we can just use the existing same operator keys/gateway keys, etc. and gas will all be funded, etc. and this should run "sequentially" prior to every testnet stagef deploy (which auto triggers on every PR merge to main on newton-prover-avs)
⏺ Yes — that resolves all the operational concerns I raised. The cleanest framing is "deploy with E2E gate baked in" rather than a
separate test step.
★ Insight ─────────────────────────────────────
- The stagef deploy itself already has a brief downtime window (stop old binary → start new). Putting E2E inside that window means you're not adding a new collision risk — you're using the deploy window itself as the test sandbox.
- This is structurally a canary deployment where synthetic test traffic (E2E) plays the role of real traffic. If synthetic fails, roll back. If synthetic passes, accept real traffic.
- Reusing existing keys eliminates BLS APK churn, registry writes, and the "where do we register the test operator" problem entirely.
─────────────────────────────────────────────────
Workflow shape
- PR merges to main
- CI: build new binary
- CI: stop running stagef gateway + operator (downtime starts)
- CI: run E2E with NEW binary against stagef contracts + existing keys
- Scenarios: basic, full, routing (whatever proves the core lifecycle)
- If E2E passes:
- Start stagef gateway + operator with NEW binary (downtime ends)
- Mark deploy successful
- If E2E fails:
- Start stagef gateway + operator with OLD binary (rollback, downtime ends)
- Mark deploy blocked, surface E2E failure to PR author
Stagef downtime per merge: roughly E2E duration (5-15 min). Acceptable for an internal env per your framing.
What this kills from the original Phase 3
- No bin/newton-compat-check binary needed
- No required_selectors.rs curated list to maintain
- No "where do CI test operators register" question
- No gas funding / faucet plumbing (already in place for stagef)
- No test-parallelism question (one merge → one deploy → sequential by construction)
What still needs to be decided
- E2E scenario set. Which just e2e --scenario * runs as the gate? I'd start minimal — basic and full (full covers two-phase consensus + direct attestation, the v0.3.0 incident path). Add routing and privacy only if they're stable enough on Sepolia/Base Sepolia (privacy in particular is heavier). Adding more scenarios increases gate confidence but extends downtime per merge.
- Rollback automation. Does CI automatically restart with the old binary on E2E failure, or does it leave stagef stopped and page someone? Auto-rollback is cleaner but requires the CI workflow to keep both binaries on disk during the deploy window.
- Multi-chain coverage. Stagef has Sepolia + Base Sepolia + others. Run E2E against each in parallel during the deploy window? Sequentially? Parallel is faster but multiplies the chance of a flaky chain blocking deploy. I'd pick one or two representative chains as the required gate, run others as advisory.
- Override path. What happens when E2E fails for an emergency hotfix that needs to ship anyway? Manual workflow_dispatch with skip_e2e: true? Who can authorize it?
- Production deploy implications. This gate covers stagef → main merge. Production deploy is a separate workflow. Does prod deploy require the stagef E2E to have passed (i.e., you can only promote a SHA to prod after its stagef E2E was green)? That would close the loop fully.
Updated phase plan
- Phase 1 (contract strictness flag + admin CLI + Rust reader) — unchanged. Still needed for runtime dual-path.
- Phase 2 (Rust dual-path wiring + strictness cache) — unchanged.
- Phase 3 NEW (stagef pre-deploy E2E gate):
- Modify the existing stagef deploy workflow to insert E2E step between binary swap
- Add rollback-on-failure logic
- Configure scenario set, multi-chain coverage, override path
- Phase 4 (PR template + release checklist + lessons.md) — unchanged.
- Phase 5 (telemetry — legacy_policy_detections_total + alert) — unchanged.
Confirmation needed
Two things to nail down before saving:
- E2E scenarios for the gate: just basic + full, or include routing and privacy from day one?
- Multi-chain: gate on one chain (Sepolia) or on all stagef chains? My instinct is gate on one (cheaper, less flake), advisory on
the rest.
Once you answer those, I'll fold the new Phase 3 into the saved memory and finish /save-progress.
✻ Worked for 1m 13s
❯ or can we just use anvil to "fork" sepolia/base-sepolia etc, so that we dont have to worry about gateway/operator nonce, gas issue
etc?
⏺ Yes — this is even better. Anvil fork gives you the fidelity of real stagef state with the isolation of CI-local execution. Best of
both worlds.
★ Insight ─────────────────────────────────────
- anvil --fork-url snapshots all on-chain state at a block (every contract, every storage slot, every legacy proxy) into a local node — your test environment is the real stagef contract topology, not a fresh deploy.
- The fork preserves the BLS APK registry, so signatures from real operator keys verify against real on-chain state — the cryptographic correctness path is honest, not mocked.
- This is the same just e2e infrastructure you already have, just pointed at a forked node instead of a fresh anvil — minimal new tooling required.
─────────────────────────────────────────────────
Workflow shape
- PR merges to main → CI workflow starts
- CI: build new binary
- CI: spin up anvil forks
- anvil --fork-url $SEPOLIA_RPC --fork-block-number latest --port 8545
- anvil --fork-url $BASE_SEPOLIA_RPC --port 8546
- CI: anvil_setBalance for gateway + operator EOAs (free local gas)
- CI: start new binary's gateway + N operators against forked endpoints
- Same stagef config, only RPC URLs differ
- Real keys, real contract addresses, real registry state
- CI: run just e2e --scenario basic,full against the forks
- If passes → trigger actual stagef deploy
- If fails → block deploy, surface failures
Live stagef keeps running uninterrupted the whole time. No downtime, no nonce collision, no faucet plumbing.
What this catches that I care about
The v0.3.0 incident class is fully covered: the fork includes every legacy NewtonPolicy proxy on Sepolia exactly as it lives on
chain. When the new gateway calls getPolicyCodeHash() against a legacy proxy in the fork, the selector miss surfaces as the same 0x
revert it would in production. CI catches it before the real deploy fires.
Bonus: it lets you test the contract upgrade itself
If a PR includes a contract upgrade (impl swap), CI can run forge script Upgrade.s.sol --rpc-url http://localhost:8545 --broadcast
against the fork before the binary E2E. That tests the upgrade-then-Rust path end-to-end without ever touching real stagef. Catches
the third incident scenario (both upgraded, but external policy clients still on grace period) cleanly.
Edge cases worth naming
- External dependencies stay real: IPFS reads (policy fetch) hit real gateways, fine. Privacy / TLS notary tests would need real or mock services — same as today.
- Fork startup latency: Anvil pulls state lazily from upstream RPC. First few calls are slow. For consistent timing, pre-warm by calling the contracts under test once at fork startup.
- Multi-chain coordination: Each chain gets its own anvil instance + own port. CI needs to orchestrate N anvil processes. Existing just e2e-dual-anvil-multichain already does this for testing — just point fork URLs at real testnets instead of fresh deploys.
- State drift mid-test: Anvil fork advances locally as tests mine blocks. taskCreatedBlock references work the same way they do in live testing. Just don't let tests assume a specific real block number — use the fork's current_block semantics.
What this doesn't catch
- Real RPC behavior under load (rate limits, timing, connection drops)
- Real operator-to-operator network behavior if anything depends on it (currently nothing does — operators talk only via gateway broadcast)
- Anything that depends on real-time wall clock that doesn't match block time
None of these are in the v0.3.0 incident class. Acceptable miss.
Updated phase plan
- Phase 1 (strictness flag + admin CLI + Rust reader) — unchanged
- Phase 2 (Rust dual-path + cache) — unchanged
- Phase 3 NEW (anvil-fork pre-deploy E2E gate):
- New CI workflow step: spin up anvil forks of stagef chains, run E2E, gate deploy
- Reuses existing just e2e infrastructure with --fork-url parameter
- Optional: pre-test contract upgrade if PR touches contracts
- Phase 4 (PR template + release checklist + lessons.md) — unchanged
- Phase 5 (telemetry) — unchanged
Two small confirmations before saving
- Scenarios for the gate: basic + full minimum (covers v0.3.0 path). Add routing and privacy from day one, or layer them in later once stable?
- Chains to fork: All stagef chains in parallel (Sepolia + Base Sepolia + others), or just Sepolia as the required gate with the rest advisory?
My defaults: basic + full to start, all stagef chains in parallel (anvil forks are cheap, parallel is fast). Confirm or override
and I'll fold into saved memory and finish /save-progress.
❯ 1. skip the per-chain full runs and rely on routing+direct alone, since it exercises both source and destination attestation paths in one shot. 2. just fork and test on testnet 3. yes sounds good
⏺ Locked in. Here's the final Phase 3 design and the three small remaining items before save.
★ Insight ─────────────────────────────────────
- Adding routing-full as a Justfile scenario is the minimal codification — one line in the case statement turns implicit env-var override into an explicit, discoverable gate command. That removes the "asymmetric defaults between e2e and e2e-gateway" footgun entirely.
- Forking only testnet chains (Sepolia + Base Sepolia) keeps anvil's upstream RPC dependency narrow — mainnet forks pull more state and cost more in upstream RPC quota for negligible coverage gain since Newton's prod stack lives on testnets in stagef.
- One scenario covering both source + destination attestation halves wall-clock at the cost of debug isolation — when it fails, the developer reproduces locally with both forks running and bisects from there. The "two runs for clear isolation" is a debugging convenience, not a correctness requirement.
─────────────────────────────────────────────────
Phase 3 — final shape
One new Justfile scenario:
In e2e recipe case statement, add:
routing-full) _attestation=${_attestation:-direct}; _privacy=${_privacy:-false}; _target_chains=${_target_chains:-31337,31338} ;;
One CI workflow change in the stagef deploy pipeline:
- PR merges to main
- CI: build new binary
- CI: spin up two anvil forks
- anvil --fork-url $SEPOLIA_RPC --port 8545 --chain-id 31337
- anvil --fork-url $BASE_SEPOLIA_RPC --port 8546 --chain-id 31338
- CI: anvil_setBalance for gateway + operator EOAs (free local gas)
- CI: just e2e --scenario routing-full --env stagef --block-time 1
- Pass → trigger actual stagef deploy
- Fail → block deploy, surface E2E failures to PR author
Live stagef stays untouched the entire time. Real keys, real contract state, isolated execution.
Final phased plan (decisions baked in)
┌───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────┐
│ Phase │ Deliverable │ Status │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 1 │ PolicyClientRegistry.strictnessFlags + setStrictnessFlag admin function + just admin set-strictness │ Decided │
│ │ CLI command + Rust is_strictness_flag_enabled reader │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 2 │ Rust dual-path wiring with StrictnessCache (5-min refresh) + first concrete dual-path on │ Decided │
│ │ getPolicyCodeHash │ (see Q2) │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 3 │ routing-full Justfile scenario + CI workflow step (anvil forks Sepolia + Base Sepolia, runs scenario, │ Decided │
│ │ gates stagef deploy) │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 4 │ PR template checkbox ("did you add a strictness flag for this breaking change?") + release checklist │ Decided │
│ │ update + lessons.md entry codifying the pattern │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 5 │ legacy_policy_detections_total{chain_id, feature} Prometheus counter + one critical alert │ Decided │
│ │ │ (see Q4) │
└───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┘
Three small confirmations remaining
These were posed before Phase 3 redesigned but still apply.
Q2 — Strictness cache RPC failure default:
The cached is_strict() check needs a default when the on-chain read fails (RPC down, network glitch). Two reasonable defaults:
- Permissive default (recommended): cache miss + RPC fail → return false (treat as non-strict, allow legacy proxy path). Argument:
prefer availability over strictness when telemetry is broken; a transient RPC blip shouldn't reject every task. - Strict default: cache miss + RPC fail → return true (reject anything that would have been legacy-tolerated). Argument:
fail-closed when uncertain.
My recommendation: permissive with a warn! log on every cache-miss-with-RPC-fail and a counter
(strictness_cache_rpc_failures_total) for alerting. Once strict mode is on, you want telemetry to flag the situation, but you don't
want a Sepolia RPC blip to brick task processing.
Q3 — setStrictnessFlag authorization:
- onlyOwner now (recommended): deployer EOA flips the flag via just admin. Simple, ships today.
- Safe multisig from day one: requires the Safe migration to land first (NEWT-882–894), which is on the roadmap but not done.
My recommendation: onlyOwner now, migrate to Safe via the NEWT-882–894 work later. The strictness flag is a "we're confident
migration is done" signal, not an emergency lever — single-signer is acceptable.
Q4 — Alerting threshold:
What's the page-someone alert? My recommendation: one critical day-1 rule.
(
policy_strictness_flag{feature="policy_code_hash"} == 1
) and (
rate(legacy_policy_detections_total{feature="policy_code_hash"}[5m]) > 0
)
Reads as: "strict mode is enabled AND legacy detections are still happening." That's the "we flipped the flag too early" alarm. Add
a non-paging warn-level rule for "legacy detections still happening even though we shipped the strict version" so we know to flip
the flag.
Confirm and save
Three quick confirmations:
- Q2: permissive default + warn log + counter? (Y/N)
- Q3: onlyOwner now, migrate to Safe later? (Y/N)
- Q4: single critical day-1 alert per the PromQL above? (Y/N)
Once you confirm all three (or override), I'll fold the full plan into protocol-version-compatibility.md, update MEMORY.md, save to
supermemory, save to recall, and finish /save-progress.
⏺ Got the authoritative answers. Stagef has deployments at 11155111 (Sepolia), 84532 (Base Sepolia), and 1 (Ethereum mainnet — likely
the EigenLayer source-chain reference, not a real deploy target — worth your confirmation).
★ Insight ─────────────────────────────────────
- The e2e recipe is a thin dispatcher around three orthogonal env-var knobs: VALIDATE_ATTESTATION, PRIVACY_ENABLED, TARGET_CHAINS. The "scenarios" are just preset combinations of those three.
- The v0.3.0 selector miss (getPolicyCodeHash) lives on the on-chain attestation validation path (validateAttestationDirect). Only scenarios with validate_attestation=direct exercise that path — meaning basic, privacy, and routing would all have passed while v0.3.0 was broken.
- That's the load-bearing detail: only full and privacy-full would have caught the v0.3.0 incident class.
─────────────────────────────────────────────────
Scenario decomposition
From Justfile:456-462:
┌──────────────┬──────────────────────┬─────────┬───────────────┬─────────────────┐
│ Scenario │ validate_attestation │ privacy │ target_chains │ Catches v0.3.0? │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ basic │ skip │ false │ 31337 │ No │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ full │ direct │ false │ 31337 │ Yes │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ privacy │ skip │ true │ 31337 │ No │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ privacy-full │ direct │ true │ 31337 │ Yes │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ routing │ skip │ false │ 31337,31338 │ No │
└──────────────┴──────────────────────┴─────────┴───────────────┴─────────────────┘
What each knob does:
- validate_attestation=direct: After BLS aggregation, gateway calls NewtonProverTaskManager.validateAttestationDirect() on-chain. This is the path that reads policy.getPolicyCodeHash() against the live policy proxy. Selector miss = revert here. This is the v0.3.0 path (see the sketch after this list).
- validate_attestation=skip: Skips on-chain validation entirely. Aggregation completes, but the on-chain selector compatibility check never runs.
- privacy=true: Enables HPKE encryption of policy params. Operators decrypt locally. Adds privacy-specific code paths but doesn't touch the getPolicyCodeHash path.
- target_chains=31337,31338: Tests dual-chain orchestration (per-chain ChainService, source/dest task flow). Different bug class than selector miss.
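A minimal sketch of the dispatch these knobs imply, assuming hypothetical names: TaskManagerHandle, validate_attestation_direct, and this two-variant AttestationValidationMode are illustrative stand-ins, not the real crate API.

pub enum AttestationValidationMode {
    Skip,
    Direct,
}

pub struct TaskManagerHandle;

impl TaskManagerHandle {
    // Stand-in for the on-chain validateAttestationDirect() call that reads
    // policy.getPolicyCodeHash() via the live policy proxy; a selector miss
    // on a legacy proxy reverts here.
    pub fn validate_attestation_direct(&self, _aggregated_sig: &[u8]) -> Result<(), String> {
        Ok(())
    }
}

pub fn finish_task(
    mode: &AttestationValidationMode,
    task_manager: &TaskManagerHandle,
    aggregated_sig: &[u8],
) -> Result<(), String> {
    match mode {
        // skip: aggregation completes but the on-chain selector check never
        // runs, which is why skip-mode scenarios cannot catch the v0.3.0 class
        AttestationValidationMode::Skip => Ok(()),
        // direct: the v0.3.0 path
        AttestationValidationMode::Direct => {
            task_manager.validate_attestation_direct(aggregated_sig)
        }
    }
}

The point of the sketch is only that the validation step is gated on the mode, so any scenario preset that resolves to Skip never touches the selector at all.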
Argument variations for each scenario, beyond the preset:
┌───────────────────────────┬─────────────────────────────────────────────────────────────┬──────────────┐
│ Arg │ Effect │ Default │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --env │ Which newton_prover_config..json to read │ stagef │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --block-time │ Anvil block time (0 = automine, 1 = 1s realistic) │ 0 │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --force │ Force re-deploy contracts vs cached │ false │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --keep-anvil │ Don't tear down anvil after test │ false │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ validate_attestation= env │ Override scenario default (direct or skip) │ per scenario │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ privacy= env │ Override scenario default │ per scenario │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ target_chains= env │ Override scenario default │ per scenario │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ SKIP_REGISTRATION env │ Skip operator registration step (assume already registered) │ per chain │
└───────────────────────────┴─────────────────────────────────────────────────────────────┴──────────────┘
For a forked-stagef gate, you'd want --block-time 1 (mimics real chain timing better than automine) and SKIP_REGISTRATION=true
(operators already registered in the forked state).
What gates the v0.3.0 class
Day-1 minimum to catch v0.3.0 specifically: full. That's the only scenario among the lightweight ones that exercises
validateAttestationDirect.
Recommendation for the deploy gate
Tier 1 (required, day 1):
- full — single chain, attestation validation. Catches v0.3.0 selector class.
- routing — dual-chain orchestration. Catches multichain regressions (different bug class but you ship multichain).
Tier 2 (add once stable on forked stagef):
- privacy-full — covers attestation + privacy paths. Heavier (HPKE, IPFS, possibly threshold) so I'd hold this for after the gate
is proven stable.
Skip from gate:
- basic, privacy (no attestation validation — won't catch the incident class you care about; redundant given full and privacy-full)
Chain coverage
Stagef has real deployments at:
- 11155111 — Sepolia (source)
- 84532 — Base Sepolia (destination)
- 1 — Ethereum mainnet (probably config-only, not a deploy target — please confirm)
For multichain routing, you need both Sepolia and Base Sepolia forked (source + destination). For single-chain full, Sepolia alone
is enough — but running it against Base Sepolia too would catch chain-specific issues (block-time offsets, EIP-4844 quirks).
My default: fork both Sepolia and Base Sepolia, run full against each (in parallel), run routing once across both. Total wall-clock
~10-15 min via parallel anvil instances. Mainnet 1 deployment skipped (config reference, not a real deploy target).
Two questions
- Tier 1 gate set: full + routing, or just full?
- Mainnet 1 deployment: is 1-stagef.json an actual deploy target Rust connects to, or just config reference for EigenLayer source
chain reads?
Once you answer these I'll fold it into the saved memory and finish /save-progress.
✻ Sautéed for 1m 59s
❯ does routing also check for validateAttestationDirect?
⏺ Good check — let me verify both whether routing exercises it by default and whether destination chains even have that path.
Searched for 2 patterns (ctrl+o to expand)
⏺ Important catch. Let me give you the precise answer.
★ Insight ─────────────────────────────────────
- routing defaults to validate_attestation=skip (per Justfile:460), so out-of-the-box it does NOT exercise validateAttestationDirect. But the env-var override is wired through — validate_attestation=direct just e2e --scenario routing works.
- validateAttestationDirect exists on BOTH NewtonProverTaskManager (source) and NewtonProverDestTaskManager (destination). The multichain destination has its own copy of the selector, which means the v0.3.0 class can recur on dest chains too if the destination policy proxy is on a legacy impl.
- The default in e2e-gateway is direct, but the default in e2e --scenario routing is skip — that asymmetry is easy to miss.
─────────────────────────────────────────────────
Direct answer
No, not by default. From Justfile:460:
routing) _attestation=${_attestation:-skip}; _privacy=${_privacy:-false}; _target_chains=${_target_chains:-31337,31338} ;;
So just e2e --scenario routing runs with attestation validation skipped. It tests cross-chain orchestration but not the
selector-miss path.
But the override is in place
The e2e recipe respects validate_attestation as an env-var override. And integration-tests/src/multichain_routing.rs:214 does honor
it:
if !matches!(config.validate_attestation, AttestationValidationMode::Skip) {
    // validate attestation per config.validate_attestation mode
}
So validate_attestation=direct just e2e --scenario routing will run dual-chain orchestration AND on-chain attestation validation.
That's the combo we actually want for the gate.
Why it matters specifically for v0.3.0 class
Both task managers have validateAttestationDirect:
- crates/core/src/generated/newton_prover_task_manager.rs (source)
- crates/core/src/generated/newton_prover_dest_task_manager.rs (destination)
Both call into the policy proxy's selector during validation. If a destination-chain policy proxy is on a legacy impl, the v0.3.0
class recurs there too. Source-only full would miss that.
Updated recommendation
To cover the v0.3.0 class on every chain that runs the binary, the gate should be:
┌───────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────┐
│ Test command │ Catches │
├───────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────┤
│ just e2e --scenario full --env stagef (against Sepolia fork) │ v0.3.0 class on source chain │
├───────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────┤
│ just e2e --scenario full --env stagef (against Base Sepolia fork) │ v0.3.0 class on destination chain │
├───────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────┤
│ validate_attestation=direct just e2e --scenario routing --env stagef │ v0.3.0 class on dual-chain orchestration + │
│ (against both forks) │ cross-chain attestation flow │
└───────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────┘
Three runs total. With anvil forks running in parallel, total wall-clock dominated by the slowest scenario (~10 min).
Or, more parsimonious: skip the per-chain full runs and rely on routing+direct alone, since it exercises both source and
destination attestation paths in one shot. Trade-off: harder to localize failures (was it source attestation, dest attestation, or
routing logic that broke?) but cheaper.
Three confirmations to nail down before saving
- Gate set: three runs (full ×2 + routing+direct) with clearer failure isolation, OR one run (routing+direct) for parsimony?
- Mainnet 1 deployment: real deploy target Rust connects to, or just config reference for EigenLayer source-chain reads? (Affects whether to fork mainnet too — heavier upstream RPC dependency.)
- Should I propose adding a new combined scenario (e.g., routing-full → validate_attestation=direct, target_chains=31337,31338) to the Justfile case statement? That removes the env-var-override gotcha and makes the gate command explicit. Small Justfile change.
Once you answer these, I'll fold into saved memory and finish /save-progress.
✻ Cogitated for 59s
❯ 1. skip the per-chain full runs and rely on routing+direct alone, since it exercises both source and
destination attestation paths in one shot. 2. just fork and test on testnet 3. yes sounds good
⏺ Locked in. Here's the final Phase 3 design and the three small remaining items before save.
★ Insight ─────────────────────────────────────
- Adding routing-full as a Justfile scenario is the minimal codification — one line in the case statement turns implicit env-var override into an explicit, discoverable gate command. That removes the "asymmetric defaults between e2e and e2e-gateway" footgun entirely.
- Forking only testnet chains (Sepolia + Base Sepolia) keeps anvil's upstream RPC dependency narrow — mainnet forks pull more state and cost more in upstream RPC quota for negligible coverage gain, since Newton's prod stack lives on testnets in stagef.
- One scenario covering both source + destination attestation halves wall-clock at the cost of debug isolation — when it fails, the developer reproduces locally with both forks running and bisects from there. The "two runs for clear isolation" is a debugging convenience, not a correctness requirement.
─────────────────────────────────────────────────
Phase 3 — final shape
One new Justfile scenario:
In e2e recipe case statement, add:
routing-full) _attestation=${_attestation:-direct}; _privacy=${_privacy:-false}; _target_chains=${_target_chains:-31337,31338} ;;
One CI workflow change in the stagef deploy pipeline:
- PR merges to main
- CI: build new binary
- CI: spin up two anvil forks
- anvil --fork-url $SEPOLIA_RPC --port 8545 --chain-id 31337
- anvil --fork-url $BASE_SEPOLIA_RPC --port 8546 --chain-id 31338
- CI: anvil_setBalance for gateway + operator EOAs (free local gas)
- CI: just e2e --scenario routing-full --env stagef --block-time 1
- Pass → trigger actual stagef deploy
- Fail → block deploy, surface E2E failures to PR author
Live stagef stays untouched the entire time. Real keys, real contract state, isolated execution.
Final phased plan (decisions baked in)
┌───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────┐
│ Phase │ Deliverable │ Status │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 1 │ PolicyClientRegistry.strictnessFlags + setStrictnessFlag admin function + just admin set-strictness │ Decided │
│ │ CLI command + Rust is_strictness_flag_enabled reader │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 2 │ Rust dual-path wiring with StrictnessCache (5-min refresh) + first concrete dual-path on │ Decided │
│ │ getPolicyCodeHash │ (see Q2) │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 3 │ routing-full Justfile scenario + CI workflow step (anvil forks Sepolia + Base Sepolia, runs scenario, │ Decided │
│ │ gates stagef deploy) │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 4 │ PR template checkbox ("did you add a strictness flag for this breaking change?") + release checklist │ Decided │
│ │ update + lessons.md entry codifying the pattern │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 5 │ legacy_policy_detections_total{chain_id, feature} Prometheus counter + one critical alert │ Decided │
│ │ │ (see Q4) │
└───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┘
Three small confirmations remaining
These were posed before Phase 3 was redesigned but still apply.
Q2 — Strictness cache RPC failure default:
The cached is_strict() check needs a default when the on-chain read fails (RPC down, network glitch). Two reasonable defaults:
- Permissive default (recommended): cache miss + RPC fail → return false (treat as non-strict, allow legacy proxy path). Argument: prefer availability over strictness when telemetry is broken; a transient RPC blip shouldn't reject every task.
- Strict default: cache miss + RPC fail → return true (reject anything that would have been legacy-tolerated). Argument: fail-closed when uncertain.
My recommendation: permissive with a warn! log on every cache-miss-with-RPC-fail and a counter
(strictness_cache_rpc_failures_total) for alerting. Once strict mode is on, you want telemetry to flag the situation, but you don't
want a Sepolia RPC blip to brick task processing.
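A minimal sketch of that permissive default, assuming hypothetical names: RegistryClient, RpcError, and the rpc_failures field are illustrative stand-ins for the real registry binding, error type, and strictness_cache_rpc_failures_total counter.

use std::time::{Duration, Instant};

#[derive(Debug)]
pub struct RpcError(pub String);

pub trait RegistryClient {
    // Reads the on-chain strictness flag; Err means the RPC read failed.
    fn strict_policy_code_hash(&self) -> Result<bool, RpcError>;
}

pub struct StrictnessCache<C: RegistryClient> {
    client: C,
    ttl: Duration,
    cached: Option<(bool, Instant)>,
    rpc_failures: u64, // stands in for strictness_cache_rpc_failures_total
}

impl<C: RegistryClient> StrictnessCache<C> {
    pub fn new(client: C, ttl: Duration) -> Self {
        Self { client, ttl, cached: None, rpc_failures: 0 }
    }

    // Permissive default: on cache miss + RPC failure, return false (treat the
    // proxy as non-strict) and record the failure so alerting can see it.
    pub fn is_strict(&mut self) -> bool {
        if let Some((value, read_at)) = self.cached {
            if read_at.elapsed() < self.ttl {
                return value;
            }
        }
        match self.client.strict_policy_code_hash() {
            Ok(value) => {
                self.cached = Some((value, Instant::now()));
                value
            }
            Err(err) => {
                self.rpc_failures += 1;
                eprintln!("warn: strictness read failed, staying permissive: {err:?}");
                false
            }
        }
    }
}

In the real wiring the warning and counter would go through the existing tracing and metrics plumbing rather than eprintln! and a struct field.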
Q3 — setStrictnessFlag authorization:
- onlyOwner now (recommended): deployer EOA flips the flag via just admin. Simple, ships today.
- Safe multisig from day one: requires the Safe migration to land first (NEWT-882–894), which is on the roadmap but not done.
My recommendation: onlyOwner now, migrate to Safe via the NEWT-882–894 work later. The strictness flag is a "we're confident
migration is done" signal, not an emergency lever — single-signer is acceptable.
Q4 — Alerting threshold:
What's the page-someone alert? My recommendation: one critical day-1 rule.
(
policy_strictness_flag{feature="policy_code_hash"} == 1
) and (
rate(legacy_policy_detections_total{feature="policy_code_hash"}[5m]) > 0
)
Reads as: "strict mode is enabled AND legacy detections are still happening." That's the "we flipped the flag too early" alarm. Add
a non-paging warn-level rule for "legacy detections still happening even though we shipped the strict version" so we know to flip
the flag.
Confirm and save
Three quick confirmations:
- Q2: permissive default + warn log + counter? (Y/N)
- Q3: onlyOwner now, migrate to Safe later? (Y/N)
- Q4: single critical day-1 alert per the PromQL above? (Y/N)
Once you confirm all three (or override), I'll fold the full plan into protocol-version-compatibility.md, update MEMORY.md, save to
supermemory, save to recall, and finish /save-progress.
✻ Churned for 57s
※ recap: Designing a strictness-flag plan to prevent contract/Rust version drift, with Phase 3 settled as an anvil-fork pre-deploy
E2E gate running routing-full against Sepolia and Base Sepolia. Next: confirm Q2 (permissive RPC default), Q3 (onlyOwner now), Q4
(single critical alert) so I can save progress. (disable recaps in /config)
❯ do we still need strictness check?
⏺ Sharp question — and I think the honest answer is: probably not, given your current state. The two mechanisms solve different
problems, and the one the strictness flag solves may not be a problem yet.
★ Insight ─────────────────────────────────────
- The pre-deploy E2E gate and the strictness flag answer different questions. E2E gate: "will Newton's binary work against the contracts we're shipping to?" Strictness flag: "should new Rust tolerate or reject legacy external policy proxies that we don't control?"
- For Newton-controlled proxies (the only kind on stagef), the migration path is "re-deploy proxies before/during the binary deploy" — the E2E gate enforces this naturally because forked stagef state with un-migrated Newton proxies will fail the gate, blocking deploy.
- The strictness flag is forward-looking infra for the day when external developers run their own policy clients in prod and you need to give them a bounded grace period without breaking their tasks.
─────────────────────────────────────────────────
What each mechanism actually protects against
┌─────────────────────────────────────────────┬────────────────────────────────────────────┬───────────────────────────────────┐
│ Failure mode │ Caught by E2E gate? │ Caught by strictness flag? │
├─────────────────────────────────────────────┼────────────────────────────────────────────┼───────────────────────────────────┤
│ Newton ships Rust that breaks against │ Yes (E2E fails) │ N/A (flag only changes runtime │
│ Newton-deployed stagef proxies │ │ behavior) │
├─────────────────────────────────────────────┼────────────────────────────────────────────┼───────────────────────────────────┤
│ Newton ships Rust that breaks against │ Indirectly (stagef passes ⟹ prod likely │ N/A │
│ Newton-deployed prod proxies │ passes if both have same impl) │ │
├─────────────────────────────────────────────┼────────────────────────────────────────────┼───────────────────────────────────┤
│ External dev's legacy proxy hits new Rust │ No (no external proxies on stagef to test │ Yes (flag=false → log + tolerate; │
│ on prod │ against) │ flag=true → reject) │
├─────────────────────────────────────────────┼────────────────────────────────────────────┼───────────────────────────────────┤
│ Newton wants to force external migration │ N/A │ Yes (flip flag to true) │
│ after grace window │ │ │
└─────────────────────────────────────────────┴────────────────────────────────────────────┴───────────────────────────────────┘
The E2E gate handles every Newton-controlled scenario. The strictness flag only adds value for the external-policy-client grace
period.
The question that decides it
Do you currently have, or expect soon to have, external developers running their own NewtonPolicyClient extensions in production?
- No, not soon: Drop the strictness flag entirely. Rely on the E2E gate. Hard-cutover semantics for any future breaking change — when you do it, coordinate with whoever's running policy clients via Discord / direct comms. Add the flag later as a follow-up if external clients land.
- Yes, in flight or imminent: Keep the flag. The grace period mechanism is what lets external devs migrate without their tasks failing the moment Newton deploys. Without it, every breaking change is an immediate hard break for them.
My recommendation
Drop Phases 1 and 2 for now. Keep Phases 3, 4, 5. Reasoning:
- Stagef has no external policy clients (it's internal). E2E gate fully covers stagef incident class.
- Prod likely has zero or near-zero external policy clients today (you're early-stage).
- For Newton-controlled proxies, the migration path is "re-deploy proxies via factory before/with the binary deploy" — already part of the existing deploy workflow.
- The strictness flag is theater for a problem that doesn't exist yet. Adding it now creates dead on-chain state and ~200 LOC of Rust dual-path machinery to maintain.
- When external clients become real, add the flag then with full context on what their actual upgrade cadence looks like (which informs the grace period design).
What you give up by dropping it:
- Future-self pre-emptive infra for external client migrations
- The legacy_policy_detections_total metric (could still emit it as observability without the flag — just always-tolerate, log on
miss)
What you gain:
- Simpler shipping plan: just Phases 3 + 4 + 5
- No on-chain admin surface to secure
- No StrictnessCache to maintain
- No synchronization burden between contract flag flipping and Rust deployment
Slimmed-down plan if you agree
┌───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Phase │ Deliverable │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 3 │ routing-full Justfile scenario + CI workflow step (anvil forks Sepolia + Base Sepolia, runs scenario, gates stagef │
│ │ deploy) │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 4 │ PR template checkbox ("does this PR add a new selector or break an existing one?") + release checklist + lessons.md │
│ │ entry codifying the E2E-gate pattern │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 5 │ legacy_policy_detections_total{chain_id, feature} Prometheus counter + warn-level alert (no critical alert needed │
│ │ without strict mode — just observability) │
└───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Phase 1, 2, Q2, Q3, Q4 all become moot.
Decide
- A. Drop strictness flag (Phases 1+2). Ship Phases 3+4+5 only. Add flag later if external clients arrive.
- B. Keep strictness flag (full original plan, Phases 1-5). Future-proofs for external clients.
- C. Hybrid — ship contract change (Phase 1) but defer Rust dual-path (Phase 2). Cheap on-chain mechanism in place, no Rust
complexity yet.
My vote: A. Ship the minimum that solves the actual current problem (v0.3.0 incident class). Pick up the flag when external clients
become a concrete need, not a hypothetical.
Confirm A/B/C and I save.
✻ Cogitated for 1m 30s
❯ drop phase 4 - but replace with just lessons.md entry. pr template is overkill. phase 3 is a must, and phase 5 can be held off for
now as well
⏺ Locked in. Final scope: Phase 3 only + lessons.md entry. Loading the recall tools and checking for existing memory in parallel
before saving.
Listed 1 directory (ctrl+o to expand)
⏺ Now executing the save in parallel: write memory file, update MEMORY.md, push to supermemory, and search recall for dedup.
Wrote 2 memories (ctrl+o to expand)
⏺ Bash(node "$HOME/.claude/plugins/cache/supermemory-plugins/claude-supermemory/0.0.2/scripts/save-project-memory.cjs"
"[SAVE:dennis:2026-04-19]…)
⎿ Project knowledge saved: newton-prover-avs
ID: FqSZUTDETBNEz2F7Liykj4
⏺ Bash(test -d /Users/denniswon/projects/newton-prover-avs/.claude/rules && echo "rules dir exists")
⎿ rules dir exists
⎿ Context limit reached · /compact or /clear to continue
Churned for 1m 20s
❯ The goal of prevent developers in the team accidentally forget to make sure the Rust code to be compatible to both old vs new
contract versions. There are 3 scenarios: 1) contracts upgraded first with breaking change before Rust update 2) contracts with
breaking change not upgraded yet, but Rust deploys, 3) contracts upgraded and Rust update also deployed, but for Newton
Policy/Newton Policy Data/and Policy Clients (that extends NewtonPolicyClient mixin), there is a grace period for the developers
to migrate/update their policy clients with new newton policy or policy data. This means the Rust update that pairs with the
contracts upgrade with breaking change should support both versions during this grace period (until version enforcement happens).
1, 2 are for prevent protocol literally failing, while 3 is for external developer's policy clients to start failing suddenly
without having time to become compatible. /clarify-plan THINK HARD
⏺ I'll work through this as a design problem. The three scenarios sort into two fundamentally different problem classes, which
matters for what prevention mechanism fits each.
★ Insight ─────────────────────────────────────
CI.
Rust devs cannot forget to write.
system, CI, lint) rather than relying on reviewers remembering.
─────────────────────────────────────────────────
Problem framing
Mapping to enforcement mechanisms:
┌────────────────────────────────────────┬────────────────────┬─────────────┬──────────────────────────────────────────────────┐
│ Scenario │ Failure class │ Who gets │ Natural prevention │
│ │ │ hurt │ │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 1 — contracts ahead of Rust │ protocol-down │ us (newton) │ deploy-pipeline ordering + │
│ │ │ │ MIN_COMPATIBLE_VERSION │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 2 — Rust ahead of contracts │ protocol-down │ us (newton) │ deploy-pipeline ordering + pre-deploy selector │
│ │ │ │ probe │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 3 — policy proxies stuck on old impl │ client-policy-down │ external │ dual-version Rust code paths that can't be │
│ during grace period │ │ devs │ accidentally skipped │
└────────────────────────────────────────┴────────────────────┴─────────────┴──────────────────────────────────────────────────┘
The core hazard in #3: NewtonPolicy / NewtonPolicyData are initialize-once proxies. When we ship a new version, the impl on chain
is new but all existing proxies still delegate to the old bytecode. External devs have to redeploy their policies to opt in. Rust
must gracefully serve both until enforcement.
Clarifying questions (need answers before writing the plan)
Before I draft the actual implementation, I need to resolve ambiguity on these:
Q1 — Version detection. How does Rust know which version a given policy proxy is on?
trying to avoid)
v0.3.0+)
redeployed
Which direction do you want? (b) is my recommendation; (c) works as a stopgap without any contract work.
Q2 — Grace period trigger. Is the end of grace period:
(b) is more flexible but means Rust must query it. (a) is rigid but simpler.
Q3 — Scope of the compat layer. Is this only for NewtonPolicy and NewtonPolicyData selectors called at task time? Or also:
The mixin is the trickiest because it lives in external-dev bytecode — we can't upgrade it at all. External devs inherit the mixin
version at compile time of their client.
Q4 — Policy version enumeration. Do we want active tracking of "which policy_client is on which impl"?
Q5 — Deprecation communication channel. When we detect a legacy policy client at task time, do we:
My instinct is all four at different cadences, but which are in scope?
Q6 — Existing tolerance patterns. The v0.3.0 incident surfaced getPolicyCodeHash as an example. Before we design the general
system, can you confirm:
Q7 — Release calendar visibility. How do we want grace periods to be discoverable across the team?
it
Draft plan (pending answers above)
Regardless of how Q1–Q7 resolve, the plan will have four tiers. The choices above determine how each tier is implemented.
Tier 1 — Atomic rollout (scenarios 1 + 2, protocol self-compat).
- Pre-deploy smoke test that eth_calls critical selectors against live on-chain state. Fails before deploy if Rust can't talk to on-chain contracts.
- Codified deploy ordering (contracts first, bake time, then Rust) in runbook.
Tier 2 — Typed dual-version abstraction (scenario 3, policy proxy grace period).
Introduce crates/core/src/compat/ module with:
pub struct BackwardCompat<T> {
    new_value: Option<T>,          // None = legacy proxy
    legacy_behavior: T,            // what we did pre-v0.X
    grace_end: chrono::NaiveDate,
    rollout_ticket: &'static str,  // Linear ID for visibility
}

impl<T: Clone> BackwardCompat<T> {
    pub fn resolve(&self, now: chrono::NaiveDate) -> Result<T, CompatError> {
        match (&self.new_value, now >= self.grace_end) {
            // new proxy answered: always use the new value
            (Some(v), _) => Ok(v.clone()),
            // legacy proxy, still inside the grace window: tolerate and count it
            (None, false) => {
                metrics::inc_legacy_policy_detection(self.rollout_ticket);
                Ok(self.legacy_behavior.clone())
            }
            // legacy proxy after the grace window: hard failure (error fields elided)
            (None, true) => Err(CompatError::GraceExpired { /* .. */ }),
        }
    }
}
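A hypothetical call site for the wrapper above; the code-hash type, the date, and the NEWT-000 ticket are placeholders, not real project values.

fn resolve_code_hash(
    onchain: Option<[u8; 32]>, // Some = new proxy answered, None = legacy proxy
    today: chrono::NaiveDate,
) -> Result<[u8; 32], CompatError> {
    let compat = BackwardCompat {
        new_value: onchain,
        legacy_behavior: [0u8; 32], // stand-in for the pre-v0.3.0 "no binding check" behavior
        grace_end: chrono::NaiveDate::from_ymd_opt(2026, 6, 1).expect("valid date"),
        rollout_ticket: "NEWT-000", // placeholder ticket
    };
    compat.resolve(today)
}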
Every new required contract selector MUST go through this type. Enforced via:
compat:: module.
Tier 3 — CI enforcement of the compatibility lifecycle.
the dev to either:
it. Ships as a required CI job on PRs that touch contracts or compat/.
alerts when >0 clients are on impl version nearing grace expiry.
Tier 4 — External-dev communication.
metric legacy_policy_detections_total.
requiring log access.
Tradeoffs to flag
clippy/ast_grep check) is lighter but requires keeping the selector list curated. I'd pick type-driven for clarity.
ecosystem isn't migrating fast enough). A hybrid — Rust has default static date, admin flag can extend only — captures both safely.
tests (against live stagef) are realistic but slow + flaky. Do both: fixtures on every PR, on-fork nightly.
grows unbounded. The grace_end CI check is what prevents this — but only if we don't just keep bumping the date.
What I need from you to write the implementation plan
Answering Q1–Q3 is minimum viable. Q4–Q7 shape Tiers 3+4 but don't block starting Tiers 1+2. Preferred path:
for Tier 3 CI check.
★ Insight ─────────────────────────────────────
reviewer's job and fails silently. With BackwardCompat as a required wrapper, "tolerance was added" becomes a compile-time fact
and "tolerance is still needed" becomes a CI-enforced fact. Reviewer discipline goes from required to optional — which is the only
way discipline actually survives.
because we can't upgrade it at all. Worth thinking separately about whether future NewtonPolicyClient changes should use
interface-based dispatch rather than mixin inheritance, so external devs don't have to recompile their contracts to get updates.
That's an architectural shift, not a compat layer — flag for separate discussion.
─────────────────────────────────────────────────
Which of Q1–Q7 should I pursue first?
✻ Baked for 2m 56s
※ recap: Goal: prevent Rust/contract version drift, especially during the grace period for external policy clients. Current task:
answered Q1-Q7 clarifying questions on detection mechanism, grace trigger, and scope. Next: you pick which questions to answer
first.
❯ q1: - (d) PolicyClientRegistry stores declared version at registration time — clean but needs registry writes on every upgrade -->
what do you mean by writes on every upgrade? q2: (b) An admin-flipped on-chain flag (e.g., MIN_POLICY_VERSION on
PolicyClientRegistry) --> we already have a just admin command. Overall, let's start simple, and let's focus on "preventing
incidents" (1. newly deployed protocol going down when contracts are old or old policy clients/policies send requests, 2. not yet
deployed protocol going down when contracts are upgraded first). Also, the grace period is ONLY in regards to Newton Policy and
Newton Policy Data, and we will make sure to do a hard cutover with each breaking change, so that we never have to support more
than 2 different versions)
⏺ ★ Insight ─────────────────────────────────────
BackwardCompat machinery entirely. A simple if legacy { old_path } else { new_path } per breaking change is sufficient, because
the else branch gets deleted in the next release.
migration — the grace period becomes a runtime config, not a compile-time config. This means we never need date-based CI
enforcement.
proxies. With flag-gated dual paths, the second half is opt-in — legacy proxies stay working until admin explicitly flips.
─────────────────────────────────────────────────
Clarifying Q1 — "writes on every upgrade"
I meant: option (d) required PolicyClientRegistry to store per-proxy version metadata, which means a write transaction at every
policy deploy (factory calls registry.recordPolicyImplVersion(proxy, version)). For initialize-once proxies this is once per proxy
lifetime, so "every upgrade" was a misleading phrase — really it's "every new policy deployment pays one extra write." The
coordination cost is that external devs who deploy policies directly (bypassing our factory) would have to remember to register,
which is error-prone. I'm dropping option (d) in favor of (c) bytecode classification — zero contract writes, purely off-chain
detection.
Confirming your simplifications
- Grace-period support applies only to NewtonPolicy / NewtonPolicyData; everything else (TaskManager, PolicyClientRegistry, etc.) gets lockstep deployment — no dual-version support ever.
This removes 60% of the machinery I was sketching. Let me redraft as a simpler plan.
Plan (simplified)
Part A — Scenarios 1 + 2 (protocol-down prevention, lockstep contracts ↔ Rust)
A1. Pre-deploy selector smoke test.
New just pre-deploy-check recipe runs before contract/Rust deploys. Walks critical call sites
(gateway→TaskManager, operator→NewtonPolicy, challenger→ChallengeVerifier) and does eth_call against current on-chain state. Any
"0x" revert = selector miss = block deploy with actionable error ("v0.3.0 Rust calls getPolicyCodeHash but current on-chain impl
doesn't expose it; deploy contracts first").
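A minimal sketch of the probe loop behind A1, assuming hypothetical names: Probe, ProbeOutcome, and the injected eth_call closure are illustrative, not the actual recipe or RPC client.

pub struct Probe {
    pub caller: &'static str,   // e.g. "gateway"
    pub contract: &'static str, // e.g. "NewtonProverTaskManager"
    pub selector: &'static str, // e.g. "getPolicyCodeHash()"
}

pub enum ProbeOutcome {
    Ok,
    SelectorMiss, // "0x" revert: the current on-chain impl doesn't expose the selector
    OtherError(String),
}

// Returns Err with one actionable message per missing selector so the deploy
// can be blocked with a clear "deploy contracts first" explanation.
// OtherError is ignored here for brevity; a real gate would surface it too.
pub fn check_before_deploy(
    probes: &[Probe],
    eth_call: impl Fn(&Probe) -> ProbeOutcome,
) -> Result<(), Vec<String>> {
    let mut failures = Vec::new();
    for probe in probes {
        if let ProbeOutcome::SelectorMiss = eth_call(probe) {
            failures.push(format!(
                "{} calls {}::{} but the current on-chain impl doesn't expose it; deploy contracts first",
                probe.caller, probe.contract, probe.selector
            ));
        }
    }
    if failures.is_empty() { Ok(()) } else { Err(failures) }
}

Injecting eth_call as a closure keeps the walk itself testable without a live RPC endpoint.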
A2. Release pipeline ordering.
Codify in .github/workflows/deploy-trigger.yml: contracts deploy first, 10-minute bake, then Rust binaries. Existing pipeline
mostly does this; formalize the gate.
A3. Additive-only ABI rule for non-policy contracts.
PR template checkbox: "Does this PR remove or change semantics of an existing selector on
TaskManager/ChallengeVerifier/PolicyClientRegistry/EpochRegistry?" If yes, requires synchronized Rust release plan in PR
description. CI can't automate this — human review + CODEOWNERS.
Part B — Scenario 3 (NewtonPolicy/NewtonPolicyData grace period, flag-gated)
B1. New per-feature strictness flags on PolicyClientRegistry.
For each breaking change to NewtonPolicy or NewtonPolicyData, add a named boolean to PolicyClientRegistry:
// Example for the v0.3.0 change
bool public strictPolicyCodeHash; // default false
function setStrictPolicyCodeHash(bool value) external onlyOwner;
Name the flag after the feature, not the version — easier to grep, easier to remove later. Admin flips via just admin --command
set-strict-policy-code-hash --env stagef true.
B2. Rust dual-path per feature flag.
In crates/data-provider/src/lib.rs and wherever the affected selector is called:
let strict = policy_client_registry.strict_policy_code_hash().await?;
match fetch_policy_code_hash(policy).await {
    Ok(hash) => verify_hash_binding(hash, policy_bytes)?,
    Err(SelectorMiss) if !strict => {
        warn!(policy = %policy, "legacy policy proxy, skipping code hash binding");
        metrics::inc_legacy_policy_detection("policy_code_hash");
    }
    Err(SelectorMiss) => return Err(LegacyPolicyNotSupported { policy }),
    Err(e) => return Err(e),
}
The strictness flag is read once per RPC server startup (cached), because external policy migration is slow enough that
refresh-per-task is waste.
B3. Version detection.
For B2, how does Rust know "this is a legacy proxy"? Simplest answer: try the new selector, catch "0x" revert. No ERC-165, no
bytecode classification, no registry lookup. If the selector works, it's new; if it misses, it's legacy. The strictness flag
controls what we do with that answer. This is essentially what we already do in classify_batch_item_revert for other selector-miss
cases.
B4. Hard-cutover release discipline.
Release N+1 (e.g., v0.4.0) removes the compat branch and retires the corresponding strict* flag, so the codebase never supports more than two versions at once.
Enforced by: release checklist item "Remove compat branches for v(N-1) features" + grep for strict* flags that should be gone. CI can lint for known-removed flag names (a sketch of that lint as a Rust test follows below).
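A hedged sketch of that lint written as a Rust test rather than a shell grep; the flag name, the crates/ path, and the module name are illustrative, and in practice the removed-flag list would live outside the scanned tree so the test doesn't flag itself.

#[cfg(test)]
mod compat_tombstones {
    use std::fs;
    use std::path::Path;

    // Flags whose compat branches should have been deleted in this release,
    // e.g. strict_policy_code_hash once the v0.3.0 compat branch is removed in v0.4.0.
    const REMOVED_FLAGS: &[&str] = &["strict_policy_code_hash"];

    fn scan(dir: &Path, hits: &mut Vec<String>) {
        for entry in fs::read_dir(dir).expect("readable dir") {
            let path = entry.expect("dir entry").path();
            if path.is_dir() {
                scan(&path, hits);
            } else if path.extension().is_some_and(|ext| ext == "rs") {
                let src = fs::read_to_string(&path).unwrap_or_default();
                for flag in REMOVED_FLAGS {
                    if src.contains(flag) {
                        hits.push(format!("{}: still references {flag}", path.display()));
                    }
                }
            }
        }
    }

    #[test]
    fn no_stale_compat_branches() {
        let mut hits = Vec::new();
        scan(Path::new("crates"), &mut hits);
        assert!(hits.is_empty(), "stale compat branches:\n{}", hits.join("\n"));
    }
}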
Part C — Telemetry for operational awareness
selectors, reports count of legacy clients
Scope comparison: what I'm dropping vs keeping
┌──────────────────────────────────────────────────┬───────────────────────────────────────────┐
│ Dropped (too complex for your goals) │ Kept │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ BackwardCompat generic wrapper type │ Simple per-feature boolean flags │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Custom clippy lint for new selectors │ PR template checkbox │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Grace period date baked into Rust │ Admin-flipped on-chain flag (Q2: b) │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ just check-grace-periods CI enforcement │ Release checklist + grep for stale flags │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Multi-version compat matrix │ Exactly 2 versions ever (your commitment) │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Fixture-based legacy test suite │ Pre-deploy smoke test against live state │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Version detection via ERC-165 / bytecode hashing │ Try-the-selector-catch-the-miss │
└──────────────────────────────────────────────────┴───────────────────────────────────────────┘
Decision points to confirm before I write the implementation plan
1. Where the strictness flags live: on PolicyClientRegistry, or on a new dedicated CompatibilityRegistry contract. I vote PolicyClientRegistry — fewer contracts, admin already uses it.
2. Flag granularity: per-feature (strictPolicyCodeHash) or per-version (strictV0_3_0). I vote per-feature — independently flippable, aligns with how each feature migrates at its own pace.
3. Default value of each strictness flag at deploy time: true (strict immediately) or false (permissive while legacy clients still exist). I vote false — matches hard-cutover discipline without requiring perfect ops timing.
4. Pre-deploy smoke test as hard gate or advisory: a hard gate means selector drift in production becomes a deploy block; advisory means humans decide. I vote hard gate for stagef/prod, advisory for local.
5. Does the grace period also cover the NewtonPolicyClient mixin (compiled into external-dev contracts)? Or is the mixin strictly hard-cutover with no compat? I vote hard cutover — mixin lives in external-dev bytecode, we can't upgrade it, and grace-period support for three stakeholders (us, policy deployer, client deployer) is a lot. External devs recompile with new mixin version before the strict flag flips.
★ Insight ─────────────────────────────────────
for your team size. The first version requires discipline from every reviewer; the second only requires discipline from the release
manager flipping flags.
becomes a sequence of independent flag flips rather than one big-bang enforcement. Each feature's migration timeline decouples from
the others — e.g., strictPolicyCodeHash might flip 30 days after v0.3.0 ships, while strictEpochRegistration flips 90 days later
because it affects more clients.
pattern for any future protocol evolution affecting external-deployed contracts.
─────────────────────────────────────────────────
Answer Decisions 1–5 and I'll write the concrete implementation plan (contract changes, Rust changes, Justfile recipes, release
checklist updates).