Created
April 17, 2026 05:17
Protocol Version Compatibility enforcement test
| The goal is to prevent developers on the team from accidentally forgetting to keep the Rust code compatible with both old and | |
| new contract versions. There are 3 scenarios: 1) contracts are upgraded with a breaking change before the Rust update, 2) the | |
| Rust update deploys before the contracts with the breaking change are upgraded, 3) both the contracts upgrade and the Rust | |
| update are deployed, but for Newton Policy, Newton Policy Data, and Policy Clients (which extend the NewtonPolicyClient mixin) | |
| there is a grace period for developers to migrate their policy clients to the new Newton policy or policy data. This means the | |
| Rust update paired with a breaking contracts upgrade should support both versions during the grace period (until version | |
| enforcement happens). Scenarios 1 and 2 are about preventing the protocol itself from failing, while 3 is about preventing | |
| external developers' policy clients from suddenly failing before they have had time to become compatible. /clarify-plan THINK HARD | |
| ⏺ I'll work through this as a design problem. The three scenarios sort into two fundamentally different problem classes, which | |
| matters for what prevention mechanism fits each. | |
| ★ Insight ───────────────────────────────────── | |
| - Scenarios 1+2 are atomic rollout problems (the AVS itself breaks during deploy transitions). Fix is deploy ordering + pre-deploy | |
| CI. | |
| - Scenario 3 is a long-tail multi-version support problem (external-dev UX, not protocol integrity). Fix is dual-path code that | |
| Rust devs cannot forget to write. | |
| - The hard part of #3 isn't "add tolerance code"; we know how to do that. It's making forgetting it structurally impossible (type | |
| system, CI, lint) rather than relying on reviewers remembering. | |
| ───────────────────────────────────────────────── | |
| Problem framing | |
| Mapping to enforcement mechanisms: | |
| ┌────────────────────────────────────────┬────────────────────┬───────────────┬──────────────────────────────────────────────────────┐ | |
| │ Scenario                               │ Failure class      │ Who gets hurt │ Natural prevention                                   │ | |
| ├────────────────────────────────────────┼────────────────────┼───────────────┼──────────────────────────────────────────────────────┤ | |
| │ 1: contracts ahead of Rust             │ protocol-down      │ us (newton)   │ deploy-pipeline ordering + MIN_COMPATIBLE_VERSION    │ | |
| ├────────────────────────────────────────┼────────────────────┼───────────────┼──────────────────────────────────────────────────────┤ | |
| │ 2: Rust ahead of contracts             │ protocol-down      │ us (newton)   │ deploy-pipeline ordering + pre-deploy selector probe │ | |
| ├────────────────────────────────────────┼────────────────────┼───────────────┼──────────────────────────────────────────────────────┤ | |
| │ 3: policy proxies stuck on old impl    │ client-policy-down │ external devs │ dual-version Rust code paths that can't be           │ | |
| │    during grace period                 │                    │               │ accidentally skipped                                 │ | |
| └────────────────────────────────────────┴────────────────────┴───────────────┴──────────────────────────────────────────────────────┘ | |
| The core hazard in #3: NewtonPolicy / NewtonPolicyData are initialize-once proxies. When we ship a new version, the impl on chain | |
| is new but all existing proxies still delegate to the old bytecode. External devs have to redeploy their policies to opt in. Rust | |
| must gracefully serve both until enforcement. | |
| Clarifying questions (need answers before writing the plan) | |
| Before I draft the actual implementation, I need to resolve ambiguity on these: | |
| Q1 - Version detection. How does Rust know which version a given policy proxy is on? | |
| - (a) NewtonPolicy.PROTOCOL_VERSION() selector (does this exist on old impls? if not, asking it causes the same selector-miss we're | |
| trying to avoid) | |
| - (b) ERC-165 supportsInterface(bytes4) probe per new selector (robust but requires adding supportsInterface to NewtonPolicy in | |
| v0.3.0+) | |
| - (c) Bytecode classification (cast code + known hash per impl): works today with zero contract changes but brittle if impl is | |
| redeployed | |
| - (d) PolicyClientRegistry stores declared version at registration time β clean but needs registry writes on every upgrade | |
| Which direction do you want? (b) is my recommendation; (c) works as a stopgap without any contract work. | |
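To make option (b) concrete, here is a minimal sketch of probe-based classification. It assumes a hypothetical `InterfaceProbe` trait standing in for an `eth_call` to `supportsInterface(bytes4)`; the trait, the `CallOutcome` and `PolicyImpl` enums, and the mock are illustrative names, not part of the codebase. A real implementation would also need to tell a transient RPC error apart from a revert, which this sketch does not model.

```rust
/// Outcome of one read-only call against a policy proxy.
#[derive(Debug, Clone, PartialEq)]
enum CallOutcome {
    Returned(bool),
    /// Includes a selector miss on impls that predate `supportsInterface`.
    Reverted,
}

#[derive(Debug, PartialEq)]
enum PolicyImpl {
    Current,
    Legacy,
}

/// Hypothetical abstraction over an eth_call to `supportsInterface(bytes4)`.
trait InterfaceProbe {
    fn supports_interface(&self, interface_id: [u8; 4]) -> CallOutcome;
}

/// Classify a proxy by whether it advertises the required interface.
fn classify<P: InterfaceProbe>(proxy: &P, required: [u8; 4]) -> PolicyImpl {
    match proxy.supports_interface(required) {
        CallOutcome::Returned(true) => PolicyImpl::Current,
        // A `false` answer and a revert both mean the new selector is unusable.
        CallOutcome::Returned(false) | CallOutcome::Reverted => PolicyImpl::Legacy,
    }
}

/// Test double that always answers with a fixed outcome.
struct MockProxy(CallOutcome);

impl InterfaceProbe for MockProxy {
    fn supports_interface(&self, _interface_id: [u8; 4]) -> CallOutcome {
        self.0.clone()
    }
}
```

The point of the design is that the probe itself is safe to send to an old impl: a revert is treated as an answer ("legacy") rather than as a failure.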
| Q2 - Grace period trigger. Is the end of the grace period: | |
| - (a) A fixed date baked into Rust at release time (grace_period_end: 2026-07-15) | |
| - (b) An admin-flipped on-chain flag (e.g., MIN_POLICY_VERSION on PolicyClientRegistry) | |
| - (c) Both: Rust has a default, admin can extend/shorten | |
| (b) is more flexible but means Rust must query it. (a) is rigid but simpler. | |
| Q3 - Scope of the compat layer. Is this only for NewtonPolicy and NewtonPolicyData selectors called at task time? Or also: | |
| - Policy deployment path (NewtonPolicyFactory.deployPolicy signature changes)? | |
| - Events emitted by old vs new NewtonPolicy (events don't have selector-miss semantics but ABI changes matter)? | |
| - NewtonPolicyClient mixin internals (which external devs compile into their own contracts)? | |
| The mixin is the trickiest because it lives in external-dev bytecode, so we can't upgrade it at all. External devs inherit the mixin | |
| version at compile time of their client. | |
| Q4 - Policy version enumeration. Do we want active tracking of "which policy_client is on which impl"? | |
| - Useful for: alerting external devs before grace expires, dashboards | |
| - Requires: crawler that walks PolicyClientRegistry + classifies each proxy | |
| - If yes, where does it live: gateway, a new service, or a CI job that publishes to a dashboard? | |
| Q5 - Deprecation communication channel. When we detect a legacy policy client at task time, do we: | |
| - Emit a structured log line only (devs may never see it) | |
| - Emit a Prometheus metric legacy_policy_detections_total{policy_client, version, days_until_grace_end} (ops can alert) | |
| - Return a deprecation warning header in the RPC response (external devs see it) | |
| - Auto-create a Linear ticket per detected client | |
| My instinct is all four at different cadences, but which are in scope? | |
| Q6 - Existing tolerance patterns. The v0.3.0 incident surfaced getPolicyCodeHash as an example. Before we design the general | |
| system, can you confirm: | |
| - Did we ship tolerance for that in Rust in a follow-up PR, or was the fix "redeploy all policies"? | |
| - If tolerance exists, where? (It will be the seed pattern for the general BackwardCompat abstraction.) | |
| Q7 - Release calendar visibility. How do we want grace periods to be discoverable across the team? | |
| - (a) .claude/rules/lessons.md entry per release | |
| - (b) docs/COMPATIBILITY_MATRIX.md: single source of truth listing every BackwardCompat, when it expires, which version introduced | |
| it | |
| - (c) Linear project "Protocol Compatibility" with one ticket per open compat window | |
| Draft plan (pending answers above) | |
| Regardless of how Q1-Q7 resolve, the plan will have four tiers. The choices above determine how each tier is implemented. | |
| Tier 1 - Atomic rollout (scenarios 1 + 2, protocol self-compat). | |
| - Per-release: pre-deploy CI smoke test exercises gateway → operator → TaskManager on a fork against current production contract | |
| state. Fails before deploy if Rust can't talk to on-chain contracts. | |
| - Per-contract-PR: require same-PR Rust change OR explicit "no Rust impact" label. Enforced via GitHub required-label check. | |
| - Deploy pipeline: ordering contract. deploy-trigger.yml already does contracts-first on stagef; formalize prod ordering | |
| (contracts first, bake time, then Rust) in runbook. | |
| - MIN_COMPATIBLE_VERSION already enforces the semver floor on-chain; keep it as the last line of defense. | |
| Tier 2 - Typed dual-version abstraction (scenario 3, policy proxy grace period). | |
| Introduce crates/core/src/compat/ module with: | |
| pub struct BackwardCompat<T> { | |
|     new_value: Option<T>,         // None = legacy proxy | |
|     legacy_behavior: T,           // what we did pre-v0.X | |
|     grace_end: chrono::NaiveDate, | |
|     rollout_ticket: &'static str, // Linear ID for visibility | |
| } | |
| impl<T: Clone> BackwardCompat<T> { | |
|     pub fn resolve(&self, now: chrono::NaiveDate) -> Result<T, CompatError> { | |
|         match (&self.new_value, now >= self.grace_end) { | |
|             // Proxy already exposes the new selector: always use it. | |
|             (Some(v), _) => Ok(v.clone()), | |
|             // Legacy proxy inside the grace window: count it and fall back. | |
|             (None, false) => { | |
|                 metrics::inc_legacy_policy_detection(self.rollout_ticket); | |
|                 Ok(self.legacy_behavior.clone()) | |
|             } | |
|             // Legacy proxy on or after grace_end: hard error (fields elided in this sketch). | |
|             (None, true) => Err(CompatError::GraceExpired { .. }), | |
|         } | |
|     } | |
| } | |
| Every new required contract selector MUST go through this type. Enforced via: | |
| - Custom clippy lint (or dylint/ast_grep-based CI check) that flags direct calls to a registered set of "new selectors" outside of | |
| compat:: module. | |
| - Codeowner rule: changes to crates/core/src/compat/ require Dennis + one more reviewer. | |
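As a rough, std-only illustration of the three `resolve` outcomes, the sketch below swaps `chrono::NaiveDate` for a plain `(year, month, day)` tuple and the metrics call for a local counter so that it compiles without external crates. The `Date` alias, the `legacy_hits` field, the fields of `GraceExpired`, the `"skip-code-hash-check"` value, and the `NEWT-0000` ticket ID are all assumptions made for the example, not names from the codebase.

```rust
use std::cell::Cell;

/// (year, month, day). Tuples compare lexicographically, which matches date order.
type Date = (u16, u8, u8);

#[derive(Debug, PartialEq)]
enum CompatError {
    // Field set is an assumption; the original sketch elides it.
    GraceExpired { rollout_ticket: &'static str, grace_end: Date },
}

struct BackwardCompat<T> {
    new_value: Option<T>,   // None = legacy proxy
    legacy_behavior: T,     // pre-upgrade behavior
    grace_end: Date,
    rollout_ticket: &'static str,
    legacy_hits: Cell<u64>, // stand-in for a metrics counter (assumption)
}

impl<T: Clone> BackwardCompat<T> {
    fn resolve(&self, now: Date) -> Result<T, CompatError> {
        match (&self.new_value, now >= self.grace_end) {
            (Some(v), _) => Ok(v.clone()),
            (None, false) => {
                self.legacy_hits.set(self.legacy_hits.get() + 1);
                Ok(self.legacy_behavior.clone())
            }
            (None, true) => Err(CompatError::GraceExpired {
                rollout_ticket: self.rollout_ticket,
                grace_end: self.grace_end,
            }),
        }
    }
}

/// A legacy proxy (no new value) with a grace window ending 2026-07-15.
fn legacy_proxy_example() -> BackwardCompat<&'static str> {
    BackwardCompat {
        new_value: None,
        legacy_behavior: "skip-code-hash-check",
        grace_end: (2026, 7, 15),
        rollout_ticket: "NEWT-0000", // hypothetical ticket ID
        legacy_hits: Cell::new(0),
    }
}
```

Resolving the example the day before `grace_end` falls back to the legacy behavior and bumps the counter; resolving it on `grace_end` itself returns `GraceExpired`, because the match uses `now >= grace_end`.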
| Tier 3 - CI enforcement of the compatibility lifecycle. | |
| - just check-grace-periods: walks all BackwardCompat instances (via a registry macro), fails CI if any grace_end < today. Forces | |
| the dev to either: | |
| - Remove the compat wrapper (use the new path directly) and prove all proxies migrated. | |
| - Extend the grace_end with justification. | |
| - just test-legacy-compat: fixture test that deploys a snapshot of the previous policy impl and exercises Rust code paths against | |
| it. Ships as a required CI job on PRs that touch contracts or compat/. | |
| - Scheduled CI job (weekly): queries PolicyClientRegistry on each network, classifies each proxy by impl, publishes dashboard + | |
| alerts when >0 clients are on impl version nearing grace expiry. | |
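The `check-grace-periods` idea can be sketched as a pure function over a registry of compat entries. This is an illustration only: `CompatEntry`, `expired_entries`, `check_grace_periods`, and the entry names are hypothetical, and how the real registry macro collects instances is not specified here.

```rust
/// (year, month, day). Tuples compare lexicographically, which matches date order.
type Date = (u16, u8, u8);

/// One registered compat wrapper (hypothetical shape).
struct CompatEntry {
    name: &'static str,
    grace_end: Date,
}

/// Names of all entries whose grace window ended before `today`.
fn expired_entries(registry: &[CompatEntry], today: Date) -> Vec<&'static str> {
    registry
        .iter()
        .filter(|e| e.grace_end < today)
        .map(|e| e.name)
        .collect()
}

/// CI entry point: Err lists every wrapper that must be removed or extended.
fn check_grace_periods(registry: &[CompatEntry], today: Date) -> Result<(), String> {
    let expired = expired_entries(registry, today);
    if expired.is_empty() {
        Ok(())
    } else {
        Err(format!("expired compat wrappers: {}", expired.join(", ")))
    }
}
```

One detail worth aligning: the check as described uses `grace_end < today`, while `resolve` rejects when `now >= grace_end`, so on the `grace_end` day itself the runtime would already reject while this check still passes.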
| Tier 4 - External-dev communication. | |
| - Runtime: operator logs legacy_policy_detected at warn! with policy_client address + grace_end. Gateway also emits Prometheus | |
| metric legacy_policy_detections_total. | |
| - RPC layer: gateway adds x-newton-policy-version header on task responses when legacy detected, which surfaces it to integrations without | |
| requiring log access. | |
| - Ops runbook: when dashboard shows legacy clients within 30 days of grace, ops emails external devs + files Linear tickets. | |
| Tradeoffs to flag | |
| - Type-driven enforcement vs lint-driven. Type-driven (wrap in BackwardCompat) is iron-clad but adds ceremony. Lint-driven (custom | |
| clippy/ast_grep check) is lighter but requires keeping the selector list curated. I'd pick type-driven for clarity. | |
| - Static grace dates vs admin-flipped on-chain flag. Static is simpler; admin-flipped is flexible (extend grace if external dev | |
| ecosystem isn't migrating fast enough). A hybrid (Rust has default static date, admin flag can extend only) captures both safely. | |
| - Fixture-based legacy tests vs on-fork tests. Fixtures (snapshot old bytecode) are fast + deterministic but grow stale. On-fork | |
| tests (against live stagef) are realistic but slow + flaky. Do both: fixtures on every PR, on-fork nightly. | |
| - Cost of supporting old impls forever. Every BackwardCompat is technical debt. If we don't actively retire them, the compat layer | |
| grows unbounded. The grace_end CI check is what prevents this, but only if we don't just keep bumping the date. | |
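The "extend only" hybrid from the second tradeoff reduces to taking the later of two dates. A minimal sketch, assuming the on-chain override is read as an optional date; the function name and the `Date` alias are illustrative:

```rust
/// (year, month, day). Tuples compare lexicographically, which matches date order.
type Date = (u16, u8, u8);

/// The admin override can push the grace end later but never earlier
/// than the default compiled into the Rust release.
fn effective_grace_end(default_end: Date, admin_override: Option<Date>) -> Date {
    match admin_override {
        Some(extended) if extended > default_end => extended,
        _ => default_end,
    }
}
```

Ignoring an earlier override means a mistaken or malicious admin write cannot shorten the window that external developers were promised.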
| What I need from you to write the implementation plan | |
| Answering Q1-Q3 is minimum viable. Q4-Q7 shape Tiers 3+4 but don't block starting Tiers 1+2. Preferred path: | |
| 1. You answer Q1-Q3 (version detection mechanism, grace trigger, scope). | |
| 2. I write a focused implementation plan for Tiers 1 and 2 (protocol self-compat + typed dual-version abstraction) + scaffolding | |
| for Tier 3 CI check. | |
| 3. Tiers 3 and 4 details get fleshed out based on Q4-Q7 answers in a follow-up. | |
| ★ Insight ───────────────────────────────────── | |
| - The real systemic win from this work is removing the judgement call from developers. Today, "do I need tolerance here?" is a | |
| reviewer's job and fails silently. With BackwardCompat<T> as a required wrapper, "tolerance was added" becomes a compile-time fact | |
| and "tolerance is still needed" becomes a CI-enforced fact. Reviewer discipline goes from required to optional, which is the only | |
| way discipline actually survives. | |
| - The mixin case (NewtonPolicyClient that external devs compile into their own contracts) is the worst version of this problem | |
| because we can't upgrade it at all. Worth thinking separately about whether future NewtonPolicyClient changes should use | |
| interface-based dispatch rather than mixin inheritance, so external devs don't have to recompile their contracts to get updates. | |
| That's an architectural shift, not a compat layer; flag for separate discussion. | |
| ───────────────────────────────────────────────── | |
| Which of Q1-Q7 should I pursue first? |
⏺ Got the authoritative answers. Stagef has deployments at 11155111 (Sepolia), 84532 (Base Sepolia), and 1 (Ethereum mainnet; likely
the EigenLayer source-chain reference, not a real deploy target, so worth your confirmation).
★ Insight ─────────────────────────────────────
The "scenarios" are just preset combinations of those three.
scenarios with validate_attestation=direct exercise that path β meaning basic, privacy, and routing would have all passed while
v0.3.0 was broken.
─────────────────────────────────────────────────
Scenario decomposition
From Justfile:456-462:
┌──────────────┬──────────────────────┬─────────┬───────────────┬─────────────────┐
│ Scenario     │ validate_attestation │ privacy │ target_chains │ Catches v0.3.0? │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ basic        │ skip                 │ false   │ 31337         │ No              │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ full         │ direct               │ false   │ 31337         │ Yes             │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ privacy      │ skip                 │ true    │ 31337         │ No              │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ privacy-full │ direct               │ true    │ 31337         │ Yes             │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ routing      │ skip                 │ false   │ 31337,31338   │ No              │
└──────────────┴──────────────────────┴─────────┴───────────────┴─────────────────┘
What each knob does:
This is the path that reads policy.getPolicyCodeHash() against the live policy proxy. Selector miss = revert here. This is the
v0.3.0 path.
check never runs.
touch the getPolicyCodeHash path.
than selector miss.
Argument variations for each scenario, beyond the preset:
┌───────────────────────────┬─────────────────────────────────────────────────────────────┬──────────────┐
│ Arg                       │ Effect                                                      │ Default      │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --env                     │ Which newton_prover_config..json to read                    │ stagef       │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --block-time              │ Anvil block time (0 = automine, 1 = 1s realistic)           │ 0            │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --force                   │ Force re-deploy contracts vs cached                         │ false        │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --keep-anvil              │ Don't tear down anvil after test                            │ false        │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ validate_attestation= env │ Override scenario default (direct or skip)                  │ per scenario │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ privacy= env              │ Override scenario default                                   │ per scenario │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ target_chains= env        │ Override scenario default                                   │ per scenario │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ SKIP_REGISTRATION env     │ Skip operator registration step (assume already registered) │ per chain    │
└───────────────────────────┴─────────────────────────────────────────────────────────────┴──────────────┘
For a forked-stagef gate, you'd want --block-time 1 (mimics real chain timing better than automine) and SKIP_REGISTRATION=true
(operators already registered in the forked state).
What gates the v0.3.0 class
Day-1 minimum to catch v0.3.0 specifically: full. That's the only scenario among the lightweight ones that exercises
validateAttestationDirect.
Recommendation for the deploy gate
Tier 1 (required, day 1):
Tier 2 (add once stable on forked stagef):
is proven stable.
Skip from gate:
Chain coverage
Stagef has real deployments at:
For multichain routing, you need both Sepolia and Base Sepolia forked (source + destination). For single-chain full, Sepolia alone
is enough, but running it against Base Sepolia too would catch chain-specific issues (block-time offsets, EIP-4844 quirks).
My default: fork both Sepolia and Base Sepolia, run full against each (in parallel), run routing once across both. Total wall-clock
~10-15 min via parallel anvil instances. Mainnet 1 deployment skipped (config reference, not a real deploy target).
Two questions
chain reads?
Once you answer these I'll fold it into the saved memory and finish /save-progress.
✻ Sautéed for 1m 59s
❯ does routing also check for validateAttestationDirect?
⏺ Good check. Let me verify both whether routing exercises it by default and whether destination chains even have that path.
Searched for 2 patterns (ctrl+o to expand)
⏺ Important catch. Let me give you the precise answer.
★ Insight ─────────────────────────────────────
validateAttestationDirect. But the env-var override is wired through: validate_attestation=direct just e2e --scenario routing
works.
multichain destination has its own copy of the selector, which means the v0.3.0 class can recur on dest chains too if the
destination policy proxy is on a legacy impl.
─────────────────────────────────────────────────
Direct answer
No, not by default. From Justfile:460:
routing) _attestation=${_attestation:-skip}; _privacy=${_privacy:-false}; _target_chains=${_target_chains:-31337,31338} ;;
So just e2e --scenario routing runs with attestation validation skipped. It tests cross-chain orchestration but not the
selector-miss path.
But the override is in place
The e2e recipe respects validate_attestation as an env-var override. And integration-tests/src/multichain_routing.rs:214 does honor
it:
if !matches!(config.validate_attestation, AttestationValidationMode::Skip) {
    // validate attestation per config.validate_attestation mode
}
So validate_attestation=direct just e2e --scenario routing will run dual-chain orchestration AND on-chain attestation validation.
That's the combo we actually want for the gate.
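The preset-plus-override behavior of the Justfile `case` statement can be modeled in a few lines. This is a sketch of the resolution logic only, not code from the repository: `resolve_attestation` is a hypothetical helper, the `Direct` variant is assumed from the `direct` setting (only `AttestationValidationMode::Skip` appears in the excerpt above), and the preset defaults are taken from the scenario table.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AttestationValidationMode {
    Skip,
    Direct, // assumed variant name for validate_attestation=direct
}

/// Mirrors `${_attestation:-<default>}`: a non-empty override wins,
/// otherwise the scenario preset supplies the default.
fn resolve_attestation(
    scenario: &str,
    override_value: Option<&str>,
) -> Option<AttestationValidationMode> {
    let chosen = match override_value.filter(|v| !v.is_empty()) {
        Some(v) => v,
        None => match scenario {
            "basic" | "privacy" | "routing" => "skip",
            "full" | "privacy-full" => "direct",
            _ => return None, // unknown scenario
        },
    };
    match chosen {
        "skip" => Some(AttestationValidationMode::Skip),
        "direct" => Some(AttestationValidationMode::Direct),
        _ => None, // unrecognized mode string
    }
}
```

With no override, `routing` resolves to `Skip`, which is why the default run never reaches the selector-miss path; passing `direct` as the override flips it to `Direct`.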
Why it matters specifically for v0.3.0 class
Both task managers have validateAttestationDirect:
Both call into the policy proxy's selector during validation. If a destination-chain policy proxy is on a legacy impl, the v0.3.0
class recurs there too. Source-only full would miss that.
Updated recommendation
To cover the v0.3.0 class on every chain that runs the binary, the gate should be:
┌──────────────────────────────────────────────────────────────────────┬────────────────────────────────────────────┐
│ Test command                                                         │ Catches                                    │
├──────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────┤
│ just e2e --scenario full --env stagef (against Sepolia fork)         │ v0.3.0 class on source chain               │
├──────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────┤
│ just e2e --scenario full --env stagef (against Base Sepolia fork)    │ v0.3.0 class on destination chain          │
├──────────────────────────────────────────────────────────────────────┼────────────────────────────────────────────┤
│ validate_attestation=direct just e2e --scenario routing --env stagef │ v0.3.0 class on dual-chain orchestration + │
│ (against both forks)                                                 │ cross-chain attestation flow               │
└──────────────────────────────────────────────────────────────────────┴────────────────────────────────────────────┘
Three runs total. With anvil forks running in parallel, total wall-clock dominated by the slowest scenario (~10 min).
Or, more parsimonious: skip the per-chain full runs and rely on routing+direct alone, since it exercises both source and
destination attestation paths in one shot. Trade-off: harder to localize failures (was it source attestation, dest attestation, or
routing logic that broke?) but cheaper.
Three confirmations to nail down before saving
whether to fork mainnet too β heavier upstream RPC dependency.)
the Justfile case statement? That removes the env-var-override gotcha and makes the gate command explicit. Small Justfile change.
Once you answer these, I'll fold into saved memory and finish /save-progress.
✻ Cogitated for 59s
❯ 1. skip the per-chain full runs and rely on routing+direct alone, since it exercises both source and
destination attestation paths in one shot. 2. just fork and test on testnet 3. yes sounds good
⏺ Locked in. Here's the final Phase 3 design and the three small remaining items before save.
★ Insight ─────────────────────────────────────
override into an explicit, discoverable gate command. That removes the "asymmetric defaults between e2e and e2e-gateway" footgun
entirely.
and cost more in upstream RPC quota for negligible coverage gain since Newton's prod stack lives on testnets in stagef.
developer reproduces locally with both forks running and bisects from there. The "two runs for clear isolation" is a debugging
convenience, not a correctness requirement.
─────────────────────────────────────────────────
Phase 3 - final shape
One new Justfile scenario:
In e2e recipe case statement, add:
routing-full) _attestation=${_attestation:-direct}; _privacy=${_privacy:-false}; _target_chains=${_target_chains:-31337,31338} ;;
One CI workflow change in the stagef deploy pipeline:
Fail → block deploy, surface E2E failures to PR author
Live stagef stays untouched the entire time. Real keys, real contract state, isolated execution.
Final phased plan (decisions baked in)
┌───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────┬──────────────────┐
│ Phase │ Deliverable                                                                                           │ Status           │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ 1     │ PolicyClientRegistry.strictnessFlags + setStrictnessFlag admin function + just admin set-strictness   │ Decided          │
│       │ CLI command + Rust is_strictness_flag_enabled reader                                                  │                  │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ 2     │ Rust dual-path wiring with StrictnessCache (5-min refresh) + first concrete dual-path on              │ Decided (see Q2) │
│       │ getPolicyCodeHash                                                                                     │                  │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ 3     │ routing-full Justfile scenario + CI workflow step (anvil forks Sepolia + Base Sepolia, runs scenario, │ Decided          │
│       │ gates stagef deploy)                                                                                  │                  │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ 4     │ PR template checkbox ("did you add a strictness flag for this breaking change?") + release checklist  │ Decided          │
│       │ update + lessons.md entry codifying the pattern                                                       │                  │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ 5     │ legacy_policy_detections_total{chain_id, feature} Prometheus counter + one critical alert             │ Decided (see Q4) │
└───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────┘
Three small confirmations remaining
These were posed before Phase 3 was redesigned but still apply.
Q2 - Strictness cache RPC failure default:
The cached is_strict() check needs a default when the on-chain read fails (RPC down, network glitch). Two reasonable defaults:
prefer availability over strictness when telemetry is broken; a transient RPC blip shouldn't reject every task.
fail-closed when uncertain.
My recommendation: permissive with a warn! log on every cache-miss-with-RPC-fail and a counter
(strictness_cache_rpc_failures_total) for alerting. Once strict mode is on, you want telemetry to flag the situation, but you don't
want a Sepolia RPC blip to brick task processing.
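A minimal sketch of the permissive default, assuming the on-chain read is injected as a closure so the cache can be exercised without a network. `StrictnessCache` is named in the plan, but its shape here (fields, the `is_strict` signature, the in-struct failure counter standing in for `strictness_cache_rpc_failures_total`) is an assumption for illustration.

```rust
use std::time::{Duration, Instant};

struct StrictnessCache {
    ttl: Duration,
    cached: Option<(bool, Instant)>,
    rpc_failures: u64, // stand-in for strictness_cache_rpc_failures_total
}

impl StrictnessCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, cached: None, rpc_failures: 0 }
    }

    /// Returns the cached flag while it is fresh; otherwise refreshes via `fetch`.
    /// On a cache miss combined with an RPC failure it answers `false` (permissive).
    fn is_strict<F>(&mut self, now: Instant, fetch: F) -> bool
    where
        F: FnOnce() -> Result<bool, String>,
    {
        if let Some((value, fetched_at)) = self.cached {
            if now.duration_since(fetched_at) < self.ttl {
                return value;
            }
        }
        match fetch() {
            Ok(value) => {
                self.cached = Some((value, now));
                value
            }
            Err(_) => {
                // A real implementation would also emit a warn! log here.
                self.rpc_failures += 1;
                false
            }
        }
    }
}
```

An alternative would be to keep serving the last known value after it expires when the refresh fails; the sketch follows the stated recommendation instead and falls back to permissive on every expired-and-failed read.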
Q3 - setStrictnessFlag authorization:
My recommendation: onlyOwner now, migrate to Safe via the NEWT-882-894 work later. The strictness flag is a "we're confident
migration is done" signal, not an emergency lever; single-signer is acceptable.
Q4 - Alerting threshold:
What's the page-someone alert? My recommendation: one critical day-1 rule.
(
policy_strictness_flag{feature="policy_code_hash"} == 1
) and (
rate(legacy_policy_detections_total{feature="policy_code_hash"}[5m]) > 0
)
Reads as: "strict mode is enabled AND legacy detections are still happening." That's the "we flipped the flag too early" alarm. Add
a non-paging warn-level rule for "legacy detections still happening even though we shipped the strict version" so we know to flip
the flag.
Confirm and save
Three quick confirmations:
Once you confirm all three (or override), I'll fold the full plan into protocol-version-compatibility.md, update MEMORY.md, save to
supermemory, save to recall, and finish /save-progress.
✻ Churned for 57s
✻ recap: Designing a strictness-flag plan to prevent contract/Rust version drift, with Phase 3 settled as an anvil-fork pre-deploy
E2E gate running routing-full against Sepolia and Base Sepolia. Next: confirm Q2 (permissive RPC default), Q3 (onlyOwner now), and
Q4 (single critical alert) so I can save progress. (disable recaps in /config)
❯ do we still need strictness check?
⏺ Sharp question, and I think the honest answer is: probably not, given your current state. The two mechanisms solve different
problems, and the one the strictness flag solves may not be a problem yet.
★ Insight ─────────────────────────────────────
contracts we're shipping to?" Strictness flag: "should new Rust tolerate or reject legacy external policy proxies that we don't
control?"
deploy" β the E2E gate enforces this naturally because forked stagef state with un-migrated Newton proxies will fail the gate,
blocking deploy.
need to give them a bounded grace period without breaking their tasks.
─────────────────────────────────────────────────
What each mechanism actually protects against
┌───────────────────────────────────────────┬───────────────────────────────────────────┬───────────────────────────────────┐
│ Failure mode                              │ Caught by E2E gate?                       │ Caught by strictness flag?        │
├───────────────────────────────────────────┼───────────────────────────────────────────┼───────────────────────────────────┤
│ Newton ships Rust that breaks against     │ Yes (E2E fails)                           │ N/A (flag only changes runtime    │
│ Newton-deployed stagef proxies            │                                           │ behavior)                         │
├───────────────────────────────────────────┼───────────────────────────────────────────┼───────────────────────────────────┤
│ Newton ships Rust that breaks against     │ Indirectly (stagef passes ⟹ prod likely   │ N/A                               │
│ Newton-deployed prod proxies              │ passes if both have same impl)            │                                   │
├───────────────────────────────────────────┼───────────────────────────────────────────┼───────────────────────────────────┤
│ External dev's legacy proxy hits new Rust │ No (no external proxies on stagef to test │ Yes (flag=false → log + tolerate; │
│ on prod                                   │ against)                                  │ flag=true → reject)               │
├───────────────────────────────────────────┼───────────────────────────────────────────┼───────────────────────────────────┤
│ Newton wants to force external migration  │ N/A                                       │ Yes (flip flag to true)           │
│ after grace window                        │                                           │                                   │
└───────────────────────────────────────────┴───────────────────────────────────────────┴───────────────────────────────────┘
The E2E gate handles every Newton-controlled scenario. The strictness flag only adds value for the external-policy-client grace
period.
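The runtime behavior the strictness flag controls (third row of the table) fits in one small decision function. This is a sketch under the assumption that legacy detection and the flag value are both already known at the call site; `LegacyDecision` and `decide` are illustrative names.

```rust
#[derive(Debug, PartialEq)]
enum LegacyDecision {
    /// Proxy is on the current impl: call the new selector directly.
    UseNewPath,
    /// Legacy proxy, flag off: log, count, and fall back to the old behavior.
    TolerateLegacy,
    /// Legacy proxy, flag on: reject the task.
    Reject,
}

fn decide(proxy_is_legacy: bool, strict: bool) -> LegacyDecision {
    match (proxy_is_legacy, strict) {
        (false, _) => LegacyDecision::UseNewPath,
        (true, false) => LegacyDecision::TolerateLegacy,
        (true, true) => LegacyDecision::Reject,
    }
}
```

The flag never affects proxies that are already on the current impl, which is why it adds nothing for Newton-controlled deployments that the E2E gate already covers.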
The question that decides it
Do you currently have, or expect soon to have, external developers running their own NewtonPolicyClient extensions in production?
you do it, coordinate with whoever's running policy clients via Discord / direct comms. Add the flag later as a follow-up if
external clients land.
failing the moment Newton deploys. Without it, every breaking change is an immediate hard break for them.
My recommendation
Drop Phases 1 and 2 for now. Keep Phases 3, 4, 5. Reasoning:
of the existing deploy workflow.
Rust dual-path machinery to maintain.
informs the grace period design).
What you give up by dropping it:
miss)
What you gain:
Slimmed-down plan if you agree
┌───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Phase │ Deliverable                                                                                            │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 3     │ routing-full Justfile scenario + CI workflow step (anvil forks Sepolia + Base Sepolia, runs scenario,  │
│       │ gates stagef deploy)                                                                                   │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 4     │ PR template checkbox ("does this PR add a new selector or break an existing one?") + release checklist │
│       │ + lessons.md entry codifying the E2E-gate pattern                                                      │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 5     │ legacy_policy_detections_total{chain_id, feature} Prometheus counter + warn-level alert                │
│       │ (no critical alert needed without strict mode, just observability)                                     │
└───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Phase 1, 2, Q2, Q3, Q4 all become moot.
Decide
complexity yet.
My vote: A. Ship the minimum that solves the actual current problem (v0.3.0 incident class). Pick up the flag when external clients
become a concrete need, not a hypothetical.
Confirm A/B/C and I save.
✻ Cogitated for 1m 30s
❯ drop phase 4 - but replace with just lessons.md entry. pr template is overkill. phase 3 is a must, and phase 5 can be held off for
now as well
⏺ Locked in. Final scope: Phase 3 only + lessons.md entry. Loading the recall tools and checking for existing memory in parallel
before saving.
Listed 1 directory (ctrl+o to expand)
⏺ Now executing the save in parallel: write memory file, update MEMORY.md, push to supermemory, and search recall for dedup.
Wrote 2 memories (ctrl+o to expand)
⏺ Bash(node "$HOME/.claude/plugins/cache/supermemory-plugins/claude-supermemory/0.0.2/scripts/save-project-memory.cjs"
"[SAVE:dennis:2026-04-19]…)
⎿  Project knowledge saved: newton-prover-avs
ID: FqSZUTDETBNEz2F7Liykj4
⏺ Bash(test -d /Users/denniswon/projects/newton-prover-avs/.claude/rules && echo "rules dir exists")
⎿  rules dir exists