@denniswon
Created April 17, 2026 05:17
Protocol Version Compatibility enforcement test
The goal is to prevent developers on the team from accidentally forgetting to keep the Rust code compatible with both old and new
contract versions. There are 3 scenarios: 1) contracts upgraded first with a breaking change before the Rust update, 2) contracts with a
breaking change not upgraded yet, but Rust deploys, 3) contracts upgraded and the Rust update also deployed, but for Newton
Policy/Newton Policy Data/and Policy Clients (which extend the NewtonPolicyClient mixin), there is a grace period for developers
to migrate/update their policy clients to the new newton policy or policy data. This means the Rust update that pairs with the
contracts upgrade containing the breaking change should support both versions during this grace period (until version enforcement happens).
Scenarios 1 and 2 are about preventing the protocol from literally failing, while 3 is about preventing external developers' policy clients
from suddenly failing without time to become compatible. /clarify-plan THINK HARD
⏺ I'll work through this as a design problem. The three scenarios sort into two fundamentally different problem classes, which
matters for what prevention mechanism fits each.
★ Insight ─────────────────────────────────────
- Scenarios 1+2 are atomic rollout problems (the AVS itself breaks during deploy transitions). Fix is deploy ordering + pre-deploy
CI.
- Scenario 3 is a long-tail multi-version support problem (external-dev UX, not protocol integrity). Fix is dual-path code that
Rust devs cannot forget to write.
- The hard part of #3 isn't "add tolerance code" - we know how to do that. It's making forgetting it structurally impossible (type
system, CI, lint) rather than relying on reviewers remembering.
─────────────────────────────────────────────────
Problem framing
Mapping to enforcement mechanisms:
Scenario                                                 | Failure class      | Who gets hurt | Natural prevention
---------------------------------------------------------|--------------------|---------------|------------------------------------------------------
1 - contracts ahead of Rust                              | protocol-down      | us (newton)   | deploy-pipeline ordering + MIN_COMPATIBLE_VERSION
2 - Rust ahead of contracts                              | protocol-down      | us (newton)   | deploy-pipeline ordering + pre-deploy selector probe
3 - policy proxies stuck on old impl during grace period | client-policy-down | external devs | dual-version Rust code paths that can't be accidentally skipped
The core hazard in #3: NewtonPolicy / NewtonPolicyData are initialize-once proxies. When we ship a new version, the impl on chain
is new but all existing proxies still delegate to the old bytecode. External devs have to redeploy their policies to opt in. Rust
must gracefully serve both until enforcement.
Clarifying questions (need answers before writing the plan)
Before I draft the actual implementation, I need to resolve ambiguity on these:
Q1 - Version detection. How does Rust know which version a given policy proxy is on?
- (a) NewtonPolicy.PROTOCOL_VERSION() selector (does this exist on old impls? if not, asking it causes the same selector-miss we're
trying to avoid)
- (b) ERC-165 supportsInterface(bytes4) probe per new selector (robust but requires adding supportsInterface to NewtonPolicy in
v0.3.0+)
- (c) Bytecode classification (cast code + known hash per impl) - works today with zero contract changes but brittle if the impl is
redeployed
- (d) PolicyClientRegistry stores a declared version at registration time - clean but needs registry writes on every upgrade
Which direction do you want? (b) is my recommendation (a probe sketch follows below); (c) works as a stopgap without any contract work.
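For concreteness, the (b) probe is a plain eth_call whose calldata is the well-known supportsInterface(bytes4) selector (0x01ffc9a7) followed by the interface id. A minimal sketch of building that calldata - the interface id shown is a placeholder, not the real NewtonPolicy value, and the actual call would go through whatever provider the codebase already uses:

// Sketch: build ERC-165 supportsInterface(bytes4) calldata for a version probe.
// The interface id is a placeholder; the real value comes from the v0.3.0+ ABI.
fn erc165_probe_calldata(interface_id: [u8; 4]) -> Vec<u8> {
    let mut data = Vec::with_capacity(36);
    data.extend_from_slice(&[0x01, 0xff, 0xc9, 0xa7]); // supportsInterface(bytes4) selector
    data.extend_from_slice(&interface_id);             // bytes4 argument, left-aligned
    data.extend_from_slice(&[0u8; 28]);                // right-padded to a full 32-byte word
    data
}

fn main() {
    let calldata = erc165_probe_calldata([0xab, 0xcd, 0xef, 0x01]); // placeholder interface id
    println!("0x{}", calldata.iter().map(|b| format!("{b:02x}")).collect::<String>());
}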
Q2 - Grace period trigger. Is the end of the grace period:
- (a) A fixed date baked into Rust at release time (grace_period_end: 2026-07-15)
- (b) An admin-flipped on-chain flag (e.g., MIN_POLICY_VERSION on PolicyClientRegistry)
- (c) Both - Rust has a default, admin can extend/shorten
(b) is more flexible but means Rust must query it. (a) is rigid but simpler.
Q3 - Scope of the compat layer. Is this only for NewtonPolicy and NewtonPolicyData selectors called at task time? Or also:
- Policy deployment path (NewtonPolicyFactory.deployPolicy signature changes)?
- Events emitted by old vs new NewtonPolicy (events don't have selector-miss semantics but ABI changes matter)?
- NewtonPolicyClient mixin internals (which external devs compile into their own contracts)?
The mixin is the trickiest because it lives in external-dev bytecode - we can't upgrade it at all. External devs inherit the mixin
version at compile time of their client.
Q4 - Policy version enumeration. Do we want active tracking of "which policy_client is on which impl"?
- Useful for: alerting external devs before grace expires, dashboards
- Requires: crawler that walks PolicyClientRegistry + classifies each proxy
- If yes, where does it live - gateway, a new service, a CI job that publishes to a dashboard?
Q5 - Deprecation communication channel. When we detect a legacy policy client at task time, do we:
- Emit a structured log line only (devs may never see it)
- Emit a Prometheus metric legacy_policy_detections_total{policy_client, version, days_until_grace_end} (ops can alert)
- Return a deprecation warning header in the RPC response (external devs see it)
- Auto-create a Linear ticket per detected client
My instinct is all four at different cadences, but which are in scope?
Q6 - Existing tolerance patterns. The v0.3.0 incident surfaced getPolicyCodeHash as an example. Before we design the general
system, can you confirm:
- Did we ship tolerance for that in Rust in a follow-up PR, or was the fix "redeploy all policies"?
- If tolerance exists, where? (It will be the seed pattern for the general BackwardCompat abstraction.)
Q7 - Release calendar visibility. How do we want grace periods to be discoverable across the team?
- (a) .claude/rules/lessons.md entry per release
- (b) docs/COMPATIBILITY_MATRIX.md - single source of truth listing every BackwardCompat, when it expires, which version introduced
it
- (c) Linear project "Protocol Compatibility" with one ticket per open compat window
Draft plan (pending answers above)
Regardless of how Q1–Q7 resolve, the plan will have four tiers. The choices above determine how each tier is implemented.
Tier 1 - Atomic rollout (scenarios 1 + 2, protocol self-compat).
- Per-release: pre-deploy CI smoke test exercises gateway → operator → TaskManager on a fork against current production contract
state. Fails before deploy if Rust can't talk to on-chain contracts.
- Per-contract-PR: require same-PR Rust change OR explicit "no Rust impact" label. Enforced via GitHub required-label check.
- Deploy pipeline: ordering contract - deploy-trigger.yml already does contracts-first on stagef; formalize prod ordering
(contracts first, bake time, then Rust) in runbook.
- MIN_COMPATIBLE_VERSION already enforces the semver floor on-chain; keep it as the last line of defense.
Tier 2 - Typed dual-version abstraction (scenario 3, policy proxy grace period).
Introduce crates/core/src/compat/ module with:
pub struct BackwardCompat<T> {
    new_value: Option<T>,         // None = legacy proxy
    legacy_behavior: T,           // what we did pre-v0.X
    grace_end: chrono::NaiveDate,
    rollout_ticket: &'static str, // Linear ID for visibility
}

impl<T: Clone> BackwardCompat<T> {
    pub fn resolve(&self, now: chrono::NaiveDate) -> Result<T, CompatError> {
        match (&self.new_value, now >= self.grace_end) {
            (Some(v), _) => Ok(v.clone()),
            (None, false) => {
                metrics::inc_legacy_policy_detection(self.rollout_ticket);
                Ok(self.legacy_behavior.clone())
            }
            (None, true) => Err(CompatError::GraceExpired { .. }), // error fields elided in this sketch
        }
    }
}
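As a usage sketch only (the helper name, placeholder hash, date, and ticket are illustrative, not existing code), a call site inside the compat module would look roughly like:

// Hypothetical call site inside crates/core/src/compat/ (the struct fields are visible there).
// `maybe_new_hash` is whatever the new selector returned - None on a selector miss
// against a legacy proxy.
fn resolve_policy_code_hash(maybe_new_hash: Option<[u8; 32]>) -> Result<[u8; 32], CompatError> {
    BackwardCompat {
        new_value: maybe_new_hash,
        legacy_behavior: [0u8; 32], // placeholder for whatever the pre-v0.X behavior was
        grace_end: chrono::NaiveDate::from_ymd_opt(2026, 7, 15).expect("valid date"),
        rollout_ticket: "NEWT-XXX", // placeholder Linear id
    }
    .resolve(chrono::Utc::now().date_naive())
}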
Every new required contract selector MUST go through this type. Enforced via:
- Custom clippy lint (or dylint/ast_grep-based CI check) that flags direct calls to a registered set of "new selectors" outside of
compat:: module.
- Codeowner rule: changes to crates/core/src/compat/ require Dennis + one more reviewer.
Tier 3 - CI enforcement of the compatibility lifecycle.
- just check-grace-periods: walks all BackwardCompat instances (via a registry macro; see the sketch after this list), fails CI if any
grace_end < today. Forces the dev to either:
- Remove the compat wrapper (use the new path directly) and prove all proxies migrated.
- Extend the grace_end with justification.
- just test-legacy-compat: fixture test that deploys a snapshot of the previous policy impl and exercises Rust code paths against
it. Ships as a required CI job on PRs that touch contracts or compat/.
- Scheduled CI job (weekly): queries PolicyClientRegistry on each network, classifies each proxy by impl, publishes dashboard +
alerts when >0 clients are on impl version nearing grace expiry.
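A minimal sketch of the check-grace-periods walk referenced above, assuming a hypothetical registry built on the inventory crate (the entry type, names, and values are illustrative):

use chrono::NaiveDate;

// Hypothetical registry entry; each BackwardCompat site submits one of these.
pub struct CompatEntry {
    pub name: &'static str,           // e.g. the selector the compat path wraps
    pub grace_end: &'static str,      // ISO date, parsed at check time
    pub rollout_ticket: &'static str, // placeholder Linear id for visibility
}

inventory::collect!(CompatEntry);

inventory::submit! {
    CompatEntry { name: "getPolicyCodeHash", grace_end: "2026-07-15", rollout_ticket: "NEWT-XXX" }
}

// `just check-grace-periods` would run this binary and fail CI on a non-zero exit.
fn main() {
    let today = chrono::Utc::now().date_naive();
    let mut expired = 0;
    for entry in inventory::iter::<CompatEntry> {
        let end = NaiveDate::parse_from_str(entry.grace_end, "%Y-%m-%d").expect("valid date");
        if end < today {
            eprintln!("grace period expired: {} ({}) ended {}", entry.name, entry.rollout_ticket, end);
            expired += 1;
        }
    }
    std::process::exit(if expired > 0 { 1 } else { 0 });
}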
Tier 4 - External-dev communication.
- Runtime: operator logs legacy_policy_detected at warn! with policy_client address + grace_end. Gateway also emits the Prometheus
metric legacy_policy_detections_total (a sketch follows below this list).
- RPC layer: gateway adds an x-newton-policy-version header on task responses when a legacy proxy is detected - surfaces to integrations
without requiring log access.
- Ops runbook: when dashboard shows legacy clients within 30 days of grace, ops emails external devs + files Linear tickets.
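A rough sketch of the runtime side (the metric name matches the plan above; the label set and helper name are assumptions):

use once_cell::sync::Lazy;
use prometheus::{register_int_counter_vec, IntCounterVec};

// Counter for legacy proxy detections; the labels here are illustrative, not final.
static LEGACY_POLICY_DETECTIONS: Lazy<IntCounterVec> = Lazy::new(|| {
    register_int_counter_vec!(
        "legacy_policy_detections_total",
        "Legacy policy proxy detected at task time",
        &["policy_client", "feature"]
    )
    .expect("metric registration")
});

// Hypothetical helper the operator/gateway would call when a legacy proxy is seen.
fn record_legacy_detection(policy_client: &str, feature: &str, grace_end: &str) {
    tracing::warn!(%policy_client, %feature, %grace_end, "legacy_policy_detected");
    LEGACY_POLICY_DETECTIONS
        .with_label_values(&[policy_client, feature])
        .inc();
}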
Tradeoffs to flag
- Type-driven enforcement vs lint-driven. Type-driven (wrap in BackwardCompat) is iron-clad but adds ceremony. Lint-driven (custom
clippy/ast_grep check) is lighter but requires keeping the selector list curated. I'd pick type-driven for clarity.
- Static grace dates vs admin-flipped on-chain flag. Static is simpler; admin-flipped is flexible (extend grace if the external dev
ecosystem isn't migrating fast enough). A hybrid - Rust has a default static date, admin flag can extend only - captures both safely.
- Fixture-based legacy tests vs on-fork tests. Fixtures (snapshot old bytecode) are fast + deterministic but grow stale. On-fork
tests (against live stagef) are realistic but slow + flaky. Do both: fixtures on every PR, on-fork nightly.
- Cost of supporting old impls forever. Every BackwardCompat is technical debt. If we don't actively retire them, the compat layer
grows unbounded. The grace_end CI check is what prevents this - but only if we don't just keep bumping the date.
What I need from you to write the implementation plan
Answering Q1–Q3 is minimum viable. Q4–Q7 shape Tiers 3+4 but don't block starting Tiers 1+2. Preferred path:
1. You answer Q1–Q3 (version detection mechanism, grace trigger, scope).
2. I write a focused implementation plan for Tiers 1 and 2 (protocol self-compat + typed dual-version abstraction) + scaffolding
for Tier 3 CI check.
3. Tiers 3 and 4 details get fleshed out based on Q4–Q7 answers in a follow-up.
★ Insight ─────────────────────────────────────
- The real systemic win from this work is removing the judgement call from developers. Today, "do I need tolerance here?" is a
reviewer's job and fails silently. With BackwardCompat<T> as a required wrapper, "tolerance was added" becomes a compile-time fact
and "tolerance is still needed" becomes a CI-enforced fact. Reviewer discipline goes from required to optional β€” which is the only
way discipline actually survives.
- The mixin case (NewtonPolicyClient that external devs compile into their own contracts) is the worst version of this problem
because we can't upgrade it at all. Worth thinking separately about whether future NewtonPolicyClient changes should use
interface-based dispatch rather than mixin inheritance, so external devs don't have to recompile their contracts to get updates.
That's an architectural shift, not a compat layer - flag for separate discussion.
─────────────────────────────────────────────────
Which of Q1–Q7 should I pursue first?
@denniswon (Author)

❯ that's the purpose of the stagef environment. it is a place for us to test because it's internal only, so it can go down (although we
don't want it to go down ever) more than prod. So we can just use the existing same operator keys/gateway keys, etc., and gas will
all be funded, etc., and this should run "sequentially" prior to every testnet stagef deploy (which auto-triggers on every PR merge
to main on newton-prover-avs)

⏺ Yes - that resolves all the operational concerns I raised. The cleanest framing is "deploy with E2E gate baked in" rather than a
separate test step.

★ Insight ─────────────────────────────────────

  • The stagef deploy itself already has a brief downtime window (stop old binary → start new). Putting E2E inside that window means
    you're not adding a new collision risk - you're using the deploy window itself as the test sandbox.
  • This is structurally a canary deployment where synthetic test traffic (E2E) plays the role of real traffic. If synthetic fails,
    roll back. If synthetic passes, accept real traffic.
  • Reusing existing keys eliminates BLS APK churn, registry writes, and the "where do we register the test operator" problem
    entirely.
    ─────────────────────────────────────────────────

Workflow shape

  1. PR merges to main
  2. CI: build new binary
  3. CI: stop running stagef gateway + operator (downtime starts)
  4. CI: run E2E with NEW binary against stagef contracts + existing keys
    • Scenarios: basic, full, routing (whatever proves the core lifecycle)
  5. If E2E passes:
    • Start stagef gateway + operator with NEW binary (downtime ends)
    • Mark deploy successful
  6. If E2E fails:
    • Start stagef gateway + operator with OLD binary (rollback, downtime ends)
    • Mark deploy blocked, surface E2E failure to PR author

Stagef downtime per merge: roughly E2E duration (5-15 min). Acceptable for an internal env per your framing.

What this kills from the original Phase 3

  • No bin/newton-compat-check binary needed
  • No required_selectors.rs curated list to maintain
  • No "where do CI test operators register" question
  • No gas funding / faucet plumbing (already in place for stagef)
  • No test-parallelism question (one merge → one deploy → sequential by construction)

What still needs to be decided

  1. E2E scenario set. Which just e2e --scenario * runs as the gate? I'd start minimal - basic and full (full covers two-phase
    consensus + direct attestation, the v0.3.0 incident path). Add routing and privacy only if they're stable enough on Sepolia/Base
    Sepolia (privacy in particular is heavier). Adding more scenarios increases gate confidence but extends downtime per merge.

  2. Rollback automation. Does CI automatically restart with the old binary on E2E failure, or does it leave stagef stopped and page
    someone? Auto-rollback is cleaner but requires the CI workflow to keep both binaries on disk during the deploy window.

  3. Multi-chain coverage. Stagef has Sepolia + Base Sepolia + others. Run E2E against each in parallel during the deploy window?
    Sequentially? Parallel is faster but multiplies the chance of a flaky chain blocking deploy. I'd pick one or two representative
    chains as the required gate, run others as advisory.

  4. Override path. What happens when E2E fails for an emergency hotfix that needs to ship anyway? Manual workflow_dispatch with
    skip_e2e: true? Who can authorize it?

  5. Production deploy implications. This gate covers stagef → main merge. Production deploy is a separate workflow. Does prod deploy
    require the stagef E2E to have passed (i.e., you can only promote a SHA to prod after its stagef E2E was green)? That would close
    the loop fully.

Updated phase plan

  • Phase 1 (contract strictness flag + admin CLI + Rust reader) - unchanged. Still needed for runtime dual-path.
  • Phase 2 (Rust dual-path wiring + strictness cache) - unchanged.
  • Phase 3 NEW (stagef pre-deploy E2E gate):
    • Modify the existing stagef deploy workflow to insert the E2E step between stopping the old binary and starting the new one
    • Add rollback-on-failure logic
    • Configure scenario set, multi-chain coverage, override path
  • Phase 4 (PR template + release checklist + lessons.md) - unchanged.
  • Phase 5 (telemetry - legacy_policy_detections_total + alert) - unchanged.

Confirmation needed

Two things to nail down before saving:

  1. E2E scenarios for the gate: just basic + full, or include routing and privacy from day one?
  2. Multi-chain: gate on one chain (Sepolia) or on all stagef chains? My instinct is gate on one (cheaper, less flake), advisory on
    the rest.

Once you answer those, I'll fold the new Phase 3 into the saved memory and finish /save-progress.

✻ Worked for 1m 13s

❯ or can we just use anvil to "fork" sepolia/base-sepolia etc., so that we don't have to worry about gateway/operator nonce, gas issues,
etc.?

⏺ Yes - this is even better. Anvil fork gives you the fidelity of real stagef state with the isolation of CI-local execution. Best of
both worlds.

★ Insight ─────────────────────────────────────

  • anvil --fork-url snapshots all on-chain state at a block (every contract, every storage slot, every legacy proxy) into a
    local node - your test environment is the real stagef contract topology, not a fresh deploy.
  • The fork preserves the BLS APK registry, so signatures from real operator keys verify against real on-chain state - the
    cryptographic correctness path is honest, not mocked.
  • This is the same just e2e infrastructure you already have, just pointed at a forked node instead of a fresh anvil - minimal new
    tooling required.
    ─────────────────────────────────────────────────

Workflow shape

  1. PR merges to main → CI workflow starts
  2. CI: build new binary
  3. CI: spin up anvil forks
    • anvil --fork-url $SEPOLIA_RPC --fork-block-number latest --port 8545
    • anvil --fork-url $BASE_SEPOLIA_RPC --port 8546
  4. CI: anvil_setBalance for gateway + operator EOAs (free local gas; a sketch of this call follows below)
  5. CI: start new binary's gateway + N operators against forked endpoints
    • Same stagef config, only RPC URLs differ
    • Real keys, real contract addresses, real registry state
  6. CI: run just e2e --scenario basic,full against the forks
  7. If passes → trigger actual stagef deploy
  8. If fails → block deploy, surface failures

Live stagef keeps running uninterrupted the whole time. No downtime, no nonce collision, no faucet plumbing.
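For step 4, the funding call is just the anvil_setBalance JSON-RPC method against the local fork. A minimal sketch - the address, balance, and port are placeholders, and the CI step could equally be a cast rpc one-liner:

// Sketch: fund an EOA on a local anvil fork via anvil_setBalance.
// Assumes reqwest with the "blocking" and "json" features; values are placeholders.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let body = serde_json::json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "anvil_setBalance",
        // params: [address, balance]; balance is hex-encoded wei (100 ETH here)
        "params": ["0x0000000000000000000000000000000000000001", "0x56bc75e2d63100000"]
    });
    let resp: serde_json::Value = reqwest::blocking::Client::new()
        .post("http://localhost:8545")
        .json(&body)
        .send()?
        .json()?;
    println!("{resp}");
    Ok(())
}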

What this catches that I care about

The v0.3.0 incident class is fully covered: the fork includes every legacy NewtonPolicy proxy on Sepolia exactly as it lives on
chain. When the new gateway calls getPolicyCodeHash() against a legacy proxy in the fork, the selector miss surfaces as the same 0x
revert it would in production. CI catches it before the real deploy fires.

Bonus: it lets you test the contract upgrade itself

If a PR includes a contract upgrade (impl swap), CI can run forge script Upgrade.s.sol --rpc-url http://localhost:8545 --broadcast
against the fork before the binary E2E. That tests the upgrade-then-Rust path end-to-end without ever touching real stagef. Catches
the third incident scenario (both upgraded, but external policy clients still on grace period) cleanly.

Edge cases worth naming

  1. External dependencies stay real: IPFS reads (policy fetch) hit real gateways, fine. Privacy / TLS notary tests would need real
    or mock services - same as today.
  2. Fork startup latency: Anvil pulls state lazily from upstream RPC. First few calls are slow. For consistent timing, pre-warm by
    calling the contracts under test once at fork startup.
  3. Multi-chain coordination: Each chain gets its own anvil instance + own port. CI needs to orchestrate N anvil processes. Existing
    just e2e-dual-anvil-multichain already does this for testing - just point fork URLs at real testnets instead of fresh deploys.
  4. State drift mid-test: Anvil fork advances locally as tests mine blocks. taskCreatedBlock references work the same way they do in
    live testing. Just don't let tests assume a specific real block number - use the fork's current_block semantics.

What this doesn't catch

  • Real RPC behavior under load (rate limits, timing, connection drops)
  • Real operator-to-operator network behavior if anything depends on it (currently nothing does - operators talk only via gateway
    broadcast)
  • Anything that depends on real-time wall clock that doesn't match block time

None of these are in the v0.3.0 incident class. Acceptable miss.

Updated phase plan

  • Phase 1 (strictness flag + admin CLI + Rust reader) - unchanged
  • Phase 2 (Rust dual-path + cache) - unchanged
  • Phase 3 NEW (anvil-fork pre-deploy E2E gate):
    • New CI workflow step: spin up anvil forks of stagef chains, run E2E, gate deploy
    • Reuses existing just e2e infrastructure with --fork-url parameter
    • Optional: pre-test contract upgrade if PR touches contracts
  • Phase 4 (PR template + release checklist + lessons.md) - unchanged
  • Phase 5 (telemetry) - unchanged

Two small confirmations before saving

  1. Scenarios for the gate: basic + full minimum (covers v0.3.0 path). Add routing and privacy from day one, or layer them in later
    once stable?
  2. Chains to fork: All stagef chains in parallel (Sepolia + Base Sepolia + others), or just Sepolia as the required gate with the
    rest advisory?

My defaults: basic + full to start, all stagef chains in parallel (anvil forks are cheap, parallel is fast). Confirm or override
and I'll fold into saved memory and finish /save-progress.


@denniswon (Author)

⏺ Got the authoritative answers. Stagef has deployments at 11155111 (Sepolia), 84532 (Base Sepolia), and 1 (Ethereum mainnet - likely
the EigenLayer source-chain reference, not a real deploy target - worth your confirmation).

★ Insight ─────────────────────────────────────

  • The e2e recipe is a thin dispatcher around three orthogonal env-var knobs: VALIDATE_ATTESTATION, PRIVACY_ENABLED, TARGET_CHAINS.
    The "scenarios" are just preset combinations of those three.
  • The v0.3.0 selector miss (getPolicyCodeHash) lives on the on-chain attestation validation path (validateAttestationDirect). Only
    scenarios with validate_attestation=direct exercise that path - meaning basic, privacy, and routing would have all passed while
    v0.3.0 was broken.
  • That's the load-bearing detail: only full and privacy-full would have caught the v0.3.0 incident class.
    ─────────────────────────────────────────────────

Scenario decomposition

From Justfile:456-462:

Scenario     | validate_attestation | privacy | target_chains | Catches v0.3.0?
-------------|----------------------|---------|---------------|----------------
basic        | skip                 | false   | 31337         | No
full         | direct               | false   | 31337         | Yes
privacy      | skip                 | true    | 31337         | No
privacy-full | direct               | true    | 31337         | Yes
routing      | skip                 | false   | 31337,31338   | No

What each knob does:

  • validate_attestation=direct: After BLS aggregation, gateway calls NewtonProverTaskManager.validateAttestationDirect() on-chain.
    This is the path that reads policy.getPolicyCodeHash() against the live policy proxy. Selector miss = revert here. This is the
    v0.3.0 path.
  • validate_attestation=skip: Skips on-chain validation entirely. Aggregation completes, but the on-chain selector compatibility
    check never runs.
  • privacy=true: Enables HPKE encryption of policy params. Operators decrypt locally. Adds privacy-specific code paths but doesn't
    touch the getPolicyCodeHash path.
  • target_chains=31337,31338: Tests dual-chain orchestration (per-chain ChainService, source/dest task flow). Different bug class
    than selector miss.

Argument variations for each scenario, beyond the preset:

Arg                        | Effect                                                       | Default
---------------------------|--------------------------------------------------------------|--------------
--env                      | Which newton_prover_config..json to read                     | stagef
--block-time               | Anvil block time (0 = automine, 1 = 1s realistic)            | 0
--force                    | Force re-deploy contracts vs cached                          | false
--keep-anvil               | Don't tear down anvil after test                             | false
validate_attestation= env  | Override scenario default (direct or skip)                   | per scenario
privacy= env               | Override scenario default                                    | per scenario
target_chains= env         | Override scenario default                                    | per scenario
SKIP_REGISTRATION env      | Skip operator registration step (assume already registered)  | per chain

For a forked-stagef gate, you'd want --block-time 1 (mimics real chain timing better than automine) and SKIP_REGISTRATION=true
(operators already registered in the forked state).

What gates the v0.3.0 class

Day-1 minimum to catch v0.3.0 specifically: full. That's the only scenario among the lightweight ones that exercises
validateAttestationDirect.

Recommendation for the deploy gate

Tier 1 (required, day 1):

  • full - single chain, attestation validation. Catches the v0.3.0 selector class.
  • routing - dual-chain orchestration. Catches multichain regressions (different bug class, but you ship multichain).

Tier 2 (add once stable on forked stagef):

  • privacy-full - covers attestation + privacy paths. Heavier (HPKE, IPFS, possibly threshold) so I'd hold this for after the gate
    is proven stable.

Skip from gate:

  • basic, privacy (no attestation validation - won't catch the incident class you care about; redundant given full and privacy-full)

Chain coverage

Stagef has real deployments at:

  • 11155111 - Sepolia (source)
  • 84532 - Base Sepolia (destination)
  • 1 - Ethereum mainnet (probably config-only, not a deploy target - please confirm)

For multichain routing, you need both Sepolia and Base Sepolia forked (source + destination). For single-chain full, Sepolia alone
is enough - but running it against Base Sepolia too would catch chain-specific issues (block-time offsets, EIP-4844 quirks).

My default: fork both Sepolia and Base Sepolia, run full against each (in parallel), run routing once across both. Total wall-clock
~10-15 min via parallel anvil instances. Mainnet 1 deployment skipped (config reference, not a real deploy target).

Two questions

  1. Tier 1 gate set: full + routing, or just full?
  2. Mainnet 1 deployment: is 1-stagef.json an actual deploy target Rust connects to, or just config reference for EigenLayer source
    chain reads?

Once you answer these I'll fold it into the saved memory and finish /save-progress.

✻ Sautéed for 1m 59s

❯ does routing also check for validateAttestationDirect?

⏺ Good check - let me verify both whether routing exercises it by default and whether destination chains even have that path.

Searched for 2 patterns

⏺ Important catch. Let me give you the precise answer.

★ Insight ─────────────────────────────────────

  • routing defaults to validate_attestation=skip (per Justfile:460), so out-of-the-box it does NOT exercise
    validateAttestationDirect. But the env-var override is wired through - validate_attestation=direct just e2e --scenario routing
    works.
  • validateAttestationDirect exists on BOTH NewtonProverTaskManager (source) and NewtonProverDestTaskManager (destination). The
    multichain destination has its own copy of the selector, which means the v0.3.0 class can recur on dest chains too if the
    destination policy proxy is on a legacy impl.
  • The default in e2e-gateway is direct, but the default in e2e --scenario routing is skip - that asymmetry is easy to miss.
    ─────────────────────────────────────────────────

Direct answer

No, not by default. From Justfile:460:

routing) _attestation=${_attestation:-skip}; _privacy=${_privacy:-false}; _target_chains=${_target_chains:-31337,31338} ;;

So just e2e --scenario routing runs with attestation validation skipped. It tests cross-chain orchestration but not the
selector-miss path.

But the override is in place

The e2e recipe respects validate_attestation as an env-var override. And integration-tests/src/multichain_routing.rs:214 does honor
it:

if !matches!(config.validate_attestation, AttestationValidationMode::Skip) {
    // validate attestation per config.validate_attestation mode
}

So validate_attestation=direct just e2e --scenario routing will run dual-chain orchestration AND on-chain attestation validation.
That's the combo we actually want for the gate.

Why it matters specifically for v0.3.0 class

Both task managers have validateAttestationDirect:

  • crates/core/src/generated/newton_prover_task_manager.rs (source)
  • crates/core/src/generated/newton_prover_dest_task_manager.rs (destination)

Both call into the policy proxy's selector during validation. If a destination-chain policy proxy is on a legacy impl, the v0.3.0
class recurs there too. Source-only full would miss that.

Updated recommendation

To cover the v0.3.0 class on every chain that runs the binary, the gate should be:

Test command                                                                              | Catches
------------------------------------------------------------------------------------------|--------------------------------------------------------------------------
just e2e --scenario full --env stagef (against Sepolia fork)                              | v0.3.0 class on source chain
just e2e --scenario full --env stagef (against Base Sepolia fork)                         | v0.3.0 class on destination chain
validate_attestation=direct just e2e --scenario routing --env stagef (against both forks) | v0.3.0 class on dual-chain orchestration + cross-chain attestation flow

Three runs total. With anvil forks running in parallel, total wall-clock dominated by the slowest scenario (~10 min).

Or, more parsimonious: skip the per-chain full runs and rely on routing+direct alone, since it exercises both source and
destination attestation paths in one shot. Trade-off: harder to localize failures (was it source attestation, dest attestation, or
routing logic that broke?) but cheaper.

Three confirmations to nail down before saving

  1. Gate set: three runs (full ×2 + routing+direct) with clearer failure isolation, OR one run (routing+direct) for parsimony?
  2. Mainnet 1 deployment: real deploy target Rust connects to, or just config reference for EigenLayer source-chain reads? (Affects
    whether to fork mainnet too - heavier upstream RPC dependency.)
  3. Should I propose adding a new combined scenario (e.g., routing-full → validate_attestation=direct, target_chains=31337,31338) to
    the Justfile case statement? That removes the env-var-override gotcha and makes the gate command explicit. Small Justfile change.

Once you answer these, I'll fold into saved memory and finish /save-progress.

✻ Cogitated for 59s

❯ 1. skip the per-chain full runs and rely on routing+direct alone, since it exercises both source and
destination attestation paths in one shot. 2. just fork and test on testnet 3. yes sounds good

⏺ Locked in. Here's the final Phase 3 design and the three small remaining items before save.

★ Insight ─────────────────────────────────────

  • Adding routing-full as a Justfile scenario is the minimal codification - one line in the case statement turns the implicit env-var
    override into an explicit, discoverable gate command. That removes the "asymmetric defaults between e2e and e2e-gateway" footgun
    entirely.
  • Forking only testnet chains (Sepolia + Base Sepolia) keeps anvil's upstream RPC dependency narrow - mainnet forks pull more state
    and cost more in upstream RPC quota for negligible coverage gain, since Newton's prod stack lives on testnets in stagef.
  • One scenario covering both source + destination attestation halves wall-clock at the cost of debug isolation - when it fails, the
    developer reproduces locally with both forks running and bisects from there. The "two runs for clear isolation" is a debugging
    convenience, not a correctness requirement.
    ─────────────────────────────────────────────────

Phase 3 β€” final shape

One new Justfile scenario:

In e2e recipe case statement, add:

routing-full) _attestation=${_attestation:-direct}; _privacy=${_privacy:-false}; _target_chains=${_target_chains:-31337,31338} ;;

One CI workflow change in the stagef deploy pipeline:

  1. PR merges to main
  2. CI: build new binary
  3. CI: spin up two anvil forks
    • anvil --fork-url $SEPOLIA_RPC --port 8545 --chain-id 31337
    • anvil --fork-url $BASE_SEPOLIA_RPC --port 8546 --chain-id 31338
  4. CI: anvil_setBalance for gateway + operator EOAs (free local gas)
  5. CI: just e2e --scenario routing-full --env stagef --block-time 1
  6. Pass → trigger actual stagef deploy
    Fail → block deploy, surface E2E failures to PR author

Live stagef stays untouched the entire time. Real keys, real contract state, isolated execution.

Final phased plan (decisions baked in)

Phase | Deliverable                                                                                                                                          | Status
------|------------------------------------------------------------------------------------------------------------------------------------------------------|------------------
1     | PolicyClientRegistry.strictnessFlags + setStrictnessFlag admin function + just admin set-strictness CLI command + Rust is_strictness_flag_enabled reader | Decided
2     | Rust dual-path wiring with StrictnessCache (5-min refresh) + first concrete dual-path on getPolicyCodeHash                                          | Decided (see Q2)
3     | routing-full Justfile scenario + CI workflow step (anvil forks Sepolia + Base Sepolia, runs scenario, gates stagef deploy)                           | Decided
4     | PR template checkbox ("did you add a strictness flag for this breaking change?") + release checklist update + lessons.md entry codifying the pattern | Decided
5     | legacy_policy_detections_total{chain_id, feature} Prometheus counter + one critical alert                                                           | Decided (see Q4)

Three small confirmations remaining

These were posed before Phase 3 was redesigned, but they still apply.

Q2 - Strictness cache RPC failure default:

The cached is_strict() check needs a default when the on-chain read fails (RPC down, network glitch). Two reasonable defaults:

  • Permissive default (recommended): cache miss + RPC fail → return false (treat as non-strict, allow legacy proxy path). Argument:
    prefer availability over strictness when telemetry is broken; a transient RPC blip shouldn't reject every task.
  • Strict default: cache miss + RPC fail → return true (reject anything that would have been legacy-tolerated). Argument:
    fail-closed when uncertain.

My recommendation: permissive with a warn! log on every cache-miss-with-RPC-fail and a counter
(strictness_cache_rpc_failures_total) for alerting. Once strict mode is on, you want telemetry to flag the situation, but you don't
want a Sepolia RPC blip to brick task processing.
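A rough sketch of that permissive fallback (the type, field names, and closure-based read are assumptions for illustration; the real cache would also bump strictness_cache_rpc_failures_total):

use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical cache: per-feature strictness flag with a 5-minute TTL,
// falling back to permissive (false) when the on-chain read fails.
struct StrictnessCache {
    ttl: Duration,
    entries: HashMap<String, (bool, Instant)>,
}

impl StrictnessCache {
    fn is_strict(
        &mut self,
        feature: &str,
        read_on_chain: impl Fn(&str) -> Result<bool, String>, // stand-in for the RPC read
    ) -> bool {
        if let Some((value, fetched_at)) = self.entries.get(feature) {
            if fetched_at.elapsed() < self.ttl {
                return *value; // fresh cache hit
            }
        }
        match read_on_chain(feature) {
            Ok(value) => {
                self.entries.insert(feature.to_string(), (value, Instant::now()));
                value
            }
            Err(err) => {
                // Permissive default: log + count, but don't brick task processing.
                tracing::warn!(%feature, %err, "strictness_cache_rpc_failure");
                false
            }
        }
    }
}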

Q3 - setStrictnessFlag authorization:

  • onlyOwner now (recommended): deployer EOA flips the flag via just admin. Simple, ships today.
  • Safe multisig from day one: requires the Safe migration to land first (NEWT-882–894), which is on the roadmap but not done.

My recommendation: onlyOwner now, migrate to Safe via the NEWT-882–894 work later. The strictness flag is a "we're confident
migration is done" signal, not an emergency lever - single-signer is acceptable.

Q4 - Alerting threshold:

What's the page-someone alert? My recommendation: one critical day-1 rule.

(
policy_strictness_flag{feature="policy_code_hash"} == 1
) and (
rate(legacy_policy_detections_total{feature="policy_code_hash"}[5m]) > 0
)

Reads as: "strict mode is enabled AND legacy detections are still happening." That's the "we flipped the flag too early" alarm. Add
a non-paging warn-level rule for "legacy detections still happening even though we shipped the strict version" so we know to flip
the flag.

Confirm and save

Three quick confirmations:

  • Q2: permissive default + warn log + counter? (Y/N)
  • Q3: onlyOwner now, migrate to Safe later? (Y/N)
  • Q4: single critical day-1 alert per the PromQL above? (Y/N)

Once you confirm all three (or override), I'll fold the full plan into protocol-version-compatibility.md, update MEMORY.md, save to
supermemory, save to recall, and finish /save-progress.

✻ Churned for 57s

β€» recap: Designing a strictness-flag plan to prevent contract/Rust version drift, with Phase 3 settled as an anvil-fork pre-deploy
E2E gate running routing-full against Sepolia and Base Sepolia. Next: confirm Q2 (permissive RPC default), Q3 (onlyOwner now), Q4
(single critical alert) so I can save progress. (disable recaps in /config)

❯ do we still need strictness check?

⏺ Sharp question - and I think the honest answer is: probably not, given your current state. The two mechanisms solve different
problems, and the one the strictness flag solves may not be a problem yet.

★ Insight ─────────────────────────────────────

  • The pre-deploy E2E gate and the strictness flag answer different questions. E2E gate: "will Newton's binary work against the
    contracts we're shipping to?" Strictness flag: "should new Rust tolerate or reject legacy external policy proxies that we don't
    control?"
  • For Newton-controlled proxies (the only kind on stagef), the migration path is "re-deploy proxies before/during the binary
    deploy" - the E2E gate enforces this naturally because forked stagef state with un-migrated Newton proxies will fail the gate,
    blocking deploy.
  • The strictness flag is forward-looking infra for the day when external developers run their own policy clients in prod and you
    need to give them a bounded grace period without breaking their tasks.
    ─────────────────────────────────────────────────

What each mechanism actually protects against

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Failure mode β”‚ Caught by E2E gate? β”‚ Caught by strictness flag? β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Newton ships Rust that breaks against β”‚ Yes (E2E fails) β”‚ N/A (flag only changes runtime β”‚
β”‚ Newton-deployed stagef proxies β”‚ β”‚ behavior) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Newton ships Rust that breaks against β”‚ Indirectly (stagef passes ⟹ prod likely β”‚ N/A β”‚
β”‚ Newton-deployed prod proxies β”‚ passes if both have same impl) β”‚ β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ External dev's legacy proxy hits new Rust β”‚ No (no external proxies on stagef to test β”‚ Yes (flag=false β†’ log + tolerate; β”‚
β”‚ on prod β”‚ against) β”‚ flag=true β†’ reject) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ Newton wants to force external migration β”‚ N/A β”‚ Yes (flip flag to true) β”‚
β”‚ after grace window β”‚ β”‚ β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The E2E gate handles every Newton-controlled scenario. The strictness flag only adds value for the external-policy-client grace
period.
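
For concreteness, the dual-path behavior the flag would gate is roughly this shape; the function, the error type, and the
is_legacy_proxy input are hypothetical stand-ins for wherever task validation detects an old selector or code hash today:

// Illustrative decision point in task validation. `strict` comes from the cached
// on-chain flag; `is_legacy_proxy` is whatever legacy detection we already do.
fn handle_policy_proxy(strict: bool, is_legacy_proxy: bool) -> Result<(), TaskError> {
    if !is_legacy_proxy {
        return Ok(()); // new-format proxy, nothing to do
    }
    if strict {
        // Grace period over: reject so the external dev gets a clear failure.
        Err(TaskError::LegacyPolicyRejected)
    } else {
        // Grace period: tolerate the legacy path, but never silently.
        tracing::warn!("legacy policy proxy detected; tolerating during grace period");
        Ok(())
    }
}

#[derive(Debug)]
enum TaskError {
    LegacyPolicyRejected,
}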

The question that decides it

Do you currently have, or expect soon to have, external developers running their own NewtonPolicyClient extensions in production?

  • No, not soon: Drop the strictness flag entirely. Rely on E2E gate. Hard-cutover semantics for any future breaking change β€” when
    you do it, coordinate with whoever's running policy clients via Discord / direct comms. Add the flag later as a follow-up if
    external clients land.
  • Yes, in flight or imminent: Keep the flag. The grace period mechanism is what lets external devs migrate without their tasks
    failing the moment Newton deploys. Without it, every breaking change is an immediate hard break for them.

My recommendation

Drop Phases 1 and 2 for now. Keep Phases 3, 4, 5. Reasoning:

  • Stagef has no external policy clients (it's internal). E2E gate fully covers stagef incident class.
  • Prod likely has zero or near-zero external policy clients today (you're early-stage).
  • For Newton-controlled proxies, the migration path is "re-deploy proxies via factory before/with the binary deploy" β€” already part
    of the existing deploy workflow.
  • The strictness flag is theater for a problem that doesn't exist yet. Adding it now creates dead on-chain state and ~200 LOC of
    Rust dual-path machinery to maintain.
  • When external clients become real, add the flag then with full context on what their actual upgrade cadence looks like (which
    informs the grace period design).

What you give up by dropping it:

  • Future-self pre-emptive infra for external client migrations
  • The legacy_policy_detections_total metric (could still emit it as observability without the flag β€” just always-tolerate, log on
    miss)

What you gain:

  • Simpler shipping plan: just Phases 3 + 4 + 5
  • No on-chain admin surface to secure
  • No StrictnessCache to maintain
  • No synchronization burden between contract flag flipping and Rust deployment

Slimmed-down plan if you agree

β”Œβ”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Phase β”‚ Deliverable β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 3 β”‚ routing-full Justfile scenario + CI workflow step (anvil forks Sepolia + Base Sepolia, runs scenario, gates stagef β”‚
β”‚ β”‚ deploy) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 4 β”‚ PR template checkbox ("does this PR add a new selector or break an existing one?") + release checklist + lessons.md β”‚
β”‚ β”‚ entry codifying the E2E-gate pattern β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ 5 β”‚ legacy_policy_detections_total{chain_id, feature} Prometheus counter + warn-level alert (no critical alert needed β”‚
β”‚ β”‚ without strict mode β€” just observability) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Phase 1, 2, Q2, Q3, Q4 all become moot.
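
In Rust terms Phase 5 is small: the tolerate branch plus the labelled counter. A sketch assuming the metrics crate, with label
names taken from the table above and everything else illustrative:

// Illustrative observability-only variant: always tolerate, but count and log
// every legacy detection so migration lag is visible before any enforcement exists.
fn record_legacy_detection(chain_id: u64, feature: &'static str) {
    metrics::counter!(
        "legacy_policy_detections_total",
        "chain_id" => chain_id.to_string(),
        "feature" => feature
    )
    .increment(1);
    tracing::warn!(chain_id, feature, "legacy policy proxy detected (tolerated)");
}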

Decide

  • A. Drop strictness flag (Phases 1+2). Ship Phases 3+4+5 only. Add flag later if external clients arrive.
  • B. Keep strictness flag (full original plan, Phases 1-5). Future-proofs for external clients.
  • C. Hybrid β€” ship contract change (Phase 1) but defer Rust dual-path (Phase 2). Cheap on-chain mechanism in place, no Rust
    complexity yet.

My vote: A. Ship the minimum that solves the actual current problem (v0.3.0 incident class). Pick up the flag when external clients
become a concrete need, not a hypothetical.

Confirm A/B/C and I'll save.

✻ Cogitated for 1m 30s

❯ drop phase 4 - but replace with just lessons.md entry. pr template is overkill. phase 3 is a must, and phase 5 can be held off for
now as well

⏺ Locked in. Final scope: Phase 3 only + lessons.md entry. Loading the recall tools and checking for existing memory in parallel
before saving.

Listed 1 directory (ctrl+o to expand)

⏺ Now executing the save in parallel: write memory file, update MEMORY.md, push to supermemory, and search recall for dedup.

Wrote 2 memories (ctrl+o to expand)

⏺ Bash(node "$HOME/.claude/plugins/cache/supermemory-plugins/claude-supermemory/0.0.2/scripts/save-project-memory.cjs"
"[SAVE:dennis:2026-04-19]…)
⎿  Project knowledge saved: newton-prover-avs
ID: FqSZUTDETBNEz2F7Liykj4

⏺ Bash(test -d /Users/denniswon/projects/newton-prover-avs/.claude/rules && echo "rules dir exists")
⎿  rules dir exists
⎿  Context limit reached · /compact or /clear to continue
