❯ The goal is to prevent developers on the team from accidentally forgetting to keep the Rust code compatible with both old and new contract versions. There are 3 scenarios: 1) contracts upgraded first with a breaking change before the Rust update, 2) contracts with a breaking change not upgraded yet, but Rust deploys, 3) contracts upgraded and the Rust update also deployed, but for Newton Policy / Newton Policy Data / Policy Clients (that extend the NewtonPolicyClient mixin), there is a grace period for developers to migrate/update their policy clients to the new newton policy or policy data. This means the Rust update that pairs with the breaking-change contract upgrade should support both versions during this grace period (until version enforcement happens). 1 and 2 are about preventing the protocol from literally failing, while 3 is about external developers' policy clients suddenly starting to fail without having time to become compatible. /clarify-plan THINK HARD
⏺ I'll work through this as a design problem. The three scenarios sort into two fundamentally different problem classes, which matters for what prevention mechanism fits each.
★ Insight ─────────────────────────────────────
- Scenarios 1+2 are atomic rollout problems (the AVS itself breaks during deploy transitions). Fix is deploy ordering + pre-deploy CI.
- Scenario 3 is a long-tail multi-version support problem (external-dev UX, not protocol integrity). Fix is dual-path code that Rust devs cannot forget to write.
- The hard part of #3 isn't "add tolerance code" — we know how to do that. It's making forgetting it structurally impossible (type system, CI, lint) rather than relying on reviewers remembering.
─────────────────────────────────────────────────
Problem framing
Mapping to enforcement mechanisms:
┌────────────────────────────────────────┬────────────────────┬─────────────┬──────────────────────────────────────────────────┐
│ Scenario                               │ Failure class      │ Who gets    │ Natural prevention                               │
│                                        │                    │ hurt        │                                                  │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 1 — contracts ahead of Rust            │ protocol-down      │ us (newton) │ deploy-pipeline ordering +                       │
│                                        │                    │             │ MIN_COMPATIBLE_VERSION                           │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 2 — Rust ahead of contracts            │ protocol-down      │ us (newton) │ deploy-pipeline ordering + pre-deploy selector   │
│                                        │                    │             │ probe                                            │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 3 — policy proxies stuck on old impl   │ client-policy-down │ external    │ dual-version Rust code paths that can't be       │
│ during grace period                    │                    │ devs        │ accidentally skipped                             │
└────────────────────────────────────────┴────────────────────┴─────────────┴──────────────────────────────────────────────────┘
The core hazard in #3: NewtonPolicy / NewtonPolicyData are initialize-once proxies. When we ship a new version, the impl on chain is new but all existing proxies still delegate to the old bytecode. External devs have to redeploy their policies to opt in. Rust must gracefully serve both until enforcement.
Clarifying questions (need answers before writing the plan)
Before I draft the actual implementation, I need to resolve ambiguity on these:
Q1 — Version detection. How does Rust know which version a given policy proxy is on?
- (a) NewtonPolicy.PROTOCOL_VERSION() selector (does this exist on old impls? if not, asking it causes the same selector-miss we're trying to avoid)
- (b) ERC-165 supportsInterface(bytes4) probe per new selector (robust but requires adding supportsInterface to NewtonPolicy in v0.3.0+)
- (c) Bytecode classification (cast code + known hash per impl) — works today with zero contract changes but brittle if impl is redeployed
- (d) PolicyClientRegistry stores declared version at registration time — clean but needs registry writes on every upgrade
Which direction do you want? (b) is my recommendation; (c) works as a stopgap without any contract work (see the sketch below).
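Purely as illustration, option (c) can be as small as hashing the fetched runtime bytecode and looking it up in a table of known impl hashes. A minimal sketch, assuming hypothetical names (PolicyImplVersion, classify_impl, the hash table); none of this is existing code:

```rust
use alloy::primitives::{keccak256, B256};

/// Known impl versions we can classify against. Purely illustrative.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum PolicyImplVersion {
    V0_2,
    V0_3,
}

/// Option (c): hash the proxy impl's runtime code (e.g. fetched via eth_getCode)
/// and look it up among hashes of the impls we have shipped. Returns None for an
/// unknown impl, which is exactly the "brittle if impl is redeployed" caveat.
pub fn classify_impl(
    runtime_code: &[u8],
    known_hashes: &[(B256, PolicyImplVersion)],
) -> Option<PolicyImplVersion> {
    let hash = keccak256(runtime_code);
    known_hashes.iter().find(|(h, _)| *h == hash).map(|(_, v)| *v)
}
```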
Q2 — Grace period trigger. Is the end of grace period:
- (a) A fixed date baked into Rust at release time (grace_period_end: 2026-07-15)
- (b) An admin-flipped on-chain flag (e.g., MIN_POLICY_VERSION on PolicyClientRegistry)
- (c) Both — Rust has default, admin can extend/shorten
(b) is more flexible but means Rust must query it. (a) is rigid but simpler.
Q3 — Scope of the compat layer. Is this only for NewtonPolicy and NewtonPolicyData selectors called at task time? Or also:
- Policy deployment path (NewtonPolicyFactory.deployPolicy signature changes)?
- Events emitted by old vs new NewtonPolicy (events don't have selector-miss semantics but ABI changes matter)?
- NewtonPolicyClient mixin internals (which external devs compile into their own contracts)?
The mixin is the trickiest because it lives in external-dev bytecode — we can't upgrade it at all. External devs inherit the mixin version at compile time of their client.
Q4 — Policy version enumeration. Do we want active tracking of "which policy_client is on which impl"?
- Useful for: alerting external devs before grace expires, dashboards
- Requires: crawler that walks PolicyClientRegistry + classifies each proxy
- If yes, where does it live — gateway, a new service, CI job that publishes to a dashboard?
Q5 — Deprecation communication channel. When we detect a legacy policy client at task time, do we:
- Emit a structured log line only (devs may never see it)
- Emit a Prometheus metric legacy_policy_detections_total{policy_client, version, days_until_grace_end} (ops can alert)
- Return a deprecation warning header in the RPC response (external devs see it)
- Auto-create a Linear ticket per detected client
My instinct is all four at different cadences, but which are in scope?
Q6 — Existing tolerance patterns. The v0.3.0 incident surfaced getPolicyCodeHash as an example. Before we design the general system, can you confirm:
- Did we ship tolerance for that in Rust in a follow-up PR, or was the fix "redeploy all policies"?
- If tolerance exists, where? (It will be the seed pattern for the general BackwardCompat abstraction.)
Q7 — Release calendar visibility. How do we want grace periods to be discoverable across the team?
- (a) .claude/rules/lessons.md entry per release
- (b) docs/COMPATIBILITY_MATRIX.md — single source of truth listing every BackwardCompat, when it expires, which version introduced it
- (c) Linear project "Protocol Compatibility" with one ticket per open compat window
Draft plan (pending answers above)
Regardless of how Q1–Q7 resolve, the plan will have four tiers. The choices above determine how each tier is implemented.
Tier 1 — Atomic rollout (scenarios 1 + 2, protocol self-compat).
- Per-release: pre-deploy CI smoke test exercises gateway → operator → TaskManager on a fork against current production contract state. Fails before deploy if Rust can't talk to on-chain contracts.
- Per-contract-PR: require same-PR Rust change OR explicit "no Rust impact" label. Enforced via GitHub required-label check.
- Deploy pipeline: ordering contract — deploy-trigger.yml already does contracts-first on stagef; formalize prod ordering (contracts first, bake time, then Rust) in runbook.
- MIN_COMPATIBLE_VERSION already enforces the semver floor on-chain; keep it as the last line of defense.
Tier 2 — Typed dual-version abstraction (scenario 3, policy proxy grace period).
Introduce crates/core/src/compat/ module with:
pub struct BackwardCompat<T> {
    new_value: Option<T>,         // None = legacy proxy
    legacy_behavior: T,           // what we did pre-v0.X
    grace_end: chrono::NaiveDate,
    rollout_ticket: &'static str, // Linear ID for visibility
}
impl<T: Clone> BackwardCompat<T> {
    pub fn resolve(&self, now: chrono::NaiveDate) -> Result<T, CompatError> {
        match (&self.new_value, now >= self.grace_end) {
            (Some(v), _) => Ok(v.clone()),
            (None, false) => {
                metrics::inc_legacy_policy_detection(self.rollout_ticket);
                Ok(self.legacy_behavior.clone())
            }
            (None, true) => Err(CompatError::GraceExpired {
                // variant fields illustrative
                grace_end: self.grace_end,
                ticket: self.rollout_ticket,
            }),
        }
    }
}
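A hypothetical call site, just to show the shape (PolicyHandle, get_policy_code_hash_opt, and the ticket ID are illustrative, the grace date reuses the example from Q2, and this assumes the fields are public or there is a constructor):

```rust
async fn effective_code_hash(policy: &PolicyHandle) -> Result<[u8; 32], CompatError> {
    let compat = BackwardCompat {
        // None when the proxy's impl predates the selector (selector miss).
        new_value: policy.get_policy_code_hash_opt().await,
        // Pre-upgrade behavior: no code-hash binding enforced.
        legacy_behavior: [0u8; 32],
        grace_end: chrono::NaiveDate::from_ymd_opt(2026, 7, 15).unwrap(),
        rollout_ticket: "NEWT-XXX", // placeholder Linear ID
    };
    compat.resolve(chrono::Utc::now().date_naive())
}
```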
Every new required contract selector MUST go through this type. Enforced via:
- Custom clippy lint (or dylint/ast_grep-based CI check) that flags direct calls to a registered set of "new selectors" outside of the compat:: module.
- Codeowner rule: changes to crates/core/src/compat/ require Dennis + one more reviewer.
Tier 3 — CI enforcement of the compatibility lifecycle.
- just check-grace-periods: walks all BackwardCompat instances (via a registry macro), fails CI if any grace_end < today (see the sketch after this list). Forces the dev to either:
  - Remove the compat wrapper (use the new path directly) and prove all proxies migrated.
  - Extend the grace_end with justification.
- just test-legacy-compat: fixture test that deploys a snapshot of the previous policy impl and exercises Rust code paths against it. Ships as a required CI job on PRs that touch contracts or compat/.
- Scheduled CI job (weekly): queries PolicyClientRegistry on each network, classifies each proxy by impl, publishes dashboard + alerts when >0 clients are on impl version nearing grace expiry.
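A minimal sketch of the check-grace-periods walk, assuming a hand-rolled static registry rather than the registry macro (CompatEntry, COMPAT_REGISTRY, and the date format are illustrative, not existing code):

```rust
pub struct CompatEntry {
    pub feature: &'static str,
    pub grace_end: &'static str, // "YYYY-MM-DD"
    pub rollout_ticket: &'static str,
}

pub static COMPAT_REGISTRY: &[CompatEntry] = &[
    // Each BackwardCompat use registers one entry here (via the registry macro in the real plan).
];

fn main() {
    let today = chrono::Utc::now().date_naive();
    let mut expired = false;
    for entry in COMPAT_REGISTRY {
        let end = chrono::NaiveDate::parse_from_str(entry.grace_end, "%Y-%m-%d")
            .expect("grace_end must be YYYY-MM-DD");
        if end < today {
            eprintln!(
                "grace period expired: {} ({}) ended {}",
                entry.feature, entry.rollout_ticket, entry.grace_end
            );
            expired = true;
        }
    }
    if expired {
        // Fail CI: either delete the compat wrapper or extend grace_end with justification.
        std::process::exit(1);
    }
}
```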
Tier 4 — External-dev communication.
- Runtime: operator logs legacy_policy_detected at warn! with policy_client address + grace_end. Gateway also emits Prometheus metric legacy_policy_detections_total.
- RPC layer: gateway adds x-newton-policy-version header on task responses when legacy detected — surfaces to integrations without requiring log access.
- Ops runbook: when dashboard shows legacy clients within 30 days of grace, ops emails external devs + files Linear tickets.
Tradeoffs to flag
- Type-driven enforcement vs lint-driven. Type-driven (wrap in BackwardCompat) is iron-clad but adds ceremony. Lint-driven (custom clippy/ast_grep check) is lighter but requires keeping the selector list curated. I'd pick type-driven for clarity.
- Static grace dates vs admin-flipped on-chain flag. Static is simpler; admin-flipped is flexible (extend grace if external dev ecosystem isn't migrating fast enough). A hybrid — Rust has default static date, admin flag can extend only — captures both safely.
- Fixture-based legacy tests vs on-fork tests. Fixtures (snapshot old bytecode) are fast + deterministic but grow stale. On-fork tests (against live stagef) are realistic but slow + flaky. Do both: fixtures on every PR, on-fork nightly.
- Cost of supporting old impls forever. Every BackwardCompat is technical debt. If we don't actively retire them, the compat layer grows unbounded. The grace_end CI check is what prevents this — but only if we don't just keep bumping the date.
What I need from you to write the implementation plan
Answering Q1–Q3 is minimum viable. Q4–Q7 shape Tiers 3+4 but don't block starting Tiers 1+2. Preferred path:
1. You answer Q1–Q3 (version detection mechanism, grace trigger, scope).
2. I write a focused implementation plan for Tiers 1 and 2 (protocol self-compat + typed dual-version abstraction) + scaffolding for the Tier 3 CI check.
3. Tiers 3 and 4 details get fleshed out based on Q4–Q7 answers in a follow-up.
★ Insight ─────────────────────────────────────
- The real systemic win from this work is removing the judgement call from developers. Today, "do I need tolerance here?" is a reviewer's job and fails silently. With BackwardCompat<T> as a required wrapper, "tolerance was added" becomes a compile-time fact and "tolerance is still needed" becomes a CI-enforced fact. Reviewer discipline goes from required to optional — which is the only way discipline actually survives.
- The mixin case (NewtonPolicyClient that external devs compile into their own contracts) is the worst version of this problem because we can't upgrade it at all. Worth thinking separately about whether future NewtonPolicyClient changes should use interface-based dispatch rather than mixin inheritance, so external devs don't have to recompile their contracts to get updates. That's an architectural shift, not a compat layer — flag for separate discussion.
─────────────────────────────────────────────────
Which of Q1–Q7 should I pursue first?
❯ agree with your decisions
⏺ ★ Insight ─────────────────────────────────────
- This plan touches 4 distinct concerns (contract, Rust, CI, process) but only one is a "real" feature — the per-feature strictness flag pattern. Everything else is enforcement scaffolding around it. That ratio is right: ship a small primitive, then make the team's habits enforce it.
- Naming the flag strictPolicyCodeHash rather than enforceV0_3_0PolicyBinding matters more than it seems. Per-feature naming means the flag can be deleted when the feature is no longer optional, leaving no version-numbered cruft. Per-release naming accumulates dead state forever.
- Hard-gating the pre-deploy check for stagef/prod (decision 4) is the highest-leverage piece. It catches all future selector-miss incidents — not just policy-code-hash. The existing v0.3.0 incident would have been caught by this check before the deploy ever fired.
─────────────────────────────────────────────────
Implementation Plan: Contract↔Rust Compatibility Discipline
Summary
Ship five things, in order of foundational dependency:
- Phase 1: Per-feature strictness flag infrastructure on PolicyClientRegistry + admin CLI command.
- Phase 2: Rust dual-path wiring in data-provider for the first concrete flag (strictPolicyCodeHash).
- Phase 3: just pre-deploy-check selector smoke test (Justfile + selector probe binary).
- Phase 4: Process discipline (PR template, release checklist, lessons entry codifying the pattern).
- Phase 5: Telemetry (legacy detection metric + weekly cron crawler).
Total estimated effort: ~3 engineer-days across phases. Phases 1+2 are sequential; Phases 3, 4, 5 are parallelizable once 1+2 land.
Phase 1 — Per-feature strictness flag infrastructure
1A. Contract change: PolicyClientRegistry
File: contracts/src/PolicyClientRegistry.sol (verify path — may be under a subdir).
Add:
/// @notice Per-feature strictness flags for breaking-change rollouts.
/// @dev Each flag gates Rust-side enforcement of a corresponding selector.
/// Default false (permissive) on deploy; admin flips to true after
/// external policy clients have migrated.
mapping(bytes32 => bool) public strictnessFlags;
event StrictnessFlagUpdated(bytes32 indexed feature, bool enabled);
function setStrictnessFlag(bytes32 feature, bool enabled) external onlyOwner {
strictnessFlags[feature] = enabled;
emit StrictnessFlagUpdated(feature, enabled);
}
function isStrict(bytes32 feature) external view returns (bool) {
return strictnessFlags[feature];
}
Why a generic mapping rather than named booleans: each future breaking change adds a bytes32 constant FEATURE_X =
keccak256("featureX") rather than a new storage slot — avoids storage layout migrations. The setStrictnessFlag(bytes32, bool)
signature stays stable forever.
The first real flag will be keccak256("policy_code_hash"). Defined as a constant in PolicyClientRegistry.sol for selector-stable
reference.
1B. Storage layout safety
Add a storage gap in PolicyClientRegistry if not already present. Run just snapshot-storage-layouts before, just
check-storage-layouts after to confirm no conflicts.
1C. Admin CLI command
File: crates/cli/src/main.rs — add subcommand under existing admin group.
just admin --command set-strictness-flag --feature policy_code_hash --enabled true --env stagef
The CLI computes keccak256("policy_code_hash") and calls PolicyClientRegistry.setStrictnessFlag(...) via existing admin signer
plumbing.
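A rough sketch of the subcommand's argument surface, assuming a clap-derive CLI; the struct and field names are hypothetical, and the actual contract call reuses the existing admin signer plumbing (not shown):

```rust
use alloy::primitives::{keccak256, B256};
use clap::Args;

#[derive(Args)]
pub struct SetStrictnessFlagArgs {
    /// Human-readable feature name, e.g. "policy_code_hash"; hashed before the call.
    #[arg(long)]
    pub feature: String,
    #[arg(long)]
    pub enabled: bool,
    #[arg(long)]
    pub env: String,
}

impl SetStrictnessFlagArgs {
    /// The bytes32 key passed to PolicyClientRegistry.setStrictnessFlag.
    pub fn feature_key(&self) -> B256 {
        keccak256(self.feature.as_bytes())
    }
}
```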
1D. Rust reader
File: crates/chainio/src/avs/reader.rs (or equivalent) — add:
pub async fn is_strictness_flag_enabled(
&self, feature: &str, chain_id: u64,
) -> Result<bool, ChainIoError> {
let registry = self.policy_client_registry(chain_id)?;
let key = alloy::primitives::keccak256(feature.as_bytes());
registry.isStrict(key).call().await
.map(|r| r._0)
.map_err(ChainIoError::from)
}
Phase 2 — Rust dual-path wiring (first flag: strictPolicyCodeHash)
2A. Identify call sites
The selector that broke v0.3.0 is getPolicyCodeHash() on NewtonPolicy. Find every Rust call site:
- crates/data-provider/src/lib.rs — primary policy-code-hash verification
- crates/operator/src/handler.rs — task-time invocation
- Possibly crates/challenger/src/source.rs — challenge path verification
2B. Cached strictness state
Strictness flag is read once per RPC server startup, cached behind a RwLock. External policy migration is slow enough that per-task
refresh is wasteful.
pub struct StrictnessCache {
flags: Arc<RwLock<HashMap<(u64, &'static str), bool>>>,
reader: Arc<AvsReader>, // the chainio reader from Phase 1D (concrete type assumed)
}
impl StrictnessCache {
pub async fn is_strict(&self, chain_id: u64, feature: &'static str) -> bool {
if let Some(&v) = self.flags.read().await.get(&(chain_id, feature)) {
return v;
}
let v = self.reader.is_strictness_flag_enabled(feature, chain_id)
.await.unwrap_or(false); // Conservative: default permissive on RPC failure
self.flags.write().await.insert((chain_id, feature), v);
v
}
}
Refresh policy: on each operator startup + on a 5-minute background tick (so flag flips propagate without restart).
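A sketch of that background tick, assuming the cache fields above are reachable from the refresher (e.g. it lives in the same module); task spawning and the chain-ID set are left to the caller:

```rust
use std::{sync::Arc, time::Duration};

/// Periodically re-reads every cached (chain_id, feature) pair so an admin flag
/// flip propagates without restarting the operator. 300s matches the 5-minute
/// refresh policy above.
async fn refresh_strictness_loop(cache: Arc<StrictnessCache>) {
    let mut tick = tokio::time::interval(Duration::from_secs(300));
    loop {
        tick.tick().await;
        let keys: Vec<(u64, &'static str)> = cache.flags.read().await.keys().copied().collect();
        for (chain_id, feature) in keys {
            if let Ok(v) = cache.reader.is_strictness_flag_enabled(feature, chain_id).await {
                cache.flags.write().await.insert((chain_id, feature), v);
            }
            // On RPC failure the previously cached value is kept (no overwrite).
        }
    }
}
```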
2C. Dual-path selector handler
The actual call site becomes:
let strict = strictness.is_strict(chain_id, "policy_code_hash").await;
match policy.getPolicyCodeHash().call().await {
Ok(hash) => verify_policy_code_hash_binding(hash._0, policy_bytes)?,
Err(e) if is_selector_miss(&e) && !strict => {
warn!(policy = %policy_addr, chain_id, "legacy policy proxy detected, skipping code hash binding");
metrics::inc_legacy_policy_detection(chain_id, "policy_code_hash");
}
Err(e) if is_selector_miss(&e) && strict => {
return Err(LegacyPolicyNotSupported {
policy: policy_addr,
feature: "policy_code_hash",
hint: "Policy proxy is on legacy impl; redeploy to v0.3.0 NewtonPolicy via factory",
});
}
Err(e) => return Err(e.into()),
}
is_selector_miss already exists as part of the classify_batch_item_revert work in crates/chainio/src/avs/errors.rs. Reuse it.
2D. Tests
- Unit test: is_strict=true + selector miss → LegacyPolicyNotSupported
- Unit test: is_strict=false + selector miss → Ok with metric incremented
- Unit test: is_strict=true + selector returns hash → verifies binding
- E2E (optional): deploy old NewtonPolicy impl + run gateway in non-strict mode, verify task succeeds with legacy warning
Phase 3 — Pre-deploy selector smoke test
3A. New Justfile recipe
# Pre-deploy compatibility check: probes that the current Rust binary's required
# contract selectors are present on the on-chain target deployment.
# Hard gate for stagef/prod, advisory for local.
pre-deploy-check chain_id env:
    cargo run --release --bin newton-compat-check -- \
        --chain-id {{chain_id}} --env {{env}}
3B. New binary: newton-compat-check
File: bin/newton-compat-check/src/main.rs.
Logic:
- Load deployment JSON for (chain_id, env).
- For each (contract, required_selector) in a curated list, do eth_call via gateway's RPC.
- Classify response: success → ok; 0x revert → selector miss → fail.
- For policy contracts, also enumerate PolicyClientRegistry and probe each registered policy proxy for the required selectors (skips proxies registered as legacy).
- Exit 0 (all good) or 1 (incompatibility) with structured stderr explaining which selectors miss on which addresses (a minimal sketch of the probe loop follows below).
The "curated list" of required selectors is the file the user contributes — it's the most decision-heavy spot.
3C. CI integration
In .github/workflows/deploy-trigger.yml, add a job that runs just pre-deploy-check against the target chain before any deploy job.
For env=local, run in advisory mode (warn, don't block). For env=stagef|prod, hard fail blocks deploy.
Phase 4 — Process discipline
4A. PR template addition
File: .github/PULL_REQUEST_TEMPLATE.md — add checkbox section:
Compatibility checklist (for PRs touching contracts/src/ or crates/{chainio,data-provider})
- [ ] Does this PR add a new required contract selector? If yes, was a corresponding strictnessFlag added to PolicyClientRegistry AND a Rust dual-path wired?
- [ ] Does this PR remove or change semantics of an existing selector? If yes, link the migration plan / lockstep release coordination.
- [ ] Does this PR add a new selector to NewtonPolicy or NewtonPolicyData? If yes, was the new feature flag added to the crates/data-provider selector list?
- [ ] If a previous strictnessFlag is being removed (hard cutover complete), is the flag also removed from PolicyClientRegistry?
4B. Release checklist update
File: docs/RELEASE_CHECKLIST.md (create if not exists). Section "Hard-Cutover Discipline":
Per-release compat audit
For each strictnessFlag currently set to true in production:
- Confirm no telemetry alerts on legacy_policy_detections_total{feature=<name>} for >30 days
- Remove the dual-path branch from Rust (delete the Err(SelectorMiss) if !strict arm)
- Remove the flag definition from PolicyClientRegistry
- Document removal in CHANGELOG
For each new breaking selector introduced this release:
- Add bytes32 constant FEATURE_X = keccak256("feature_x") to PolicyClientRegistry
- Add Rust dual-path with strictness.is_strict(chain_id, "feature_x") gate
- Add metric legacy_policy_detections_total{feature=feature_x}
- Document in .claude/rules/lessons.md under "feature-flag-gated breaking changes"
4C. Lessons entry
File: .claude/rules/lessons.md — new entry under "Multichain & cross-chain" or new section "Compatibility & rollout":
Breaking changes to NewtonPolicy/NewtonPolicyData require feature-flag-gated dual paths
Initialize-once proxies mean external policy clients lag behind impl upgrades; Rust hard-requiring a new selector breaks every
legacy proxy silently (selector miss = 0x revert at task time, surfaces as QuorumNotReached). Pattern: add a bytes32
strictness flag to PolicyClientRegistry, ship Rust with a dual-path that tolerates selector misses when the flag is false,
admin flips the flag via just admin --command set-strictness-flag after external migration, then the next release deletes the
legacy branch and the flag (hard cutover, max 2 versions live).
Prevention: Every new required selector on NewtonPolicy/NewtonPolicyData must wire through this pattern. PR template +
release checklist enforce. Pre-deploy just pre-deploy-check hard-gates against any selector miss in current production.
Phase 5 — Telemetry
5A. Metric definition
File: crates/metrics/src/lib.rs:
pub fn inc_legacy_policy_detection(chain_id: u64, feature: &str) {
metrics::counter!(
"legacy_policy_detections_total",
"chain_id" => chain_id.to_string(),
"feature" => feature.to_string(),
).increment(1);
}
5B. Weekly crawler
New just check-legacy-policy-clients --env stagef recipe that:
- Walks PolicyClientRegistry.getAllPolicyClients() (or equivalent enumeration).
- For each registered policy, probes each known feature selector.
- Reports counts: "X clients on legacy policy_code_hash" per feature.
- Output feeds ops decision on when to flip strictness flag.
Run as a GitHub Actions cron (weekly) and post results to a Slack channel or Linear ticket.
Spots where I'd love your input (5-10 lines each)
The plan above sketches the structure, but four specific decisions shape behavior in ways I don't want to guess at. Each is a spot
where the user (you) writing the actual code captures a domain decision.
Contribution 1 — Selector list for pre-deploy check
File to create: bin/newton-compat-check/src/required_selectors.rs
This is the highest-impact decision in the whole plan. It enumerates which contract selectors are required for current Rust to
function. Wrong list = either false positives (deploy blocked unnecessarily) or false negatives (silent breakage like v0.3.0).
// TODO: enumerate the selectors current Rust requires from each contract.
// Format: (contract_name, selector_name, criticality)
// criticality: Critical (block deploy on miss), Warning (advisory only)
pub const REQUIRED_SELECTORS: &[(&str, &str, Criticality)] = &[
// YOU FILL IN — examples:
// ("NewtonPolicy", "getPolicyCodeHash()", Criticality::Critical),
// ("NewtonPolicyData", "expireAfter()", Criticality::Critical),
// ("PolicyClientRegistry", "isStrict(bytes32)", Criticality::Critical),
// ("EpochRegistry", "currentEpoch()", Criticality::Critical),
// ...
];
Trade-offs:
- Too narrow: misses real incompatibilities → silent breakage at deploy
- Too broad: blocks deploys on selectors that have legacy fallbacks → false alarm fatigue
- Per-binary list (gateway vs operator vs challenger) vs unified: unified is simpler but may over-block
Contribution 2 — Strictness flag default behavior on RPC failure
File: crates/data-provider/src/strictness.rs (new), inside is_strict:
let v = self.reader.is_strictness_flag_enabled(feature, chain_id)
.await
// TODO: what should happen if RPC fails to read the flag?
// Option A: .unwrap_or(false) — conservative, assume permissive
// Option B: .unwrap_or(true) — strict, fail closed
// Option C: return Err and propagate to caller
.???
Trade-offs:
- (A) permissive on RPC failure: if RPC is down during a strict-mode rollout, legacy proxies still work — but new strict enforcement is delayed
- (B) strict on RPC failure: new enforcement is reliable — but RPC outage during rollout breaks all legacy clients without warning
- (C) propagate error: explicit, but every caller has to handle yet another error class
Contribution 3 — Admin authorization for flag flips
File: contracts/src/PolicyClientRegistry.sol, setStrictnessFlag modifier:
// TODO: who should be able to flip strictness flags?
// Option A: onlyOwner — single-key control (existing pattern)
// Option B: onlyOwner with per-feature allowlist — admin can only flip features they're authorized for
// Option C: AccessControl with STRICTNESS_ADMIN role — separate from contract upgrade rights
function setStrictnessFlag(bytes32 feature, bool enabled) external ??? {
strictnessFlags[feature] = enabled;
emit StrictnessFlagUpdated(feature, enabled);
}
Trade-offs:
- Strictness flips affect external developers' policy clients globally — high blast radius. Should this require multisig once Safe is live?
- Per-feature allowlists add complexity but let you delegate "Alice can flip policy_code_hash, Bob can flip expire_after" if team grows
Contribution 4 — Telemetry alerting threshold
File: docs/observability/alerts/legacy_policy_detections.yml (or wherever Datadog/Prom alerts live):
TODO: when should ops be alerted about legacy policy detections?
Consider:
- Threshold: any non-zero count vs >X per minute
- Window: 1m / 5m / 1h
- Severity: warn (informational) vs page (action required)
- Per-feature different thresholds?
Trade-offs:
- "Any non-zero" alerts on every legacy client interaction — noisy if many clients are on old impl during normal grace period
- "Spike-above-baseline" is quieter but may miss slow-burn issues
- Per-feature is right but adds maintenance overhead
Open questions
- Does PolicyClientRegistry already have an enumeration method (e.g., getAllPolicyClients())? If not, Phase 5 weekly crawler needs a different mechanism — could fall back to indexing PolicyClientRegistered events.
- Is ChainIoError already plumbed for selector-miss classification at the data-provider layer, or is it currently only at batch-submitter layer? Phase 2 may need to lift the helper.
- Should the pre-deploy check also probe NewtonPolicy proxies enumerated via PolicyClientRegistry (more thorough but slower), or just probe contract impls (fast)? My instinct: impl-only by default, full enumeration via --full flag for major releases.
Recommended sequencing
Day 1: Phase 1A-1D (contract + admin CLI + Rust reader)
Day 2: Phase 2A-2C (dual-path wiring + cache)
Day 2.5: Phase 2D (tests)
Day 3: Phase 3A-3C (pre-deploy check) || Phase 4 (process docs) — parallel
Day 3.5: Phase 5 (telemetry)
Phases 1+2 must land together in a single PR (contract + Rust dual-path are atomic). Phase 3 can ship in parallel since it's a
separate binary. Phase 4 is documentation-only. Phase 5 builds on Phase 2's metric.
★ Insight ─────────────────────────────────────
- The "spots for your input" pattern (selector list, default behavior, admin auth, alert thresholds) intentionally pushes the
high-leverage decisions to you — not because the work is hard but because each one carries domain judgment that's wrong to guess
at. Selector list curation is especially important: it's the file that makes pre-deploy check actually work, and it grows with
every breaking change. - The Phase 1 contract change uses mapping(bytes32 => bool) rather than named storage slots specifically to make Phase 4's "remove
the flag" step trivial — deleting an entry via setStrictnessFlag(FEATURE_X, false) and removing the constant declaration is cheaper
than removing a named storage variable (which would need a storage layout migration). - The trickiest part operationally will not be writing the code — it's the discipline to actually delete the legacy branch when the
flag flips. That's why Phase 4B explicitly lists "Remove the dual-path branch" as a release checklist item, not a TODO comment in
code. TODO comments rot; checklist items get reviewed.
─────────────────────────────────────────────────
Want me to start with Phase 1 (contract + admin CLI + Rust reader as a single PR), or would you rather lock in Contributions 1–4
first so Phase 2's behavior is fully specified?
Preview: Protocol Version Compatibility
Topic: Design plan for preventing contract↔Rust version drift incidents (post-v0.3.0 getPolicyCodeHash retrospective). Plan
approved, no code shipped yet.
Key decisions (confirmed this session):
- Flag location: PolicyClientRegistry (mapping bytes32 => bool, not named slots — avoids storage layout migrations)
- Flag granularity: per-feature (strictPolicyCodeHash) not per-release (strictV0_3_0) — trivial deletion when feature is no longer optional
- Default value: false (permissive on deploy, admin flips after grace period)
- Pre-deploy smoke test: hard gate for stagef/prod, advisory for local
- NewtonPolicyClient mixin: hard-cutover only, no compat layer (lives in external-dev bytecode)
Higher-level commitments:
- Max 2 versions concurrent (N + N-1) — hard cutover at every breaking change
- Grace period only for NewtonPolicy / NewtonPolicyData (everything else lockstep)
- Admin-flipped on-chain flag via just admin (not date-based, not CI-gated)
- Version detection: try-the-selector-catch-the-miss (no ERC-165, no bytecode hash)
Plan (5 phases, ~3 engineer-days):
- Phase 1: PolicyClientRegistry strictness mapping + admin CLI + Rust reader
- Phase 2: Rust dual-path wiring in crates/data-provider + strictness cache (5-min refresh)
- Phase 3: just pre-deploy-check selector smoke test (Justfile + newton-compat-check binary)
- Phase 4: PR template + release checklist + .claude/rules/lessons.md entry
- Phase 5: legacy_policy_detections_total metric + weekly cron crawler
Phases 1+2 ship as one atomic PR (contract + Rust dual-path together). Phases 3, 4, 5 are parallelizable after.
Open questions awaiting input:
- Selector list for pre-deploy check (per-binary vs unified) — highest-leverage decision
- Strictness flag default behavior on RPC failure (permissive vs strict vs propagate)
- Admin authorization (onlyOwner vs role-based vs multisig)
- Telemetry alerting threshold (any non-zero vs spike-above-baseline)
Related context:
- Root incident: v0.3.0 getPolicyCodeHash selector miss on legacy NewtonPolicy proxies → QuorumNotReached errors
- Lessons.md entries: "Four-site commitment" (PolicyCodeHashMismatch), "Strict checks on previously-optional fields must roll out after migration"
- Error selectors: 0x5092caa0 (PolicyCodeHashMismatch), 0xc2da1b55 (InvalidPolicyCodeHash)
- Resolves the design question Dennis raised after the v0.3.0 deploy incident: "how can we prevent the Protocol version backward
compatibility between contracts vs Rust code (operators)?"
Anything to add, correct, or remove before I save?
continue
⏺ Good pushback. The short answer: CI E2E proves "this commit works end-to-end on a fresh deploy" — pre-deploy proves "the live
target is compatible with the binary we're about to ship right now." Different questions, different blast radii.
★ Insight ─────────────────────────────────────
- E2E tests in CI always deploy fresh contracts, so every proxy in the test environment is on the impl matching that commit — they can never observe a legacy proxy that's been sitting on stagef for 3 weeks.
- The "Rust new + contracts old" scenario (incident #2) is structurally untestable in PR CI because the workflow checks out one Git SHA — there is no version skew to test against.
- Pre-deploy checks probe the actual on-chain state of the deploy target at the moment of deploy, catching drift between when CI green-lit and when the workflow fires.
─────────────────────────────────────────────────
What CI E2E actually catches
just e2e and just e2e --scenario * deploy contracts to anvil from contracts/script/, then run gateway/operator/aggregator against
them. Every policy proxy in those tests is freshly minted — NewtonPolicyFactory.deployPolicy() runs at test setup, so they always
carry the impl from the current commit.
That's enough to verify: code compiles, BLS aggregation works, multichain routing works, attestation roundtrips work, challenge
paths work. Real value, but the policy proxies under test are never "old."
What CI E2E cannot catch (the v0.3.0 class)
- Legacy proxies on stagef/prod. External policy clients deployed against the old NewtonPolicy impl don't get re-initialized when we upgrade the impl — they keep responding to the old selector set. CI fixtures don't include "deploy old policy → upgrade impl → run gateway" scenarios.
- External (non-Newton) policy clients. Third-party developers deploy their own PolicyClient extensions. We don't enumerate them in fixtures because we don't know about all of them. Pre-deploy could (a weekly registry crawl).
- Scenario 2 of the incident matrix. "Rust deploys before contracts" is impossible to model in CI because PR E2E pins one SHA — by definition contracts and Rust agree.
- Drift between CI and deploy. PR CI runs at SHA X. Deploy might fire hours later from main after additional contract changes merged. The chain state at the moment of deploy can differ from what CI saw.
Two questions, two tools
┌───────────────────────────────────────────────────────────────────────────┬──────────────────┐
│ Question │ Tool │
├───────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ Does this Git commit work end-to-end? │ CI E2E │
├───────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ Is the live (chain_id, env) target compatible with this binary right now? │ pre-deploy check │
└───────────────────────────────────────────────────────────────────────────┴──────────────────┘
E2E is a code-quality gate. Pre-deploy is an environment-compatibility gate. They don't substitute.
Three options
A. Keep Phase 3 as proposed. Pre-deploy check enumerates required selectors per binary, calls each against real production proxies,
fails the deploy on selector miss. Catches legacy + external + drift.
B. Drop Phase 3, expand E2E with legacy fixtures. Add a CI scenario that deploys an old NewtonPolicy impl, registers a proxy, then
runs current Rust gateway against it. Cheap to add, catches "Rust new + contracts old" structurally — but still doesn't probe
actual production state at deploy time, and can't enumerate external policy clients.
C. Hybrid (recommended). Both. Expanded E2E catches the regression class in PR review (fast feedback). Pre-deploy catches the
actual production state at deploy time (last-mile safety). The two costs add to maybe a day of work and one extra CI scenario.
If we only pick one, I'd keep Phase 3 over expanded E2E — pre-deploy is the only mechanism that probes real production state.
Expanded E2E is good hygiene but doesn't close the deploy-time gap.
Which way do you want to go? Once you decide, I'll fold it into the saved memory and finish the /save-progress workflow.
✻ Worked for 32s
❯ what about just having the Rust deployment to have a pre-deploy test in ci for running a set of just e2e integration tests against
not against anvil, but against current stagef deployed contracts on sepolia/base sepolia/etc.? if that fails, Rust deploy is
blocked
⏺ Strong proposal — this is actually higher-fidelity than the selector smoke test. Let me work through it honestly.
★ Insight ─────────────────────────────────────
- E2E-against-stagef catches the v0.3.0 class the same way production catches it — by running the real code path against the real contracts. That's strictly higher fidelity than enumerating selectors.
- Bonus: it catches semantic regressions (same selector, changed behavior) that a selector-existence check would miss.
- The cost is operational complexity: real RPCs, real gas, real test isolation across concurrent PRs — orders of magnitude more
failure surface than a selector probe.
─────────────────────────────────────────────────
Why this is a better design than Phase 3 as proposed
The pre-deploy selector smoke test answers "do the selectors I depend on exist?" The E2E-against-stagef answers "does the full task
pipeline I'm shipping work against the actual contracts I'm shipping to?" The second strictly subsumes the first — you can't pass
full E2E without all required selectors existing.
It also eliminates two ongoing maintenance costs:
- No bin/newton-compat-check binary to build
- No required_selectors.rs curated list to keep in sync (which itself is a drift-prone artifact — forget to add a new selector and
the smoke test passes while production breaks)
And it kills a category of bug the smoke test can't see: a contract whose selector survived but whose semantics changed. A
selector-only check would happily green-light a deploy where getPolicyCodeHash() returns bytes32(0) because the impl was rolled
back.
Real concerns — these need answers before we adopt
- Operator topology. This is the trickiest part. To run E2E against stagef contracts, you need operators signing with the new Rust
binary. Three options:
┌──────────────────────┬──────────────────────────────────────────────────────┬───────────────────────────────────────────────┐
│ Option │ How │ Risk │
├──────────────────────┼──────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ A. Replace a stagef │ Stop one stagef operator, run new binary in its │ Real operator goes down; quorum may fail for │
│ operator │ place during test │ unrelated stagef traffic │
├──────────────────────┼──────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ B. Add a dedicated │ Register a separate operator slot in stagef ahead of │ Clean isolation, but needs registry setup + │
│ CI test operator │ time, dedicated to CI binary │ ongoing BLS key management │
├──────────────────────┼──────────────────────────────────────────────────────┼───────────────────────────────────────────────┤
│ C. Spin up fresh │ CI runs new binary as N operators, registers them │ Most isolated, but registration churn │
│ operator set in CI │ against stagef contracts before tests, deregisters │ pollutes stagef registry, BLS APK stability │
│ │ after │ matters │
└──────────────────────┴──────────────────────────────────────────────────────┴───────────────────────────────────────────────┘
I'd push toward B: one or two pre-registered "CI operator" slots in stagef, dedicated to this purpose. The CI workflow brings them
online with the candidate binary, runs E2E, brings them down. No churn on the stagef registry, no impact on actual stagef
operators.
- Gas funding. Sepolia faucets are flaky and rate-limited. Base Sepolia is better but not infinite. Need a funded CI EOA per chain with auto-refill monitoring. Probably acceptable, but it's now a real piece of infra to own.
- Test parallelism. Two PRs running E2E simultaneously will interfere — they'll compete for the same operator nonces, race on task ID generation, contend for fixture state. Need either:
- Serialize E2E in CI (only one runs at a time) → developer experience hit, queue grows during busy periods
- Per-PR fixture isolation (CI deploys a fresh PolicyClient per run) → adds 30-60s of setup
- Accept some test flakiness from concurrency → debugging becomes painful
- Latency. Sepolia block time is 12s. Base Sepolia is 2s. A full E2E scenario (just e2e --scenario full: deploy policy, create
task, broadcast, aggregate, respond, settle) might take 5-15 min per chain on real testnets vs 30-60s on local anvil. Across 2-4
chains × N scenarios you could hit 30-60 min.
That's not unworkable but it's a real shift in PR feedback loop. Compare to the smoke test which is seconds.
- Infra fragility blocking deploys. Sepolia RPC outage → CI red → can't deploy → emergency hotfix requires a manual override path.
Need to define what "override" means and who can authorize it.
Recommended structure if we go this route
Keep most of the plan, swap Phase 3:
- Phase 1 (contract strictness flag + admin CLI + Rust reader) — unchanged, still needed for runtime dual-path behavior
- Phase 2 (Rust dual-path wiring + strictness cache) — unchanged, this is what catches legacy proxies at task-handling time, not deploy time
- Phase 3 NEW (E2E-against-stagef pre-deploy gate):
- CI workflow runs just e2e-stagef --scenarios basic,full,routing against pre-registered CI operator slot
- Runs as a required status check on Rust deploy workflow, not on every PR (too expensive for PR feedback)
- Probably runs on main push + on demand via workflow_dispatch
- Gas-funded CI EOA per target chain with monitoring
- Manual override for infra outages (documented: "if Sepolia is down, the lead can override with X")
- Phase 4 (PR template + release checklist + lessons.md) — unchanged
- Phase 5 (telemetry) — unchanged
What I'd want to think about more
A few open questions before I'd commit fully:
- PR-time vs deploy-time gating. Run E2E-against-stagef on every PR (slow but immediate signal) or only on deploy workflow (fast PRs, but bug discovered hours later when deploying)? My instinct: deploy-workflow only, with an opt-in label to run on PR for risky changes.
- Coverage of external policy clients. Stagef has Newton-deployed proxies in test fixtures — that catches incidents in our own contracts. External developers' policy clients (third-party PolicyClient extensions) wouldn't be in stagef test fixtures unless we explicitly include them. Is that gap acceptable? Probably yes — we own first-party correctness, third parties opt into compatibility risk by extending the mixin.
- What "failure" means. A failed E2E scenario could be: real incompatibility (good — block deploy), flaky test (bad — block deploy on noise), Sepolia outage (bad — block deploy on infra). Need clear classification + retry semantics.
Decision question
Three forks on the table:
A. Adopt E2E-against-stagef as Phase 3. Higher fidelity, more maintainable than selector list, but operationally heavier.
B. Keep selector smoke test as Phase 3. Cheaper, simpler, faster — but misses semantic regressions and depends on selector list
staying in sync.
C. Both. Smoke test gives fast PR-time signal, E2E-against-stagef gates deploys. Belt and suspenders.
My recommendation: A, with the operator topology question (option B from the table) figured out as part of the implementation. The
selector smoke test is a less-faithful approximation of what you actually want — and what you actually want is exactly
E2E-against-stagef.
Which fork do you want? Once you decide I'll fold this into the saved memory and resume /save-progress.
❯ that's the purpose of the stagef environment. it is a place for us to test because it's internal only so it can go down (although we don't want it to go down ever) more than prod. So we can just use the existing same operator keys/gateway keys, etc. and gas will all be funded, etc. and this should run "sequentially" prior to every testnet stagef deploy (which auto triggers on every PR merge to main on newton-prover-avs)
⏺ Yes — that resolves all the operational concerns I raised. The cleanest framing is "deploy with E2E gate baked in" rather than a
separate test step.
★ Insight ─────────────────────────────────────
- The stagef deploy itself already has a brief downtime window (stop old binary → start new). Putting E2E inside that window means you're not adding a new collision risk — you're using the deploy window itself as the test sandbox.
- This is structurally a canary deployment where synthetic test traffic (E2E) plays the role of real traffic. If synthetic fails, roll back. If synthetic passes, accept real traffic.
- Reusing existing keys eliminates BLS APK churn, registry writes, and the "where do we register the test operator" problem entirely.
─────────────────────────────────────────────────
Workflow shape
- PR merges to main
- CI: build new binary
- CI: stop running stagef gateway + operator (downtime starts)
- CI: run E2E with NEW binary against stagef contracts + existing keys
- Scenarios: basic, full, routing (whatever proves the core lifecycle)
- If E2E passes:
- Start stagef gateway + operator with NEW binary (downtime ends)
- Mark deploy successful
- If E2E fails:
- Start stagef gateway + operator with OLD binary (rollback, downtime ends)
- Mark deploy blocked, surface E2E failure to PR author
Stagef downtime per merge: roughly E2E duration (5-15 min). Acceptable for an internal env per your framing.
What this kills from the original Phase 3
- No bin/newton-compat-check binary needed
- No required_selectors.rs curated list to maintain
- No "where do CI test operators register" question
- No gas funding / faucet plumbing (already in place for stagef)
- No test-parallelism question (one merge → one deploy → sequential by construction)
What still needs to be decided
- E2E scenario set. Which just e2e --scenario * runs as the gate? I'd start minimal — basic and full (full covers two-phase consensus + direct attestation, the v0.3.0 incident path). Add routing and privacy only if they're stable enough on Sepolia/Base Sepolia (privacy in particular is heavier). Adding more scenarios increases gate confidence but extends downtime per merge.
- Rollback automation. Does CI automatically restart with the old binary on E2E failure, or does it leave stagef stopped and page someone? Auto-rollback is cleaner but requires the CI workflow to keep both binaries on disk during the deploy window.
- Multi-chain coverage. Stagef has Sepolia + Base Sepolia + others. Run E2E against each in parallel during the deploy window? Sequentially? Parallel is faster but multiplies the chance of a flaky chain blocking deploy. I'd pick one or two representative chains as the required gate, run others as advisory.
- Override path. What happens when E2E fails for an emergency hotfix that needs to ship anyway? Manual workflow_dispatch with skip_e2e: true? Who can authorize it?
- Production deploy implications. This gate covers stagef → main merge. Production deploy is a separate workflow. Does prod deploy require the stagef E2E to have passed (i.e., you can only promote a SHA to prod after its stagef E2E was green)? That would close the loop fully.
Updated phase plan
- Phase 1 (contract strictness flag + admin CLI + Rust reader) — unchanged. Still needed for runtime dual-path.
- Phase 2 (Rust dual-path wiring + strictness cache) — unchanged.
- Phase 3 NEW (stagef pre-deploy E2E gate):
- Modify the existing stagef deploy workflow to insert E2E step between binary swap
- Add rollback-on-failure logic
- Configure scenario set, multi-chain coverage, override path
- Phase 4 (PR template + release checklist + lessons.md) — unchanged.
- Phase 5 (telemetry — legacy_policy_detections_total + alert) — unchanged.
Confirmation needed
Two things to nail down before saving:
- E2E scenarios for the gate: just basic + full, or include routing and privacy from day one?
- Multi-chain: gate on one chain (Sepolia) or on all stagef chains? My instinct is gate on one (cheaper, less flake), advisory on
the rest.
Once you answer those, I'll fold the new Phase 3 into the saved memory and finish /save-progress.
✻ Worked for 1m 13s
❯ or can we just use anvil to "fork" sepolia/base-sepolia etc, so that we dont have to worry about gateway/operator nonce, gas issue
etc?
⏺ Yes — this is even better. Anvil fork gives you the fidelity of real stagef state with the isolation of CI-local execution. Best of
both worlds.
★ Insight ─────────────────────────────────────
- anvil --fork-url snapshots all on-chain state at a block (every contract, every storage slot, every legacy proxy) into a local node — your test environment is the real stagef contract topology, not a fresh deploy.
- The fork preserves the BLS APK registry, so signatures from real operator keys verify against real on-chain state — the cryptographic correctness path is honest, not mocked.
- This is the same just e2e infrastructure you already have, just pointed at a forked node instead of a fresh anvil — minimal new tooling required.
─────────────────────────────────────────────────
Workflow shape
- PR merges to main → CI workflow starts
- CI: build new binary
- CI: spin up anvil forks
- anvil --fork-url $SEPOLIA_RPC --fork-block-number latest --port 8545
- anvil --fork-url $BASE_SEPOLIA_RPC --port 8546
- CI: anvil_setBalance for gateway + operator EOAs (free local gas)
- CI: start new binary's gateway + N operators against forked endpoints
- Same stagef config, only RPC URLs differ
- Real keys, real contract addresses, real registry state
- CI: run just e2e --scenario basic,full against the forks
- If passes → trigger actual stagef deploy
- If fails → block deploy, surface failures
Live stagef keeps running uninterrupted the whole time. No downtime, no nonce collision, no faucet plumbing.
What this catches that I care about
The v0.3.0 incident class is fully covered: the fork includes every legacy NewtonPolicy proxy on Sepolia exactly as it lives on
chain. When the new gateway calls getPolicyCodeHash() against a legacy proxy in the fork, the selector miss surfaces as the same 0x
revert it would in production. CI catches it before the real deploy fires.
Bonus: it lets you test the contract upgrade itself
If a PR includes a contract upgrade (impl swap), CI can run forge script Upgrade.s.sol --rpc-url http://localhost:8545 --broadcast
against the fork before the binary E2E. That tests the upgrade-then-Rust path end-to-end without ever touching real stagef. Catches
the third incident scenario (both upgraded, but external policy clients still on grace period) cleanly.
Edge cases worth naming
- External dependencies stay real: IPFS reads (policy fetch) hit real gateways, fine. Privacy / TLS notary tests would need real or mock services — same as today.
- Fork startup latency: Anvil pulls state lazily from upstream RPC. First few calls are slow. For consistent timing, pre-warm by calling the contracts under test once at fork startup.
- Multi-chain coordination: Each chain gets its own anvil instance + own port. CI needs to orchestrate N anvil processes. Existing just e2e-dual-anvil-multichain already does this for testing — just point fork URLs at real testnets instead of fresh deploys.
- State drift mid-test: Anvil fork advances locally as tests mine blocks. taskCreatedBlock references work the same way they do in live testing. Just don't let tests assume a specific real block number — use the fork's current_block semantics.
What this doesn't catch
- Real RPC behavior under load (rate limits, timing, connection drops)
- Real operator-to-operator network behavior if anything depends on it (currently nothing does — operators talk only via gateway broadcast)
- Anything that depends on real-time wall clock that doesn't match block time
None of these are in the v0.3.0 incident class. Acceptable miss.
Updated phase plan
- Phase 1 (strictness flag + admin CLI + Rust reader) — unchanged
- Phase 2 (Rust dual-path + cache) — unchanged
- Phase 3 NEW (anvil-fork pre-deploy E2E gate):
- New CI workflow step: spin up anvil forks of stagef chains, run E2E, gate deploy
- Reuses existing just e2e infrastructure with --fork-url parameter
- Optional: pre-test contract upgrade if PR touches contracts
- Phase 4 (PR template + release checklist + lessons.md) — unchanged
- Phase 5 (telemetry) — unchanged
Two small confirmations before saving
- Scenarios for the gate: basic + full minimum (covers v0.3.0 path). Add routing and privacy from day one, or layer them in later once stable?
- Chains to fork: All stagef chains in parallel (Sepolia + Base Sepolia + others), or just Sepolia as the required gate with the rest advisory?
My defaults: basic + full to start, all stagef chains in parallel (anvil forks are cheap, parallel is fast). Confirm or override
and I'll fold into saved memory and finish /save-progress.
❯ 1. skip the per-chain full runs and rely on routing+direct alone, since it exercises both source and destination attestation paths in one shot. 2. just fork and test on testnet 3. yes sounds good
⏺ Locked in. Here's the final Phase 3 design and the three small remaining items before save.
★ Insight ─────────────────────────────────────
- Adding routing-full as a Justfile scenario is the minimal codification — one line in the case statement turns implicit env-var override into an explicit, discoverable gate command. That removes the "asymmetric defaults between e2e and e2e-gateway" footgun entirely.
- Forking only testnet chains (Sepolia + Base Sepolia) keeps anvil's upstream RPC dependency narrow — mainnet forks pull more state and cost more in upstream RPC quota for negligible coverage gain since Newton's prod stack lives on testnets in stagef.
- One scenario covering both source + destination attestation halves wall-clock at the cost of debug isolation — when it fails, the developer reproduces locally with both forks running and bisects from there. The "two runs for clear isolation" is a debugging convenience, not a correctness requirement.
─────────────────────────────────────────────────
Phase 3 — final shape
One new Justfile scenario:
In e2e recipe case statement, add:
routing-full) _attestation=${_attestation:-direct}; _privacy=${_privacy:-false}; _target_chains=${_target_chains:-31337,31338} ;;
One CI workflow change in the stagef deploy pipeline:
- PR merges to main
- CI: build new binary
- CI: spin up two anvil forks
- anvil --fork-url $SEPOLIA_RPC --port 8545 --chain-id 31337
- anvil --fork-url $BASE_SEPOLIA_RPC --port 8546 --chain-id 31338
- CI: anvil_setBalance for gateway + operator EOAs (free local gas)
- CI: just e2e --scenario routing-full --env stagef --block-time 1
- Pass → trigger actual stagef deploy
- Fail → block deploy, surface E2E failures to PR author
Live stagef stays untouched the entire time. Real keys, real contract state, isolated execution.
Final phased plan (decisions baked in)
┌───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────┐
│ Phase │ Deliverable │ Status │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 1 │ PolicyClientRegistry.strictnessFlags + setStrictnessFlag admin function + just admin set-strictness │ Decided │
│ │ CLI command + Rust is_strictness_flag_enabled reader │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 2 │ Rust dual-path wiring with StrictnessCache (5-min refresh) + first concrete dual-path on │ Decided │
│ │ getPolicyCodeHash │ (see Q2) │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 3 │ routing-full Justfile scenario + CI workflow step (anvil forks Sepolia + Base Sepolia, runs scenario, │ Decided │
│ │ gates stagef deploy) │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 4 │ PR template checkbox ("did you add a strictness flag for this breaking change?") + release checklist │ Decided │
│ │ update + lessons.md entry codifying the pattern │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 5 │ legacy_policy_detections_total{chain_id, feature} Prometheus counter + one critical alert │ Decided │
│ │ │ (see Q4) │
└───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┘
Three small confirmations remaining
These were posed before Phase 3 redesigned but still apply.
Q2 — Strictness cache RPC failure default:
The cached is_strict() check needs a default when the on-chain read fails (RPC down, network glitch). Two reasonable defaults:
- Permissive default (recommended): cache miss + RPC fail → return false (treat as non-strict, allow legacy proxy path). Argument:
prefer availability over strictness when telemetry is broken; a transient RPC blip shouldn't reject every task. - Strict default: cache miss + RPC fail → return true (reject anything that would have been legacy-tolerated). Argument:
fail-closed when uncertain.
My recommendation: permissive with a warn! log on every cache-miss-with-RPC-fail and a counter
(strictness_cache_rpc_failures_total) for alerting. Once strict mode is on, you want telemetry to flag the situation, but you don't
want a Sepolia RPC blip to brick task processing.
Q3 — setStrictnessFlag authorization:
- onlyOwner now (recommended): deployer EOA flips the flag via just admin. Simple, ships today.
- Safe multisig from day one: requires the Safe migration to land first (NEWT-882–894), which is on the roadmap but not done.
My recommendation: onlyOwner now, migrate to Safe via the NEWT-882–894 work later. The strictness flag is a "we're confident
migration is done" signal, not an emergency lever — single-signer is acceptable.
Q4 — Alerting threshold:
What's the page-someone alert? My recommendation: one critical day-1 rule.
(
policy_strictness_flag{feature="policy_code_hash"} == 1
) and (
rate(legacy_policy_detections_total{feature="policy_code_hash"}[5m]) > 0
)
Reads as: "strict mode is enabled AND legacy detections are still happening." That's the "we flipped the flag too early" alarm. Add
a non-paging warn-level rule for "legacy detections still happening even though we shipped the strict version" so we know to flip
the flag.
Confirm and save
Three quick confirmations:
- Q2: permissive default + warn log + counter? (Y/N)
- Q3: onlyOwner now, migrate to Safe later? (Y/N)
- Q4: single critical day-1 alert per the PromQL above? (Y/N)
Once you confirm all three (or override), I'll fold the full plan into protocol-version-compatibility.md, update MEMORY.md, save to
supermemory, save to recall, and finish /save-progress.
⏺ Got the authoritative answers. Stagef has deployments at 11155111 (Sepolia), 84532 (Base Sepolia), and 1 (Ethereum mainnet — likely
the EigenLayer source-chain reference, not a real deploy target — worth your confirmation).
★ Insight ─────────────────────────────────────
- The e2e recipe is a thin dispatcher around three orthogonal env-var knobs: VALIDATE_ATTESTATION, PRIVACY_ENABLED, TARGET_CHAINS. The "scenarios" are just preset combinations of those three.
- The v0.3.0 selector miss (getPolicyCodeHash) lives on the on-chain attestation validation path (validateAttestationDirect). Only scenarios with validate_attestation=direct exercise that path — meaning basic, privacy, and routing would all have passed while v0.3.0 was broken.
- That's the load-bearing detail: only full and privacy-full would have caught the v0.3.0 incident class.
─────────────────────────────────────────────────
Scenario decomposition
From Justfile:456-462:
┌──────────────┬──────────────────────┬─────────┬───────────────┬─────────────────┐
│ Scenario │ validate_attestation │ privacy │ target_chains │ Catches v0.3.0? │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ basic │ skip │ false │ 31337 │ No │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ full │ direct │ false │ 31337 │ Yes │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ privacy │ skip │ true │ 31337 │ No │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ privacy-full │ direct │ true │ 31337 │ Yes │
├──────────────┼──────────────────────┼─────────┼───────────────┼─────────────────┤
│ routing │ skip │ false │ 31337,31338 │ No │
└──────────────┴──────────────────────┴─────────┴───────────────┴─────────────────┘
What each knob does:
- validate_attestation=direct: After BLS aggregation, gateway calls NewtonProverTaskManager.validateAttestationDirect() on-chain. This is the path that reads policy.getPolicyCodeHash() against the live policy proxy. Selector miss = revert here. This is the v0.3.0 path (see the sketch after this list).
- validate_attestation=skip: Skips on-chain validation entirely. Aggregation completes, but the on-chain selector compatibility check never runs.
- privacy=true: Enables HPKE encryption of policy params. Operators decrypt locally. Adds privacy-specific code paths but doesn't touch the getPolicyCodeHash path.
- target_chains=31337,31338: Tests dual-chain orchestration (per-chain ChainService, source/dest task flow). Different bug class than selector miss.
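A minimal sketch of the dispatch these knobs imply, assuming hypothetical names: TaskManagerHandle, validate_attestation_direct, and this two-variant AttestationValidationMode are illustrative stand-ins, not the real crate API.

pub enum AttestationValidationMode {
    Skip,
    Direct,
}

pub struct TaskManagerHandle;

impl TaskManagerHandle {
    // Stand-in for the on-chain validateAttestationDirect() call that reads
    // policy.getPolicyCodeHash() via the live policy proxy; a selector miss
    // on a legacy proxy reverts here.
    pub fn validate_attestation_direct(&self, _aggregated_sig: &[u8]) -> Result<(), String> {
        Ok(())
    }
}

pub fn finish_task(
    mode: &AttestationValidationMode,
    task_manager: &TaskManagerHandle,
    aggregated_sig: &[u8],
) -> Result<(), String> {
    match mode {
        // skip: aggregation completes but the on-chain selector check never
        // runs, which is why skip-mode scenarios cannot catch the v0.3.0 class
        AttestationValidationMode::Skip => Ok(()),
        // direct: the v0.3.0 path
        AttestationValidationMode::Direct => {
            task_manager.validate_attestation_direct(aggregated_sig)
        }
    }
}

The point of the sketch is only that the validation step is gated on the mode, so any scenario preset that resolves to Skip never touches the selector at all.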
Argument variations for each scenario, beyond the preset:
┌───────────────────────────┬─────────────────────────────────────────────────────────────┬──────────────┐
│ Arg │ Effect │ Default │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --env │ Which newton_prover_config..json to read │ stagef │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --block-time │ Anvil block time (0 = automine, 1 = 1s realistic) │ 0 │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --force │ Force re-deploy contracts vs cached │ false │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ --keep-anvil │ Don't tear down anvil after test │ false │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ validate_attestation= env │ Override scenario default (direct or skip) │ per scenario │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ privacy= env │ Override scenario default │ per scenario │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ target_chains= env │ Override scenario default │ per scenario │
├───────────────────────────┼─────────────────────────────────────────────────────────────┼──────────────┤
│ SKIP_REGISTRATION env │ Skip operator registration step (assume already registered) │ per chain │
└───────────────────────────┴─────────────────────────────────────────────────────────────┴──────────────┘
For a forked-stagef gate, you'd want --block-time 1 (mimics real chain timing better than automine) and SKIP_REGISTRATION=true
(operators already registered in the forked state).
What gates the v0.3.0 class
Day-1 minimum to catch v0.3.0 specifically: full. That's the only scenario among the lightweight ones that exercises
validateAttestationDirect.
Recommendation for the deploy gate
Tier 1 (required, day 1):
- full — single chain, attestation validation. Catches v0.3.0 selector class.
- routing — dual-chain orchestration. Catches multichain regressions (different bug class but you ship multichain).
Tier 2 (add once stable on forked stagef):
- privacy-full — covers attestation + privacy paths. Heavier (HPKE, IPFS, possibly threshold) so I'd hold this for after the gate
is proven stable.
Skip from gate:
- basic, privacy (no attestation validation — won't catch the incident class you care about; redundant given full and privacy-full)
Chain coverage
Stagef has real deployments at:
- 11155111 — Sepolia (source)
- 84532 — Base Sepolia (destination)
- 1 — Ethereum mainnet (probably config-only, not a deploy target — please confirm)
For multichain routing, you need both Sepolia and Base Sepolia forked (source + destination). For single-chain full, Sepolia alone
is enough — but running it against Base Sepolia too would catch chain-specific issues (block-time offsets, EIP-4844 quirks).
My default: fork both Sepolia and Base Sepolia, run full against each (in parallel), run routing once across both. Total wall-clock
~10-15 min via parallel anvil instances. Mainnet 1 deployment skipped (config reference, not a real deploy target).
Two questions
- Tier 1 gate set: full + routing, or just full?
- Mainnet 1 deployment: is 1-stagef.json an actual deploy target Rust connects to, or just config reference for EigenLayer source
chain reads?
Once you answer these I'll fold it into the saved memory and finish /save-progress.
✻ Sautéed for 1m 59s
❯ does routing also check for validateAttestationDirect?
⏺ Good check — let me verify both whether routing exercises it by default and whether destination chains even have that path.
Searched for 2 patterns (ctrl+o to expand)
⏺ Important catch. Let me give you the precise answer.
★ Insight ─────────────────────────────────────
- routing defaults to validate_attestation=skip (per Justfile:460), so out-of-the-box it does NOT exercise validateAttestationDirect. But the env-var override is wired through — validate_attestation=direct just e2e --scenario routing works.
- validateAttestationDirect exists on BOTH NewtonProverTaskManager (source) and NewtonProverDestTaskManager (destination). The multichain destination has its own copy of the selector, which means the v0.3.0 class can recur on dest chains too if the destination policy proxy is on a legacy impl.
- The default in e2e-gateway is direct, but the default in e2e --scenario routing is skip — that asymmetry is easy to miss.
─────────────────────────────────────────────────
Direct answer
No, not by default. From Justfile:460:
routing) _attestation=${_attestation:-skip}; _privacy=${_privacy:-false}; _target_chains=${_target_chains:-31337,31338} ;;
So just e2e --scenario routing runs with attestation validation skipped. It tests cross-chain orchestration but not the
selector-miss path.
But the override is in place
The e2e recipe respects validate_attestation as an env-var override. And integration-tests/src/multichain_routing.rs:214 does honor
it:
if !matches!(config.validate_attestation, AttestationValidationMode::Skip) {
    // validate attestation per config.validate_attestation mode
}
So validate_attestation=direct just e2e --scenario routing will run dual-chain orchestration AND on-chain attestation validation.
That's the combo we actually want for the gate.
Why it matters specifically for v0.3.0 class
Both task managers have validateAttestationDirect:
- crates/core/src/generated/newton_prover_task_manager.rs (source)
- crates/core/src/generated/newton_prover_dest_task_manager.rs (destination)
Both call into the policy proxy's selector during validation. If a destination-chain policy proxy is on a legacy impl, the v0.3.0
class recurs there too. Source-only full would miss that.
Updated recommendation
To cover the v0.3.0 class on every chain that runs the binary, the gate should be:
┌───────────────────────────────────────────────────────────────────────┬──────────────────────────────────────────────────────┐
│ Test command │ Catches │
├───────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────┤
│ just e2e --scenario full --env stagef (against Sepolia fork) │ v0.3.0 class on source chain │
├───────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────┤
│ just e2e --scenario full --env stagef (against Base Sepolia fork) │ v0.3.0 class on destination chain │
├───────────────────────────────────────────────────────────────────────┼──────────────────────────────────────────────────────┤
│ validate_attestation=direct just e2e --scenario routing --env stagef │ v0.3.0 class on dual-chain orchestration + │
│ (against both forks) │ cross-chain attestation flow │
└───────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────┘
Three runs total. With anvil forks running in parallel, total wall-clock dominated by the slowest scenario (~10 min).
Or, more parsimonious: skip the per-chain full runs and rely on routing+direct alone, since it exercises both source and
destination attestation paths in one shot. Trade-off: harder to localize failures (was it source attestation, dest attestation, or
routing logic that broke?) but cheaper.
Three confirmations to nail down before saving
- Gate set: three runs (full ×2 + routing+direct) with clearer failure isolation, OR one run (routing+direct) for parsimony?
- Mainnet 1 deployment: real deploy target Rust connects to, or just config reference for EigenLayer source-chain reads? (Affects whether to fork mainnet too — heavier upstream RPC dependency.)
- Should I propose adding a new combined scenario (e.g., routing-full → validate_attestation=direct, target_chains=31337,31338) to the Justfile case statement? That removes the env-var-override gotcha and makes the gate command explicit. Small Justfile change.
Once you answer these, I'll fold into saved memory and finish /save-progress.
✻ Cogitated for 59s
❯ 1. skip the per-chain full runs and rely on routing+direct alone, since it exercises both source and
destination attestation paths in one shot. 2. just fork and test on testnet 3. yes sounds good
⏺ Locked in. Here's the final Phase 3 design and the three small remaining items before save.
★ Insight ─────────────────────────────────────
- Adding routing-full as a Justfile scenario is the minimal codification — one line in the case statement turns implicit env-var override into an explicit, discoverable gate command. That removes the "asymmetric defaults between e2e and e2e-gateway" footgun entirely.
- Forking only testnet chains (Sepolia + Base Sepolia) keeps anvil's upstream RPC dependency narrow — mainnet forks pull more state and cost more in upstream RPC quota for negligible coverage gain, since Newton's prod stack lives on testnets in stagef.
- One scenario covering both source + destination attestation halves wall-clock at the cost of debug isolation — when it fails, the developer reproduces locally with both forks running and bisects from there. The "two runs for clear isolation" is a debugging convenience, not a correctness requirement.
─────────────────────────────────────────────────
Phase 3 — final shape
One new Justfile scenario:
In e2e recipe case statement, add:
routing-full) _attestation=${_attestation:-direct}; _privacy=${_privacy:-false}; _target_chains=${_target_chains:-31337,31338} ;;
One CI workflow change in the stagef deploy pipeline:
- PR merges to main
- CI: build new binary
- CI: spin up two anvil forks
- anvil --fork-url $SEPOLIA_RPC --port 8545 --chain-id 31337
- anvil --fork-url $BASE_SEPOLIA_RPC --port 8546 --chain-id 31338
- CI: anvil_setBalance for gateway + operator EOAs (free local gas)
- CI: just e2e --scenario routing-full --env stagef --block-time 1
- Pass → trigger actual stagef deploy
- Fail → block deploy, surface E2E failures to PR author
Live stagef stays untouched the entire time. Real keys, real contract state, isolated execution.
Final phased plan (decisions baked in)
┌───────┬────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────────┐
│ Phase │ Deliverable │ Status │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 1 │ PolicyClientRegistry.strictnessFlags + setStrictnessFlag admin function + just admin set-strictness │ Decided │
│ │ CLI command + Rust is_strictness_flag_enabled reader │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 2 │ Rust dual-path wiring with StrictnessCache (5-min refresh) + first concrete dual-path on │ Decided │
│ │ getPolicyCodeHash │ (see Q2) │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 3 │ routing-full Justfile scenario + CI workflow step (anvil forks Sepolia + Base Sepolia, runs scenario, │ Decided │
│ │ gates stagef deploy) │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 4 │ PR template checkbox ("did you add a strictness flag for this breaking change?") + release checklist │ Decided │
│ │ update + lessons.md entry codifying the pattern │ │
├───────┼────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────────┤
│ 5 │ legacy_policy_detections_total{chain_id, feature} Prometheus counter + one critical alert │ Decided │
│ │ │ (see Q4) │
└───────┴────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────────┘
Three small confirmations remaining
These were posed before Phase 3 was redesigned but still apply.
Q2 — Strictness cache RPC failure default:
The cached is_strict() check needs a default when the on-chain read fails (RPC down, network glitch). Two reasonable defaults:
- Permissive default (recommended): cache miss + RPC fail → return false (treat as non-strict, allow legacy proxy path). Argument: prefer availability over strictness when telemetry is broken; a transient RPC blip shouldn't reject every task.
- Strict default: cache miss + RPC fail → return true (reject anything that would have been legacy-tolerated). Argument: fail-closed when uncertain.
My recommendation: permissive with a warn! log on every cache-miss-with-RPC-fail and a counter
(strictness_cache_rpc_failures_total) for alerting. Once strict mode is on, you want telemetry to flag the situation, but you don't
want a Sepolia RPC blip to brick task processing.
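A minimal sketch of that permissive default, assuming hypothetical names: RegistryClient, RpcError, and the rpc_failures field are illustrative stand-ins for the real registry binding, error type, and strictness_cache_rpc_failures_total counter.

use std::time::{Duration, Instant};

#[derive(Debug)]
pub struct RpcError(pub String);

pub trait RegistryClient {
    // Reads the on-chain strictness flag; Err means the RPC read failed.
    fn strict_policy_code_hash(&self) -> Result<bool, RpcError>;
}

pub struct StrictnessCache<C: RegistryClient> {
    client: C,
    ttl: Duration,
    cached: Option<(bool, Instant)>,
    rpc_failures: u64, // stands in for strictness_cache_rpc_failures_total
}

impl<C: RegistryClient> StrictnessCache<C> {
    pub fn new(client: C, ttl: Duration) -> Self {
        Self { client, ttl, cached: None, rpc_failures: 0 }
    }

    // Permissive default: on cache miss + RPC failure, return false (treat the
    // proxy as non-strict) and record the failure so alerting can see it.
    pub fn is_strict(&mut self) -> bool {
        if let Some((value, read_at)) = self.cached {
            if read_at.elapsed() < self.ttl {
                return value;
            }
        }
        match self.client.strict_policy_code_hash() {
            Ok(value) => {
                self.cached = Some((value, Instant::now()));
                value
            }
            Err(err) => {
                self.rpc_failures += 1;
                eprintln!("warn: strictness read failed, staying permissive: {err:?}");
                false
            }
        }
    }
}

In the real wiring the warning and counter would go through the existing tracing and metrics plumbing rather than eprintln! and a struct field.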
Q3 — setStrictnessFlag authorization:
- onlyOwner now (recommended): deployer EOA flips the flag via just admin. Simple, ships today.
- Safe multisig from day one: requires the Safe migration to land first (NEWT-882–894), which is on the roadmap but not done.
My recommendation: onlyOwner now, migrate to Safe via the NEWT-882–894 work later. The strictness flag is a "we're confident
migration is done" signal, not an emergency lever — single-signer is acceptable.
Q4 — Alerting threshold:
What's the page-someone alert? My recommendation: one critical day-1 rule.
(
policy_strictness_flag{feature="policy_code_hash"} == 1
) and (
rate(legacy_policy_detections_total{feature="policy_code_hash"}[5m]) > 0
)
Reads as: "strict mode is enabled AND legacy detections are still happening." That's the "we flipped the flag too early" alarm. Add
a non-paging warn-level rule for "legacy detections still happening even though we shipped the strict version" so we know to flip
the flag.
Confirm and save
Three quick confirmations:
- Q2: permissive default + warn log + counter? (Y/N)
- Q3: onlyOwner now, migrate to Safe later? (Y/N)
- Q4: single critical day-1 alert per the PromQL above? (Y/N)
Once you confirm all three (or override), I'll fold the full plan into protocol-version-compatibility.md, update MEMORY.md, save to
supermemory, save to recall, and finish /save-progress.
✻ Churned for 57s
※ recap: Designing a strictness-flag plan to prevent contract/Rust version drift, with Phase 3 settled as an anvil-fork pre-deploy
E2E gate running routing-full against Sepolia and Base Sepolia. Next: confirm Q2 (permissive RPC default), Q3 (onlyOwner now), Q4
(single critical alert) so I can save progress. (disable recaps in /config)
❯ do we still need strictness check?
⏺ Sharp question — and I think the honest answer is: probably not, given your current state. The two mechanisms solve different
problems, and the one the strictness flag solves may not be a problem yet.
★ Insight ─────────────────────────────────────
- The pre-deploy E2E gate and the strictness flag answer different questions. E2E gate: "will Newton's binary work against the contracts we're shipping to?" Strictness flag: "should new Rust tolerate or reject legacy external policy proxies that we don't control?"
- For Newton-controlled proxies (the only kind on stagef), the migration path is "re-deploy proxies before/during the binary deploy" — the E2E gate enforces this naturally because forked stagef state with un-migrated Newton proxies will fail the gate, blocking deploy.
- The strictness flag is forward-looking infra for the day when external developers run their own policy clients in prod and you need to give them a bounded grace period without breaking their tasks.
─────────────────────────────────────────────────
What each mechanism actually protects against
┌─────────────────────────────────────────────┬────────────────────────────────────────────┬───────────────────────────────────┐
│ Failure mode │ Caught by E2E gate? │ Caught by strictness flag? │
├─────────────────────────────────────────────┼────────────────────────────────────────────┼───────────────────────────────────┤
│ Newton ships Rust that breaks against │ Yes (E2E fails) │ N/A (flag only changes runtime │
│ Newton-deployed stagef proxies │ │ behavior) │
├─────────────────────────────────────────────┼────────────────────────────────────────────┼───────────────────────────────────┤
│ Newton ships Rust that breaks against │ Indirectly (stagef passes ⟹ prod likely │ N/A │
│ Newton-deployed prod proxies │ passes if both have same impl) │ │
├─────────────────────────────────────────────┼────────────────────────────────────────────┼───────────────────────────────────┤
│ External dev's legacy proxy hits new Rust │ No (no external proxies on stagef to test │ Yes (flag=false → log + tolerate; │
│ on prod │ against) │ flag=true → reject) │
├─────────────────────────────────────────────┼────────────────────────────────────────────┼───────────────────────────────────┤
│ Newton wants to force external migration │ N/A │ Yes (flip flag to true) │
│ after grace window │ │ │
└─────────────────────────────────────────────┴────────────────────────────────────────────┴───────────────────────────────────┘
The E2E gate handles every Newton-controlled scenario. The strictness flag only adds value for the external-policy-client grace
period.
The question that decides it
Do you currently have, or expect soon to have, external developers running their own NewtonPolicyClient extensions in production?
- No, not soon: Drop the strictness flag entirely. Rely on the E2E gate. Hard-cutover semantics for any future breaking change — when you do it, coordinate with whoever's running policy clients via Discord / direct comms. Add the flag later as a follow-up if external clients land.
- Yes, in flight or imminent: Keep the flag. The grace period mechanism is what lets external devs migrate without their tasks failing the moment Newton deploys. Without it, every breaking change is an immediate hard break for them.
My recommendation
Drop Phases 1 and 2 for now. Keep Phases 3, 4, 5. Reasoning:
- Stagef has no external policy clients (it's internal). E2E gate fully covers stagef incident class.
- Prod likely has zero or near-zero external policy clients today (you're early-stage).
- For Newton-controlled proxies, the migration path is "re-deploy proxies via factory before/with the binary deploy" — already part of the existing deploy workflow.
- The strictness flag is theater for a problem that doesn't exist yet. Adding it now creates dead on-chain state and ~200 LOC of Rust dual-path machinery to maintain.
- When external clients become real, add the flag then with full context on what their actual upgrade cadence looks like (which informs the grace period design).
What you give up by dropping it:
- Future-self pre-emptive infra for external client migrations
- The legacy_policy_detections_total metric (could still emit it as observability without the flag — just always-tolerate, log on
miss)
What you gain:
- Simpler shipping plan: just Phases 3 + 4 + 5
- No on-chain admin surface to secure
- No StrictnessCache to maintain
- No synchronization burden between contract flag flipping and Rust deployment
Slimmed-down plan if you agree
┌───────┬───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Phase │ Deliverable │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 3 │ routing-full Justfile scenario + CI workflow step (anvil forks Sepolia + Base Sepolia, runs scenario, gates stagef │
│ │ deploy) │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 4 │ PR template checkbox ("does this PR add a new selector or break an existing one?") + release checklist + lessons.md │
│ │ entry codifying the E2E-gate pattern │
├───────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 5 │ legacy_policy_detections_total{chain_id, feature} Prometheus counter + warn-level alert (no critical alert needed │
│ │ without strict mode — just observability) │
└───────┴───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Phase 1, 2, Q2, Q3, Q4 all become moot.
Decide
- A. Drop strictness flag (Phases 1+2). Ship Phases 3+4+5 only. Add flag later if external clients arrive.
- B. Keep strictness flag (full original plan, Phases 1-5). Future-proofs for external clients.
- C. Hybrid — ship contract change (Phase 1) but defer Rust dual-path (Phase 2). Cheap on-chain mechanism in place, no Rust
complexity yet.
My vote: A. Ship the minimum that solves the actual current problem (v0.3.0 incident class). Pick up the flag when external clients
become a concrete need, not a hypothetical.
Confirm A/B/C and I save.
✻ Cogitated for 1m 30s
❯ drop phase 4 - but replace with just lessons.md entry. pr template is overkill. phase 3 is a must, and phase 5 can be held off for
now as well
⏺ Locked in. Final scope: Phase 3 only + lessons.md entry. Loading the recall tools and checking for existing memory in parallel
before saving.
Listed 1 directory (ctrl+o to expand)
⏺ Now executing the save in parallel: write memory file, update MEMORY.md, push to supermemory, and search recall for dedup.
Wrote 2 memories (ctrl+o to expand)
⏺ Bash(node "$HOME/.claude/plugins/cache/supermemory-plugins/claude-supermemory/0.0.2/scripts/save-project-memory.cjs"
"[SAVE:dennis:2026-04-19]…)
⎿ Project knowledge saved: newton-prover-avs
ID: FqSZUTDETBNEz2F7Liykj4
⏺ Bash(test -d /Users/denniswon/projects/newton-prover-avs/.claude/rules && echo "rules dir exists")
⎿ rules dir exists
⎿ Context limit reached · /compact or /clear to continue
Churned for 1m 20s
❯ The goal of prevent developers in the team accidentally forget to make sure the Rust code to be compatible to both old vs new
contract versions. There are 3 scenarios: 1) contracts upgraded first with breaking change before Rust update 2) contracts with
breaking change not upgraded yet, but Rust deploys, 3) contracts upgraded and Rust update also deployed, but for Newton
Policy/Newton Policy Data/and Policy Clients (that extends NewtonPolicyClient mixin), there is a grace period for the developers
to migrate/update their policy clients with new newton policy or policy data. This means the Rust update that pairs with the
contracts upgrade with breaking change should support both versions during this grace period (until version enforcement happens).
1, 2 are for prevent protocol literally failing, while 3 is for external developer's policy clients to start failing suddenly
without having time to become compatible. /clarify-plan THINK HARD
⏺ I'll work through this as a design problem. The three scenarios sort into two fundamentally different problem classes, which
matters for what prevention mechanism fits each.
★ Insight ─────────────────────────────────────
CI.
Rust devs cannot forget to write.
system, CI, lint) rather than relying on reviewers remembering.
─────────────────────────────────────────────────
Problem framing
Mapping to enforcement mechanisms:
┌────────────────────────────────────────┬────────────────────┬─────────────┬──────────────────────────────────────────────────┐
│ Scenario │ Failure class │ Who gets │ Natural prevention │
│ │ │ hurt │ │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 1 — contracts ahead of Rust │ protocol-down │ us (newton) │ deploy-pipeline ordering + │
│ │ │ │ MIN_COMPATIBLE_VERSION │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 2 — Rust ahead of contracts │ protocol-down │ us (newton) │ deploy-pipeline ordering + pre-deploy selector │
│ │ │ │ probe │
├────────────────────────────────────────┼────────────────────┼─────────────┼──────────────────────────────────────────────────┤
│ 3 — policy proxies stuck on old impl │ client-policy-down │ external │ dual-version Rust code paths that can't be │
│ during grace period │ │ devs │ accidentally skipped │
└────────────────────────────────────────┴────────────────────┴─────────────┴──────────────────────────────────────────────────┘
The core hazard in #3: NewtonPolicy / NewtonPolicyData are initialize-once proxies. When we ship a new version, the impl on chain
is new but all existing proxies still delegate to the old bytecode. External devs have to redeploy their policies to opt in. Rust
must gracefully serve both until enforcement.
Clarifying questions (need answers before writing the plan)
Before I draft the actual implementation, I need to resolve ambiguity on these:
Q1 — Version detection. How does Rust know which version a given policy proxy is on?
trying to avoid)
v0.3.0+)
redeployed
Which direction do you want? (b) is my recommendation; (c) works as a stopgap without any contract work.
Q2 — Grace period trigger. Is the end of grace period:
(b) is more flexible but means Rust must query it. (a) is rigid but simpler.
Q3 — Scope of the compat layer. Is this only for NewtonPolicy and NewtonPolicyData selectors called at task time? Or also:
The mixin is the trickiest because it lives in external-dev bytecode — we can't upgrade it at all. External devs inherit the mixin
version at compile time of their client.
Q4 — Policy version enumeration. Do we want active tracking of "which policy_client is on which impl"?
Q5 — Deprecation communication channel. When we detect a legacy policy client at task time, do we:
My instinct is all four at different cadences, but which are in scope?
Q6 — Existing tolerance patterns. The v0.3.0 incident surfaced getPolicyCodeHash as an example. Before we design the general
system, can you confirm:
Q7 — Release calendar visibility. How do we want grace periods to be discoverable across the team?
it
Draft plan (pending answers above)
Regardless of how Q1–Q7 resolve, the plan will have four tiers. The choices above determine how each tier is implemented.
Tier 1 — Atomic rollout (scenarios 1 + 2, protocol self-compat).
- Pre-deploy smoke test that eth_calls critical selectors against live on-chain state. Fails before deploy if Rust can't talk to on-chain contracts.
- Codified deploy ordering (contracts first, bake time, then Rust) in runbook.
Tier 2 — Typed dual-version abstraction (scenario 3, policy proxy grace period).
Introduce crates/core/src/compat/ module with:
pub struct BackwardCompat<T> {
    new_value: Option<T>,          // None = legacy proxy
    legacy_behavior: T,            // what we did pre-v0.X
    grace_end: chrono::NaiveDate,
    rollout_ticket: &'static str,  // Linear ID for visibility
}

impl<T: Clone> BackwardCompat<T> {
    pub fn resolve(&self, now: chrono::NaiveDate) -> Result<T, CompatError> {
        match (&self.new_value, now >= self.grace_end) {
            // new proxy answered: always use the new value
            (Some(v), _) => Ok(v.clone()),
            // legacy proxy, still inside the grace window: tolerate and count it
            (None, false) => {
                metrics::inc_legacy_policy_detection(self.rollout_ticket);
                Ok(self.legacy_behavior.clone())
            }
            // legacy proxy after the grace window: hard failure (error fields elided)
            (None, true) => Err(CompatError::GraceExpired { /* .. */ }),
        }
    }
}
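A hypothetical call site for the wrapper above; the code-hash type, the date, and the NEWT-000 ticket are placeholders, not real project values.

fn resolve_code_hash(
    onchain: Option<[u8; 32]>, // Some = new proxy answered, None = legacy proxy
    today: chrono::NaiveDate,
) -> Result<[u8; 32], CompatError> {
    let compat = BackwardCompat {
        new_value: onchain,
        legacy_behavior: [0u8; 32], // stand-in for the pre-v0.3.0 "no binding check" behavior
        grace_end: chrono::NaiveDate::from_ymd_opt(2026, 6, 1).expect("valid date"),
        rollout_ticket: "NEWT-000", // placeholder ticket
    };
    compat.resolve(today)
}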
Every new required contract selector MUST go through this type. Enforced via:
compat:: module.
Tier 3 — CI enforcement of the compatibility lifecycle.
the dev to either:
it. Ships as a required CI job on PRs that touch contracts or compat/.
alerts when >0 clients are on impl version nearing grace expiry.
Tier 4 — External-dev communication.
metric legacy_policy_detections_total.
requiring log access.
Tradeoffs to flag
clippy/ast_grep check) is lighter but requires keeping the selector list curated. I'd pick type-driven for clarity.
ecosystem isn't migrating fast enough). A hybrid — Rust has default static date, admin flag can extend only — captures both safely.
tests (against live stagef) are realistic but slow + flaky. Do both: fixtures on every PR, on-fork nightly.
grows unbounded. The grace_end CI check is what prevents this — but only if we don't just keep bumping the date.
What I need from you to write the implementation plan
Answering Q1–Q3 is minimum viable. Q4–Q7 shape Tiers 3+4 but don't block starting Tiers 1+2. Preferred path:
for Tier 3 CI check.
★ Insight ─────────────────────────────────────
reviewer's job and fails silently. With BackwardCompat as a required wrapper, "tolerance was added" becomes a compile-time fact
and "tolerance is still needed" becomes a CI-enforced fact. Reviewer discipline goes from required to optional — which is the only
way discipline actually survives.
because we can't upgrade it at all. Worth thinking separately about whether future NewtonPolicyClient changes should use
interface-based dispatch rather than mixin inheritance, so external devs don't have to recompile their contracts to get updates.
That's an architectural shift, not a compat layer — flag for separate discussion.
─────────────────────────────────────────────────
Which of Q1–Q7 should I pursue first?
✻ Baked for 2m 56s
※ recap: Goal: prevent Rust/contract version drift, especially during the grace period for external policy clients. Current task:
answered Q1-Q7 clarifying questions on detection mechanism, grace trigger, and scope. Next: you pick which questions to answer
first.
❯ q1: - (d) PolicyClientRegistry stores declared version at registration time — clean but needs registry writes on every upgrade -->
what do you mean by writes on every upgrade? q2: (b) An admin-flipped on-chain flag (e.g., MIN_POLICY_VERSION on
PolicyClientRegistry) --> we already have a just admin command. Overall, let's start simple, and let's focus on "preventing
incidents" (1. newly deployed protocol going down when contracts are old or old policy clients/policies send requests, 2. not yet
deployed protocol going down when contracts are upgraded first). Also, the grace period is ONLY in regards to Newton Policy and
Newton Policy Data, and we will make sure to do a hard cutover with each breaking change, so that we never have to support more
than 2 different versions)
⏺ ★ Insight ─────────────────────────────────────
BackwardCompat machinery entirely. A simple if legacy { old_path } else { new_path } per breaking change is sufficient, because
the else branch gets deleted in the next release.
migration — the grace period becomes a runtime config, not a compile-time config. This means we never need date-based CI
enforcement.
proxies. With flag-gated dual paths, the second half is opt-in — legacy proxies stay working until admin explicitly flips.
─────────────────────────────────────────────────
Clarifying Q1 — "writes on every upgrade"
I meant: option (d) required PolicyClientRegistry to store per-proxy version metadata, which means a write transaction at every
policy deploy (factory calls registry.recordPolicyImplVersion(proxy, version)). For initialize-once proxies this is once per proxy
lifetime, so "every upgrade" was a misleading phrase — really it's "every new policy deployment pays one extra write." The
coordination cost is that external devs who deploy policies directly (bypassing our factory) would have to remember to register,
which is error-prone. I'm dropping option (d) in favor of (c) bytecode classification — zero contract writes, purely off-chain
detection.
Confirming your simplifications
- Grace-period support applies only to NewtonPolicy / NewtonPolicyData; everything else (TaskManager, PolicyClientRegistry, etc.) gets lockstep deployment — no dual-version support ever.
This removes 60% of the machinery I was sketching. Let me redraft as a simpler plan.
Plan (simplified)
Part A — Scenarios 1 + 2 (protocol-down prevention, lockstep contracts ↔ Rust)
A1. Pre-deploy selector smoke test.
New just pre-deploy-check recipe runs before contract/Rust deploys. Walks critical call sites
(gateway→TaskManager, operator→NewtonPolicy, challenger→ChallengeVerifier) and does eth_call against current on-chain state. Any
"0x" revert = selector miss = block deploy with actionable error ("v0.3.0 Rust calls getPolicyCodeHash but current on-chain impl
doesn't expose it; deploy contracts first").
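A minimal sketch of the probe loop behind A1, assuming hypothetical names: Probe, ProbeOutcome, and the injected eth_call closure are illustrative, not the actual recipe or RPC client.

pub struct Probe {
    pub caller: &'static str,   // e.g. "gateway"
    pub contract: &'static str, // e.g. "NewtonProverTaskManager"
    pub selector: &'static str, // e.g. "getPolicyCodeHash()"
}

pub enum ProbeOutcome {
    Ok,
    SelectorMiss, // "0x" revert: the current on-chain impl doesn't expose the selector
    OtherError(String),
}

// Returns Err with one actionable message per missing selector so the deploy
// can be blocked with a clear "deploy contracts first" explanation.
// OtherError is ignored here for brevity; a real gate would surface it too.
pub fn check_before_deploy(
    probes: &[Probe],
    eth_call: impl Fn(&Probe) -> ProbeOutcome,
) -> Result<(), Vec<String>> {
    let mut failures = Vec::new();
    for probe in probes {
        if let ProbeOutcome::SelectorMiss = eth_call(probe) {
            failures.push(format!(
                "{} calls {}::{} but the current on-chain impl doesn't expose it; deploy contracts first",
                probe.caller, probe.contract, probe.selector
            ));
        }
    }
    if failures.is_empty() { Ok(()) } else { Err(failures) }
}

Injecting eth_call as a closure keeps the walk itself testable without a live RPC endpoint.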
A2. Release pipeline ordering.
Codify in .github/workflows/deploy-trigger.yml: contracts deploy first, 10-minute bake, then Rust binaries. Existing pipeline
mostly does this; formalize the gate.
A3. Additive-only ABI rule for non-policy contracts.
PR template checkbox: "Does this PR remove or change semantics of an existing selector on
TaskManager/ChallengeVerifier/PolicyClientRegistry/EpochRegistry?" If yes, requires synchronized Rust release plan in PR
description. CI can't automate this — human review + CODEOWNERS.
Part B — Scenario 3 (NewtonPolicy/NewtonPolicyData grace period, flag-gated)
B1. New per-feature strictness flags on PolicyClientRegistry.
For each breaking change to NewtonPolicy or NewtonPolicyData, add a named boolean to PolicyClientRegistry:
// Example for the v0.3.0 change
bool public strictPolicyCodeHash; // default false
function setStrictPolicyCodeHash(bool value) external onlyOwner;
Name the flag after the feature, not the version — easier to grep, easier to remove later. Admin flips via just admin --command
set-strict-policy-code-hash --env stagef true.
B2. Rust dual-path per feature flag.
In crates/data-provider/src/lib.rs and wherever the affected selector is called:
let strict = policy_client_registry.strict_policy_code_hash().await?;
match fetch_policy_code_hash(policy).await {
    Ok(hash) => verify_hash_binding(hash, policy_bytes)?,
    Err(SelectorMiss) if !strict => {
        warn!(policy = %policy, "legacy policy proxy, skipping code hash binding");
        metrics::inc_legacy_policy_detection("policy_code_hash");
    }
    Err(SelectorMiss) => return Err(LegacyPolicyNotSupported { policy }),
    Err(e) => return Err(e),
}
The strictness flag is read once per RPC server startup (cached), because external policy migration is slow enough that
refresh-per-task is waste.
B3. Version detection.
For B2, how does Rust know "this is a legacy proxy"? Simplest answer: try the new selector, catch "0x" revert. No ERC-165, no
bytecode classification, no registry lookup. If the selector works, it's new; if it misses, it's legacy. The strictness flag
controls what we do with that answer. This is essentially what we already do in classify_batch_item_revert for other selector-miss
cases.
B4. Hard-cutover release discipline.
Release N+1 (e.g., v0.4.0) removes the compat branch and retires the corresponding strict* flag, so the codebase never supports more than two versions at once.
Enforced by: release checklist item "Remove compat branches for v(N-1) features" + grep for strict* flags that should be gone. CI can lint for known-removed flag names (a sketch of that lint as a Rust test follows below).
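A hedged sketch of that lint written as a Rust test rather than a shell grep; the flag name, the crates/ path, and the module name are illustrative, and in practice the removed-flag list would live outside the scanned tree so the test doesn't flag itself.

#[cfg(test)]
mod compat_tombstones {
    use std::fs;
    use std::path::Path;

    // Flags whose compat branches should have been deleted in this release,
    // e.g. strict_policy_code_hash once the v0.3.0 compat branch is removed in v0.4.0.
    const REMOVED_FLAGS: &[&str] = &["strict_policy_code_hash"];

    fn scan(dir: &Path, hits: &mut Vec<String>) {
        for entry in fs::read_dir(dir).expect("readable dir") {
            let path = entry.expect("dir entry").path();
            if path.is_dir() {
                scan(&path, hits);
            } else if path.extension().is_some_and(|ext| ext == "rs") {
                let src = fs::read_to_string(&path).unwrap_or_default();
                for flag in REMOVED_FLAGS {
                    if src.contains(flag) {
                        hits.push(format!("{}: still references {flag}", path.display()));
                    }
                }
            }
        }
    }

    #[test]
    fn no_stale_compat_branches() {
        let mut hits = Vec::new();
        scan(Path::new("crates"), &mut hits);
        assert!(hits.is_empty(), "stale compat branches:\n{}", hits.join("\n"));
    }
}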
Part C — Telemetry for operational awareness
selectors, reports count of legacy clients
Scope comparison: what I'm dropping vs keeping
┌──────────────────────────────────────────────────┬───────────────────────────────────────────┐
│ Dropped (too complex for your goals) │ Kept │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ BackwardCompat generic wrapper type │ Simple per-feature boolean flags │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Custom clippy lint for new selectors │ PR template checkbox │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Grace period date baked into Rust │ Admin-flipped on-chain flag (Q2: b) │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ just check-grace-periods CI enforcement │ Release checklist + grep for stale flags │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Multi-version compat matrix │ Exactly 2 versions ever (your commitment) │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Fixture-based legacy test suite │ Pre-deploy smoke test against live state │
├──────────────────────────────────────────────────┼───────────────────────────────────────────┤
│ Version detection via ERC-165 / bytecode hashing │ Try-the-selector-catch-the-miss │
└──────────────────────────────────────────────────┴───────────────────────────────────────────┘
Decision points to confirm before I write the implementation plan
1. Where the strictness flags live: on PolicyClientRegistry, or on a new dedicated CompatibilityRegistry contract. I vote PolicyClientRegistry — fewer contracts, admin already uses it.
2. Flag granularity: per-feature (strictPolicyCodeHash) or per-version (strictV0_3_0). I vote per-feature — independently flippable, aligns with how each feature migrates at its own pace.
3. Default value of each strictness flag at deploy time: true (strict immediately) or false (permissive while legacy clients still exist). I vote false — matches hard-cutover discipline without requiring perfect ops timing.
4. Pre-deploy smoke test as hard gate or advisory: a hard gate means selector drift in production becomes a deploy block; advisory means humans decide. I vote hard gate for stagef/prod, advisory for local.
5. Does the grace period also cover the NewtonPolicyClient mixin (compiled into external-dev contracts)? Or is the mixin strictly hard-cutover with no compat? I vote hard cutover — mixin lives in external-dev bytecode, we can't upgrade it, and grace-period support for three stakeholders (us, policy deployer, client deployer) is a lot. External devs recompile with new mixin version before the strict flag flips.
★ Insight ─────────────────────────────────────
for your team size. The first version requires discipline from every reviewer; the second only requires discipline from the release
manager flipping flags.
becomes a sequence of independent flag flips rather than one big-bang enforcement. Each feature's migration timeline decouples from
the others — e.g., strictPolicyCodeHash might flip 30 days after v0.3.0 ships, while strictEpochRegistration flips 90 days later
because it affects more clients.
pattern for any future protocol evolution affecting external-deployed contracts.
─────────────────────────────────────────────────
Answer Decisions 1–5 and I'll write the concrete implementation plan (contract changes, Rust changes, Justfile recipes, release
checklist updates).