denniswon · April 14, 2026 14:38 · denniswon · Apr 14, 2026 · denniswon · Apr 14, 2026
diff --git a/gistfile1.txt b/gistfile1.txt
 ---
  Gateway Scalability & Throughput Analysis

  End-to-End Flow Timeline

  Here's the create_task critical path with approximate timings:

  Client Request
  │
  ├── Phase 0: Request Setup ..................... ~1-5ms
  │   Permission check, task_id, chain resolution, parse intent
  │
  ├── Phase 1: Block & Config Fetch ............. ~10-100ms
  │   policy_config RPC + block_cache (parallel tokio::try_join!)
  │
  ├── Phase 2: Task Creation Queue .............. ~0-1ms
  │   mpsc::send to batch submitter (non-blocking)
  │
  ├── Phase 3: BLS Aggregation Init ............. ~1-5ms
  │   DashMap insert, oneshot channel setup
  │
  ├── Phase 4: PREPARE Phase .................... ~100-30000ms ⭐ PRIMARY LATENCY
  │   EIP712 hash + ECDSA sign (CPU)
  │   broadcast_fetch_policy_data_unified → FuturesUnordered
  │   Per-response: 2× RwLock reads for stake data
  │   Exit on stake-weighted quorum
  │
  ├── Phase 5: CONSENSUS ........................ ~1-10ms
  │   Median normalization + tolerance check (CPU)
  │
  ├── Phase 6: COMMIT Phase ..................... ~100-30000ms ⭐ PRIMARY LATENCY
  │   broadcast_evaluate_and_sign → FuturesUnordered
  │   Waits for ALL operators (NO early exit on quorum!)
  │
  ├── Phase 7: BLS Aggregation .................. ~5-50ms
  │   Sequential verify + aggregate per response (CPU)
  │
  ├── Phase 8: Response Construction ............ ~1-10ms
  │   signature_data, certificate, validate_calldata
  │
  └── Return to Client
      │
      └── Phase 9: Background On-Chain .......... (async, decoupled)
          Batch submitter accumulates → pipeline_depth=3 → receipt tracker

  ★ Insight ─────────────────────────────────────
  The client-facing latency is dominated by the two sequential network broadcast rounds (Prepare + Commit). Everything else is sub-10ms.
  The key insight is that the Commit phase waits for ALL operators, not quorum — a single slow operator delays the entire response even
  when quorum stake is already met. This is the single biggest latency win available.
  ─────────────────────────────────────────────────

  ---
  Bottleneck Inventory

  CRITICAL — Directly impacts per-request latency

  ┌─────┬────────────────────────────────────────────────────────┬───────────────────────┬─────────────────────────────────────────────┐
  │ ID  │                       Bottleneck                       │       Location        │                   Impact                    │
  ├─────┼────────────────────────────────────────────────────────┼───────────────────────┼─────────────────────────────────────────────┤
  │ B1  │ Commit phase waits for ALL operators, no quorum early  │ operator.rs:1770-1802 │ p100 latency instead of p50-p75             │
  │     │ exit                                                   │                       │                                             │
  ├─────┼────────────────────────────────────────────────────────┼───────────────────────┼─────────────────────────────────────────────┤
  │ B2  │ ECDSA signer re-loaded from config on every task       │ common.rs:312-313     │ File I/O + key derivation per task          │
  ├─────┼────────────────────────────────────────────────────────┼───────────────────────┼─────────────────────────────────────────────┤
  │ B3  │ Policy config RPC call per task (no caching)           │ common.rs:121-124     │ 10-100ms RPC per task, even for same        │
  │     │                                                        │                       │ policy_client                               │
  ├─────┼────────────────────────────────────────────────────────┼───────────────────────┼─────────────────────────────────────────────┤
  │ B4  │ 2N RwLock acquisitions per broadcast (stake reads per  │ operator.rs:1072,1252 │ Lock contention scales with operator count  │
  │     │ response)                                              │                       │                                             │
  └─────┴────────────────────────────────────────────────────────┴───────────────────────┴─────────────────────────────────────────────┘

  HIGH — Impacts throughput under load

  ┌─────┬──────────────────────────────────────────────┬────────────────────────┬──────────────────────────────────────────────────────┐
  │ ID  │                  Bottleneck                  │        Location        │                        Impact                        │
  ├─────┼──────────────────────────────────────────────┼────────────────────────┼──────────────────────────────────────────────────────┤
  │ B5  │ gateway_tasks RwLock write in hot path       │ sync.rs:98-103         │ Serializes all concurrent task creations             │
  ├─────┼──────────────────────────────────────────────┼────────────────────────┼──────────────────────────────────────────────────────┤
  │ B6  │ Batch submitter pipeline semaphore blocks    │ batch_submitter.rs:551 │ Back-pressure cascades to RPC handler when 3 TXs     │
  │     │ accumulator                                  │                        │ in-flight                                            │
  ├─────┼──────────────────────────────────────────────┼────────────────────────┼──────────────────────────────────────────────────────┤
  │ B7  │ RPC response cloned for every operator       │ operator.rs:1245       │ ConsensusPrepareResponse can be large; N clones per  │
  │     │ response                                     │                        │ broadcast                                            │
  ├─────┼──────────────────────────────────────────────┼────────────────────────┼──────────────────────────────────────────────────────┤
  │ B8  │ Cold block cache on first request after      │ block_cache.rs:77      │ 100-800ms + 3 retries on cache miss                  │
  │     │ startup                                      │                        │                                                      │
  └─────┴──────────────────────────────────────────────┴────────────────────────┴──────────────────────────────────────────────────────┘

  MEDIUM — Reliability and resource usage

  ┌─────┬────────────────────────────────────────────┬──────────────────────────┬──────────────────────────────────────────────────────┐
  │ ID  │                 Bottleneck                 │         Location         │                        Impact                        │
  ├─────┼────────────────────────────────────────────┼──────────────────────────┼──────────────────────────────────────────────────────┤
  │ B9  │ No startup validation of operator          │ handler/mod.rs:1050-1130 │ Misconfigured sockets fail at first request, not     │
  │     │ reachability                               │                          │ startup                                              │
  ├─────┼────────────────────────────────────────────┼──────────────────────────┼──────────────────────────────────────────────────────┤
  │ B10 │ No startup validation of chain RPC health  │ handler/mod.rs:340-718   │ Unhealthy RPC discovered at request time             │
  ├─────┼────────────────────────────────────────────┼──────────────────────────┼──────────────────────────────────────────────────────┤
  │ B11 │ No circuit breaker on chain RPC endpoints  │ Throughout               │ Cascading failures when RPC degrades                 │
  ├─────┼────────────────────────────────────────────┼──────────────────────────┼──────────────────────────────────────────────────────┤
  │ B12 │ No backpressure signaling to clients       │ sync.rs, channel.rs      │ Silent queueing → eventual timeout                   │
  └─────┴────────────────────────────────────────────┴──────────────────────────┴──────────────────────────────────────────────────────┘

  ---
  Proposed Improvements

  Tier 1: High-Impact, Low-Risk (implement first)

  1. Quorum-based early exit for Commit phase

  The Commit phase (broadcast_evaluate_and_sign at operator.rs:1676) collects ALL responses from eligible operators with no quorum check.
  Compare with Prepare phase which exits as soon as stake threshold is met.

  Change: Refactor Commit broadcast to use broadcast_with_unified_quorum pattern — exit as soon as stake-weighted quorum is met. The BLS
  aggregation only needs quorum anyway; extra signatures are discarded.

  Expected improvement: With 5 operators at equal stake and 67% quorum, latency drops from p100 (slowest operator) to ~p60 (3rd fastest).

  2. Pass pre-loaded ECDSA signer instead of re-loading per task

  build_fetch_policy_data_request calls load_ecdsa(signer) on every invocation (common.rs:312). The signer is already loaded at
  GatewayServiceImpl construction (handler/mod.rs:407-408) as task_generator_signer.

  Change: Modify build_fetch_policy_data_request signature to accept &PrivateKeySigner directly. Pass self.task_generator_signer from
  sync.rs and channel.rs.

  3. Moka cache for PolicyConfig per (chain_id, policy_client)

  get_policy_config_for_client makes an on-chain RPC call for every task (common.rs:121-124). Policy configs rarely change — typically only
   on setPolicy calls.

  Change: Add moka::future::Cache<(u64, Address), PolicyConfig> to ChainService with 10-30s TTL. Check cache before RPC. Moka is already a
  workspace dependency.

  Expected improvement: Eliminates ~10-100ms RPC for repeat policy clients (the common case in production).

  4. Snapshot stake data once per broadcast, not per response

  Every operator response in the quorum loop triggers operator_stake.read().await and total_stake.read().await (operator.rs:1072,1252).
  With N operators, that's 2N async RwLock acquisitions.

  Change: At the start of each broadcast, snapshot operator_stake and total_stake into local HashMaps. Use the snapshots for all quorum
  calculations within that broadcast. Stake data changes only on OperatorAdded/Removed events — negligible during a single 30s broadcast
  window.

  Expected improvement: Reduces async lock contention from O(N) to O(1) per broadcast. Significant at high operator counts.

  5. Replace gateway_tasks with DashMap

  gateway_tasks: Arc<tokio::sync::RwLock<HashMap<String, HashSet<TaskId>>>> requires a write lock for every task creation (sync.rs:98-103).
   Under concurrent load, this serializes all task initiations.

  Change: DashMap<String, DashSet<TaskId>> — lock-free per-API-key granularity. DashMap is already used for aggregator.task_states.

  Tier 2: Medium-Impact, Medium-Risk

  6. Pre-warm block caches on startup

  First task after gateway start hits a cold block cache → cache miss → RPC call with up to 3 retries at 100-800ms backoff
  (block_cache.rs:21-26).

  Change: In Gateway::from_multi_chain_config(), call block_cache.refresh() for all chains before returning. Fail startup if RPC
  unreachable.

  7. Avoid response clones in normal (quorum success) path

  all_responses.push((operator_id, response_result.clone())) at operator.rs:1245 clones the entire RPC response for every operator.
  ConsensusPrepareResponse includes policyTaskData which can be large.

  Change: Only store (OperatorId, is_success) in the tracking vec. Keep the full response_result only in results (move, not clone). On
  quorum failure, reconstruct errors from the tracked failures.

  8. Startup validation: operator reachability + contract existence

  Changes:
  - After operator pool init, send a lightweight JSON-RPC ping (e.g., newt_getPublicKey or custom newt_health) to each operator. Log
  warnings for unreachable operators. Pre-open circuit breakers.
  - During ChainService init, call provider.get_code(contract_addr) for TaskManager, IdentityRegistry, ConfidentialDataRegistry. Fail
  startup if code is empty (zero-address check is not enough — the address could be non-zero but point to an EOA or wrong contract).

  9. Chain-level RPC circuit breaker

  Currently circuit breakers exist per-operator but not per-chain-RPC. If the Ethereum Sepolia RPC goes down, every task attempting block
  cache refresh or policy config fetch hangs for the full timeout.

  Change: Add a failsafe circuit breaker wrapping the ChainRpcProviderConfig HTTP provider. Open after N consecutive failures, fail-fast
  subsequent requests, half-open after cooldown.

  Tier 3: Architectural Improvements (higher effort, plan for later)

  10. Streaming consensus: incremental median during Prepare

  Instead of waiting for all Prepare responses, maintain a running median as responses arrive. When quorum is met, consensus is already
  computed — the 1-10ms compute_consensus_from_unsigned call becomes near-zero.

  11. Request deduplication for identical tasks

  Hash (policy_client, intent, wasm_args) → check if an in-flight task with the same hash exists → return the pending result. Under burst
  traffic with duplicates, eliminates redundant operator broadcasts.

  12. Concurrent operator requests via HTTP/2 multiplexing

  Each OperatorWorker processes requests sequentially (one at a time from its mpsc channel). HTTP/2 already supports multiplexing at the
  transport layer. Allow N concurrent in-flight requests per operator via semaphore-bounded processing, not strict sequential.

  13. Backpressure signaling

  When batch submitter queue depth exceeds a threshold, surface this to the RPC handler and return HTTP 503 (Service Unavailable) with a
  Retry-After header. Currently, tasks silently queue and may hit TaskCreatedBlockTooOld on-chain if they sit too long.

  ---
  Fail-Fast & Reliability Checklist

  Startup

  ┌──────────────────────────┬────────────────────────────────┬──────────────────────────────────────────────────┐
  │          Check           │         Current State          │                     Proposed                     │
  ├──────────────────────────┼────────────────────────────────┼──────────────────────────────────────────────────┤
  │ Operator reachable       │ Not checked                    │ Ping each operator, warn + pre-open CB           │
  ├──────────────────────────┼────────────────────────────────┼──────────────────────────────────────────────────┤
  │ Chain RPC healthy        │ Implicit (first request fails) │ Eager block cache refresh, fail startup on error │
  ├──────────────────────────┼────────────────────────────────┼──────────────────────────────────────────────────┤
  │ Contract deployed        │ Zero-address check only        │ get_code() check for critical contracts          │
  ├──────────────────────────┼────────────────────────────────┼──────────────────────────────────────────────────┤
  │ Database connectivity    │ Checked                        │ Already good                                     │
  ├──────────────────────────┼────────────────────────────────┼──────────────────────────────────────────────────┤
  │ Redis connectivity       │ Silently degrades              │ Log prominently, mark feature as disabled        │
  ├──────────────────────────┼────────────────────────────────┼──────────────────────────────────────────────────┤
  │ Batch submitter contract │ Zero-address fallback          │ Already good                                     │
  └──────────────────────────┴────────────────────────────────┴──────────────────────────────────────────────────┘

  Runtime

  ┌──────────────────────────┬─────────────────────────────┬─────────────────────────────────────────────┐
  │          Check           │        Current State        │                  Proposed                   │
  ├──────────────────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
  │ Chain RPC degradation    │ No circuit breaker          │ Add chain-level circuit breaker             │
  ├──────────────────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
  │ Batch queue backpressure │ Silent queueing             │ 503 when queue > threshold                  │
  ├──────────────────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
  │ Operator pool empty      │ Returns error               │ Already good, but add metric for visibility │
  ├──────────────────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
  │ Block cache staleness    │ TTL-based refresh           │ Add staleness metric, alert if > 2×TTL      │
  ├──────────────────────────┼─────────────────────────────┼─────────────────────────────────────────────┤
  │ Aggregator zombie slots  │ Periodic cleanup (30s-6min) │ Already good (cancel_task in all paths)     │
  └──────────────────────────┴─────────────────────────────┴─────────────────────────────────────────────┘

  ---
  Execution Order

  I'd suggest:

  1. B1 (Commit quorum early exit) + B2 (signer caching) + B4 (stake snapshot) — these are all in the hot path and can be done in parallel
  across files
  2. B3 (policy config cache) — needs Moka cache struct added to ChainService
  3. B5 (gateway_tasks DashMap) — mechanical type replacement
  4. B7 (response clone avoidance) + B8 (block cache pre-warm) — cleanup
  5. B9+B10 (startup validation) — reliability, not latency
  6. B11 (chain RPC circuit breaker) — requires design discussion on failure modes

  The most impactful single change is B1 — making the Commit phase quorum-aware. Combined with B4 (stake snapshot), this removes the two
  biggest latency contributors in the normal path.

  ---
  What's your priority? Want me to start with the Tier 1 changes (B1-B5), or do you want to discuss the approach for any of these first?
No results found