---
Gateway Scalability & Throughput Analysis

End-to-End Flow Timeline

Here's the create_task critical path with approximate timings:
```
Client Request
  │
  ├─ Phase 0: Request Setup ..................... ~1-5ms
  │    Permission check, task_id, chain resolution, parse intent
  │
  ├─ Phase 1: Block & Config Fetch .............. ~10-100ms
  │    policy_config RPC + block_cache (parallel tokio::try_join!)
  │
  ├─ Phase 2: Task Creation Queue ............... ~0-1ms
  │    mpsc::send to batch submitter (non-blocking)
  │
  ├─ Phase 3: BLS Aggregation Init .............. ~1-5ms
  │    DashMap insert, oneshot channel setup
  │
  ├─ Phase 4: PREPARE Phase ..................... ~100-30000ms ← PRIMARY LATENCY
  │    EIP712 hash + ECDSA sign (CPU)
  │    broadcast_fetch_policy_data_unified → FuturesUnordered
  │    Per-response: 2× RwLock reads for stake data
  │    Exit on stake-weighted quorum
  │
  ├─ Phase 5: CONSENSUS ......................... ~1-10ms
  │    Median normalization + tolerance check (CPU)
  │
  ├─ Phase 6: COMMIT Phase ...................... ~100-30000ms ← PRIMARY LATENCY
  │    broadcast_evaluate_and_sign → FuturesUnordered
  │    Waits for ALL operators (NO early exit on quorum!)
  │
  ├─ Phase 7: BLS Aggregation ................... ~5-50ms
  │    Sequential verify + aggregate per response (CPU)
  │
  ├─ Phase 8: Response Construction ............. ~1-10ms
  │    signature_data, certificate, validate_calldata
  │
  ├─ Return to Client
  │
  └─ Phase 9: Background On-Chain ............... (async, decoupled)
       Batch submitter accumulates → pipeline_depth=3 → receipt tracker
```
● Insight ─────────────────────────────────────
The client-facing latency is dominated by the two sequential network broadcast rounds (Prepare + Commit). Everything else is sub-10ms.
The key insight is that the Commit phase waits for ALL operators, not quorum — a single slow operator delays the entire response even when quorum stake is already met. This is the single biggest latency win available.
───────────────────────────────────────────────
---
Bottleneck Inventory

CRITICAL — Directly impacts per-request latency
| ID | Bottleneck | Location | Impact |
| --- | --- | --- | --- |
| B1 | Commit phase waits for ALL operators, no quorum early exit | operator.rs:1770-1802 | p100 latency instead of p50-p75 |
| B2 | ECDSA signer re-loaded from config on every task | common.rs:312-313 | File I/O + key derivation per task |
| B3 | Policy config RPC call per task (no caching) | common.rs:121-124 | 10-100ms RPC per task, even for same policy_client |
| B4 | 2N RwLock acquisitions per broadcast (stake reads per response) | operator.rs:1072,1252 | Lock contention scales with operator count |
HIGH — Impacts throughput under load
| ID | Bottleneck | Location | Impact |
| --- | --- | --- | --- |
| B5 | gateway_tasks RwLock write in hot path | sync.rs:98-103 | Serializes all concurrent task creations |
| B6 | Batch submitter pipeline semaphore blocks accumulator | batch_submitter.rs:551 | Back-pressure cascades to RPC handler when 3 TXs in-flight |
| B7 | RPC response cloned for every operator response | operator.rs:1245 | ConsensusPrepareResponse can be large; N clones per broadcast |
| B8 | Cold block cache on first request after startup | block_cache.rs:77 | 100-800ms + 3 retries on cache miss |
MEDIUM — Reliability and resource usage
| ID | Bottleneck | Location | Impact |
| --- | --- | --- | --- |
| B9 | No startup validation of operator reachability | handler/mod.rs:1050-1130 | Misconfigured sockets fail at first request, not startup |
| B10 | No startup validation of chain RPC health | handler/mod.rs:340-718 | Unhealthy RPC discovered at request time |
| B11 | No circuit breaker on chain RPC endpoints | Throughout | Cascading failures when RPC degrades |
| B12 | No backpressure signaling to clients | sync.rs, channel.rs | Silent queueing → eventual timeout |
---
Proposed Improvements

Tier 1: High-Impact, Low-Risk (implement first)
1. Quorum-based early exit for the Commit phase

The Commit phase (broadcast_evaluate_and_sign at operator.rs:1676) collects ALL responses from eligible operators with no quorum check. Compare with the Prepare phase, which exits as soon as the stake threshold is met.

Change: Refactor the Commit broadcast to use the broadcast_with_unified_quorum pattern — exit as soon as stake-weighted quorum is met. The BLS aggregation only needs quorum anyway; extra signatures are discarded.

Expected improvement: With 5 operators at equal stake and a 67% quorum, 4 operators must respond (3 of 5 is only 60% of stake), so latency drops from p100 (slowest operator) to ~p80 (4th fastest).
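The early-exit check reduces to simple stake accounting. A minimal std-only sketch — QuorumTracker and its fields are illustrative stand-ins, not the gateway's actual types:

```rust
use std::collections::HashMap;

/// Illustrative quorum bookkeeping: operator responses arrive one at a
/// time, and the broadcast loop stops awaiting further operators as soon
/// as accumulated stake crosses the threshold.
pub struct QuorumTracker {
    stakes: HashMap<String, u64>, // stake snapshot taken at broadcast start
    total_stake: u64,
    threshold_bps: u64, // basis points, e.g. 6700 = 67%
    accumulated: u64,
}

impl QuorumTracker {
    pub fn new(stakes: HashMap<String, u64>, threshold_bps: u64) -> Self {
        let total_stake = stakes.values().sum();
        Self { stakes, total_stake, threshold_bps, accumulated: 0 }
    }

    /// Record a successful response; returns true once stake-weighted
    /// quorum is met, signalling the loop to stop waiting.
    pub fn record(&mut self, operator: &str) -> bool {
        self.accumulated += self.stakes.get(operator).copied().unwrap_or(0);
        self.accumulated * 10_000 >= self.total_stake * self.threshold_bps
    }
}
```

With 5 equal-stake operators and a 67% threshold, the tracker fires on the 4th response — the remaining slowest operator no longer gates the client-facing latency.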
2. Pass pre-loaded ECDSA signer instead of re-loading per task

build_fetch_policy_data_request calls load_ecdsa(signer) on every invocation (common.rs:312). The signer is already loaded at GatewayServiceImpl construction (handler/mod.rs:407-408) as task_generator_signer.

Change: Modify the build_fetch_policy_data_request signature to accept &PrivateKeySigner directly. Pass self.task_generator_signer from sync.rs and channel.rs.
3. Moka cache for PolicyConfig per (chain_id, policy_client)

get_policy_config_for_client makes an on-chain RPC call for every task (common.rs:121-124). Policy configs rarely change — typically only on setPolicy calls.

Change: Add moka::future::Cache<(u64, Address), PolicyConfig> to ChainService with a 10-30s TTL. Check the cache before the RPC. Moka is already a workspace dependency.

Expected improvement: Eliminates the ~10-100ms RPC for repeat policy clients (the common case in production).
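The caching logic can be sketched with std only — the actual change would use moka::future::Cache; PolicyConfigCache and the String value type here are stand-ins for the real config type:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Std-only stand-in for the proposed moka cache, keyed by
/// (chain_id, policy_client). String stands in for PolicyConfig.
pub struct PolicyConfigCache {
    ttl: Duration,
    entries: HashMap<(u64, String), (Instant, String)>,
}

impl PolicyConfigCache {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Return a fresh cached config, or call `fetch` (standing in for the
    /// on-chain RPC) and cache the result under the TTL.
    pub fn get_or_fetch(
        &mut self,
        key: (u64, String),
        fetch: impl FnOnce() -> String,
    ) -> String {
        if let Some((at, cfg)) = self.entries.get(&key) {
            if at.elapsed() < self.ttl {
                return cfg.clone(); // cache hit: no 10-100ms RPC
            }
        }
        let cfg = fetch();
        self.entries.insert(key, (Instant::now(), cfg.clone()));
        cfg
    }
}
```

A 10-30s TTL bounds staleness after a setPolicy call while still absorbing nearly all repeat traffic for a hot policy client.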
4. Snapshot stake data once per broadcast, not per response

Every operator response in the quorum loop triggers operator_stake.read().await and total_stake.read().await (operator.rs:1072,1252). With N operators, that's 2N async RwLock acquisitions.

Change: At the start of each broadcast, snapshot operator_stake and total_stake into local HashMaps. Use the snapshots for all quorum calculations within that broadcast. Stake data changes only on OperatorAdded/Removed events — negligible during a single 30s broadcast window.

Expected improvement: Reduces async lock contention from O(N) to O(1) per broadcast. Significant at high operator counts.
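The snapshot pattern in miniature, with std's RwLock standing in for tokio::sync::RwLock:

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Shared stake state as the gateway holds it (std RwLock standing in
/// for tokio::sync::RwLock in this sketch).
pub type SharedStakes = Arc<RwLock<HashMap<String, u64>>>;

/// Take the stake snapshot once at broadcast start. All quorum math
/// inside the broadcast then reads a plain HashMap, so per-response
/// lock acquisitions drop from 2N to zero.
pub fn snapshot(stakes: &SharedStakes) -> (HashMap<String, u64>, u64) {
    let guard = stakes.read().unwrap(); // the single acquisition per broadcast
    let map = guard.clone();
    let total = map.values().sum();
    (map, total)
}
```

The clone is cheap relative to 2N async lock round-trips, and any OperatorAdded/Removed event that lands mid-broadcast simply takes effect on the next broadcast.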
5. Replace gateway_tasks with DashMap

gateway_tasks: Arc<tokio::sync::RwLock<HashMap<String, HashSet<TaskId>>>> requires a write lock for every task creation (sync.rs:98-103). Under concurrent load, this serializes all task initiations.

Change: DashMap<String, DashSet<TaskId>> — lock-free with per-API-key granularity. DashMap is already used for aggregator.task_states.
Tier 2: Medium-Impact, Medium-Risk

6. Pre-warm block caches on startup

The first task after gateway start hits a cold block cache → cache miss → RPC call with up to 3 retries at 100-800ms backoff (block_cache.rs:21-26).

Change: In Gateway::from_multi_chain_config(), call block_cache.refresh() for all chains before returning. Fail startup if the RPC is unreachable.
7. Avoid response clones in the normal (quorum success) path

all_responses.push((operator_id, response_result.clone())) at operator.rs:1245 clones the entire RPC response for every operator. ConsensusPrepareResponse includes policyTaskData, which can be large.

Change: Only store (OperatorId, is_success) in the tracking vec. Keep the full response_result only in results (move, not clone). On quorum failure, reconstruct errors from the tracked failures.
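A sketch of the move-not-clone bookkeeping — Response and collect are illustrative names, with Response standing in for ConsensusPrepareResponse:

```rust
/// Illustrative stand-in for ConsensusPrepareResponse with its
/// potentially large policyTaskData payload.
pub struct Response {
    pub payload: Vec<u8>,
}

/// Keep only (operator_id, success) for diagnostics; each successful
/// response is moved into the results vec exactly once instead of being
/// cloned into a tracking vec as well.
pub fn collect(
    results: Vec<(String, Result<Response, String>)>,
) -> (Vec<Response>, Vec<(String, bool)>) {
    let mut successes = Vec::new();
    let mut tracking = Vec::new();
    for (op, res) in results {
        match res {
            Ok(r) => {
                tracking.push((op, true));
                successes.push(r); // move, not clone
            }
            Err(_) => tracking.push((op, false)),
        }
    }
    (successes, tracking)
}
```

On quorum failure, the (operator_id, false) entries are enough to reconstruct which operators to report, without having retained N payload copies along the way.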
8. Startup validation: operator reachability + contract existence

Changes:
- After operator pool init, send a lightweight JSON-RPC ping (e.g., newt_getPublicKey or a custom newt_health) to each operator. Log warnings for unreachable operators. Pre-open circuit breakers.
- During ChainService init, call provider.get_code(contract_addr) for TaskManager, IdentityRegistry, and ConfidentialDataRegistry. Fail startup if the code is empty (a zero-address check is not enough — the address could be non-zero but point to an EOA or the wrong contract).
9. Chain-level RPC circuit breaker

Circuit breakers currently exist per-operator but not per-chain-RPC. If the Ethereum Sepolia RPC goes down, every task attempting a block cache refresh or policy config fetch hangs for the full timeout.

Change: Add a failsafe circuit breaker wrapping the ChainRpcProviderConfig HTTP provider. Open after N consecutive failures, fail fast on subsequent requests, half-open after a cooldown.
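The closed → open → half-open state machine is small enough to sketch with std only. This is illustrative — the proposal above suggests the failsafe crate rather than hand-rolling, and the Breaker type and thresholds here are assumptions:

```rust
use std::time::{Duration, Instant};

/// Minimal chain-RPC circuit breaker state machine.
#[derive(Clone, Copy)]
enum State {
    Closed,
    Open(Instant), // tripped at this instant
    HalfOpen,
}

pub struct Breaker {
    state: State,
    consecutive_failures: u32,
    threshold: u32,
    cooldown: Duration,
}

impl Breaker {
    pub fn new(threshold: u32, cooldown: Duration) -> Self {
        Self { state: State::Closed, consecutive_failures: 0, threshold, cooldown }
    }

    /// Should this request be attempted, or failed fast?
    pub fn allow(&mut self) -> bool {
        match self.state {
            State::Closed | State::HalfOpen => true,
            State::Open(since) if since.elapsed() >= self.cooldown => {
                self.state = State::HalfOpen; // let one probe request through
                true
            }
            State::Open(_) => false, // fail fast instead of waiting out the RPC timeout
        }
    }

    pub fn record(&mut self, success: bool) {
        if success {
            self.consecutive_failures = 0;
            self.state = State::Closed;
        } else {
            self.consecutive_failures += 1;
            if self.consecutive_failures >= self.threshold {
                self.state = State::Open(Instant::now());
            }
        }
    }
}
```

The payoff is that a degraded RPC costs one failed probe per cooldown window instead of a full timeout per task.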
Tier 3: Architectural Improvements (higher effort, plan for later)

10. Streaming consensus: incremental median during Prepare

Instead of waiting for all Prepare responses, maintain a running median as responses arrive. When quorum is met, consensus is already computed — the 1-10ms compute_consensus_from_unsigned call becomes near-zero.
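A running median can be maintained with the classic two-heap technique; a std-only sketch (the gateway's actual consensus values and normalization are abstracted to u64 here):

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Incremental median over responses as they arrive: a max-heap holds
/// the lower half, a min-heap the upper half, so each insert is
/// O(log n) and the median is always available in O(1).
#[derive(Default)]
pub struct RunningMedian {
    lo: BinaryHeap<u64>,          // max-heap of the lower half
    hi: BinaryHeap<Reverse<u64>>, // min-heap of the upper half
}

impl RunningMedian {
    pub fn push(&mut self, v: u64) {
        if self.lo.peek().map_or(true, |&m| v <= m) {
            self.lo.push(v);
        } else {
            self.hi.push(Reverse(v));
        }
        // Rebalance so the halves differ by at most one element.
        if self.lo.len() > self.hi.len() + 1 {
            self.hi.push(Reverse(self.lo.pop().unwrap()));
        } else if self.hi.len() > self.lo.len() {
            self.lo.push(self.hi.pop().unwrap().0);
        }
    }

    pub fn median(&self) -> Option<u64> {
        match (self.lo.peek(), self.hi.peek()) {
            (Some(&a), Some(&Reverse(b))) if self.lo.len() == self.hi.len() => Some((a + b) / 2),
            (Some(&a), _) => Some(a),
            _ => None,
        }
    }
}
```

By the time the quorum-th Prepare response lands, the median is already up to date, so the consensus step adds essentially nothing.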
11. Request deduplication for identical tasks

Hash (policy_client, intent, wasm_args) → check whether an in-flight task with the same hash exists → return the pending result. Under burst traffic with duplicates, this eliminates redundant operator broadcasts.
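A std-only sketch of the dedup check — task_key's fields and the Deduper type are illustrative, and a real in-flight map would need to be concurrency-safe (e.g. DashMap) with entries evicted on task completion:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash the task-defining fields into a dedup key.
pub fn task_key(policy_client: &str, intent: &str, wasm_args: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    (policy_client, intent, wasm_args).hash(&mut h);
    h.finish()
}

/// In-flight map: dedup key -> pending task id.
pub struct Deduper {
    pub in_flight: HashMap<u64, String>,
}

impl Deduper {
    /// Return the existing task id if an identical task is in flight;
    /// otherwise register this task and proceed to broadcast as usual.
    pub fn check_or_insert(&mut self, key: u64, task_id: &str) -> Option<String> {
        if let Some(existing) = self.in_flight.get(&key) {
            return Some(existing.clone()); // piggyback on the pending result
        }
        self.in_flight.insert(key, task_id.to_string());
        None
    }
}
```

The duplicate caller then awaits the same pending result handle instead of triggering a second two-phase broadcast.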
12. Concurrent operator requests via HTTP/2 multiplexing

Each OperatorWorker processes requests sequentially (one at a time from its mpsc channel). HTTP/2 already supports multiplexing at the transport layer. Allow N concurrent in-flight requests per operator via semaphore-bounded processing rather than strict sequential dispatch.
13. Backpressure signaling

When batch submitter queue depth exceeds a threshold, surface this to the RPC handler and return HTTP 503 (Service Unavailable) with a Retry-After header. Currently, tasks silently queue and may hit TaskCreatedBlockTooOld on-chain if they sit too long.
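The admission check itself is trivial; a sketch with illustrative values (the 80% threshold and 5s Retry-After are assumptions, not the gateway's config):

```rust
/// Map batch-submitter queue depth to an HTTP status plus an optional
/// Retry-After value in seconds.
pub fn admission(queue_depth: usize, capacity: usize) -> (u16, Option<u64>) {
    let threshold = capacity * 8 / 10; // start shedding load at 80% full
    if queue_depth >= threshold {
        (503, Some(5)) // Service Unavailable + Retry-After: 5
    } else {
        (200, None)
    }
}
```

Rejecting at admission time lets clients retry with backoff instead of queueing into an eventual TaskCreatedBlockTooOld failure on-chain.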
---
Fail-Fast & Reliability Checklist

Startup
| Check | Current State | Proposed |
| --- | --- | --- |
| Operator reachable | Not checked | Ping each operator, warn + pre-open CB |
| Chain RPC healthy | Implicit (first request fails) | Eager block cache refresh, fail startup on error |
| Contract deployed | Zero-address check only | get_code() check for critical contracts |
| Database connectivity | Checked | Already good |
| Redis connectivity | Silently degrades | Log prominently, mark feature as disabled |
| Batch submitter contract | Zero-address fallback | Already good |
Runtime
| Check | Current State | Proposed |
| --- | --- | --- |
| Chain RPC degradation | No circuit breaker | Add chain-level circuit breaker |
| Batch queue backpressure | Silent queueing | 503 when queue > threshold |
| Operator pool empty | Returns error | Already good, but add metric for visibility |
| Block cache staleness | TTL-based refresh | Add staleness metric, alert if > 2×TTL |
| Aggregator zombie slots | Periodic cleanup (30s-6min) | Already good (cancel_task in all paths) |
---
Execution Order

I'd suggest:
1. B1 (Commit quorum early exit) + B2 (signer caching) + B4 (stake snapshot) — these are all in the hot path and can be done in parallel across files
2. B3 (policy config cache) — needs a Moka cache struct added to ChainService
3. B5 (gateway_tasks DashMap) — mechanical type replacement
4. B7 (response clone avoidance) + B8 (block cache pre-warm) — cleanup
5. B9+B10 (startup validation) — reliability, not latency
6. B11 (chain RPC circuit breaker) — requires design discussion on failure modes

The most impactful single change is B1 — making the Commit phase quorum-aware. Combined with B4 (stake snapshot), this removes the two biggest latency contributors in the normal path.
---
What's your priority? Want me to start with the Tier 1 changes (B1-B5), or do you want to discuss the approach for any of these first?
❯ no, Level 3 (single-phase fast path) is too unreliable because we have to trust the developer to correctly set that param on the policy or policy data contract. also, policy can always change what policy data(s) to use
⏺ You're right — trusting a developer-set flag is fragile. Policy clients can change their policyData references at any time, and a previously-deterministic policy could start using time-varying data sources without updating the flag. The gateway would skip consensus when it shouldn't, producing mismatched BLS signatures.
So the honest assessment:

The two-phase sequential pipeline is fundamentally correct for the problem it solves — you can't skip Prepare when you don't know whether operators will agree. And the remaining optimizations (Level 1 pre-fetch, Level 2 streaming BLS) save at most 5-10ms on a 5-30 second pipeline — not meaningful.

The real latency bottleneck is operator processing time during both phases (WASM execution, on-chain data fetching, Rego evaluation, BLS signing), not gateway architecture. The gateway is already doing the right thing: non-blocking broadcast, quorum-based early exit, stake snapshots, concurrent per-operator requests.
At this point, the gateway optimizations across all three tiers are solid:

| What we optimized | Where the time goes now |
| --- | --- |
| Gateway overhead between phases | <10ms (negligible) |
| Lock contention | Eliminated (DashMap, stake snapshots) |
| Redundant RPC calls | Eliminated (PolicyConfig to operator, block cache) |
| Tail latency | Reduced (quorum early exit in both phases) |
| Head-of-line blocking | Eliminated (concurrent operator requests) |
| Remaining: operator processing | 5-30 seconds per phase — this is the next frontier |
The next meaningful improvement would be operator-side optimization — faster WASM execution, faster Rego evaluation, operator-side caching of policy data/rules, or reducing the work operators do per task. That's a different domain from gateway architecture.

Want to shift focus to operator-side performance, or is there anything else on the gateway side you want to explore?
❯ what about concurrency in terms of multiple tasks being processed? does gateway already support concurrent tasks in parallel?
⏺ Yes — let me trace exactly how concurrent tasks flow through the gateway to verify there are no hidden serialization points.
Searched for 1 pattern
⏺ Let me trace the concurrency model from HTTP request arrival through to completion:
Read 1 file
⏺ ● Insight ─────────────────────────────────
Yes, the gateway is fully concurrent for multiple tasks. Here's why:
Axum's concurrency model: Each HTTP request to /rpc is handled by its own tokio task. Axum spawns a task per connection, and within each
connection, each request is processed concurrently. There's no global mutex or serialization point at the HTTP layer.
The full concurrency trace for 3 simultaneous tasks:
```
Task A (t=0): rpc_handler → create_task → broadcast_two_phase_and_aggregate
                                │
                                Prepare broadcast (async, waits on operators)

Task B (t=0): rpc_handler → create_task → broadcast_two_phase_and_aggregate
                                │
                                Prepare broadcast (async, waits on operators)

Task C (t=0): rpc_handler → create_task → broadcast_two_phase_and_aggregate
                                │
                                Prepare broadcast (async, waits on operators)
```
All three tasks proceed through their two-phase pipeline independently and concurrently. While Task A is waiting for Prepare responses, Tasks B and C are also broadcasting and waiting — they don't block each other.
Every shared resource is concurrent-safe:
- DashMap for gateway_tasks — lock-free per-key
- DashMap for aggregator.task_states — lock-free per-task
- ArcSwap for block_cache — lock-free reads
- Arc with per-operator workers — concurrent dispatch (Tier 3)
- Arc with mpsc channels — non-blocking send
- Operator stake snapshots — local per-broadcast, no contention
─────────────────────────────────────────────
There are zero serialization points between concurrent tasks. The only shared contention point was gateway_tasks with RwLock, which we
already replaced with DashMap in Tier 1.
The sendTask async path (channel.rs) is even more concurrent — it spawns the entire two-phase pipeline into a background tokio::spawn and returns immediately with a task ID, so the HTTP handler is freed instantly.
One area worth examining: the batch submitter's mpsc channel. When many tasks complete BLS aggregation simultaneously, they all send to
the same combined_tx channel. But mpsc::send is non-blocking until the channel is full (capacity = 240), and we added backpressure 503 in
Tier 2 to handle that. So this is already covered.
The gateway's concurrent task processing is solid. No hidden bottlenecks.
✻ Baked for 2m 24s
❯ Pre-fetch operator/quorum registry state during Commit broadcast (~10-100ms saved) --> for this, why not just fetch once from the Operator registry every epoch, and use that? what are the risks? I thought we already do this
Searched for 2 patterns
⏺ Let me check the EigenSDK AvsRegistryServiceChainCaller cache:
Read 1 file
⏺ ● Insight ─────────────────────────────────
You're right — we already have a cache. The AvsRegistryServiceChainCaller has an operator_state_cache with:
So get_operators_avs_state_at_block is already fast on cache hit. The 10-100ms I estimated was the cold/miss case. In production, the cache is warm from the previous task, so these calls are likely <1ms.
─────────────────────────────────────────────
Let me verify the TTL and check if it's the same for quorum state:
Searched for 1 pattern
⏺ 300 seconds (5 minutes) TTL. Both operator state and quorum state use the same cache with the same TTL.
So here's the reality:
Level 1 (pre-fetch) is unnecessary. The operator/quorum registry state is already cached with a 5-minute TTL. After the first task warms
the cache, subsequent tasks hit the cache in <1ms. The 10-100ms RPC call only happens once every 5 minutes or on the very first request
(which is already handled by the startup pre-warm in Tier 1).
Risks of relying on the cache:
- A stale operator set could use the wrong APK → InvalidQuorumApkHash. This is already documented in lessons.md as a known issue ("Operator registry cache is static... restart the gateway after deregistering an operator").
- Cached state may reflect a historical block. This is a known gotcha from lessons.md.
These risks already exist and are accepted. The pre-fetch optimization would have the same risks since it would hit the same cache.
Bottom line: Level 1 is already solved by the existing cache. Skip it entirely.
That leaves:
For Level 2, given that aggregate_responses is mostly cache hits + fast CPU work (<5ms total with a warm cache), the win is also marginal. The dominant cost remains the two network-bound phases.

Level 3 is by far the biggest win. Halving latency for deterministic policies (identity, confidential, allowlist — the majority of production use cases) from 2 round-trips to 1 is a fundamentally larger improvement than shaving milliseconds off the aggregation step.
Want me to design Level 3 (single-phase fast path) as the next optimization? Or do you still want Level 2 first?
✻ Worked for 1m 29s