Pipelined Batch Transaction Submitter
❯ One question: is a contract revert only detectable from the transaction receipt, or can it be detected from just sendTransaction?
Currently, the problem with batch TX submission is that it's a single-loop, sequential process with retry. Instead, let's explore a
pipelining architecture where the initial submission of the batch doesn't block and failures are handled on a separate queue.
(Will we need a dedicated task-generator signer key for failed-TX handling, separate from the main submission signer key?)
/clarify-plan
⏺ Skill(/clarify-plan)
⎿ Successfully loaded skill
⏺ Good questions. Let me first answer your initial question, then clarify the design space.
★ Insight ─────────────────────────────────────
Contract reverts are only detectable from the receipt (or simulation), not from sendTransaction itself. eth_sendTransaction /
eth_sendRawTransaction only validates the transaction envelope (nonce, gas, signature) — it doesn't execute the EVM. The EVM
execution happens during mining, and the result is in the receipt (status: 0x0 = revert). The only way to detect a revert before
mining is eth_call (simulation), which is what simulate_batch_create_and_respond already does post-failure.
─────────────────────────────────────────────────
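
For concreteness, a minimal alloy-style sketch of where each check lives (a sketch only, assuming an already-constructed provider; exact `call`/builder signatures vary by alloy version, and anyhow is used for brevity):

```rust
use alloy::providers::Provider;
use alloy::rpc::types::TransactionRequest;

// Sketch, not the gateway's real code: shows where a revert becomes visible.
async fn submit_checked(provider: &impl Provider, tx: TransactionRequest) -> anyhow::Result<()> {
    // Pre-flight: eth_call executes the EVM without mining, so a would-be
    // revert surfaces here, before any gas is spent.
    provider
        .call(tx.clone())
        .await
        .map_err(|e| anyhow::anyhow!("simulation reverted, not submitting: {e}"))?;

    // eth_sendTransaction only validates the envelope (nonce, gas, signature).
    // A revert is NOT detectable at this point.
    let pending = provider.send_transaction(tx).await?;

    // The revert only becomes visible in the mined receipt: status == 0x0.
    let receipt = pending.get_receipt().await?;
    anyhow::ensure!(receipt.status(), "tx {} reverted on-chain", receipt.transaction_hash);
    Ok(())
}
```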
Now for the pipelining architecture — let me clarify requirements:
Batch 1: Current bottleneck and goals
The current sequential loop does: submit batch TX → wait for receipt (up to 60s) → parse failures → strip poison → retry. During
that entire wait, no new batches can flush. Is the primary goal to decouple "submit" from "wait for receipt" so the accumulator can
keep flushing new batches while previous ones are in-flight? Or is there a different bottleneck you're seeing?
On the signer key question — the current architecture already routes all on-chain TXs through a single AvsWriter per chain to
avoid nonce races. If we pipeline (multiple batches in-flight simultaneously), the nonce manager needs to handle concurrent pending
TXs. Are you thinking of a dedicated retry signer (separate key, separate nonce stream) specifically for resubmitting failed items,
so the main submission pipeline is never blocked by retry logic? Or more like nonce pipelining (multiple TXs in-flight on the same
key)?
What's the target throughput you're aiming for? The current setup does ~2-4 batch TXs per minute on Sepolia under congestion. Is
the goal 10x (20-40/min), or is it more about reducing the tail latency (the 60s receipt timeout blocking everything)?
✻ Cogitated for 34s
❯ 1. The bottleneck is the waiting-for-receipt time plus retries, which add up quickly and make all subsequent batches too old for
task creation on chain. But we must ensure the nonces of all transactions stay sequential, which is the challenging part.
2. More like nonce pipelining, or whatever the MOST RELIABLE and MOST EFFICIENT approach is. 3. The goal throughput is ultimately
100 tps, but if that's difficult, 10x minimum.
⏺ Good context. The nonce sequencing under concurrent in-flight TXs is the core design challenge. Let me dig deeper.
Batch 2: Nonce management and failure semantics
Current nonce behavior: The gateway uses alloy's CachedNonceManager which queries eth_getTransactionCount("pending"). With
multiple TXs in-flight, the "pending" count auto-increments. But if TX at nonce N fails (revert or gets dropped), nonce N+1 and N+2
are stuck — they can't mine until N is resolved. How do you want to handle this? Options range from:
- (a) Optimistic pipelining: send N, N+1, N+2 in parallel, accept that a revert at N blocks everything after it (simpler, works if
reverts are rare — which our pre-flight filter should ensure)
- (b) Replacement via gas bumping: if nonce N reverts or stalls, send a replacement TX at nonce N with higher gas to "unstick" the
pipeline
- (c) Cancel-and-resubmit: send a zero-value self-transfer at nonce N to burn the stuck slot, then resubmit N+1, N+2 with fresh
data
Receipt polling vs event-driven: Currently get_receipt() polls with backoff until the TX is mined. With pipelining, we'd have
multiple receipts to track. Are you open to switching to WebSocket-based TX confirmation (subscribe to newPendingTransactions or
newHeads and match TXs), or prefer to keep the polling model but run multiple pollers concurrently?
Failure isolation scope: If a batch TX reverts on-chain (receipt status: 0x0), should the retry of that batch's failed items go
into the next regular batch (mixed with fresh items), or into a dedicated retry queue that processes separately? Mixing means failed
items get another chance quickly but could poison the next batch too. Separating means fresh items are never delayed by retries, but
retry latency increases.
❯ 1. What do you recommend as the most reliable and efficient (fastest)? 2. Would WebSocket be a more scalable, efficient approach? Help
me evaluate. 3. What would be most reliable and match our architecture/context best? The dedicated retry queue?
⏺ ★ Insight ─────────────────────────────────────
The EVM nonce model is fundamentally sequential — there's no way around it. TX at nonce N+1 cannot mine until nonce N has mined
(success or revert). The design question is how to maximize throughput within this constraint while minimizing the blast radius when
a TX stalls or reverts.
─────────────────────────────────────────────────
Let me evaluate each dimension with concrete trade-offs for our architecture.
Nonce strategy: optimistic pipelining (a) is the right choice
| Strategy | Throughput | Complexity | Failure blast radius |
|---|---|---|---|
| (a) Optimistic pipeline | High — N TXs in mempool at once | Low — alloy's pending nonce handles it | If N stalls, N+1..N+k are stuck until N mines |
| (b) Gas bumping / replacement | Medium — requires monitoring + replacement logic | High — race conditions, EIP-1559 priority fee math | Better recovery but complex edge cases |
| (c) Cancel-and-resubmit | Low — burns gas on empty TXs | Medium — needs explicit nonce tracking | Clean but wasteful, adds latency |
Recommendation: (a) with the pre-flight filter as the safety net. The pre-flight filter we just built removes 95%+ of would-revert
items before they enter the pipeline. With clean batches, reverts become rare — and when they do happen, the pipeline self-drains
(the reverted TX still mines with status: 0x0, which advances the nonce). The key insight: an on-chain revert is NOT a stuck nonce —
the TX still mines, nonce advances, and subsequent TXs proceed. Only dropped/replaced TXs actually stall the nonce. So the real risk
is TX-not-mining (gas too low, mempool eviction), not reverts.
For the rare stuck-TX case, a simple watchdog timer (e.g., 120s with no receipt) can trigger a gas-bumped replacement to unstick the
pipeline. Much simpler than full cancel-and-resubmit.
WebSocket vs polling for receipt tracking
| Approach | Latency | Reliability | Complexity | Our context |
|---|---|---|---|---|
| Polling (get_receipt) | 1-12s (polling interval) | High — works through proxies, load balancers | Low | Current approach, works with any RPC provider |
| WebSocket (newHeads + match) | Sub-second (push) | Medium — WS connections drop, reconnect logic needed | Medium | QuikNode supports WS, but gateway already juggles WS for chain watcher |
| WebSocket (eth_subscribe("newPendingTransactions")) | Near-instant | Low — many providers don't support full TX subscription | High | Not reliable across providers |
Recommendation: Keep polling but run multiple pollers concurrently. With pipelining, we'd have e.g., 5 batch TXs in-flight, each with
its own tokio::spawned receipt poller. This is simple, reliable, and doesn't add a WS dependency to the submission path. The latency
difference (1-12s polling vs sub-second WS) is negligible compared to the 2-12s block time. WebSocket would be over-engineering here
— the bottleneck is block time, not notification latency.
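
The shape of that is one small spawned task per in-flight TX. A sketch with illustrative names (the real poller lives behind get_receipt today):

```rust
use std::{sync::Arc, time::Duration};
use alloy::{primitives::B256, providers::Provider};

// Sketch: spawn one poller per in-flight tx. Each task holds only a hash and
// a timer, so 3-5 concurrent pollers add negligible memory.
fn spawn_receipt_poller<P>(provider: Arc<P>, tx_hash: B256)
where
    P: Provider + Send + Sync + 'static,
{
    tokio::spawn(async move {
        let mut delay = Duration::from_millis(500);
        loop {
            match provider.get_transaction_receipt(tx_hash).await {
                // Mined (success or revert): hand off to result handling.
                Ok(Some(_receipt)) => break,
                // Not mined yet, or transient RPC error: back off and retry.
                Ok(None) | Err(_) => {
                    tokio::time::sleep(delay).await;
                    delay = (delay * 2).min(Duration::from_secs(2));
                }
            }
        }
    });
}
```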
Failure isolation: dedicated retry queue is the right fit
| Approach | Fresh item latency | Retry latency | Isolation | Our fit |
|---|---|---|---|---|
| Mixed (retry into next regular batch) | Degraded under failures | Low (next flush) | None — poison items delay fresh | Bad for 100 tps target |
| Dedicated retry queue | Unaffected | Higher (separate flush cycle) | Full — fresh pipeline never blocked | Matches our architecture |
| Drop-and-notify (no retry) | Unaffected | N/A — caller gets error | Full | Too aggressive for production |
Recommendation: Dedicated retry queue, same signer key. Here's why the same key works: the retry queue doesn't submit concurrently with
the main queue — it waits for the main pipeline's in-flight TXs to drain (or times out), then takes a turn. Think of it as strict
prioritization: the main queue has priority, and the retry queue fills the gaps. Since an on-chain revert advances the nonce (it still
mines), the retry queue's items get fresh nonces naturally. No second signer key needed.
Batch 3: Architecture details
Pipeline depth: How many batch TXs should be in-flight simultaneously? Each batch TX costs ~400-800K gas per item, fitting ~30-37
items in a 30M gas block. With 100 tps target and 2s blocks on Base (12s on Sepolia), we'd need ~200 items per block on Base (or
~1200 per block on Sepolia). That's 6-32 batch TXs per block. Do you want to cap the in-flight pipeline at a fixed depth (e.g., 5-10
concurrent TXs), or let it grow to fill the block?
Caller notification timing: Currently the RPC caller blocks until the batch TX receipt arrives (success or failure). With
pipelining, should callers still block until their specific batch's receipt, or should we return early with a "task accepted, pending
on-chain" response and let callers poll for the result?
Per-chain or global pipeline: In multichain mode, each chain has its own BatchTransactionSubmitter. Should each chain's pipeline
be independent (simpler, current model scaled up), or should there be a global coordinator that balances across chains (more complex,
better utilization)?
✻ Worked for 46s
❯ For Batch 2: (a) with the pre-flight filter as the safety net. The pre-flight filter we just built removes 95%+ of would-revert
items before they enter the pipeline. With clean batches, reverts become rare — and when they do happen, the pipeline self-drains
(the reverted TX still mines with status: 0x0, which advances the nonce). The key insight: an on-chain revert is NOT a stuck nonce —
the TX still mines, the nonce advances, and subsequent TXs proceed. Only dropped/replaced TXs actually stall the nonce. So the real
risk is TX-not-mining (gas too low, mempool eviction), not reverts. For the rare stuck-TX case, a simple watchdog timer (e.g., 120s
with no receipt) can trigger a gas-bumped replacement to unstick the pipeline. Much simpler than full cancel-and-resubmit. --> Yes,
implement the optimistic pipeline, with a watchdog timer for gas-bumped replacement. But note that Gateway tasks are fundamentally
multichain --> would we need a separate pipeline for each chain? Make sure our pipeline architecture doesn't incur memory overhead
or memory leaks that lead to reliability or performance degradation for the gateway. Same for the polling vs. WebSocket discussion -
polling is fine, but make sure our architecture stays performant and memory-leak safe even in the multichain context. Lastly, for
the dedicated retry queue - make sure the retry queue transactions are ALSO within the task creation block window. Main queue
priority is the right call, but make sure the retry queue doesn't just end up failing all those tasks. Now, for Batch 3: 1 - what do
you recommend to optimize for reliability while keeping performance and latency acceptable (within the task creation block window
deadline)? 2 - what do you recommend? Returning early sounds favorable, but I'm concerned that the poll-for-receipt approach would
cause performance degradation or reliability issues. Help me decide on this. 3 - if not too complex, a global coordinator would be
the best, but only if it's not too complex or bug-prone. Help me evaluate.
⏺ Good decisions. Let me address each and then ask the final batch.
Multichain pipeline: Yes, each chain already has its own BatchTransactionSubmitter (created per-chain in ChainService). Each gets its
own AvsWriter (own signer provider, own nonce stream). So per-chain pipelines are already the architecture — no shared nonce state
across chains, no cross-chain coordination needed. The pipeline depth, retry queue, and watchdog are all per-chain. Memory-wise, each
pipeline is a bounded channel + a small in-flight tracking map — negligible even with 10 chains.
Retry queue staleness: Correct concern. The retry queue MUST re-run the pre-flight staleness filter before resubmitting. Items that
were fresh when they first failed may be stale by retry time. The filter we built runs at the top of each retry iteration — this same
pattern applies to the dedicated retry queue. Stale retries get failed immediately to callers instead of wasting a nonce slot.
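
In sketch form, with hypothetical field and parameter names (the real filter from PR #520 is the source of truth):

```rust
// Hypothetical item shape: only the creation block matters for this check.
struct PendingItem {
    task_created_block: u64,
    // ... payload, result channel, etc.
}

// Drop (and fail to callers, not shown) anything whose task-creation block has
// aged out of the on-chain window by the time the retry runs.
fn filter_stale(items: Vec<PendingItem>, current_block: u64, max_age_blocks: u64) -> Vec<PendingItem> {
    items
        .into_iter()
        .filter(|item| current_block.saturating_sub(item.task_created_block) <= max_age_blocks)
        .collect()
}
```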
Now for Batch 3:
Pipeline depth — recommend fixed cap of 3-5 concurrent TXs
| Depth | Throughput (Base 2s) | Throughput (Sepolia 12s) | Risk | Memory |
|---|---|---|---|---|
| 1 (current) | ~15-18 items/block | ~30-37 items/block | None | Minimal |
| 3 | ~45-54 items/block | ~90-111 items/block | Low — 3 pending nonces | 3 tracking entries |
| 5 | ~75-90 items/block | ~150-185 items/block | Medium — 5 pending nonces | 5 tracking entries |
| 10+ | ~150+ items/block | ~300+ items/block | High — deep nonce chain, one stuck TX blocks many | Proportional |
Recommendation: depth=3 for production, configurable. At depth 3 on Base (2s blocks): ~45-54 items per 2s block = 22-27 tps. With 10
gateway replicas, that's 220-270 tps aggregate — exceeds the 100 tps target. On Sepolia (12s blocks): ~90 items per 12s = ~7.5 tps
per replica, 75 tps aggregate for 10 replicas.
Depth 3 is the sweet spot: if TX at nonce N gets stuck, only N+1 and N+2 are waiting — the watchdog timer fires at 120s and gas-bumps
N to unstick. With depth 10+, a stuck TX blocks 9 subsequent batches and recovery takes longer.
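
The depth cap maps directly onto a tokio::Semaphore. A minimal sketch (illustrative, not the submitter's real code):

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Depth = number of permits. acquire_owned() suspends the accumulator when the
// pipeline is full (natural backpressure); dropping the permit when a batch's
// receipt resolves frees the slot, even on panic or early return.
async fn send_with_depth_cap(pipeline: Arc<Semaphore>) {
    let permit = pipeline.acquire_owned().await.expect("semaphore closed");
    tokio::spawn(async move {
        // ... .send() the batch tx, then await its receipt ...
        drop(permit); // slot released exactly when this batch resolves
    });
}

// Construction, once per chain: let pipeline = Arc::new(Semaphore::new(3));
```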
Caller notification — recommend early return with bounded result channel
The concern about polling degradation is valid. Here's the comparison:
| Approach | Caller experience | Gateway memory | Failure mode |
|---|---|---|---|
| Block until receipt | Simple — caller gets final result | Low (oneshot per task) | 60s timeout = 60s caller latency |
| Early return + poll | Fast response, caller polls | Risk: unbounded result map | Stale results accumulate if callers don't poll |
| Early return + bounded callback | Fast response, result pushed via existing oneshot | Same as current | Caller's spawned task awaits, auto-cleans |
Recommendation: keep the current oneshot pattern. The caller's spawned task in sync.rs already awaits the result_rx oneshot. With
pipelining, the only change is that the oneshot resolves later (when the batch TX mines, not when it's submitted). The caller doesn't
poll — it's already awaiting. No new result map, no memory leak risk. The RPC response to the external client still blocks until the
oneshot resolves, but the internal pipeline is unblocked — the accumulator keeps flushing new batches while previous ones are
in-flight. This gives us the throughput benefit without changing the external API contract.
The key insight: pipelining is internal to the batch submitter, invisible to callers. The accumulator flushes batch N, immediately
starts accumulating batch N+1, flushes it while N is still mining. Each batch's callers independently await their oneshot. No polling
API needed.
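
In sketch form (simplified types; the real item struct lives in the batch submitter):

```rust
use tokio::sync::oneshot;

// Each enqueued item carries its own result channel; nothing is stored
// centrally, so there is no result map to leak and no polling API.
struct BatchItem {
    result_tx: oneshot::Sender<Result<(), String>>,
    // ... task payload ...
}

async fn rpc_handler_side() -> Result<(), String> {
    let (result_tx, result_rx) = oneshot::channel();
    let item = BatchItem { result_tx };
    // `item` would be pushed into the accumulator here; this spawn stands in
    // for the receipt tracker eventually resolving the oneshot.
    tokio::spawn(async move {
        let _ = item.result_tx.send(Ok(()));
    });
    // The handler just awaits; pipelining only changes *when* this resolves.
    result_rx.await.map_err(|_| "submitter dropped the sender".to_string())?
}
```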
Global coordinator — recommend staying per-chain (independent pipelines)
| Approach | Complexity | Benefit | Risk |
|---|---|---|---|
| Per-chain independent | Low — current model, just add depth | Simple, isolated failures | Uneven load across chains |
| Global coordinator | High — cross-chain nonce awareness, priority scheduling | Better utilization | Single point of failure, complex state machine, harder to debug |
Recommendation: per-chain independent pipelines. The global coordinator's only benefit is load balancing across chains, but each
chain has independent block space anyway — you can't use Sepolia's gas budget for a Base TX. The chains are fundamentally independent
resources. A global coordinator adds a shared state machine that's hard to reason about under failure, and the bug surface is much
larger. Per-chain pipelines are isolated: if Base's pipeline stalls, Sepolia's keeps going. This isolation is worth more than
theoretical utilization gains.
Final Batch: Implementation scope
Rollout strategy: This is a significant architectural change to a critical path (on-chain submission). Do you want to implement
this as a new PipelinedBatchSubmitter alongside the existing BatchTransactionSubmitter (feature-flagged, A/B testable), or as an
in-place evolution of the current submitter? Feature-flagging means we can fall back to the current sequential behavior if the
pipeline has issues in production.
Watchdog gas bumping: When a TX is stuck for 120s, the watchdog needs to resubmit with higher gas. Should we use a percentage bump
(e.g., 1.5x the original maxFeePerGas), or query current gas prices and resubmit at market rate? The percentage bump is simpler but
might not be enough if gas spiked. Market rate is more reliable but requires an extra RPC call.
Timeline and scope: Given the complexity (nonce pipelining + watchdog + retry queue + per-chain isolation), do you want to tackle
this as one PR or break it into phases? Phase 1 could be the pipeline depth (concurrent submission without blocking) + watchdog,
Phase 2 could be the dedicated retry queue separation. Or all-in-one?
✻ Brewed for 53s
❯ Batch 3 recommendations make sense. Agreed. Final Batch - 1: in-place evolution. 2: why would a TX be stuck for 120s? With
pipelining, we don't have inline retry because we have a dedicated retry queue, right? Just a percentage bump is fine for now (10 or
20 percent or something). 3: as one PR, all in one.
⏺ Good. Let me clarify one thing on question 2 before we proceed to planning:
Why a TX can be stuck for 120s — this isn't about retries. A TX gets "stuck" when it's in the mempool but not mining. Causes:
- Gas price too low during a congestion spike (other TXs outbid yours)
- Mempool eviction (the RPC node drops your TX under memory pressure)
- The RPC node not propagating the TX to validators
With pipelining at depth 3, if TX at nonce N is stuck in the mempool, nonces N+1 and N+2 are also stuck (EVM enforces sequential
nonce ordering). The watchdog detects "TX N has been pending for 120s with no receipt" and gas-bumps it (resubmit same nonce N with
20% higher gas). Once N mines, N+1 and N+2 immediately become eligible.
This is independent of the retry queue — the retry queue handles items that reverted on-chain (receipt arrived, status=0x0). The
watchdog handles TXs that never mine (no receipt at all). Different failure modes:
| Failure | Detection | Handler |
|---|---|---|
| On-chain revert (status: 0x0) | Receipt arrives | Dedicated retry queue |
| TX stuck in mempool | No receipt after 120s | Watchdog gas bump (20%) |
| Pre-flight stale items | Block number check | Pre-flight filter (already built) |
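
The bump itself is simple integer math. A minimal sketch, assuming EIP-1559 fee fields and the 20% default (most execution clients require at least a ~10% increase on both fee fields to accept a same-nonce replacement):

```rust
// Percentage gas bump for a same-nonce replacement. Bumping only max_fee while
// leaving the priority fee unchanged can still be rejected, so bump both.
fn bump_fees(max_fee_per_gas: u128, max_priority_fee_per_gas: u128, bump_percent: u128) -> (u128, u128) {
    let bump = |fee: u128| fee + fee * bump_percent / 100;
    (bump(max_fee_per_gas), bump(max_priority_fee_per_gas))
}

// Example: 30 gwei / 2 gwei bumped by 20% becomes 36 gwei / 2.4 gwei:
// assert_eq!(bump_fees(30_000_000_000, 2_000_000_000, 20), (36_000_000_000, 2_400_000_000));
```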
With that clarified — I have enough context to write the implementation plan. Let me do that now.
★ Insight ─────────────────────────────────────
Three-layer defense model: (1) Pre-flight filter catches stale items before submission. (2) Optimistic nonce pipeline maximizes
throughput with concurrent in-flight TXs. (3) Watchdog + retry queue handle the two remaining failure modes (stuck mempool, on-chain
revert) without blocking the main pipeline.
─────────────────────────────────────────────────
Let me write up the full implementation plan for your review before touching code.
⏺ Entered plan mode
Claude is now exploring and designing an implementation approach.

denniswon commented Apr 10, 2026

───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Ready to code?

Here is Claude's plan:
╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌
Pipelined Batch Transaction Submitter

Context

Post-PR #515 load testing showed ~72% batch failure rate on Sepolia under sustained load. Root cause: the batch submitter's
sequential flush blocks on get_receipt() for 30-60s per batch TX, preventing new batches from flushing. Items queued during that wait
become stale (TaskCreatedBlockTooOld). PR #520 added a pre-flight staleness filter and time-normalized block windows, which catches
stale items before submission but doesn't address the sequential bottleneck.

The goal: decouple TX submission from receipt polling, allowing multiple batch TXs in-flight simultaneously. Target throughput: 100
tps aggregate (10 gateways, ~10 tps each).

Architecture

Split the monolithic flush_combined_batch_with_retry() into 3 decoupled stages:

Accumulator → Build → Submit (.send only) → [Receipt Tracker per TX] → Notify Callers
     ↑                                                   ↓ (reverts)
     └──────────────────── Retry Queue ←─────────────────┘

Key timing asymmetry: .send().await takes ~200ms (RPC broadcast); get_receipt() takes 30-60s (mining). By decoupling these, the accumulator
can flush a new batch every ~1s while previous batches are still mining.

Pipeline depth: 3 concurrent in-flight TXs (configurable). Controlled by a tokio::Semaphore. At 120 items/batch, that's 360 items
in-flight — up to 3x current throughput per gateway.

Files to Modify

crates/chainio/src/avs/writer.rs (~80 lines added)
Three new methods on AvsWriter:

send_batch_create_and_respond(): Same as existing batch_create_and_respond_to_tasks() lines 711-831, but stops after .send().await
returns Ok(pending). Returns (B256, u64) — TX hash + nonce used. Does NOT call get_receipt().
get_receipt_by_hash(tx_hash): Polls provider.get_transaction_receipt(tx_hash) with exponential backoff (500ms → 2s). No built-in
timeout — caller wraps with tokio::time::timeout.
gas_bump_batch_tx(nonce, tasks, responses, sigs, bump_percent): Resubmits at the same explicit nonce with 20% higher gas. Uses
provider.estimate_eip1559_fees() for current market rate, applies bump. Sets .nonce(original_nonce) explicitly to override
CachedNonceManager.
Critical nonce detail: After .send(), CachedNonceManager has already incremented past nonce N locally. For a gas-bump replacement at
the same nonce N, we MUST set .nonce(N) explicitly on the call builder. The InFlightBatch stores the original nonce.
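
A sketch of that override using alloy's TransactionBuilder-style setters (method names assumed from alloy's request builder; the real implementation would go through the contract call builder instead):

```rust
use alloy::network::TransactionBuilder;
use alloy::rpc::types::TransactionRequest;

// Pin the stuck nonce and the bumped fees explicitly so the provider's fillers
// (including CachedNonceManager, which has already moved past N) don't assign
// a fresh nonce; a fresh nonce would append a new tx instead of replacing N.
fn build_gas_bump_replacement(
    original: TransactionRequest,
    stuck_nonce: u64,
    bumped_max_fee: u128,
    bumped_priority_fee: u128,
) -> TransactionRequest {
    original
        .with_nonce(stuck_nonce)
        .with_max_fee_per_gas(bumped_max_fee)
        .with_max_priority_fee_per_gas(bumped_priority_fee)
}
```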

crates/gateway/src/task/submitter.rs (~30 lines added)
Three new trait methods on TaskSubmitter with default error impls (fail-open for mocks):

async fn send_batch_tx(...) -> Result<(B256, u64), ChainIoError>; // tx_hash + nonce
async fn get_tx_receipt(&self, tx_hash: B256) -> Result<TransactionReceipt, ChainIoError>;
async fn gas_bump_batch_tx(&self, nonce: u64, ..., bump_percent: u32) -> Result<B256, ChainIoError>;

Implement for AvsWriter by delegating to the new methods.

crates/gateway/src/rpc/api/batch_submitter.rs (~400 lines changed/added)
New config fields in BatchSubmitterConfig:

pipeline_depth: u32 (default: 3, set 0 to use legacy synchronous path)
watchdog_timeout_secs: u64 (default: 120)
gas_bump_percent: u32 (default: 20)
max_retry_queue_size: usize (default: 256)
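
Sketched below as a standalone struct with serde defaults (illustrative name PipelineSettings; in the plan these fields land on BatchSubmitterConfig), so existing config files deserialize unchanged:

```rust
use serde::Deserialize;

#[derive(Debug, Clone, Deserialize)]
pub struct PipelineSettings {
    #[serde(default = "default_depth")]
    pub pipeline_depth: u32, // concurrent in-flight batch TXs
    #[serde(default = "default_watchdog")]
    pub watchdog_timeout_secs: u64, // no-receipt threshold before gas bump
    #[serde(default = "default_bump")]
    pub gas_bump_percent: u32, // fee increase per replacement attempt
    #[serde(default = "default_retry_cap")]
    pub max_retry_queue_size: usize, // bounded retry channel capacity
}

fn default_depth() -> u32 { 3 }
fn default_watchdog() -> u64 { 120 }
fn default_bump() -> u32 { 20 }
fn default_retry_cap() -> usize { 256 }
```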
New types:
struct InFlightBatch {
    tx_hash: B256,
    nonce: u64,                    // for gas bump replacement
    items: Vec,
    _permit: OwnedSemaphorePermit, // released on drop
    submitted_at: Instant,
    batch_contract_address: Address,
    tasks: Vec,                    // kept for gas bump resubmission
    responses: Vec,
    sig_data: Vec,
}

struct RetryBatchMessage {
    items: Vec,
    attempt: u32,
}

New fields on BatchTransactionSubmitter:

pipeline_semaphore: Arc<Semaphore>
retry_tx: mpsc::Sender<RetryBatchMessage>
Changes to new(): Create semaphore, retry channel. Spawn retry_loop background task.

Changes to combined_accumulator_loop(): Replace flush_combined_batch_with_retry().await with submit_batch_pipelined() which returns
after .send() (~200ms). Legacy path preserved behind pipeline_depth == 0 check.

New method: submit_batch_pipelined():

Build items (same as Phase 1 of existing flush)
Pre-flight staleness filter (same as existing)
Acquire pipeline semaphore permit (blocks if pipeline full)
Call task_submitter.send_batch_tx() (~200ms)
Spawn receipt_tracker task with InFlightBatch
Return immediately
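
A skeleton of those six steps (placeholder types; send_batch_tx and receipt_tracker are stubs standing in for the trait methods above):

```rust
use std::sync::Arc;
use tokio::sync::{OwnedSemaphorePermit, Semaphore};

struct Item; // placeholder for the real batch item type
struct InFlight { tx_hash: [u8; 32], nonce: u64, items: Vec<Item>, _permit: OwnedSemaphorePermit }

async fn send_batch_tx(_items: &[Item]) -> Result<([u8; 32], u64), String> { todo!() }
async fn receipt_tracker(_batch: InFlight) { /* poll receipt, watchdog, notify callers */ }
fn filter_stale_items(items: Vec<Item>) -> Vec<Item> { items } // pre-flight filter stub

async fn submit_batch_pipelined(pipeline: Arc<Semaphore>, items: Vec<Item>) -> Result<(), String> {
    // Steps 1-2: build + pre-flight staleness filter.
    let items = filter_stale_items(items);
    if items.is_empty() {
        return Ok(());
    }
    // Step 3: blocks only when `pipeline_depth` TXs are already in-flight.
    let permit = pipeline.acquire_owned().await.map_err(|e| e.to_string())?;
    // Step 4: broadcast only (~200ms), no receipt wait.
    let (tx_hash, nonce) = send_batch_tx(&items).await?;
    // Step 5: receipt tracking runs off the hot path; the permit rides along
    // and is released when InFlight drops, whatever the outcome.
    tokio::spawn(receipt_tracker(InFlight { tx_hash, nonce, items, _permit: permit }));
    // Step 6: return immediately so the accumulator can flush the next batch.
    Ok(())
}
```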
New method: receipt_tracker() (spawned per TX):
loop {
    match timeout(watchdog_timeout, task_submitter.get_tx_receipt(tx_hash)).await {
        Ok(Ok(receipt)) if receipt.status()  → notify callers via oneshot, return
        Ok(Ok(receipt)) if !receipt.status() → simulate_and_classify, send healthy items to retry_tx, return
        Ok(Err(transient))                   → sleep 2s, continue
        Err(elapsed)                         → gas_bump (max 3 attempts), update tx_hash, continue
    }
}
On all exit paths: InFlightBatch drops, releasing semaphore permit. All item result_tx consumed (success or error).

New method: retry_loop() (background task):

Reads from retry_rx channel
Checks attempt < max_retries
Runs pre-flight staleness filter (same check as main path)
Acquires pipeline semaphore permit (yields to main queue)
Submits via send_batch_tx(), spawns receipt tracker
On retry queue full (try_send fails): fail items immediately
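
The overflow rule in sketch form (placeholder item type; failing each caller through its oneshot is elided):

```rust
use tokio::sync::mpsc::{self, error::TrySendError};

struct RetryItem; // placeholder; carries the caller's oneshot in the real type
struct RetryBatchMessage { items: Vec<RetryItem>, attempt: u32 }

// try_send never blocks the receipt tracker; on a full (or closed) channel the
// message is handed back so every caller can be failed immediately instead of
// queueing unboundedly.
fn enqueue_retry(retry_tx: &mpsc::Sender<RetryBatchMessage>, msg: RetryBatchMessage) {
    if let Err(TrySendError::Full(msg) | TrySendError::Closed(msg)) = retry_tx.try_send(msg) {
        for _item in msg.items {
            // fail this item's caller via its oneshot (elided)
        }
    }
}
```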
Legacy fallback: pipeline_depth: 0 calls existing flush_combined_batch_with_retry().await unchanged. This is the rollout safety
valve.

crates/gateway/src/handler/mod.rs (no changes)
BatchTransactionSubmitter::new() signature unchanged (already has block_time_ms). New config fields have serde defaults.

crates/gateway/tests/ (test updates + new tests)
Update existing mock TaskSubmitter impls for new trait methods (default impls cover most)
New PipelinedSubmitter mock with configurable receipt delays and results
New tests:
test_pipeline_concurrent_batches — verify 3 batches get concurrent nonces
test_pipeline_semaphore_backpressure — depth=2, 3rd batch blocks
test_watchdog_gas_bump — receipt timeout triggers bump
test_revert_to_retry_queue — reverted items reach retry queue
test_retry_queue_staleness_filter — stale retry items filtered
test_shutdown_cleans_up_in_flight — cancel token fails all callers
test_filter_stale_items — already exists from PR #520
Flows

Happy path: Accumulator flushes batch → .send() gets nonce N (200ms) → receipt tracker spawned → accumulator immediately starts next
batch → nonce N+1 sent → receipts arrive out of order → callers notified.

Revert path: Receipt arrives with status: 0x0 → simulate_and_classify each item → idempotent items get success, poison items get
error, healthy items go to retry queue → retry queue runs pre-flight filter → resubmits with fresh nonce.

Stuck TX path: 120s no receipt → gas bump at nonce N with 20% higher gas → new tx_hash → reset timer → if 3 bumps fail → fail all
callers, release permit. Note: reverted TXs advance the nonce (they mine), so they DON'T trigger the watchdog — only dropped/evicted
TXs do.

Memory Safety

Pipeline semaphore caps in-flight TXs at 3: max 360 items total (~720KB)
Retry queue bounded at 256 items, overflow items failed immediately
OwnedSemaphorePermit in InFlightBatch auto-releases on drop (panic-safe)
All result_tx oneshots consumed in every exit path (success, error, cancel, panic via catch_unwind)
CancellationToken checked in both receipt tracker and retry loop for clean shutdown
Rollout

Deploy with pipeline_depth: 1 — validates new code path, same throughput as before
Monitor batch_pipeline_depth and batch_receipt_latency_seconds metrics
Increase to pipeline_depth: 3
pipeline_depth: 0 falls back to legacy synchronous path if issues arise
Verification

cargo check -p newton-prover-gateway --tests — compile check
cargo clippy -p newton-prover-gateway -p newton-prover-chainio --tests -- -D warnings — lint
cargo test -p newton-prover-gateway --lib — unit tests including new pipeline tests
make gateway_e2e — E2E with real anvil (validates receipt flow end-to-end)
make e2e scenario=basic — unified E2E covering batch submission path
Load test on stagef with pipeline_depth: 3 to validate throughput improvement

Changes to combined_accumulator_loop(): Replace flush_combined_batch_with_retry().await with submit_batch_pipelined() which
returns after .send() (~200ms). Legacy path preserved behind pipeline_depth == 0 check. --> Do we need to keep the legacy
path? What is the benefit?

Now to answer your question about the legacy path:

Do we need pipeline_depth: 0 as a legacy fallback?

Honestly, no. The pipelined path with pipeline_depth: 1 already gives equivalent behavior to the current sequential path (one TX
in-flight, accumulator unblocked during receipt wait — strictly better than today). There's no scenario where the old
flush_combined_batch_with_retry is preferable.

I'll drop the legacy fallback. The accumulator will always use submit_batch_pipelined(), and pipeline_depth defaults to 3. Setting
pipeline_depth: 1 gives "safe mode" behavior for cautious rollouts.

⏺ Now Task 7: Add the three new trait methods to TaskSubmitter and implement for AvsWriter.


Newton Privacy Layer — Current Status (2026-04-10)

Phase Status

| Phase | Status | Key PRs |
|---|---|---|
| Phase 1: Foundation | COMPLETE | #368 |
| Phase 1.5: Gateway-Mediated | COMPLETE | #510, #516 |
| Phase 1.5B: KMS → HPKE | COMPLETE, DEPLOYED | #489, #499 |
| Phase 2 DKG core | MERGED | #426→#478 |
| Phase 2C: PSS epoch rotation | IN PROGRESS — 3 PRs open, ~1,300 lines, 26 tests | #523, #525, #526 |
| Phase 2D: Privacy slashing | Not started | — |
| Phase 3: MPC/ZK | Not started | — |
| Phase 4: Ecosystem | Not started | — |

What's Shipped (on main)

  • Three-path privacy model: IdentityRegistry (data.identity.*), ConfidentialDataRegistry (data.confidential.*), inline ephemeral
    (data.privacy.inline[0].*)
  • HPKE encryption (RFC 9180), FROST DKG (Ristretto255), threshold decryption
  • Zeroizing<Vec<u8>> for all decryption output
  • Challenger privacy data resolution (PR #516)
  • make dkg command=initiate/status/cancel for manual DKG ceremonies
  • AWS KMS fully removed from all 5 repos

What's In Review (PRs open, not merged)

  • PR #523: PSS refresh + resharing crypto primitives, EpochMetadata/EpochConfig, multi-epoch keystore, epoch metrics
  • PR #525: RefreshCoordinator (1-round protocol), operator newt_dkgRefreshRound / newt_dkgRefreshApply handlers, EpochManager
    background task
  • PR #526: Coordinator HTTP wiring (zero TODOs), E2E crypto smoke test, make pss-refresh-e2e

Known Limitations

  1. PSS epoch rotation not yet on main — the 3 PRs are open and CI-ready (pending IPFS billing fix for unrelated multichain E2E). Once
    merged, the gateway can run automatic 24h epoch rotation with PSS.
  2. No EpochRegistry contract — epoch metadata (MPK, operator set hash, timestamps) exists only in-memory and keystores. No on-chain
    verifiability. (NEWT-639)
  3. No privacy-specific slashing — operators who submit invalid partial decryptions or miss DKG rounds face no economic penalty.
    (NEWT-629/640/641, blocked by NEWT-628)
  4. Gateway reconstructs plaintext in threshold mode — the gateway combines partial DH outputs and sees full plaintext during ephemeral
    task processing. Identity and confidential data are operator-decrypted locally. (Phase 3: MPC)
  5. No forward secrecy until epoch rotation merges — key shares from the initial DKG remain valid indefinitely until PSS refresh is
    operational.
  6. Operator set changes require manual DKG — resharing code exists in refresh.rs but automatic detection of operator set changes isn't
    wired yet. Permissioned set means coordinated manual process via make dkg.

Next Steps

| Priority | What | Ticket | Status |
|---|---|---|---|
| 1 | Merge PSS PRs (#523→#525→#526) | NEWT-628 | Awaiting CI + review |
| 2 | EpochRegistry Solidity contract | NEWT-639 | Backlog |
| 3 | On-chain aggregation commitments | NEWT-630 | Backlog |
| 4 | Privacy-specific slashing conditions | NEWT-629 | Backlog (blocked by 628) |
| 5 | PrivacySlasher + DleqVerifier contracts | NEWT-640/641 | Backlog |

Once PRs merge, NEWT-628 moves to Done. The remaining Phase 2 work is all Solidity contracts (NEWT-639/630/640/641/629) — the
Rust-side implementation is complete.
