Caching, Loops, and the Token Tsunami: What DeLong's $80 Eight-Hour Run Actually Means for Your Workflow
When Brad DeLong reports that his Isaac576Bot churned out 139,000 output tokens over an eight-hour session and the bill came in at roughly eighty dollars, the number that should jump out is not the output count but the silent multiplier hiding underneath it: prompt caching. Without caching, an agent loop that re-reads the same files, the same system prompt, and the same conversation history on every turn pays full freight for every token, every time. With caching turned on, those repeated prefixes drop to ten percent of the base input price, and the eighty-dollar day quietly becomes an eight-hundred-dollar day that never happened.
The mechanics, as of April 2026, are worth understanding in concrete terms. Anthropic charges 1.25x the base input price to write a five-minute ephemeral cache entry, or 2x to write a one-hour entry, and then 0.1x to read from either. That means a five-minute cache pays for itself after a single hit, and a one-hour cache pays for itself after two. Claude Code, the TUI most readers of this newsletter are probably running, handles all of this automatically — it places cache breakpoints on the system prompt, on tool definitions, and on the rolling conversation history, and it slides the breakpoint forward as the session grows. You do not configure it; you do not annotate your messages; you simply benefit. Heavy users routinely report that ninety percent or more of their input tokens are cache reads rather than cache writes once a session warms up.
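The break-even arithmetic can be checked in a few lines of Python. The multipliers (1.25x and 2x to write, 0.1x to read) come from the pricing above; the $3-per-million base price and the 100k-token prefix are illustrative assumptions, not figures from DeLong's run.

```python
# Break-even arithmetic for prompt caching, using the multipliers in the
# text: 1.25x to write a 5-minute cache entry, 2x to write a 1-hour entry,
# 0.1x to read, all relative to the base input price.

def cost_without_cache(prefix_tokens: int, turns: int, base_price: float) -> float:
    """Every turn re-sends the full prefix at the base input price."""
    return prefix_tokens * turns * base_price

def cost_with_cache(prefix_tokens: int, turns: int, base_price: float,
                    write_multiplier: float = 1.25,
                    read_multiplier: float = 0.1) -> float:
    """First turn writes the cache; every later turn reads it."""
    write = prefix_tokens * base_price * write_multiplier
    reads = prefix_tokens * base_price * read_multiplier * (turns - 1)
    return write + reads

# Hypothetical: a 100k-token prefix at $3 per million input tokens.
base = 3.00 / 1_000_000
prefix = 100_000

for turns in (1, 2, 3, 10):
    plain = cost_without_cache(prefix, turns, base)
    five_min = cost_with_cache(prefix, turns, base)                      # 1.25x write
    one_hour = cost_with_cache(prefix, turns, base, write_multiplier=2.0)  # 2x write
    print(f"{turns:>2} turns: no cache ${plain:.3f}  "
          f"5-min ${five_min:.3f}  1-hour ${one_hour:.3f}")
```

Running the numbers confirms the claim: the five-minute cache is already cheaper at two turns (one hit), while the one-hour cache is still slightly more expensive at two turns and only pulls ahead at three (two hits).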
This is exactly why a loop matters so much, and why DeLong's eight-hour Isaac576Bot run is not the extravagance it first appears to be. An agentic loop — whether it is the /loop slash command running a prompt every ten minutes, a scheduled cron trigger spinning up a remote agent, or just a long interactive Claude Code session where you keep asking follow-ups against the same growing context — is the single workflow shape that benefits most from caching, because every iteration after the first one is mostly cache reads. The token tsunami DeLong describes is real, but the bill only drowns you if you let the cache expire between turns. The five-minute TTL is the default, which is why tight loops feel cheap and why a session you walk away from for fifteen minutes feels expensive when you come back: the prefix has to be rewritten from scratch.
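A toy ledger makes the loop economics concrete. The per-turn sizes here (a 20k-token stable prefix, 2k new tokens per iteration, 48 iterations standing in for an eight-hour loop firing every ten minutes) are invented for illustration, and the model simplifies real breakpoint placement, but it shows why a warm loop's input quickly becomes overwhelmingly cache reads:

```python
# A toy simulation of an agent loop's input-token ledger, assuming (as in
# the text) that each turn re-sends the whole conversation so far and that
# everything before the latest breakpoint is served as a cache read.

def simulate_loop(turns: int, new_tokens_per_turn: int = 2_000,
                  system_prefix: int = 20_000):
    """Return (cache_read_tokens, cache_write_tokens) over a warm session."""
    reads = writes = 0
    context = system_prefix
    for turn in range(turns):
        if turn == 0:
            writes += context              # first turn: write the whole prefix
        else:
            reads += context               # later turns: the prefix is a read
            writes += new_tokens_per_turn  # only the new tail gets written
        context += new_tokens_per_turn
    return reads, writes

reads, writes = simulate_loop(turns=48)  # e.g. a /loop firing every 10 min for 8 h
print(f"cache reads: {reads:,}  cache writes: {writes:,}  "
      f"read share: {reads / (reads + writes):.0%}")
```

Under these made-up parameters the read share lands above ninety-five percent, consistent with what heavy users report once a session warms up. Walk away long enough for the five-minute TTL to lapse, though, and that entire read column turns back into full-price writes.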
The practical implications for a workflow built around Claude Code, loops, and scheduled agents are straightforward. Keep your sessions warm — if you know you will be back in two minutes, stay in the same conversation rather than starting a fresh one. For recurring scheduled work, consider whether the one-hour TTL is worth the 2x write premium; if your cron fires every forty-five minutes against the same context, it almost certainly is. Put stable material — system prompts, large reference documents, the contents of files you keep re-reading — early in the conversation, because the cache works on prefixes and anything before the breakpoint is what gets reused. And do not be afraid of long sessions. The instinct to start over and "clean up" the context is almost always the wrong move economically; a hundred-thousand-token conversation that has been hot for an hour is one of the cheapest things you can do with this tool, not one of the most expensive.
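For readers driving the API directly rather than through Claude Code, here is a hedged sketch of what opting into the one-hour TTL looks like. The payload shape follows Anthropic's published cache_control format, but the "ttl": "1h" field originally shipped behind a beta header and the model name is purely illustrative, so check the current documentation before relying on either detail.

```python
# A sketch of a Messages API payload that puts stable material (a large
# reference document) in the cached prefix and requests the one-hour
# ephemeral TTL on its breakpoint. The model name is illustrative, and
# the "ttl": "1h" field launched behind a beta header -- verify against
# current Anthropic docs before use.

def build_request(reference_doc: str, question: str) -> dict:
    """Assemble a request with the stable prefix first, per the advice above."""
    return {
        "model": "claude-sonnet-4-5",  # illustrative model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": reference_doc,  # stable material: cache it
                "cache_control": {"type": "ephemeral", "ttl": "1h"},
            }
        ],
        "messages": [{"role": "user", "content": question}],
    }

payload = build_request("...large stable reference document...",
                        "Summarize today's changes.")
# With the official SDK this would be sent as client.messages.create(**payload);
# the response's usage block then reports cache-read vs cache-creation tokens,
# which is how you verify the prefix is actually being reused.
print(payload["system"][0]["cache_control"])
```

The ordering matters more than the annotation: because caching works on prefixes, the large stable document goes in the system block at the top, and only the small, changing question rides in the message tail.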
The deeper point, and the one DeLong gestures at without quite naming, is that caching is what makes the eccentric-roommate metaphor economically coherent. A roommate you have to re-introduce yourself to every morning is not a roommate, it is a stranger you keep meeting. Caching is the substrate that lets the relationship persist across turns at a price that does not bankrupt either party. The token tsunami is real, but so is the seawall, and most of the people complaining about the flood have not noticed that the seawall is already built into the tool they are using.
Bottom line in three sentences. Claude Code's TUI already does prompt caching automatically, dropping repeated context to ten percent of base input price, which is why DeLong's eight-hour Isaac576Bot run cost eighty dollars instead of eight hundred. The cheapest workflow shape is a long, warm, looping session against a stable prefix — keep sessions hot, put stable material early, and consider one-hour TTL for cron-driven agents that fire less often than every five minutes. The token tsunami is real, but caching is the seawall, and it is on by default for anyone using Claude Code rather than rolling their own API calls.