It's two things working together:
- A Claude Code hook system (`PreToolUse`/`PostToolUse`) that intercepts and redirects your normal tool calls
- An MCP server providing "sandbox" execution tools with output truncation and an FTS5 search index
The project's pitch: "MCP tool calls return large outputs into the context window, burning through tokens." The authors cite 143K tokens consumed by 81+ tools before a single user message, and 40% context loss after 30 minutes.
The PreToolUse hook in `routing.mjs` does the following:
- `curl`/`wget` in Bash: Replaces the command entirely with an `echo` message telling Claude to use context-mode's tool instead
- WebFetch: Flat-out denies the call with a message redirecting to `ctx_fetch_and_index`
- Gradle/Maven builds: Replaces with an `echo` redirect
- Read: Injects XML "guidance" once per session nudging Claude toward `ctx_execute_file`
- Grep: The same nudge, toward the sandbox
- Agent/Task: Appends a ~1.5KB XML routing block to every subagent prompt, instructing it to use context-mode tools and keep responses under 500 words
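The Bash interception reduces to a small routing function. The sketch below models only the decision itself; the function names, message text, and patterns are hypothetical, not the plugin's actual code (a real PreToolUse hook receives the pending tool call as JSON on stdin and emits its verdict as JSON on stdout):

```javascript
// Hypothetical sketch of the curl/wget/build interception described above.
const REDIRECT_MSG =
  "echo 'context-mode: use ctx_fetch_and_index instead of curl/wget'";

// Returns a replacement command, or null to let the original through.
function routeBashCommand(command) {
  if (/^\s*(curl|wget)\b/.test(command)) return REDIRECT_MSG;
  if (/\b(gradlew?|mvn)\b/.test(command)) {
    return "echo 'context-mode: run builds through ctx_execute'";
  }
  return null; // pass through untouched
}
```

Because the hook rewrites the tool input before it executes, Claude sees only the echo's output; the original command never runs, which is why the substitution is invisible to the user.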
- `ctx_execute`: Runs code in a subprocess. Applies `smartTruncate` (keep first 60% + last 40% by bytes). If an `intent` parameter is provided and output exceeds 5KB, it indexes the output into SQLite FTS5 and returns only section titles plus one-line previews.
- `ctx_execute_file`: Same, but reads a file into a `FILE_CONTENT` variable first. Claude writes a script to "summarize" the file, and only the script's stdout enters context.
- `ctx_index` / `ctx_search`: FTS5 BM25 knowledge base. Index content, then search it later.
- `ctx_fetch_and_index`: Fetch a URL, index it, return a summary.
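The head-plus-tail truncation is simple enough to sketch. Assumptions here: the 60/40 split applies to a byte budget, and the marker text is invented; this is not the plugin's actual `smartTruncate`:

```javascript
// Hypothetical sketch of head+tail truncation: keep the first 60% and
// last 40% of a byte budget, with a marker in between.
function smartTruncate(text, maxBytes = 8192) {
  const buf = Buffer.from(text, "utf8");
  if (buf.length <= maxBytes) return text;
  const headBytes = Math.floor(maxBytes * 0.6); // first 60% of the budget
  const tailBytes = maxBytes - headBytes;       // last 40% of the budget
  // Note: slicing on byte boundaries can split a multibyte character.
  const head = buf.subarray(0, headBytes).toString("utf8");
  const tail = buf.subarray(buf.length - tailBytes).toString("utf8");
  return head + `\n... [${buf.length - maxBytes} bytes truncated] ...\n` + tail;
}
```

The split is sensible for command output: the head usually holds the banner and setup, the tail holds the final error or summary line.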
The benchmarks compare raw data size vs the summary that context-mode returns. This is a meaningless comparison. Here's why:
Access log (500 req): 45.1 KB → 155 B (100% saved) — The LLM writes a script like console.log(lines.length + " requests"). Of course the summary is smaller. You threw away the data. You could do the same with wc -l access.log in Bash without any special tool.
Analytics CSV (500 rows): 85.5 KB → 222 B (100% saved) — Same trick. The script computes aggregates. The raw CSV was never a sensible thing to put in context, with or without this tool.
GitHub Issues (20): 58.9 KB → 1.1 KB (98% saved) — You get titles and counts. But if you actually need to read an issue's content (which is usually why you fetched issues), you call ctx_search, and context starts accumulating again. That follow-up cost is never benchmarked.
The fundamental sleight of hand: The tool doesn't "save" context — it discards information and returns summaries. Measuring "savings" as 1 - (summary_size / raw_size) is like saying "I saved 99% of the book by reading only the table of contents." The real question is whether the LLM can do its job with only the summary. When it can't, it makes additional search calls, each consuming context. The benchmark never measures this round-trip cost.
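Concretely, using the access-log row from the benchmark above (the 4KB follow-up result size is a hypothetical illustration, since the benchmark never reports round-trip sizes):

```javascript
// "Savings" as benchmarked: 1 - summary/raw. Raw and summary sizes are
// from the access-log row; the follow-up size is a hypothetical.
const raw = 45.1 * 1024;   // 45.1 KB of log data
const summary = 155;       // bytes returned to context
const benchmarked = 1 - summary / raw;            // ~0.997, the "100% saved"

// One ctx_search round-trip to actually read what was summarized away:
const followUp = 4 * 1024; // hypothetical 4 KB search result
const effective = 1 - (summary + followUp) / raw; // drops to ~0.91
```

Every further search erodes the number more; the metric only looks impressive because it stops counting at the summary.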
The subprocess environment in executor.ts passes through: GH_TOKEN, GITHUB_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, KUBECONFIG, DOCKER_HOST, SSH_AUTH_SOCK, NPM_TOKEN, and dozens more. It's just child_process.spawn with a curated env. Calling this "sandboxed" is generous — it's a subprocess with truncated output.
It silently replaces Bash commands, denies WebFetch, and injects instructions into every subagent prompt. The user never sees this happening because the hook runs before the tool executes: if Claude tries `curl https://api.example.com/data`, the command is swapped for an echo saying "use context-mode instead."
The pretooluse.mjs hook contains a "self-heal" block that silently rewrites ~/.claude/plugins/installed_plugins.json and ~/.claude/settings.json to update paths and versions. A Claude Code hook modifying Claude Code's own configuration files is a trust boundary violation.
- 9 additional MCP tool schemas added to every conversation
- The ~1.5KB `ROUTING_BLOCK` XML injected into every Agent/Task prompt
- Guidance nudges injected on first Read/Grep/Bash calls
- Additional tool calls (execute + search vs. a single read) adding per-call overhead
The website says "Open Source" but the license is Elastic License 2.0, which the OSI does not recognize as open source. It restricts competitive use.
- `smartTruncate` (head 60% + tail 40%): Keeping the beginning and end of output is a good pattern for logs/errors. This is ~50 lines of code and the most practically valuable idea.
- The FTS5 index-then-search pattern: For genuinely large documents (API docs, long READMEs), indexing and searching on demand is sound. But the "savings" depend entirely on how many searches you need.
- The general principle: "Don't dump 85KB of CSV into context" is good advice. But you don't need a tool for that; you need a `| head -20` or `| wc -l`.
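To make the search-count caveat concrete, here is a toy stand-in for the index-then-search pattern (naive keyword matching in memory instead of FTS5/BM25, with an invented byte-cost model; none of this is the plugin's code):

```javascript
// Toy index: each section keeps its title (cheap) and body (expensive).
function buildIndex(sections) {
  return sections.map(({ title, body }) => ({ title, body }));
}

function search(index, query) {
  const q = query.toLowerCase();
  return index.filter(s => s.body.toLowerCase().includes(q));
}

// Cost model: the initial summary is titles only, but every search that
// returns real content adds its bytes back into the context window.
function contextCost(index, queries) {
  let cost = index.map(s => s.title).join("\n").length; // initial summary
  for (const q of queries) {
    for (const hit of search(index, q)) cost += hit.body.length;
  }
  return cost;
}
```

After a handful of searches that return full sections, the "saved" bytes come right back; whether indexing wins depends entirely on how much of the document you end up reading.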
Context-mode takes a reasonable observation (don't waste context on raw data dumps) and wraps it in an aggressive interception layer with inflated benchmarks. The "98% savings" number is measuring summary compression ratios, not actual context efficiency in real workflows. The tool adds complexity, latency, additional tool schemas, and injected prompt overhead — costs that the benchmarks never account for.