
@mergeconflict
Created March 15, 2026 17:56

context-mode analysis

What context-mode actually is

It's two things working together:

  1. A Claude Code hook system (PreToolUse/PostToolUse) that intercepts and redirects your normal tool calls
  2. An MCP server providing "sandbox" execution tools with output truncation and an FTS5 search index

The claimed problem

"MCP tool calls return large outputs into the context window, burning through tokens." They cite 143K tokens consumed by 81+ tools before a user message, and 40% context loss after 30 minutes.

How it actually works

The hook interception layer

The PreToolUse hook in routing.mjs does the following:

  • curl/wget in Bash: Replaces the command entirely with an echo message telling Claude to use context-mode's tool instead
  • WebFetch: Flat-out denies the call with a message redirecting to ctx_fetch_and_index
  • Gradle/Maven builds: Replaces with echo redirect
  • Read: Injects XML "guidance" once per session nudging Claude toward ctx_execute_file
  • Grep: Same — nudge toward sandbox
  • Agent/Task: Appends a ~1.5KB XML routing block to every subagent prompt, instructing it to use context-mode tools and keep responses under 500 words
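The interception pattern above can be sketched as a pure routing function. This is my sketch, not routing.mjs's actual code: the `ToolCall`/`Decision` shapes and field names are assumptions, the hook's stdin/stdout plumbing is omitted, and only the WebFetch-deny and curl/wget-replace rules are shown.

```typescript
// Sketch of the routing decisions described above, as a pure function.
// The type names and decision shape are assumptions for illustration.
type ToolCall = { tool: string; input: Record<string, unknown> };
type Decision =
  | { kind: "allow" }
  | { kind: "deny"; reason: string }
  | { kind: "replace"; newInput: Record<string, unknown> };

function route(call: ToolCall): Decision {
  // WebFetch is denied outright, with a redirect message.
  if (call.tool === "WebFetch") {
    return { kind: "deny", reason: "Use ctx_fetch_and_index instead." };
  }
  // curl/wget in Bash: the command is swapped for an echo telling
  // Claude to use context-mode's fetch tool instead.
  if (call.tool === "Bash") {
    const cmd = String(call.input.command ?? "");
    if (/\b(curl|wget)\b/.test(cmd)) {
      return {
        kind: "replace",
        newInput: { command: `echo "Use ctx_fetch_and_index instead of: ${cmd}"` },
      };
    }
  }
  return { kind: "allow" };
}
```

From Claude's side, the replace case is invisible: the model believes it ran `curl` and instead receives the echo's output.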

The MCP tools

  • ctx_execute — Runs code in a subprocess. Applies smartTruncate, which keeps the first 60% and the last 40% of a byte budget and drops the middle. If an intent parameter is provided and output exceeds 5KB, it indexes the output into SQLite FTS5 and returns only section titles plus one-line previews.
  • ctx_execute_file — Same, but reads a file into a FILE_CONTENT variable first. Claude writes a script to "summarize" the file, and only the script's stdout enters context.
  • ctx_index / ctx_search — FTS5 BM25 knowledge base. Index content, then search it later.
  • ctx_fetch_and_index — Fetch a URL, index it, return a summary.

Why the "98% savings" claim is misleading

The benchmarks compare raw data size vs the summary that context-mode returns. This is a meaningless comparison. Here's why:

Access log (500 req): 45.1 KB → 155 B (100% saved) — The LLM writes a script like console.log(lines.length + " requests"). Of course the summary is smaller. You threw away the data. You could do the same with wc -l access.log in Bash without any special tool.

Analytics CSV (500 rows): 85.5 KB → 222 B (100% saved) — Same trick. The script computes aggregates. The raw CSV was never a sensible thing to put in context, with or without this tool.

GitHub Issues (20): 58.9 KB → 1.1 KB (98% saved) — You get titles and counts. But if you actually need to read an issue's content (which is usually why you fetched issues), you call ctx_search, and context starts accumulating again. That follow-up cost is never benchmarked.

The fundamental sleight of hand: The tool doesn't "save" context — it discards information and returns summaries. Measuring "savings" as 1 - (summary_size / raw_size) is like saying "I saved 99% of the book by reading only the table of contents." The real question is whether the LLM can do its job with only the summary. When it can't, it makes additional search calls, each consuming context. The benchmark never measures this round-trip cost.
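The arithmetic is easy to reproduce with the gist's own GitHub Issues numbers. The 4 KB follow-up figure below is a hypothetical illustration of a single ctx_search result, not a benchmarked value.

```typescript
// The "savings" metric the benchmarks use: 1 - (in-context bytes / raw bytes).
const savings = (rawBytes: number, inContextBytes: number) =>
  1 - inContextBytes / rawBytes;

const raw = 58.9 * 1024;     // GitHub Issues benchmark: 58.9 KB raw
const summary = 1.1 * 1024;  // 1.1 KB summary returned

// Headline number, summary only.
console.log((savings(raw, summary) * 100).toFixed(1) + "%"); // → "98.1%"

// One hypothetical follow-up ctx_search pulling 4 KB back into context.
const followUp = 4 * 1024;
console.log((savings(raw, summary + followUp) * 100).toFixed(1) + "%"); // → "91.3%"
```

Every additional round trip moves the effective number further from the headline, which is exactly the cost the benchmarks never measure.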

Other concerns

The "sandbox" isn't one

The subprocess environment in executor.ts passes through: GH_TOKEN, GITHUB_TOKEN, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_SESSION_TOKEN, KUBECONFIG, DOCKER_HOST, SSH_AUTH_SOCK, NPM_TOKEN, and dozens more. It's just child_process.spawn with a curated env. Calling this "sandboxed" is generous — it's a subprocess with truncated output.
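What "sandboxed" amounts to, per the description above, is spawning a child process with an allowlisted environment. The sketch below is mine, not executor.ts's code; the allowlist is a subset of the variables the gist says are passed through.

```typescript
import { spawnSync } from "node:child_process";

// Subset of the env vars the gist says executor.ts passes through.
const PASSTHROUGH = [
  "GH_TOKEN", "GITHUB_TOKEN",
  "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN",
  "KUBECONFIG", "DOCKER_HOST", "SSH_AUTH_SOCK", "NPM_TOKEN",
];

function runInSoCalledSandbox(cmd: string, args: string[]): string {
  const env: Record<string, string> = {};
  for (const key of PASSTHROUGH) {
    const v = process.env[key];
    if (v !== undefined) env[key] = v; // real credentials flow straight in
  }
  // No namespace, no seccomp, no filesystem isolation — just a child
  // process with a curated environment and (elsewhere) truncated output.
  const res = spawnSync(cmd, args, { env, encoding: "utf8" });
  return res.stdout ?? "";
}
```

Any code the LLM writes inside this "sandbox" can read those tokens and reach the network with them.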

The hook system is aggressive and opaque

It silently replaces Bash commands, denies WebFetch, and injects instructions into every subagent prompt. The user doesn't see this happening — the hook runs before the tool executes. If Claude tries curl https://api.example.com/data, the command is silently replaced with an echo saying "use context-mode instead."

The self-heal code is concerning

The pretooluse.mjs hook contains a "self-heal" block that silently rewrites ~/.claude/plugins/installed_plugins.json and ~/.claude/settings.json to update paths and versions. A Claude Code hook modifying Claude Code's own configuration files is a trust boundary violation.

Context overhead from context-mode itself

  • 9 additional MCP tool schemas added to every conversation
  • The ~1.5KB ROUTING_BLOCK XML injected into every Agent/Task prompt
  • Guidance nudges injected on first Read/Grep/Bash calls
  • Additional tool calls (execute + search vs. a single read) adding per-call overhead

License misrepresentation

The website says "Open Source" but the license is Elastic License 2.0, which the OSI does not recognize as open source. It restricts competitive use.

What IS genuinely useful

  1. smartTruncate (head 60% + tail 40%) — Keeping the beginning and end of output is a good pattern for logs/errors. This is ~50 lines of code and the most practically valuable idea.

  2. The FTS5 index-then-search pattern — For genuinely large documents (API docs, long READMEs), indexing and searching on demand is sound. But the "savings" depend entirely on how many searches you need.

  3. The general principle — "Don't dump 85KB of CSV into context" is good advice. But you don't need a tool for that; you need a | head -20 or | wc -l.
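The index-then-search pattern from item 2 can be sketched with a toy in-memory index standing in for SQLite FTS5; naive term-frequency scoring replaces BM25, and the class and method names are my inventions, not ctx_index/ctx_search's API.

```typescript
type Doc = { id: number; title: string; body: string };

// Toy stand-in for the FTS5 knowledge base: index documents, search later.
class TinyIndex {
  private docs: Doc[] = [];
  private nextId = 1;

  index(title: string, body: string): number {
    const id = this.nextId++;
    this.docs.push({ id, title, body });
    return id;
  }

  // Rank by naive term-frequency instead of BM25.
  search(query: string, limit = 3): { title: string; score: number }[] {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    return this.docs
      .map((d) => {
        const text = (d.title + " " + d.body).toLowerCase();
        const score = terms.reduce((n, t) => n + (text.split(t).length - 1), 0);
        return { title: d.title, score };
      })
      .filter((r) => r.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
}
```

Note the economics: indexing keeps content out of context, but every search call's results flow back in. The pattern pays off only when searches are few and narrow.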

Bottom line

Context-mode takes a reasonable observation (don't waste context on raw data dumps) and wraps it in an aggressive interception layer with inflated benchmarks. The "98% savings" number is measuring summary compression ratios, not actual context efficiency in real workflows. The tool adds complexity, latency, additional tool schemas, and injected prompt overhead — costs that the benchmarks never account for.
