This is not a proposal. It documents existing but hidden functionality found in the Claude Code v2.1.19 binary, plus speculation about how it could be used.

TeammateTool already exists in Claude Code. We extracted this from the compiled binary at ~/.local/share/claude/versions/2.1.19 using strings analysis. The feature appears fully implemented but is gated behind two feature flags (I9() && qFB()).
```shell
# Location
~/.local/share/claude/versions/2.1.19   # Mach-O 64-bit executable

# Extract strings mentioning TeammateTool
strings ~/.local/share/claude/versions/2.1.19 | grep -i "TeammateTool"

# Extract team_name references
strings ~/.local/share/claude/versions/2.1.19 | grep -i "team_name"
```

| Operation | Purpose |
|---|---|
| spawnTeam | Create a new team, become leader |
| discoverTeams | List available teams to join |
| requestJoin | Ask to join an existing team |
| approveJoin | Leader accepts a join request |
| rejectJoin | Leader declines a join request |
| write | Send message to specific teammate |
| broadcast | Send message to all teammates |
| requestShutdown | Ask a teammate to shut down |
| approveShutdown | Accept shutdown and exit |
| rejectShutdown | Decline shutdown, keep working |
| approvePlan | Leader approves teammate's plan |
| rejectPlan | Leader rejects plan with feedback |
| cleanup | Remove team directories |
"team_name is required for spawn operation. Either provide team_name in input
or call spawnTeam first to establish team context."
"team_name is required for broadcast operation. Either provide team_name in input,
set CLAUDE_CODE_TEAM_NAME, or create a team with spawnTeam first."
"proposed_name is required for requestJoin operation."
"does not exist. Call spawnTeam first to create the team."
| Variable | Purpose |
|---|---|
| CLAUDE_CODE_TEAM_NAME | Current team context |
| CLAUDE_CODE_AGENT_ID | Agent identifier |
| CLAUDE_CODE_AGENT_NAME | Agent display name |
| CLAUDE_CODE_AGENT_TYPE | Agent role/type |
| CLAUDE_CODE_PLAN_MODE_REQUIRED | Whether plan approval is needed |
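If these variables work the way their names suggest, a spawned teammate would inherit its identity from its environment. A minimal sketch of that handoff (all values here are illustrative, and whether the binary actually reads them this way is unconfirmed):

```shell
# Hypothetical environment for a spawned teammate (values are made up).
export CLAUDE_CODE_TEAM_NAME="pr-review-1588"
export CLAUDE_CODE_AGENT_ID="agent-001"
export CLAUDE_CODE_AGENT_NAME="security-sentinel"
export CLAUDE_CODE_AGENT_TYPE="reviewer"
export CLAUDE_CODE_PLAN_MODE_REQUIRED="true"

# A teammate process could then recover its team context on startup:
echo "Agent ${CLAUDE_CODE_AGENT_NAME} (${CLAUDE_CODE_AGENT_TYPE}) in team ${CLAUDE_CODE_TEAM_NAME}"
```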
```js
isEnabled() {
  return I9() && qFB() // Two feature flags must be true
}
```

| Backend | Terminal | Use Case |
|---|---|---|
| iTerm2 split panes | Native macOS | Visual side-by-side agents |
| tmux windows | Cross-platform | Server/headless |
| In-process | None | Same process, fastest |
```
~/.claude/
├── teams/
│   └── {team-name}/
│       ├── config.json      # Team metadata, members
│       ├── messages/        # Inter-agent mailbox
│       └── {session-id}/
└── tasks/
    └── {team-name}/         # Team-scoped tasks
        ├── 1.json
        └── ...
```
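Because this layout is plain files, the coordination model can be simulated by hand. A sketch that recreates it under a scratch directory (the JSON field names are guesses for illustration, not extracted from the binary):

```shell
# Recreate the coordination layout under a temp root (not ~/.claude,
# to avoid touching a real installation). config.json fields are assumed.
ROOT=$(mktemp -d)
TEAM="demo-team"

mkdir -p "$ROOT/teams/$TEAM/messages" "$ROOT/tasks/$TEAM"

cat > "$ROOT/teams/$TEAM/config.json" <<'EOF'
{"name": "demo-team", "leader": "agent-1", "members": ["agent-1", "agent-2"]}
EOF

# A "write" operation could be as simple as dropping JSON into the mailbox:
cat > "$ROOT/teams/$TEAM/messages/msg-001.json" <<'EOF'
{"from": "agent-1", "to": "agent-2", "body": "Frontend is using /auth/callback"}
EOF

ls "$ROOT/teams/$TEAM"
```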
Everything below is speculation based on how the API could be used once enabled.
Scenario: You open a PR and want thorough review from multiple perspectives.

You: "Review PR #1588 with a full team"

```
Claude (Leader):
├── spawnTeam("pr-review-1588")
├── spawn("security-sentinel", prompt="Review for vulnerabilities")
├── spawn("performance-oracle", prompt="Check for N+1 queries, memory leaks")
├── spawn("rails-expert", prompt="Check Rails conventions")
└── spawn("test-coverage", prompt="Verify test coverage is adequate")

[All agents work in parallel, each in their own iTerm2 pane]

Leader polls for completion, aggregates findings:
├── broadcast("Wrap up, send your findings")
├── [Collects responses via inbox]
├── requestShutdown("security-sentinel")
├── requestShutdown("performance-oracle")
├── ...
└── cleanup()
```

Leader: "Here's the consolidated review with 3 critical, 5 moderate findings..."

What you'd see: 5 terminal panes, each showing a different agent working. The leader coordinates and synthesizes.
Scenario: Build a complete feature with specialized agents for each layer.

You: "Build user authentication with OAuth"

```
Claude (Leader):
└── spawnTeam("auth-feature")

Phase 1 - Planning:
├── spawn("architect", prompt="Design the OAuth flow", plan_mode_required=true)
├── [architect creates plan, sends plan_approval_request]
└── approvePlan("architect", request_id="...")

Phase 2 - Implementation (parallel):
├── spawn("backend-dev", prompt="Implement OAuth controller and models")
├── spawn("frontend-dev", prompt="Build login UI components")
└── spawn("test-writer", prompt="Write integration tests", blockedBy=["backend-dev"])

Phase 3 - Integration:
├── write("backend-dev", "Frontend is using /auth/callback endpoint")
└── write("frontend-dev", "Backend expects redirect_uri param")

Phase 4 - Verification:
├── spawn("qa-agent", prompt="Run full test suite and verify flow")
└── broadcast("QA found issues in session handling, please fix")

Phase 5 - Shutdown:
├── requestShutdown("backend-dev")
│   └── [backend-dev]: approveShutdown()                           // Done with work
├── requestShutdown("frontend-dev")
│   └── [frontend-dev]: rejectShutdown(reason="Still fixing CSS")  // Not done
└── [Leader waits, retries later]
```

The magic: agents communicate, block on dependencies, and the leader orchestrates without micromanaging.
Scenario: A production bug needs investigation from multiple angles.

You: "Users report checkout fails intermittently"

```
Claude (Leader):
└── spawnTeam("bug-hunt-checkout")

Investigation (parallel):
├── spawn("log-analyst", prompt="Search AppSignal for checkout errors")
├── spawn("code-archaeologist", prompt="git log -p on checkout paths")
├── spawn("reproducer", prompt="Try to reproduce in test environment")
└── spawn("db-detective", prompt="Check for data anomalies in orders table")

[Agents work independently, report findings to leader]

log-analyst → write("team-lead", "Found timeout errors correlating with 3rd party API")
code-archaeologist → write("team-lead", "Recent change to retry logic looks suspicious")
reproducer → write("team-lead", "Reproduced! Happens when API returns 503")

Leader synthesizes:
├── "Root cause: retry logic doesn't handle 503 correctly.
│    code-archaeologist, please prepare a fix."
├── write("code-archaeologist", "Implement exponential backoff for 503 responses")
├── [Fix implemented, verified, PR created]
├── broadcast("Bug fixed, shutting down")
└── cleanup()
```
Scenario: Large refactoring with automatic work distribution.

You: "Refactor all service objects to use the new BaseService pattern"

```
Claude (Leader):
└── spawnTeam("service-refactor")

Discovery:
├── spawn("scout", prompt="Find all service objects that need refactoring")
└── [scout returns list of 47 services]

Work Distribution:
├── Creates 47 tasks with TaskCreate
├── spawn("worker-1", prompt="Refactor services, claim tasks from list")
├── spawn("worker-2", prompt="Refactor services, claim tasks from list")
└── spawn("worker-3", prompt="Refactor services, claim tasks from list")

[Workers autonomously claim tasks via TaskUpdate]
worker-1: TaskUpdate(taskId="12", status="in_progress", owner="worker-1")
worker-2: TaskUpdate(taskId="7", status="in_progress", owner="worker-2")
[If worker-1 crashes, heartbeat timeout releases its task]
[worker-3 claims the abandoned task]

Verification:
├── spawn("verifier", prompt="Run tests after each refactored service")
└── [verifier monitors completed tasks, runs tests]

[All 47 tasks complete]
├── broadcast("All services refactored, final test run passing")
└── cleanup()
```
Key insight: Workers self-organize around a shared task queue. No central assignment needed.
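One plausible way workers could claim tasks without a central assigner is an atomic rename over the shared tasks directory: whichever worker wins the mv owns the task. A sketch of that idea (the filenames and claim convention are invented for illustration; the binary's actual mechanism is unknown):

```shell
# Sketch: claim-by-rename over a shared task directory. mv within the same
# filesystem is atomic, so two workers cannot both claim the same task.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/tasks/service-refactor"
echo '{"id": 12, "status": "pending"}' > "$ROOT/tasks/service-refactor/12.json"

claim_task() {
  local worker="$1" task="$2"
  if mv "$ROOT/tasks/service-refactor/$task.json" \
        "$ROOT/tasks/service-refactor/$task.$worker.claimed" 2>/dev/null; then
    echo "$worker claimed task $task"
  else
    echo "$worker lost the race for task $task"
  fi
}

claim_task worker-1 12   # prints: worker-1 claimed task 12
claim_task worker-2 12   # prints: worker-2 lost the race for task 12
```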
Scenario: Evaluate multiple technical approaches before committing.

You: "Should we use Redis or PostgreSQL for our job queue?"

```
Claude (Leader):
├── spawnTeam("tech-evaluation")
├── spawn("redis-advocate", prompt="Make the case FOR Redis. Research benchmarks, patterns.")
├── spawn("postgres-advocate", prompt="Make the case FOR PostgreSQL. Research benchmarks, patterns.")
├── spawn("devil-advocate", prompt="Find problems with BOTH approaches in our context.")
└── spawn("cost-analyst", prompt="Compare operational costs, hosting, maintenance.")

[Each agent researches independently]

Debate Phase:
└── broadcast("Present your findings. Respond to each other's points.")

redis-advocate → broadcast("Redis is 10x faster for queue operations")
postgres-advocate → broadcast("But we already run Postgres, no new infrastructure")
devil-advocate → broadcast("Redis advocate ignores connection pool limits")
cost-analyst → broadcast("Redis adds $200/mo, Postgres is free")

Leader synthesizes:
├── "Recommendation: Use PostgreSQL with SKIP LOCKED pattern.
│    Redis performance benefits don't justify operational complexity
│    for our 10k jobs/day scale."
└── cleanup()
```
Scenario: Automated pre-deployment verification with multiple checkpoints.

You: "Deploy to production with full verification"

```
Claude (Leader):
└── spawnTeam("deploy-2026-01-23")

Pre-flight (parallel, all must pass):
├── spawn("test-runner", prompt="Run full test suite")
├── spawn("security-scan", prompt="Run Brakeman and bundler-audit")
├── spawn("migration-check", prompt="Verify migrations are safe and reversible")
└── spawn("perf-baseline", prompt="Capture current performance metrics")

[All agents must approveShutdown before proceeding]

Gate Check:
├── IF any agent rejectShutdown with failures → abort deployment
└── ELSE proceed

Deploy:
└── spawn("deployer", prompt="Run cap production deploy")

Post-deploy (parallel):
├── spawn("smoke-tester", prompt="Hit critical endpoints, verify responses")
├── spawn("perf-compare", prompt="Compare metrics to baseline")
└── spawn("log-watcher", prompt="Monitor for error spikes for 5 minutes")

[If any post-deploy check fails]
├── broadcast("ROLLBACK REQUIRED")
└── spawn("rollback-agent", prompt="Execute rollback procedure")

Success:
├── "Deployment complete. All checks passed."
└── cleanup()
```
Scenario: Keep documentation in sync with code changes automatically.

You: "Update all docs affected by the API changes in this PR"

```
Claude (Leader):
└── spawnTeam("docs-sync")

Analysis:
├── spawn("change-detector", prompt="Identify all API changes in PR #1590")
└── [Returns: 3 new endpoints, 2 modified, 1 deprecated]

Documentation (parallel):
├── spawn("api-docs", prompt="Update OpenAPI spec for changed endpoints")
├── spawn("readme-updater", prompt="Update README examples")
├── spawn("changelog-writer", prompt="Add changelog entry")
└── spawn("migration-guide", prompt="Write migration guide for deprecated endpoint")

Review:
├── spawn("docs-reviewer", prompt="Check all doc changes for accuracy and style")
├── [reviewer sends feedback via write() to specific agents]
└── cleanup()
```
Scenario: Work on a massive codebase that exceeds context limits.

You: "Understand this entire 500-file codebase and answer questions"

```
Claude (Leader):
└── spawnTeam("codebase-brain")

Specialists (each handles a domain):
├── spawn("models-expert", prompt="Become expert on app/models/")
├── spawn("controllers-expert", prompt="Become expert on app/controllers/")
├── spawn("services-expert", prompt="Become expert on app/services/")
├── spawn("jobs-expert", prompt="Become expert on app/jobs/")
└── spawn("tests-expert", prompt="Become expert on test/")

[Each agent reads and indexes their domain]

Query Routing:
You: "How does user authentication work?"
Leader:
├── broadcast("Who knows about authentication?")
├── controllers-expert: "I handle SessionsController"
├── models-expert: "I handle User model with has_secure_password"
└── services-expert: "I handle AuthenticationService"
Leader:
├── write("controllers-expert", "Explain the login flow")
├── write("models-expert", "Explain the User auth methods")
├── write("services-expert", "Explain AuthenticationService")
└── [Synthesizes responses]

[Team persists across questions - no re-reading needed]
```
The breakthrough: Each agent maintains context for their domain. Combined, they "know" the entire codebase.
Leader creates team → Leader spawns workers → Workers report to leader → Leader synthesizes
Most common. One orchestrator, multiple specialists.
Leader creates team + tasks → Workers self-assign from task queue → Leader monitors
For embarrassingly parallel work. Workers are interchangeable.
Agent A (blockedBy: []) → Agent B (blockedBy: [A]) → Agent C (blockedBy: [B])
Sequential processing with handoffs. Each agent waits for predecessor.
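A blockedBy dependency could reduce to waiting on a sentinel file from the predecessor, consistent with the file-based coordination above. A sketch under that assumption (the sentinel name and polling loop are illustrative, not the binary's actual protocol):

```shell
# Sketch: pipeline handoff via a sentinel file. Agent A writes its output
# and a done-marker; agent B polls until the marker appears.
ROOT=$(mktemp -d)

# Simulate agent A finishing shortly, in the background.
( sleep 0.2; echo "agent A output" > "$ROOT/a.done" ) &

# Agent B blocks until its predecessor's sentinel exists.
until [ -f "$ROOT/a.done" ]; do sleep 0.1; done
echo "agent B starting with: $(cat "$ROOT/a.done")"
wait
```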
Multiple agents with same task → Each proposes solution → Leader picks best
For decisions where you want diverse perspectives.
Worker agent does task → Watcher agent monitors → Watcher can trigger rollback
For critical operations needing safety checks.
| Failure Mode | How System Handles It |
|---|---|
| Agent crashes mid-task | Heartbeat timeout (5min) releases task |
| Leader crashes | Workers complete current work, then idle |
| Infinite loop in agent | requestShutdown β timeout β force kill |
| Deadlocked dependencies | Cycle detection at task creation |
| Agent refuses shutdown | Timeout β forced termination |
| Resource exhaustion | Max agents per team limit |
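The heartbeat row in the table could be implemented with nothing fancier than file modification times. A sketch using GNU coreutils and findutils (the 5-minute window comes from the table; the heartbeat-file convention is assumed):

```shell
# Sketch: release tasks whose owner's heartbeat file is older than 5 minutes.
ROOT=$(mktemp -d)
mkdir -p "$ROOT/heartbeats"

touch "$ROOT/heartbeats/worker-1"                       # fresh heartbeat
touch -d '10 minutes ago' "$ROOT/heartbeats/worker-2"   # stale heartbeat (GNU touch)

# find -mmin +5 matches files modified more than 5 minutes ago.
find "$ROOT/heartbeats" -type f -mmin +5 | while read -r dead; do
  echo "releasing tasks owned by $(basename "$dead")"
done
```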
Confirm this exists on your system:

```shell
# Check Claude Code version
claude --version

# Find TeammateTool references
strings ~/.local/share/claude/versions/$(claude --version | cut -d' ' -f1) \
  | grep "TeammateTool" | head -5

# Find all operations
strings ~/.local/share/claude/versions/$(claude --version | cut -d' ' -f1) \
  | grep -E "spawnTeam|discoverTeams|requestJoin|approveJoin" | head -20

# Find environment variables
strings ~/.local/share/claude/versions/$(claude --version | cut -d' ' -f1) \
  | grep "CLAUDE_CODE_TEAM" | head -10
```

The future of Claude Code is multi-agent. The infrastructure exists:
- 13 TeammateTool operations
- File-based coordination
- Three spawn backends
- Inter-agent messaging
- Plan approval workflows
- Graceful shutdown protocol
It's waiting behind feature flags. When enabled, we'll see:
- Code review swarms
- Feature development teams
- Self-organizing refactors
- Research councils
- Deployment guardians
- Distributed codebase understanding
The primitives are there. The creativity is up to us.
Analysis: 2026-01-23 · Claude Code: v2.1.19 · Binary: ~/.local/share/claude/versions/2.1.19
