Most software engineers are using Codex and Claude Code like they are opening a fresh chat window and hiring a new intern every time.
New thread. New prompt. Same repo. Same rediscovery tax.
The main agent has to re-learn the codebase, re-infer the architecture, and re-guess what matters. Then people wonder why results are inconsistent, slow, and fragile.
That is the wrong operating model.
The better setup is simple:
Keep one long-lived orchestrator thread per repo.
That thread does not write production code by default. Its job is to understand the codebase, maintain working memory, break work down, delegate to subagents, review results, and preserve the decisions that matter.
Implementation gets pushed to subagents with explicit scope and verification.
Once you do this, the agent stops starting from zero.
It starts compounding.
Codex is designed to adapt to project structure and conventions, and OpenAI explicitly recommends making guidance reusable through AGENTS.md instead of restating it manually. Claude Code similarly loads project memory at the start of conversations and supports custom subagents for specialized work. (OpenAI Developers)
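To make that concrete, here is a minimal sketch of what durable guidance in an AGENTS.md might look like. Every command, path, and convention below is illustrative, not from any real repo:

```markdown
# AGENTS.md

## Commands
- Build: `npm run build`
- Test: `npm test`
- Lint: `npm run lint`

## Conventions
- TypeScript strict mode; no `any` in new code.
- API handlers live in `src/api/`; shared logic in `src/lib/`.

## Fragile areas
- `src/billing/` has implicit ordering assumptions; do not refactor it without tests.
```

Once something like this exists, the agent reads it at the start of every session instead of rediscovering it.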
The default workflow looks like this:
“Add feature X.” “Fix bug Y.” “Refactor module Z.”
Each task starts in a fresh thread. The main agent does the exploration, the planning, the implementation, and the review all in one place.
That feels convenient, but it collapses four different jobs into one context: repo understanding, memory, execution, and quality control.
It also means the agent keeps paying the same setup cost over and over.
You keep re-explaining the repo. The agent keeps rediscovering boundaries. Small mistakes pile up because nothing durable gets normalized.
The problem is not model intelligence.
The problem is operating the model with no control plane.
Use the main thread as the repo orchestrator.
Its job is to:
- learn the architecture
- keep a living summary of conventions, decisions, and fragile areas
- decompose work into bounded tasks
- delegate implementation to subagents
- review what comes back
- update memory so future work gets easier
Then use subagents as scoped workers.
Each subagent should get a tight contract: what to do, what files it owns, what it must not touch, what conventions to follow, and how to verify the result before handing it back.
This is exactly where subagents shine. Claude Code’s subagent system is built around separate delegated contexts with their own descriptions and patterns, and OpenAI’s prompt guidance similarly emphasizes explicit output contracts, verification loops, and precise completion criteria for reliable agent performance. (Claude)
The point is not to create more prompts.
The point is to separate roles.
Your main agent should act like a tech lead. Your subagents should act like scoped implementers.
A long-lived orchestrator thread changes the economics of agentic coding.
Instead of spending tokens rediscovering the repo, the system spends tokens executing work against a growing internal map.
Instead of one giant, messy context, you get: durable repo guidance, living orchestrator memory, and clean task-level delegation.
Instead of the main thread becoming bloated and confused, it becomes more useful over time because it accumulates the exact things that matter: architecture, conventions, known pitfalls, and prior decisions.
Anthropic’s memory guidance explicitly notes that project memory is loaded at the start of each conversation and works best when it is specific and concise. OpenAI’s Codex guidance makes the same broader point: once a prompting pattern works, stop repeating it manually and encode it in AGENTS.md. (Claude)
This gives you four immediate gains:
The agent stops wasting time re-learning the same system.
Subagents get clearer prompts, narrower scope, and less room to wander.
Independent tasks can run in parallel without forcing one thread to juggle everything.
The orchestrator reviews returned work against scope, constraints, and verification instead of blindly trusting the first patch.
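The parallelism gain is the easiest to underestimate. When subagents have disjoint scopes, their tasks are independent and can be dispatched concurrently. A rough sketch of that dispatch pattern, where `run_subagent` is a stand-in for whatever invocation your tooling provides (Claude Code's subagent mechanism, a Codex CLI call) and all task details are hypothetical:

```python
# Sketch: dispatching independent, scoped tasks to subagents in parallel.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class SubagentTask:
    goal: str
    scope: list[str]        # files/directories the subagent may edit
    forbidden: list[str]    # out-of-bounds surfaces
    verify: str             # command the subagent must run before returning

def run_subagent(task: SubagentTask) -> dict:
    # Placeholder: in practice this sends the rendered contract to a fresh
    # subagent context and collects its deliverable. Not a real API.
    return {"goal": task.goal, "files_changed": task.scope, "verified": True}

tasks = [
    SubagentTask("Add retry logic to the HTTP client",
                 scope=["src/http/"], forbidden=["src/auth/"],
                 verify="pytest tests/http"),
    SubagentTask("Tighten input validation on the upload endpoint",
                 scope=["src/upload/"], forbidden=["src/http/"],
                 verify="pytest tests/upload"),
]

# Scopes are disjoint, so the tasks can run concurrently without conflicts.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_subagent, tasks))

for r in results:
    print(r["goal"], "->", "ok" if r["verified"] else "needs follow-up")
```

The key design choice is that disjoint `scope` sets are what make parallelism safe; the orchestrator's job is to carve the work so those sets do not overlap.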
The real upgrade is not “use a better prompt.”
It is this:
Stop using your main coding agent as a coder.
Use it as the control plane for the repo.
That is the shift.
Once the primary agent becomes a strategist, memory system, and dispatcher, everything else gets cleaner: planning, execution, reviews, follow-ups, and handoffs.
Even after discovering this pattern, people usually make one of two mistakes.
The first mistake is treating the long-lived thread itself as the memory system.
That is fragile. Chat history gets compacted, threads get lost, and nothing survives outside the session.
Some knowledge belongs in durable repo files, not in the thread: commands, conventions, architecture notes, testing rules, forbidden areas, and definition of done.
For Codex, that belongs in AGENTS.md. For Claude Code, that belongs in CLAUDE.md. Both are designed to carry reusable project instructions across sessions. (OpenAI Developers)
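A CLAUDE.md can carry the same categories of knowledge. A hedged sketch, with every rule and path invented for illustration:

```markdown
# CLAUDE.md

## Testing rules
- Every behavior change needs a test in the matching `tests/` subdirectory.
- Never mark a task done with failing tests.

## Forbidden areas
- Do not edit generated files under `src/generated/`.

## Definition of done
- Tests pass, lint is clean, and the change is summarized in the PR description.
```

The point is not the specific rules; it is that they live in the repo, where every future session inherits them for free.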
The second mistake is vague delegation: “Go fix this.” “Look into this.” “Refactor this area.”
That is not delegation. That is abdication.
Subagents need a real contract: goal, scope, constraints, verification, and return format.
Especially with smaller or faster models, explicitness matters more, not less. OpenAI’s prompt guidance calls this out directly: smaller models are less likely to infer missing steps, so prompts need clearer structure and completion criteria. (OpenAI Developers)
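Here is what a real contract looks like in practice. Every path, command, and detail below is hypothetical:

```markdown
## Goal
Fix the off-by-one pagination bug in the orders list endpoint.

## Scope
You may edit `src/api/orders.ts` and `tests/api/orders.test.ts` only.

## Do not touch
Anything under `src/billing/`, or the shared pagination utilities.

## Context
Pagination is cursor-based; follow the conventions in AGENTS.md.

## Constraints
No changes to the public response shape.

## Verification
Run `npm test -- orders` and include the output.

## Deliverable
Summary of changes, files changed, verification results, open questions.
```

Compare this with “go fix the pagination bug”: the subagent now knows exactly where it may work, what counts as done, and how to prove it.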
Here is the version I would actually use.
You are the orchestration agent for this repository.
Your default role in this thread is not to implement code directly.
Your job is to:
1. understand the repo,
2. maintain working memory,
3. break work into scoped tasks,
4. delegate implementation to subagents,
5. review returned work,
6. preserve important learnings in durable repo memory.
## Operating rules
- Do not write production code in this thread unless I explicitly ask.
- Do not treat a partial read of the repo as full understanding.
- Do not leave important project knowledge trapped only in chat history.
- Do not send vague subagent prompts.
## First step: structured repo survey
Start with a targeted survey of the codebase. Do not make changes yet.
Produce a compact report with:
- repo purpose
- entry points
- major modules and boundaries
- key data flows
- build, test, lint, and dev commands
- important dependencies and external services
- coding conventions and patterns
- fragile or high-risk areas
- unclear areas and open questions
- recommended durable memory files
When surveying, prioritize high-signal files first:
README, package manifests, workspace config, build config, test config, app entry points, routing/setup files, and core modules.
After the survey, recommend:
- what should live in AGENTS.md
- what should live in CLAUDE.md
- whether the repo also needs an orchestrator memory file
- what categories of work should be delegated to dedicated subagents
Do not implement anything until I confirm the survey.
## Memory policy
Maintain a living summary of:
- architecture
- module boundaries
- conventions
- commands
- fragile areas
- decisions we have made
- active workstreams
- common failure modes
Continuously normalize this into concise reusable notes.
For durable memory, prefer repo files:
- AGENTS.md for Codex repo instructions
- CLAUDE.md for Claude Code project instructions
- optional: docs/agent-memory/orchestrator.md for evolving operational context
## Delegation policy
When I give you a task, do not implement it here by default.
Instead:
1. clarify the task,
2. identify exact scope,
3. decide whether one or multiple subagents are needed,
4. generate explicit subagent prompts,
5. review returned work,
6. update memory with what was learned.
Use separate subagents when tasks differ by domain, ownership, or verification path.
## Subagent prompt template
Every subagent prompt must include:
### Goal
Precise description of the task.
### Scope
Exact files or directories it may edit.
### Do not touch
Files, modules, or surfaces that are out of bounds.
### Context
Relevant architecture, conventions, and prior decisions.
### Constraints
Compatibility requirements, style rules, performance limits, migration boundaries, and API contracts.
### Verification
Exact commands or checks to run.
If verification cannot be run, say so clearly.
### Deliverable
Return:
- summary of changes
- files changed
- verification results
- risks or follow-ups
- open questions
## Review policy
When subagents return:
- check whether they stayed in scope
- check whether constraints were followed
- check whether verification actually matches the change
- check for adjacent damage
- extract durable learnings
- decide whether follow-up work is needed
Do not rubber-stamp output.
## Done criteria
A task is done only when:
- the requested goal is met
- scope rules were respected
- verification was completed or clearly explained
- important learnings were recorded
- I have a clean summary of what changed and what remains
## Compaction and recovery
Assume chat context may eventually be compacted.
Before important context is lost, preserve:
- architecture summary
- conventions
- commands
- decisions
- fragile areas
- active workstreams
- reusable subagent templates
Optimize for continuity across long-running work, not just success on the current turn.

Most people are not bottlenecked by model quality.
They are bottlenecked by workflow quality.
If you keep restarting the agent from scratch, you are choosing amnesia. If you make the main thread do everything, you are choosing chaos. If you delegate with vague prompts, you are choosing rework.
The better pattern is straightforward:
persistent repo guidance, long-lived orchestration, scoped subagents, and explicit verification.
Once your main agent stops coding and starts orchestrating, the whole system gets sharper.
Less rediscovery. Better delegation. Cleaner reviews. More parallel work. More shipped per unit of attention.