From: Andy Nutter-Upham <andy@wi.mit.edu>
Date: Fri, Mar 13, 3:30 PM
To: Andy, Craig, kevinkevinccc, Na, Scott, yanjun.shao
Subject: An idea for testing a multi-persona agentic review

Hello Everyone,
Lovely meeting you all. I'm looking forward to this project. As discussed, here is a little write-up of how I envision us trying this out in its simplest form in Claude Code.

Claude Code has an experimental feature called Agent Teams, which allows sub-agents with their own contexts to be spawned, with message-sending capabilities between them. Claude Code also supports the /rewind command, which is very much in line with the checkpointing that was discussed, though how usable it is in the sub-agent context is unclear to me.
https://code.claude.com/docs/en/agent-teams
(Diagram comparing subagent and agent team architectures: subagents are spawned by the main agent, do work, and report results back; agent teams coordinate through a shared task list, with teammates communicating directly with each other.)
Ben, the Anthropic representative, mentioned their teams use three personas for reviews: Planner, Implementer, and Reviewer.
Scott and I have opted for: Security Reviewer, Performance Engineer, Domain Expert, Reliability Engineer, API Designer, and Data Engineer, as these are the specific perspectives appropriate for the type of work we were having it review.
I've also spun up Agent Teams with personas based on specific individuals when doing a User Experience review: Jakob Nielsen, Edward Tufte, Shirley Wu, and Steve Krug.
All of these were mediated by an outer agent whose role was to moderate and then synthesize the results of the sub-agent discussion and recommendations. In our scenario, a single prompt defining the roles proved sufficiently effective; the sub-agents were then able to use their independent contexts to look at the problem from their own perspectives. See committee.md [1]
For this proposal, we're discussing providing much more context to the individual sub-agents. I think this maps to having the main agent specify context files for each sub-agent to read, pulling in much more detailed individual context without poisoning the main context with those details.
E.g.
---
name: committee
description: Assemble a committee of expert reviewers using Claude Teams to analyze code, proposals, or problems from multiple perspectives. Use when user invokes /committee, wants multi-perspective code review, expert panel analysis, or committee-style deliberation on technical work.
disable-model-invocation: true
argument-hint: [path, PR number, or description of what to review]
allowed-tools: Read, Write, Edit, Glob, Grep, Bash, Task, TaskOutput, TeamCreate, TeamDelete, TaskCreate, TaskList, TaskGet, TaskUpdate, SendMessage, AskUserQuestion
---
You are a Principal Software Engineer. Start a team of three Agent Team reviewers to discuss this script. Use the following roles:
- Security (see personas/security.md)
- Maintainability (see personas/maintainability.md)
- Performance (see personas/performance.md)
DO NOT READ the personas/* files directly; instead, instruct each TeamCreate to read its file so each team member loads its own persona background into its own context.
Each reviewer should review the proposals of each of the other reviewers. You should synthesize the results into a set of recommendations.
... etc ... more rules about engagement ...
And each of those persona/*.md files could progressively include references to more detailed sub-areas that could be optionally read into context depending on the specific task.
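For instance, a hypothetical `personas/security.md` might look like this (the file names, section names, and deep-dive topics are invented for illustration):

```markdown
# Persona: Security Reviewer

You review code for vulnerabilities: injection, authn/authz flaws,
secret handling, and unsafe defaults.

## Deep-dives (read only if relevant to the task at hand)
- personas/security/web-auth.md (session and token handling)
- personas/security/supply-chain.md (dependency and build risks)
```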
So the immediate challenge seems to me to be one of distilling the appropriate persona prompts, and their supporting materials. That is the special sauce here.
I believe that, after developing a set of appropriate personas, we could write a prompt that considers the experimental task and nominates personas, which would then engage in the multi-agent review. But for our first attempt, we could probably learn something by just trying a contrived experimental dataset with the three reviewer personas known to be appropriate for it (as presented in the slides).
Of course, scaling this up to a platform may look different; perhaps a Claude Code plugin interface would be a good fit, I don't know yet. But I think this would be a low-friction way to try out this type of multi-persona agentic review process.
I welcome feedback, other ideas, and challenges that you might see with this approach. Looking forward to chatting more about this. Cheers,
--
Andy Nutter-Upham
Staff Software Engineer, Data & Application Solutions
Information Technology
Whitehead Institute for Biomedical Research
W: https://wi.mit.edu | https://it.wi.mit.edu
Please note: I work remotely.
---
name: committee
description: Assemble a committee of expert reviewers using Claude Teams to analyze code, proposals, or problems from multiple perspectives. Use when user invokes /committee, wants multi-perspective code review, expert panel analysis, or committee-style deliberation on technical work.
disable-model-invocation: true
argument-hint: [path, PR number, or description of what to review]
allowed-tools: Read, Write, Edit, Glob, Grep, Bash, Task, TaskOutput, TeamCreate, TeamDelete, TaskCreate, TaskList, TaskGet, TaskUpdate, SendMessage, AskUserQuestion
---
# Committee Review
Assemble an expert review committee using Claude Teams. Each reviewer gets a fully independent context window, enabling deep analysis from multiple perspectives with genuine back-and-forth deliberation.
**Requires**: `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` environment variable.
---
## Startup Protocol
1. **Check prerequisites**:
```bash
echo $CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS
```
If not set to `1`, tell the user:
> Agent teams require `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1`. Add it to your shell profile or Claude Code settings under `env`.
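This step-1 check can be sketched as a reusable function (the environment variable is the real feature flag; the function name and message wording are our own):

```shell
# Prerequisite check: succeed only when the experimental flag is exactly "1".
check_agent_teams() {
  if [ "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-}" != "1" ]; then
    echo "Agent teams require CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 (shell profile or Claude Code settings under env)." >&2
    return 1
  fi
  echo "Agent teams enabled."
}
```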
2. **Identify the review target** from `$ARGUMENTS`:
- File or directory path: read and understand the code
- PR number: use `gh pr diff <number>` and `gh pr view <number>` to understand changes
- Description: use it as the problem statement, explore relevant code
- If no arguments: ask the user what to review
**Confirm interpretation**: Tell the user what you understood before proceeding: "I'm interpreting this as [type]: [details]." This prevents wasted agent spawns.
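The four target types in step 2 can be sketched as a small dispatch (the function name and the classification heuristic, existing path before bare number before free text, are our assumptions, not part of the command spec):

```shell
# Classify the /committee argument into one of the four target types.
classify_target() {
  arg="${1:-}"
  if [ -z "$arg" ]; then
    echo "ask"            # no arguments: ask the user what to review
  elif [ -e "$arg" ]; then
    echo "path"           # existing file or directory
  elif printf '%s' "$arg" | grep -qE '^[0-9]+$'; then
    echo "pr"             # bare number: treat as a PR number
  else
    echo "description"    # anything else: free-text problem statement
  fi
}
```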
3. **Load project caveats** if `.claude/committee-context.md` exists. Incorporate into every reviewer's briefing.
4. **Analyze the target thoroughly** before proposing reviewers. Read the relevant code, understand the domain, identify what kinds of expertise would be most valuable.
---
## Phase 1: Propose the Committee
Based on your analysis, propose **3 committee members**. For each, provide:
- **Name**: A short identifier (used as teammate name, kebab-case)
- **Role/Title**: Their expertise area
- **Perspective**: 1-2 sentences on what unique angle they bring
- **What you hope to learn**: What their review should surface
Tell the user: "This will spawn 3 independent agents. Expect ~3-5 minutes for the full review cycle."
### Reviewer Archetypes (adapt to context)
| Role | Typical Focus |
|------|---------------|
| Security Reviewer | Auth, input validation, injection, secrets, access control |
| Performance Engineer | Query patterns, memory, caching, scaling bottlenecks |
| Domain Expert | Business logic correctness, edge cases, domain modeling |
| Reliability Engineer | Error handling, failure modes, observability, recovery |
| API Designer | Interface contracts, backwards compatibility, consistency |
| Data Engineer | Schema design, migrations, data integrity, indexing |
Choose the 3 most relevant to the target. Don't force-fit archetypes that don't apply.
### Get User Approval
Present your proposed committee using `AskUserQuestion`. The user can:
- Approve as-is
- Modify (swap a reviewer, change focus areas)
- Add specific expertise they want represented
**Do NOT spawn any teammates until the user approves.**
---
## Phase 2: Set Up the Team
1. **Create a session-specific working directory**:
```bash
REVIEW_DIR="/tmp/committee-review-$(date +%s)"
mkdir -p "$REVIEW_DIR"
```
Pass this path to every reviewer in their briefing.
2. **Create the team** via TeamCreate with name `"committee-review"`.
3. **Create tasks**: one per reviewer, one "Cross-review discussion" (blocked by all reviews), one "Synthesize findings" (blocked by discussion).
4. **Tell the user**: "Team created. Spawning 3 reviewers now."
5. **Spawn each reviewer** via Task with `run_in_background: true` (see Reviewer Briefing below).
### Reviewer Briefing Template
Each reviewer's spawn prompt must include:
```
You are [Role/Title] on a review committee examining [target description].
## Your Perspective
[What unique angle you bring - from the proposal]
## What to Review
[Specific files/code/proposal to examine]
## Project Context
[Caveats from .claude/committee-context.md if present]
[Any scope exclusions or known issues to ignore]
## Severity Guide
Use these severity definitions consistently:
- **Critical**: Blocks shipping, security vulnerability, data loss risk, or correctness bug
- **Major**: Significant bug or design flaw that should be fixed before merge
- **Minor**: Improvement that could be deferred without risk
- **Nit**: Style preference or polish
## Your Task
1. Read and deeply analyze the target from your area of expertise.
2. Write your findings as a structured review. Save it to:
`[REVIEW_DIR]/[your-name]-findings.md`
Structure:
- **Summary**: 2-3 sentence overall assessment
- **Issues Found**: Numbered list, each with severity (critical/major/minor/nit)
- **Positive Observations**: What's done well
- **Recommendations**: Specific, actionable suggestions
3. After writing your review, mark your review task as completed via TaskUpdate.
4. When the cross-review discussion task is unblocked, read the other reviewers'
findings from [REVIEW_DIR]/. Then engage in discussion via SendMessage.
## Discussion Rules
During the cross-review discussion phase:
- Read ALL other reviewers' findings before your first message
- Send messages to specific reviewers by name to discuss their points
- Respond to at least one specific finding from EACH other reviewer
- Before agreeing with another reviewer's finding, first articulate the strongest
counter-argument — then state whether you still agree or have changed your mind
- Identify your single strongest disagreement with the other reviews and argue it fully
- Concede points when genuinely convinced — update your position
- Build on others' observations with your expertise
- Clearly document where you agree and where you maintain different assessments —
disagreement is valuable signal, not a failure to reach consensus
- If you identify a gap that needs another perspective, say so
After discussion, update your findings file with any revised positions.
```
---
## Phase 3: Monitor and Facilitate
While reviewers work:
1. **Monitor task progress** via TaskList
2. **Report progress to the user** as each reviewer completes
3. **Quality gate**: As each review is submitted, read it. If a review is superficial or off-target, message the reviewer with specific feedback before moving to discussion.
4. Once all individual reviews are complete, notify reviewers to begin cross-review discussion via SendMessage.
5. **Monitor discussion** — ensure each reviewer has engaged with at least one finding from each other reviewer. Nudge idle reviewers.
6. If discussion reveals a gap needing additional expertise, propose up to 2 additional reviewers to the user (same approval flow as Phase 1).
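Step 4's "all individual reviews are complete" gate can also be verified on disk, since each reviewer writes `[name]-findings.md` to the session directory (a sketch; the function name and the reviewer names passed in are illustrative):

```shell
# Report the first missing findings file, or confirm all are present.
all_findings_present() {
  dir="$1"; shift
  for name in "$@"; do
    if [ ! -f "$dir/$name-findings.md" ]; then
      echo "still waiting on: $name"
      return 1
    fi
  done
  echo "all findings present in $dir"
}
```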
### Timeout Handling
- If a reviewer hasn't completed within 10 minutes, check progress. If still stuck after another 2 minutes, offer to proceed with completed reviews.
- If discussion stalls (no messages for 3+ minutes), nudge idle reviewers. If still no response, proceed to synthesis.
### Discussion Quality Requirements
- Each reviewer MUST respond to at least one specific finding from **each** other reviewer
- Messages must be directed at specific reviewers (not broadcasts)
- Each reviewer must identify and argue their single strongest disagreement
- Discussion continues until reviewers have addressed each other's key points
---
## Phase 4: Synthesize and Report
After discussion winds down:
1. **Read all findings files** from `[REVIEW_DIR]/`
2. **Read the discussion** (visible via SendMessage history)
3. **Write the full report** to a project-local file: `./committee-review-YYYY-MM-DD.md`
4. **Present an executive summary** in the terminal with a pointer to the full file.
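The date-stamped report path in step 3 can be generated with `date` (REPORT is a hypothetical variable name; `%Y-%m-%d` produces the YYYY-MM-DD stamp):

```shell
# Build the project-local report filename for today's date.
REPORT="./committee-review-$(date +%Y-%m-%d).md"
echo "$REPORT"
```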
### Committee Report Format
```markdown
# Committee Review: [Target]
## Committee Members
- [Name]: [Role] - [1-line perspective summary]
## Findings
### Agreed (all reviewers)
[Issues where all reviewers agreed on both existence and severity.
Include severity and recommendation for each.]
### Majority (2 of 3 reviewers)
[Issues where 2 reviewers agreed but 1 dissented.
Note the dissenting position inline for each.]
### Contested
[Issues with genuine disagreement — preserve each reviewer's position and reasoning.]
### Minor Issues / Nits
[Lower severity items. Brief descriptions, actionable.]
## Points of Debate
For each issue where reviewers debated during discussion:
- **Issue**: What was debated
- **Positions**: Who argued what and why
- **Resolution**: How it was resolved (or explicitly: "unresolved — positions remain split")
## Positive Observations
[What the committee agreed was done well]
## Recommended Actions
Prioritized list of what to address, in order of importance.
```
### Terminal Summary
After writing the file, present a tight summary:
```
Committee review complete. Full report: ./committee-review-YYYY-MM-DD.md
Summary: [X] agreed findings, [Y] majority findings, [Z] contested
Top recommendation: [1-sentence summary of #1 priority action]
```
---
## Phase 5: Cleanup
Cleanup should be defensive — always complete all steps regardless of individual failures.
1. **Send shutdown requests** to all reviewers via SendMessage shutdown_request
2. **Wait briefly** for confirmations (do not block indefinitely — proceed after 30 seconds)
3. **Delete the team** via TeamDelete. If it fails, retry once after 10 seconds, then proceed.
4. **Clean up temp files**:
```bash
rm -rf [REVIEW_DIR]
```
5. **Tell the user** cleanup is complete.
---
## Critical Rules
### Always
- **Get user approval** before spawning any teammates
- **Give each reviewer full context** in their spawn prompt (they don't inherit yours)
- **Pass the session-specific `REVIEW_DIR` path** to every reviewer explicitly
- **Report progress to the user** at each phase transition and when each reviewer completes
- **Enforce cross-pollination** — each reviewer engages with each other reviewer's findings
- **Use SendMessage for all inter-reviewer communication** (plain text output is invisible between agents)
- **Include project caveats** from `.claude/committee-context.md` in every briefing
- **Write the full report to a file** and present a terminal summary
- **Clean up defensively** — proceed through all cleanup steps even if some fail
### Never
- **Spawn teammates before user approval** of the committee composition
- **Skip the discussion phase** — independent reviews without debate miss the point
- **Let reviewers write cross-review documents instead of discussing** — real back-and-forth is the value
- **Include more than 5 total reviewers** (3 initial + 2 additional max)
- **Broadcast when a direct message will do** — broadcasts are expensive (N messages for N teammates)
- **Leave teammates running** after the review is complete
- **Instruct reviewers to "work toward consensus"** — disagreement is valuable signal
---
## Project Context File
Projects can create `.claude/committee-context.md` with standing context:
```markdown
# Committee Review Context
## Known Issues (ignore these)
- [List issues reviewers should skip]
## Architecture Decisions
- [Key decisions that inform the review]
## Scope Notes
- [What's in/out of scope for review]
```
This file is read automatically and included in every reviewer's briefing.