@raine
Created April 29, 2026 04:58
The whole thing, end to end

Layer 1 — The coordinator (/phased-implement)

You invoked /phased-implement add support for https://github.com/QwenLM/qwen-code. That skill loads a coordinator that never writes source code itself — it only writes plans, spawns workers, and verifies their output. Its job is to turn one big task into a DAG of smaller phases that can be parallelized safely.

Phase 0 — Snapshot

Recorded START_HEAD=03f07d3, integration branch qwen, verified the working tree was clean. Created history/2026-04-28-phased-qwen-support/ with prompts/, captures/, and a manifest.md tracking each phase's state (pending → working → done-unverified → merging → merged).

history/ is gitignored and symlinked across all worktrees by workmux's setup, so the coordinator and all workers can read/write the same plan, prompts, and result sentinels without committing them.
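
The manifest's state progression can be pictured as a small forward-only state machine. The sketch below is illustrative, not the skill's actual code. The state names come from the manifest description above, while `PhaseState`, `NEXT`, and `advance` are hypothetical names. A failure state, which the coordinator also needs, is left out because its manifest name is not given here.

```python
from enum import Enum


class PhaseState(Enum):
    PENDING = "pending"
    WORKING = "working"
    DONE_UNVERIFIED = "done-unverified"
    MERGING = "merging"
    MERGED = "merged"


# Forward-only transitions, in the order the manifest lists them.
# Assumption: no state is ever skipped or revisited on the happy path.
NEXT = {
    PhaseState.PENDING: PhaseState.WORKING,
    PhaseState.WORKING: PhaseState.DONE_UNVERIFIED,
    PhaseState.DONE_UNVERIFIED: PhaseState.MERGING,
    PhaseState.MERGING: PhaseState.MERGED,
}


def advance(state: PhaseState) -> PhaseState:
    """Return the next state, or raise if the phase is already terminal."""
    if state not in NEXT:
        raise ValueError(f"{state.value} is a terminal state")
    return NEXT[state]
```

Keeping `done-unverified` distinct from `merged` is what lets the coordinator treat a worker's own success report as a claim to be checked rather than a fact.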

Phase 1 — Master plan generation + review

Research first. I cloned QwenLM/qwen-code to /tmp/qwen-code and read its source directly (per CLAUDE.md's "source is ground truth" rule) to verify:

  • Config dir: ~/.qwen/settings.json (mirrors gemini)
  • Hook event names: UserPromptSubmit/PostToolUse/Stop/… — Claude-style, NOT gemini's BeforeAgent/AfterAgent
  • CLI flags: inherits --yolo, -i, -p, --continue from gemini-cli
  • Skills dir: ~/.qwen/skills/

Drafted plan.md with a 2-phase DAG (initially): integration + docs.

Plan review via consult-llm. Sent the plan + relevant existing files (gemini.rs, multiplexer/agent.rs, agent_setup/mod.rs) to gpt-5.5 with --task review, asking for Spec Check / Premortem / Plan Findings.

The review caught 4 high-severity issues:

  1. -p is headless; would kill the interactive pane. Must be -i.
  2. Notification matcher is an exact string in qwen, not a regex alternation. So "permission_prompt" alone, not "permission_prompt|elicitation_dialog".
  3. Missing sandbox surface entirely: docker/Dockerfile.qwen, KNOWN_AGENTS, DOCKERFILE_QWEN dispatch in container.rs, lima mount mapping, GitHub Actions matrix.
  4. Cargo.toml package include needed resources/qwen/**/* or cargo publish would ship without the embedded JSON.

Folded all four into a revised plan with 3 phases (core, sandbox, docs) and a Feedback Ledger documenting what was applied vs. deferred.

The DAG:

core ────┐
         ├──> docs
sandbox ─┘

core and sandbox touch disjoint files, so they run in parallel. docs waits for both.
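
As a rough sketch of how this DAG yields parallel work, Python's standard `graphlib` can group the phases into batches whose members have no unmet dependencies. `DEPS` and `parallel_batches` are hypothetical names used for illustration. The real coordinator reacts to each worker finishing rather than waiting for a whole batch.

```python
from graphlib import TopologicalSorter

# Each phase maps to the set of phases it depends on.
DEPS = {
    "core": set(),
    "sandbox": set(),
    "docs": {"core", "sandbox"},
}


def parallel_batches(deps):
    """Group phases into batches; phases within a batch can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(parallel_batches(DEPS))  # [['core', 'sandbox'], ['docs']]
```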

Layer 2 — Worker spawn (workmux)

For each ready phase, the coordinator wrote prompts/<id>.md (a self-contained brief — workers can't see the coordinator's conversation), then ran:

workmux add qwen-core    -b --base qwen -P prompts/core.md
workmux add qwen-sandbox -b --base qwen -P prompts/sandbox.md

For each, workmux:

  1. Created a git worktree at ../qwen-core/ checked out to a new branch qwen-core based on the qwen branch.
  2. Ran post-create hooks (file copies, symlinks like history/, dependency setup).
  3. Created a tmux window in the background (-b) with the configured pane layout.
  4. Started a Claude Code agent in the editor pane and injected the prompt file's contents into the agent's first message via claude -i "$(cat PROMPT.md)".
  5. Wired up status hooks so the tmux window name shows 🤖 working / 💬 waiting / ✅ done.

Both workers were now running in parallel, each in its own filesystem checkout, each with its own Claude agent that had read its phase prompt and started working autonomously.

The coordinator sat in workmux wait qwen-core qwen-sandbox --any --timeout 300, blocking until any worker's status flipped to done.

Layer 3 — What each worker did (/implement)

The phase prompt instructed each worker to run /implement <description>. That skill is itself an autonomous workflow with these stages:

3a. Behavioral spec

Worker reads the acceptance criteria from its prompt and rewrites them as testable statements. For core, that meant ~8 specific assertions (enum variant exists with serde tag "qwen", QwenProfile.prompt_argument returns -i ..., Cargo.toml includes resources/qwen/**/*, etc.).

3b. Per-phase plan

Worker drafts its own implementation plan — file-by-file changes, in what order, what tests to add. This plan lives inside the worker's worktree, not in history/.

3c. Plan review (evidence-gated)

Worker fans out to external LLMs via consult-llm for a premortem ("what's most likely to break?") and an independent alternative ("how would you do it differently?"). Findings get filtered through a decision ledger: each one is either applied, deferred-with-reason, or rejected-with-reason. No silent ignoring.
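
One way to picture the "no silent ignoring" rule is a ledger check that refuses any finding without a decision, and any deferral or rejection without a reason. This is a minimal sketch under assumptions: `Finding`, `DISPOSITIONS`, and `ledger_problems` are hypothetical names, and the real ledger format is not shown in this writeup.

```python
from dataclasses import dataclass
from typing import Optional

DISPOSITIONS = {"applied", "deferred", "rejected"}


@dataclass
class Finding:
    summary: str
    disposition: Optional[str] = None  # None means nobody decided yet
    reason: str = ""


def ledger_problems(findings):
    """Return a list of rule violations; an empty list means the ledger is complete."""
    problems = []
    for f in findings:
        if f.disposition not in DISPOSITIONS:
            problems.append(f"undecided: {f.summary}")
        elif f.disposition != "applied" and not f.reason.strip():
            problems.append(f"missing reason: {f.summary}")
    return problems
```

A worker following this rule would only move on to implementation once `ledger_problems` returns an empty list.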

3d. Implementation

Worker edits files. For core:

  • New file src/agent_setup/qwen.rs modeled line-by-line on gemini.rs but with Claude-style hook events.
  • New file resources/qwen/settings.json with the 5 hooks.
  • Edit src/agent_setup/mod.rs to add Agent::Qwen, wire it into check_all, install, name(), and update every test that enumerates the enum.
  • Edit src/multiplexer/agent.rs to add QwenProfile and register it in PROFILES.
  • Edit src/config.rs for ~/.qwen mapping.
  • Edit src/skills.rs to move qwen out of the None branch.
  • Edit Cargo.toml for the package include.

3e. Validation

Worker runs just check (cargo check + clippy + fmt + tests). If it fails, it enters a debug loop: re-read the failure, root-cause it, fix the underlying issue, re-run. It does not paper over symptoms with // TODO or --allow flags. For core, validation reported 1015 tests passing.

3f. Red-team pass (evidence-gated)

Worker sends the diff to another LLM with adversarial framing: "find the bug." Must-fix findings go through the decision ledger and get applied; the worker re-validates after.

3g. Commit

Single commit on the worker's branch (qwen-core), following the project's commit style from CLAUDE.md (lowercase, imperative, no conventional-commit prefixes, with a body explaining "why").

3h. Sentinel

Final action — the worker writes a one-line file to the shared history/ tree:

PHASE_RESULT id=core status=success commit=c6bd9e1... validation=passed

Then idles, waiting for the coordinator to send /merge --keep.
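
A sentinel line in this shape is easy to parse mechanically. The sketch below assumes the format is exactly a `PHASE_RESULT` tag followed by whitespace-separated `key=value` pairs with no spaces inside values; `parse_sentinel` and `is_green` are hypothetical helper names.

```python
def parse_sentinel(line: str) -> dict:
    """Split a PHASE_RESULT line into a dict of its key=value fields."""
    tag, *pairs = line.split()  # an empty line raises ValueError here
    if tag != "PHASE_RESULT":
        raise ValueError(f"unexpected sentinel tag: {tag!r}")
    # A pair without '=' makes dict() raise ValueError, which is what we want.
    return dict(pair.split("=", 1) for pair in pairs)


def is_green(fields: dict) -> bool:
    """Both the worker's status and its validation run must report success."""
    return fields.get("status") == "success" and fields.get("validation") == "passed"


fields = parse_sentinel(
    "PHASE_RESULT id=core status=success commit=c6bd9e1... validation=passed"
)
print(fields["id"], is_green(fields))  # core True
```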

Layer 4 — Coordinator verification + merge handoff

When workmux wait returned with a worker in done state, the coordinator:

  1. Read the sentinel from the shared history/captures/<phase>.result. status=success, validation=passed — green light.
  2. Captured the agent's tail (workmux capture qwen-core -n 200) into captures/<phase>.tail for the audit trail. Useful for forensics if anything later turns out wrong.
  3. Serialized merges. The coordinator never lets two /merge calls run concurrently, because they share the integration branch. If another phase is already merging, this one waits.
  4. Sent /merge --keep into the worker's tmux session via workmux send qwen-core "/merge --keep". The worker received it as if the user had typed it.

What /merge --keep did inside the worker

The /merge skill, running inside the worker's session:

  1. Ran git fetch origin and rebased qwen-core onto the latest qwen (the integration branch). Rebasing rewrites commit SHAs, so ancestry has to be checked against the post-rebase tip, not the pre-rebase SHA recorded in the sentinel.
  2. Fast-forward merged the rebased branch into qwen.
  3. Skipped cleanup. Normally /merge would also delete the worktree, the local branch, and the tmux window; --keep suppresses that, leaving the worktree alive for the coordinator to inspect.

Coordinator ancestry verification

The coordinator then ran git rev-parse HEAD inside the (still-alive) worker worktree to get the post-rebase sha, and checked:

git merge-base --is-ancestor <post_tip> qwen

If the post-rebase SHA is an ancestor of the integration branch, the merge truly landed. If not, the coordinator marks the phase failed and halts its dependents. It never trusts the /merge exit code alone, since a silent rebase abort can look successful.

After verification, workmux remove qwen-core cleaned up the worktree, the local branch, and the tmux window in one shot.
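
The ancestry check above can be wrapped so that "not an ancestor" and "git itself failed" are never confused. `git merge-base --is-ancestor` exits 0 when the commit is an ancestor, 1 when it is not, and with another non-zero status on error. The function name `merge_landed` and its injectable `run` parameter are assumptions made for this sketch.

```python
import subprocess


def merge_landed(post_tip, branch, cwd=".", run=subprocess.run):
    """Return True if post_tip is an ancestor of branch, False if it is not.

    Any exit status other than 0 or 1 means git could not answer the
    question (for example, an unknown ref), so it is raised, not guessed.
    """
    result = run(
        ["git", "merge-base", "--is-ancestor", post_tip, branch],
        cwd=cwd,
    )
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"git merge-base failed with status {result.returncode}")
```

Passing `run` as a parameter keeps the decision logic testable without a real repository; in normal use the default `subprocess.run` is enough.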

Layer 5 — Looping the DAG

After a merge, the coordinator recomputed the ready set: phases whose status is pending and whose dependencies are all merged.

  • After core merged: sandbox was already running (independent), docs still blocked.
  • After sandbox merged: docs became ready. Coordinator wrote prompts/docs.md, spawned qwen-docs based on the now-much-newer qwen branch (which contained both prior commits), and the docs worker ran the same /implement workflow against acceptance criteria that explicitly forbade touching src/, docker/, etc.
  • After docs merged: ready set empty, nothing working — exit dispatch loop.
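
The dispatch loop's two decisions, what to spawn next and when to stop, can be sketched as below. The status strings follow the manifest states; `ready_set`, `loop_finished`, and `IN_FLIGHT` are hypothetical names, and treating `done-unverified` and `merging` as still in flight is an assumption of this sketch.

```python
DEPS = {"core": set(), "sandbox": set(), "docs": {"core", "sandbox"}}

# Assumption: a phase counts as in flight until its merge is verified.
IN_FLIGHT = {"working", "done-unverified", "merging"}


def ready_set(deps, status):
    """Phases that are still pending and whose dependencies are all merged."""
    return sorted(
        phase
        for phase, needs in deps.items()
        if status[phase] == "pending" and all(status[d] == "merged" for d in needs)
    )


def loop_finished(deps, status):
    """Stop when nothing can be spawned and nothing is still running."""
    return not ready_set(deps, status) and not (IN_FLIGHT & set(status.values()))


# After core merged, with sandbox still running: docs stays blocked.
print(ready_set(DEPS, {"core": "merged", "sandbox": "working", "docs": "pending"}))  # []
# After sandbox merged: docs becomes ready.
print(ready_set(DEPS, {"core": "merged", "sandbox": "merged", "docs": "pending"}))  # ['docs']
```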

Final state on the qwen branch

8524df1 fix table formatting in open.md reference         ← docs phase fixup commit
000c41b document qwen-code support across README, …       ← docs phase main commit
e3fd493 add qwen sandbox image and wire through …         ← sandbox phase
c6bd9e1 add qwen-code as a first-class agent              ← core phase
03f07d3 sidebar: log fd count and watch lifecycle …       ← START_HEAD

Three phases, four commits: one logical commit per phase, plus a table-formatting fixup in the docs phase that its own red-team pass found. Each phase ran just check independently, and the integration branch ran just check again at the end: 1016 tests, all green.

Why this shape rather than one big agent

  • Parallelism. core and sandbox ran simultaneously and saved wall-clock time.
  • File scope isolation. Two parallel agents can't conflict if their file paths don't overlap. Cross-phase corruption is avoided by the prompt's hard constraints, not by trust.
  • Smaller context per worker. Each worker only loads files relevant to its phase, leaving more context budget for /implement's plan + review + red-team passes.
  • Independent quality gates. Plan-review-redteam runs three times instead of once. The reviewer who would have caught the -p vs -i bug at the master-plan level is a different LLM than the one doing the per-phase review at implementation time. Bugs that slip past one model often get caught by another.
  • Hard cleanup boundary. A failed phase fails its own worktree and halts only its dependents. Other parallel phases finish, get merged normally, and the user gets a partial-success summary instead of a half-applied mess on the integration branch.
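
The file-scope argument can be checked before any worker is spawned. The sketch below compares explicit path lists only; `overlapping_paths` is a hypothetical helper, the scopes shown are a partial illustration drawn from files named earlier, and a real check would probably also need to handle directories or globs.

```python
from itertools import combinations


def overlapping_paths(scopes):
    """Map each pair of phases to the paths they both claim (empty dict if none)."""
    conflicts = {}
    for (a, paths_a), (b, paths_b) in combinations(sorted(scopes.items()), 2):
        shared = set(paths_a) & set(paths_b)
        if shared:
            conflicts[(a, b)] = sorted(shared)
    return conflicts


scopes = {
    "core": ["Cargo.toml", "src/config.rs", "src/skills.rs"],
    "sandbox": ["docker/Dockerfile.qwen"],
}
print(overlapping_paths(scopes))  # {}
```

An empty result is the precondition for running two phases in parallel; any non-empty entry would argue for adding a dependency edge between the two phases instead.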

The whole run was ~15 minutes wall-clock. The coordinator's actual reasoning output was small — most of the tokens were spent inside the three worker agents doing the real engineering, in isolation, with their own review loops.
