All four PRs respond to the same task — issue #712:
In Debug → Diagnostic info, add information about the various processes the server is running and holding in memory:
- Active filesystem watches (agents — sqlite, jsonl — sidebar tree, etc.)
- Server memory usage & uptime
- Collapse "Recent events" by default, expand on click
- Group xterm-related stuff
- Group server-related stuff separately
Each agent worked independently on the same master baseline. Below is a side-by-side comparison.
| PR | Agent | Files | +Lines | -Lines | Commits | Approach |
|---|---|---|---|---|---|---|
| #741 | GPT 5.5 (codex) | 43 | 1348 | 466 | 11 | New runtime-diagnostics package, integrations rewired, e2e test |
| #743 | Opus 4.7 | 15 | 387 | 77 | 12 | Categorical watch RPC; long refactor tail (hickey/lowy/elegance) |
| #746 | Kimi | 5 | 148 | 34 | 1 | Adds memory/uptime + section grouping; does not deliver the watches list |
| #747 | GLM | 13 | 293 | 47 | 4 | New watch-registry.ts, three sections, fact-check fix |
#741 carved out a brand-new `packages/runtime-diagnostics` workspace, then threaded a register/cleanup resource API into every integration package (anyagent, claude-code, codex, git, github, opencode) so that file watches, timers, subscriptions, and SQLite handles all funnel through one registry. The client dialog was split into three component files (`BrowserDiagnosticsSection`, `ServerDiagnosticsSection`, `XtermDiagnosticsSection`) plus a `format.ts` and a `useDiagnosticSnapshot.ts` hook. The PR also added a Cucumber e2e test (`diagnostic-info.feature` plus step definitions) and patched `default.nix` so the new package ships.
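The register/cleanup pattern described above can be illustrated with a minimal sketch. This is not the PR's code: `ResourceKind`, `RuntimeRegistry`, `register`, and `snapshot` are hypothetical names chosen for illustration, and the real package may be shaped quite differently.

```typescript
// Hypothetical sketch of a register/cleanup registry. All names here are
// illustrative assumptions, not the API shipped in the PR.
type ResourceKind = "file-watch" | "timer" | "subscription" | "sqlite-handle";

interface ResourceEntry {
  kind: ResourceKind;
  owner: string; // e.g. the integration package that created the resource
  label: string; // human-readable description for the dialog
}

class RuntimeRegistry {
  private nextId = 0;
  private readonly entries = new Map<number, ResourceEntry>();

  /** Records a live resource and returns a cleanup function that forgets it. */
  register(entry: ResourceEntry): () => void {
    const id = this.nextId++;
    this.entries.set(id, entry);
    return () => {
      // Map.delete is a no-op for a missing key, so calling cleanup twice is safe.
      this.entries.delete(id);
    };
  }

  /** Counts live resources per kind. */
  snapshot(): Record<ResourceKind, number> {
    const counts: Record<ResourceKind, number> = {
      "file-watch": 0,
      timer: 0,
      subscription: 0,
      "sqlite-handle": 0,
    };
    this.entries.forEach((entry) => {
      counts[entry.kind] += 1;
    });
    return counts;
  }
}

// Usage: an integration registers on setup and calls the returned function on teardown.
const registry = new RuntimeRegistry();
const disposeWatch = registry.register({ kind: "file-watch", owner: "git", label: ".git/HEAD" });
registry.register({ kind: "timer", owner: "codex", label: "poll" });
const before = registry.snapshot();
disposeWatch();
disposeWatch(); // second call is harmless
const after = registry.snapshot();
```

Returning the cleanup function from `register` keeps setup and teardown paired at each call site, which is one plausible reason a change like this has to touch every integration.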
Trade-offs. The only PR that truly encapsulates the runtime-resource concern across the codebase, and the only one with an e2e. But 1.3k lines and a new package for what was scoped as a debug-dialog enhancement is heavy — and the changes to claude-code/core.ts, wal-subscription.ts, and three session-watchers each carry their own regression surface.
#743 introduced a one-shot `server.diagnostics` RPC returning a categorical view of watches (git-head per terminal, claude-transcript per active session, shared agent-external:* per provider kind) instead of trying to enumerate every `fs.watch` call site. The PR description explicitly calls out that "instrumenting every fs.watch site would be invasive churn for modest payoff", which is exactly the cost #741 paid. It reorganized the dialog into Browser / Server / Watches / Terminals / WebGL sections, used a native `<details>` element for the collapsible Recent events, and used `<Switch>`/`<Match>` for explicit error/loading/empty/data branching.
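A categorical view of this kind derives counts from state the server already holds rather than from instrumented call sites. The sketch below shows the idea under stated assumptions: `ServerState`, `WatchCategory`, and `summarizeWatches` are hypothetical names, and only the three category labels come from the comparison above.

```typescript
// Hypothetical input: state the server is assumed to track already.
interface ServerState {
  terminalsWithGitHead: string[]; // terminal ids with a git HEAD watch
  activeClaudeSessions: string[]; // session ids with a transcript watch
  externalProviderKinds: string[]; // provider kinds with a shared external watcher
}

interface WatchCategory {
  kind: string;
  count: number;
}

// Derives per-category counts without touching individual watcher handles.
function summarizeWatches(state: ServerState): WatchCategory[] {
  const categories: WatchCategory[] = [
    { kind: "git-head", count: state.terminalsWithGitHead.length },
    { kind: "claude-transcript", count: state.activeClaudeSessions.length },
  ];
  // Assumption: one shared watcher per provider kind, however many terminals use it.
  for (const providerKind of state.externalProviderKinds) {
    categories.push({ kind: `agent-external:${providerKind}`, count: 1 });
  }
  return categories;
}

const summary = summarizeWatches({
  terminalsWithGitHead: ["t1", "t2"],
  activeClaudeSessions: ["s1"],
  externalProviderKinds: ["codex"],
});
const summaryText = summary.map((c) => `${c.kind}=${c.count}`).join(",");
```

The cost of this design is visible in the function body: a new watcher kind only shows up in the dialog if someone adds a line for it here.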
The commit log is the most interesting part: 12 commits, with eight tagged refactor(hickey), refactor(lowy), or refactor(police) — each a single, named structural improvement (atomic snapshot, extract captureMetrics, move pluralization off the server, narrow accessor return shape, drop redundant snapshot().server indirection, remove dead countActiveClaudeSessions). Reads like the /do workflow's quality passes actually firing.
Trade-offs. No new tests. It doesn't enumerate individual watcher handles; it's a count-by-category design. That's a deliberate scoping call, but it does mean a future watcher kind needs a registry update.
#746 is a single squash commit. It adds a Server section with hostname, uptime, and RSS + heap memory; reorganizes the dialog into Server / Browser / Session / Terminals / xterm.js / WebGL groups; and collapses User agent and Recent events.
Trade-offs. Cleanest diff by far (+148/-34 across 5 files, no new packages). But the issue's first bullet, "Active filesystem watches", is not addressed at all. It's a partial solution that nails the cosmetic asks and skips the substantive one.
#747 adds a `server.diagnostics` RPC returning memory, uptime, PID, Node version, active watch counts, and session/publisher counts. It introduces a small `packages/server/src/watch-registry.ts` (24 lines), a centralized registry that git-HEAD watchers and agent session watchers register and unregister against. The dialog has three sections: Client / Server / xterm. Recent events is collapsed by default.
Notable commit: fix(police): fact-check — don't unregister process-lifetime watch on terminal cleanup. The agent caught its own bug — external-change watchers are installed once per provider kind and live for the whole process; unregistering when the first installing terminal shut down would have left the registry under-counting. Catching that during self-review is the kind of thing the /do pipeline is supposed to surface.
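The bug class described above can be sketched as a registry whose entries carry a lifetime, so terminal cleanup removes only the watches that die with the terminal. This is an illustration, not the contents of `watch-registry.ts`: `Lifetime`, `WatchRecord`, `registerWatch`, `cleanupTerminal`, and the watch ids are all hypothetical.

```typescript
// Hypothetical sketch; the real watch-registry.ts is not reproduced here.
type Lifetime = "terminal" | "process";

interface WatchRecord {
  lifetime: Lifetime;
  installedBy: string; // id of the terminal that triggered installation
}

const liveWatches = new Map<string, WatchRecord>();

function registerWatch(id: string, record: WatchRecord): void {
  liveWatches.set(id, record);
}

/**
 * Terminal cleanup. Process-lifetime watches are skipped even when this
 * terminal installed them; unregistering them here would make the registry
 * under-count watchers that are still running.
 */
function cleanupTerminal(terminalId: string): void {
  liveWatches.forEach((record, id) => {
    if (record.installedBy === terminalId && record.lifetime === "terminal") {
      liveWatches.delete(id);
    }
  });
}

registerWatch("git-head:t1", { lifetime: "terminal", installedBy: "t1" });
registerWatch("external-change:claude", { lifetime: "process", installedBy: "t1" });
cleanupTerminal("t1");
const remaining = Array.from(liveWatches.keys()).join(",");
```

After `cleanupTerminal("t1")`, only the process-lifetime entry remains, matching the state of the actual watchers.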
Trade-offs. Lighter than Opus's refactor tail and lighter than codex's package split, but the watch-registry module only tracks the watchers the author remembered to wire up — no compile-time guarantee that future fs.watch callers will register.
| Requirement | #741 | #743 | #746 | #747 |
|---|---|---|---|---|
| Active FS watches | ✅ (registry across integrations) | ✅ (categorical) | ❌ | ✅ (small registry, server-only) |
| Memory + uptime | ✅ | ✅ | ✅ | ✅ |
| Collapse Recent events | ✅ | ✅ (native `<details>`) | ✅ | ✅ |
| Group xterm | ✅ | ✅ | ✅ | ✅ |
| Group server | ✅ | ✅ | ✅ | ✅ |
- #741 treats this as "the codebase needs a runtime-resource concept" — the most ambitious read. The new package is reusable; the cost is touching every integration.
- #743 treats this as "the dialog needs facts the server already knows" — minimal new infrastructure, categorical aggregation, wire shape stays facts-only with rendering on the client (a Lowy-style boundary).
- #747 lands between: a small registry on the server side only, no integration churn, fewer abstractions.
- #746 treats this as "polish the dialog" and stops at memory/uptime.
- #743 is the standout: 8 named refactor commits, each with a one-line why. Several reference Hickey/Lowy frameworks by name (the project ships `/hickey` and `/lowy` skills). Looks like the agent actually ran them.
- #747 has one substantive self-found bug fix (`fact-check` commit) and one hickey pass (deduplicate type, surface errors).
- #741 has multiple refactor commits but most have terse one-line bodies; harder to audit why each landed.
- #746 is one squash commit, with no visible self-review trail.
- #743 is the cleanest example: server emits `{ kind, sharedReconcilers: number }`, client decides "shared across N terminal(s)". UI-layer pluralization stays out of the wire.
- #741 does the moral equivalent by restricting the public `server.diagnostics` shape and keeping Claude-specific counters in an internal periodic log.
- #746 ships a string `User agent` and untyped sections; cosmetic only.
- #747 mixes counters and identity directly; no obvious wire/UI separation issue but no explicit boundary either.
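The facts-only wire shape noted for #743 can be illustrated as follows. Only the `{ kind, sharedReconcilers: number }` shape comes from the comparison; `SharedWatchFact`, `describeSharing`, the example kinds, and the exact pluralization rule are assumptions made for the sketch.

```typescript
// Wire shape as quoted in the comparison: the server sends facts, not display text.
interface SharedWatchFact {
  kind: string;
  sharedReconcilers: number;
}

// Hypothetical client-side formatter. Wording and pluralization live here,
// so changing the UI copy never requires a server or protocol change.
function describeSharing(fact: SharedWatchFact): string {
  const noun = fact.sharedReconcilers === 1 ? "terminal" : "terminals";
  return `${fact.kind}: shared across ${fact.sharedReconcilers} ${noun}`;
}

const one = describeSharing({ kind: "agent-external:claude", sharedReconcilers: 1 });
const many = describeSharing({ kind: "agent-external:codex", sharedReconcilers: 3 });
```

Keeping the number on the wire also lets other consumers, such as logs or tests, use the value without parsing a sentence.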
- #741: highest. Touches six integration packages and adds a new workspace member. Any regression in WAL subscription teardown or session-watcher cleanup ships under this PR.
- #743: low. Confined to `DiagnosticInfo.tsx`, `diagnostics.ts`, `meta/agent.ts`, plus a `terminals.ts` cleanup of an orphaned counter.
- #746: lowest. Tiny additive change.
- #747: low–medium. New `watch-registry` module is small, but the registration sites are scattered and there's no test.
For this issue as written, #743 (Opus 4.7) is the best fit: it answers every bullet, picks a defensible aggregation strategy, and the commit log is a model of how the project's /do and /hickey skills are supposed to compose. #747 (GLM) is a strong runner-up — pragmatic, found its own bug, lighter on refactor ceremony.
#741 (codex) would be the right answer to a different, larger question — "how should runtime resource accounting work across this codebase?" — and as a self-contained PR for a debug dialog it overshoots scope.
#746 (Kimi) ships the cosmetic half cleanly but skips the load-bearing requirement; it would need a follow-up to actually close #712.
Generated 2026-04-26 from PR metadata at the time of writing.