@galihcitta · Created May 13, 2026 17:28
Chat Triage Agent — System Design Spec

A copy-pasteable blueprint for an internal triage agent that watches your team chat (Slack / Discord / Lark / Teams), drafts replies, adversarially validates them, and ships bug fixes end-to-end. Built around Claude Code primitives but adaptable to any agent harness with a subagent / monitor / headless mode equivalent.


How to use this spec

This file is written so you can paste it into Claude Code (or any agent harness — Cursor, Aider, Continue, etc.) and have it build the system for your team's stack.

To use:

  1. Open your agent harness.
  2. Paste this entire file as context.
  3. Tell the agent: "Build this system for our team. Our chat platform is [Slack | Discord | Lark | Teams | other], our codebase root is [path], our team mostly speaks [language], and our investigations should read from [Grafana | Datadog | Honeycomb | other]. Adapt the Tier 0 classifier, watcher, and reply tooling accordingly. Ask me before deciding architecture-level things; default-implement everything else."
  4. Review the agent's plan before letting it write code.
  5. Build in phases (see §6). Don't start Phase N+1 until Phase N is verified.

The spec is opinionated on architecture and contracts (the what) and adapter-agnostic on tooling (the how). Follow the architecture closely; swap the adapters for whatever your team already uses.


§1 — Overview

What this builds

An internal agent that:

  • Subscribes to one or more chat channels via webhook or WebSocket.
  • Classifies every incoming message (cheap, rules-based — no LLM cost).
  • For actionable messages: spawns an investigator subagent that researches the question against your codebase + observability + git history, then drafts a reply.
  • Runs an adversarial validator subagent against every draft before a human sees it.
  • Surfaces validated drafts to the human operator for one-click approval.
  • Logs every approved reply with the full chain of evidence.
  • Optionally: opens a merge request for code fixes the investigator identified.

The human stays the final approval gate. The agent removes the coordination cost — the time senior engineers spend reading chats, switching context, drafting answers, and checking work — without removing oversight.

Why this architecture

A naive implementation runs all of this inside one long-lived LLM session. That breaks at scale:

  • The session burns 30–50k tokens per investigation (raw logs, file reads, git diffs).
  • Two concurrent investigations contaminate each other's context.
  • Hallucinations slip through because the human reviewing the draft also wrote the prompt.
  • A long investigation freezes the session for everything else.

The fix is a thin orchestrator that never reads raw evidence, dispatching work to specialized subagents with fresh context windows, with state on disk so the orchestrator carries almost nothing in memory.

Success looks like

  • Two concurrent investigations (different threads, different question types) handled without context blowup.
  • Orchestrator's per-investigation context cost is ≤2k tokens (down from ~40k in a single-session implementation).
  • Adversarial validator catches at least one hallucinated reference per week (sampling).
  • Long investigations escalate to a headless tier that streams to disk and survives session restarts.
  • Human can approve drafts in any order, mid-day, without losing thread context.

Anti-goals

  • Auto-reply without human approval in V1. Earn the privilege with data (see §7).
  • Mixing model tiers inside a single component. Investigator and validator both run on the same (capable) model. Don't tier inside; tier across components.
  • Replacing your chat platform's existing notification system. This agent attaches to the event stream, doesn't replace it.

§2 — Architecture

Three tiers, one orchestrator.

[Chat platform events]
        │
        ▼
┌─────────────────────────────────────────────────┐
│ Tier 0: Watcher + Classifier (cheap, no LLM) │
│  - WebSocket / webhook subscription              │
│  - Writes normalized events to events.ndjson     │
│  - Classifier tags each event:                   │
│    {actionable, ambient, ack}                    │
│  - ~80% of events drop here, never see an LLM    │
└─────────────────────────────────────────────────┘
        │ (only `actionable` events)
        ▼
┌─────────────────────────────────────────────────┐
│ Orchestrator (thin LLM session, ≤2k ctx/triage)│
│  - Monitors filtered event stream                │
│  - Reads/writes per-thread state files on disk   │
│  - Dispatches to Tier 1 subagents                │
│  - Never reads raw evidence                      │
│  - Surfaces validated drafts to human            │
└─────────────────────────────────────────────────┘
        │
        ├──▶ Tier 1a: Investigator subagent
        │    Fresh context. Reads code, queries logs,
        │    drafts reply. Returns InvestigatorReturn JSON.
        │
        ├──▶ Tier 1b: Adversarial validator subagent
        │    Fresh context. Tries to break the draft.
        │    Spot-checks one evidence_ref. Returns ValidatorReturn.
        │
        └──▶ Tier 2: Headless escalation (for >10min work)
             Long-running session in tmux/screen. Streams to disk.
             Survives orchestrator restart.

Key principle: the orchestrator dispatches, gates, forwards. It never opens a code file, never queries a log, never reads a git diff. All evidence stays in subagent contexts and on disk.


§3 — Components

§3.1 — Tier 0: Watcher

Responsibility: subscribe to chat events, normalize them, write to a single append-only file.

Implementation:

  • Long-running process (tmux session, systemd, or whatever your team uses for daemons).
  • Authenticates as a bot identity (so token doesn't expire).
  • Writes one JSON line per event to ~/.<agent>/events.ndjson.

Normalized event shape:

{
  "platform": "slack | discord | lark | teams | ...",
  "chat_id": "string",
  "chat_name": "string",
  "message_id": "string",
  "create_time": "RFC3339 timestamp",
  "msg_type": "text | image | file | thread_reply | ...",
  "content": "the actual text",
  "thread_id": "string | null",
  "sender": { "id": "string", "type": "user | bot" },
  "mentions": ["array of mentioned user/bot ids"]
}

Adapter notes:

  • For Slack: use Events API with app_mention, message, message.channels subscriptions.
  • For Discord: use the gateway WebSocket; bot needs MESSAGE_CONTENT intent.
  • For Lark: WebSocket subscription via lark-cli or the SDK; bot must be a member of watched chats.
  • For Teams: Graph API change notifications.

§3.2 — Tier 0: Classifier

Responsibility: tag every raw event with classification metadata before the orchestrator sees it.

Implementation: bash + regex + jq. No LLM. Cost: $0.

Process: tail events.ndjson, augment each line, write to events-classified.ndjson.

Output fields added to each event:

{
  is_bot_mention: boolean,          // bot was @-mentioned
  is_question: boolean,             // heuristic: ends with ?, contains question words
  is_ack_or_emoji: boolean,         // ack-only / emoji-only / "lgtm"
  is_internal_chatter: boolean,     // team-to-team, not directed at us
  mentions_thread_with_inflight: boolean,  // thread has an open state file
  classification: "actionable" | "ambient" | "ack",
  classifier_confidence: number,    // 0-1
  classifier_version: string,       // bumped when rules change
  classified_at: string             // RFC3339
}

Classification rules:

| Pattern | Sets |
|---|---|
| Message contains bot's user ID in `mentions[]` | `is_bot_mention = true` |
| Ends with `?` OR contains question words in your team's language | `is_question = true` |
| Length < 30 chars AND matches ack patterns (ok, lgtm, noted, emoji-only) | `is_ack_or_emoji = true` |
| `thread_id` matches an open state file | `mentions_thread_with_inflight = true` |
| `is_bot_mention` OR (`is_question` AND NOT `is_ack_or_emoji`) | `classification = actionable` |
| `mentions_thread_with_inflight` AND NOT `is_ack_or_emoji` | `classification = actionable` |
| else if `is_ack_or_emoji` | `classification = ack` |
| else | `classification = ambient` |
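The rules above compile almost directly into a single jq filter. A minimal sketch, assuming `BOT_ID` holds your bot's user id and using a placeholder English question-word list (swap in your team's language); the thread-inflight check is omitted for brevity:

```shell
# Tier 0 classifier sketch. BOT_ID is an assumption — set it to your bot's id.
BOT_ID="${BOT_ID:-U_PLACEHOLDER}"

classify() {
  jq -c --arg bot "$BOT_ID" '
    . as $e
    | (($e.mentions // []) | index($bot) != null)                as $mention
    | (($e.content // "") | test("\\?|how|why|when|where|what")) as $question
    | ((($e.content // "") | length) < 30
       and (($e.content // "")
            | test("^(ok|noted|lgtm|looks good)\\W*$"; "i")))    as $ack
    | $e + {
        is_bot_mention:  $mention,
        is_question:     $question,
        is_ack_or_emoji: $ack,
        classification:
          (if $mention or ($question and ($ack | not)) then "actionable"
           elif $ack then "ack"
           else "ambient" end)
      }'
}
```

Pipe `tail -F events.ndjson` through `classify` and append to `events-classified.ndjson`.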

Orchestrator's monitor filter:

tail -F events-classified.ndjson | jq -c --unbuffered 'select(.classification == "actionable")'

Why this matters: in a real channel, ~70 events/day, ~10–15 worth attention. Filtering at this layer is 80%+ context reduction by itself, before any LLM gets involved. This is the single biggest win. Don't skip it.

§3.3 — Tier 1a: Investigator subagent

Responsibility: research one thread's question end-to-end, return a structured payload.

Spawn signature (Claude Code idiom — adapt for your agent harness):

Agent({
  description: "Investigate <thread_id> <topic>",
  subagent_type: "general-purpose",
  model: "<your most capable model>",
  prompt: <rubric + message + thread context + cross-investigation hints>
})

Permissions: all read tools (file read, grep, glob, bash for read-only commands), your observability MCP (Grafana, Datadog, etc.), git read commands. No write tools to source code. No ability to post to the chat directly.

Returns: InvestigatorReturn JSON (see §4.3).

Prompt template:

You are investigating a message in thread <thread_id> from <sender>:

> <full message verbatim, with translation gloss if your team's chat is non-English>

Expected answer shape: <technical_qa | status_check | data_lookup | api_contract>
Required evidence: at least one file:line reference, verified by Read in this turn.
Out of scope: broader refactor proposals, comments on adjacent code.
Tone: match the team register; no AI hedging, no excessive apology.
Confidence floor: 'high' requires you to have read the relevant code/docs THIS turn,
                  not paraphrased from training data.
Hard caps: 600 words summary, 300 words draft reply.

Cross-reference: today's other investigations:
<bullet list of summaries from state/*.json with status != closed>

Existing memory and docs to grep before investigating:
- <path to your team's tribal-knowledge directory>
- <path to past triage logs>

Codebase root: <your codebase root>

Return JSON ONLY in the structure defined in §4.3.

§3.4 — Tier 1b: Adversarial validator subagent

Responsibility: review investigator's structured return, attack it, verify with one spot-check, return a verdict.

Spawn signature: same harness as investigator, fresh context window.

Permissions: Read only (for the one spot-check). Not given grep/bash to prevent re-running research.

Returns: ValidatorReturn JSON (see §4.4).

Prompt template:

You are reviewing an investigator's draft reply before a human sees it.

YOUR JOB IS TO BREAK THIS DRAFT, NOT TO ENDORSE IT.

This framing is load-bearing. AI reviewers default to politeness and miss issues.
Be adversarial. Find what's wrong.

Investigator's full return:
<JSON paste>

Investigator's original rubric:
<rubric>

Checks to run (all required):
1. SCHEMA: does the return parse cleanly per §4.3?
2. SPOT-CHECK: pick ONE evidence_ref. Read the file at the cited line.
   Does the cited content actually support the claim in the draft?
   Possible verdicts: supports | contradicts | fabricated | uncheckable.
3. CONFIDENCE LANGUAGE MATCH: does the draft's language match the
   self-rated confidence? "Definitely" + "medium" = mismatch.
4. SCOPE DRIFT: does the draft answer what was asked, or wander?
5. CROSS-INVESTIGATION CONSISTENCY: does this contradict any open thread's
   conclusions (from state/*.json summaries)?
6. RISK GATE: does the draft recommend code changes, customer-specific
   actions, or rollback advice? If so, confidence must be 'high'. Otherwise fail.
7. TONE: does the draft have AI-smell (hedging, "I'd be happy to", excessive
   apology, generic disclaimers)?

Verdicts:
- `pass`: forward draft to human with "✓ validated" badge
- `bounce`: fixable; orchestrator re-prompts investigator with feedback (capped at 1 bounce)
- `escalate`: disagreement or unrecoverable; surface to human, no auto-retry

Return JSON per §4.4.

Why "your job is to break this": without explicit adversarial framing, validators rubber-stamp. Test this: write a deliberately-bad investigator return, run both prompts, compare. The adversarial framing is the difference between catching the fabrication and missing it.

§3.5 — Tier 2: Headless escalation

Responsibility: run heavy or long investigations (>10 min expected) with streaming visibility and session-survival.

Implementation: launch a headless instance of your agent (claude --print, cursor headless, etc.) inside a tmux session. Stream stdout to a log file. Final return JSON written to disk as the last action.

Why a separate tier: Tier 1 subagents run inside the orchestrator's session. If the session dies, the investigation dies. Tier 2 detaches the investigation from the session lifecycle — the human can tmux attach to watch progress, kill the orchestrator, restart it, and the next morning brief picks up the completed escalation.

Coordination:

  • Orchestrator decides Tier 1 vs Tier 2 at dispatch time based on rubric heuristics.
  • Tier 2 session writes final return.json to ~/.<agent>/escalations/<thread_id>/return.json.
  • Orchestrator polls every ~30s. When return.json exists, treats it like a Tier 1 investigator return and runs the validator normally.
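The polling step can be a few lines of shell in the orchestrator's dispatch glue. A sketch, assuming the layout above (`<esc_root>/<thread_id>/return.json`); `seen_dir` is a hypothetical marker directory so each completed escalation is handed off exactly once:

```shell
# Emit the thread_id of each newly completed Tier 2 escalation, once.
poll_escalations() {
  local esc_root="$1" seen_dir="$2" ret thread_id
  mkdir -p "$seen_dir"
  for ret in "$esc_root"/*/return.json; do
    [ -f "$ret" ] || continue                  # glob matched nothing
    thread_id=$(basename "$(dirname "$ret")")
    [ -f "$seen_dir/$thread_id" ] && continue  # already picked up
    touch "$seen_dir/$thread_id"
    echo "$thread_id"                          # hand off to the validator step
  done
}
```

Run it on the ~30s tick; each emitted thread_id is then treated like a Tier 1 investigator return.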

Spawn template (bash, for Claude Code):

#!/usr/bin/env bash
set -euo pipefail

THREAD_ID="$1"
RUBRIC_FILE="$2"
ESC_DIR="$HOME/.<agent>/escalations/$THREAD_ID"
mkdir -p "$ESC_DIR"

# The rubric MUST instruct the session to write final return JSON to
# $ESC_DIR/return.json as its last action.

tmux new-session -d -s "investigator-$THREAD_ID" \
  "timeout 30m claude --print --model <your-model> \"$(cat $RUBRIC_FILE)\" \
   2>&1 | tee $ESC_DIR/transcript.log; touch $ESC_DIR/.done"

Adapt the claude --print invocation for your agent harness's headless mode.

§3.6 — Orchestrator

Responsibility: glue. Dispatch, route, gate, forward. The orchestrator is not a researcher.

Implementation: a skill / system prompt / workflow definition in your agent harness. The orchestrator's behavior is defined entirely by its prompt and a few dispatch primitives:

  • Read/write per-thread state files (disk I/O, not in-context state).
  • Spawn investigator subagent.
  • Spawn validator subagent.
  • Forward draft to human with verdict badge.
  • Post approved reply via chat platform CLI/SDK.
  • Append reply to log.

The orchestrator never:

  • Opens a code file.
  • Queries a log dashboard.
  • Reads a git diff.
  • Drafts a reply itself.

If you find yourself writing orchestrator logic that does any of these, push it into a subagent.

§3.7 — Reply tooling

Responsibility: post approved replies back to the chat platform.

Adapter notes:

  • Slack: chat.postMessage API; reply-in-thread via thread_ts.
  • Discord: POST /channels/{channel.id}/messages with message_reference for replies.
  • Lark: lark-cli im +messages-reply --as bot --reply-in-thread.
  • Teams: Graph API POST /chats/{chat-id}/messages.

Always post as the bot identity (not a user). Bot tokens don't expire; user tokens do.

Reply log: append one JSON line per posted reply to ~/.<agent>/replies.ndjson:

{
  "chat_id": "string",
  "reply_to_message_id": "string",
  "posted_message_id": "string",
  "posted_at": "RFC3339",
  "reply_text": "string",
  "investigator_task_id": "string",
  "validator_verdict": "pass | bounce-then-pass | escalate-then-user-approved",
  "investigator_rounds": 1,
  "was_escalated": false,
  "triage_file": "path | null"
}

This powers the morning-brief "already replied" marker and end-of-day reporting.
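The "already replied" marker can come straight from this log. A sketch with a hypothetical helper name:

```shell
# Message ids already answered, for the morning brief to mark as done.
already_replied() {
  jq -r '.reply_to_message_id' "$1" | sort -u
}
```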


§4 — Data Model

§4.1 — Per-thread state file

Path: ~/.<agent>/state/<thread_id>.json

{
  // Identity
  thread_id: string,
  chat_id: string,
  chat_name: string,

  // Origin
  original_message_id: string,
  original_sender_id: string,
  original_sender_name: string,
  question_type: "technical_qa" | "status_check" | "data_lookup" | "api_contract" | "other",

  // Status machine
  status: "investigating" | "awaiting-validation" | "pending-user" | "bounced-round-1" | "escalated" | "closed",
  status_history: Array<{ at: string, from: string, to: string }>,

  // Subagent tracking
  rubric: string,
  investigator_task_id: string | null,
  investigator_round: number,                // 1 or 2; >2 means escalated to user
  investigator_return: InvestigatorReturn | null,
  validator_task_id: string | null,
  validator_return: ValidatorReturn | null,

  // Tier 2
  is_escalated: boolean,
  escalation_tmux_session: string | null,
  escalation_transcript_path: string | null,

  // Approval
  draft_pending: string | null,
  user_approved_at: string | null,
  posted_message_id: string | null,
  triage_file_path: string | null,

  // Timestamps
  started_at: string,
  last_event_at: string,
  closed_at: string | null
}

Atomic write rule: always write tmp.json && mv tmp.json <thread_id>.json to avoid torn reads.
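A sketch of the rule as a hypothetical `update_state` helper that also appends to `status_history` (assumes `jq`, and that tmp and target live on the same filesystem, which is what makes `mv` atomic):

```shell
# Atomically transition a thread's state file to a new status.
update_state() {
  local state_file="$1" new_status="$2"
  local tmp="${state_file}.tmp"
  jq --arg s "$new_status" --arg at "$(date -u +%Y-%m-%dT%H:%M:%SZ)" '
      .status_history += [{at: $at, from: .status, to: $s}]
      | .status = $s
    ' "$state_file" > "$tmp" && mv "$tmp" "$state_file"
}
```

A concurrent reader sees either the old file or the new one, never a half-written JSON.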

Access rules:

  • Orchestrator reads on every event in a thread.
  • Orchestrator writes on every status transition.
  • Subagents are READ-ONLY — the file path is passed in their prompt as context. They must not write back.

§4.2 — Classified event

See §3.2 for the augmented event schema.

§4.3 — InvestigatorReturn

Returned by Tier 1 investigator OR Tier 2 escalation (same shape).

{
  // Self-assessment
  confidence: "high" | "medium" | "low",
  confidence_reason: string,                 // 1 sentence

  // Outputs
  summary_for_orchestrator: string,          // ≤2 sentences
  draft_reply: string,                       // ≤300 words, ready to post
  draft_language: "id" | "en" | "mixed" | "<your team's code>",

  // Evidence
  evidence_refs: Array<{
    kind: "file" | "log_query" | "git_commit" | "external_doc" | "memory" | "triage_file",
    ref: string,                             // "path/to/file.js:103-106" or query string or commit SHA
    supports_claim: string                   // 1 sentence
  }>,

  // Optional
  proposed_triage_file: {
    filename: string,
    content: string
  } | null,
  open_questions: string[],

  // Escalation signals
  escalation_requested: boolean,
  escalation_reason: string | null,

  // Meta
  investigator_round: number,
  research_notes: string                     // ≤500 words, NOT for the chat reply
}

Hard caps enforced by the prompt:

  • summary_for_orchestrator: ≤2 sentences
  • draft_reply: ≤300 words
  • research_notes: ≤500 words
  • evidence_refs: ≤8 items

§4.4 — ValidatorReturn

{
  verdict: "pass" | "bounce" | "escalate",
  reasons: string[],

  // Spot-check (the load-bearing thing)
  spot_check_ref: string,                    // which evidence_ref was sampled
  spot_check_result: "supports" | "contradicts" | "fabricated" | "uncheckable",
  spot_check_note: string,                   // 1 sentence

  // Categorical assessments
  schema_check: "ok" | "fail",
  confidence_language_match: "match" | "mismatch",
  scope_drift: "none" | "minor" | "major",
  cross_investigation_consistency: "consistent" | "contradicts_<thread_id>" | "no_overlap",
  risk_gate_check: "passes" | "needs_high_confidence" | "fails",
  tone_assessment: "matches" | "off" | "ai_smell",

  // Bounce feedback (only if verdict == "bounce")
  bounce_feedback: string | null,

  // Meta
  validator_model: string,
  validated_at: string
}

Verdict rules:

  • pass requires: schema_check == ok AND spot_check_result in (supports, uncheckable) AND confidence_language_match == match AND risk_gate_check != fails AND tone_assessment != ai_smell.
  • bounce is the default when any above fail BUT the issue is fixable with another round.
  • escalate is when the issue is not fixable or beyond validator's authority.
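In glue code, the pass rule can be re-checked mechanically rather than trusted from the validator's self-reported `verdict`. A jq sketch (hypothetical `verdict_gate` helper; requires jq ≥1.5 for `IN`):

```shell
# Downgrade a "pass" the validator's own categorical fields contradict.
verdict_gate() {
  jq -r '
    if .schema_check == "ok"
       and (.spot_check_result | IN("supports", "uncheckable"))
       and .confidence_language_match == "match"
       and .risk_gate_check != "fails"
       and .tone_assessment != "ai_smell"
    then .verdict
    # gates failed: never forward a "pass"; surface the disagreement instead
    elif .verdict == "pass" then "escalate"
    else .verdict
    end'
}
```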

§4.5 — Configuration

# Watcher
watched_chats:
  - chat_id: <your_chat_id_1>
    name: <human-readable name>
  - chat_id: <your_chat_id_2>
    name: <human-readable name>

# Classifier
classifier:
  bot_id: <your_bot_user_id>
  ack_patterns:
    - "^(ok|noted|lgtm|looks good|👍|🙏)\\W*$"
  question_keywords:
    - <your team's language: question words>

# Subagent defaults
investigator:
  model: <your most capable model>
  max_rounds: 2
  hard_cap_runtime_min: 5

validator:
  model: <same as investigator>
  max_rounds: 1

# Escalation
escalation:
  expected_runtime_min_threshold: 10
  enabled: true
  transcript_dir: ~/.<agent>/escalations

§5 — Orchestrator workflow

The orchestrator runs the following loop for every actionable event:

  1. Read state. Open or create state/<thread_id>.json.
  2. First-call-of-day log check. If this is the first triage of the session, grep your team's tribal-knowledge directory for context that might already answer the question.
  3. Surface the message to the human in plain text. Translate inline if your team's chat isn't English; show original then EN gloss.
  4. Decide question type. Technical / status / data-specific / api-contract. Different types get different rubrics.
  5. Ask the human for context if the answer hinges on info not in code or message — past decisions, who's asking, scope-specific conventions. Don't draft replies that rest on guessed premises.
  6. Spawn investigator subagent with rubric, message, cross-investigation hints from open state files. Wait for return.
  7. Spawn validator subagent with investigator's full return + original rubric. Wait for verdict.
  8. Branch on verdict:
    • pass → surface draft to human with "✓ validated" badge. On human approve: post reply, log, optionally save triage file.
    • bounce → re-prompt investigator with bounce_feedback (round 2, capped). Re-validate.
    • escalate → surface disagreement to human, no auto-retry.
  9. Close state file with status: closed after reply is posted or human dismisses.

The bounce-once rule

After one bounce, escalate to the human. If round 2 doesn't fix the issue, the agents are deadlocked — more rounds just rationalize. Surface to the human, let them decide.

[Investigator R1] ──▶ [Validator] ──pass──▶ [Human approval]
                          │
                          ├──bounce──▶ [Investigator R2 + feedback] ──▶ [Validator]
                          │                                                  │
                          │                                                  ├──pass──▶ [Human approval]
                          │                                                  └──any──▶ [Surface to human]
                          │
                          └──escalate──▶ [Surface to human]

§6 — Implementation phases

Four phases. Each phase is independently deliverable. Don't start phase N+1 until phase N is verified.

Phase 1 — Watcher + Classifier (0.5 day)

Goal: drop ambient/ack events without changing the watcher.

Deliverables:

  1. Watcher process subscribing to chat platform, writing events.ndjson.
  2. Classifier script tailing events.ndjson → writing events-classified.ndjson.
  3. Config file with watched chat IDs, bot ID, ack patterns, question keywords.
  4. Health check script reporting watcher + classifier liveness.
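Liveness for both daemons can be inferred from file freshness: if events.ndjson or events-classified.ndjson stops growing, the corresponding process is down. A sketch of the core helper (hypothetical name; tries GNU `stat`, then BSD):

```shell
# Seconds since a file was last written, or "missing".
file_age_s() {
  local f="$1" mtime
  [ -f "$f" ] || { echo "missing"; return; }
  mtime=$(stat -c %Y "$f" 2>/dev/null || stat -f %m "$f")
  echo $(( $(date +%s) - mtime ))
}
```

The health check then alerts when either age exceeds a threshold you pick for your channel's traffic.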

Verification:

  • Run for 24h. Confirm actionable is <30% of total event volume.
  • Spot-check 20 random ambient/ack events — none should be genuinely actionable.
  • Spot-check 20 random actionable events — all should be worth attention.
  • Reboot mid-day; verify watcher + classifier auto-restart.
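The <30%-actionable check is one jq pass over the classified stream. A sketch (hypothetical helper; point it at events-classified.ndjson):

```shell
# Count of events per classification, one "label count" line each.
class_breakdown() {
  jq -s -r '
    group_by(.classification)
    | map({(.[0].classification): length})
    | add
    | to_entries[]
    | "\(.key) \(.value)"' "$1"
}
```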

Phase 2 — Investigator subagent + state files (1–2 days)

Goal: orchestrator stops reading raw evidence; all research moves to a subagent.

Deliverables:

  1. Investigator prompt template per §3.3.
  2. State schema per §4.1; state/ directory created at runtime.
  3. Orchestrator's main loop: read/create state → spawn investigator → wait → update state → forward draft (no validator yet).
  4. Reply tooling per §3.7 (post bot reply + append to replies.ndjson).

Verification:

  • Real triage end-to-end. Orchestrator context per investigation <5k tokens (measure manually).
  • Two concurrent triages, different threads. Both complete; drafts not swapped.
  • Kill orchestrator mid-investigation. Restart. Confirm state file lets next session resume.

Phase 3 — Adversarial validator (1 day)

Goal: drafts are validated before reaching the human.

Deliverables:

  1. Validator prompt template per §3.4 — with explicit "your job is to break this" framing.
  2. Validator subagent spawn after investigator returns.
  3. Bounce-once-then-escalate flow per §5.
  4. Augment replies.ndjson writer to include validator_verdict, investigator_rounds.

Verification:

  • Synthetic test: feed investigator a fabricated rubric ("claim that path/to/foo.js:9999 contains X"). Confirm validator catches via spot-check.
  • Run for one week. Track pass / bounce / escalate counts.
  • Sanity-check: if 100% pass rate over 7 days, the validator's prompt isn't adversarial enough — tighten it.

Phase 4 — Headless escalation (0.5 day)

Goal: heavy investigations stream + survive session restarts.

Deliverables:

  1. Escalation spawn script per §3.5.
  2. Orchestrator logic: decide Tier 1 vs Tier 2 at dispatch; poll for return.json; treat as investigator return when present.
  3. Documented one-liner for human to tmux attach and watch live.

Verification:

  • Trigger a heavy investigation. Tmux session exists; transcript grows; return.json appears at end.
  • Kill orchestrator during escalation. Wait. Restart. Confirm next session picks up the completed escalation and runs validator.

Total effort: 3–4 days of focused work.


§7 — Acceptance criteria

Falsifiable. Don't claim done until each one passes.

| # | Criterion | How to measure |
|---|---|---|
| AC-01 | Classifier reduces orchestrator event volume by ≥60% | Over 7 days, actionable events ≤40% of total |
| AC-02 | Orchestrator context per investigation ≤2k tokens (median) | Instrument skill; measure across 20+ investigations |
| AC-03 | Two concurrent triages complete without state confusion | Synthetic test: dispatch two within 5min; verify drafts not swapped |
| AC-04 | Validator catches fabricated reference | 3 synthetic tests (bad path, bad line, bad content); all must trigger non-pass |
| AC-05 | Validator does not become a rubber stamp | Over 7 days, first-round pass rate between 60–90% |
| AC-06 | Bounce feedback drives real improvement | For next 10 bounces, ≥7 round-2 outputs show specific fixes (not rationalization) |
| AC-07 | Escalation completes and writes return.json | Trigger one escalation; verify lifecycle end-to-end |
| AC-08 | Human can attach to running escalation | `tmux attach` shows live output |
| AC-09 | Session-boundary survival | Kill orchestrator during escalation; restart; next brief picks it up |
| AC-10 | Human approval throughput unchanged or improved | Median time-from-message-to-reply ≤ pre-rollout + 30s |
| AC-11 | No silent triage misses | Manually scan 7 days of ambient/ack; ≤1 mis-classification per 100 events |
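AC-05's pass rate falls out of replies.ndjson directly. A measurement sketch (hypothetical helper; counts `validator_verdict == "pass"` as a first-round pass, per the §3.7 verdict encoding):

```shell
# First-round validator pass rate over a replies log.
pass_rate() {
  jq -s -r '
    (map(select(.validator_verdict == "pass")) | length) as $pass
    | length as $total
    | if $total == 0 then "no data"
      else "\($pass)/\($total) first-round pass" end' "$1"
}
```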

When to flip on V2 auto-reply (optional, separate work)

This spec defaults to human-in-the-loop on every reply. If you want to add narrow auto-reply on explicit @-mentions later:

Hard preconditions (ALL must be true):

  1. is_bot_mention == true (explicit invitation, not ambient).
  2. Investigator confidence == "high".
  3. Validator verdict == "pass" on first round.
  4. Validator risk_gate_check == "passes".
  5. question_type ∈ {technical_qa, api_contract, data_lookup} — never status_check, never production-incident, never account-specific.
  6. Within working hours.
  7. Audit notification to the human after every auto-reply.

Maturity gate: validator false-positive rate ≤2% over 4 consecutive weeks, zero false-positives on production-touching drafts. If FP rate creeps above 2%, revoke the privilege and go back to full gate. Don't argue with the data.


§8 — Adapter notes by stack

Chat platforms

| Platform | Subscribe via | Bot identity | Reply tooling |
|---|---|---|---|
| Slack | Events API (message, app_mention) | Bot token | chat.postMessage w/ thread_ts |
| Discord | Gateway WebSocket (MESSAGE_CONTENT intent) | Bot token | POST /channels/{id}/messages w/ message_reference |
| Lark | WebSocket subscription | Bot app + tenant token | im.message.reply |
| Teams | Graph change notifications | Bot Service / App registration | Graph POST /chats/{id}/messages |
| Mattermost | Webhooks + WebSocket | Bot user | POST /posts |

Codebase research

| What investigator needs | Adapter |
|---|---|
| Read files | Built-in to most agent harnesses |
| Grep | rg (ripgrep) — fastest |
| Git log/blame | Standard git CLI |
| Symbol search | Tree-sitter via grep, or LSP if available |

Observability

| Tool | Subagent access |
|---|---|
| Grafana / Loki | Grafana MCP |
| Datadog | Datadog MCP |
| Honeycomb | Honeycomb MCP |
| Splunk | CLI wrapper |
| Sentry | Sentry MCP |

If your observability is behind a SaaS dashboard with no API access, that's a Tier 2 escalation candidate — the headless session can use a browser automation tool to query it.

Agent harnesses

| Harness | Subagent primitive | Headless mode |
|---|---|---|
| Claude Code | Agent({ subagent_type, model }) | claude --print |
| Cursor | Composer agent | Cursor CLI |
| Continue | Custom agents | Continue CLI |
| Aider | Subprocess (aider --message-file) | Built-in |
| Roo Code / Cline | Subagent extensions | Headless mode |

§9 — Tasks for your agent

If you're handing this spec to an agent to build, execute the tasks below in order. Don't skip phases.

Phase 1 — Watcher + Classifier:

  1. Identify the chat platform we're targeting. Confirm authentication approach (bot token, OAuth, etc.).
  2. Build the watcher process. Output: ~/.<agent>/events.ndjson, one line per normalized event. Match the schema in §3.1.
  3. Build the classifier script. Output: ~/.<agent>/events-classified.ndjson with augmented fields per §3.2.
  4. Build a health-check script that reports watcher liveness, classifier liveness, last classified-at timestamp.
  5. Run for 24h. Report actionable / ambient / ack counts. If actionable > 30%, tighten classifier rules.

Phase 2 — Investigator + state:

  1. Write the investigator prompt template per §3.3, with placeholders for codebase root, observability MCP, and team-language tone notes.
  2. Define the state file schema per §4.1. Create the state/ directory at runtime.
  3. Build the orchestrator's main loop per §5 (steps 1–6 only — no validator yet).
  4. Build the reply-tooling adapter per §3.7. Confirm it can post a test message as the bot.
  5. Run a real triage end-to-end. Measure orchestrator context cost. Should be <5k tokens.

Phase 3 — Validator:

  1. Write the validator prompt template per §3.4. Include the "your job is to break this" framing explicitly.
  2. Add the validator spawn after investigator returns. Implement bounce-once-then-escalate per §5.
  3. Augment replies.ndjson writer per §3.7.
  4. Run three synthetic tests per AC-04 (fabricated file path, bad line number, bad content). Validator must catch all three.

Phase 4 — Escalation:

  1. Write the headless spawn script per §3.5, adapted for your agent harness's headless mode.
  2. Implement Tier 1 vs Tier 2 decision logic in the orchestrator.
  3. Test session-boundary survival per AC-09.

Ship checklist:

  1. Verify all 11 acceptance criteria in §7.
  2. Document the bot's onboarding for new team members.
  3. Set up weekly metrics review: classifier mis-classification rate, validator pass/bounce/escalate breakdown, escalation count.

§10 — Why this works

A few principles, ordered by load-bearing-ness:

  1. Pre-filter cheap, then think expensive. Regex+jq drops 80% of context drain for $0. This is the single biggest win in the whole architecture. Don't skip Tier 0.

  2. Orchestrator + specialized subagents > one omniscient agent. Fresh context windows = independent thinking = catch different blind spots. Same idea borrowed from PR-review patterns where reviewers have explicit roles.

  3. Adversarial framing is load-bearing. "Review this" and "your job is to break this" produce qualitatively different findings. AI reviewers default to politeness — fight that with the prompt explicitly.

  4. State on disk, not in context. Per-thread JSON makes concurrency and session-restart-survival possible. The orchestrator carries dispatch glue, nothing more.

  5. Earn auto-actions with evidence. Human-gated first. Measure validator reliability. Unlock narrow auto-paths (V2 @-mention auto-reply) only when the data supports it. Default-deny.


License

Use it. Adapt it. Build something better. No attribution required, but if you do build with this, I'd love to hear what you changed.
