Based on: Rupert's OTM Spec v1.0 (2026-03-12)
Updated by: Claudia (2026-03-12–14)
Merged by: Claudia (2026-03-14) — consolidates OTM-SPEC v1.5 + OTM-FIELD-MAPPING v1.8
Note: v1.0–v1.5 changes were tracked in OTM-SPEC. v1.0–v1.8 field mapping changes were tracked in a separate OTM-FIELD-MAPPING document. Both changelogs are merged here.
| Version | Date | Source | Changes |
|---|---|---|---|
| v1.0 | 2026-03-12 | OTM-SPEC | Initial specification: state machine, SURE protocol, SE-1/SW-1, OTM-0 through OTM-6, actor registry, audit trail |
| v1.0 | 2026-03-14 | OTM-FIELD-MAPPING | Initial field mapping: script params, JSON format, Slack field mapping |
| v1.1 | 2026-03-12 | OTM-SPEC | Added rework flow (TT-08), Rejected state, Orchestrator manages subtasks |
| v1.1 | 2026-03-14 | OTM-FIELD-MAPPING | Added subtask support, task ID system (T-NNNNN) |
| v1.2 | 2026-03-12 | OTM-SPEC | Added Watchdog (OTM-6), archival (TT-12), Pending state (TT-03/TT-05) |
| v1.2 | 2026-03-14 | OTM-FIELD-MAPPING | Added pipeline flow diagram, directories |
| v1.3 | 2026-03-12 | OTM-SPEC | Removed TT-09, added Priority Scale, human actor type. Resolved Q5: agents call OTM API for subtask completion (not Slack directly). |
| v1.3 | 2026-03-14 | OTM-FIELD-MAPPING | Added completion detection (System 4) |
| v1.4 | 2026-03-12 | OTM-SPEC | Added Error Monitor (OTM-7), error catalogue (ERR-01 through ERR-12), dual-DB design |
| v1.4 | 2026-03-14 | OTM-FIELD-MAPPING | Added component heartbeats (System 5) |
| v1.5 | 2026-03-12 | OTM-SPEC | Human notifications via Slack task conversation (not DM). Error reports daily, threshold=1. Error monitoring in separate otm-errors.db. Only Orchestrator creates/deletes tasks. subtasks_remaining = count of unfinished subtasks. Added §12 Cost Analysis. SURE timeouts: 1+2min. Gateway restart detection (§4.7). Log file mirroring. task-orchestration skill added as deliverable. |
| v1.5 | 2026-03-14 | OTM-FIELD-MAPPING | Default confirmation subtask; deprecated fields cleanup |
| v1.6 | 2026-03-14 | OTM-FIELD-MAPPING | Added DMZ relay architecture (System 6), todo_completed (Col00) documentation, project label fixes |
| v1.7 | 2026-03-14 | OTM-FIELD-MAPPING | Added task dispatcher (System 3), task updates (System 3b), full architecture diagram, completion metrics, status lifecycle |
| v1.8 | 2026-03-14 | OTM-FIELD-MAPPING | Dispatcher triggers agents via Gateway WebSocket RPC (parallel, non-blocking); crash-safe operation order (file → Slack → RPC); idempotencyKey is NOT idempotent (design flaw documented); todo_completed checkbox (Col00) documentation; injector idempotency (dedup check); Slack archive API limitation noted |
| v1.8 | 2026-03-14 | MERGED | Consolidated OTM-SPEC v1.5 + OTM-FIELD-MAPPING v1.8 into single unified specification. Task ID format standardized to T-NNNNN (v1.8 implementation). Added Part 3 (File-Based Pipeline). All Slack column IDs included. |
The OpenClaw Task Manager (OTM) orchestrates the execution of tasks by AI and human agents. It uses Slack Lists as the task board, the Slack Events API as the event bus for human-originated changes, and the OTM API as the interface for AI actors.
The system is split into three cooperating layers:
- Slack Event Layer (SE-1) — Slack Events API (`list_item_updated`) detected by our Slack app in socket mode. Zero intelligence. Forwards raw events to a single OTM entry point. Requires only `lists:read` scope.
- Slack Write Layer (SW-1) — Handles all writes to Slack: task field updates and conversation feed audit entries. Requires `lists:write` scope. Called only by the OTM.
- OpenClaw Task Manager (OTM) — Authoritative backend implemented as an OpenClaw plugin pipeline. Owns all state, routing logic, agent registry, and business rules. Persists to SQLite. Sole component with the ability to change task status, agent status, and counter fields.
📌 Design principles:
- ALL behaviour lives in the OTM. No business logic in SE-1 or SW-1.
- The OTM is the sole writer of task status and related fields. No actor writes status directly.
- Only the Orchestrator creates tasks and subtasks. The OTM never creates or deletes tasks — it manages their lifecycle after creation. The Orchestrator also manages subtask lists (creates, deletes, keeps completed) during rework flows; the OTM processes the resulting state changes.
- ALL events are logged in the Slack task conversation feed with timestamps (§7).
- Agent notifications use the SURE protocol: request + mandatory acknowledgement (§6).
| Actor | Current holder | Type | Role |
|---|---|---|---|
| Orchestrator | Claudia | AI | Creates tasks, sets assignments, validates completed work — always via OTM API |
| Agent | Devdas, Salvatore, etc. | AI | Executes tasks, reports progress directly to OTM API |
| Human | Rupert, clients | Human | Creates/edits tasks in Slack UI; changes detected by SE-1 and forwarded to OTM |
| OTM | (system) | System | Authoritative state machine, sole writer of all status and counter fields |
| Watchdog | OTM-6 cron | System | Recovery cron — detects anomalies, requests OTM to execute corrective transitions |
📌 Humans are identified by their Slack user ID. AI agents are identified by both their Slack user ID and their OpenClaw agent ID. Both types are managed in the same agent registry (§4.5).
The OTM is composed of six cooperating systems, from task creation to dashboard visibility:
┌─────────────────────────────────────────────────────────────────────────────────┐
│ OTM — One-Time Mission System │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ SYSTEM 1: Task Creation │ │
│ │ │ │
│ │ Claudia (orchestrator) │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ otm-create-task.sh ──► JSON file ──► ~/…/otm/new-tasks/ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ next-task-id.json (T-NNNNN counter, flock) │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ SYSTEM 2: Task Injection │ │
│ │ │ │
│ │ ai.openclaw.otm-watcher (WatchPaths) │ │
│ │ ai.openclaw.otm-sweeper (every 10 min) │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ otm-injector.js ──► Slack Lists API ──► Rapido Task Campaign │ │
│ │ │ (slackLists.items.create + subtasks) │ │
│ │ ▼ │ │
│ │ processed/ or failed/ │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ (task exists in Slack with status=new) │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ SYSTEM 3: Task Dispatcher │ │
│ │ │ │
│ │ ai.openclaw.otm-dispatcher (every 2 min) │ │
│ │ │ │ │
│ │ ├──► Scan: status=new + assignee set │ │
│ │ ├──► Write task-dispatch.json to agent workspace │ │
│ │ ├──► Trigger agent via Gateway WS RPC (parallel) │ │
│ │ └──► Update Slack: new → assigned + set assigned_at │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ (agent session starts, picks up task, works, reports progress) │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ SYSTEM 3b: Task Updates (agent → Slack feedback) │ │
│ │ │ │
│ │ otm-update-task.sh ──► JSON ──► ~/…/otm/task-updates/ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ otm-injector.js ──► Slack Lists API (update status, subtask done) │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ (all subtasks done → auto-promote) │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ SYSTEM 4: Completion Detection │ │
│ │ │ │
│ │ ai.openclaw.otm-completion-detector (every 2 min) │ │
│ │ │ │ │
│ │ ├──► Scan: status=in_progress + all subtasks done │ │
│ │ ├──► Update: completion % + subtasks_remaining │ │
│ │ └──► Promote: in_progress → agent_done │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ (Claudia validates → done) │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ SYSTEM 5: Component Heartbeats │ │
│ │ │ │
│ │ Each component writes *-state.json after every run │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Collector (FSEvents) → SQLite → Reader (WebSocket) → Dashboard │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ SYSTEM 6: DMZ Relay │ │
│ │ │ │
│ │ Collector ──HTTP POST──► Synology Receiver (127.0.0.1:3456) │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ fab-state.json │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Broadcaster (0.0.0.0:3457) │ │
│ │ │ │ │ │
│ │ WSS /ws GET /api/state │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ Vercel Dashboard (browser) │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────────────┐ │
│ │ Task Lifecycle │ │
│ │ │ │
│ │ new ──► assigned ──► in_progress ──► agent_done ──► done │ │
│ │ │ │ │ │ │ │
│ │ │ (dispatcher) (agent starts) (completion (Claudia │ │
│ │ │ detector) validates) │ │
│ │ │ │ │
│ │ └──► blocked (can happen at any stage) │ │
│ └──────────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────────┘
Each task is a top-level item in the Slack Task Board List. Subtasks are child items (linked via `parent_item_id`). The OTM is the sole writer of status and counter fields.
Slack List: F0ALE8DCW1F (Rapido Task Campaign v2), workspace: rapidocloud.slack.com
| Field | Type | Writer | Description |
|---|---|---|---|
| title | text | Orchestrator | Task name |
| task_id | text | OTM | Unique ID (T-NNNNN format — 5 digits, zero-padded) |
| assigned_to | person | Orchestrator | Slack user ID of assigned agent (AI or Human) |
| status | select | OTM | Current task state (see §3). OTM is sole writer — no exceptions |
| previous_status | select | OTM | Status before the last transition. Critical for failure analysis |
| priority | number | Orchestrator | 0=Critical … 4=Batchable (see §3.4) |
| context | select | Orchestrator | project, research, operations, support, internal |
| subtasks_remaining | number | OTM | Decremented counter, NOT live count |
| assigned_at | datetime | OTM | When agent started work |
| completed_at | datetime | OTM | When all subtasks done |
| validated_at | datetime | OTM | When Orchestrator validated |
| result_summary | text | Agent | Deliverables/output description |
| input_files | text | Orchestrator | Links to input resources |
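The T-NNNNN format above, combined with the next-task-id.json counter from System 1, can be sketched as follows. This is an illustrative sketch, not the shipped otm-create-task.sh logic: the counter object shape (`{ next: <int> }`) and both function names are assumptions, and the real pipeline guards the read-increment-write with flock.

```javascript
// Format a numeric counter value as a T-NNNNN task ID (5 digits, zero-padded).
function formatTaskId(n) {
  return "T-" + String(n).padStart(5, "0");
}

// Allocate the next ID from an in-memory counter object (hypothetical shape);
// the real implementation reads/writes next-task-id.json under flock.
function allocateTaskId(counter) {
  const id = formatTaskId(counter.next);
  counter.next += 1;
  return id;
}

const counter = { next: 41 };
const a = allocateTaskId(counter); // "T-00041"
const b = allocateTaskId(counter); // "T-00042"
```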
| Field | Type | Writer | Description |
|---|---|---|---|
| title | text | Orchestrator | Subtask description |
| todo_completed | checkbox | OTM (via SW-1) | Built-in Slack Lists checkbox (Col00). Ticked when subtask is done. Must be set alongside status when marking items as done (see below). |
| parent_item_id | reference | System | Links to parent task |
📌 subtasks_remaining on the parent task is the canonical completion signal — not a live count.
📌 previous_status is set by the OTM on every transition. It enables post-mortem analysis when a task enters Failed state.
📌 Agents do NOT tick checkboxes directly in Slack. They report subtask completion to the OTM API (IE-02). The OTM then updates Slack via SW-1 — setting both the item status column and Col00 (checkbox).
The todo_completed field (Col00) is the built-in Slack Lists checkbox. It drives the visual checkmark ✅ in the Slack UI. Setting only the Status column to done does NOT check the box — both must be set explicitly.
| Scenario | Action |
|---|---|
| Subtask marked done | Set Col00: `checkbox: true` on the subtask |
| Parent task marked done | Set Col00: `checkbox: true` on the parent |
| Parent task at agent_done or in_progress | Do NOT set checkbox (task isn't finished yet) |
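The dual-write rule can be made concrete with a small payload builder. The field-payload shape below is an assumption for illustration only; the column IDs (Status `Col0AL1B4UVLJ`, checkbox `Col00`) come from the field mapping in this spec.

```javascript
// Column IDs from the spec; the payload structure itself is illustrative.
const STATUS_COL = "Col0AL1B4UVLJ";
const CHECKBOX_COL = "Col00";

// Build the update that marks an item done: BOTH the Status column and the
// built-in checkbox must be set, or the Slack UI shows no checkmark.
function buildDoneUpdate(itemId) {
  return {
    item_id: itemId,
    fields: [
      { column_id: STATUS_COL, select: "done" },
      { column_id: CHECKBOX_COL, checkbox: true }, // without this, no ✅ in the UI
    ],
  };
}

const update = buildDoneUpdate("Rec123"); // "Rec123" is a hypothetical item ID
```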
`slackLists.items.archive` does not exist as an API method. Archiving is manual-only via the Slack UI. The OTM sets status to `archived` via the pipeline, but the actual Slack archive action cannot be automated.
This table is operationally critical — it maps the JSON task format (used in the file pipeline) to Slack column IDs.
| JSON Field | Slack Column | Column ID | Slack Type | Notes |
|---|---|---|---|---|
| `title` | Title | `Col0AKKTBJJKZ` | rich_text | Clean one-liner only |
| `taskId` | Task ID | `Col0ALVK2NA1E` | rich_text | Format: T-NNNNN (5 digits, zero-padded) |
| `type` | Type | `Col0AKUV4BF6F` | select | action \| decision \| review |
| `agent` | Assignee | `Col0AKZ9G5UAJ` | select | Covers both agents and humans |
| `project` | Project 2 | `Col0ALZBS9C8Z` | rich_text | Free-text (migrated from select 2026-03-14) |
| `priority` | Priority | `Col0ALE8DKWPK` | select | See §3.4 priority mapping |
| (auto) | Status | `Col0AL1B4UVLJ` | select | Always set to `new` on creation |
| `subtasks[]` | (child items) | — | `parent_item_id` | Each entry → child item with title + status `new` |
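The mapping above can be applied mechanically when the injector converts a JSON task file into Slack field writes. The sketch below is illustrative (the field-payload shape and function name are assumptions); only the column IDs and the "Status is always `new` on creation" rule come from the spec.

```javascript
// JSON field → Slack column ID, straight from the mapping table.
const COLUMN_MAP = {
  title:    "Col0AKKTBJJKZ",
  taskId:   "Col0ALVK2NA1E",
  type:     "Col0AKUV4BF6F",
  agent:    "Col0AKZ9G5UAJ",
  project:  "Col0ALZBS9C8Z",
  priority: "Col0ALE8DKWPK",
};
const STATUS_COLUMN = "Col0AL1B4UVLJ";

// Convert a pipeline JSON task into a list of Slack field writes.
function toSlackFields(task) {
  const fields = Object.entries(COLUMN_MAP)
    .filter(([key]) => task[key] !== undefined)
    .map(([key, columnId]) => ({ column_id: columnId, value: task[key] }));
  // Status is never taken from the JSON file — always "new" on creation.
  fields.push({ column_id: STATUS_COLUMN, value: "new" });
  return fields;
}

const fields = toSlackFields({ title: "Fix relay", taskId: "T-00042", priority: "high" });
```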
Slack Built-in Fields:
| Field | Column ID | Type | Notes |
|---|---|---|---|
| `todo_completed` | `Col00` | checkbox | Built-in Slack Lists checkbox. Must be set explicitly alongside status. |
Fields NOT mapped to Slack columns (metadata only):
| JSON Field | Purpose |
|---|---|
| `id` | UUID for file tracking / idempotency |
| `createdAt` | Timestamp, implicit in Slack item creation |
| `status` | Internal pipeline status (pending → processed) |
Deprecated Slack Columns:
| Item | Column ID | Notes |
|---|---|---|
| Project (old select column) | `Col0AL4UJ8BJ8` | Replaced by Project 2 text column (2026-03-14) |
The status field follows these transitions. The OTM is the sole writer — all transitions are executed by the OTM, regardless of which actor requested them.
| Status | Description |
|---|---|
| New | Task created, not yet assigned |
| Assigned | Orchestrator has set an assignee; OTM evaluating agent availability |
| Pending | Agent is busy; task queued silently (no notification) |
| In Progress | Agent is actively working (SURE acknowledgement received) |
| Agent Done | All subtasks complete; awaiting Orchestrator review |
| Done | Orchestrator validated the work |
| Rejected | Orchestrator rejected; Orchestrator preparing rework subtasks |
| Failed | Unrecoverable error during execution |
| Cancelled | Task no longer needed; removed from active work |
| Archived | Terminal state; auto-moved 7 days after Done/Cancelled |
Orchestrator creates task
|
[New]
|
TT-01: Orch requests assignment → OTM executes
|
[Assigned]
/ \
TT-02: OTM TT-03: OTM
(IE-01 + agent (IE-01 + agent
idle in reg) busy in reg)
| |
[In Progress] [Pending]
(after SURE ack) |
| TT-05: OTM promotes
TT-04: OTM receives (IE-09 + pending
IE-02 subtask reports task found)
| |
subtasks_remaining=0 |
| |
[Agent Done] <------/
/ \
TT-06: Orch TT-07: Orch
validates rejects
(IE-04) (IE-05)
| |
[Done] [Rejected]
| / \
TT-12: OTM TT-08 TT-10
(IE-08+7d) (IE-06) (IE-06)
| | |
[Archived] [Assigned] [Cancelled]
(rework) (drop)
At any point before Agent Done:
TT-11: Orch/Human requests cancel (IE-06/IE-01) → OTM executes
TT-12: Watchdog requests archive (IE-08 + 7d check) → OTM executes
TT-13: OTM detects error (IE-09) → [Failed]
TT-14: Orch requests retry (IE-07) → [New]
    TT-15: Orch/Human requests cancel (IE-06/IE-01) → [Cancelled] (from Failed)
Each task transition is coded TT-xx. All transitions are executed by the OTM. The "Requesting Actor" is who initiates; the OTM validates and applies.
Orchestrator-requested transitions (executed by OTM):
| Code | From → To | Requesting Actor | Inbound Event | OTM Action | Outbound Event |
|---|---|---|---|---|---|
| TT-01 | New → Assigned | Orchestrator | IE-01: `assigned_to` field changed | Validate assignment, set status | OE-06, OE-07 |
| TT-06 | Agent Done → Done | Orchestrator | IE-04: validate API call | Set `validated_at`, change status | OE-06, OE-07, OE-02 |
| TT-07 | Agent Done → Rejected | Orchestrator | IE-05: reject API call | Change status, post reason | OE-06, OE-07 |
| TT-08 | Rejected → Assigned | Orchestrator | IE-06: rework API call (subtasks already prepared by Orchestrator) | Count unfinished subtasks, set counter, change status | OE-06, OE-07, OE-01 |
| TT-10 | Rejected → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Change status | OE-06, OE-07 |
| TT-11 | Any (pre-Agent Done) → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Free agent, change status | OE-06, OE-07, OE-04 |
| TT-14 | Failed → New | Orchestrator | IE-07: retry API call | Reset task, change status | OE-06, OE-07 |
| TT-15 | Failed → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Change status | OE-06, OE-07 |
OTM-initiated transitions (automated):
| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
|---|---|---|---|---|
| TT-02 | Assigned → In Progress | IE-01: status changed to Assigned (OTM-0 routes to OTM-2, agent registry query returns Idle) | Execute AT-01, write counter, send SURE notification | OE-01, OE-06, OE-07 |
| TT-03 | Assigned → Pending | IE-01: status changed to Assigned (OTM-0 routes to OTM-2, agent registry query returns Busy) | Queue task silently | OE-06, OE-07 |
| TT-04 | In Progress → Agent Done | IE-02: agent reports subtask done (OTM-3 decrements counter to 0) | Execute AT-02, set `completed_at` | OE-02, OE-06, OE-07 |
| TT-05 | Pending → Assigned | IE-09: OTM internal — `check_next_task_for_agent()` finds pending task after AT-02/AT-03/AT-04 | Re-evaluate via OTM-2 path | OE-06, OE-07 |
| TT-13 | In Progress → Failed | IE-10: OTM detects agent error (OpenClaw hook timeout >5min / agent crash / unhandled exception reported) | Store `previous_status`, execute AT-04 | OE-05, OE-06, OE-07 |
Watchdog-requested transitions (executed by OTM):
| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
|---|---|---|---|---|
| TT-12 | Done/Cancelled → Archived | IE-08: watchdog cron tick (OTM-6 checks `completed_at` or `cancelled_at` + 7 days < now) | Archive task | OE-06, OE-07 |
📌 Cross-reference: See §5 for TT-xx ↔ AT-xx ↔ handler mapping. See §6 for SURE protocol. See §7 for audit trail.
All transitions are submitted to the OTM which validates preconditions and executes the state change. Every transition produces at minimum OE-06 (Slack field update) and OE-07 (audit log entry).
| Code | From | To | Inbound Event | OTM Action | Outbound Event |
|---|---|---|---|---|---|
| TT-01 | New | Assigned | IE-01: `assigned_to` changed on New task | Validate agent exists in registry, set status | OE-06, OE-07 |
| TT-02 | Assigned | In Progress | IE-01: status=Assigned detected + agent Idle in registry | Set agent busy (AT-01), init counter, send SURE task notification | OE-01, OE-06, OE-07 |
| TT-03 | Assigned | Pending | IE-01: status=Assigned detected + agent Busy in registry | Queue task, no notification | OE-06, OE-07 |
| TT-04 | In Progress | Agent Done | IE-02: subtask completion report + counter decrements to 0 | Set agent idle (AT-02), set `completed_at`, notify Orchestrator | OE-02, OE-06, OE-07 |
| TT-05 | Pending | Assigned | IE-09: internal `check_next_task_for_agent()` + pending task found | Re-route to OTM-2 (same as TT-01 path) | OE-06, OE-07 |
| TT-06 | Agent Done | Done | IE-04: Orchestrator validate call | Set `validated_at`, change status | OE-02, OE-06, OE-07 |
| TT-07 | Agent Done | Rejected | IE-05: Orchestrator reject call with reason | Change status, log reason | OE-06, OE-07 |
| TT-08 | Rejected | Assigned | IE-06: Orchestrator rework call (subtasks already prepared) | Count unfinished subtasks, set counter, change status | OE-06, OE-07 |
| TT-10 | Rejected | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Change status | OE-06, OE-07 |
| TT-11 | New/Assigned/Pending/In Progress | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Free agent if applicable (AT-03), change status | OE-04, OE-06, OE-07 |
| TT-12 | Done/Cancelled | Archived | IE-08: watchdog tick + 7-day check passes | Archive task | OE-06, OE-07 |
| TT-13 | In Progress | Failed | IE-10: agent error detected (hook timeout/crash/exception) | Store `previous_status`, free agent (AT-04) | OE-05, OE-06, OE-07 |
| TT-14 | Failed | New | IE-07: Orchestrator retry call | Reset task fields, change status | OE-06, OE-07 |
| TT-15 | Failed | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Change status | OE-06, OE-07 |
📌 TT-09 (Rejected → New) removed in v1.3. Reassignment is handled by the Orchestrator cancelling the rejected task (TT-10) and creating a new task for a different agent. This simplifies the state machine.
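Since the OTM validates preconditions before executing any transition, the TT-xx catalogue is effectively a whitelist of legal (from, to) edges. The sketch below shows one way to encode it; the function and state names are illustrative, and only the edges themselves come from §3.

```javascript
// Legal transitions, straight from the TT-xx tables (TT-09 removed in v1.3).
const TRANSITIONS = {
  "TT-01": { from: ["new"], to: "assigned" },
  "TT-02": { from: ["assigned"], to: "in_progress" },
  "TT-03": { from: ["assigned"], to: "pending" },
  "TT-04": { from: ["in_progress"], to: "agent_done" },
  "TT-05": { from: ["pending"], to: "assigned" },
  "TT-06": { from: ["agent_done"], to: "done" },
  "TT-07": { from: ["agent_done"], to: "rejected" },
  "TT-08": { from: ["rejected"], to: "assigned" },
  "TT-10": { from: ["rejected"], to: "cancelled" },
  "TT-11": { from: ["new", "assigned", "pending", "in_progress"], to: "cancelled" },
  "TT-12": { from: ["done", "cancelled"], to: "archived" },
  "TT-13": { from: ["in_progress"], to: "failed" },
  "TT-14": { from: ["failed"], to: "new" },
  "TT-15": { from: ["failed"], to: "cancelled" },
};

// Validate and apply a transition, recording previous_status as the OTM does.
function applyTransition(task, code) {
  const t = TRANSITIONS[code];
  if (!t || !t.from.includes(task.status)) {
    throw new Error(`Illegal transition ${code} from ${task.status}`);
  }
  return { ...task, previous_status: task.status, status: t.to };
}

const task = applyTransition({ id: "T-00001", status: "agent_done" }, "TT-06");
```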
| Value | Label | Meaning | --priority flag |
|---|---|---|---|
| 0 | Critical | Blocking other work, immediate attention | critical |
| 1 | High | Important, do next | high |
| 2 | Medium/Normal | Normal priority | normal (or medium as alias) |
| 3 | Low | When bandwidth allows | low |
| 4 | Batchable | Large/expensive work, can run async via Batch API | batchable |
Queue ordering: priority ASC, posted_at ASC (0 = highest priority, FIFO within same priority).
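The queue ordering rule translates directly into a two-key comparator. A minimal sketch, assuming `posted_at` is a sortable timestamp:

```javascript
// priority ASC, posted_at ASC: 0 = highest priority, FIFO within a priority.
function queueOrder(a, b) {
  return a.priority - b.priority || a.posted_at - b.posted_at;
}

const queue = [
  { id: "T-00003", priority: 2, posted_at: 300 },
  { id: "T-00001", priority: 0, posted_at: 200 },
  { id: "T-00002", priority: 0, posted_at: 100 },
].sort(queueOrder);
// → T-00002 (critical, posted first), T-00001, T-00003
```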
The OTM maintains agent availability state in the Agent Registry (OTM-1). Agent transitions are coded AT-xx and are distinct from task transitions (TT-xx).
| Status | Description |
|---|---|
| Idle | Agent is available, not working on any task |
| Busy | Agent is actively working on a task (current_task is set) |
[Idle]
|
AT-01: OTM assigns task
(triggered by TT-02)
(SURE notification sent → OE-01)
|
[Busy]
|
AT-02: task completes (TT-04, IE-02 counter=0)
AT-03: task cancelled (TT-11, IE-01/IE-06)
AT-04: task fails (TT-13, IE-10)
|
[Idle]
|
→ OTM calls check_next_task_for_agent() (IE-09)
    → if pending task found: TT-05 → AT-01 again
All agent transitions are executed by the OTM. No actor changes agent status directly.
| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
|---|---|---|---|---|
| AT-01 | Idle → Busy | IE-01: Assigned event + agent Idle (during OTM-2) | Set status=busy, `current_task`=task_id, `task_started_at`=now | OE-01 (SURE notification), OE-07 (audit) |
| AT-02 | Busy → Idle | IE-02: subtask report + counter=0 (during OTM-3) | Set status=idle, clear `current_task`, call `check_next_task_for_agent()` | OE-07 (audit) |
| AT-03 | Busy → Idle | IE-01/IE-06: cancellation request (during OTM-5) | Set status=idle, clear `current_task`, call `check_next_task_for_agent()` | OE-04 (cancel notify), OE-07 (audit) |
| AT-04 | Busy → Idle | IE-10: agent error detected (during OTM error handler) | Set status=idle, clear `current_task`, call `check_next_task_for_agent()` | OE-05 (admin alert), OE-07 (audit) |
| Code | From | To | Inbound Event | OTM Action | Outbound Event |
|---|---|---|---|---|---|
| AT-01 | Idle | Busy | IE-01 (Assigned event) | OTM-2 sets agent busy before sending SURE notification | OE-01, OE-07 |
| AT-02 | Busy | Idle | IE-02 (last subtask report) | OTM-3 frees agent, promotes next pending task | OE-07 |
| AT-03 | Busy | Idle | IE-01/IE-06 (cancellation) | OTM-5 frees agent if assigned to cancelled task | OE-04, OE-07 |
| AT-04 | Busy | Idle | IE-10 (agent error) | OTM error handler frees agent | OE-05, OE-07 |
📌 Agent ↔ Task coupling: Every AT-xx is triggered by a TT-xx. See §5 for the complete bidirectional mapping.
📌 Watchdog note: OTM-6 monitors agent heartbeats (last_seen >2h) but does NOT change agent state. It alerts the admin (OE-05). Only OTM handlers modify agent status.
CREATE TABLE agents (
slack_user_id TEXT PRIMARY KEY, -- Slack user ID (e.g., "U0AKEB27HNK")
otm_display_name TEXT NOT NULL, -- Display name for logs/UI (e.g., "Devdas")
openclaw_agent_id TEXT, -- OpenClaw agent ID (e.g., "devdas"). NULL for human agents.
agent_type TEXT NOT NULL DEFAULT 'ai', -- 'ai' | 'human'
status TEXT DEFAULT 'idle', -- 'idle' | 'busy' (see §4.1)
current_task TEXT, -- task item ID or NULL
task_started_at INTEGER, -- Unix timestamp or NULL
last_seen INTEGER -- Unix timestamp
);
Registry operations:
- `register_agent(slack_user_id, otm_display_name, openclaw_agent_id, agent_type)` — at startup or on first activity
- `set_busy(slack_user_id, task_id)` — AT-01
- `set_idle(slack_user_id)` — AT-02, AT-03, AT-04
- `is_busy(slack_user_id) → boolean` — checked during TT-02/TT-03
- `get_current_task(slack_user_id) → task_id | null`
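These registry operations can be sketched in memory to show their contract; the real OTM backs them with the SQLite `agents` table above. The class shape is illustrative, and the Slack user ID used in the example is hypothetical.

```javascript
// In-memory sketch of the agent registry (the real one persists to SQLite).
class AgentRegistry {
  constructor() { this.agents = new Map(); } // keyed by slack_user_id

  register_agent(slackUserId, displayName, openclawAgentId, agentType) {
    this.agents.set(slackUserId, {
      otm_display_name: displayName,
      openclaw_agent_id: openclawAgentId, // null for human agents
      agent_type: agentType,              // 'ai' | 'human'
      status: "idle",
      current_task: null,
    });
  }
  set_busy(slackUserId, taskId) {         // AT-01
    const a = this.agents.get(slackUserId);
    a.status = "busy";
    a.current_task = taskId;
  }
  set_idle(slackUserId) {                 // AT-02 / AT-03 / AT-04
    const a = this.agents.get(slackUserId);
    a.status = "idle";
    a.current_task = null;
  }
  is_busy(slackUserId) { return this.agents.get(slackUserId).status === "busy"; }
  get_current_task(slackUserId) { return this.agents.get(slackUserId).current_task; }
}

const reg = new AgentRegistry();
reg.register_agent("U0EXAMPLE1", "Devdas", "devdas", "ai"); // hypothetical ID
reg.set_busy("U0EXAMPLE1", "T-00042");
```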
On OTM startup:
- Read OpenClaw config — Query `openclaw.json` agent configurations to populate AI agent entries automatically. Each configured OpenClaw agent that has a `slack_user_id` mapping is auto-registered with `agent_type = 'ai'`.
- Reconcile from Slack — Query Slack List for all tasks with status In Progress or Pending. Rebuild `current_task` and `status` (busy/idle) from those records.
- Human agents — Hard-coded for v1. Rupert is pre-seeded in the agent registry at startup with `agent_type = 'human'`, `openclaw_agent_id = NULL`. Claudia is pre-seeded as the Orchestrator. Dynamic human registration (auto-detect from `assigned_to` on first interaction) is deferred to a future version.
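The "Reconcile from Slack" step above amounts to resetting every agent to idle and then re-deriving busy state from the Slack List. A hedged sketch, where the agent and task object shapes are assumptions:

```javascript
// Rebuild agent busy/idle state from tasks read back from the Slack List.
// Only In Progress tasks occupy an agent; Pending tasks stay queued.
function reconcile(registryAgents, slackTasks) {
  for (const agent of registryAgents.values()) {
    agent.status = "idle";
    agent.current_task = null;
  }
  for (const task of slackTasks) {
    if (task.status === "in_progress" && registryAgents.has(task.assigned_to)) {
      const agent = registryAgents.get(task.assigned_to);
      agent.status = "busy";
      agent.current_task = task.task_id;
    }
  }
}
```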
📌 v1 hard-coded actors:
| Slack user ID | Display name | Type | Role |
|---|---|---|---|
| `U06K407LVCY` | Rupert | human | Task assignee / reviewer |
| `U0AKEB27HNK` | Claudia | ai | Orchestrator (sole) |
Future versions will define a proper human user registration and re-registration protocol (see Open Question 9).
📌 AI agents are notified via OpenClaw /hooks/agent. Human agents are notified via the Slack task conversation thread (posted by SW-1 as an OE-07 audit entry addressed to the human). The notification channel is determined by agent_type.
The OTM runs as an OpenClaw plugin pipeline inside the gateway process. Several restart scenarios must be handled:
Scenario A: Gateway restarts (OTM restarts with it)
- OTM startup reconciliation (§4.6) runs automatically
- All agent states rebuilt from Slack List + openclaw.json
- SURE pending notifications checked: any outstanding acks >3 min old → ERR-06
- System event logged: `[timestamp] OTM SYSTEM: Gateway restart detected. Reconciliation complete.`
- Logged to both `event_log` (in error DB) and `otm-events.log` file
Scenario B: Gateway restarts but OTM was mid-processing
- SQLite WAL mode ensures no data corruption on crash
- On restart, OTM-7 (error monitor) runs within 60s and detects any inconsistencies:
- ERR-02/ERR-03: Agent-task mismatches from interrupted transitions
- ERR-08: Tasks stuck in Assigned from interrupted OTM-2
- ERR-04: Orphaned Pending from interrupted promotions
- All auto-correctable errors are fixed; others escalated
Scenario C: Gateway stops for extended period (>3 min)
- SE-1 stops receiving Slack events during downtime
- Agents cannot send IE-02/IE-03 reports (OpenClaw hooks are down)
- On restart: reconciliation rebuilds state from Slack List (source of truth for task fields)
- Pending SURE acks will have timed out → ERR-06 logged
- Agents that were mid-task may have completed work but couldn't report it:
- OTM-7 detects counter mismatches (ERR-05) on next cycle
  - Watchdog cross-checks subtask completion status in Slack vs `subtasks_remaining`
Orchestrator re-registration:
- The Orchestrator does NOT need to re-register agents. The OTM rebuilds the registry from `openclaw.json` automatically on startup (§4.6).
- If `openclaw.json` has changed (new agent added, agent removed), the reconciliation picks up the delta.
Gateway restart logging:
- Every OTM startup logs a system event: `IE-SYS-01: OTM startup` with details including:
  - Agents reconciled (count + names)
  - Tasks found in active states (In Progress, Pending, Assigned)
  - SURE timeouts detected
  - Errors found and corrected during reconciliation
- This event is logged to `event_log`, `otm-events.log`, AND posted to Slack #alerts channel (OE-05)
| Task Transition | Triggers Agent Transition | Handler |
|---|---|---|
| TT-02 (Assigned → In Progress) | AT-01 (Idle → Busy) | OTM-2 |
| TT-04 (In Progress → Agent Done) | AT-02 (Busy → Idle) | OTM-3 |
| TT-11 (→ Cancelled) | AT-03 (Busy → Idle) | OTM-5 |
| TT-13 (In Progress → Failed) | AT-04 (Busy → Idle) | OTM error |
| Transition(s) | Primary Handler | Description |
|---|---|---|
| TT-01, TT-02, TT-03, TT-05 | OTM-2 | Task assignment, availability check, queue management |
| TT-04 | OTM-3 | Subtask completion, counter decrement, task completion |
| TT-06, TT-07, TT-08, TT-10 | OTM-4 | Validation, rejection, rework, cancel-after-reject |
| TT-11 | OTM-5 | Task cancellation |
| TT-12 | OTM-6 | Archival (watchdog requests, OTM executes) |
| TT-13, TT-14, TT-15 | OTM error handler | Failure detection, retry, abandon |
| AT-01 | OTM-2 | Agent set busy |
| AT-02 | OTM-3 | Agent freed on task completion |
| AT-03 | OTM-5 | Agent freed on task cancellation |
| AT-04 | OTM error handler | Agent freed on task failure |
| Code | Source | Description | Triggered by |
|---|---|---|---|
| IE-01 | SE-1 | Raw `list_item_updated` event from Slack (Human field edit) | Human edits task in Slack UI |
| IE-02 | Agent | Subtask completion report via OTM API | Agent calls OTM after completing subtask |
| IE-03 | Agent | SURE acknowledgement via OTM API | Agent confirms receipt of task assignment |
| IE-04 | Orchestrator | Task validation request via OTM API | Orchestrator reviews and approves |
| IE-05 | Orchestrator | Task rejection request via OTM API (with reason) | Orchestrator reviews and rejects |
| IE-06 | Orchestrator | Task action request via OTM API (rework/cancel/retry) | Orchestrator requests state change |
| IE-07 | Orchestrator | Task retry request via OTM API (from Failed) | Orchestrator wants to retry failed task |
| IE-08 | Watchdog | Cron tick (every 60 seconds) | Timer fires |
| IE-09 | OTM internal | `check_next_task_for_agent()` result | Triggered after AT-02/AT-03/AT-04 |
| IE-10 | OTM internal | Agent error detection (hook timeout >5min, crash, exception) | OpenClaw health monitoring |
| Code | Target | Description | Via |
|---|---|---|---|
| OE-01 | Agent | SURE task notification (requires acknowledgement IE-03) | OpenClaw hooks (AI) / Slack task conversation (Human) |
| OE-02 | Orchestrator | Task completion/validation notification | OpenClaw hooks |
| OE-03 | (reserved) | — | — |
| OE-04 | Agent | Task cancellation notification | OpenClaw hooks (AI) / Slack task conversation (Human) |
| OE-05 | Admin | Alert (anomaly, error, stale task, agent down) | Telegram / Slack #alerts |
| OE-06 | Slack | Task field update (status, counters, timestamps) | SW-1 |
| OE-07 | Slack | Audit log entry in task conversation feed | SW-1 |
- Task transitions (TT-xx): TT-01 through TT-15 (TT-09 removed in v1.3) — see §3.2.1 and §3.3
- Agent transitions (AT-xx): AT-01 through AT-04 — see §4.3 and §4.4
- Inbound events (IE-xx): IE-01 through IE-10 — see §5.3
- Outbound events (OE-xx): OE-01 through OE-07 — see §5.4
All task notifications to agents use the SURE protocol to guarantee delivery and acknowledgement.
OTM sends task assignment → OE-01
|
+-- OTM logs in task conversation (OE-07):
| "[2026-03-12 14:30:05] OTM → Agent(Devdas): Task assigned — <title>"
|
+-- Agent receives notification
|
+-- Agent sends acknowledgement → IE-03
|
+-- OTM logs in task conversation (OE-07):
      "[2026-03-12 14:30:12] Agent(Devdas) → OTM: Task acknowledged"
- Retry 1: If no IE-03 within 1 minute → OTM retries notification (OE-01), logs retry.
- Retry 2: If no IE-03 within 2 more minutes (3 min total) → OTM retries again, logs retry.
- Error: If no IE-03 after retry 2 → OTM logs ERR-06 error, alerts admin (OE-05). Task remains In Progress — admin decides.
📌 The 1+2 minute schedule is designed to allow time for a gateway restart (~2 min typical). If the gateway restarts but the OTM does not, the OTM resync procedure (§4.7) handles recovery.
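The 1 + 2 minute schedule can be expressed as a small decision helper. This is an illustrative sketch: the function name and the final timeout after retry 2 (assumed here as 2 more minutes) are not pinned down by the spec; only the 1-minute and 3-minute-total thresholds are.

```javascript
// Given minutes elapsed since OE-01 was sent and how many retries have gone
// out, decide the OTM's next SURE action when no IE-03 ack has arrived.
function sureAction(minutesSinceNotify, retriesSent) {
  if (retriesSent === 0) return minutesSinceNotify >= 1 ? "retry-1" : "wait";
  if (retriesSent === 1) return minutesSinceNotify >= 3 ? "retry-2" : "wait"; // 1 + 2 min total
  // Final window after retry 2 is not specified; 2 more minutes assumed here.
  return minutesSinceNotify >= 5 ? "err-06" : "wait"; // ERR-06 → admin alert (OE-05)
}
```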
| Event | SURE required? |
|---|---|
| Task assignment (OE-01) | ✅ Yes |
| Rework task (OE-01 after TT-08) | ✅ Yes |
| Task cancellation (OE-04) | ❌ No (fire-and-forget, agent stops) |
| Admin alert (OE-05) | ❌ No |
POST /api/otm/ack
{
"task_id": "<task item ID>",
"agent_id": "<slack_user_id>",
"type": "task_assigned"
}
Every event processed by the OTM is logged in the Slack task conversation feed via SW-1. This provides a human-readable, timestamped record of all activity on each task.
| Event type | Log format |
|---|---|
| State change by OTM | [timestamp] OTM: Status changed <previous> → <new> (TT-xx) |
| Orchestrator request received | [timestamp] Orchestrator → OTM: <action> requested (IE-xx) |
| Human change detected | [timestamp] Human(<name>) change detected: <field> = <value> (IE-01) |
| Agent notification sent | [timestamp] OTM → Agent(<name>): <notification type> (OE-xx) |
| Agent acknowledgement received | [timestamp] Agent(<name>) → OTM: Acknowledged (IE-03) |
| Agent subtask report received | [timestamp] Agent(<name>) → OTM: Subtask done — <title> (IE-02). Remaining: <n> |
| Watchdog action | [timestamp] Watchdog: <check description> (IE-08) |
| Error/alert | [timestamp] OTM ERROR: <description> (IE-10) |
All audit entries are posted as replies in the Slack task's conversation thread via SW-1. This means:
- Every task's conversation is a complete history of its lifecycle
- No separate log table needed — Slack IS the audit log
- Human-readable without any tooling
- Searchable via Slack search
| Detail | |
|---|---|
| Actors | Slack Events API (source) |
| Inbound events | IE-01: any list_item_updated event on the Task Board List |
| Actions | Forward raw event payload to OTM-0 entry point |
| Outbound events | None — SE-1 has zero intelligence |
| Transitions | None — SE-1 is a passthrough |
The Slack app (Salvatore's app, socket mode) subscribes to list_item_updated events.
SE-1 does exactly one thing:
ON list_item_updated:
→ call otm_handle_event(raw_event_payload)
No routing. No field inspection. No filtering. The OTM decides what to do with the event.
Required Slack App scope for SE-1: lists:read only.
| Detail | |
|---|---|
| Actors | OTM (sole caller) |
| Inbound events | OTM handler calls |
| Actions | Write task fields to Slack List, post audit entries to task conversation |
| Outbound events | OE-06: Slack field update, OE-07: audit log entry |
SW-1 is the sole component that writes to Slack. It provides two operations:
- `sw1_update_fields(task_id, fields)` — Updates task item fields (status, counters, timestamps). Produces OE-06.
- `sw1_post_audit(task_id, message)` — Posts a timestamped message to the task's conversation thread. Produces OE-07.
Required Slack App scope for SW-1: lists:write.
📌 SE-1 (lists:read) and SW-1 (lists:write) are separate concerns. They may run in the same Slack app but are logically distinct.
📌 SW-1 does NOT create or delete tasks. Only the Orchestrator creates tasks and subtasks (via Slack API or UI). SW-1's write scope is limited to: updating existing task fields (OE-06) and posting audit entries to task conversations (OE-07). During rework, the Orchestrator manages subtask creation/deletion directly; the OTM then processes the state change via SW-1.
Implemented as an OpenClaw plugin pipeline set. Persists to SQLite. Sole component that writes task status, agent status, and counter fields.
| Detail | |
|---|---|
| Actors | OTM (internal) |
| Inbound events | IE-01: raw list_item_updated from SE-1 |
| Actions | ACT-R1: Parse event payload and identify change type |
| ACT-R2: Route to appropriate handler | |
| Outbound events | None directly — delegates to handlers |
Single entry point for all Slack-originated events. Contains the routing logic that was previously in SE-1.
RECEIVE raw_event_payload from SE-1
|
+-- Parse: what field(s) changed?
|
+-- IF assigned_to changed AND status = "New":
| → route to OTM-2 (task assignment)
|
+-- IF status changed to "Assigned" (from Pending promotion or rework):
| → route to OTM-2 (task re-assignment)
|
+-- IF status changed to "Cancelled" by Human:
| → route to OTM-5 (cancellation)
|
+-- IF other field changed by Human:
| → log via sw1_post_audit (OE-07): "Human(<name>) changed <field>"
| → no state transition
|
+-- ELSE: ignore
📌 All routing intelligence lives here, not in SE-1. SE-1 is a dumb pipe.
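The routing ladder above can be sketched as a pure function. The `change` shape and the `routeChange` name are illustrative assumptions, not the spec's API.

```javascript
// Sketch of OTM-0's routing table. `change` is an assumed parsed form of a
// list_item_updated event; only the decision logic from the flow above is shown.
function routeChange(change) {
  if (change.field === "assigned_to" && change.status === "New") return "OTM-2"; // task assignment
  if (change.field === "status" && change.newValue === "Assigned") return "OTM-2"; // promotion or rework
  if (change.field === "status" && change.newValue === "Cancelled" && change.actor === "human") return "OTM-5";
  if (change.actor === "human") return "audit-only"; // OE-07 log entry, no state transition
  return "ignore";
}
```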
| Detail | |
|---|---|
| Actors | OTM (owner/writer) |
| Inbound events | IE-08: startup reconciliation |
| IE-01: new agent detected (auto-register) | |
| Actions | ACT-A1: Register new agent (from openclaw.json or first interaction) |
| ACT-A2: Set agent busy (AT-01) | |
| ACT-A3: Set agent idle (AT-02, AT-03, AT-04) | |
| ACT-A4: Reconcile from Slack List on startup | |
| ACT-A5: Reconcile from openclaw.json on startup | |
| Outbound events | OE-07: audit log for registration events |
| Transitions | AT-01, AT-02, AT-03, AT-04 |
See §4.5 for schema and §4.6 for startup reconciliation.
| Detail | |
|---|---|
| Actors | OTM (executor), Agent (notified if idle) |
| Inbound events | IE-01: assigned_to changed or status=Assigned (from OTM-0) |
| Actions | ACT-T1: Count subtasks via Slack API |
| ACT-T2: Check agent availability in registry | |
| ACT-T3: Set task status (via SW-1) | |
| ACT-T4: Set agent busy (AT-01 via OTM-1) | |
| ACT-T5: Send SURE notification (OE-01) | |
| Outbound events | OE-01: SURE task notification (if agent idle) |
| OE-06: Slack field update | |
| OE-07: audit log entry | |
| Task transitions | TT-02 (→ In Progress) or TT-03 (→ Pending) |
| Agent transitions | AT-01 (Idle → Busy) if agent available |
RECEIVE task assignment event (from OTM-0)
|
+-- Read task fields: task_id, title, assigned_to, priority
+-- ACT-T1: Count child items (subtasks) via Slack API → subtask_count
+-- Store subtask_count as initial subtasks_remaining
|
+-- ACT-T2: Look up assigned_to agent in registry
|
+-- IF agent NOT found AND agent_type detectable:
| ACT-A1: Auto-register agent
| sw1_post_audit: "New agent registered: <name>"
|
+-- IF agent NOT found AND not detectable:
| OE-05: Alert admin
| sw1_post_audit: "ERROR: Unknown agent <id>"
| Task stays in current status
|
+-- IF agent is IDLE:
| Store previous_status
| ACT-T4: Execute AT-01 (agent → busy)
| sw1_update_fields: subtasks_remaining, status = "In Progress", assigned_at = now
| sw1_post_audit: "Status: Assigned → In Progress (TT-02). Agent: <name>"
| ACT-T5: Send SURE notification (OE-01)
| sw1_post_audit: "OTM → Agent(<name>): Task assigned (OE-01). Awaiting SURE ack."
|
+-- IF agent is BUSY:
Store previous_status
sw1_update_fields: status = "Pending"
sw1_post_audit: "Status: Assigned → Pending (TT-03). Agent <name> busy with <current_task>"| Detail | |
|---|---|
| Actors | Agent (reports completion), OTM (processes), Orchestrator (notified on task completion) |
| Inbound events | IE-02: agent subtask completion report via OTM API |
| Actions | ACT-S1: Validate subtask belongs to agent's current task |
| ACT-S2: Decrement counter | |
| ACT-S3: Update Slack subtask checkbox (via SW-1) — sets both status AND Col00 | |
| ACT-S4: Complete task if counter = 0 | |
| ACT-S5: Free agent (AT-02 via OTM-1) | |
| ACT-S6: Promote next pending task (IE-09) | |
| Outbound events | OE-02: Orchestrator notification (on task complete) |
| OE-06: Slack field update | |
| OE-07: audit log entry | |
| Task transitions | TT-04 (→ Agent Done when counter hits 0) |
| Agent transitions | AT-02 (Busy → Idle) when task completes |
RECEIVE subtask completion report (IE-02)
{task_id, subtask_id, agent_id}
|
+-- ACT-S1: Validate:
| - subtask belongs to task
| - agent is assigned to task
| - subtask not already completed (idempotency: check todo_completed field / Col00)
| IF already completed: discard, return OK
|
+-- ACT-S3: sw1_update_fields: subtask.todo_completed = true (Col00), subtask.status = done
+-- ACT-S2: Decrement task.subtasks_remaining by 1
+-- sw1_update_fields: subtasks_remaining
+-- sw1_post_audit: "Agent(<name>): Subtask done — <title> (IE-02). Remaining: <n>"
|
+-- IF subtasks_remaining > 0:
| Done. Await next report.
|
+-- IF subtasks_remaining = 0 (TASK COMPLETE):
Store previous_status = "In Progress"
sw1_update_fields: status = "Agent Done", completed_at = now
sw1_post_audit: "Status: In Progress → Agent Done (TT-04). All subtasks complete."
ACT-S5: Execute AT-02 (agent → idle)
ACT-S6: Notify Orchestrator (OE-02):
POST /hooks/agent {
agentId: "main",
message: "Task ready for review: <title>\nAgent: <name>\nElapsed: <time>\nResult: <result_summary>\nLink: <slack_link>"
}
sw1_post_audit: "OTM → Orchestrator: Task ready for review (OE-02)"
CALL: check_next_task_for_agent(agent_id) → IE-09
check_next_task_for_agent(agent_id):
Query Slack List for tasks WHERE:
assigned_to = agent_id
AND status = "Pending"
ORDER BY priority ASC, posted_at ASC
LIMIT 1
|
+-- IF Pending task found:
| sw1_update_fields: pending_task.status = "Assigned"
| sw1_post_audit on pending task: "Status: Pending → Assigned (TT-05). Agent now available."
| (OTM-0 detects Assigned change → OTM-2 fires)
|
+-- IF no Pending task:
Agent remains idle.
📌 Idempotency is handled by checking todo_completed (Col00) on the subtask before processing. No separate processed_events table needed. The Slack conversation feed (OE-07) serves as the complete audit trail.
📌 Agent reports directly to OTM API (IE-02), not by ticking checkboxes in Slack. The OTM then updates Slack via SW-1 (ACT-S3). This ensures all writes go through the OTM.
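The idempotent decrement logic (ACT-S1 through ACT-S3, plus the TT-04 completion check) can be sketched against an in-memory task model. This is a sketch only; real state lives in Slack fields written via SW-1, and the function name and object shapes are assumptions.

```javascript
// Sketch of OTM-3's subtask handling. Field names mirror the spec
// (todo_completed / Col00, subtasks_remaining); the in-memory model is illustrative.
function handleSubtaskReport(task, subtaskId) {
  const sub = task.subtasks.find((s) => s.id === subtaskId);
  if (!sub) throw new Error("subtask does not belong to task (ACT-S1)");
  if (sub.todo_completed) return { status: "duplicate" }; // idempotency: discard, return OK
  sub.todo_completed = true;    // ACT-S3: checkbox set via SW-1 in the real flow
  task.subtasks_remaining -= 1; // ACT-S2
  if (task.subtasks_remaining === 0) {
    task.status = "Agent Done"; // TT-04: all subtasks complete
    return { status: "task_complete" };
  }
  return { status: "ok", remaining: task.subtasks_remaining };
}
```

Replaying the same report is a no-op, which is exactly why no processed_events table is needed.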
| Detail | |
|---|---|
| Actors | Orchestrator (caller), OTM (executor) |
| Inbound events | IE-04: Orchestrator requests validation |
| IE-05: Orchestrator requests rejection (with reason) | |
| IE-06: Orchestrator signals rework ready (subtasks already managed by Orchestrator) | |
| Actions | ACT-V1: Verify task status = "Agent Done" or "Rejected" |
| ACT-V2: Execute validation (TT-06) | |
| ACT-V3: Execute rejection (TT-07) | |
| ACT-V4: Count unfinished subtasks, set counter, change status (TT-08) | |
| Outbound events | OE-02: confirmation to Orchestrator |
| OE-06: Slack field update | |
| OE-07: audit log entries | |
| Task transitions | TT-06 (→ Done), TT-07 (→ Rejected), TT-08 (→ Assigned), TT-10 (→ Cancelled) |
Validate request (IE-04):
{
"task_id": "<task item ID>",
"outcome": "validated",
"comment": "<optional>"
}
Processing — validated:
ACT-V1: Verify task.status = "Agent Done" (reject API call otherwise)
Store previous_status = "Agent Done"
sw1_post_audit: "Orchestrator → OTM: Validation requested (IE-04)"
ACT-V2: Execute TT-06
sw1_update_fields: status = "Done", validated_at = now, todo_completed = true (Col00)
sw1_post_audit: "Status: Agent Done → Done (TT-06). Validated."
IF comment: sw1_post_audit: "Orchestrator comment: <comment>"
OE-02: Notify Orchestrator: "Task <title> is Done"
Reject request (IE-05):
{
"task_id": "<task item ID>",
"outcome": "rejected",
"reason": "<mandatory explanation>"
}
Processing — rejected:
ACT-V1: Verify task.status = "Agent Done" (reject API call otherwise)
Store previous_status = "Agent Done"
sw1_post_audit: "Orchestrator → OTM: Rejection requested (IE-05). Reason: <reason>"
ACT-V3: Execute TT-07
sw1_update_fields: status = "Rejected"
sw1_post_audit: "Status: Agent Done → Rejected (TT-07)"Rework request (IE-06) — submitted after Orchestrator has already prepared subtasks:
The Orchestrator handles all subtask management before calling the OTM:
- Orchestrator deletes unnecessary/obsolete subtasks from the Slack List
- Orchestrator leaves completed subtasks in place (as a record)
- Orchestrator creates new subtasks (first = "Acknowledge rework request: ")
- Orchestrator then notifies the OTM that rework is ready:
{
"task_id": "<task item ID>",
"action": "rework"
}
Processing — rework:
Verify task.status = "Rejected"
sw1_post_audit: "Orchestrator → OTM: Rework requested (IE-06)"
ACT-V4: Execute TT-08
Count unfinished subtasks via Slack API (todo_completed = false / Col00 unchecked)
Set subtasks_remaining = count of unfinished subtasks
sw1_update_fields: status = "Assigned", subtasks_remaining
sw1_post_audit: "Status: Rejected → Assigned (TT-08). Rework: <n> unfinished subtasks."
→ OTM-0 detects Assigned change → OTM-2 fires → SURE notification sent → agent acks
📌 Rework flow — separation of concerns: The Orchestrator owns subtask management (create, delete, keep). The OTM owns state management (status transitions, counter recalculation, agent assignment). The OTM never creates or deletes subtasks. On rework, it counts unfinished subtasks to set subtasks_remaining, then triggers the normal assignment flow. The first new subtask is always an acknowledgement subtask — the agent closes it to confirm they understood the rework instructions.
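The ACT-V4 counter recalculation is a simple count over whatever subtasks the Orchestrator left behind. A minimal sketch (function name is mine; the field name mirrors the spec):

```javascript
// Sketch of ACT-V4 on rework (TT-08): the OTM never creates or deletes subtasks,
// it only counts unfinished ones to reset subtasks_remaining.
function reworkCounter(subtasks) {
  // Completed subtasks stay in place as a record and are not counted.
  return subtasks.filter((s) => !s.todo_completed).length;
}
```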
Cancel-after-reject request (IE-06):
{
"task_id": "<task item ID>",
"action": "cancel"
}
Verify task.status = "Rejected" → Execute TT-10 (→ Cancelled).
| Detail | |
|---|---|
| Actors | Orchestrator/Human (requests), OTM (executes), Agent (notified if was working) |
| Inbound events | IE-01: Human status edit detected by OTM-0 |
| IE-06: Orchestrator cancel API call | |
| Actions | ACT-C1: Store previous_status |
| ACT-C2: Free agent if applicable (AT-03) | |
| ACT-C3: Notify agent of cancellation (OE-04) | |
| ACT-C4: Promote next pending task (IE-09) | |
| Outbound events | OE-04: cancellation notification to agent |
| OE-06: Slack field update | |
| OE-07: audit log entry | |
| Task transitions | TT-11 (→ Cancelled) |
| Agent transitions | AT-03 (Busy → Idle) if agent was working |
RECEIVE cancellation request (IE-01 or IE-06)
|
+-- ACT-C1: Store previous_status
+-- sw1_post_audit: "Cancellation requested by <actor> (IE-xx)"
+-- sw1_update_fields: status = "Cancelled"
+-- sw1_post_audit: "Status: <previous> → Cancelled (TT-11)"
|
+-- IF task was In Progress or Assigned:
| ACT-C2: Execute AT-03 (agent → idle)
| ACT-C3: Send cancellation notification (OE-04)
| sw1_post_audit: "OTM → Agent(<name>): Task cancelled (OE-04)"
| ACT-C4: check_next_task_for_agent(agent_id) → IE-09
|
+-- IF task was Pending:
| sw1_post_audit: "Removed from queue (no agent notification)"
|
+-- IF task was New:
sw1_post_audit: "Cancelled before assignment"| Detail | |
|---|---|
| Actors | Watchdog cron (detector), OTM (executor), Admin (alerted) |
| Inbound events | IE-08: cron tick (every 60 seconds) |
| Actions | ACT-W1: Check for stale In Progress tasks (>24h, no subtask activity) |
| ACT-W2: Check for orphaned Pending tasks (agent idle but task pending) | |
| ACT-W3: Check counter mismatches (subtasks_remaining vs actual) | |
| ACT-W4: Check archival candidates (Done/Cancelled >7 days) | |
| ACT-W5: Check agent heartbeats (last_seen >2h) | |
| Outbound events | OE-05: admin alert (anomalies) |
| OE-06: Slack field update (archival) | |
| OE-07: audit log entries | |
| Task transitions | Requests TT-12 (→ Archived), requests TT-05 (orphaned Pending → Assigned) |
📌 The Watchdog does NOT write state directly. It calls OTM handler functions to execute transitions.
Checks:
ACT-W1: STALE IN-PROGRESS
Query tasks In Progress for >24h with no subtask activity in conversation feed
→ OE-05: Alert admin. Do NOT auto-reassign.
→ sw1_post_audit: "Watchdog: Stale task detected (>24h, no activity)"
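The ACT-W1 staleness test can be sketched as a pure predicate. The function name and timestamp field are illustrative assumptions; the threshold is the spec's 24 hours.

```javascript
// Sketch of the ACT-W1 check: a task is stale when it has been In Progress
// with no subtask activity for more than 24 hours. Timestamps in epoch ms.
function isStaleInProgress(task, nowMs) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  return task.status === "In Progress" && nowMs - task.lastActivityAt > DAY_MS;
}
```

Per the spec, a hit only alerts the admin (OE-05); the Watchdog never auto-reassigns.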
ACT-W2: ORPHANED PENDING
Query tasks Pending WHERE assigned agent is Idle in registry
→ Request OTM to re-trigger: call OTM-2 (TT-05 → Assigned)
→ sw1_post_audit: "Watchdog: Orphaned pending — re-triggering assignment"
ACT-W3: COUNTER MISMATCH
Compare subtasks_remaining with actual unchecked subtask count
→ Recalculate and fix counter via SW-1
→ OE-05: Alert admin
→ sw1_post_audit: "Watchdog: Counter mismatch corrected (<old> → <new>)"
ACT-W4: ARCHIVAL
Query tasks Done or Cancelled WHERE completed_at/cancelled_at + 7 days < now
→ Request OTM to execute TT-12
→ sw1_update_fields: status = "Archived"
→ sw1_post_audit: "Status: <previous> → Archived (TT-12). Auto-archived after 7 days."
ACT-W5: AGENT HEARTBEAT
Query agents WHERE last_seen > 2h ago
→ OE-05: Alert admin (agent may be down). No state change.
Orchestrator sets assigned_to on New task
|
[SE-1] receives list_item_updated → forwards raw event to OTM (IE-01)
|
[OTM-0] parses: assigned_to changed → routes to OTM-2
|
[OTM-2] checks registry: agent is IDLE
| sw1_update_fields: subtasks_remaining, status = "In Progress", assigned_at
| sw1_post_audit: "Status: New → Assigned → In Progress (TT-01, TT-02)"
| executes AT-01: agent → busy
| sends SURE notification (OE-01)
| sw1_post_audit: "OTM → Agent(<name>): Task assigned. Awaiting SURE ack."
|
Agent receives notification
| sends acknowledgement (IE-03)
|
[OTM] receives IE-03
| sw1_post_audit: "Agent(<name>): Task acknowledged (IE-03)"
|
Agent starts work
Orchestrator sets assigned_to on New task
|
[SE-1] → IE-01 → [OTM-0] → routes to OTM-2
|
[OTM-2] checks registry: agent is BUSY
| sw1_update_fields: status = "Pending"
| sw1_post_audit: "Status: Assigned → Pending (TT-03). Agent busy with <current_task>"
|
Task waits silently. No agent notification.
Agent completes subtask → reports to OTM API (IE-02)
|
[OTM-3] validates: subtask belongs to task, not already completed
| sw1_update_fields: subtask.todo_completed = true (Col00), subtask.status = done
| decrements subtasks_remaining (3 → 2)
| sw1_update_fields: subtasks_remaining
| sw1_post_audit: "Agent(<name>): Subtask done — <title>. Remaining: 2"
|
Agent continues working.
Agent completes final subtask → reports to OTM API (IE-02)
|
[OTM-3] decrements (1 → 0)
| sw1_update_fields: status → "Agent Done", completed_at
| sw1_post_audit: "Status: In Progress → Agent Done (TT-04). All subtasks complete."
| executes AT-02: agent → idle
| notifies Orchestrator (OE-02)
| sw1_post_audit: "OTM → Orchestrator: Task ready for review"
| check_next_task_for_agent()
| → Pending task found? → TT-05 → OTM-0 → OTM-2 → Flow 1a
→ No pending? → agent stays idle
Orchestrator tells OTM to validate task (IE-04)
|
[OTM-4] verifies status = "Agent Done"
| sw1_post_audit: "Orchestrator: Validation requested"
| sw1_update_fields: status → "Done", validated_at, todo_completed = true (Col00)
| sw1_post_audit: "Status: Agent Done → Done (TT-06). Validated."
| notifies Orchestrator: confirmed (OE-02)
Step 1: Orchestrator tells OTM to reject task (IE-05)
|
[OTM-4] sw1_post_audit: "Orchestrator: Rejection. Reason: <reason>"
| sw1_update_fields: status → "Rejected"
| sw1_post_audit: "Status: Agent Done → Rejected (TT-07)"
Step 2: Orchestrator manages subtasks directly in Slack (NOT via OTM):
| - Deletes unnecessary/obsolete subtasks
| - Leaves completed subtasks in place (as record)
| - Creates new subtasks:
| 1. "Acknowledge rework request: <detailed reason and instructions>"
| 2. "Fix validation on email field"
| 3. "Add unit tests for edge cases"
Step 3: Orchestrator tells OTM that rework is ready (IE-06)
|
[OTM-4] receives rework signal:
| Counts unfinished subtasks via Slack API → 3
| sw1_update_fields: status → "Assigned", subtasks_remaining = 3
| sw1_post_audit: "Status: Rejected → Assigned (TT-08). Rework: 3 unfinished subtasks."
Step 4: Normal assignment flow (TT-02 → SURE → agent works)
|
[OTM-0] detects Assigned → OTM-2 → agent idle → In Progress
| SURE notification posted to Slack task conversation (OE-01)
| Agent acknowledges (IE-03)
|
Agent reads first subtask: "Acknowledge rework request: ..."
| Agent closes first subtask to acknowledge (IE-02)
| OTM-3 decrements: 3 → 2
| sw1_post_audit: "Agent(<name>): Acknowledged rework. Remaining: 2"
|
Agent works through remaining subtasks normally
Orchestrator tells OTM to cancel (IE-06) / Human changes status in Slack UI (IE-01)
|
[OTM-0] routes to OTM-5
|
[OTM-5] sw1_post_audit: "Cancellation requested by <actor>"
| sw1_update_fields: status → "Cancelled", cancelled_at = now
| executes AT-03: agent freed if was working
| sends cancellation notification (OE-04)
| sw1_post_audit: "Status: <previous> → Cancelled (TT-11)"
| check_next_task_for_agent() for freed agent
[OTM-6 Watchdog] IE-08: cron tick fires (every 60s)
|
+-- ACT-W4: Query tasks WHERE:
| (status = "Done" AND validated_at + 7 days < now)
| OR (status = "Cancelled" AND cancelled_at + 7 days < now)
|
+-- FOR EACH archival candidate:
Store previous_status
sw1_update_fields: status → "Archived"
sw1_post_audit: "Status: <previous> → Archived (TT-12). Auto-archived after 7 days."
INSERT INTO task_history (snapshot of task fields)
DELETE task from active tasks (SQLite only — Slack List item untouched)
📌 TT-12 is a watchdog-requested transition (IE-08 trigger). The watchdog detects the 7-day threshold; the OTM executes the archive. Archived tasks are snapshotted to task_history for long-term reporting before being removed from the active tasks table.
📌 Slack archive limitation: The OTM sets status = "Archived" in Slack via SW-1, but the actual "Archive item" action in the Slack UI cannot be triggered via API (slackLists.items.archive does not exist). Manual Slack UI archiving is required for items to visually disappear from the default Slack list view.
The file-based pipeline is the operational implementation of Systems 1–4. It bridges the Orchestrator's task-creation workflow to the Slack List and agent workspaces, using JSON files as the intermediary to avoid direct Slack API calls by agents.
The Orchestrator (Claudia) creates tasks by running a shell script. This populates a JSON file and atomically increments the task ID counter.
| Parameter | Required | Default | Description |
|---|---|---|---|
| `--title` | ✅ | — | Short, actionable one-liner |
| `--agent` | ✅ | — | Who works on it: claudia, devdas, archibald, frederic, salvatore, sylvain, rupert |
| `--priority` | ❌ | `normal` | critical \| high \| normal \| medium \| low \| batchable |
| `--project` | ❌ | (none) | Free-text project name (e.g. prj-012, fab-state) |
| `--type` | ❌ | `action` | action \| decision \| review |
| `--subtask` | ❌ | "Confirm that task has been done" | Repeatable. Each value becomes a Slack subtask. If none provided, a default confirmation subtask is auto-added (required for completion detection) |
# Simple action task
otm-create-task.sh --title "Fix login bug" --agent devdas --priority high --project prj-012
# Decision for Rupert
otm-create-task.sh --title "Approve budget for Q2" --agent rupert --type decision
# Task with subtasks
otm-create-task.sh --title "Build login page" --agent devdas --project prj-012 \
--subtask "Create login form component" \
--subtask "Add validation logic" \
--subtask "Write unit tests"
# Input/acknowledgement pattern (replaces --description)
otm-create-task.sh --title "Review API design" --agent frederic --type review \
--subtask "input: the API spec is at docs/api-v2.md" \
--subtask "Check endpoint naming conventions" \
--subtask "Validate error response format"
# Batchable priority (can wait for batch processing)
otm-create-task.sh --title "Update documentation" --agent archibald --priority batchableWritten to ~/Library/Application Support/OpenClaw/otm/new-tasks/<timestamp>-<uuid>.json
{
"id": "uuid",
"taskId": "T-00035",
"title": "Short task title",
"agent": "devdas",
"createdAt": "2026-03-14T08:20:19Z",
"priority": "normal",
"project": "prj-012",
"subtasks": ["Subtask 1", "Subtask 2"],
"type": "action",
"status": "pending"
}
- Format: `T-NNNNN` (5 digits, zero-padded)
- Counter file: `~/Library/Application Support/OpenClaw/otm/next-task-id.json`
- Atomic increment with file locking (`flock`)
- Auto-assigned by `otm-create-task.sh` — agents don't manage IDs manually
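The ID formatting can be sketched as follows. This shows only the zero-padding; the real `otm-create-task.sh` also holds an exclusive `flock` on the counter file during the read-increment-write, which this sketch omits.

```javascript
// Sketch of the T-NNNNN scheme (5 digits, zero-padded). Padding only;
// the flock-protected counter increment is not reproduced here.
function formatTaskId(n) {
  return "T-" + String(n).padStart(5, "0");
}
```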
| Type | Purpose | Primary user |
|---|---|---|
| `action` | Task to execute (default) | Agents |
| `decision` | Requires a decision from someone | Rupert |
| `review` | Needs review / approval | Rupert or agents |
The injector watches the new-tasks/ directory and publishes JSON task files to the Slack Lists API.
| Component | File | Trigger |
|---|---|---|
| Watcher | `ai.openclaw.otm-watcher.plist` | WatchPaths on `new-tasks/` (and `task-updates/`) |
| Sweeper | `ai.openclaw.otm-sweeper.plist` | Every 10 min (catches misses) |
The injector includes a dedup check before creating tasks in Slack to prevent duplicates on crash/retry:
- Before calling `slackLists.items.create`, the injector fetches all existing items
- Scans for a matching Task ID (`Col0ALVK2NA1E`)
- If the Task ID already exists → skips creation, moves file to `processed/` with `skipped_duplicate: true`
- If the dedup API call fails → proceeds with creation (better to duplicate than lose a task)
This prevents duplicate Slack items when the watcher or sweeper re-processes a file already injected (e.g., after a crash, retry, or race condition).
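The dedup decision reduces to a small fail-open rule. A sketch, assuming the caller has already extracted the Task ID column values from the fetched items (the function name and the `null`-means-fetch-failed convention are mine):

```javascript
// Sketch of the injector's dedup decision. Fail-open by design: if the
// listing call failed (existingTaskIds === null), a possible duplicate
// beats a lost task, so we create anyway.
function dedupDecision(existingTaskIds, taskId) {
  if (existingTaskIds === null) return "create";
  return existingTaskIds.includes(taskId) ? "skip_duplicate" : "create";
}
```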
| Path | Purpose |
|---|---|
| `~/Library/Application Support/OpenClaw/otm/new-tasks/` | Inbox — pending task files |
| `~/Library/Application Support/OpenClaw/otm/task-updates/` | Inbox — update files (System 3b) |
| `~/Library/Application Support/OpenClaw/otm/processed/` | Successfully injected |
| `~/Library/Application Support/OpenClaw/otm/failed/` | Failed (with error metadata) |
A lightweight Node.js scanner that detects tasks with an assignee and dispatches them to the appropriate agent workspace. Runs every 2 minutes via launchd.
- Fetches all tasks from the Slack list (`F0ALE8DCW1F`)
- Finds tasks where: `status = "new"` AND `assignee` is set (not empty)
- For each matching task (in this exact order for crash safety):
  1. Writes a dispatch file to the agent's workspace: `/Volumes/OPENCLAW/CLAUDIA/rapido-openclaw/workspaces/<agent>-workspace/task-dispatch.json`
  2. Updates the task's status in Slack from `new` → `assigned` + sets `assigned_at` (this is the dedup gate — once `assigned`, future runs skip it)
  3. Triggers the agent session via Gateway WebSocket RPC
- After all tasks are processed, all agent triggers fire in parallel — agents start concurrently
- Writes its own component state file (System 5 heartbeat)
After writing task-dispatch.json, the dispatcher actively triggers each agent via the OpenClaw Gateway WebSocket RPC. This eliminates the passive "wait for heartbeat" gap — agents start working immediately.
Protocol: WebSocket JSON-RPC to ws://127.0.0.1:18789
Dispatcher Gateway Agents
│ │ │
│ ws.connect() │ │
│─────────────────────────────►│ │
│ │ │
│ { method: "agent", │ │
│ params: { │ │
│ agentId: "archibald", │ │
│ message: "Task T-00044 │ │
│ dispatched...", │ │
│ idempotencyKey: │ │
│ "otm-T-00044" │ │
│ }} │ │
│─────────────────────────────►│ ──► archibald session ──►│
│ │ │
│ { method: "agent", ... │ │
│ agentId: "devdas" } │ │
│─────────────────────────────►│ ──► devdas session ─────►│
│ │ │
│ { method: "agent", ... │ │
│ agentId: "salvatore" } │ │
│─────────────────────────────►│ ──► salvatore session ──►│
│ │ │
│ ws.close() │ (all 3 run in │
│─────────────────────────────►│ parallel) │
│ │ │
│ Dispatcher exits. │ Gateway manages │
│ Total time: ~2 seconds. │ concurrent sessions. │
RPC call format:
{
"method": "agent",
"params": {
"message": "Task T-00044 dispatched. Check task-dispatch.json and execute it.",
"agentId": "archibald",
"idempotencyKey": "otm-T-00044"
}
}
Response (immediate, non-blocking):
{
"runId": "otm-T-00044",
"status": "accepted",
"acceptedAt": 1773504558104
}
Key properties:
- Parallel: All agent sessions start concurrently — no serialization
- Non-blocking: Gateway returns `accepted` immediately; dispatcher doesn't wait
- Auth: Gateway token passed via WebSocket connection headers
- Rupert excluded: Human users are notified via Slack UI, not RPC
idempotencyKey is NOT idempotent. Despite the name, the Gateway agent RPC method accepts duplicate calls with the same key — it uses the key as a runId label only. Calling twice with the same key = two separate agent sessions. Idempotency is the dispatcher's responsibility, not the gateway's. The Slack status flip (new → assigned) is the sole dedup mechanism for the dispatcher. Once a task is assigned, it's invisible to future dispatcher runs.
vs. HTTP Webhook alternative (POST /hooks/agent):
The webhook approach serializes agent sessions on CommandLane.Nested — agents run one at a time. With 6 agents × ~3 min each = ~18 min sequential vs. ~3 min parallel via WebSocket RPC. WebSocket is the correct choice for multi-agent dispatch.
The dispatcher must execute operations in this exact order to prevent duplicate agent triggers:
1. Write task-dispatch.json to agent workspace
2. Update Slack status: new → assigned (+ set assigned_at)
3. Trigger agent via WS RPC
Why this order matters:
| Crash point | Result | Recovery |
|---|---|---|
| After step 1, before step 2 | File written, Slack still `new` | Next dispatcher run re-dispatches → file already has the task (append-only), agent gets triggered. Safe but duplicate file entry. |
| After step 2, before step 3 | Slack says `assigned`, agent never woke up | Agent picks up task on next heartbeat or manual trigger. Safe — delayed but not lost. |
| After step 3 | All done | Clean path. |
The dangerous alternative (trigger agent FIRST, then update Slack) risks: crash after trigger → Slack still new → next run triggers agent AGAIN → duplicate work, wasted tokens. This is why Slack update must come before agent trigger.
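The ordering constraint can be made explicit in code by injecting the three steps, so the sequence itself is testable. A sketch only: the real steps write the dispatch file, call the Slack API, and open the Gateway WebSocket; the function and parameter names are mine.

```javascript
// Sketch of the dispatcher's crash-safe step order. The three step functions
// are injected so the ordering can be exercised without real I/O.
function dispatchTask(task, steps) {
  steps.writeDispatchFile(task); // 1. file first: harmless if duplicated on replay
  steps.markAssigned(task);      // 2. Slack new → assigned: the dedup gate
  steps.triggerAgent(task);      // 3. wake the agent only after the gate is shut
}
```

Swapping steps 2 and 3 is exactly the dangerous alternative described above: a crash between them would leave the gate open and re-trigger the agent.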
{
"dispatched": [
{
"taskId": "T-00033",
"title": "Build login page",
"priority": "high",
"project": "PRJ-012 App",
"type": "action",
"subtasks": ["Create form", "Add validation", "Write tests"],
"dispatchedAt": "2026-03-14T15:00:00Z",
"slackItemId": "Rec0ALXYZ"
}
]
}
The dispatch file is APPEND-ONLY — new tasks get added to the `dispatched` array. The agent removes entries when picking them up (or marks them as `"picked": true`).
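The append-only update can be sketched as a pure read-modify-write over the file contents (file I/O omitted; the function name is mine, the `dispatched` field is the spec's):

```javascript
// Sketch of the append-only update to task-dispatch.json. Takes the current
// file contents (or null for a missing file) and returns the new contents.
function appendDispatch(fileContents, entry) {
  const doc = fileContents ? JSON.parse(fileContents) : { dispatched: [] };
  doc.dispatched.push(entry); // never overwrite: existing entries stay until picked up
  return JSON.stringify(doc, null, 2);
}
```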
On session start, agents check for task-dispatch.json. If present, they:
- Pick up the highest-priority task
- Update their `agent-state.json` to `"working"` with the task
- Start working on it
- When done, mark subtasks complete via `otm-update-task.sh`
| Agent | Workspace path |
|---|---|
| claudia | claudia-workspace |
| devdas | devdas-workspace |
| archibald | archibald-workspace |
| frederic | frederic-workspace |
| salvatore | salvatore-workspace |
| sylvain | sylvain-workspace |
| rupert | (skip — human, notified via Slack UI) |
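The "pick up the highest-priority task" step above can be sketched with the spec's priority scale (critical > high > normal > medium > low > batchable); the comparator and function name are illustrative.

```javascript
// Sketch of agent task pickup: order dispatched entries by the spec's
// priority scale and take the first.
const PRIORITY_ORDER = ["critical", "high", "normal", "medium", "low", "batchable"];

function pickTask(dispatched) {
  return [...dispatched].sort(
    (a, b) => PRIORITY_ORDER.indexOf(a.priority) - PRIORITY_ORDER.indexOf(b.priority)
  )[0];
}
```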
| Item | Detail |
|---|---|
| File | otm-dispatcher.js |
| Plist | ai.openclaw.otm-dispatcher.plist |
| Schedule | Every 120 seconds |
| Log | ~/Library/Logs/OpenClaw/otm-dispatcher.log |
| State file | otm-dispatcher-state.json |
A file-drop mechanism that allows agents to update task status and mark subtasks complete — same security model as task creation (agents never touch the Slack API directly).
# Mark a subtask done
otm-update-task.sh --task-id T-00033 --subtask-done "Create login form"
# Update task status
otm-update-task.sh --task-id T-00033 --status in_progress
# Report blocked
otm-update-task.sh --task-id T-00033 --status blocked --reason "Waiting on API key"
# Multiple subtasks done at once
otm-update-task.sh --task-id T-00033 \
--subtask-done "Create login form" \
--subtask-done "Add validation logic"| Parameter | Required | Description |
|---|---|---|
--task-id |
✅ | Task ID (T-NNNNN) |
--status |
❌ | New status: in_progress | blocked | agent_done |
--subtask-done |
❌ | Subtask title to mark as done (repeatable) |
--reason |
❌ | Reason text (used with --status blocked) |
At least one of --status or --subtask-done is required.
Written to ~/Library/Application Support/OpenClaw/otm/task-updates/<timestamp>-<uuid>.json
{
"id": "uuid",
"taskId": "T-00033",
"action": "update",
"createdAt": "2026-03-14T16:00:00Z",
"status": "in_progress",
"subtasksDone": ["Create login form"],
"reason": null
}
The otm-injector.js is extended to also watch the task-updates/ directory:
- Reads the update JSON
- Looks up the task in Slack by Task ID (scans items, matches `Col0ALVK2NA1E`)
- If `status` is set → updates the Status column
- If `subtasksDone` is set → finds matching child items by title → sets their status to `done` AND sets `Col00` (checkbox) to `true`
- Moves file to `processed/` on success, `failed/` on error
| Item | Detail |
|---|---|
| Script | otm-update-task.sh (shared tool) |
| Processor | otm-injector.js (extended) |
| Watcher | ai.openclaw.otm-watcher.plist (updated WatchPaths) |
A lightweight scanner that auto-promotes tasks to agent_done when all subtasks are complete.
Every task has at least one subtask — otm-create-task.sh auto-adds "Confirm that task has been done" if no subtasks are provided. This guarantees the completion detector always has a signal.
```
Completion Detector (Node.js, launchd every 2 min)
│
├── GET all tasks from Slack list
├── FILTER: status = "in_progress" + has child items (subtasks)
├── CHECK: all child items have status ∈ {done, agent_done} OR Col00 = true
│
└── YES → UPDATE parent task status → "agent_done"
```

| Condition | Action |
|---|---|
| Task `in_progress` + all subtasks done/agent_done | → promote to `agent_done` |
| Task `in_progress` + some subtasks still open | → skip (work in progress) |
| Task not `in_progress` | → skip (not active) |
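The promotion rule above reduces to one predicate. A minimal sketch, assuming a task shape of `{ status, subtasks: [{ status, col00 }] }` (field names here are illustrative, not the Slack API's):

```javascript
// Should the completion detector promote this task to agent_done?
function shouldPromote(task) {
  if (task.status !== 'in_progress') return false;          // not active → skip
  if (!task.subtasks || task.subtasks.length === 0) return false;
  // All child items must be done/agent_done, or have the Col00 checkbox set.
  return task.subtasks.every(
    (s) => s.status === 'done' || s.status === 'agent_done' || s.col00 === true
  );
}
```

Because every task gets at least the default "Confirm that task has been done" subtask, the empty-subtasks guard should never fire in practice.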
- Technical tasks: Claudia validates the work → `done`
- Business/decision tasks: Claudia creates a `review` task for Rupert → Rupert approves → `done`
| Item | Detail |
|---|---|
| File | otm-completion-detector.js |
| Plist | ai.openclaw.otm-completion-detector.plist |
| Schedule | Every 120 seconds |
| Log | ~/Library/Logs/OpenClaw/otm-completion-detector.log |
Each OTM component writes a state file after every run. These files are watched by the collector and surfaced on the dashboard.
```json
{
  "component": {
    "id": "otm-injector",
    "status": "alive",
    "lastRun": "2026-03-14T10:15:00Z",
    "result": "success",
    "details": "Processed 2 tasks, 0 failures"
  }
}
```

| Component | File | Written by |
|---|---|---|
| OTM Injector | `otm-injector-state.json` | `otm-injector.js` (end of each run) |
| Dispatcher | `otm-dispatcher-state.json` | `otm-dispatcher.js` (end of each run) |
| Completion Detector | `otm-completion-detector-state.json` | `otm-completion-detector.js` (end of each run) |
| Watcher | `otm-watcher-state.json` | Watcher wrapper |
All files: ~/Library/Application Support/OpenClaw/otm/
The OTM Components card shows each component's status with staleness color coding:
- 🟢 Green: last run < 5 min ago
- 🟡 Yellow: last run 5–15 min ago
- 🔴 Red: last run > 15 min ago (action required)
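The thresholds above map to a small classifier. A sketch (function name illustrative):

```javascript
// Dashboard staleness classifier: green < 5 min, yellow 5–15 min, red > 15 min.
function stalenessColor(lastRunMs, nowMs) {
  const ageMin = (nowMs - lastRunMs) / 60000;
  if (ageMin < 5) return 'green';    // healthy
  if (ageMin <= 15) return 'yellow'; // getting stale
  return 'red';                      // action required
}
```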
```
State file → collector.js (FSEvents) → SQLite otm_state table
                        ↓
             reader.js polls + broadcasts
                        ↓
              Dashboard WebSocket
```

The DMZ relay bridges the private OpenClaw state to the public Vercel dashboard. It runs on a Synology NAS in the DMZ.
```
OpenClaw VM (private)       Synology NAS (DMZ)       Browser (Vercel dashboard)
│                        ┌──────────────────┐
│ collector.js           │ RECEIVER         │
│ pushToRelay()          │ 127.0.0.1:3456   │
│ ── HTTP POST ─────────►│ + bearer token   │
│ on state change        │ + atomic write   │
│                        │                  │
│                        │ fab-state.json   │ ← shared state file
│                        │                  │
│                        │ BROADCASTER      │
│                        │ 0.0.0.0:3457     │
│                        │ (TLS via proxy)  │ ◄── wss://nas.domain/ws
│                        │ fs.watch → push  │ ◄── GET /api/state
│                        └──────────────────┘
```

| Service | File | Port | Binding | Dependencies |
|---|---|---|---|---|
| Receiver | `receiver.js` | 3456 | 127.0.0.1 (localhost only) | Zero — pure Node.js |
| Broadcaster | `broadcaster.js` | 3457 | 0.0.0.0 (behind TLS proxy) | `ws` package only |
| Service | Var | Required | Description |
|---|---|---|---|
| Receiver | `FAB_RELAY_TOKEN` | ✅ | Shared secret bearer token |
| Receiver | `STATE_FILE` | ❌ | Path to state file (default: `./fab-state.json`) |
| Broadcaster | `STATE_FILE` | ❌ | Same state file path |
| Collector (OpenClaw) | `FAB_RELAY_URL` | ❌ | If set, enables relay push |
| Collector (OpenClaw) | `FAB_RELAY_TOKEN` | ❌ | Must match receiver token |
| Layer | Protection |
|---|---|
| Bearer token | Constant-time comparison (timing-attack safe) |
| Receiver binding | 127.0.0.1 — not reachable from internet |
| Firewall | Port 3456: allow ONLY from OpenClaw VM IP |
| Broadcaster | Read-only; no auth needed (non-sensitive data) |
| TLS | All external traffic via Synology reverse proxy + Let's Encrypt |
| Atomic write | Receiver writes .tmp → rename; no partial reads |
```javascript
// Only runs if FAB_RELAY_URL is set:
pushStateToRelay(); // called after every gateway/agent/OTM state change
```

The `pushToRelay()` function in `collector.js`:

- Builds a full snapshot from SQLite (gateway + agents + OTM)
- POSTs to `FAB_RELAY_URL` with bearer token
- Handles errors gracefully — relay down = warning log, not crash
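A minimal sketch of that graceful push, assuming Node 18+ (global `fetch`); the `post` parameter is an illustrative test seam, not part of the real collector:

```javascript
// Push a state snapshot to the relay. Errors are logged, never thrown,
// so a relay outage cannot crash the collector.
async function pushStateToRelay(snapshot, {
  url = process.env.FAB_RELAY_URL,
  token = process.env.FAB_RELAY_TOKEN,
  post = globalThis.fetch, // injectable for tests
} = {}) {
  if (!url) return 'skipped'; // FAB_RELAY_URL unset → relay push disabled
  try {
    const res = await post(url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(snapshot),
    });
    return res.ok ? 'pushed' : 'rejected';
  } catch (err) {
    console.warn('relay push failed (non-fatal):', err.message);
    return 'failed';
  }
}
```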
| File | Location |
|---|---|
| `receiver.js` | `work/PROJECTS/fab-state/synology-relay/` |
| `broadcaster.js` | `work/PROJECTS/fab-state/synology-relay/` |
| `package.json` | `work/PROJECTS/fab-state/synology-relay/` |
| `SETUP-GUIDE.md` | `work/PROJECTS/fab-state/synology-relay/` |
See work/PROJECTS/strategy-openclaw-org/docs/FAB-STATE.md System 6 for full documentation.
```
Claudia Filesystem Injector Slack Dispatcher Agent Workspace
│ │ │ │ │ │
│ otm-create-task.sh │ │ │ │ │
│────────────────────►│ .json │ │ │ │
│ │─── watcher ────────►│ │ │ │
│ │ │ items.create │ │ │
│ │ │──────────────►│ status=new │ │
│ │ │ + subtasks │ │ │
│ │ move to processed/ │ │ │ │
│ │◄────────────────────│ │ │ │
│ │ │ │ │ │
│ │ │ │ ◄── scan ───────│ (every 2 min) │
│ │ │ │ new + assignee │ │
│ │ │ │ ── update ──────►│ │
│ │ │ │ status=assigned │ │
│ │ │ │ │ task-dispatch.json │
│ │ │ │ │─────────────────────►│
│ │ │ │ │ │
│ │ │ │ │ WS RPC: agent() │
│ │ │ │ │──► Gateway ──► Agent │
│ │ │ │ │ (parallel start) │
│ │ │ │ │ │
│ │ │ │ │ (agent works) │
│ │ │ │ │ │
│ │ │ │ ◄── scan (completion detector, 2 min) │
│ │ │ │ in_progress + │ │
│ │ │ │ all subtasks done│ │
│ │ │ │ → agent_done │ │
```

| Component | File | Trigger |
|---|---|---|
| Task creator | `otm-create-task.sh` | Called by agents |
| Injector | `otm-injector.js` | Called by watcher/sweeper |
| Watcher | `ai.openclaw.otm-watcher.plist` | WatchPaths on `new-tasks/` + `task-updates/` |
| Sweeper | `ai.openclaw.otm-sweeper.plist` | Every 10 min (catches misses) |
| Dispatcher | `otm-dispatcher.js` | Every 2 min (launchd) |
| Completion detector | `otm-completion-detector.js` | Every 2 min (launchd) |
The Error Monitor is a dedicated OTM component that detects state inconsistencies, traces errors for lessons learned, and triggers corrective actions. It runs as part of the Watchdog cycle (IE-08, every 60s) but is logically separate from the Watchdog's operational checks (OTM-6).
Each error condition is coded ERR-xx. The Error Monitor detects; the OTM corrects.
| Code | Condition | Detection Rule | Severity | Auto-correction | Manual escalation |
|---|---|---|---|---|---|
| ERR-01 | Stale In Progress | Task status = "In Progress" AND no IE-02 subtask report in >10 minutes | Warning | None — alert only | OE-05: Admin notified. May indicate agent crash, stuck task, or slow work. |
| ERR-02 | Agent-Task Mismatch (busy agent, no task) | Agent status = busy AND `current_task` not found in Slack List (or task status ≠ In Progress) | Critical | Set agent idle (AT-04), call `check_next_task_for_agent()` | OE-05: Admin alert with details of orphaned agent state |
| ERR-03 | Agent-Task Mismatch (idle agent, active task) | Task status = "In Progress" AND assigned agent status = idle in registry | Critical | Set agent busy, re-send SURE notification (OE-01) | OE-05: Admin alert — state was inconsistent |
| ERR-04 | Orphaned Pending Task | Task status = "Pending" AND assigned agent status = idle | High | Promote task: TT-05 → re-evaluate via OTM-2 | OE-07: Audit log on task |
| ERR-05 | Counter Mismatch | `subtasks_remaining` ≠ actual count of unchecked subtasks in Slack List | High | Recalculate and fix counter via SW-1 | OE-05: Admin alert with old/new values |
| ERR-06 | SURE Timeout | OE-01 sent >3 minutes ago (1 min + 2 min retries) AND no IE-03 received | Critical | None — task stays In Progress | OE-05: Admin alert. Agent may be unreachable, or gateway may have restarted. |
| ERR-07 | Multiple Active Tasks per Agent | Agent has >1 task with status = "In Progress" assigned to them | Critical | Keep oldest task, move others to Pending | OE-05: Admin alert — invariant violation |
| ERR-08 | Stuck in Assigned | Task status = "Assigned" for >5 minutes (should transition immediately to In Progress or Pending) | High | Re-trigger OTM-2 for the task | OE-05: Admin alert if re-trigger fails |
| ERR-09 | Stuck in Rejected | Task status = "Rejected" for >24 hours (Orchestrator hasn't submitted rework or cancelled) | Warning | None — alert only | OE-05: Remind Orchestrator to act |
| ERR-10 | Ghost Agent | `assigned_to` field references a Slack user ID not in agent registry AND not auto-registerable | Critical | Task stays in current status | OE-05: Admin alert — unknown agent |
| ERR-11 | Duplicate Subtask Reports | Same `subtask_id` reported done >1 time (idempotency check caught it) | Info | Silently discarded (idempotent) | Logged in event log (§9) for pattern analysis |
| ERR-12 | Stale Rework (no ack subtask closed) | Task in "In Progress" after TT-08 rework, first subtask ("Acknowledge rework…") not closed within 10 minutes | Warning | None — alert only | OE-05: Agent may not have read rework instructions |
OTM-7 runs every 60s (piggybacks on IE-08 watchdog cron):
```
FOR EACH error check ERR-01 through ERR-12:
|
+-- Run detection query (SQLite + Slack API as needed)
|
+-- IF condition detected:
|     1. Log error to event_log table (§9): {error_code, task_id, agent_id, details, timestamp}
|     2. Log to task conversation (OE-07): "[timestamp] OTM-7 ERROR: ERR-xx detected — <description>"
|     3. IF auto-correctable: execute correction, log correction action
|     4. IF manual escalation: send OE-05 alert to admin
|     5. Increment error counter in error_stats table (§10)
|
+-- IF condition NOT detected: skip
```

Errors are not just fixed — they feed a continuous improvement loop.
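The detection loop above can be sketched as a skeleton. This is an illustrative sketch under the spec's five steps; the check and hook shapes are assumptions, not the real OTM-7 API.

```javascript
// Run all ERR-xx checks once; returns the codes that fired this cycle.
// `checks` is a list of { code, detect, correct?, escalate? } objects,
// `hooks` carries the logging/alerting side effects (both hypothetical).
function runErrorChecks(checks, hooks) {
  const detections = [];
  for (const check of checks) {
    const hit = check.detect();  // SQLite / Slack query in the real monitor
    if (!hit) continue;          // condition not detected: skip
    hooks.logEvent(check.code, hit);         // 1. event_log row
    hooks.logConversation(check.code, hit);  // 2. OE-07 audit line
    if (check.correct) check.correct(hit);   // 3. auto-correction
    if (check.escalate) hooks.alertAdmin(check.code, hit); // 4. OE-05
    hooks.bumpCounter(check.code);           // 5. error_stats
    detections.push(check.code);
  }
  return detections;
}
```

Structuring each ERR-xx as a self-contained check object keeps the 60-second cycle a plain loop with no per-error special cases.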
- Error Statistics Table (`error_stats` in error DB, see §10): tracks frequency, first/last occurrence, and auto-correction success rate per ERR-xx code.
- Daily Error Report: OTM-7 generates a summary during the first watchdog cycle after 00:00 each day:
  - Error counts by code (ERR-01 through ERR-12)
  - Most frequent errors
  - Auto-correction success/failure ratio
  - New error patterns (first-time occurrences)
- Threshold Alerts: alert on EVERY error (threshold = 1). During startup and early operation, all anomalies are surfaced immediately via OE-05. The threshold can be raised once the system is stable and baseline error rates are understood.
- Root Cause Tagging: Admin can tag errors with a root cause via the OTM API (`POST /api/otm/error/{id}/tag`), enabling aggregate analysis.
| Action | Triggered by | Effect |
|---|---|---|
| Re-send SURE notification | ERR-03 (idle agent, active task) | Resynchronise agent with its task |
| Promote pending task | ERR-04 (orphaned pending) | Unblock queued work |
| Recalculate counter | ERR-05 (counter mismatch) | Fix data integrity |
| Free orphaned agent | ERR-02 (busy agent, no task) | Unblock agent for new work |
| Re-trigger OTM-2 | ERR-08 (stuck in Assigned) | Retry the assignment flow |
| Move excess tasks to Pending | ERR-07 (multiple active tasks) | Restore single-task invariant |
All OTM events are logged to three complementary systems:
- Slack task conversation (OE-07) — human-readable, per-task, searchable in Slack (§7)
- Internal event log (`event_log` table in error DB) — structured, queryable, machine-readable
- Internal event log files — filesystem mirrors of database writes, for real-time tailing during tests and startup (§9.6)
```sql
CREATE TABLE event_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,    -- Unix timestamp (ms precision)
  event_type TEXT NOT NULL,      -- 'inbound' | 'outbound' | 'transition' | 'error' | 'correction' | 'system'
  event_code TEXT NOT NULL,      -- IE-xx, OE-xx, TT-xx, AT-xx, ERR-xx
  task_id TEXT,                  -- Slack List item ID (NULL for system events)
  agent_id TEXT,                 -- slack_user_id (NULL if not agent-related)
  handler TEXT,                  -- OTM-0 through OTM-7, SE-1, SW-1
  source TEXT NOT NULL,          -- 'se1', 'otm_api', 'watchdog', 'error_monitor', 'internal'
  detail TEXT,                   -- JSON blob with event-specific data
  duration_ms INTEGER,           -- Processing time for this event
  success INTEGER DEFAULT 1,     -- 1 = success, 0 = failure
  error_message TEXT             -- Error details if success = 0
);

CREATE INDEX idx_event_log_task ON event_log(task_id);
CREATE INDEX idx_event_log_agent ON event_log(agent_id);
CREATE INDEX idx_event_log_type ON event_log(event_type, timestamp);
CREATE INDEX idx_event_log_code ON event_log(event_code, timestamp);
CREATE INDEX idx_event_log_time ON event_log(timestamp);
```

Every IE-xx, OE-xx, TT-xx, AT-xx, and ERR-xx event produces one row in `event_log`. This includes:
| Event Category | Examples | Logged Fields |
|---|---|---|
| Inbound events | IE-01 (Slack event), IE-02 (subtask report), IE-03 (SURE ack) | source, task_id, agent_id, raw payload in detail |
| Outbound events | OE-01 (SURE notification), OE-06 (Slack write) | target, task_id, delivery status, duration_ms |
| Task transitions | TT-01 through TT-15 | from_status, to_status, task_id, requesting_actor |
| Agent transitions | AT-01 through AT-04 | from_status, to_status, agent_id, triggering_task |
| Errors | ERR-01 through ERR-12 | error_code, detection_details, correction_applied |
| System events | Startup, reconciliation, watchdog cycle | cycle_number, checks_run, anomalies_found |
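A handler-side logging helper for this schema could look like the sketch below. The names (`eventRow`, `INSERT_EVENT`) are illustrative; the real handlers use better-sqlite3, so only the row shape and SQL text are shown here.

```javascript
// SQL text matching the event_log schema above (11 bound parameters).
const INSERT_EVENT = `
  INSERT INTO event_log
    (timestamp, event_type, event_code, task_id, agent_id,
     handler, source, detail, duration_ms, success, error_message)
  VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)`;

// Build the parameter array for one event; detail is stored as a JSON blob.
function eventRow({ type, code, taskId = null, agentId = null,
                    handler = null, source, detail = {}, durationMs = null,
                    success = true, errorMessage = null }) {
  return [Date.now(), type, code, taskId, agentId, handler, source,
          JSON.stringify(detail), durationMs, success ? 1 : 0, errorMessage];
}

// Usage in a handler (db is a better-sqlite3 Database):
//   db.prepare(INSERT_EVENT).run(...eventRow({
//     type: 'transition', code: 'TT-02', taskId: 'T-00033', source: 'otm_api',
//   }));
```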
The event log enables:
```sql
-- Task lifecycle: all events for a specific task
SELECT * FROM event_log WHERE task_id = ? ORDER BY timestamp;

-- Agent activity: all events involving a specific agent
SELECT * FROM event_log WHERE agent_id = ? ORDER BY timestamp;

-- Error frequency: last 24 hours
SELECT event_code, COUNT(*) as count
FROM event_log
WHERE event_type = 'error' AND timestamp > ?
GROUP BY event_code ORDER BY count DESC;

-- Average task completion time
SELECT AVG(e2.timestamp - e1.timestamp) / 1000 / 60 as avg_minutes
FROM event_log e1
JOIN event_log e2 ON e1.task_id = e2.task_id
WHERE e1.event_code = 'TT-02' AND e2.event_code = 'TT-04';

-- Slowest handlers (performance monitoring)
SELECT handler, AVG(duration_ms), MAX(duration_ms), COUNT(*)
FROM event_log
WHERE duration_ms IS NOT NULL
GROUP BY handler ORDER BY AVG(duration_ms) DESC;

-- SURE acknowledgement response times
SELECT AVG(ack.timestamp - notif.timestamp) / 1000 as avg_seconds
FROM event_log notif
JOIN event_log ack ON notif.task_id = ack.task_id
WHERE notif.event_code = 'OE-01' AND ack.event_code = 'IE-03';
```

| Data | Retention | Archive strategy |
|---|---|---|
| `event_log` (active) | 30 days | Rows older than 30 days → `event_log_archive` |
| `event_log_archive` | 1 year | Monthly SQLite dump to filesystem (gzipped) |
| `error_stats` | Indefinite | Cumulative counters, never purged |
| Slack conversation audit | Indefinite | Lives in Slack (Slack's retention policy applies) |
Maintenance job (runs during OTM-6 watchdog, daily at 03:00):
```sql
-- Move old events to archive (timestamps are Unix ms, so 30 days = 30*86400 s)
INSERT INTO event_log_archive
  SELECT * FROM event_log
  WHERE timestamp < (strftime('%s', 'now') - 30 * 86400) * 1000;
DELETE FROM event_log
  WHERE timestamp < (strftime('%s', 'now') - 30 * 86400) * 1000;

-- Vacuum to reclaim space
VACUUM;
```

Internal structured logging (SQLite) is chosen over Sentry because:
- No external dependency — OTM is self-contained
- Queryable — SQL enables arbitrary analysis (Sentry requires its query language)
- Correlated with task data — same DB, JOIN-able with agent registry
- Low volume — estimated <10,000 events/day (see §10), no need for distributed tracing
- Cost — zero (SQLite is free; Sentry has per-event pricing)
- Privacy — all data stays on the OpenClaw server
If event volume exceeds 100,000/day or distributed tracing across multiple servers becomes needed, Sentry or OpenTelemetry would be reconsidered.
All database writes to event_log and error_stats are mirrored to two filesystem log files in real-time:
| File | Content | Format | Purpose |
|---|---|---|---|
| `{OPENCLAW_DATA_DIR}/otm/logs/otm-events.log` | All `event_log` inserts | `[ISO-timestamp] [event_code] [handler] [task_id] [agent_id] detail_json` | Monitor all OTM activity via `tail -f` |
| `{OPENCLAW_DATA_DIR}/otm/logs/otm-errors.log` | All ERR-xx detections + corrections | `[ISO-timestamp] [ERR-xx] [severity] [task_id] [agent_id] description [correction: action/none]` | Monitor errors during tests and startup |
Implementation: Every INSERT INTO event_log and every error detection in OTM-7 appends one line to the corresponding log file. This is a synchronous append (negligible overhead at <350 events/day).
Log rotation: Daily at 03:00, rename to otm-events.log.YYYY-MM-DD and otm-errors.log.YYYY-MM-DD. Keep 30 days of rotated files. Older files deleted automatically.
Usage during development/testing:
```bash
# Watch all OTM events in real time
tail -f {OPENCLAW_DATA_DIR}/otm/logs/otm-events.log

# Watch errors only
tail -f {OPENCLAW_DATA_DIR}/otm/logs/otm-errors.log

# Filter for a specific task
tail -f otm-events.log | grep "T-00042"

# Filter for a specific error code
tail -f otm-errors.log | grep "ERR-03"
```

📌 The log files are append-only mirrors — the database remains the source of truth for queries and analysis. The files exist purely for human monitoring convenience.
The OTM uses two separate SQLite databases:
- Main OTM DB (`otm.db`) — Agent registry, SURE pending, task history. Core operational state.
- Error & Event DB (`otm-errors.db`) — Event log, error statistics. Monitoring and observability. Separated so that error monitoring is independent from the main OTM application and can be analysed, reset, or rebuilt without affecting operations.

Database files:

- `{OPENCLAW_DATA_DIR}/otm/otm.db` — Main OTM DB
- `{OPENCLAW_DATA_DIR}/otm/otm-errors.db` — Error & Event DB
- (Sylvain to confirm exact `OPENCLAW_DATA_DIR` path)
Library: better-sqlite3 (synchronous, fast, WAL mode for both)
Main OTM DB (otm.db):
| Table | Purpose | Writer(s) | Reader(s) | Rows (steady state) | Growth rate |
|---|---|---|---|---|---|
| `agents` | Agent registry (§4.5) | OTM-1 | OTM-0, OTM-2, OTM-3, OTM-5, OTM-6 | 5–15 | Near-zero (new agents rare) |
| `task_history` | Snapshot of archived tasks (§Flow 5) | OTM-6 (TT-12) | Admin queries, reporting | Growing | ~20–50/month |
| `sure_pending` | Outstanding SURE notifications awaiting ack (§6) | OTM-2, OTM ack handler | OTM-7 (timeout check) | 0–5 | Transient (cleared on ack/timeout) |
Error & Event DB (otm-errors.db):
| Table | Purpose | Writer(s) | Reader(s) | Rows (steady state) | Growth rate |
|---|---|---|---|---|---|
| `event_log` | Structured event log (§9.1) | All OTM handlers | OTM-7, admin queries | ~10,000 | ~300/day (purged monthly) |
| `event_log_archive` | Archived events >30 days (§9.4) | Maintenance job | Admin queries only | ~100,000 | ~9,000/month |
| `error_stats` | Error frequency counters (§8.3) | OTM-7 | OTM-7, daily report | 12 rows (one per ERR-xx) | Fixed |
Main OTM DB (otm.db):
```sql
-- Agent Registry (see §4.5 for column details)
CREATE TABLE agents (
  slack_user_id TEXT PRIMARY KEY,
  otm_display_name TEXT NOT NULL,
  openclaw_agent_id TEXT,
  agent_type TEXT NOT NULL DEFAULT 'ai',
  status TEXT DEFAULT 'idle',
  current_task TEXT,
  task_started_at INTEGER,
  last_seen INTEGER
);

-- Task History (archived tasks snapshot)
CREATE TABLE task_history (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,           -- Original Slack List item ID
  title TEXT NOT NULL,
  assigned_to TEXT,                -- slack_user_id
  final_status TEXT NOT NULL,      -- "Archived" (from Done or Cancelled)
  previous_status TEXT,            -- Status before archival
  priority INTEGER,
  context TEXT,
  subtask_count INTEGER,           -- Total subtasks at archival time
  created_at INTEGER,              -- Task creation timestamp
  assigned_at INTEGER,
  completed_at INTEGER,
  validated_at INTEGER,
  cancelled_at INTEGER,
  archived_at INTEGER NOT NULL,    -- When TT-12 executed
  result_summary TEXT,
  total_duration_ms INTEGER,       -- assigned_at → completed_at
  review_duration_ms INTEGER,      -- completed_at → validated_at
  rework_count INTEGER DEFAULT 0,  -- Number of TT-08 rework cycles
  error_count INTEGER DEFAULT 0    -- Number of ERR-xx events during lifecycle
);

CREATE INDEX idx_task_history_agent ON task_history(assigned_to);
CREATE INDEX idx_task_history_status ON task_history(final_status);
CREATE INDEX idx_task_history_archived ON task_history(archived_at);

-- SURE Pending Notifications (see §6)
CREATE TABLE sure_pending (
  task_id TEXT PRIMARY KEY,
  agent_id TEXT NOT NULL,
  notification_type TEXT NOT NULL, -- 'task_assigned' | 'rework_assigned'
  sent_at INTEGER NOT NULL,        -- First OE-01 sent
  retry_count INTEGER DEFAULT 0,   -- 0, 1, 2, 3 (max)
  last_retry_at INTEGER,
  acknowledged_at INTEGER          -- Set when IE-03 received. NULL = still pending.
);
```

Error & Event DB (`otm-errors.db`):
```sql
-- Event Log (see §9.1)
CREATE TABLE event_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_code TEXT NOT NULL,
  task_id TEXT,
  agent_id TEXT,
  handler TEXT,
  source TEXT NOT NULL,
  detail TEXT,
  duration_ms INTEGER,
  success INTEGER DEFAULT 1,
  error_message TEXT
);

CREATE INDEX idx_event_log_task ON event_log(task_id);
CREATE INDEX idx_event_log_agent ON event_log(agent_id);
CREATE INDEX idx_event_log_type ON event_log(event_type, timestamp);
CREATE INDEX idx_event_log_code ON event_log(event_code, timestamp);
CREATE INDEX idx_event_log_time ON event_log(timestamp);

-- Event Log Archive (identical schema)
CREATE TABLE event_log_archive (
  id INTEGER PRIMARY KEY,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_code TEXT NOT NULL,
  task_id TEXT,
  agent_id TEXT,
  handler TEXT,
  source TEXT NOT NULL,
  detail TEXT,
  duration_ms INTEGER,
  success INTEGER DEFAULT 1,
  error_message TEXT
);

-- Error Statistics (see §8.3)
CREATE TABLE error_stats (
  error_code TEXT PRIMARY KEY,        -- ERR-01 through ERR-12
  total_count INTEGER DEFAULT 0,
  last_24h_count INTEGER DEFAULT 0,   -- Reset daily by maintenance job
  first_seen INTEGER,                 -- Unix timestamp
  last_seen INTEGER,                  -- Unix timestamp
  auto_corrected_count INTEGER DEFAULT 0,
  escalated_count INTEGER DEFAULT 0
);
```

| Operation | Frequency | Triggered by | Description |
|---|---|---|---|
| Event log rotation | Daily (03:00) | OTM-6 watchdog + time check | Move events >30 days to event_log_archive |
| Archive export | Monthly (1st, 03:30) | OTM-6 watchdog + date check | Dump event_log_archive to gzipped SQL file on disk, then TRUNCATE |
| Error stats reset | Daily (00:00) | OTM-6 watchdog + time check | Reset last_24h_count to 0 for all ERR-xx rows |
| SURE cleanup | Every 60s | OTM-7 error monitor | Remove sure_pending rows where acknowledged_at IS NOT NULL and >1 hour old |
| VACUUM | Weekly (Sunday 03:00) | OTM-6 watchdog + day check | Reclaim disk space after deletions |
| WAL checkpoint | Automatic | SQLite WAL mode | Handled by better-sqlite3 automatically |
| Backup | Daily (04:00) | Sylvain's backup cron | Copy otm.db to backup location (standard server backup) |
Assumptions: 5 active agents, ~10 tasks created/day, ~3 subtasks/task average, watchdog runs 1,440×/day.
Main OTM DB (otm.db):
| Table | Writes/day | Reads/day | Steady-state rows | Disk (est.) |
|---|---|---|---|---|
| `agents` | ~10 (status flips) | ~500 (every handler checks registry) | 5–15 | <1 KB |
| `task_history` | ~1–2 (archival events) | ~5 (reporting queries) | ~500/year | ~500 KB |
| `sure_pending` | ~20 (insert + update on ack) | ~1,440 (timeout checks) | 0–5 (transient) | <1 KB |
| Subtotal | ~30/day | ~1,950/day | ~520 | <1 MB |
Error & Event DB (otm-errors.db):
| Table | Writes/day | Reads/day | Steady-state rows | Disk (est.) |
|---|---|---|---|---|
| `event_log` | ~300 (all events) | ~50 (error monitor + queries) | ~9,000 (30-day window) | ~5 MB |
| `event_log_archive` | ~9,000/month (from rotation) | ~5/month (admin queries) | ~100,000 (1-year window) | ~50 MB |
| `error_stats` | ~20 (counter increments) | ~1,440 (every watchdog cycle) | 12 | <1 KB |
| Subtotal | ~320/day | ~1,500/day | ~109,000 | ~55 MB |
📌 At this scale, SQLite is well within its performance envelope for both databases. The separation means the error DB can be independently analysed, reset, or rebuilt without affecting OTM operations. A weekly VACUUM on each keeps files compact.
```
Main OTM DB (otm.db)
├── agents — live state, small, never archived
├── sure_pending — transient, cleaned hourly
└── task_history — growing archive of completed tasks

Error & Event DB (otm-errors.db)
├── event_log — rolling 30-day window
├── event_log_archive — rolling 1-year window
└── error_stats — cumulative counters, never purged

Filesystem log files (otm/logs/)
├── otm-events.log — real-time event mirror (rotated daily, 30-day keep)
└── otm-errors.log — real-time error mirror (rotated daily, 30-day keep)

Monthly export (filesystem)
└── {OPENCLAW_DATA_DIR}/otm/archive/
    ├── events-2026-01.sql.gz — monthly event log dump from otm-errors.db
    ├── events-2026-02.sql.gz
    └── ...

Annual report (generated)
└── Aggregate stats from task_history (otm.db) + error_stats (otm-errors.db)
    → Feeds into CMMI metrics collection
```

- All OTM handlers MUST be idempotent
- Subtask completion: check `todo_completed` (Col00) field before processing (no separate dedup table)
- State transitions: verify `previous_status` matches expected before applying
- File pipeline dedup: injector checks existing Slack items by Task ID before creating
- `previous_status` MUST be set before every status change
- All status writes go through OTM → SW-1. No direct Slack writes by any actor.
- Watchdog requests transitions via OTM handler calls, not direct writes
- `todo_completed` (Col00) MUST be set alongside status when marking items done
- Every OTM event logged in Slack task conversation feed with timestamp (§7)
- Slack conversation IS the audit log — no separate log table
- All log entries include event code (IE-xx, OE-xx, TT-xx, AT-xx) for traceability
- Agent registry in SQLite
- SQLite DB location: OpenClaw server data directory
- Startup reconciliation from openclaw.json + Slack List on OTM restart (§4.6)
- Event handling MUST complete within 5 seconds
- Slack API writes SHOULD complete within 5 seconds
- Agent notifications SHOULD be sent within 10 seconds
- SURE acknowledgement timeout: 1 min (first), 2 min (retry), then error (3 min total)
- Dispatcher runs every 2 min — maximum 2 min delay from task creation to agent trigger
- Completion detector runs every 2 min — maximum 2 min delay from last subtask to `agent_done`
- Slack API calls: retry up to 3 times with exponential backoff
- Unhandled errors: alert admin via OE-05
- Tasks MUST NOT be silently lost
- All errors logged in task conversation feed (OE-07)
- Dispatcher crash: Slack `assigned` status is the sole dedup gate (§System 3)
- Gateway `idempotencyKey` does NOT provide real idempotency — dispatcher owns dedup
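The dispatcher-owned dedup rule implies a strict ordering: mark the task `assigned` in Slack BEFORE triggering the agent, so a crash between the two steps can never produce a double dispatch. A minimal sketch with illustrative `slack`/`gateway` client shapes (not the real APIs):

```javascript
// Crash-safe dispatch: the Slack `assigned` status is written first and
// acts as the sole dedup gate. A task already past that write will not
// be picked up by the next `status=new` scan, even if the trigger failed.
async function dispatchNewTasks(slack, gateway) {
  const tasks = await slack.listTasks({ status: 'new' });
  for (const task of tasks) {
    if (!task.assignee) continue;                    // nothing to dispatch yet
    await slack.setStatus(task.id, 'assigned');      // 1. dedup gate first
    await gateway.triggerAgent(task.assignee, task); // 2. then WS RPC to the agent
  }
}
```

The trade-off is deliberate: a crash after step 1 leaves a stuck `Assigned` task, which ERR-08 detects and re-triggers, whereas the reverse ordering could start the same agent session twice.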
- OTM-4 (validate/reject) restricted to registered reviewer agents
- All Slack API calls authenticated via bot tokens
- DMZ relay uses bearer token + constant-time comparison (§System 6)
- Receiver bound to `127.0.0.1` only
| Component | Technology | Owner (who runs it) | Activity frequency | Data volume |
|---|---|---|---|---|
| OTM backend | TypeScript/Node.js, OpenClaw plugin pipeline | Devdas (builds), Sylvain (deploys) | Continuous — handles all events | ~350 events/day processed |
| File pipeline | Bash scripts + Node.js (injector, dispatcher, detector) | Devdas (builds), Sylvain (deploys) | launchd: watcher (FSEvents), sweeper (10 min), dispatcher (2 min), detector (2 min) | ~10 tasks/day through pipeline |
| SQLite DB | better-sqlite3, WAL mode | OTM (sole writer), Sylvain (backups) | ~350 writes/day, ~3,500 reads/day | ~55 MB steady state (see §10.5) |
| SE-1 (event listener) | Slack Events API, socket mode, Bolt SDK | Salvatore's Slack app (`lists:read`) | ~50 events/day (Slack → OTM) | <1 KB/event payload |
| SW-1 (writer) | Slack Web API (`lists:write`) | Salvatore's Slack app (called by OTM) | ~200 API calls/day (field updates + audit posts) | <1 KB/call |
| OTM API | OpenClaw hooks / HTTP endpoints | OTM (receives), Orchestrator + Agents (call) | ~100 API calls/day | <1 KB/call |
| Agent notifications | OpenClaw Gateway WS RPC (AI) / Slack task conversation (Human) | OTM (sends), Agents (receive) | ~20 notifications/day | <1 KB/notification |
| Watchdog + Error Monitor | OpenClaw cron (60s interval) | OTM-6 + OTM-7 (automatic) | 1,440 cycles/day | ~20 error checks/cycle |
| Event logging | SQLite `event_log` table (see §9) | OTM (writes), Admin (queries) | ~300 events/day, 30-day active window | ~5 MB active, ~50 MB archive |
| DMZ relay | Node.js receiver + broadcaster on Synology NAS | Sylvain (deploys) | On every state change | <1 KB/push |
| Testing | Vitest, mock Slack API | Devdas (writes + runs) | CI on every PR | — |
```
┌─────────────────────────────────────────────────────┐
│ Slack (Salvatore's Slack App) │
│ SE-1: lists:read (event listener) │
│ SW-1: lists:write (field updates + audit posts) │
└──────────────┬──────────────────────────┬───────────┘
│ IE-01 ▲ OE-06, OE-07
▼ │
┌─────────────────────────────────────────────────────┐
│ OTM (OpenClaw Plugin Pipeline) │
│ OTM-0: Event Router (internal) │
│ OTM-1: Agent Registry (internal) │
│ OTM-2: Handle Task Assigned │
│ OTM-3: Handle Subtask Done │
│ OTM-4: Task Validate/Reject │
│ OTM-5: Handle Task Cancelled │
│ OTM-6: Watchdog (cron, 60s) │
│ OTM-7: Error Monitor (cron, 60s) │
│ │
│ SQLite DB: agents, event_log, error_stats, │
│ task_history, sure_pending │
└──────────────┬──────────────────────────┬───────────┘
│ OE-01, OE-02, OE-04 ▲ IE-02, IE-03, IE-04–IE-07
▼ │
┌─────────────────────────────────────────────────────┐
│ OpenClaw Agents │
│ Orchestrator (Claudia): IE-04, IE-05, IE-06, IE-07│
│ Agents (Devdas, etc.): IE-02, IE-03 │
│ Human (Rupert): via Slack UI → SE-1 → IE-01 │
└─────────────────────────────────────────────────────┘
```
File Pipeline (Part 3):

```
┌─────────────────────────────────────────────────────┐
│ otm-create-task.sh → new-tasks/ → otm-injector.js │
│ → Slack List (status=new) │
│ → otm-dispatcher.js → task-dispatch.json │
│ → Gateway WS RPC → Agent sessions (parallel) │
│ → otm-update-task.sh → task-updates/ │
│ → otm-completion-detector.js → agent_done │
└─────────────────────────────────────────────────────┘
```

| Component | Cost | Notes |
|---|---|---|
| Slack Pro | 1 user license | Already paid. Gives API access (SE-1 + SW-1). No per-API-call cost. |
| SQLite | $0 | Open source, embedded. No server, no license. |
| Node.js / TypeScript | $0 | Open source runtime. |
| Bolt SDK | $0 | Open source Slack SDK. |
| better-sqlite3 | $0 | Open source library. |
| OpenClaw | $0 (incremental) | OTM runs as a plugin inside the existing gateway. No additional instance. |
| Filesystem logging | $0 | Append to local files. |
| DMZ relay | $0 (incremental) | Runs on existing Synology NAS. |
| Total OTM cost | $0 incremental | Only pre-existing Slack Pro license required. |
The OTM uses zero AI. It is a deterministic state machine implemented in TypeScript. No LLM calls, no embeddings, no inference. Every decision is rule-based:
- Routing: field comparison (OTM-0)
- Agent availability: SQLite lookup (OTM-1)
- State transitions: precondition checks + status writes (OTM-2 through OTM-5)
- Error detection: SQL queries against known patterns (OTM-7)
- Watchdog: timer + threshold checks (OTM-6)
- File pipeline: filesystem watches + Slack API calls (Systems 1–4)
Token consumption by OTM: 0 tokens.
The actors that interact with the OTM do consume AI tokens, but this is outside the OTM's scope:
| Actor | AI usage | OTM interaction cost |
|---|---|---|
| Orchestrator (Claudia) | LLM calls for task planning, review, rework design | OTM API calls = HTTP requests, ~0 tokens |
| Agents (Devdas, etc.) | LLM calls for task execution | OTM API calls (IE-02, IE-03) = HTTP requests, ~0 tokens |
| Human (Rupert) | None (uses Slack UI) | Slack events = Slack infrastructure, ~0 tokens |
📌 The OTM API calls (IE-02 through IE-07) are simple HTTP POST requests with JSON payloads. They consume zero AI tokens. The only AI costs are generated by the agents and orchestrator doing their actual work — which they would do regardless of whether the OTM exists.
```
OTM operation cost:    $0/month (zero AI, zero external services)
Slack API cost:        $0/month (included in existing Pro plan)
Infrastructure cost:   $0/month (runs on existing OpenClaw server + Synology NAS)
──────────────────────────────────────────────────────
Total incremental cost: $0/month
```

| # | Deliverable | Owner | Description |
|---|---|---|---|
| D-01 | OTM-SPEC (this document) | Claudia | Specification and architecture |
| D-02 | OTM-TESTS | Claudia | Test scenarios document |
| D-03 | OTM implementation | Devdas | TypeScript plugin pipeline (OTM-0 through OTM-7, SE-1, SW-1) |
| D-04 | SQLite schemas + migrations | Devdas | otm.db and otm-errors.db setup |
| D-05 | Unit + integration tests | Devdas | Vitest test suite matching OTM-TESTS scenarios |
| D-06 | Infrastructure setup | Sylvain | DB paths, cron config, backup setup, log rotation |
| D-07 | `task-orchestration` skill | Claudia | OpenClaw skill for Claudia's Orchestrator role: task creation, project → step → task decomposition, assignment logic, validation/rejection, rework subtask design. This skill encodes the Orchestrator's side of the OTM protocol. |
| D-08 | Slack app config | Salvatore | lists:read + lists:write scopes, socket mode setup |
| D-09 | End-to-end validation | Claudia + Devdas | Full test suite execution on real Slack workspace |
| D-10 | File pipeline scripts | Devdas | otm-create-task.sh, otm-update-task.sh, otm-injector.js, otm-dispatcher.js, otm-completion-detector.js |
| D-11 | DMZ relay deployment | Sylvain | receiver.js + broadcaster.js on Synology NAS, TLS proxy setup |
| D-12 | launchd plists | Sylvain | Watcher, sweeper, dispatcher, completion detector plists |
📌 D-07 (task-orchestration skill) will include Rupert's higher-level instructions on how to break down projects into steps and steps into tasks. It is part of the scope of the full-blown validation tests (D-09).
| # | Question | Status |
|---|---|---|
| 1 | Exact list_item_updated event payload schema | Needs Salvatore to capture sample events |
| 2 | Can socket mode receive list events on Pro? | Needs verification (may need Events API HTTP mode) |
| 3 | Plugin pipeline registration mechanism in OpenClaw | Needs Devdas to investigate |
| 4 | SQLite file location on OpenClaw server | Sylvain to decide |
| 5 | How agents tick subtasks in practice | ✅ Resolved in v1.3: Agents call OTM API (IE-02), OTM updates Slack via SW-1. No direct Slack UI interaction. |
| 6 | Slack conversation API for List items — does it exist? | Needs Salvatore to verify (may need workaround) |
| 7 | SURE ack timeout values and gateway restart handling | ✅ Resolved in v1.5: Timeouts revised to 1min + 2min + error (3min total). Gateway restart handling defined in §4.7: OTM reconciles from Slack List + openclaw.json on startup, detects SURE timeouts, auto-corrects agent-task mismatches. Gateway restart logged as system event (IE-SYS-01). Orchestrator does not need to re-register agents. |
| 8 | OpenClaw agent → Slack user ID mapping in openclaw.json | Needs Sylvain to confirm config structure |
| 9 | Human user registration protocol | Deferred. v1 hard-codes Rupert + Claudia (§4.6). Future: how are new human users registered? Auto-detect from Slack assigned_to? Manual admin command? Re-registration after OTM restart? What about clients? |
| 10 | Slack archive API — can items be archived programmatically? | ✅ Resolved in v1.6: slackLists.items.archive does not exist. Archiving is manual-only via Slack UI. OTM sets status = archived but cannot trigger visual Slack archival. |
| 11 | Gateway idempotencyKey — does it prevent duplicate sessions? | ✅ Resolved in v1.8: No. The key is used as a runId label only. Duplicate calls with the same key = duplicate sessions. Dispatcher must use the Slack new → assigned status flip as the sole dedup mechanism. |
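The dedup rule from Q11 can be sketched as follows. Since the gateway's idempotencyKey does not prevent duplicate sessions, the dispatcher must treat the new → assigned flip as its only gate. Type and function names here are illustrative assumptions, not spec identifiers:

```typescript
// Hypothetical sketch of the Q11 dedup rule: the idempotencyKey does NOT
// dedup, so the dispatcher only launches a session for items still in
// "new", and flips them to "assigned" before launching.
type TaskStatus = "new" | "assigned" | "pending" | "completed" | "failed";

interface ListItem {
  taskId: string;
  status: TaskStatus;
}

// `flip` performs the new -> assigned write against the Slack List and
// returns false if the write lost a race. A second dispatcher pass sees
// "assigned" and skips the task, so no duplicate session is created.
function shouldDispatch(
  item: ListItem,
  flip: (taskId: string) => boolean
): boolean {
  if (item.status !== "new") return false;
  return flip(item.taskId);
}
```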
| Item | ID/Flag | Notes |
|---|---|---|
| Project (old select column) | Col0AL4UJ8BJ8 | Replaced by Project 2 text column (Col0ALZBS9C8Z) — 2026-03-14 |
| Types: implementation, research, etc. | — | Removed — only action / decision / review are active |
| --description parameter | — | Removed from otm-create-task.sh — use --subtask instead |
| --creator parameter | — | Removed — merged with --agent |
| --assignedTo parameter | — | Removed — merged with --agent |
| TT-09 (Rejected → New) | — | Removed in v1.3 — use TT-10 (cancel) + new task instead |
End of Specification — v1.8 — OpenClaw Task Manager
Last updated: 2026-03-14
3.2 Task fields
It might be useful to add a last-status field: of little interest to the user in normal operation, but valuable when the OTM fails a task and changes its status to failed. For failure analysis, we need to know what status the task was in before the failure.
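The idea above can be sketched as follows. Field and function names are assumptions for illustration; the actual column name and failure path are not yet specified:

```typescript
// Sketch of the proposed last-status idea: before the OTM forces a task
// to "failed", record the status it previously held so failure analysis
// can see where in the lifecycle the task broke. Names are illustrative.
interface TaskRecord {
  id: string;
  status: string;
  lastStatus: string | null; // status held before the OTM-forced change
}

function markFailed(task: TaskRecord): TaskRecord {
  return { ...task, lastStatus: task.status, status: "failed" };
}
```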
Throughout the document, wherever it says "Claudia" it should say "the Orchestrator".
Transition TT-12: the Watchdog tells the OTM to change the state.
3.3: ALL transitions requested by Claudia/the Orchestrator must go through the OTM. The OTM's state-transition manager is the only component in the system with the ability to change a task's status and related fields.
We also need a state machine and state-transition table for agents, which can be idle, busy, et cetera. So we have two state diagrams, one for tasks and one for agents, with two sets of transition actions and two sets of triggering events.
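A minimal sketch of what the requested agent state machine could look like. The states, event names, and the "AT-" transition prefix are all assumptions, not yet part of the spec:

```typescript
// Illustrative sketch of a separate agent state machine alongside the
// task one. States, events, and the "AT-" prefix are assumptions.
type AgentState = "idle" | "busy" | "unresponsive";

// Transition table: "currentState:event" -> next state.
const AGENT_TRANSITIONS: Record<string, AgentState> = {
  "idle:assign": "busy",            // AT-01: Orchestrator assigns a task
  "busy:complete": "idle",          // AT-02: agent reports completion
  "busy:timeout": "unresponsive",   // AT-03: SURE/Watchdog timeout
  "unresponsive:heartbeat": "idle", // AT-04: heartbeat received again
};

// Returns the next state, or null for an invalid transition.
function nextAgentState(state: AgentState, event: string): AgentState | null {
  return AGENT_TRANSITIONS[`${state}:${event}`] ?? null;
}
```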
Rename and renumber all of these with meaningful prefixes, update this document to v1.2, and come back to me for review.