@RupertBarrow
Last active March 14, 2026 16:35

OTM-SPEC v1.8 — OpenClaw Task Manager Specification

Based on: Rupert's OTM Spec v1.0 (2026-03-12)
Updated by: Claudia (2026-03-12–14)
Merged by: Claudia (2026-03-14) — consolidates OTM-SPEC v1.5 + OTM-FIELD-MAPPING v1.8


Changelog

Note: v1.0–v1.5 changes were tracked in OTM-SPEC. v1.0–v1.8 field mapping changes were tracked in a separate OTM-FIELD-MAPPING document. Both changelogs are merged here.

| Version | Date | Source | Changes |
|---------|------|--------|---------|
| v1.0 | 2026-03-12 | OTM-SPEC | Initial specification: state machine, SURE protocol, SE-1/SW-1, OTM-0 through OTM-6, actor registry, audit trail |
| v1.0 | 2026-03-14 | OTM-FIELD-MAPPING | Initial field mapping: script params, JSON format, Slack field mapping |
| v1.1 | 2026-03-12 | OTM-SPEC | Added rework flow (TT-08), Rejected state, Orchestrator manages subtasks |
| v1.1 | 2026-03-14 | OTM-FIELD-MAPPING | Added subtask support, task ID system (T-NNNNN) |
| v1.2 | 2026-03-12 | OTM-SPEC | Added Watchdog (OTM-6), archival (TT-12), Pending state (TT-03/TT-05) |
| v1.2 | 2026-03-14 | OTM-FIELD-MAPPING | Added pipeline flow diagram, directories |
| v1.3 | 2026-03-12 | OTM-SPEC | Removed TT-09, added Priority Scale, human actor type. Resolved Q5: agents call the OTM API for subtask completion (not Slack directly). |
| v1.3 | 2026-03-14 | OTM-FIELD-MAPPING | Added completion detection (System 4) |
| v1.4 | 2026-03-12 | OTM-SPEC | Added Error Monitor (OTM-7), error catalogue (ERR-01 through ERR-12), dual-DB design |
| v1.4 | 2026-03-14 | OTM-FIELD-MAPPING | Added component heartbeats (System 5) |
| v1.5 | 2026-03-12 | OTM-SPEC | Human notifications via Slack task conversation (not DM). Error reports daily, threshold=1. Error monitoring in separate otm-errors.db. Only the Orchestrator creates/deletes tasks. subtasks_remaining = count of unfinished subtasks. Added §12 Cost Analysis. SURE timeouts: 1+2 min. Gateway restart detection (§4.7). Log file mirroring. task-orchestration skill added as deliverable. |
| v1.5 | 2026-03-14 | OTM-FIELD-MAPPING | Default confirmation subtask; deprecated fields cleanup |
| v1.6 | 2026-03-14 | OTM-FIELD-MAPPING | Added DMZ relay architecture (System 6), todo_completed (Col00) documentation, project label fixes |
| v1.7 | 2026-03-14 | OTM-FIELD-MAPPING | Added task dispatcher (System 3), task updates (System 3b), full architecture diagram, completion metrics, status lifecycle |
| v1.8 | 2026-03-14 | OTM-FIELD-MAPPING | Dispatcher triggers agents via Gateway WebSocket RPC (parallel, non-blocking); crash-safe operation order (file → Slack → RPC); idempotencyKey is NOT idempotent (design flaw documented); todo_completed checkbox (Col00) documentation; injector idempotency (dedup check); Slack archive API limitation noted |
| v1.8 | 2026-03-14 | MERGED | Consolidated OTM-SPEC v1.5 + OTM-FIELD-MAPPING v1.8 into a single unified specification. Task ID format standardized to T-NNNNN (v1.8 implementation). Added Part 3 (File-Based Pipeline). All Slack column IDs included. |

1. Purpose & Scope

The OpenClaw Task Manager (OTM) orchestrates the execution of tasks by AI and human agents. It uses Slack Lists as the task board, the Slack Events API as the event bus for human-originated changes, and the OTM API as the interface for AI actors.

The system is split into three cooperating layers:

  1. Slack Event Layer (SE-1) — Slack Events API (list_item_updated) detected by our Slack app in Socket Mode. Zero intelligence. Forwards raw events to a single OTM entry point. Requires only the lists:read scope.
  2. Slack Write Layer (SW-1) — Handles all writes to Slack: task field updates and conversation feed audit entries. Requires lists:write scope. Called only by the OTM.
  3. OpenClaw Task Manager (OTM) — Authoritative backend implemented as an OpenClaw plugin pipeline. Owns all state, routing logic, agent registry, and business rules. Persists to SQLite. Sole component with the ability to change task status, agent status, and counter fields.

📌 Design principles:

  • ALL behaviour lives in the OTM. No business logic in SE-1 or SW-1.
  • The OTM is the sole writer of task status and related fields. No actor writes status directly.
  • Only the Orchestrator creates tasks and subtasks. The OTM never creates or deletes tasks — it manages their lifecycle after creation. The Orchestrator also manages subtask lists (creates, deletes, keeps completed) during rework flows; the OTM processes the resulting state changes.
  • ALL events are logged in the Slack task conversation feed with timestamps (§7).
  • Agent notifications use the SURE protocol: request + mandatory acknowledgement (§6).

1.1 Actors

| Actor | Current holder | Type | Role |
|-------|----------------|------|------|
| Orchestrator | Claudia | AI | Creates tasks, sets assignments, validates completed work — always via the OTM API |
| Agent | Devdas, Salvatore, etc. | AI | Executes tasks, reports progress directly to the OTM API |
| Human | Rupert, clients | Human | Creates/edits tasks in the Slack UI; changes detected by SE-1 and forwarded to the OTM |
| OTM | (system) | System | Authoritative state machine, sole writer of all status and counter fields |
| Watchdog | OTM-6 cron | System | Recovery cron — detects anomalies, requests the OTM to execute corrective transitions |

📌 Humans are identified by their Slack user ID. AI agents are identified by both their Slack user ID and their OpenClaw agent ID. Both types are managed in the same agent registry (§4.5).


1.2 System Architecture Overview

The OTM is composed of six cooperating systems (plus the 3b update path), from task creation to dashboard visibility:

┌─────────────────────────────────────────────────────────────────────────────────┐
│                         OTM — OpenClaw Task Manager                            │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 1: Task Creation                                                  │   │
│  │                                                                          │   │
│  │  Claudia (orchestrator)                                                  │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-create-task.sh ──► JSON file ──► ~/…/otm/new-tasks/                │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  next-task-id.json (T-NNNNN counter, flock)                             │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼                                                                        │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 2: Task Injection                                                 │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-watcher (WatchPaths)                                   │   │
│  │  ai.openclaw.otm-sweeper (every 10 min)                                 │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-injector.js ──► Slack Lists API ──► Rapido Task Campaign           │   │
│  │       │                (slackLists.items.create + subtasks)              │   │
│  │       ▼                                                                  │   │
│  │  processed/ or failed/                                                   │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (task exists in Slack with status=new)                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 3: Task Dispatcher                                                │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-dispatcher (every 2 min)                               │   │
│  │       │                                                                  │   │
│  │       ├──► Scan: status=new + assignee set                              │   │
│  │       ├──► Write task-dispatch.json to agent workspace                  │   │
│  │       ├──► Trigger agent via Gateway WS RPC (parallel)                  │   │
│  │       └──► Update Slack: new → assigned + set assigned_at               │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (agent session starts, picks up task, works, reports progress)        │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 3b: Task Updates (agent → Slack feedback)                         │   │
│  │                                                                          │   │
│  │  otm-update-task.sh ──► JSON ──► ~/…/otm/task-updates/                  │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-injector.js ──► Slack Lists API (update status, subtask done)      │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (all subtasks done → auto-promote)                                    │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 4: Completion Detection                                           │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-completion-detector (every 2 min)                      │   │
│  │       │                                                                  │   │
│  │       ├──► Scan: status=in_progress + all subtasks done                 │   │
│  │       ├──► Update: completion % + subtasks_remaining                    │   │
│  │       └──► Promote: in_progress → agent_done                           │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (Claudia validates → done)                                            │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 5: Component Heartbeats                                           │   │
│  │                                                                          │   │
│  │  Each component writes *-state.json after every run                     │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  Collector (FSEvents) → SQLite → Reader (WebSocket) → Dashboard         │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 6: DMZ Relay                                                      │   │
│  │                                                                          │   │
│  │  Collector ──HTTP POST──► Synology Receiver (127.0.0.1:3456)            │   │
│  │                                 │                                        │   │
│  │                                 ▼                                        │   │
│  │                           fab-state.json                                 │   │
│  │                                 │                                        │   │
│  │                                 ▼                                        │   │
│  │                      Broadcaster (0.0.0.0:3457)                         │   │
│  │                           │            │                                 │   │
│  │                    WSS /ws        GET /api/state                         │   │
│  │                           │            │                                 │   │
│  │                           ▼            ▼                                 │   │
│  │                    Vercel Dashboard (browser)                            │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ Task Lifecycle                                                           │   │
│  │                                                                          │   │
│  │  new ──► assigned ──► in_progress ──► agent_done ──► done               │   │
│  │   │         │              │              │                              │   │
│  │   │  (dispatcher)  (agent starts)  (completion     (Claudia            │   │
│  │   │                                  detector)       validates)          │   │
│  │   │                                                                      │   │
│  │   └──► blocked (can happen at any stage)                                │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────────┘
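The dispatcher's step order matters for crash safety: per the v1.8 changelog, the dispatch file is written before Slack is updated and before the Gateway RPC fires, so a crash can never trigger an agent that has no task file on disk. A minimal sketch of one System 3 pass (the step functions are synchronous stand-ins for the real asynchronous operations, not otm-dispatcher internals):

```javascript
// Illustrative sketch of one System 3 dispatcher pass, preserving the
// crash-safe order noted in the v1.8 changelog: file → Slack → RPC.
const steps = [];
const writeDispatchFile = (t) => steps.push(`file:${t.taskId}`);
const updateSlackStatus = (t) => steps.push(`slack:${t.taskId}:assigned`);
const triggerGatewayRpc = (t) => steps.push(`rpc:${t.taskId}`);

function dispatchPass(tasks) {
  // Scan: status=new with an assignee set.
  for (const t of tasks.filter((x) => x.status === "new" && x.assignee)) {
    writeDispatchFile(t);  // 1. task-dispatch.json survives a crash
    updateSlackStatus(t);  // 2. Slack: new → assigned (+ assigned_at)
    triggerGatewayRpc(t);  // 3. trigger the agent via Gateway WS RPC
  }
}
```

If the process dies between steps 1 and 3, the next pass still finds the task at status=new in Slack and re-runs it against an already-written file, which is why the injector-side dedup check matters.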

2. Task Data Model

Each task is a top-level item in the Slack Task Board List. Subtasks are child items (linked via parent_item_id). The OTM is the sole writer of status and counter fields.

Slack List: F0ALE8DCW1F (Rapido Task Campaign v2), workspace: rapidocloud.slack.com

2.1 Task Item Fields

| Field | Type | Writer | Description |
|-------|------|--------|-------------|
| title | text | Orchestrator | Task name |
| task_id | text | OTM | Unique ID (T-NNNNN format — 5 digits, zero-padded) |
| assigned_to | person | Orchestrator | Slack user ID of the assigned agent (AI or Human) |
| status | select | OTM | Current task state (see §3). OTM is sole writer — no exceptions |
| previous_status | select | OTM | Status before the last transition. Critical for failure analysis |
| priority | number | Orchestrator | 0=Critical … 4=Batchable (see §3.4) |
| context | select | Orchestrator | project, research, operations, support, internal |
| subtasks_remaining | number | OTM | Decremented counter, NOT a live count |
| assigned_at | datetime | OTM | When the agent started work |
| completed_at | datetime | OTM | When all subtasks are done |
| validated_at | datetime | OTM | When the Orchestrator validated |
| result_summary | text | Agent | Deliverables/output description |
| input_files | text | Orchestrator | Links to input resources |
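The T-NNNNN format is a fixed-width, zero-padded counter value. A one-line sketch of the formatting rule (the helper name is hypothetical, not from the implementation):

```javascript
// Hypothetical helper: render the next-task-id.json counter value
// as a T-NNNNN task ID (5 digits, zero-padded).
const formatTaskId = (n) => `T-${String(n).padStart(5, "0")}`;
```

For example, a counter value of 42 renders as "T-00042".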

2.2 Subtask Item Fields

| Field | Type | Writer | Description |
|-------|------|--------|-------------|
| title | text | Orchestrator | Subtask description |
| todo_completed | checkbox | OTM (via SW-1) | Built-in Slack Lists checkbox (Col00). Ticked when the subtask is done. Must be set alongside status when marking items as done (see below). |
| parent_item_id | reference | System | Links to the parent task |

📌 subtasks_remaining on the parent task is the canonical completion signal — not a live count.

📌 previous_status is set by the OTM on every transition. It enables post-mortem analysis when a task enters Failed state.

📌 Agents do NOT tick checkboxes directly in Slack. They report subtask completion to the OTM API (IE-02). The OTM then updates Slack via SW-1 — setting both the item status column and Col00 (checkbox).

todo_completed (Col00) Checkbox

The todo_completed field (Col00) is the built-in Slack Lists checkbox. It drives the visual checkmark ✅ in the Slack UI. Setting only the Status column to done does NOT check the box — both must be set explicitly.

| Scenario | Action |
|----------|--------|
| Subtask marked done | Set Col00: checkbox: true on the subtask |
| Parent task marked done | Set Col00: checkbox: true on the parent |
| Parent task at agent_done or in_progress | Do NOT set the checkbox (the task isn't finished yet) |

⚠️ Archive limitation: Slack Lists has a UI "Archive item" action, but slackLists.items.archive does not exist as an API method. Archiving is manual-only via the Slack UI. The OTM sets status to archived via the pipeline, but the actual Slack archive action cannot be automated.
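The two-field rule above can be made concrete. A hedged sketch of the cell updates SW-1 would send when marking an item done — the payload shape is an assumption modeled loosely on the injector's slackLists.items.create call, not a documented API contract:

```javascript
// Sketch: cells to set when marking an item done. Setting only the Status
// column does NOT tick the checkbox — Col00 must be set explicitly.
// Column IDs are the ones listed in §2.3; the payload shape is an assumption.
function doneCellUpdates() {
  return [
    { column_id: "Col0AL1B4UVLJ", select: ["done"] }, // Status column → done
    { column_id: "Col00", checkbox: true },           // built-in ✅ checkbox
  ];
}
```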

2.3 JSON → Slack Field Mapping

This table is operationally critical — it maps the JSON task format (used in the file pipeline) to Slack column IDs.

| JSON Field | Slack Column | Column ID | Slack Type | Notes |
|------------|--------------|-----------|------------|-------|
| title | Title | Col0AKKTBJJKZ | rich_text | Clean one-liner only |
| taskId | Task ID | Col0ALVK2NA1E | rich_text | Format: T-NNNNN (5 digits, zero-padded) |
| type | Type | Col0AKUV4BF6F | select | action \| decision \| review |
| agent | Assignee | Col0AKZ9G5UAJ | select | Covers both agents and humans |
| project | Project 2 | Col0ALZBS9C8Z | rich_text | Free-text (migrated from select 2026-03-14) |
| priority | Priority | Col0ALE8DKWPK | select | See §3.4 priority mapping |
| (auto) | Status | Col0AL1B4UVLJ | select | Always set to new on creation |
| subtasks[] | (child items) | parent_item_id | — | Each entry → child item with title + status new |

Slack Built-in Fields:

| Field | Column ID | Type | Notes |
|-------|-----------|------|-------|
| todo_completed | Col00 | checkbox | Built-in Slack Lists checkbox. Must be set explicitly alongside status. |

Fields NOT mapped to Slack columns (metadata only):

| JSON Field | Purpose |
|------------|---------|
| id | UUID for file tracking / idempotency |
| createdAt | Timestamp, implicit in Slack item creation |
| status | Internal pipeline status (pending → processed) |

Deprecated Slack Columns:

| Item | Column ID | Notes |
|------|-----------|-------|
| Project (old select column) | Col0AL4UJ8BJ8 | Replaced by the Project 2 text column (2026-03-14) |
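Putting the mapping together, a pipeline task file might look like this (all values are illustrative; field names follow the tables above — id, createdAt, and status are the metadata-only fields that never reach a Slack column):

```json
{
  "id": "2f6d0a3e-1c44-4a7b-9b0e-5d8c1e2f3a4b",
  "taskId": "T-00042",
  "title": "Refactor injector error handling",
  "type": "action",
  "agent": "Devdas",
  "project": "OTM",
  "priority": "normal",
  "status": "pending",
  "createdAt": "2026-03-14T10:00:00Z",
  "subtasks": [
    "Audit current error paths",
    "Route failures to failed/"
  ]
}
```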

3. Task State Machine

The status field follows these transitions. The OTM is the sole writer — all transitions are executed by the OTM, regardless of which actor requested them.

3.1 Task States

| Status | Description |
|--------|-------------|
| New | Task created, not yet assigned |
| Assigned | Orchestrator has set an assignee; OTM is evaluating agent availability |
| Pending | Agent is busy; task queued silently (no notification) |
| In Progress | Agent is actively working (SURE acknowledgement received) |
| Agent Done | All subtasks complete; awaiting Orchestrator review |
| Done | Orchestrator validated the work |
| Rejected | Orchestrator rejected the work; Orchestrator preparing rework subtasks |
| Failed | Unrecoverable error during execution |
| Cancelled | Task no longer needed; removed from active work |
| Archived | Terminal state; auto-moved 7 days after Done/Cancelled |

3.2 Task State Transition Diagram

                    Orchestrator creates task
                              |
                           [New]
                              |
              TT-01: Orch requests assignment → OTM executes
                              |
                         [Assigned]
                        /          \
                 TT-02: OTM      TT-03: OTM
              (IE-01 + agent   (IE-01 + agent
               idle in reg)    busy in reg)
                      |              |
               [In Progress]    [Pending]
               (after SURE ack)      |
                      |         TT-05: OTM promotes
         TT-04: OTM receives    (IE-09 + pending
         IE-02 subtask reports    task found)
                      |              |
           subtasks_remaining=0      |
                      |              |
                [Agent Done] <------/
                   /    \
          TT-06: Orch  TT-07: Orch
          validates     rejects
          (IE-04)       (IE-05)
                |           |
             [Done]    [Rejected]
               |        /      \
         TT-12: OTM  TT-08    TT-10
         (IE-08+7d)  (IE-06)  (IE-06)
               |       |        |
          [Archived] [Assigned] [Cancelled]
                     (rework)    (drop)

     At any point before Agent Done:
         TT-11: Orch/Human requests cancel (IE-06/IE-01) → OTM executes
         TT-12: Watchdog requests archive (IE-08 + 7d check) → OTM executes
         TT-13: OTM detects error (IE-10) → [Failed]
         TT-14: Orch requests retry (IE-07) → [New]
         TT-15: Orch/Human requests cancel (IE-06/IE-01) → [Cancelled] (from Failed)

3.2.1 Task Transition Action Index

Each task transition is coded TT-xx. All transitions are executed by the OTM. The "Requesting Actor" is who initiates; the OTM validates and applies.

Orchestrator-requested transitions (executed by OTM):

| Code | From → To | Requesting Actor | Inbound Event | OTM Action | Outbound Event |
|------|-----------|------------------|---------------|------------|----------------|
| TT-01 | New → Assigned | Orchestrator | IE-01: assigned_to field changed | Validate assignment, set status | OE-06, OE-07 |
| TT-06 | Agent Done → Done | Orchestrator | IE-04: validate API call | Set validated_at, change status | OE-06, OE-07, OE-02 |
| TT-07 | Agent Done → Rejected | Orchestrator | IE-05: reject API call | Change status, post reason | OE-06, OE-07 |
| TT-08 | Rejected → Assigned | Orchestrator | IE-06: rework API call (subtasks already prepared by Orchestrator) | Count unfinished subtasks, set counter, change status | OE-06, OE-07, OE-01 |
| TT-10 | Rejected → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Change status | OE-06, OE-07 |
| TT-11 | Any (pre-Agent Done) → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Free agent, change status | OE-06, OE-07, OE-04 |
| TT-14 | Failed → New | Orchestrator | IE-07: retry API call | Reset task, change status | OE-06, OE-07 |
| TT-15 | Failed → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Change status | OE-06, OE-07 |

OTM-initiated transitions (automated):

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
|------|-----------|-------------------------|------------|----------------|
| TT-02 | Assigned → In Progress | IE-01: status changed to Assigned (OTM-0 routes to OTM-2, agent registry query returns Idle) | Execute AT-01, write counter, send SURE notification | OE-01, OE-06, OE-07 |
| TT-03 | Assigned → Pending | IE-01: status changed to Assigned (OTM-0 routes to OTM-2, agent registry query returns Busy) | Queue task silently | OE-06, OE-07 |
| TT-04 | In Progress → Agent Done | IE-02: agent reports subtask done (OTM-3 decrements counter to 0) | Execute AT-02, set completed_at | OE-02, OE-06, OE-07 |
| TT-05 | Pending → Assigned | IE-09: OTM internal — check_next_task_for_agent() finds pending task after AT-02/AT-03/AT-04 | Re-evaluate via OTM-2 path | OE-06, OE-07 |
| TT-13 | In Progress → Failed | IE-10: OTM detects agent error (OpenClaw hook timeout >5 min / agent crash / unhandled exception reported) | Store previous_status, execute AT-04 | OE-05, OE-06, OE-07 |

Watchdog-requested transitions (executed by OTM):

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
|------|-----------|-------------------------|------------|----------------|
| TT-12 | Done/Cancelled → Archived | IE-08: watchdog cron tick (OTM-6 checks completed_at or cancelled_at + 7 days < now) | Archive task | OE-06, OE-07 |

📌 Cross-reference: See §5 for TT-xx ↔ AT-xx ↔ handler mapping. See §6 for SURE protocol. See §7 for audit trail.

3.3 Task Transition Rules

All transitions are submitted to the OTM which validates preconditions and executes the state change. Every transition produces at minimum OE-06 (Slack field update) and OE-07 (audit log entry).

| Code | From | To | Inbound Event | OTM Action | Outbound Event |
|------|------|-----|---------------|------------|----------------|
| TT-01 | New | Assigned | IE-01: assigned_to changed on New task | Validate agent exists in registry, set status | OE-06, OE-07 |
| TT-02 | Assigned | In Progress | IE-01: status=Assigned detected + agent Idle in registry | Set agent busy (AT-01), init counter, send SURE task notification | OE-01, OE-06, OE-07 |
| TT-03 | Assigned | Pending | IE-01: status=Assigned detected + agent Busy in registry | Queue task, no notification | OE-06, OE-07 |
| TT-04 | In Progress | Agent Done | IE-02: subtask completion report + counter decrements to 0 | Set agent idle (AT-02), set completed_at, notify Orchestrator | OE-02, OE-06, OE-07 |
| TT-05 | Pending | Assigned | IE-09: internal check_next_task_for_agent() + pending task found | Re-route to OTM-2 (same as TT-01 path) | OE-06, OE-07 |
| TT-06 | Agent Done | Done | IE-04: Orchestrator validate call | Set validated_at, change status | OE-02, OE-06, OE-07 |
| TT-07 | Agent Done | Rejected | IE-05: Orchestrator reject call with reason | Change status, log reason | OE-06, OE-07 |
| TT-08 | Rejected | Assigned | IE-06: Orchestrator rework call (subtasks already prepared) | Count unfinished subtasks, set counter, change status | OE-06, OE-07 |
| TT-10 | Rejected | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Change status | OE-06, OE-07 |
| TT-11 | New/Assigned/Pending/In Progress | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Free agent if applicable (AT-03), change status | OE-04, OE-06, OE-07 |
| TT-12 | Done/Cancelled | Archived | IE-08: watchdog tick + 7-day check passes | Archive task | OE-06, OE-07 |
| TT-13 | In Progress | Failed | IE-10: agent error detected (hook timeout/crash/exception) | Store previous_status, free agent (AT-04) | OE-05, OE-06, OE-07 |
| TT-14 | Failed | New | IE-07: Orchestrator retry call | Reset task fields, change status | OE-06, OE-07 |
| TT-15 | Failed | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Change status | OE-06, OE-07 |

📌 TT-09 (Rejected → New) removed in v1.3. Reassignment is handled by the Orchestrator cancelling the rejected task (TT-10) and creating a new task for a different agent. This simplifies the state machine.
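Because the OTM is the sole writer, every requested transition can be validated against the TT table before it is applied. A minimal sketch of that guard (an illustrative subset of the table; the map and function names are not from the implementation):

```javascript
// Validate and apply a requested transition. Only codes whose "from"
// precondition matches the current status execute; previous_status is
// recorded on every transition, as §2.1 requires.
const TRANSITIONS = {
  "TT-01": { from: ["New"], to: "Assigned" },
  "TT-02": { from: ["Assigned"], to: "In Progress" },
  "TT-04": { from: ["In Progress"], to: "Agent Done" },
  "TT-06": { from: ["Agent Done"], to: "Done" },
  "TT-11": { from: ["New", "Assigned", "Pending", "In Progress"], to: "Cancelled" },
};

function applyTransition(task, code) {
  const t = TRANSITIONS[code];
  if (!t || !t.from.includes(task.status)) {
    throw new Error(`${code} is not a valid transition from ${task.status}`);
  }
  return { ...task, previousStatus: task.status, status: t.to };
}
```

A request that violates a precondition (e.g. TT-06 on a task still In Progress) is rejected rather than applied, regardless of which actor submitted it.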

3.4 Priority Scale

| Value | Label | Meaning | --priority flag |
|-------|-------|---------|-----------------|
| 0 | Critical | Blocking other work, immediate attention | critical |
| 1 | High | Important, do next | high |
| 2 | Medium/Normal | Normal priority | normal (or medium as an alias) |
| 3 | Low | When bandwidth allows | low |
| 4 | Batchable | Large/expensive work, can run async via the Batch API | batchable |

Queue ordering: priority ASC, posted_at ASC (0 = highest priority, FIFO within same priority).
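The queue ordering rule, sketched as a comparator (field names are illustrative):

```javascript
// Order the pending queue: priority ASC (0 = Critical first),
// then posted_at ASC (FIFO within the same priority).
const queue = [
  { taskId: "T-00003", priority: 2, postedAt: 1770000200 },
  { taskId: "T-00001", priority: 0, postedAt: 1770000300 },
  { taskId: "T-00002", priority: 2, postedAt: 1770000100 },
];
queue.sort((a, b) => a.priority - b.priority || a.postedAt - b.postedAt);
// Resulting order: T-00001 (Critical), then T-00002 and T-00003 (FIFO).
```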


4. Agent State Machine

The OTM maintains agent availability state in the Agent Registry (OTM-1). Agent transitions are coded AT-xx and are distinct from task transitions (TT-xx).

4.1 Agent States

| Status | Description |
|--------|-------------|
| Idle | Agent is available, not working on any task |
| Busy | Agent is actively working on a task (current_task is set) |

4.2 Agent State Transition Diagram

              [Idle]
                |
         AT-01: OTM assigns task
         (triggered by TT-02)
         (SURE notification sent → OE-01)
                |
             [Busy]
                |
         AT-02: task completes (TT-04, IE-02 counter=0)
         AT-03: task cancelled (TT-11, IE-01/IE-06)
         AT-04: task fails (TT-13, IE-10)
                |
             [Idle]
                |
         → OTM calls check_next_task_for_agent() (IE-09)
         → if pending task found: TT-05 → AT-01 again

4.3 Agent Transition Action Index

All agent transitions are executed by the OTM. No actor changes agent status directly.

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
|------|-----------|-------------------------|------------|----------------|
| AT-01 | Idle → Busy | IE-01: Assigned event + agent Idle (during OTM-2) | Set status=busy, current_task=task_id, task_started_at=now | OE-01 (SURE notification), OE-07 (audit) |
| AT-02 | Busy → Idle | IE-02: subtask report + counter=0 (during OTM-3) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-07 (audit) |
| AT-03 | Busy → Idle | IE-01/IE-06: cancellation request (during OTM-5) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-04 (cancel notify), OE-07 (audit) |
| AT-04 | Busy → Idle | IE-10: agent error detected (during OTM error handler) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-05 (admin alert), OE-07 (audit) |

4.4 Agent Transition Rules

| Code | From | To | Inbound Event | OTM Action | Outbound Event |
|------|------|-----|---------------|------------|----------------|
| AT-01 | Idle | Busy | IE-01 (Assigned event) | OTM-2 sets the agent busy before sending the SURE notification | OE-01, OE-07 |
| AT-02 | Busy | Idle | IE-02 (last subtask report) | OTM-3 frees the agent, promotes the next pending task | OE-07 |
| AT-03 | Busy | Idle | IE-01/IE-06 (cancellation) | OTM-5 frees the agent if assigned to the cancelled task | OE-04, OE-07 |
| AT-04 | Busy | Idle | IE-10 (agent error) | OTM error handler frees the agent | OE-05, OE-07 |

📌 Agent ↔ Task coupling: Every AT-xx is triggered by a TT-xx. See §5 for the complete bidirectional mapping.

📌 Watchdog note: OTM-6 monitors agent heartbeats (last_seen >2h) but does NOT change agent state. It alerts the admin (OE-05). Only OTM handlers modify agent status.

4.5 Agent Registry Schema

CREATE TABLE agents (
  slack_user_id TEXT PRIMARY KEY,   -- Slack user ID (e.g., "U0AKEB27HNK")
  otm_display_name TEXT NOT NULL,   -- Display name for logs/UI (e.g., "Devdas")
  openclaw_agent_id TEXT,           -- OpenClaw agent ID (e.g., "devdas"). NULL for human agents.
  agent_type TEXT NOT NULL          -- 'ai' | 'human'
      DEFAULT 'ai',
  status TEXT DEFAULT 'idle',       -- 'idle' | 'busy'  (see §4.1)
  current_task TEXT,                -- task item ID or NULL
  task_started_at INTEGER,          -- Unix timestamp or NULL
  last_seen INTEGER                 -- Unix timestamp
);

Registry operations:

  • register_agent(slack_user_id, otm_display_name, openclaw_agent_id, agent_type) — at startup or on first activity
  • set_busy(slack_user_id, task_id) — AT-01
  • set_idle(slack_user_id) — AT-02, AT-03, AT-04
  • is_busy(slack_user_id) → boolean — checked during TT-02/TT-03
  • get_current_task(slack_user_id) → task_id | null
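A minimal in-memory sketch of these operations (the real registry persists to the SQLite table above; the camelCase names are illustrative, not the implementation's):

```javascript
// In-memory stand-in for the §4.5 agent registry operations.
const agents = new Map();

function registerAgent(slackUserId, displayName, openclawAgentId, agentType) {
  agents.set(slackUserId, {
    displayName, openclawAgentId, agentType,       // openclawAgentId is null for humans
    status: "idle", currentTask: null, taskStartedAt: null,
  });
}
function setBusy(slackUserId, taskId) {            // AT-01
  const a = agents.get(slackUserId);
  a.status = "busy"; a.currentTask = taskId; a.taskStartedAt = Date.now();
}
function setIdle(slackUserId) {                    // AT-02 / AT-03 / AT-04
  const a = agents.get(slackUserId);
  a.status = "idle"; a.currentTask = null; a.taskStartedAt = null;
}
const isBusy = (id) => agents.get(id)?.status === "busy";        // TT-02 / TT-03 check
const getCurrentTask = (id) => agents.get(id)?.currentTask ?? null;
```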

4.6 Startup Reconciliation (Dynamic Registry)

On OTM startup:

  1. Read OpenClaw config — Query openclaw.json agent configurations to populate AI agent entries automatically. Each configured OpenClaw agent that has a slack_user_id mapping is auto-registered with agent_type = 'ai'.
  2. Reconcile from Slack — Query Slack List for all tasks with status In Progress or Pending. Rebuild current_task and status (busy/idle) from those records.
  3. Human agents — Hard-coded for v1. Rupert is pre-seeded in the agent registry at startup with agent_type = 'human', openclaw_agent_id = NULL. Claudia is pre-seeded as the Orchestrator. Dynamic human registration (auto-detect from assigned_to on first interaction) is deferred to a future version.

📌 v1 hard-coded actors:

| Slack user ID | Display name | Type | Role |
|---------------|--------------|------|------|
| U06K407LVCY | Rupert | human | Task assignee / reviewer |
| U0AKEB27HNK | Claudia | ai | Orchestrator (sole) |

Future versions will define a proper human user registration and re-registration protocol (see Open Question 9).

📌 AI agents are notified via OpenClaw /hooks/agent. Human agents are notified via the Slack task conversation thread (posted by SW-1 as an OE-07 audit entry addressed to the human). The notification channel is determined by agent_type.

4.7 Gateway Restart Detection & OTM Resync

The OTM runs as an OpenClaw plugin pipeline inside the gateway process. Several restart scenarios must be handled:

Scenario A: Gateway restarts (OTM restarts with it)

  • OTM startup reconciliation (§4.6) runs automatically
  • All agent states rebuilt from Slack List + openclaw.json
  • SURE pending notifications checked: any outstanding acks >3 min old → ERR-06
  • System event logged: [timestamp] OTM SYSTEM: Gateway restart detected. Reconciliation complete.
  • Logged to both event_log (in error DB) and otm-events.log file

Scenario B: Gateway restarts but OTM was mid-processing

  • SQLite WAL mode ensures no data corruption on crash
  • On restart, OTM-7 (error monitor) runs within 60s and detects any inconsistencies:
    • ERR-02/ERR-03: Agent-task mismatches from interrupted transitions
    • ERR-08: Tasks stuck in Assigned from interrupted OTM-2
    • ERR-04: Orphaned Pending from interrupted promotions
  • All auto-correctable errors are fixed; others escalated

Scenario C: Gateway stops for extended period (>3 min)

  • SE-1 stops receiving Slack events during downtime
  • Agents cannot send IE-02/IE-03 reports (OpenClaw hooks are down)
  • On restart: reconciliation rebuilds state from Slack List (source of truth for task fields)
  • Pending SURE acks will have timed out → ERR-06 logged
  • Agents that were mid-task may have completed work but couldn't report it:
    • OTM-7 detects counter mismatches (ERR-05) on next cycle
    • Watchdog cross-checks subtask completion status in Slack vs subtasks_remaining

Orchestrator re-registration:

  • The Orchestrator does NOT need to re-register agents. The OTM rebuilds the registry from openclaw.json automatically on startup (§4.6).
  • If openclaw.json has changed (new agent added, agent removed), the reconciliation picks up the delta.

Gateway restart logging:

  • Every OTM startup logs a system event: IE-SYS-01: OTM startup with details including:
    • Agents reconciled (count + names)
    • Tasks found in active states (In Progress, Pending, Assigned)
    • SURE timeouts detected
    • Errors found and corrected during reconciliation
  • This event is logged to event_log, otm-events.log, AND posted to Slack #alerts channel (OE-05)

5. Cross-Reference Index

5.1 Task Transition → Agent Transition Mapping

| Task Transition | Triggers Agent Transition | Handler |
| --- | --- | --- |
| TT-02 (Assigned → In Progress) | AT-01 (Idle → Busy) | OTM-2 |
| TT-04 (In Progress → Agent Done) | AT-02 (Busy → Idle) | OTM-3 |
| TT-11 (→ Cancelled) | AT-03 (Busy → Idle) | OTM-5 |
| TT-13 (In Progress → Failed) | AT-04 (Busy → Idle) | OTM error |

5.2 Transition → Handler Mapping

| Transition(s) | Primary Handler | Description |
| --- | --- | --- |
| TT-01, TT-02, TT-03, TT-05 | OTM-2 | Task assignment, availability check, queue management |
| TT-04 | OTM-3 | Subtask completion, counter decrement, task completion |
| TT-06, TT-07, TT-08, TT-10 | OTM-4 | Validation, rejection, rework, cancel-after-reject |
| TT-11 | OTM-5 | Task cancellation |
| TT-12 | OTM-6 | Archival (watchdog requests, OTM executes) |
| TT-13, TT-14, TT-15 | OTM error handler | Failure detection, retry, abandon |
| AT-01 | OTM-2 | Agent set busy |
| AT-02 | OTM-3 | Agent freed on task completion |
| AT-03 | OTM-5 | Agent freed on task cancellation |
| AT-04 | OTM error handler | Agent freed on task failure |

5.3 Inbound Event Index

| Code | Source | Description | Triggered by |
| --- | --- | --- | --- |
| IE-01 | SE-1 | Raw `list_item_updated` event from Slack (Human field edit) | Human edits task in Slack UI |
| IE-02 | Agent | Subtask completion report via OTM API | Agent calls OTM after completing subtask |
| IE-03 | Agent | SURE acknowledgement via OTM API | Agent confirms receipt of task assignment |
| IE-04 | Orchestrator | Task validation request via OTM API | Orchestrator reviews and approves |
| IE-05 | Orchestrator | Task rejection request via OTM API (with reason) | Orchestrator reviews and rejects |
| IE-06 | Orchestrator | Task action request via OTM API (rework/cancel/retry) | Orchestrator requests state change |
| IE-07 | Orchestrator | Task retry request via OTM API (from Failed) | Orchestrator wants to retry failed task |
| IE-08 | Watchdog | Cron tick (every 60 seconds) | Timer fires |
| IE-09 | OTM internal | `check_next_task_for_agent()` result | Triggered after AT-02/AT-03/AT-04 |
| IE-10 | OTM internal | Agent error detection (hook timeout >5min, crash, exception) | OpenClaw health monitoring |

5.4 Outbound Event Index

| Code | Target | Description | Via |
| --- | --- | --- | --- |
| OE-01 | Agent | SURE task notification (requires acknowledgement IE-03) | OpenClaw hooks (AI) / Slack task conversation (Human) |
| OE-02 | Orchestrator | Task completion/validation notification | OpenClaw hooks |
| OE-03 | — | (reserved) | — |
| OE-04 | Agent | Task cancellation notification | OpenClaw hooks (AI) / Slack task conversation (Human) |
| OE-05 | Admin | Alert (anomaly, error, stale task, agent down) | Telegram / Slack #alerts |
| OE-06 | Slack | Task field update (status, counters, timestamps) | SW-1 |
| OE-07 | Slack | Audit log entry in task conversation feed | SW-1 |

5.5 All Code Summaries

  • Task transitions (TT-xx): TT-01 through TT-15 (TT-09 removed in v1.3) — see §3.2.1 and §3.3
  • Agent transitions (AT-xx): AT-01 through AT-04 — see §4.3 and §4.4
  • Inbound events (IE-xx): IE-01 through IE-10 — see §5.3
  • Outbound events (OE-xx): OE-01 through OE-07 — see §5.4

6. SURE Protocol (Send-Understand-Report-Execute)

All task notifications to agents use the SURE protocol to guarantee delivery and acknowledgement.

6.1 Flow

OTM sends task assignment → OE-01
  |
  +-- OTM logs in task conversation (OE-07):
  |     "[2026-03-12 14:30:05] OTM → Agent(Devdas): Task assigned — <title>"
  |
  +-- Agent receives notification
  |
  +-- Agent sends acknowledgement → IE-03
  |
  +-- OTM logs in task conversation (OE-07):
        "[2026-03-12 14:30:12] Agent(Devdas) → OTM: Task acknowledged"

6.2 Timeout

  • Retry 1: If no IE-03 within 1 minute → OTM re-sends the notification (OE-01) and logs the retry.
  • Retry 2: If no IE-03 within 2 more minutes (3 min total) → OTM re-sends again and logs the retry.
  • Error: If no IE-03 after retry 2 → OTM logs an ERR-06 error and alerts the admin (OE-05). The task remains In Progress — the admin decides.

📌 The 1+2 minute schedule is designed to allow time for a gateway restart (~2 min typical). If an acknowledgement is lost because the gateway restarted mid-notification, the OTM resync procedure (§4.7) handles recovery.
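The escalation schedule above can be sketched as a function of elapsed time since the OE-01 send. This is a simplification: the real OTM tracks per-task retry counts rather than deriving the action from wall-clock time, and the function name is illustrative.

```shell
# Map seconds since OE-01 was sent to the SURE escalation action (sketch only)
sure_action() {
  elapsed=$1
  if [ "$elapsed" -lt 60 ]; then
    echo "wait"       # inside the first minute: no action yet
  elif [ "$elapsed" -lt 180 ]; then
    echo "retry"      # retry window: resend OE-01 and log the retry
  else
    echo "err-06"     # 3 min total elapsed: log ERR-06, alert admin (OE-05)
  fi
}

sure_action 90    # → retry
```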

6.3 SURE applies to

| Event | SURE required? |
| --- | --- |
| Task assignment (OE-01) | ✅ Yes |
| Rework task (OE-01 after TT-08) | ✅ Yes |
| Task cancellation (OE-04) | ❌ No (fire-and-forget, agent stops) |
| Admin alert (OE-05) | ❌ No |

6.4 Agent acknowledgement API

POST /api/otm/ack
{
  "task_id": "<task item ID>",
  "agent_id": "<slack_user_id>",
  "type": "task_assigned"
}
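A minimal acknowledgement call might look like the following; the gateway host/port and the concrete task/agent IDs are placeholders, not values defined by the spec.

```shell
# Build the IE-03 acknowledgement payload (IDs and endpoint are illustrative)
task_id="T-00042"
agent_id="U06K407LVCY"
payload=$(printf '{"task_id":"%s","agent_id":"%s","type":"task_assigned"}' \
  "$task_id" "$agent_id")
echo "$payload"

# Send it (commented out: requires a running OTM gateway):
# curl -s -X POST http://127.0.0.1:18789/api/otm/ack \
#   -H "Content-Type: application/json" -d "$payload"
```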

7. Audit Trail

Every event processed by the OTM is logged in the Slack task conversation feed via SW-1. This provides a human-readable, timestamped record of all activity on each task.

7.1 What is logged

| Event type | Log format |
| --- | --- |
| State change by OTM | `[timestamp] OTM: Status changed <previous> → <new> (TT-xx)` |
| Orchestrator request received | `[timestamp] Orchestrator → OTM: <action> requested (IE-xx)` |
| Human change detected | `[timestamp] Human(<name>) change detected: <field> = <value> (IE-01)` |
| Agent notification sent | `[timestamp] OTM → Agent(<name>): <notification type> (OE-xx)` |
| Agent acknowledgement received | `[timestamp] Agent(<name>) → OTM: Acknowledged (IE-03)` |
| Agent subtask report received | `[timestamp] Agent(<name>) → OTM: Subtask done — <title> (IE-02). Remaining: <n>` |
| Watchdog action | `[timestamp] Watchdog: <check description> (IE-08)` |
| Error/alert | `[timestamp] OTM ERROR: <description> (IE-10)` |

7.2 Implementation

All audit entries are posted as replies in the Slack task's conversation thread via SW-1. This means:

  • Every task's conversation is a complete history of its lifecycle
  • No separate log table needed — Slack IS the audit log
  • Human-readable without any tooling
  • Searchable via Slack search

PART 1 — SLACK INTEGRATION LAYER

SE-1: Slack Event Listener

| Detail | |
| --- | --- |
| Actors | Slack Events API (source) |
| Inbound events | IE-01: any `list_item_updated` event on the Task Board List |
| Actions | Forward raw event payload to OTM-0 entry point |
| Outbound events | None — SE-1 has zero intelligence |
| Transitions | None — SE-1 is a passthrough |

The Slack app (Salvatore's app, socket mode) subscribes to list_item_updated events.

SE-1 does exactly one thing:

ON list_item_updated:
  → call otm_handle_event(raw_event_payload)

No routing. No field inspection. No filtering. The OTM decides what to do with the event.

Required Slack App scope for SE-1: lists:read only.

SW-1: Slack Writer

| Detail | |
| --- | --- |
| Actors | OTM (sole caller) |
| Inbound events | OTM handler calls |
| Actions | Write task fields to Slack List, post audit entries to task conversation |
| Outbound events | OE-06: Slack field update, OE-07: audit log entry |

SW-1 is the sole component that writes to Slack. It provides two operations:

  1. sw1_update_fields(task_id, fields) — Updates task item fields (status, counters, timestamps). Produces OE-06.
  2. sw1_post_audit(task_id, message) — Posts a timestamped message to the task's conversation thread. Produces OE-07.

Required Slack App scope for SW-1: lists:write.

📌 SE-1 (lists:read) and SW-1 (lists:write) are separate concerns. They may run in the same Slack app but are logically distinct.

📌 SW-1 does NOT create or delete tasks. Only the Orchestrator creates tasks and subtasks (via Slack API or UI). SW-1's write scope is limited to: updating existing task fields (OE-06) and posting audit entries to task conversations (OE-07). During rework, the Orchestrator manages subtask creation/deletion directly; the OTM then processes the state change via SW-1.


PART 2 — OPENCLAW TASK MANAGER (OTM)

Implemented as an OpenClaw plugin pipeline set. Persists to SQLite. Sole component that writes task status, agent status, and counter fields.

OTM-0: Event Router

| Detail | |
| --- | --- |
| Actors | OTM (internal) |
| Inbound events | IE-01: raw `list_item_updated` from SE-1 |
| Actions | ACT-R1: Parse event payload and identify change type<br>ACT-R2: Route to appropriate handler |
| Outbound events | None directly — delegates to handlers |

Single entry point for all Slack-originated events. Contains the routing logic that was previously in SE-1.

RECEIVE raw_event_payload from SE-1
  |
  +-- Parse: what field(s) changed?
  |
  +-- IF assigned_to changed AND status = "New":
  |       → route to OTM-2 (task assignment)
  |
  +-- IF status changed to "Assigned" (from Pending promotion or rework):
  |       → route to OTM-2 (task re-assignment)
  |
  +-- IF status changed to "Cancelled" by Human:
  |       → route to OTM-5 (cancellation)
  |
  +-- IF other field changed by Human:
  |       → log via sw1_post_audit (OE-07): "Human(<name>) changed <field>"
  |       → no state transition
  |
  +-- ELSE: ignore

📌 All routing intelligence lives here, not in SE-1. SE-1 is a dumb pipe.

OTM-1: Agent Registry

| Detail | |
| --- | --- |
| Actors | OTM (owner/writer) |
| Inbound events | IE-08: startup reconciliation<br>IE-01: new agent detected (auto-register) |
| Actions | ACT-A1: Register new agent (from openclaw.json or first interaction)<br>ACT-A2: Set agent busy (AT-01)<br>ACT-A3: Set agent idle (AT-02, AT-03, AT-04)<br>ACT-A4: Reconcile from Slack List on startup<br>ACT-A5: Reconcile from openclaw.json on startup |
| Outbound events | OE-07: audit log for registration events |
| Transitions | AT-01, AT-02, AT-03, AT-04 |

See §4.5 for schema and §4.6 for startup reconciliation.

OTM-2: Handle Task Assigned

| Detail | |
| --- | --- |
| Actors | OTM (executor), Agent (notified if idle) |
| Inbound events | IE-01: assigned_to changed or status=Assigned (from OTM-0) |
| Actions | ACT-T1: Count subtasks via Slack API<br>ACT-T2: Check agent availability in registry<br>ACT-T3: Set task status (via SW-1)<br>ACT-T4: Set agent busy (AT-01 via OTM-1)<br>ACT-T5: Send SURE notification (OE-01) |
| Outbound events | OE-01: SURE task notification (if agent idle)<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-02 (→ In Progress) or TT-03 (→ Pending) |
| Agent transitions | AT-01 (Idle → Busy) if agent available |

RECEIVE task assignment event (from OTM-0)
  |
  +-- Read task fields: task_id, title, assigned_to, priority
  +-- ACT-T1: Count child items (subtasks) via Slack API → subtask_count
  +-- Store subtask_count as initial subtasks_remaining
  |
  +-- ACT-T2: Look up assigned_to agent in registry
  |
  +-- IF agent NOT found AND agent_type detectable:
  |       ACT-A1: Auto-register agent
  |       sw1_post_audit: "New agent registered: <name>"
  |
  +-- IF agent NOT found AND not detectable:
  |       OE-05: Alert admin
  |       sw1_post_audit: "ERROR: Unknown agent <id>"
  |       Task stays in current status
  |
  +-- IF agent is IDLE:
  |       Store previous_status
  |       ACT-T4: Execute AT-01 (agent → busy)
  |       sw1_update_fields: subtasks_remaining, status = "In Progress", assigned_at = now
  |       sw1_post_audit: "Status: Assigned → In Progress (TT-02). Agent: <name>"
  |       ACT-T5: Send SURE notification (OE-01)
  |       sw1_post_audit: "OTM → Agent(<name>): Task assigned (OE-01). Awaiting SURE ack."
  |
  +-- IF agent is BUSY:
          Store previous_status
          sw1_update_fields: status = "Pending"
          sw1_post_audit: "Status: Assigned → Pending (TT-03). Agent <name> busy with <current_task>"

OTM-3: Handle Subtask Done

| Detail | |
| --- | --- |
| Actors | Agent (reports completion), OTM (processes), Orchestrator (notified on task completion) |
| Inbound events | IE-02: agent subtask completion report via OTM API |
| Actions | ACT-S1: Validate subtask belongs to agent's current task<br>ACT-S2: Decrement counter<br>ACT-S3: Update Slack subtask checkbox (via SW-1) — sets both status AND Col00<br>ACT-S4: Complete task if counter = 0<br>ACT-S5: Free agent (AT-02 via OTM-1)<br>ACT-S6: Promote next pending task (IE-09) |
| Outbound events | OE-02: Orchestrator notification (on task complete)<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-04 (→ Agent Done when counter hits 0) |
| Agent transitions | AT-02 (Busy → Idle) when task completes |

RECEIVE subtask completion report (IE-02)
  {task_id, subtask_id, agent_id}
  |
  +-- ACT-S1: Validate:
  |     - subtask belongs to task
  |     - agent is assigned to task
  |     - subtask not already completed (idempotency: check todo_completed field / Col00)
  |       IF already completed: discard, return OK
  |
  +-- ACT-S3: sw1_update_fields: subtask.todo_completed = true (Col00), subtask.status = done
  +-- ACT-S2: Decrement task.subtasks_remaining by 1
  +-- sw1_update_fields: subtasks_remaining
  +-- sw1_post_audit: "Agent(<name>): Subtask done — <title> (IE-02). Remaining: <n>"
  |
  +-- IF subtasks_remaining > 0:
  |       Done. Await next report.
  |
  +-- IF subtasks_remaining = 0 (TASK COMPLETE):
          Store previous_status = "In Progress"
          sw1_update_fields: status = "Agent Done", completed_at = now
          sw1_post_audit: "Status: In Progress → Agent Done (TT-04). All subtasks complete."
          ACT-S5: Execute AT-02 (agent → idle)
          ACT-S6: Notify Orchestrator (OE-02):
            POST /hooks/agent {
              agentId: "main",
              message: "Task ready for review: <title>\nAgent: <name>\nElapsed: <time>\nResult: <result_summary>\nLink: <slack_link>"
            }
          sw1_post_audit: "OTM → Orchestrator: Task ready for review (OE-02)"
          CALL: check_next_task_for_agent(agent_id)  → IE-09

check_next_task_for_agent(agent_id):

Query Slack List for tasks WHERE:
  assigned_to = agent_id
  AND status = "Pending"
  ORDER BY priority ASC, posted_at ASC
  LIMIT 1
  |
  +-- IF Pending task found:
  |       sw1_update_fields: pending_task.status = "Assigned"
  |       sw1_post_audit on pending task: "Status: Pending → Assigned (TT-05). Agent now available."
  |       (OTM-0 detects Assigned change → OTM-2 fires)
  |
  +-- IF no Pending task:
          Agent remains idle.

📌 Idempotency is handled by checking todo_completed (Col00) on the subtask before processing. No separate processed_events table needed. The Slack conversation feed (OE-07) serves as the complete audit trail.

📌 Agent reports directly to OTM API (IE-02), not by ticking checkboxes in Slack. The OTM then updates Slack via SW-1 (ACT-S3). This ensures all writes go through the OTM.
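The promotion query in check_next_task_for_agent can be sketched in pure shell over a flattened task dump. The field layout (task_id|assigned_to|status|priority|posted_at), the sample rows, and the numeric priority encoding (lower = more urgent) are illustrative; the real query runs against the Slack List.

```shell
# Illustrative task dump: task_id|assigned_to|status|priority|posted_at
tasks='T-00020|devdas|Pending|2|2026-03-12T10:00:00Z
T-00021|devdas|Pending|1|2026-03-12T11:00:00Z
T-00022|devdas|Done|1|2026-03-12T09:00:00Z'

next_pending_for() {            # ORDER BY priority ASC, posted_at ASC LIMIT 1
  printf '%s\n' "$tasks" \
    | awk -F'|' -v a="$1" '$2 == a && $3 == "Pending"' \
    | sort -t'|' -k4,4n -k5,5 \
    | head -n1 | cut -d'|' -f1
}

next_pending_for devdas         # → T-00021 (priority 1 beats earlier posted_at)
```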

OTM-4: Task Validate / Reject (Orchestrator API)

| Detail | |
| --- | --- |
| Actors | Orchestrator (caller), OTM (executor) |
| Inbound events | IE-04: Orchestrator requests validation<br>IE-05: Orchestrator requests rejection (with reason)<br>IE-06: Orchestrator signals rework ready (subtasks already managed by Orchestrator) |
| Actions | ACT-V1: Verify task status = "Agent Done" or "Rejected"<br>ACT-V2: Execute validation (TT-06)<br>ACT-V3: Execute rejection (TT-07)<br>ACT-V4: Count unfinished subtasks, set counter, change status (TT-08) |
| Outbound events | OE-02: confirmation to Orchestrator<br>OE-06: Slack field update<br>OE-07: audit log entries |
| Task transitions | TT-06 (→ Done), TT-07 (→ Rejected), TT-08 (→ Assigned), TT-10 (→ Cancelled) |

Validate request (IE-04):

{
  "task_id": "<task item ID>",
  "outcome": "validated",
  "comment": "<optional>"
}

Processing — validated:

ACT-V1: Verify task.status = "Agent Done" (reject API call otherwise)
Store previous_status = "Agent Done"
sw1_post_audit: "Orchestrator → OTM: Validation requested (IE-04)"
ACT-V2: Execute TT-06
  sw1_update_fields: status = "Done", validated_at = now, todo_completed = true (Col00)
  sw1_post_audit: "Status: Agent Done → Done (TT-06). Validated."
  IF comment: sw1_post_audit: "Orchestrator comment: <comment>"
OE-02: Notify Orchestrator: "Task <title> is Done"

Reject request (IE-05):

{
  "task_id": "<task item ID>",
  "outcome": "rejected",
  "reason": "<mandatory explanation>"
}

Processing — rejected:

ACT-V1: Verify task.status = "Agent Done" (reject API call otherwise)
Store previous_status = "Agent Done"
sw1_post_audit: "Orchestrator → OTM: Rejection requested (IE-05). Reason: <reason>"
ACT-V3: Execute TT-07
  sw1_update_fields: status = "Rejected"
  sw1_post_audit: "Status: Agent Done → Rejected (TT-07)"

Rework request (IE-06) — submitted after Orchestrator has already prepared subtasks:

The Orchestrator handles all subtask management before calling the OTM:

  1. Orchestrator deletes unnecessary/obsolete subtasks from the Slack List
  2. Orchestrator leaves completed subtasks in place (as a record)
  3. Orchestrator creates new subtasks (first = "Acknowledge rework request: ")
  4. Orchestrator then notifies the OTM that rework is ready:
{
  "task_id": "<task item ID>",
  "action": "rework"
}

Processing — rework:

Verify task.status = "Rejected"
sw1_post_audit: "Orchestrator → OTM: Rework requested (IE-06)"
ACT-V4: Execute TT-08
  Count unfinished subtasks via Slack API (todo_completed = false / Col00 unchecked)
  Set subtasks_remaining = count of unfinished subtasks
  sw1_update_fields: status = "Assigned", subtasks_remaining
  sw1_post_audit: "Status: Rejected → Assigned (TT-08). Rework: <n> unfinished subtasks."
  → OTM-0 detects Assigned change → OTM-2 fires → SURE notification sent → agent acks

📌 Rework flow — separation of concerns: The Orchestrator owns subtask management (create, delete, keep). The OTM owns state management (status transitions, counter recalculation, agent assignment). The OTM never creates or deletes subtasks. On rework, it counts unfinished subtasks to set subtasks_remaining, then triggers the normal assignment flow. The first new subtask is always an acknowledgement subtask — the agent closes it to confirm they understood the rework instructions.

Cancel-after-reject request (IE-06):

{
  "task_id": "<task item ID>",
  "action": "cancel"
}

Verify task.status = "Rejected" → Execute TT-10 (→ Cancelled).

OTM-5: Handle Task Cancelled

| Detail | |
| --- | --- |
| Actors | Orchestrator/Human (requests), OTM (executes), Agent (notified if was working) |
| Inbound events | IE-01: Human status edit detected by OTM-0<br>IE-06: Orchestrator cancel API call |
| Actions | ACT-C1: Store previous_status<br>ACT-C2: Free agent if applicable (AT-03)<br>ACT-C3: Notify agent of cancellation (OE-04)<br>ACT-C4: Promote next pending task (IE-09) |
| Outbound events | OE-04: cancellation notification to agent<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-11 (→ Cancelled) |
| Agent transitions | AT-03 (Busy → Idle) if agent was working |

RECEIVE cancellation request (IE-01 or IE-06)
  |
  +-- ACT-C1: Store previous_status
  +-- sw1_post_audit: "Cancellation requested by <actor> (IE-xx)"
  +-- sw1_update_fields: status = "Cancelled"
  +-- sw1_post_audit: "Status: <previous> → Cancelled (TT-11)"
  |
  +-- IF task was In Progress or Assigned:
  |       ACT-C2: Execute AT-03 (agent → idle)
  |       ACT-C3: Send cancellation notification (OE-04)
  |       sw1_post_audit: "OTM → Agent(<name>): Task cancelled (OE-04)"
  |       ACT-C4: check_next_task_for_agent(agent_id) → IE-09
  |
  +-- IF task was Pending:
  |       sw1_post_audit: "Removed from queue (no agent notification)"
  |
  +-- IF task was New:
          sw1_post_audit: "Cancelled before assignment"

OTM-6: Watchdog

| Detail | |
| --- | --- |
| Actors | Watchdog cron (detector), OTM (executor), Admin (alerted) |
| Inbound events | IE-08: cron tick (every 60 seconds) |
| Actions | ACT-W1: Check for stale In Progress tasks (>24h, no subtask activity)<br>ACT-W2: Check for orphaned Pending tasks (agent idle but task pending)<br>ACT-W3: Check counter mismatches (subtasks_remaining vs actual)<br>ACT-W4: Check archival candidates (Done/Cancelled >7 days)<br>ACT-W5: Check agent heartbeats (last_seen >2h) |
| Outbound events | OE-05: admin alert (anomalies)<br>OE-06: Slack field update (archival)<br>OE-07: audit log entries |
| Task transitions | Requests TT-12 (→ Archived), requests TT-05 (orphaned Pending → Assigned) |

📌 The Watchdog does NOT write state directly. It calls OTM handler functions to execute transitions.

Checks:

ACT-W1: STALE IN-PROGRESS
  Query tasks In Progress for >24h with no subtask activity in conversation feed
  → OE-05: Alert admin. Do NOT auto-reassign.
  → sw1_post_audit: "Watchdog: Stale task detected (>24h, no activity)"

ACT-W2: ORPHANED PENDING
  Query tasks Pending WHERE assigned agent is Idle in registry
  → Request OTM to re-trigger: call OTM-2 (TT-05 → Assigned)
  → sw1_post_audit: "Watchdog: Orphaned pending — re-triggering assignment"

ACT-W3: COUNTER MISMATCH
  Compare subtasks_remaining with actual unchecked subtask count
  → Recalculate and fix counter via SW-1
  → OE-05: Alert admin
  → sw1_post_audit: "Watchdog: Counter mismatch corrected (<old> → <new>)"

ACT-W4: ARCHIVAL
  Query tasks Done or Cancelled WHERE completed_at/cancelled_at + 7 days < now
  → Request OTM to execute TT-12
  → sw1_update_fields: status = "Archived"
  → sw1_post_audit: "Status: <previous> → Archived (TT-12). Auto-archived after 7 days."

ACT-W5: AGENT HEARTBEAT
  Query agents WHERE last_seen > 2h ago
  → OE-05: Alert admin (agent may be down). No state change.
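The ACT-W4 selection reduces to a date comparison, and since ISO-8601 dates compare correctly as plain strings, a sketch needs no date library. The sample rows and the fixed "now" are illustrative; the real check runs against the active tasks table.

```shell
# Sketch of ACT-W4's archival candidate selection.
# Rows: task_id|status|terminal timestamp (validated_at or cancelled_at)
now="2026-03-14"
cutoff="2026-03-07"   # now minus 7 days, precomputed for the sketch

tasks='T-00010|Done|2026-03-05
T-00011|Done|2026-03-12
T-00012|Cancelled|2026-03-01'

# ISO-8601 dates compare correctly as strings, so awk needs no date math
candidates=$(printf '%s\n' "$tasks" | awk -F'|' -v c="$cutoff" \
  '($2 == "Done" || $2 == "Cancelled") && $3 < c { print $1 }')
echo "$candidates"    # T-00010 and T-00012 qualify for TT-12
```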

End-to-End Flow Diagrams

Flow 1a — Task Assigned, Agent Idle

Orchestrator sets assigned_to on New task
  |
[SE-1] receives list_item_updated → forwards raw event to OTM (IE-01)
  |
[OTM-0] parses: assigned_to changed → routes to OTM-2
  |
[OTM-2] checks registry: agent is IDLE
  |  sw1_update_fields: subtasks_remaining, status = "In Progress", assigned_at
  |  sw1_post_audit: "Status: New → Assigned → In Progress (TT-01, TT-02)"
  |  executes AT-01: agent → busy
  |  sends SURE notification (OE-01)
  |  sw1_post_audit: "OTM → Agent(<name>): Task assigned. Awaiting SURE ack."
  |
Agent receives notification
  |  sends acknowledgement (IE-03)
  |
[OTM] receives IE-03
  |  sw1_post_audit: "Agent(<name>): Task acknowledged (IE-03)"
  |
Agent starts work

Flow 1b — Task Assigned, Agent Busy

Orchestrator sets assigned_to on New task
  |
[SE-1] → IE-01 → [OTM-0] → routes to OTM-2
  |
[OTM-2] checks registry: agent is BUSY
  |  sw1_update_fields: status = "Pending"
  |  sw1_post_audit: "Status: Assigned → Pending (TT-03). Agent busy with <current_task>"
  |
Task waits silently. No agent notification.

Flow 2a — Subtask Done (not last)

Agent completes subtask → reports to OTM API (IE-02)
  |
[OTM-3] validates: subtask belongs to task, not already completed
  |  sw1_update_fields: subtask.todo_completed = true (Col00), subtask.status = done
  |  decrements subtasks_remaining (3 → 2)
  |  sw1_update_fields: subtasks_remaining
  |  sw1_post_audit: "Agent(<name>): Subtask done — <title>. Remaining: 2"
  |
Agent continues working.

Flow 2b — Last Subtask (task complete)

Agent completes final subtask → reports to OTM API (IE-02)
  |
[OTM-3] decrements (1 → 0)
  |  sw1_update_fields: status → "Agent Done", completed_at
  |  sw1_post_audit: "Status: In Progress → Agent Done (TT-04). All subtasks complete."
  |  executes AT-02: agent → idle
  |  notifies Orchestrator (OE-02)
  |  sw1_post_audit: "OTM → Orchestrator: Task ready for review"
  |  check_next_task_for_agent()
  |    → Pending task found? → TT-05 → OTM-0 → OTM-2 → Flow 1a
  |    → No pending? → agent stays idle

Flow 3a — Orchestrator Validates

Orchestrator tells OTM to validate task (IE-04)
  |
[OTM-4] verifies status = "Agent Done"
  |  sw1_post_audit: "Orchestrator: Validation requested"
  |  sw1_update_fields: status → "Done", validated_at, todo_completed = true (Col00)
  |  sw1_post_audit: "Status: Agent Done → Done (TT-06). Validated."
  |  notifies Orchestrator: confirmed (OE-02)

Flow 3b — Orchestrator Rejects and Requests Rework

Step 1: Orchestrator tells OTM to reject task (IE-05)
  |
[OTM-4] sw1_post_audit: "Orchestrator: Rejection. Reason: <reason>"
  |  sw1_update_fields: status → "Rejected"
  |  sw1_post_audit: "Status: Agent Done → Rejected (TT-07)"

Step 2: Orchestrator manages subtasks directly in Slack (NOT via OTM):
  |  - Deletes unnecessary/obsolete subtasks
  |  - Leaves completed subtasks in place (as record)
  |  - Creates new subtasks:
  |      1. "Acknowledge rework request: <detailed reason and instructions>"
  |      2. "Fix validation on email field"
  |      3. "Add unit tests for edge cases"

Step 3: Orchestrator tells OTM that rework is ready (IE-06)
  |
[OTM-4] receives rework signal:
  |  Counts unfinished subtasks via Slack API → 3
  |  sw1_update_fields: status → "Assigned", subtasks_remaining = 3
  |  sw1_post_audit: "Status: Rejected → Assigned (TT-08). Rework: 3 unfinished subtasks."

Step 4: Normal assignment flow (TT-02 → SURE → agent works)
  |
[OTM-0] detects Assigned → OTM-2 → agent idle → In Progress
  |  SURE notification posted to Slack task conversation (OE-01)
  |  Agent acknowledges (IE-03)
  |
Agent reads first subtask: "Acknowledge rework request: ..."
  |  Agent closes first subtask to acknowledge (IE-02)
  |  OTM-3 decrements: 3 → 2
  |  sw1_post_audit: "Agent(<name>): Acknowledged rework. Remaining: 2"
  |
Agent works through remaining subtasks normally

Flow 4 — Cancellation

Orchestrator tells OTM to cancel (IE-06) / Human changes status in Slack UI (IE-01)
  |
[OTM-0] routes to OTM-5
  |
[OTM-5] sw1_post_audit: "Cancellation requested by <actor>"
  |  sw1_update_fields: status → "Cancelled", cancelled_at = now
  |  executes AT-03: agent freed if was working
  |  sends cancellation notification (OE-04)
  |  sw1_post_audit: "Status: <previous> → Cancelled (TT-11)"
  |  check_next_task_for_agent() for freed agent

Flow 5 — Auto-Archival (TT-12)

[OTM-6 Watchdog] IE-08: cron tick fires (every 60s)
  |
  +-- ACT-W4: Query tasks WHERE:
  |     (status = "Done" AND validated_at + 7 days < now)
  |     OR (status = "Cancelled" AND cancelled_at + 7 days < now)
  |
  +-- FOR EACH archival candidate:
        Store previous_status
        sw1_update_fields: status → "Archived"
        sw1_post_audit: "Status: <previous> → Archived (TT-12). Auto-archived after 7 days."
        INSERT INTO task_history (snapshot of task fields)
        DELETE task from active tasks (SQLite only — Slack List item untouched)

📌 TT-12 is a watchdog-requested transition (IE-08 trigger). The watchdog detects the 7-day threshold; the OTM executes the archive. Archived tasks are snapshotted to task_history for long-term reporting before being removed from the active tasks table.

📌 Slack archive limitation: The OTM sets status = "Archived" in Slack via SW-1, but the actual "Archive item" action in the Slack UI cannot be triggered via API (slackLists.items.archive does not exist). Manual Slack UI archiving is required for items to visually disappear from the default Slack list view.


PART 3 — FILE-BASED PIPELINE

The file-based pipeline is the operational implementation of Systems 1–4. It bridges the Orchestrator's task-creation workflow to the Slack List and agent workspaces, using JSON files as the intermediary to avoid direct Slack API calls by agents.

System 1: Task Creation (otm-create-task.sh)

The Orchestrator (Claudia) creates tasks by running a shell script. This populates a JSON file and atomically increments the task ID counter.

Script Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| `--title` | | | Short, actionable one-liner |
| `--agent` | | | Who works on it: claudia, devdas, archibald, frederic, salvatore, sylvain, rupert |
| `--priority` | | normal | critical \| high \| normal \| medium \| low \| batchable |
| `--project` | | (none) | Free-text project name (e.g. prj-012, fab-state) |
| `--type` | | action | action \| decision \| review |
| `--subtask` | | "Confirm that task has been done" | Repeatable. Each value becomes a Slack subtask. If none provided, a default confirmation subtask is auto-added (required for completion detection) |

Examples

# Simple action task
otm-create-task.sh --title "Fix login bug" --agent devdas --priority high --project prj-012

# Decision for Rupert
otm-create-task.sh --title "Approve budget for Q2" --agent rupert --type decision

# Task with subtasks
otm-create-task.sh --title "Build login page" --agent devdas --project prj-012 \
  --subtask "Create login form component" \
  --subtask "Add validation logic" \
  --subtask "Write unit tests"

# Input/acknowledgement pattern (replaces --description)
otm-create-task.sh --title "Review API design" --agent frederic --type review \
  --subtask "input: the API spec is at docs/api-v2.md" \
  --subtask "Check endpoint naming conventions" \
  --subtask "Validate error response format"

# Batchable priority (can wait for batch processing)
otm-create-task.sh --title "Update documentation" --agent archibald --priority batchable

JSON File Format (intermediate)

Written to ~/Library/Application Support/OpenClaw/otm/new-tasks/<timestamp>-<uuid>.json

{
  "id": "uuid",
  "taskId": "T-00035",
  "title": "Short task title",
  "agent": "devdas",
  "createdAt": "2026-03-14T08:20:19Z",
  "priority": "normal",
  "project": "prj-012",
  "subtasks": ["Subtask 1", "Subtask 2"],
  "type": "action",
  "status": "pending"
}

Task ID System

  • Format: T-NNNNN (5 digits, zero-padded)
  • Counter file: ~/Library/Application Support/OpenClaw/otm/next-task-id.json
  • Atomic increment with file locking (flock)
  • Auto-assigned by otm-create-task.sh — agents don't manage IDs manually
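The atomic ID allocation can be sketched with flock. The real counter lives in next-task-id.json; a bare integer file and the seed value 35 are used here for brevity, and the function name is illustrative.

```shell
# Sketch of otm-create-task.sh's flock-based task ID allocation
COUNTER="$(mktemp -d)/next-task-id"

next_task_id() {
  (
    flock -x 9                                   # serialize concurrent creators
    n=$(cat "$COUNTER" 2>/dev/null || echo 35)   # read current counter (or seed)
    printf 'T-%05d\n' "$n"                       # emit zero-padded ID, e.g. T-00035
    echo $((n + 1)) > "$COUNTER"                 # persist increment before unlock
  ) 9>"$COUNTER.lock"
}

id=$(next_task_id)
echo "$id"    # first allocation from the seed → T-00035
```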

Type Values

| Type | Purpose | Primary user |
| --- | --- | --- |
| action | Task to execute (default) | Agents |
| decision | Requires a decision from someone | Rupert |
| review | Needs review / approval | Rupert or agents |

System 2: Task Injection (otm-injector.js)

The injector watches the new-tasks/ directory and publishes JSON task files to the Slack Lists API.

Triggers

| Component | File | Trigger |
| --- | --- | --- |
| Watcher | `ai.openclaw.otm-watcher.plist` | WatchPaths on `new-tasks/` (and `task-updates/`) |
| Sweeper | `ai.openclaw.otm-sweeper.plist` | Every 10 min (catches misses) |

Idempotency (Dedup Check)

The injector includes a dedup check before creating tasks in Slack to prevent duplicates on crash/retry:

  1. Before calling slackLists.items.create, the injector fetches all existing items
  2. Scans for a matching Task ID (Col0ALVK2NA1E)
  3. If the Task ID already exists → skips creation, moves file to processed/ with _skipped_duplicate: true
  4. If the dedup API call fails → proceeds with creation (better to duplicate than lose a task)

This prevents duplicate Slack items when the watcher or sweeper re-processes a file already injected (e.g., after a crash, retry, or race condition).
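The dedup gate can be sketched in pure shell. The real check fetches all Slack list items and scans column Col0ALVK2NA1E; here the existing IDs are a hardcoded sample and the function name is illustrative.

```shell
# Pure-shell stand-in for the injector's dedup gate (sample data)
existing_ids=" T-00033 T-00034 T-00035 "

dedup_check() {                 # exit 0 if the task ID is already in Slack
  case "$existing_ids" in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

for id in T-00035 T-00099; do
  if dedup_check "$id"; then
    echo "$id: skip (duplicate; move file to processed/ with _skipped_duplicate)"
  else
    echo "$id: create in Slack"
  fi
done
```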

Directories

| Path | Purpose |
| --- | --- |
| `~/Library/Application Support/OpenClaw/otm/new-tasks/` | Inbox — pending task files |
| `~/Library/Application Support/OpenClaw/otm/task-updates/` | Inbox — update files (System 3b) |
| `~/Library/Application Support/OpenClaw/otm/processed/` | Successfully injected |
| `~/Library/Application Support/OpenClaw/otm/failed/` | Failed (with error metadata) |

System 3: Task Dispatcher (otm-dispatcher.js)

A lightweight Node.js scanner that detects tasks with an assignee and dispatches them to the appropriate agent workspace. Runs every 2 minutes via launchd.

What it does

  1. Fetches all tasks from the Slack list (F0ALE8DCW1F)
  2. Finds tasks where: status = "new" AND assignee is set (not empty)
  3. For each matching task (in this exact order for crash safety):
    1. Writes a dispatch file to the agent's workspace: /Volumes/OPENCLAW/CLAUDIA/rapido-openclaw/workspaces/<agent>-workspace/task-dispatch.json
    2. Updates the task's status in Slack from newassigned + sets assigned_at (this is the dedup gate — once assigned, future runs skip it)
    3. Triggers the agent session via Gateway WebSocket RPC
  4. After all tasks processed, all agent triggers fire in parallel — agents start concurrently
  5. Writes its own component state file (System 5 heartbeat)

Agent Triggering via Gateway WebSocket RPC

After writing task-dispatch.json, the dispatcher actively triggers each agent via the OpenClaw Gateway WebSocket RPC. This eliminates the passive "wait for heartbeat" gap — agents start working immediately.

Protocol: WebSocket JSON-RPC to ws://127.0.0.1:18789

Dispatcher                    Gateway                     Agents
  │                              │                          │
  │ ws.connect()                 │                          │
  │─────────────────────────────►│                          │
  │                              │                          │
  │ { method: "agent",           │                          │
  │   params: {                  │                          │
  │     agentId: "archibald",    │                          │
  │     message: "Task T-00044   │                          │
  │       dispatched...",        │                          │
  │     idempotencyKey:          │                          │
  │       "otm-T-00044"         │                          │
  │   }}                         │                          │
  │─────────────────────────────►│ ──► archibald session ──►│
  │                              │                          │
  │ { method: "agent", ...       │                          │
  │   agentId: "devdas" }        │                          │
  │─────────────────────────────►│ ──► devdas session ─────►│
  │                              │                          │
  │ { method: "agent", ...       │                          │
  │   agentId: "salvatore" }     │                          │
  │─────────────────────────────►│ ──► salvatore session ──►│
  │                              │                          │
  │ ws.close()                   │     (all 3 run in        │
  │─────────────────────────────►│      parallel)           │
  │                              │                          │
  │  Dispatcher exits.           │  Gateway manages         │
  │  Total time: ~2 seconds.     │  concurrent sessions.    │

RPC call format:

{
  "method": "agent",
  "params": {
    "message": "Task T-00044 dispatched. Check task-dispatch.json and execute it.",
    "agentId": "archibald",
    "idempotencyKey": "otm-T-00044"
  }
}

Response (immediate, non-blocking):

{
  "runId": "otm-T-00044",
  "status": "accepted",
  "acceptedAt": 1773504558104
}

Key properties:

  • Parallel: All agent sessions start concurrently — no serialization
  • Non-blocking: Gateway returns accepted immediately; dispatcher doesn't wait
  • Auth: Gateway token passed via WebSocket connection headers
  • Rupert excluded: Human users are notified via Slack UI, not RPC

⚠️ idempotencyKey is NOT idempotent. Despite the name, the Gateway agent RPC method accepts duplicate calls with the same key — it uses the key as a runId label only. Calling twice with the same key = two separate agent sessions. Idempotency is the dispatcher's responsibility, not the gateway's. The Slack status flip (new → assigned) is the sole dedup mechanism for the dispatcher. Once a task is assigned, it's invisible to future dispatcher runs.

vs. HTTP Webhook alternative (POST /hooks/agent): The webhook approach serializes agent sessions on CommandLane.Nested — agents run one at a time. With 6 agents × ~3 min each = ~18 min sequential vs. ~3 min parallel via WebSocket RPC. WebSocket is the correct choice for multi-agent dispatch.
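Constructing the payload is mechanical. A sketch (the message wording follows the RPC format above; the helper name is hypothetical):

```javascript
// Sketch: build the JSON-RPC message the dispatcher sends over ws://127.0.0.1:18789.
function buildAgentRpc(agentId, taskId) {
  return {
    method: 'agent',
    params: {
      message: `Task ${taskId} dispatched. Check task-dispatch.json and execute it.`,
      agentId,
      // NOTE: a runId label only, NOT a dedup key (see the warning above).
      idempotencyKey: `otm-${taskId}`,
    },
  };
}
```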

Dispatch Operation Order (crash safety)

The dispatcher must execute operations in this exact order to prevent duplicate agent triggers:

1. Write task-dispatch.json to agent workspace
2. Update Slack status: new → assigned (+ set assigned_at)
3. Trigger agent via WS RPC

Why this order matters:

| Crash point | Result | Recovery |
| --- | --- | --- |
| After step 1, before step 2 | File written, Slack still new | Next dispatcher run re-dispatches → file already has the task (append-only), agent gets triggered. Safe but duplicate file entry. |
| After step 2, before step 3 | Slack says assigned, agent never woke up | Agent picks up task on next heartbeat or manual trigger. Safe — delayed but not lost. |
| After step 3 | All done | Clean path. |

The dangerous alternative (trigger the agent FIRST, then update Slack) risks this sequence: crash after trigger → Slack still new → next run triggers the agent AGAIN → duplicate work, wasted tokens. This is why the Slack update must come before the agent trigger.

Dispatch File Format (task-dispatch.json)

{
  "dispatched": [
    {
      "taskId": "T-00033",
      "title": "Build login page",
      "priority": "high",
      "project": "PRJ-012 App",
      "type": "action",
      "subtasks": ["Create form", "Add validation", "Write tests"],
      "dispatchedAt": "2026-03-14T15:00:00Z",
      "slackItemId": "Rec0ALXYZ"
    }
  ]
}

The dispatch file is APPEND-ONLY — new tasks get added to the dispatched array. Agents remove entries when they pick them up (or mark them as "picked": true).

Agent-Side Convention

On session start, agents check for task-dispatch.json. If present, they:

  1. Pick up the highest-priority task
  2. Update their agent-state.json to "working" with the task
  3. Start working on it
  4. When done, mark subtasks complete via otm-update-task.sh
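Step 1 ("pick up the highest-priority task") could look like the sketch below. The priority ranking is an assumption: the spec's dispatch example only shows "high", so the other names are illustrative.

```javascript
// ASSUMPTION: priority names beyond "high" are illustrative, not from the spec.
const PRIORITY_RANK = { urgent: 0, high: 1, medium: 2, low: 3 };

// Sketch: choose the highest-priority entry not yet picked, or null if none remain.
function pickNext(dispatched) {
  const open = dispatched.filter(e => !e.picked);
  open.sort((a, b) => (PRIORITY_RANK[a.priority] ?? 9) - (PRIORITY_RANK[b.priority] ?? 9));
  return open[0] ?? null;
}
```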

Agent Workspace Mapping

| Agent | Workspace path |
| --- | --- |
| claudia | claudia-workspace |
| devdas | devdas-workspace |
| archibald | archibald-workspace |
| frederic | frederic-workspace |
| salvatore | salvatore-workspace |
| sylvain | sylvain-workspace |
| rupert | (skip — human, notified via Slack UI) |

Component

| Item | Detail |
| --- | --- |
| File | otm-dispatcher.js |
| Plist | ai.openclaw.otm-dispatcher.plist |
| Schedule | Every 120 seconds |
| Log | ~/Library/Logs/OpenClaw/otm-dispatcher.log |
| State file | otm-dispatcher-state.json |

System 3b: Task Updates (otm-update-task.sh)

A file-drop mechanism that allows agents to update task status and mark subtasks complete — same security model as task creation (agents never touch the Slack API directly).

Script: otm-update-task.sh

# Mark a subtask done
otm-update-task.sh --task-id T-00033 --subtask-done "Create login form"

# Update task status
otm-update-task.sh --task-id T-00033 --status in_progress

# Report blocked
otm-update-task.sh --task-id T-00033 --status blocked --reason "Waiting on API key"

# Multiple subtasks done at once
otm-update-task.sh --task-id T-00033 \
  --subtask-done "Create login form" \
  --subtask-done "Add validation logic"

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| --task-id | Yes | Task ID (T-NNNNN) |
| --status | See below | New status: in_progress \| blocked \| agent_done |
| --subtask-done | See below | Subtask title to mark as done (repeatable) |
| --reason | No | Reason text (used with --status blocked) |

At least one of --status or --subtask-done is required.

JSON Format (intermediate)

Written to ~/Library/Application Support/OpenClaw/otm/task-updates/<timestamp>-<uuid>.json

{
  "id": "uuid",
  "taskId": "T-00033",
  "action": "update",
  "createdAt": "2026-03-14T16:00:00Z",
  "status": "in_progress",
  "subtasksDone": ["Create login form"],
  "reason": null
}

Processing

otm-injector.js is extended to also watch the task-updates/ directory:

  1. Reads the update JSON
  2. Looks up the task in Slack by Task ID (scans items, matches Col0ALVK2NA1E)
  3. If status is set → updates the Status column
  4. If subtasksDone is set → finds matching child items by title → sets their status to done AND sets Col00 (checkbox) to true
  5. Moves file to processed/ on success, failed/ on error
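Steps 3–4 can be sketched as a pure planning function that turns an update file into Slack operations (the operation shape is illustrative; the Col00 checkbox semantics are from the list above):

```javascript
// Sketch: translate an update JSON into the Slack writes the injector would perform.
function planUpdateOps(update, childItems) {
  const ops = [];
  if (update.status) {
    ops.push({ op: 'setStatus', taskId: update.taskId, status: update.status });
  }
  for (const title of update.subtasksDone ?? []) {
    const child = childItems.find(c => c.title === title); // match child items by title
    if (child) {
      // Real injector sets status=done AND the Col00 checkbox on the child item.
      ops.push({ op: 'completeSubtask', itemId: child.id });
    }
  }
  return ops;
}
```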

Component

| Item | Detail |
| --- | --- |
| Script | otm-update-task.sh (shared tool) |
| Processor | otm-injector.js (extended) |
| Watcher | ai.openclaw.otm-watcher.plist (updated WatchPaths) |

System 4: Completion Detection (otm-completion-detector.js)

A lightweight scanner that auto-promotes tasks to agent_done when all subtasks are complete.

Every task has at least one subtask: otm-create-task.sh auto-adds "Confirm that task has been done" if no subtasks are provided. This guarantees the completion detector always has a signal.

Flow

Completion Detector (Node.js, launchd every 2 min)
  │
  ├── GET all tasks from Slack list
  ├── FILTER: status = "in_progress" + has child items (subtasks)
  ├── CHECK: all child items have status ∈ {done, agent_done} OR Col00 = true
  │
  └── YES → UPDATE parent task status → "agent_done"
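The promotion rule reduces to one predicate. A sketch, using the lowercase status values from the pipeline:

```javascript
// Sketch of the completion predicate from the flow above.
function shouldPromote(task, subtasks) {
  if (task.status !== 'in_progress') return false; // not active: skip
  if (subtasks.length === 0) return false;         // defensive: creation guarantees >= 1 subtask
  // All child items done/agent_done, or their Col00 checkbox set.
  return subtasks.every(s => ['done', 'agent_done'].includes(s.status) || s.checkbox === true);
}
```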

Rules

| Condition | Action |
| --- | --- |
| Task in_progress + all subtasks done/agent_done | Promote to agent_done |
| Task in_progress + some subtasks still open | Skip (work in progress) |
| Task not in_progress | Skip (not active) |

What happens after agent_done

  • Technical tasks: Claudia validates the work → done
  • Business/decision tasks: Claudia creates a review task for Rupert → Rupert approves → done

Component

| Item | Detail |
| --- | --- |
| File | otm-completion-detector.js |
| Plist | ai.openclaw.otm-completion-detector.plist |
| Schedule | Every 120 seconds |
| Log | ~/Library/Logs/OpenClaw/otm-completion-detector.log |

System 5: Component Heartbeats

Each OTM component writes a state file after every run. These files are watched by the collector and surfaced on the dashboard.

State File Format

{
  "component": {
    "id": "otm-injector",
    "status": "alive",
    "lastRun": "2026-03-14T10:15:00Z",
    "result": "success",
    "details": "Processed 2 tasks, 0 failures"
  }
}

State Files

| Component | File | Written by |
| --- | --- | --- |
| OTM Injector | otm-injector-state.json | otm-injector.js (end of each run) |
| Dispatcher | otm-dispatcher-state.json | otm-dispatcher.js (end of each run) |
| Completion Detector | otm-completion-detector-state.json | otm-completion-detector.js (end of each run) |
| Watcher | otm-watcher-state.json | watcher wrapper |

All files: ~/Library/Application Support/OpenClaw/otm/

Dashboard

The OTM Components card shows each component's status with staleness color coding:

  • 🟢 Green: last run < 5 min ago
  • 🟡 Yellow: last run 5–15 min ago
  • 🔴 Red: last run > 15 min ago (action required)
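The color thresholds map directly to a function (a sketch; the dashboard's real implementation may differ):

```javascript
// Sketch: map a component's last-run age to the dashboard staleness color.
function stalenessColor(lastRunMs, nowMs = Date.now()) {
  const ageMinutes = (nowMs - lastRunMs) / 60000;
  if (ageMinutes < 5) return 'green';   // last run < 5 min ago
  if (ageMinutes <= 15) return 'yellow'; // 5–15 min ago
  return 'red';                          // > 15 min ago: action required
}
```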

Observability Stack

State file → collector.js (FSEvents) → SQLite otm_state table
                                             ↓
                               reader.js polls + broadcasts
                                             ↓
                                    Dashboard WebSocket

System 6: DMZ Relay

The DMZ relay bridges the private OpenClaw state to the public Vercel dashboard. It runs on a Synology NAS in the DMZ.

Architecture

OpenClaw VM (private)     Synology NAS (DMZ)           Browser (Vercel dashboard)
  │                         ┌──────────────────┐
  │  collector.js           │  RECEIVER         │
  │  pushToRelay()          │  127.0.0.1:3456   │
  │ ── HTTP POST ─────────► │  + bearer token   │
  │  on state change         │  + atomic write   │
  │                          │                   │
  │                          │  fab-state.json   │ ← shared state file
  │                          │                   │
  │                          │  BROADCASTER      │
  │                          │  0.0.0.0:3457     │
  │                          │  (TLS via proxy)  │ ◄── wss://nas.domain/ws
  │                          │  fs.watch → push  │ ◄── GET /api/state
  │                          └──────────────────┘

Services

| Service | File | Port | Binding | Dependencies |
| --- | --- | --- | --- | --- |
| Receiver | receiver.js | 3456 | 127.0.0.1 (localhost only) | Zero — pure Node.js |
| Broadcaster | broadcaster.js | 3457 | 0.0.0.0 (behind TLS proxy) | ws package only |

Env Vars

| Service | Var | Required | Description |
| --- | --- | --- | --- |
| Receiver | FAB_RELAY_TOKEN | Yes | Shared secret bearer token |
| Receiver | STATE_FILE | No | Path to state file (default: ./fab-state.json) |
| Broadcaster | STATE_FILE | No | Same state file path |
| Collector (OpenClaw) | FAB_RELAY_URL | No | If set, enables relay push |
| Collector (OpenClaw) | FAB_RELAY_TOKEN | Yes (when relay enabled) | Must match receiver token |

Security Model

| Layer | Protection |
| --- | --- |
| Bearer token | Constant-time comparison (timing-attack safe) |
| Receiver binding | 127.0.0.1 — not reachable from internet |
| Firewall | Port 3456: allow ONLY from OpenClaw VM IP |
| Broadcaster | Read-only; no auth needed (non-sensitive data) |
| TLS | All external traffic via Synology reverse proxy + Let's Encrypt |
| Atomic write | Receiver writes .tmp → rename; no partial reads |

Collector Integration

// Only runs if FAB_RELAY_URL is set:
pushStateToRelay(); // called after every gateway/agent/OTM state change

The pushToRelay() function in collector.js:

  1. Builds full snapshot from SQLite (gateway + agents + OTM)
  2. POSTs to FAB_RELAY_URL with bearer token
  3. Handles errors gracefully — relay down = warning log, not crash
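The steps above can be sketched as follows, assuming Node 18+ global fetch (the real pushToRelay() builds its snapshot from SQLite first; return values here are illustrative):

```javascript
// Sketch: graceful relay push. Failures log a warning and never crash the collector.
async function pushStateToRelay(snapshot, { url = process.env.FAB_RELAY_URL,
                                            token = process.env.FAB_RELAY_TOKEN } = {}) {
  if (!url) return 'disabled'; // relay is opt-in: no FAB_RELAY_URL, no push
  try {
    const res = await fetch(url, {
      method: 'POST',
      headers: { authorization: `Bearer ${token}`, 'content-type': 'application/json' },
      body: JSON.stringify(snapshot),
    });
    return res.ok ? 'pushed' : 'rejected';
  } catch (err) {
    console.warn('relay unreachable:', err.message); // relay down = warning log, not crash
    return 'unreachable';
  }
}
```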

Files

| File | Location |
| --- | --- |
| receiver.js | work/PROJECTS/fab-state/synology-relay/ |
| broadcaster.js | work/PROJECTS/fab-state/synology-relay/ |
| package.json | work/PROJECTS/fab-state/synology-relay/ |
| SETUP-GUIDE.md | work/PROJECTS/fab-state/synology-relay/ |

See work/PROJECTS/strategy-openclaw-org/docs/FAB-STATE.md System 6 for full documentation.

Full Pipeline Flow

Claudia              Filesystem            Injector         Slack            Dispatcher           Agent Workspace
  │                     │                     │               │                  │                      │
  │ otm-create-task.sh  │                     │               │                  │                      │
  │────────────────────►│ .json               │               │                  │                      │
  │                     │─── watcher ────────►│               │                  │                      │
  │                     │                     │ items.create  │                  │                      │
  │                     │                     │──────────────►│ status=new       │                      │
  │                     │                     │  + subtasks   │                  │                      │
  │                     │ move to processed/  │               │                  │                      │
  │                     │◄────────────────────│               │                  │                      │
  │                     │                     │               │                  │                      │
  │                     │                     │               │ ◄── scan ───────│ (every 2 min)        │
  │                     │                     │               │ new + assignee   │                      │
  │                     │                     │               │ ── update ──────►│                      │
  │                     │                     │               │ status=assigned  │                      │
  │                     │                     │               │                  │ task-dispatch.json   │
  │                     │                     │               │                  │─────────────────────►│
  │                     │                     │               │                  │                      │
  │                     │                     │               │                  │ WS RPC: agent()      │
  │                     │                     │               │                  │──► Gateway ──► Agent │
  │                     │                     │               │                  │   (parallel start)   │
  │                     │                     │               │                  │                      │
  │                     │                     │               │                  │      (agent works)   │
  │                     │                     │               │                  │                      │
  │                     │                     │               │ ◄── scan (completion detector, 2 min)  │
  │                     │                     │               │ in_progress +    │                      │
  │                     │                     │               │ all subtasks done│                      │
  │                     │                     │               │ → agent_done     │                      │

All Pipeline Components Summary

| Component | File | Trigger |
| --- | --- | --- |
| Task creator | otm-create-task.sh | Called by agents |
| Injector | otm-injector.js | Called by watcher/sweeper |
| Watcher | ai.openclaw.otm-watcher.plist | WatchPaths on new-tasks/ + task-updates/ |
| Sweeper | ai.openclaw.otm-sweeper.plist | Every 10 min (catches misses) |
| Dispatcher | otm-dispatcher.js | Every 2 min (launchd) |
| Completion detector | otm-completion-detector.js | Every 2 min (launchd) |

8. Error Monitor (OTM-7)

The Error Monitor is a dedicated OTM component that detects state inconsistencies, traces errors for lessons learned, and triggers corrective actions. It runs as part of the Watchdog cycle (IE-08, every 60s) but is logically separate from the Watchdog's operational checks (OTM-6).

8.1 Error Condition Catalogue

Each error condition is coded ERR-xx. The Error Monitor detects; the OTM corrects.

| Code | Condition | Detection Rule | Severity | Auto-correction | Manual escalation |
| --- | --- | --- | --- | --- | --- |
| ERR-01 | Stale In Progress | Task status = "In Progress" AND no IE-02 subtask report in >10 minutes | Warning | None — alert only | OE-05: Admin notified. May indicate agent crash, stuck task, or slow work. |
| ERR-02 | Agent-Task Mismatch (busy agent, no task) | Agent status = busy AND current_task not found in Slack List (or task status ≠ In Progress) | Critical | Set agent idle (AT-04), call check_next_task_for_agent() | OE-05: Admin alert with details of orphaned agent state |
| ERR-03 | Agent-Task Mismatch (idle agent, active task) | Task status = "In Progress" AND assigned agent status = idle in registry | Critical | Set agent busy, re-send SURE notification (OE-01) | OE-05: Admin alert — state was inconsistent |
| ERR-04 | Orphaned Pending Task | Task status = "Pending" AND assigned agent status = idle | High | Promote task: TT-05 → re-evaluate via OTM-2 | OE-07: Audit log on task |
| ERR-05 | Counter Mismatch | subtasks_remaining ≠ actual count of unchecked subtasks in Slack List | High | Recalculate and fix counter via SW-1 | OE-05: Admin alert with old/new values |
| ERR-06 | SURE Timeout | OE-01 sent >3 minutes ago (1min + 2min retries) AND no IE-03 received | Critical | None — task stays In Progress | OE-05: Admin alert. Agent may be unreachable, or gateway may have restarted. |
| ERR-07 | Multiple Active Tasks per Agent | Agent has >1 task with status = "In Progress" assigned to them | Critical | Keep oldest task, move others to Pending | OE-05: Admin alert — invariant violation |
| ERR-08 | Stuck in Assigned | Task status = "Assigned" for >5 minutes (should transition immediately to In Progress or Pending) | High | Re-trigger OTM-2 for the task | OE-05: Admin alert if re-trigger fails |
| ERR-09 | Stuck in Rejected | Task status = "Rejected" for >24 hours (Orchestrator hasn't submitted rework or cancelled) | Warning | None — alert only | OE-05: Remind Orchestrator to act |
| ERR-10 | Ghost Agent | assigned_to field references a Slack user ID not in agent registry AND not auto-registerable | Critical | Task stays in current status | OE-05: Admin alert — unknown agent |
| ERR-11 | Duplicate Subtask Reports | Same subtask_id reported done >1 time (idempotency check caught it) | Info | Silently discarded (idempotent) | Logged in event log (§9) for pattern analysis |
| ERR-12 | Stale Rework (no ack subtask closed) | Task in "In Progress" after TT-08 rework, first subtask ("Acknowledge rework…") not closed within 10 minutes | Warning | None — alert only | OE-05: Agent may not have read rework instructions |

8.2 Error Monitor Processing

OTM-7 runs every 60s (piggybacks on IE-08 watchdog cron):
  |
  FOR EACH error check ERR-01 through ERR-12:
    |
    +-- Run detection query (SQLite + Slack API as needed)
    |
    +-- IF condition detected:
    |     1. Log error to event_log table (§9): {error_code, task_id, agent_id, details, timestamp}
    |     2. Log to task conversation (OE-07): "[timestamp] OTM-7 ERROR: ERR-xx detected — <description>"
    |     3. IF auto-correctable: execute correction, log correction action
    |     4. IF manual escalation: send OE-05 alert to admin
    |     5. Increment error counter in error_stats table (§10)
    |
    +-- IF condition NOT detected: skip
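The per-check loop above can be sketched with the detection queries and side effects injected (the check shapes are illustrative, not the OTM-7 internals):

```javascript
// Sketch of one OTM-7 cycle. Each check supplies detect() and, optionally,
// correct() (auto-correction) and escalates (OE-05 admin alert).
function runErrorCycle(checks, { logEvent, logConversation, escalate, bumpStats }) {
  for (const check of checks) {
    const hit = check.detect(); // detection query (SQLite + Slack API in the real monitor)
    if (!hit) continue;         // condition not detected: skip
    logEvent({ error_code: check.code, ...hit }); // 1. event_log row (§9)
    logConversation(check.code, hit);             // 2. OE-07 task conversation entry
    if (check.correct) check.correct(hit);        // 3. auto-correction, if any
    if (check.escalates) escalate(check.code, hit); // 4. OE-05 admin alert
    bumpStats(check.code);                        // 5. error_stats counter (§10)
  }
}
```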

8.3 Lessons Learned Pipeline

Errors are not just fixed — they feed a continuous improvement loop.

  1. Error Statistics Table (error_stats in error DB, see §10): Tracks frequency, first/last occurrence, auto-correction success rate per ERR-xx code.
  2. Daily Error Report: OTM-7 generates a summary during the first watchdog cycle after 00:00 each day:
    • Error counts by code (ERR-01 through ERR-12)
    • Most frequent errors
    • Auto-correction success/failure ratio
    • New error patterns (first-time occurrences)
  3. Threshold Alerts: Alert on EVERY error (threshold = 1). During startup and early operation, all anomalies are surfaced immediately via OE-05. The threshold can be raised once the system is stable and baseline error rates are understood.
  4. Root Cause Tagging: Admin can tag errors with root cause via OTM API (POST /api/otm/error/{id}/tag), enabling aggregate analysis.

8.4 Corrective Action Summary

| Action | Triggered by | Effect |
| --- | --- | --- |
| Re-send SURE notification | ERR-03 (idle agent, active task) | Resynchronise agent with its task |
| Promote pending task | ERR-04 (orphaned pending) | Unblock queued work |
| Recalculate counter | ERR-05 (counter mismatch) | Fix data integrity |
| Free orphaned agent | ERR-02 (busy agent, no task) | Unblock agent for new work |
| Re-trigger OTM-2 | ERR-08 (stuck in Assigned) | Retry the assignment flow |
| Move excess tasks to Pending | ERR-07 (multiple active tasks) | Restore single-task invariant |

9. Event Logging & Observability

All OTM events are logged to three complementary systems:

  1. Slack task conversation (OE-07) — human-readable, per-task, searchable in Slack (§7)
  2. Internal event log (event_log table in error DB) — structured, queryable, machine-readable
  3. Internal event log files — filesystem mirrors of database writes, for real-time tailing during tests and startup (§9.6)

9.1 Event Log Schema

CREATE TABLE event_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,          -- Unix timestamp (ms precision)
  event_type TEXT NOT NULL,            -- 'inbound' | 'outbound' | 'transition' | 'error' | 'correction' | 'system'
  event_code TEXT NOT NULL,            -- IE-xx, OE-xx, TT-xx, AT-xx, ERR-xx
  task_id TEXT,                        -- Slack List item ID (NULL for system events)
  agent_id TEXT,                       -- slack_user_id (NULL if not agent-related)
  handler TEXT,                        -- OTM-0 through OTM-7, SE-1, SW-1
  source TEXT NOT NULL,                -- 'se1', 'otm_api', 'watchdog', 'error_monitor', 'internal'
  detail TEXT,                         -- JSON blob with event-specific data
  duration_ms INTEGER,                 -- Processing time for this event
  success INTEGER DEFAULT 1,           -- 1 = success, 0 = failure
  error_message TEXT                   -- Error details if success = 0
);

CREATE INDEX idx_event_log_task ON event_log(task_id);
CREATE INDEX idx_event_log_agent ON event_log(agent_id);
CREATE INDEX idx_event_log_type ON event_log(event_type, timestamp);
CREATE INDEX idx_event_log_code ON event_log(event_code, timestamp);
CREATE INDEX idx_event_log_time ON event_log(timestamp);

9.2 What is Logged

Every IE-xx, OE-xx, TT-xx, AT-xx, and ERR-xx event produces one row in event_log. This includes:

| Event Category | Examples | Logged Fields |
| --- | --- | --- |
| Inbound events | IE-01 (Slack event), IE-02 (subtask report), IE-03 (SURE ack) | source, task_id, agent_id, raw payload in detail |
| Outbound events | OE-01 (SURE notification), OE-06 (Slack write) | target, task_id, delivery status, duration_ms |
| Task transitions | TT-01 through TT-15 | from_status, to_status, task_id, requesting_actor |
| Agent transitions | AT-01 through AT-04 | from_status, to_status, agent_id, triggering_task |
| Errors | ERR-01 through ERR-12 | error_code, detection_details, correction_applied |
| System events | Startup, reconciliation, watchdog cycle | cycle_number, checks_run, anomalies_found |

9.3 Observability Queries

The event log enables:

-- Task lifecycle: all events for a specific task
SELECT * FROM event_log WHERE task_id = ? ORDER BY timestamp;

-- Agent activity: all events involving a specific agent
SELECT * FROM event_log WHERE agent_id = ? ORDER BY timestamp;

-- Error frequency: last 24 hours
SELECT event_code, COUNT(*) as count
FROM event_log
WHERE event_type = 'error' AND timestamp > ?
GROUP BY event_code ORDER BY count DESC;

-- Average task completion time
SELECT AVG(e2.timestamp - e1.timestamp) / 1000 / 60 as avg_minutes
FROM event_log e1
JOIN event_log e2 ON e1.task_id = e2.task_id
WHERE e1.event_code = 'TT-02' AND e2.event_code = 'TT-04';

-- Slowest handlers (performance monitoring)
SELECT handler, AVG(duration_ms), MAX(duration_ms), COUNT(*)
FROM event_log
WHERE duration_ms IS NOT NULL
GROUP BY handler ORDER BY AVG(duration_ms) DESC;

-- SURE acknowledgement response times
SELECT AVG(ack.timestamp - notif.timestamp) / 1000 as avg_seconds
FROM event_log notif
JOIN event_log ack ON notif.task_id = ack.task_id
WHERE notif.event_code = 'OE-01' AND ack.event_code = 'IE-03';

9.4 Retention & Historicisation

| Data | Retention | Archive strategy |
| --- | --- | --- |
| event_log (active) | 30 days | Rows older than 30 days → event_log_archive |
| event_log_archive | 1 year | Monthly SQLite dump to filesystem (gzipped) |
| error_stats | Indefinite | Cumulative counters, never purged |
| Slack conversation audit | Indefinite | Lives in Slack (Slack's retention policy applies) |

Maintenance job (runs during OTM-6 watchdog, daily at 03:00):

-- Move events older than 30 days to the archive (timestamps are Unix ms, per §9.1)
INSERT INTO event_log_archive
  SELECT * FROM event_log
  WHERE timestamp < (strftime('%s','now') * 1000) - (30 * 86400000);
DELETE FROM event_log
  WHERE timestamp < (strftime('%s','now') * 1000) - (30 * 86400000);

-- Vacuum to reclaim space
VACUUM;

9.5 Why Not Sentry?

Internal structured logging (SQLite) is chosen over Sentry because:

  • No external dependency — OTM is self-contained
  • Queryable — SQL enables arbitrary analysis (Sentry requires its query language)
  • Correlated with task data — same DB, JOIN-able with agent registry
  • Low volume — estimated <10,000 events/day (see §10), no need for distributed tracing
  • Cost — zero (SQLite is free; Sentry has per-event pricing)
  • Privacy — all data stays on the OpenClaw server

If event volume exceeds 100,000/day or distributed tracing across multiple servers becomes needed, Sentry or OpenTelemetry would be reconsidered.

9.6 Filesystem Log File Mirroring

All database writes to event_log and error_stats are mirrored to two filesystem log files in real-time:

| File | Content | Format | Purpose |
| --- | --- | --- | --- |
| {OPENCLAW_DATA_DIR}/otm/logs/otm-events.log | All event_log inserts | [ISO-timestamp] [event_code] [handler] [task_id] [agent_id] detail_json | Monitor all OTM activity via tail -f |
| {OPENCLAW_DATA_DIR}/otm/logs/otm-errors.log | All ERR-xx detections + corrections | [ISO-timestamp] [ERR-xx] [severity] [task_id] [agent_id] description [correction: action/none] | Monitor errors during tests and startup |

Implementation: Every INSERT INTO event_log and every error detection in OTM-7 appends one line to the corresponding log file. This is a synchronous append (negligible overhead at <350 events/day).

Log rotation: Daily at 03:00, rename to otm-events.log.YYYY-MM-DD and otm-errors.log.YYYY-MM-DD. Keep 30 days of rotated files. Older files deleted automatically.

Usage during development/testing:

# Watch all OTM events in real time
tail -f {OPENCLAW_DATA_DIR}/otm/logs/otm-events.log

# Watch errors only
tail -f {OPENCLAW_DATA_DIR}/otm/logs/otm-errors.log

# Filter for a specific task
tail -f otm-events.log | grep "T-00042"

# Filter for a specific error code
tail -f otm-errors.log | grep "ERR-03"

📌 The log files are append-only mirrors — the database remains the source of truth for queries and analysis. The files exist purely for human monitoring convenience.


10. Database Design (SQLite)

10.1 Overview

The OTM uses two separate SQLite databases:

  1. Main OTM DB (otm.db) — Agent registry, SURE pending, task history. Core operational state.
  2. Error & Event DB (otm-errors.db) — Event log, error statistics. Monitoring and observability. Separated so that error monitoring is independent from the main OTM application and can be analysed, reset, or rebuilt without affecting operations.

Database files:

  • {OPENCLAW_DATA_DIR}/otm/otm.db — Main OTM DB
  • {OPENCLAW_DATA_DIR}/otm/otm-errors.db — Error & Event DB
  • (Sylvain to confirm exact OPENCLAW_DATA_DIR path)

Library: better-sqlite3 (synchronous, fast, WAL mode for both)

10.2 Tables

Main OTM DB (otm.db):

| Table | Purpose | Writer(s) | Reader(s) | Rows (steady state) | Growth rate |
| --- | --- | --- | --- | --- | --- |
| agents | Agent registry (§4.5) | OTM-1 | OTM-0, OTM-2, OTM-3, OTM-5, OTM-6 | 5–15 | Near-zero (new agents rare) |
| task_history | Snapshot of archived tasks (§Flow 5) | OTM-6 (TT-12) | Admin queries, reporting | Growing | ~20–50/month |
| sure_pending | Outstanding SURE notifications awaiting ack (§6) | OTM-2, OTM ack handler | OTM-7 (timeout check) | 0–5 | Transient (cleared on ack/timeout) |

Error & Event DB (otm-errors.db):

| Table | Purpose | Writer(s) | Reader(s) | Rows (steady state) | Growth rate |
| --- | --- | --- | --- | --- | --- |
| event_log | Structured event log (§9.1) | All OTM handlers | OTM-7, admin queries | ~10,000 | ~300/day (purged monthly) |
| event_log_archive | Archived events >30 days (§9.4) | Maintenance job | Admin queries only | ~100,000 | ~9,000/month |
| error_stats | Error frequency counters (§8.3) | OTM-7 | OTM-7, daily report | 12 rows (one per ERR-xx) | Fixed |

10.3 Full Schema

Main OTM DB (otm.db):

-- Agent Registry (see §4.5 for column details)
CREATE TABLE agents (
  slack_user_id TEXT PRIMARY KEY,
  otm_display_name TEXT NOT NULL,
  openclaw_agent_id TEXT,
  agent_type TEXT NOT NULL DEFAULT 'ai',
  status TEXT DEFAULT 'idle',
  current_task TEXT,
  task_started_at INTEGER,
  last_seen INTEGER
);

-- Task History (archived tasks snapshot)
CREATE TABLE task_history (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,               -- Original Slack List item ID
  title TEXT NOT NULL,
  assigned_to TEXT,                     -- slack_user_id
  final_status TEXT NOT NULL,          -- "Archived" (from Done or Cancelled)
  previous_status TEXT,                -- Status before archival
  priority INTEGER,
  context TEXT,
  subtask_count INTEGER,               -- Total subtasks at archival time
  created_at INTEGER,                  -- Task creation timestamp
  assigned_at INTEGER,
  completed_at INTEGER,
  validated_at INTEGER,
  cancelled_at INTEGER,
  archived_at INTEGER NOT NULL,        -- When TT-12 executed
  result_summary TEXT,
  total_duration_ms INTEGER,           -- assigned_at → completed_at
  review_duration_ms INTEGER,          -- completed_at → validated_at
  rework_count INTEGER DEFAULT 0,      -- Number of TT-08 rework cycles
  error_count INTEGER DEFAULT 0        -- Number of ERR-xx events during lifecycle
);

CREATE INDEX idx_task_history_agent ON task_history(assigned_to);
CREATE INDEX idx_task_history_status ON task_history(final_status);
CREATE INDEX idx_task_history_archived ON task_history(archived_at);

-- SURE Pending Notifications (see §6)
CREATE TABLE sure_pending (
  task_id TEXT PRIMARY KEY,
  agent_id TEXT NOT NULL,
  notification_type TEXT NOT NULL,     -- 'task_assigned' | 'rework_assigned'
  sent_at INTEGER NOT NULL,            -- First OE-01 sent
  retry_count INTEGER DEFAULT 0,       -- 0, 1, 2, 3 (max)
  last_retry_at INTEGER,
  acknowledged_at INTEGER              -- Set when IE-03 received. NULL = still pending.
);

Error & Event DB (otm-errors.db):

-- Event Log (see §9.1)
CREATE TABLE event_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_code TEXT NOT NULL,
  task_id TEXT,
  agent_id TEXT,
  handler TEXT,
  source TEXT NOT NULL,
  detail TEXT,
  duration_ms INTEGER,
  success INTEGER DEFAULT 1,
  error_message TEXT
);

CREATE INDEX idx_event_log_task ON event_log(task_id);
CREATE INDEX idx_event_log_agent ON event_log(agent_id);
CREATE INDEX idx_event_log_type ON event_log(event_type, timestamp);
CREATE INDEX idx_event_log_code ON event_log(event_code, timestamp);
CREATE INDEX idx_event_log_time ON event_log(timestamp);

-- Event Log Archive (identical schema)
CREATE TABLE event_log_archive (
  id INTEGER PRIMARY KEY,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_code TEXT NOT NULL,
  task_id TEXT,
  agent_id TEXT,
  handler TEXT,
  source TEXT NOT NULL,
  detail TEXT,
  duration_ms INTEGER,
  success INTEGER DEFAULT 1,
  error_message TEXT
);

-- Error Statistics (see §8.3)
CREATE TABLE error_stats (
  error_code TEXT PRIMARY KEY,         -- ERR-01 through ERR-12
  total_count INTEGER DEFAULT 0,
  last_24h_count INTEGER DEFAULT 0,    -- Reset daily by maintenance job
  first_seen INTEGER,                  -- Unix timestamp
  last_seen INTEGER,                   -- Unix timestamp
  auto_corrected_count INTEGER DEFAULT 0,
  escalated_count INTEGER DEFAULT 0
);

10.4 Database Maintenance

Operation Frequency Triggered by Description
Event log rotation Daily (03:00) OTM-6 watchdog + time check Move events >30 days to event_log_archive
Archive export Monthly (1st, 03:30) OTM-6 watchdog + date check Dump event_log_archive to gzipped SQL file on disk, then TRUNCATE
Error stats reset Daily (00:00) OTM-6 watchdog + time check Reset last_24h_count to 0 for all ERR-xx rows
SURE cleanup Every 60s OTM-7 error monitor Remove sure_pending rows where acknowledged_at IS NOT NULL and >1 hour old
VACUUM Weekly (Sunday 03:00) OTM-6 watchdog + day check Reclaim disk space after deletions
WAL checkpoint Automatic SQLite WAL mode Handled by better-sqlite3 automatically
Backup Daily (04:00) Sylvain's backup cron Copy otm.db to backup location (standard server backup)
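
The daily event log rotation can be expressed as a single transaction against the two tables above; the `:cutoff` parameter (now minus 30 days) is supplied by the OTM-6 job. A sketch, not the shipped maintenance code:

```sql
-- Daily rotation (sketch): move rows older than the 30-day cutoff
-- into the archive, then remove them from the hot table.
BEGIN;
INSERT INTO event_log_archive
  SELECT * FROM event_log WHERE timestamp < :cutoff;
DELETE FROM event_log WHERE timestamp < :cutoff;
COMMIT;
```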

10.5 Volume Estimates

Assumptions: 5 active agents, ~10 tasks created/day, ~3 subtasks/task average, watchdog runs 1,440×/day.

Main OTM DB (otm.db):

Table Writes/day Reads/day Steady-state rows Disk (est.)
agents ~10 (status flips) ~500 (every handler checks registry) 5–15 <1 KB
task_history ~1–2 (archival events) ~5 (reporting queries) ~500/year ~500 KB
sure_pending ~20 (insert + update on ack) ~1,440 (timeout checks) 0–5 (transient) <1 KB
Subtotal ~30/day ~1,950/day ~520 <1 MB

Error & Event DB (otm-errors.db):

Table Writes/day Reads/day Steady-state rows Disk (est.)
event_log ~300 (all events) ~50 (error monitor + queries) ~9,000 (30-day window) ~5 MB
event_log_archive ~9,000/month (from rotation) ~5/month (admin queries) ~100,000 (1-year window) ~50 MB
error_stats ~20 (counter increments) ~1,440 (every watchdog cycle) 12 <1 KB
Subtotal ~320/day ~1,500/day ~109,000 ~55 MB

📌 At this scale, SQLite is well within its performance envelope for both databases. The separation means the error DB can be independently analysed, reset, or rebuilt without affecting OTM operations. A weekly VACUUM on each keeps files compact.

10.6 Historicisation Strategy

Main OTM DB (otm.db)
  ├── agents              — live state, small, never archived
  ├── sure_pending        — transient, cleaned hourly
  └── task_history        — growing archive of completed tasks

Error & Event DB (otm-errors.db)
  ├── event_log           — rolling 30-day window
  ├── event_log_archive   — rolling 1-year window
  └── error_stats         — cumulative counters, never purged

Filesystem log files (otm/logs/)
  ├── otm-events.log      — real-time event mirror (rotated daily, 30-day keep)
  └── otm-errors.log      — real-time error mirror (rotated daily, 30-day keep)

Monthly export (filesystem)
  └── {OPENCLAW_DATA_DIR}/otm/archive/
      ├── events-2026-01.sql.gz    — monthly event log dump from otm-errors.db
      ├── events-2026-02.sql.gz
      └── ...

Annual report (generated)
  └── Aggregate stats from task_history (otm.db) + error_stats (otm-errors.db)
      → Feeds into CMMI metrics collection

Non-Functional Requirements

Idempotency

  • All OTM handlers MUST be idempotent
  • Subtask completion: check todo_completed (Col00) field before processing (no separate dedup table)
  • State transitions: verify previous_status matches expected before applying
  • File pipeline dedup: injector checks existing Slack items by Task ID before creating
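
The injector's dedup gate reduces to filtering incoming pipeline tasks against the Task IDs already present in the Slack List. A sketch of that filter (the existing-ID lookup itself is assumed to happen via the Lists API and is not shown):

```typescript
// Illustrative dedup gate: drop incoming tasks whose Task ID already
// exists in the Slack List, so re-running the injector is a no-op.
type PipelineTask = { taskId: string; title: string };

function dedupeTasks(incoming: PipelineTask[], existingIds: Set<string>): PipelineTask[] {
  return incoming.filter((t) => !existingIds.has(t.taskId));
}
```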

State Integrity

  • previous_status MUST be set before every status change
  • All status writes go through OTM → SW-1. No direct Slack writes by any actor.
  • Watchdog requests transitions via OTM handler calls, not direct writes
  • todo_completed (Col00) MUST be set alongside status when marking items done
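
The `previous_status` rule can be enforced as a compare-and-set guard around every write; a sketch with illustrative names:

```typescript
// Compare-and-set transition guard: the write is applied only if the
// task's current status still matches what the requester observed,
// and previous_status is set before the change, per State Integrity.
type Task = { status: string; previous_status: string | null };

function applyTransition(task: Task, expectedCurrent: string, next: string): boolean {
  if (task.status !== expectedCurrent) return false; // stale request, reject
  task.previous_status = task.status;                // set before every change
  task.status = next;
  return true;
}
```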

Audit Trail

  • Every OTM event logged in Slack task conversation feed with timestamp (§7)
  • Slack conversation IS the audit log — no separate log table
  • All log entries include event code (IE-xx, OE-xx, TT-xx, AT-xx) for traceability

Persistence

  • Agent registry in SQLite
  • SQLite DB location: OpenClaw server data directory
  • Startup reconciliation from openclaw.json + Slack List on OTM restart (§4.6)

Latency

  • Event handling MUST complete within 5 seconds
  • Slack API writes SHOULD complete within 5 seconds
  • Agent notifications SHOULD be sent within 10 seconds
  • SURE acknowledgement timeout: 1 min (first), 2 min (retry), then error (3 min total)
  • Dispatcher runs every 2 min — maximum 2 min delay from task creation to agent trigger
  • Completion detector runs every 2 min — maximum 2 min delay from last subtask to agent_done

Error Handling

  • Slack API calls: retry up to 3 times with exponential backoff
  • Unhandled errors: alert admin via OE-05
  • Tasks MUST NOT be silently lost
  • All errors logged in task conversation feed (OE-07)
  • Dispatcher crash: Slack assigned status is the sole dedup gate (System 3, §1.2)
  • Gateway idempotencyKey does NOT provide real idempotency — dispatcher owns dedup
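
The retry policy for Slack API calls might be wrapped as a small helper. The 500 ms base delay is an assumption; the spec fixes only the retry count:

```typescript
// Retry an async Slack call up to `retries` times with exponential
// backoff (baseMs, 2*baseMs, 4*baseMs, ...). On exhaustion the error
// propagates to the caller, which can escalate (e.g. an OE-05 alert).
async function withRetry<T>(fn: () => Promise<T>, retries = 3, baseMs = 500): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      await new Promise((r) => setTimeout(r, baseMs * 2 ** attempt));
    }
  }
}
```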

Security

  • OTM-4 (validate/reject) restricted to registered reviewer agents
  • All Slack API calls authenticated via bot tokens
  • DMZ relay uses bearer token + constant-time comparison (§System 6)
  • Receiver bound to 127.0.0.1 only
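
The bearer token check on the DMZ relay receiver can use Node's `timingSafeEqual`. Hashing both sides first equalises buffer lengths, so a wrong-length token neither throws nor leaks length information. A sketch, not the deployed receiver.js:

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Constant-time bearer token comparison for the DMZ relay receiver.
// Both values are hashed to fixed-length digests before comparing.
function tokenMatches(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```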

Technology Stack

Component Technology Owner (who runs it) Activity frequency Data volume
OTM backend TypeScript/Node.js, OpenClaw plugin pipeline Devdas (builds), Sylvain (deploys) Continuous — handles all events ~350 events/day processed
File pipeline Bash scripts + Node.js (injector, dispatcher, detector) Devdas (builds), Sylvain (deploys) launchd: watcher (FSEvents), sweeper (10 min), dispatcher (2 min), detector (2 min) ~10 tasks/day through pipeline
SQLite DB better-sqlite3, WAL mode OTM (sole writer), Sylvain (backups) ~350 writes/day, ~3,500 reads/day ~55 MB steady state (see §10.5)
SE-1 (event listener) Slack Events API, socket mode, Bolt SDK Salvatore's Slack app (lists:read) ~50 events/day (Slack → OTM) <1 KB/event payload
SW-1 (writer) Slack Web API (lists:write) Salvatore's Slack app (called by OTM) ~200 API calls/day (field updates + audit posts) <1 KB/call
OTM API OpenClaw hooks / HTTP endpoints OTM (receives), Orchestrator + Agents (call) ~100 API calls/day <1 KB/call
Agent notifications OpenClaw Gateway WS RPC (AI) / Slack task conversation (Human) OTM (sends), Agents (receive) ~20 notifications/day <1 KB/notification
Watchdog + Error Monitor OpenClaw cron (60s interval) OTM-6 + OTM-7 (automatic) 1,440 cycles/day ~20 error checks/cycle
Event logging SQLite event_log table (see §9) OTM (writes), Admin (queries) ~300 events/day, 30-day active window ~5 MB active, ~50 MB archive
DMZ relay Node.js receiver + broadcaster on Synology NAS Sylvain (deploys) On every state change <1 KB/push
Testing Vitest, mock Slack API Devdas (writes + runs) CI on every PR

Component Ownership Map

┌─────────────────────────────────────────────────────┐
│ Slack (Salvatore's Slack App)                       │
│   SE-1: lists:read (event listener)                 │
│   SW-1: lists:write (field updates + audit posts)   │
└──────────────┬──────────────────────────┬───────────┘
               │ IE-01                    ▲ OE-06, OE-07
               ▼                          │
┌─────────────────────────────────────────────────────┐
│ OTM (OpenClaw Plugin Pipeline)                      │
│   OTM-0: Event Router (internal)                    │
│   OTM-1: Agent Registry (internal)                  │
│   OTM-2: Handle Task Assigned                       │
│   OTM-3: Handle Subtask Done                        │
│   OTM-4: Task Validate/Reject                       │
│   OTM-5: Handle Task Cancelled                      │
│   OTM-6: Watchdog (cron, 60s)                       │
│   OTM-7: Error Monitor (cron, 60s)                  │
│                                                     │
│   SQLite DB: agents, event_log, error_stats,        │
│              task_history, sure_pending              │
└──────────────┬──────────────────────────┬───────────┘
               │ OE-01, OE-02, OE-04     ▲ IE-02, IE-03, IE-04–IE-07
               ▼                          │
┌─────────────────────────────────────────────────────┐
│ OpenClaw Agents                                     │
│   Orchestrator (Claudia): IE-04, IE-05, IE-06, IE-07│
│   Agents (Devdas, etc.): IE-02, IE-03              │
│   Human (Rupert): via Slack UI → SE-1 → IE-01      │
└─────────────────────────────────────────────────────┘

File Pipeline (Part 3):
┌─────────────────────────────────────────────────────┐
│ otm-create-task.sh → new-tasks/ → otm-injector.js  │
│   → Slack List (status=new)                         │
│   → otm-dispatcher.js → task-dispatch.json          │
│   → Gateway WS RPC → Agent sessions (parallel)      │
│   → otm-update-task.sh → task-updates/             │
│   → otm-completion-detector.js → agent_done        │
└─────────────────────────────────────────────────────┘

11. Cost Analysis

11.1 OTM Infrastructure Cost

Component Cost Notes
Slack Pro 1 user license Already paid. Gives API access (SE-1 + SW-1). No per-API-call cost.
SQLite $0 Open source, embedded. No server, no license.
Node.js / TypeScript $0 Open source runtime.
Bolt SDK $0 Open source Slack SDK.
better-sqlite3 $0 Open source library.
OpenClaw $0 (incremental) OTM runs as a plugin inside the existing gateway. No additional instance.
Filesystem logging $0 Append to local files.
DMZ relay $0 (incremental) Runs on existing Synology NAS.
Total OTM cost $0 incremental Only pre-existing Slack Pro license required.

11.2 AI Usage by the OTM

The OTM uses zero AI. It is a deterministic state machine implemented in TypeScript. No LLM calls, no embeddings, no inference. Every decision is rule-based:

  • Routing: field comparison (OTM-0)
  • Agent availability: SQLite lookup (OTM-1)
  • State transitions: precondition checks + status writes (OTM-2 through OTM-5)
  • Error detection: SQL queries against known patterns (OTM-7)
  • Watchdog: timer + threshold checks (OTM-6)
  • File pipeline: filesystem watches + Slack API calls (Systems 1–4)

Token consumption by OTM: 0 tokens.

11.3 AI Usage by Actors (Outside OTM)

The actors that interact with the OTM do consume AI tokens, but this is outside the OTM's scope:

Actor AI usage OTM interaction cost
Orchestrator (Claudia) LLM calls for task planning, review, rework design OTM API calls = HTTP requests, ~0 tokens
Agents (Devdas, etc.) LLM calls for task execution OTM API calls (IE-02, IE-03) = HTTP requests, ~0 tokens
Human (Rupert) None (uses Slack UI) Slack events = Slack infrastructure, ~0 tokens

📌 The OTM API calls (IE-02 through IE-07) are simple HTTP POST requests with JSON payloads. They consume zero AI tokens. The only AI costs are generated by the agents and orchestrator doing their actual work — which they would do regardless of whether the OTM exists.

11.4 Cost Summary

OTM operation cost:     $0/month (zero AI, zero external services)
Slack API cost:         $0/month (included in existing Pro plan)
Infrastructure cost:    $0/month (runs on existing OpenClaw server + Synology NAS)
──────────────────────────────────────────────────────
Total incremental cost: $0/month

12. Project Deliverables

# Deliverable Owner Description
D-01 OTM-SPEC (this document) Claudia Specification and architecture
D-02 OTM-TESTS Claudia Test scenarios document
D-03 OTM implementation Devdas TypeScript plugin pipeline (OTM-0 through OTM-7, SE-1, SW-1)
D-04 SQLite schemas + migrations Devdas otm.db and otm-errors.db setup
D-05 Unit + integration tests Devdas Vitest test suite matching OTM-TESTS scenarios
D-06 Infrastructure setup Sylvain DB paths, cron config, backup setup, log rotation
D-07 task-orchestration skill Claudia OpenClaw skill for Claudia's Orchestrator role: task creation, project → step → task decomposition, assignment logic, validation/rejection, rework subtask design. This skill encodes the Orchestrator's side of the OTM protocol.
D-08 Slack app config Salvatore lists:read + lists:write scopes, socket mode setup
D-09 End-to-end validation Claudia + Devdas Full test suite execution on real Slack workspace
D-10 File pipeline scripts Devdas otm-create-task.sh, otm-update-task.sh, otm-injector.js, otm-dispatcher.js, otm-completion-detector.js
D-11 DMZ relay deployment Sylvain receiver.js + broadcaster.js on Synology NAS, TLS proxy setup
D-12 launchd plists Sylvain Watcher, sweeper, dispatcher, completion detector plists

📌 D-07 (task-orchestration skill) will include Rupert's higher-level instructions on how to break down projects into steps and steps into tasks. It is part of the scope of the full-blown validation tests (D-09).


Open Questions

# Question Status
1 Exact list_item_updated event payload schema Needs Salvatore to capture sample events
2 Can socket mode receive list events on Pro? Needs verification (may need Events API HTTP mode)
3 Plugin pipeline registration mechanism in OpenClaw Needs Devdas to investigate
4 SQLite file location on OpenClaw server Sylvain to decide
5 How agents tick subtasks in practice Resolved in v1.3: Agents call OTM API (IE-02), OTM updates Slack via SW-1. No direct Slack UI interaction.
6 Slack conversation API for List items — does it exist? Needs Salvatore to verify (may need workaround)
7 SURE ack timeout values and gateway restart handling Resolved in v1.5: Timeouts revised to 1min + 2min + error (3min total). Gateway restart handling defined in §4.7: OTM reconciles from Slack List + openclaw.json on startup, detects SURE timeouts, auto-corrects agent-task mismatches. Gateway restart logged as system event (IE-SYS-01). Orchestrator does not need to re-register agents.
8 OpenClaw agent → Slack user ID mapping in openclaw.json Needs Sylvain to confirm config structure
9 Human user registration protocol Deferred. v1 hard-codes Rupert + Claudia (§4.6). Future: how are new human users registered? Auto-detect from Slack assigned_to? Manual admin command? Re-registration after OTM restart? What about clients?
10 Slack archive API — can items be archived programmatically? Resolved in v1.6: slackLists.items.archive does not exist. Archiving is manual-only via Slack UI. OTM sets status = archived but cannot trigger visual Slack archival.
11 Gateway idempotencyKey — does it prevent duplicate sessions? Resolved in v1.8: No. The key is used as a runId label only. Duplicate calls with the same key = duplicate sessions. Dispatcher must use the Slack new → assigned status flip as the sole dedup mechanism.

Deprecated Items

Item ID/Flag Notes
Project (old select column) Col0AL4UJ8BJ8 Replaced by Project 2 text column (Col0ALZBS9C8Z) — 2026-03-14
Types: implementation, research, etc. Removed — only action / decision / review are active
--description parameter Removed from otm-create-task.sh — use --subtask instead
--creator parameter Removed — merged with --agent
--assignedTo parameter Removed — merged with --agent
TT-09 (Rejected → New) Removed in v1.3 — use TT-10 (cancel) + new task instead

End of Specification — v1.8 — OpenClaw Task Manager

Last updated: 2026-03-14

OTM-SPEC v1.8 — OpenClaw Task Manager Specification

Based on: Rupert's OTM Spec v1.0 (2026-03-12) Updated by: Claudia (2026-03-12–14) Merged by: Claudia (2026-03-14) — consolidates OTM-SPEC v1.5 + OTM-FIELD-MAPPING v1.8


Changelog

Note: v1.0–v1.5 changes were tracked in OTM-SPEC. v1.0–v1.8 field mapping changes were tracked in a separate OTM-FIELD-MAPPING document. Both changelogs are merged here.

Version Date Source Changes
v1.0 2026-03-12 OTM-SPEC Initial specification: state machine, SURE protocol, SE-1/SW-1, OTM-0 through OTM-6, actor registry, audit trail
v1.0 2026-03-14 OTM-FIELD-MAPPING Initial field mapping: script params, JSON format, Slack field mapping
v1.1 2026-03-12 OTM-SPEC Added rework flow (TT-08), Rejected state, Orchestrator manages subtasks
v1.1 2026-03-14 OTM-FIELD-MAPPING Added subtask support, task ID system (T-NNNNN)
v1.2 2026-03-12 OTM-SPEC Added Watchdog (OTM-6), archival (TT-12), Pending state (TT-03/TT-05)
v1.2 2026-03-14 OTM-FIELD-MAPPING Added pipeline flow diagram, directories
v1.3 2026-03-12 OTM-SPEC Removed TT-09, added Priority Scale, human actor type. Resolved Q5: agents call OTM API for subtask completion (not Slack directly).
v1.3 2026-03-14 OTM-FIELD-MAPPING Added completion detection (System 4)
v1.4 2026-03-12 OTM-SPEC Added Error Monitor (OTM-7), error catalogue (ERR-01 through ERR-12), dual-DB design
v1.4 2026-03-14 OTM-FIELD-MAPPING Added component heartbeats (System 5)
v1.5 2026-03-12 OTM-SPEC Human notifications via Slack task conversation (not DM). Error reports daily, threshold=1. Error monitoring in separate otm-errors.db. Only Orchestrator creates/deletes tasks. subtasks_remaining = count of unfinished subtasks. Added §12 Cost Analysis. SURE timeouts: 1+2min. Gateway restart detection (§4.7). Log file mirroring. task-orchestration skill added as deliverable.
v1.5 2026-03-14 OTM-FIELD-MAPPING Default confirmation subtask; deprecated fields cleanup
v1.6 2026-03-14 OTM-FIELD-MAPPING Added DMZ relay architecture (System 6), todo_completed (Col00) documentation, project label fixes
v1.7 2026-03-14 OTM-FIELD-MAPPING Added task dispatcher (System 3), task updates (System 3b), full architecture diagram, completion metrics, status lifecycle
v1.8 2026-03-14 OTM-FIELD-MAPPING Dispatcher triggers agents via Gateway WebSocket RPC (parallel, non-blocking); crash-safe operation order (file → Slack → RPC); idempotencyKey is NOT idempotent (design flaw documented); todo_completed checkbox (Col00) documentation; injector idempotency (dedup check); Slack archive API limitation noted
v1.8 2026-03-14 MERGED Consolidated OTM-SPEC v1.5 + OTM-FIELD-MAPPING v1.8 into single unified specification. Task ID format standardized to T-NNNNN (v1.8 implementation). Added Part 3 (File-Based Pipeline). All Slack column IDs included.

1. Purpose & Scope

The OpenClaw Task Manager (OTM) orchestrates the execution of tasks by AI and human agents. It uses Slack Lists as the task board, the Slack Events API as the event bus for human-originated changes, and the OTM API as the interface for AI actors.

The system is split into three cooperating layers:

  1. Slack Event Layer (SE-1) — Slack Events API (list_item_updated) detected by our Slack app in socket mode. Zero intelligence. Forwards raw events to a single OTM entry point. Requires only lists:read scope.
  2. Slack Write Layer (SW-1) — Handles all writes to Slack: task field updates and conversation feed audit entries. Requires lists:write scope. Called only by the OTM.
  3. OpenClaw Task Manager (OTM) — Authoritative backend implemented as an OpenClaw plugin pipeline. Owns all state, routing logic, agent registry, and business rules. Persists to SQLite. Sole component with the ability to change task status, agent status, and counter fields.

📌 Design principles:

  • ALL behaviour lives in the OTM. No business logic in SE-1 or SW-1.
  • The OTM is the sole writer of task status and related fields. No actor writes status directly.
  • Only the Orchestrator creates tasks and subtasks. The OTM never creates or deletes tasks — it manages their lifecycle after creation. The Orchestrator also manages subtask lists (creates, deletes, keeps completed) during rework flows; the OTM processes the resulting state changes.
  • ALL events are logged in the Slack task conversation feed with timestamps (§7).
  • Agent notifications use the SURE protocol: request + mandatory acknowledgement (§6).

1.1 Actors

Actor Current holder Type Role
Orchestrator Claudia AI Creates tasks, sets assignments, validates completed work — always via OTM API
Agent Devdas, Salvatore, etc. AI Executes tasks, reports progress directly to OTM API
Human Rupert, clients Human Creates/edits tasks in Slack UI; changes detected by SE-1 and forwarded to OTM
OTM (system) System Authoritative state machine, sole writer of all status and counter fields
Watchdog OTM-6 cron System Recovery cron — detects anomalies, requests OTM to execute corrective transitions

📌 Humans are identified by their Slack user ID. AI agents are identified by both their Slack user ID and their OpenClaw agent ID. Both types are managed in the same agent registry (§4.5).


1.2 System Architecture Overview

The OTM is composed of six cooperating systems (System 3 includes a 3b update path), spanning task creation through dashboard visibility:

┌─────────────────────────────────────────────────────────────────────────────────┐
│                          OTM — OpenClaw Task Manager                           │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 1: Task Creation                                                  │   │
│  │                                                                          │   │
│  │  Claudia (orchestrator)                                                  │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-create-task.sh ──► JSON file ──► ~/…/otm/new-tasks/                │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  next-task-id.json (T-NNNNN counter, flock)                             │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼                                                                        │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 2: Task Injection                                                 │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-watcher (WatchPaths)                                   │   │
│  │  ai.openclaw.otm-sweeper (every 10 min)                                 │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-injector.js ──► Slack Lists API ──► Rapido Task Campaign           │   │
│  │       │                (slackLists.items.create + subtasks)              │   │
│  │       ▼                                                                  │   │
│  │  processed/ or failed/                                                   │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (task exists in Slack with status=new)                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 3: Task Dispatcher                                                │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-dispatcher (every 2 min)                               │   │
│  │       │                                                                  │   │
│  │       ├──► Scan: status=new + assignee set                              │   │
│  │       ├──► Write task-dispatch.json to agent workspace                  │   │
│  │       ├──► Trigger agent via Gateway WS RPC (parallel)                  │   │
│  │       └──► Update Slack: new → assigned + set assigned_at               │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (agent session starts, picks up task, works, reports progress)        │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 3b: Task Updates (agent → Slack feedback)                         │   │
│  │                                                                          │   │
│  │  otm-update-task.sh ──► JSON ──► ~/…/otm/task-updates/                  │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-injector.js ──► Slack Lists API (update status, subtask done)      │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (all subtasks done → auto-promote)                                    │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 4: Completion Detection                                           │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-completion-detector (every 2 min)                      │   │
│  │       │                                                                  │   │
│  │       ├──► Scan: status=in_progress + all subtasks done                 │   │
│  │       ├──► Update: completion % + subtasks_remaining                    │   │
│  │       └──► Promote: in_progress → agent_done                           │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (Claudia validates → done)                                            │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 5: Component Heartbeats                                           │   │
│  │                                                                          │   │
│  │  Each component writes *-state.json after every run                     │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  Collector (FSEvents) → SQLite → Reader (WebSocket) → Dashboard         │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 6: DMZ Relay                                                      │   │
│  │                                                                          │   │
│  │  Collector ──HTTP POST──► Synology Receiver (127.0.0.1:3456)            │   │
│  │                                 │                                        │   │
│  │                                 ▼                                        │   │
│  │                           fab-state.json                                 │   │
│  │                                 │                                        │   │
│  │                                 ▼                                        │   │
│  │                      Broadcaster (0.0.0.0:3457)                         │   │
│  │                           │            │                                 │   │
│  │                    WSS /ws        GET /api/state                         │   │
│  │                           │            │                                 │   │
│  │                           ▼            ▼                                 │   │
│  │                    Vercel Dashboard (browser)                            │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ Task Lifecycle                                                           │   │
│  │                                                                          │   │
│  │  new ──► assigned ──► in_progress ──► agent_done ──► done               │   │
│  │   │         │              │              │                              │   │
│  │   │  (dispatcher)  (agent starts)  (completion     (Claudia            │   │
│  │   │                                  detector)       validates)          │   │
│  │   │                                                                      │   │
│  │   └──► blocked (can happen at any stage)                                │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────────┘
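
System 1's task ID allocation reduces to zero-padding the next-task-id.json counter into the T-NNNNN format; the flock-guarded read-increment-write is handled by otm-create-task.sh and omitted here. A sketch of the formatting step only:

```typescript
// Format a counter value from next-task-id.json as a T-NNNNN task ID
// (5 digits, zero-padded), per the task data model.
function formatTaskId(n: number): string {
  if (!Number.isInteger(n) || n < 0 || n > 99999) {
    throw new RangeError(`counter out of range: ${n}`);
  }
  return `T-${String(n).padStart(5, "0")}`;
}
```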

2. Task Data Model

Each task is a top-level item in the Slack Task Board List. Subtasks are child items (linked via parent_item_id). The OTM is the sole writer of status and counter fields.

Slack List: F0ALE8DCW1F (Rapido Task Campaign v2), workspace: rapidocloud.slack.com

2.1 Task Item Fields

Field Type Writer Description
title text Orchestrator Task name
task_id text OTM Unique ID (T-NNNNN format — 5 digits, zero-padded)
assigned_to person Orchestrator Slack user ID of assigned agent (AI or Human)
status select OTM Current task state (see §3). OTM is sole writer — no exceptions
previous_status select OTM Status before the last transition. Critical for failure analysis
priority number Orchestrator 0=Critical … 4=Batchable (see §3.4)
context select Orchestrator project, research, operations, support, internal
subtasks_remaining number OTM Decremented counter, NOT live count
assigned_at datetime OTM When agent started work
completed_at datetime OTM When all subtasks done
validated_at datetime OTM When Orchestrator validated
result_summary text Agent Deliverables/output description
input_files text Orchestrator Links to input resources

2.2 Subtask Item Fields

Field Type Writer Description
title text Orchestrator Subtask description
todo_completed checkbox OTM (via SW-1) Built-in Slack Lists checkbox (Col00). Ticked when subtask is done. Must be set alongside status when marking items as done (see below).
parent_item_id reference System Links to parent task

📌 subtasks_remaining on the parent task is the canonical completion signal — not a live count.

📌 previous_status is set by the OTM on every transition. It enables post-mortem analysis when a task enters Failed state.

📌 Agents do NOT tick checkboxes directly in Slack. They report subtask completion to the OTM API (IE-02). The OTM then updates Slack via SW-1 — setting both the item status column and Col00 (checkbox).

todo_completed (Col00) Checkbox

The todo_completed field (Col00) is the built-in Slack Lists checkbox. It drives the visual checkmark ✅ in the Slack UI. Setting only the Status column to done does NOT check the box — both must be set explicitly.

Scenario Action
Subtask marked done Set Col00: checkbox: true on the subtask
Parent task marked done Set Col00: checkbox: true on the parent
Parent task at agent_done or in_progress Do NOT set checkbox (task isn't finished yet)

⚠️ Archive limitation: Slack Lists has a UI "Archive item" action, but slackLists.items.archive does not exist as an API method. Archiving is manual-only via the Slack UI. The OTM sets status to archived via the pipeline, but the actual Slack archive action cannot be automated.
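
The both-fields rule can be made explicit in the SW-1 payload builder. The cell shapes below are assumptions modelled on the column IDs in this section, not a verified slackLists wire format:

```typescript
// Illustrative SW-1 cell payload: marking an item done sets BOTH the
// Status select column and the built-in Col00 checkbox. Setting the
// status alone leaves the Slack checkmark unticked.
type Cell = { item_id: string; column_id: string; select?: string[]; checkbox?: boolean };

const STATUS_COL = "Col0AL1B4UVLJ"; // Status (select)
const CHECKBOX_COL = "Col00";       // todo_completed (checkbox)

function doneCells(itemId: string): Cell[] {
  return [
    { item_id: itemId, column_id: STATUS_COL, select: ["done"] },
    { item_id: itemId, column_id: CHECKBOX_COL, checkbox: true },
  ];
}
```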

2.3 JSON → Slack Field Mapping

This table is operationally critical — it maps the JSON task format (used in the file pipeline) to Slack column IDs.

JSON Field Slack Column Column ID Slack Type Notes
title Title Col0AKKTBJJKZ rich_text Clean one-liner only
taskId Task ID Col0ALVK2NA1E rich_text Format: T-NNNNN (5 digits, zero-padded)
type Type Col0AKUV4BF6F select action | decision | review
agent Assignee Col0AKZ9G5UAJ select Covers both agents and humans
project Project 2 Col0ALZBS9C8Z rich_text Free-text (migrated from select 2026-03-14)
priority Priority Col0ALE8DKWPK select See §3.4 priority mapping
(auto) Status Col0AL1B4UVLJ select Always set to new on creation
subtasks[] (child items) parent_item_id Each entry → child item with title + status new

Slack Built-in Fields:

Field Column ID Type Notes
todo_completed Col00 checkbox Built-in Slack Lists checkbox. Must be set explicitly alongside status.

Fields NOT mapped to Slack columns (metadata only):

| JSON Field | Purpose |
| --- | --- |
| `id` | UUID for file tracking / idempotency |
| `createdAt` | Timestamp, implicit in Slack item creation |
| `status` | Internal pipeline status (pending → processed) |

Deprecated Slack Columns:

| Item | Column ID | Notes |
| --- | --- | --- |
| Project (old select column) | `Col0AL4UJ8BJ8` | Replaced by Project 2 text column (2026-03-14) |

3. Task State Machine

The status field follows these transitions. The OTM is the sole writer — all transitions are executed by the OTM, regardless of which actor requested them.

3.1 Task States

| Status | Description |
| --- | --- |
| New | Task created, not yet assigned |
| Assigned | Orchestrator has set an assignee; OTM evaluating agent availability |
| Pending | Agent is busy; task queued silently (no notification) |
| In Progress | Agent is actively working (SURE acknowledgement received) |
| Agent Done | All subtasks complete; awaiting Orchestrator review |
| Done | Orchestrator validated the work |
| Rejected | Orchestrator rejected; Orchestrator preparing rework subtasks |
| Failed | Unrecoverable error during execution |
| Cancelled | Task no longer needed; removed from active work |
| Archived | Terminal state; auto-moved 7 days after Done/Cancelled |

3.2 Task State Transition Diagram

```
                    Orchestrator creates task
                              |
                           [New]
                              |
              TT-01: Orch requests assignment → OTM executes
                              |
                         [Assigned]
                        /          \
                 TT-02: OTM      TT-03: OTM
              (IE-01 + agent   (IE-01 + agent
               idle in reg)    busy in reg)
                      |              |
               [In Progress]    [Pending]
               (after SURE ack)      |
                      |         TT-05: OTM promotes
         TT-04: OTM receives    (IE-09 + pending
         IE-02 subtask reports    task found)
                      |              |
           subtasks_remaining=0      |
                      |              |
                [Agent Done] <------/
                   /    \
          TT-06: Orch  TT-07: Orch
          validates     rejects
          (IE-04)       (IE-05)
                |           |
             [Done]    [Rejected]
               |        /      \
         TT-12: OTM  TT-08    TT-10
         (IE-08+7d)  (IE-06)  (IE-06)
               |       |        |
          [Archived] [Assigned] [Cancelled]
                     (rework)    (drop)

     At any point before Agent Done:
         TT-11: Orch/Human requests cancel (IE-06/IE-01) → OTM executes
     From Done/Cancelled, 7 days after completion:
         TT-12: Watchdog requests archive (IE-08 + 7d check) → OTM executes
     From In Progress:
         TT-13: OTM detects error (IE-10) → [Failed]
     From Failed:
         TT-14: Orch requests retry (IE-07) → [New]
         TT-15: Orch/Human requests cancel (IE-06/IE-01) → [Cancelled]
```

3.2.1 Task Transition Action Index

Each task transition is coded TT-xx. All transitions are executed by the OTM. The "Requesting Actor" is who initiates; the OTM validates and applies.

Orchestrator-requested transitions (executed by OTM):

| Code | From → To | Requesting Actor | Inbound Event | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- | --- |
| TT-01 | New → Assigned | Orchestrator | IE-01: assigned_to field changed | Validate assignment, set status | OE-06, OE-07 |
| TT-06 | Agent Done → Done | Orchestrator | IE-04: validate API call | Set validated_at, change status | OE-06, OE-07, OE-02 |
| TT-07 | Agent Done → Rejected | Orchestrator | IE-05: reject API call | Change status, post reason | OE-06, OE-07 |
| TT-08 | Rejected → Assigned | Orchestrator | IE-06: rework API call (subtasks already prepared by Orchestrator) | Count unfinished subtasks, set counter, change status | OE-06, OE-07, OE-01 |
| TT-10 | Rejected → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Change status | OE-06, OE-07 |
| TT-11 | Any (pre-Agent Done) → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Free agent, change status | OE-06, OE-07, OE-04 |
| TT-14 | Failed → New | Orchestrator | IE-07: retry API call | Reset task, change status | OE-06, OE-07 |
| TT-15 | Failed → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Change status | OE-06, OE-07 |

OTM-initiated transitions (automated):

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- |
| TT-02 | Assigned → In Progress | IE-01: status changed to Assigned (OTM-0 routes to OTM-2, agent registry query returns Idle) | Execute AT-01, write counter, send SURE notification | OE-01, OE-06, OE-07 |
| TT-03 | Assigned → Pending | IE-01: status changed to Assigned (OTM-0 routes to OTM-2, agent registry query returns Busy) | Queue task silently | OE-06, OE-07 |
| TT-04 | In Progress → Agent Done | IE-02: agent reports subtask done (OTM-3 decrements counter to 0) | Execute AT-02, set completed_at | OE-02, OE-06, OE-07 |
| TT-05 | Pending → Assigned | IE-09: OTM internal — check_next_task_for_agent() finds pending task after AT-02/AT-03/AT-04 | Re-evaluate via OTM-2 path | OE-06, OE-07 |
| TT-13 | In Progress → Failed | IE-10: OTM detects agent error (OpenClaw hook timeout >5min / agent crash / unhandled exception reported) | Store previous_status, execute AT-04 | OE-05, OE-06, OE-07 |

Watchdog-requested transitions (executed by OTM):

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- |
| TT-12 | Done/Cancelled → Archived | IE-08: watchdog cron tick (OTM-6 checks completed_at or cancelled_at + 7 days < now) | Archive task | OE-06, OE-07 |

📌 Cross-reference: See §5 for TT-xx ↔ AT-xx ↔ handler mapping. See §6 for SURE protocol. See §7 for audit trail.

3.3 Task Transition Rules

All transitions are submitted to the OTM which validates preconditions and executes the state change. Every transition produces at minimum OE-06 (Slack field update) and OE-07 (audit log entry).

| Code | From | To | Inbound Event | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- | --- |
| TT-01 | New | Assigned | IE-01: assigned_to changed on New task | Validate agent exists in registry, set status | OE-06, OE-07 |
| TT-02 | Assigned | In Progress | IE-01: status=Assigned detected + agent Idle in registry | Set agent busy (AT-01), init counter, send SURE task notification | OE-01, OE-06, OE-07 |
| TT-03 | Assigned | Pending | IE-01: status=Assigned detected + agent Busy in registry | Queue task, no notification | OE-06, OE-07 |
| TT-04 | In Progress | Agent Done | IE-02: subtask completion report + counter decrements to 0 | Set agent idle (AT-02), set completed_at, notify Orchestrator | OE-02, OE-06, OE-07 |
| TT-05 | Pending | Assigned | IE-09: internal check_next_task_for_agent() + pending task found | Re-route to OTM-2 (same as TT-01 path) | OE-06, OE-07 |
| TT-06 | Agent Done | Done | IE-04: Orchestrator validate call | Set validated_at, change status | OE-02, OE-06, OE-07 |
| TT-07 | Agent Done | Rejected | IE-05: Orchestrator reject call with reason | Change status, log reason | OE-06, OE-07 |
| TT-08 | Rejected | Assigned | IE-06: Orchestrator rework call (subtasks already prepared) | Count unfinished subtasks, set counter, change status | OE-06, OE-07 |
| TT-10 | Rejected | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Change status | OE-06, OE-07 |
| TT-11 | New/Assigned/Pending/In Progress | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Free agent if applicable (AT-03), change status | OE-04, OE-06, OE-07 |
| TT-12 | Done/Cancelled | Archived | IE-08: watchdog tick + 7-day check passes | Archive task | OE-06, OE-07 |
| TT-13 | In Progress | Failed | IE-10: agent error detected (hook timeout/crash/exception) | Store previous_status, free agent (AT-04) | OE-05, OE-06, OE-07 |
| TT-14 | Failed | New | IE-07: Orchestrator retry call | Reset task fields, change status | OE-06, OE-07 |
| TT-15 | Failed | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Change status | OE-06, OE-07 |

📌 TT-09 (Rejected → New) removed in v1.3. Reassignment is handled by the Orchestrator cancelling the rejected task (TT-10) and creating a new task for a different agent. This simplifies the state machine.

3.4 Priority Scale

| Value | Label | Meaning | `--priority` flag |
| --- | --- | --- | --- |
| 0 | Critical | Blocking other work, immediate attention | `critical` |
| 1 | High | Important, do next | `high` |
| 2 | Medium/Normal | Normal priority | `normal` (or `medium` as alias) |
| 3 | Low | When bandwidth allows | `low` |
| 4 | Batchable | Large/expensive work, can run async via Batch API | `batchable` |

Queue ordering: priority ASC, posted_at ASC (0 = highest priority, FIFO within same priority).


4. Agent State Machine

The OTM maintains agent availability state in the Agent Registry (OTM-1). Agent transitions are coded AT-xx and are distinct from task transitions (TT-xx).

4.1 Agent States

| Status | Description |
| --- | --- |
| Idle | Agent is available, not working on any task |
| Busy | Agent is actively working on a task (current_task is set) |

4.2 Agent State Transition Diagram

```
              [Idle]
                |
         AT-01: OTM assigns task
         (triggered by TT-02)
         (SURE notification sent → OE-01)
                |
             [Busy]
                |
         AT-02: task completes (TT-04, IE-02 counter=0)
         AT-03: task cancelled (TT-11, IE-01/IE-06)
         AT-04: task fails (TT-13, IE-10)
                |
             [Idle]
                |
         → OTM calls check_next_task_for_agent() (IE-09)
         → if pending task found: TT-05 → AT-01 again
```

4.3 Agent Transition Action Index

All agent transitions are executed by the OTM. No actor changes agent status directly.

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- |
| AT-01 | Idle → Busy | IE-01: Assigned event + agent Idle (during OTM-2) | Set status=busy, current_task=task_id, task_started_at=now | OE-01 (SURE notification), OE-07 (audit) |
| AT-02 | Busy → Idle | IE-02: subtask report + counter=0 (during OTM-3) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-07 (audit) |
| AT-03 | Busy → Idle | IE-01/IE-06: cancellation request (during OTM-5) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-04 (cancel notify), OE-07 (audit) |
| AT-04 | Busy → Idle | IE-10: agent error detected (during OTM error handler) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-05 (admin alert), OE-07 (audit) |

4.4 Agent Transition Rules

| Code | From | To | Inbound Event | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- | --- |
| AT-01 | Idle | Busy | IE-01 (Assigned event) | OTM-2 sets agent busy before sending SURE notification | OE-01, OE-07 |
| AT-02 | Busy | Idle | IE-02 (last subtask report) | OTM-3 frees agent, promotes next pending task | OE-07 |
| AT-03 | Busy | Idle | IE-01/IE-06 (cancellation) | OTM-5 frees agent if assigned to cancelled task | OE-04, OE-07 |
| AT-04 | Busy | Idle | IE-10 (agent error) | OTM error handler frees agent | OE-05, OE-07 |

📌 Agent ↔ Task coupling: Every AT-xx is triggered by a TT-xx. See §5 for the complete bidirectional mapping.

📌 Watchdog note: OTM-6 monitors agent heartbeats (last_seen >2h) but does NOT change agent state. It alerts the admin (OE-05). Only OTM handlers modify agent status.

4.5 Agent Registry Schema

```sql
CREATE TABLE agents (
  slack_user_id TEXT PRIMARY KEY,         -- Slack user ID (e.g., "U0AKEB27HNK")
  otm_display_name TEXT NOT NULL,         -- Display name for logs/UI (e.g., "Devdas")
  openclaw_agent_id TEXT,                 -- OpenClaw agent ID (e.g., "devdas"). NULL for human agents.
  agent_type TEXT NOT NULL DEFAULT 'ai',  -- 'ai' | 'human'
  status TEXT DEFAULT 'idle',             -- 'idle' | 'busy'  (see §4.1)
  current_task TEXT,                      -- task item ID or NULL
  task_started_at INTEGER,                -- Unix timestamp or NULL
  last_seen INTEGER                       -- Unix timestamp
);
```

Registry operations:

  • register_agent(slack_user_id, otm_display_name, openclaw_agent_id, agent_type) — AT startup or on first activity
  • set_busy(slack_user_id, task_id) — AT-01
  • set_idle(slack_user_id) — AT-02, AT-03, AT-04
  • is_busy(slack_user_id) → boolean — checked during TT-02/TT-03
  • get_current_task(slack_user_id) → task_id | null
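A minimal sketch of these registry operations against the §4.5 schema, using SQLite as the spec prescribes. The `open_registry` and `register_agent` helper names are illustrative, not part of the spec.

```python
import sqlite3
import time

# Mirrors the §4.5 agents table (condensed).
SCHEMA = """CREATE TABLE IF NOT EXISTS agents (
  slack_user_id TEXT PRIMARY KEY, otm_display_name TEXT NOT NULL,
  openclaw_agent_id TEXT, agent_type TEXT NOT NULL DEFAULT 'ai',
  status TEXT DEFAULT 'idle', current_task TEXT,
  task_started_at INTEGER, last_seen INTEGER)"""

def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db

def register_agent(db, slack_user_id, name, openclaw_id=None, agent_type="ai"):
    db.execute("INSERT OR IGNORE INTO agents (slack_user_id, otm_display_name, "
               "openclaw_agent_id, agent_type, last_seen) VALUES (?,?,?,?,?)",
               (slack_user_id, name, openclaw_id, agent_type, int(time.time())))

def set_busy(db, slack_user_id, task_id):   # AT-01
    db.execute("UPDATE agents SET status='busy', current_task=?, task_started_at=? "
               "WHERE slack_user_id=?", (task_id, int(time.time()), slack_user_id))

def set_idle(db, slack_user_id):            # AT-02 / AT-03 / AT-04
    db.execute("UPDATE agents SET status='idle', current_task=NULL "
               "WHERE slack_user_id=?", (slack_user_id,))

def is_busy(db, slack_user_id) -> bool:     # checked during TT-02 / TT-03
    row = db.execute("SELECT status FROM agents WHERE slack_user_id=?",
                     (slack_user_id,)).fetchone()
    return row is not None and row[0] == "busy"
```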

4.6 Startup Reconciliation (Dynamic Registry)

On OTM startup:

  1. Read OpenClaw config — Query openclaw.json agent configurations to populate AI agent entries automatically. Each configured OpenClaw agent that has a slack_user_id mapping is auto-registered with agent_type = 'ai'.
  2. Reconcile from Slack — Query Slack List for all tasks with status In Progress or Pending. Rebuild current_task and status (busy/idle) from those records.
  3. Human agents — hard-coded for v1. Rupert is pre-seeded in the agent registry at startup with agent_type = 'human', openclaw_agent_id = NULL. Claudia is pre-seeded as the Orchestrator. Dynamic human registration (auto-detect from assigned_to on first interaction) is deferred to a future version.

📌 v1 hard-coded actors:

| Slack user ID | Display name | Type | Role |
| --- | --- | --- | --- |
| `U06K407LVCY` | Rupert | human | Task assignee / reviewer |
| `U0AKEB27HNK` | Claudia | ai | Orchestrator (sole) |

Future versions will define a proper human user registration and re-registration protocol (see Open Question 9).

📌 AI agents are notified via OpenClaw /hooks/agent. Human agents are notified via the Slack task conversation thread (posted by SW-1 as an OE-07 audit entry addressed to the human). The notification channel is determined by agent_type.

4.7 Gateway Restart Detection & OTM Resync

The OTM runs as an OpenClaw plugin pipeline inside the gateway process. Several restart scenarios must be handled:

Scenario A: Gateway restarts (OTM restarts with it)

  • OTM startup reconciliation (§4.6) runs automatically
  • All agent states rebuilt from Slack List + openclaw.json
  • SURE pending notifications checked: any outstanding acks >3 min old → ERR-06
  • System event logged: [timestamp] OTM SYSTEM: Gateway restart detected. Reconciliation complete.
  • Logged to both event_log (in error DB) and otm-events.log file

Scenario B: Gateway restarts but OTM was mid-processing

  • SQLite WAL mode ensures no data corruption on crash
  • On restart, OTM-7 (error monitor) runs within 60s and detects any inconsistencies:
    • ERR-02/ERR-03: Agent-task mismatches from interrupted transitions
    • ERR-08: Tasks stuck in Assigned from interrupted OTM-2
    • ERR-04: Orphaned Pending from interrupted promotions
  • All auto-correctable errors are fixed; others escalated

Scenario C: Gateway stops for extended period (>3 min)

  • SE-1 stops receiving Slack events during downtime
  • Agents cannot send IE-02/IE-03 reports (OpenClaw hooks are down)
  • On restart: reconciliation rebuilds state from Slack List (source of truth for task fields)
  • Pending SURE acks will have timed out → ERR-06 logged
  • Agents that were mid-task may have completed work but couldn't report it:
    • OTM-7 detects counter mismatches (ERR-05) on next cycle
    • Watchdog cross-checks subtask completion status in Slack vs subtasks_remaining

Orchestrator re-registration:

  • The Orchestrator does NOT need to re-register agents. The OTM rebuilds the registry from openclaw.json automatically on startup (§4.6).
  • If openclaw.json has changed (new agent added, agent removed), the reconciliation picks up the delta.

Gateway restart logging:

  • Every OTM startup logs a system event: IE-SYS-01: OTM startup with details including:
    • Agents reconciled (count + names)
    • Tasks found in active states (In Progress, Pending, Assigned)
    • SURE timeouts detected
    • Errors found and corrected during reconciliation
  • This event is logged to event_log, otm-events.log, AND posted to Slack #alerts channel (OE-05)

5. Cross-Reference Index

5.1 Task Transition → Agent Transition Mapping

| Task Transition | Triggers Agent Transition | Handler |
| --- | --- | --- |
| TT-02 (Assigned → In Progress) | AT-01 (Idle → Busy) | OTM-2 |
| TT-04 (In Progress → Agent Done) | AT-02 (Busy → Idle) | OTM-3 |
| TT-11 (→ Cancelled) | AT-03 (Busy → Idle) | OTM-5 |
| TT-13 (In Progress → Failed) | AT-04 (Busy → Idle) | OTM error handler |

5.2 Transition → Handler Mapping

| Transition(s) | Primary Handler | Description |
| --- | --- | --- |
| TT-01, TT-02, TT-03, TT-05 | OTM-2 | Task assignment, availability check, queue management |
| TT-04 | OTM-3 | Subtask completion, counter decrement, task completion |
| TT-06, TT-07, TT-08, TT-10 | OTM-4 | Validation, rejection, rework, cancel-after-reject |
| TT-11 | OTM-5 | Task cancellation |
| TT-12 | OTM-6 | Archival (watchdog requests, OTM executes) |
| TT-13, TT-14, TT-15 | OTM error handler | Failure detection, retry, abandon |
| AT-01 | OTM-2 | Agent set busy |
| AT-02 | OTM-3 | Agent freed on task completion |
| AT-03 | OTM-5 | Agent freed on task cancellation |
| AT-04 | OTM error handler | Agent freed on task failure |

5.3 Inbound Event Index

| Code | Source | Description | Triggered by |
| --- | --- | --- | --- |
| IE-01 | SE-1 | Raw list_item_updated event from Slack (Human field edit) | Human edits task in Slack UI |
| IE-02 | Agent | Subtask completion report via OTM API | Agent calls OTM after completing subtask |
| IE-03 | Agent | SURE acknowledgement via OTM API | Agent confirms receipt of task assignment |
| IE-04 | Orchestrator | Task validation request via OTM API | Orchestrator reviews and approves |
| IE-05 | Orchestrator | Task rejection request via OTM API (with reason) | Orchestrator reviews and rejects |
| IE-06 | Orchestrator | Task action request via OTM API (rework/cancel/retry) | Orchestrator requests state change |
| IE-07 | Orchestrator | Task retry request via OTM API (from Failed) | Orchestrator wants to retry failed task |
| IE-08 | Watchdog | Cron tick (every 60 seconds) | Timer fires |
| IE-09 | OTM internal | check_next_task_for_agent() result | Triggered after AT-02/AT-03/AT-04 |
| IE-10 | OTM internal | Agent error detection (hook timeout >5min, crash, exception) | OpenClaw health monitoring |

5.4 Outbound Event Index

| Code | Target | Description | Via |
| --- | --- | --- | --- |
| OE-01 | Agent | SURE task notification (requires acknowledgement IE-03) | OpenClaw hooks (AI) / Slack task conversation (Human) |
| OE-02 | Orchestrator | Task completion/validation notification | OpenClaw hooks |
| OE-03 | | (reserved) | |
| OE-04 | Agent | Task cancellation notification | OpenClaw hooks (AI) / Slack task conversation (Human) |
| OE-05 | Admin | Alert (anomaly, error, stale task, agent down) | Telegram / Slack #alerts |
| OE-06 | Slack | Task field update (status, counters, timestamps) | SW-1 |
| OE-07 | Slack | Audit log entry in task conversation feed | SW-1 |

5.5 All Code Summaries

  • Task transitions (TT-xx): TT-01 through TT-15 (TT-09 removed in v1.3) — see §3.2.1 and §3.3
  • Agent transitions (AT-xx): AT-01 through AT-04 — see §4.3 and §4.4
  • Inbound events (IE-xx): IE-01 through IE-10 — see §5.3
  • Outbound events (OE-xx): OE-01 through OE-07 — see §5.4

6. SURE Protocol (Send-Understand-Report-Execute)

All task notifications to agents use the SURE protocol to guarantee delivery and acknowledgement.

6.1 Flow

```
OTM sends task assignment → OE-01
  |
  +-- OTM logs in task conversation (OE-07):
  |     "[2026-03-12 14:30:05] OTM → Agent(Devdas): Task assigned — <title>"
  |
  +-- Agent receives notification
  |
  +-- Agent sends acknowledgement → IE-03
  |
  +-- OTM logs in task conversation (OE-07):
        "[2026-03-12 14:30:12] Agent(Devdas) → OTM: Task acknowledged"
```

6.2 Timeout

  • Retry 1: If no IE-03 within 1 minute → OTM retries notification (OE-01), logs retry.
  • Retry 2: If no IE-03 within 2 more minutes (3 min total) → OTM retries again, logs retry.
  • Error: If no IE-03 after retry 2 → OTM logs ERR-06 error, alerts admin (OE-05). Task remains In Progress — admin decides.

📌 The 1+2 minute schedule is designed to allow time for a gateway restart (~2 min typical). If the gateway restarts but the OTM does not, the OTM resync procedure (§4.7) handles recovery.
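The 1+2 minute schedule reduces to pure timing logic, sketched below. One assumption is called out in the code: the spec only says ERR-06 fires "after retry 2", so the one-minute grace window before escalation is illustrative, not normative.

```python
# SURE retry schedule from §6.2: retry at +1 min, retry again at +3 min total,
# then escalate as ERR-06.
RETRY_OFFSETS = (60, 180)   # seconds after the original OE-01 send
GRACE = 60                  # ASSUMPTION: wait after the last retry before ERR-06

def sure_action(sent_at: int, acked: bool, retries_done: int, now: int) -> str:
    """Decide what the OTM should do for one pending SURE notification."""
    if acked:
        return "done"                         # IE-03 received
    if retries_done < len(RETRY_OFFSETS):
        if now >= sent_at + RETRY_OFFSETS[retries_done]:
            return "retry"                    # resend OE-01, log retry
        return "wait"
    if now >= sent_at + RETRY_OFFSETS[-1] + GRACE:
        return "err06"                        # log ERR-06, alert admin (OE-05)
    return "wait"
```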

6.3 SURE applies to

| Event | SURE required? |
| --- | --- |
| Task assignment (OE-01) | ✅ Yes |
| Rework task (OE-01 after TT-08) | ✅ Yes |
| Task cancellation (OE-04) | ❌ No (fire-and-forget, agent stops) |
| Admin alert (OE-05) | ❌ No |

6.4 Agent acknowledgement API

```
POST /api/otm/ack
{
  "task_id": "<task item ID>",
  "agent_id": "<slack_user_id>",
  "type": "task_assigned"
}
```

7. Audit Trail

Every event processed by the OTM is logged in the Slack task conversation feed via SW-1. This provides a human-readable, timestamped record of all activity on each task.

7.1 What is logged

| Event type | Log format |
| --- | --- |
| State change by OTM | `[timestamp] OTM: Status changed <previous> → <new> (TT-xx)` |
| Orchestrator request received | `[timestamp] Orchestrator → OTM: <action> requested (IE-xx)` |
| Human change detected | `[timestamp] Human(<name>) change detected: <field> = <value> (IE-01)` |
| Agent notification sent | `[timestamp] OTM → Agent(<name>): <notification type> (OE-xx)` |
| Agent acknowledgement received | `[timestamp] Agent(<name>) → OTM: Acknowledged (IE-03)` |
| Agent subtask report received | `[timestamp] Agent(<name>) → OTM: Subtask done — <title> (IE-02). Remaining: <n>` |
| Watchdog action | `[timestamp] Watchdog: <check description> (IE-08)` |
| Error/alert | `[timestamp] OTM ERROR: <description> (IE-10)` |

7.2 Implementation

All audit entries are posted as replies in the Slack task's conversation thread via SW-1. This means:

  • Every task's conversation is a complete history of its lifecycle
  • No separate log table needed — Slack IS the audit log
  • Human-readable without any tooling
  • Searchable via Slack search

PART 1 — SLACK INTEGRATION LAYER

SE-1: Slack Event Listener

| | Detail |
| --- | --- |
| Actors | Slack Events API (source) |
| Inbound events | IE-01: any list_item_updated event on the Task Board List |
| Actions | Forward raw event payload to OTM-0 entry point |
| Outbound events | None — SE-1 has zero intelligence |
| Transitions | None — SE-1 is a passthrough |

The Slack app (Salvatore's app, socket mode) subscribes to list_item_updated events.

SE-1 does exactly one thing:

```
ON list_item_updated:
  → call otm_handle_event(raw_event_payload)
```

No routing. No field inspection. No filtering. The OTM decides what to do with the event.

Required Slack App scope for SE-1: lists:read only.

SW-1: Slack Writer

| | Detail |
| --- | --- |
| Actors | OTM (sole caller) |
| Inbound events | OTM handler calls |
| Actions | Write task fields to Slack List, post audit entries to task conversation |
| Outbound events | OE-06: Slack field update, OE-07: audit log entry |

SW-1 is the sole component that writes to Slack. It provides two operations:

  1. sw1_update_fields(task_id, fields) — Updates task item fields (status, counters, timestamps). Produces OE-06.
  2. sw1_post_audit(task_id, message) — Posts a timestamped message to the task's conversation thread. Produces OE-07.

Required Slack App scope for SW-1: lists:write.

📌 SE-1 (lists:read) and SW-1 (lists:write) are separate concerns. They may run in the same Slack app but are logically distinct.

📌 SW-1 does NOT create or delete tasks. Only the Orchestrator creates tasks and subtasks (via Slack API or UI). SW-1's write scope is limited to: updating existing task fields (OE-06) and posting audit entries to task conversations (OE-07). During rework, the Orchestrator manages subtask creation/deletion directly; the OTM then processes the state change via SW-1.


PART 2 — OPENCLAW TASK MANAGER (OTM)

Implemented as an OpenClaw plugin pipeline set. Persists to SQLite. Sole component that writes task status, agent status, and counter fields.

OTM-0: Event Router

| | Detail |
| --- | --- |
| Actors | OTM (internal) |
| Inbound events | IE-01: raw list_item_updated from SE-1 |
| Actions | ACT-R1: Parse event payload and identify change type<br>ACT-R2: Route to appropriate handler |
| Outbound events | None directly — delegates to handlers |

Single entry point for all Slack-originated events. Contains the routing logic that was previously in SE-1.

```
RECEIVE raw_event_payload from SE-1
  |
  +-- Parse: what field(s) changed?
  |
  +-- IF assigned_to changed AND status = "New":
  |       → route to OTM-2 (task assignment)
  |
  +-- IF status changed to "Assigned" (from Pending promotion or rework):
  |       → route to OTM-2 (task re-assignment)
  |
  +-- IF status changed to "Cancelled" by Human:
  |       → route to OTM-5 (cancellation)
  |
  +-- IF other field changed by Human:
  |       → log via sw1_post_audit (OE-07): "Human(<name>) changed <field>"
  |       → no state transition
  |
  +-- ELSE: ignore
```

📌 All routing intelligence lives here, not in SE-1. SE-1 is a dumb pipe.

OTM-1: Agent Registry

| | Detail |
| --- | --- |
| Actors | OTM (owner/writer) |
| Inbound events | IE-08: startup reconciliation<br>IE-01: new agent detected (auto-register) |
| Actions | ACT-A1: Register new agent (from openclaw.json or first interaction)<br>ACT-A2: Set agent busy (AT-01)<br>ACT-A3: Set agent idle (AT-02, AT-03, AT-04)<br>ACT-A4: Reconcile from Slack List on startup<br>ACT-A5: Reconcile from openclaw.json on startup |
| Outbound events | OE-07: audit log for registration events |
| Transitions | AT-01, AT-02, AT-03, AT-04 |

See §4.5 for schema and §4.6 for startup reconciliation.

OTM-2: Handle Task Assigned

| | Detail |
| --- | --- |
| Actors | OTM (executor), Agent (notified if idle) |
| Inbound events | IE-01: assigned_to changed or status=Assigned (from OTM-0) |
| Actions | ACT-T1: Count subtasks via Slack API<br>ACT-T2: Check agent availability in registry<br>ACT-T3: Set task status (via SW-1)<br>ACT-T4: Set agent busy (AT-01 via OTM-1)<br>ACT-T5: Send SURE notification (OE-01) |
| Outbound events | OE-01: SURE task notification (if agent idle)<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-02 (→ In Progress) or TT-03 (→ Pending) |
| Agent transitions | AT-01 (Idle → Busy) if agent available |

```
RECEIVE task assignment event (from OTM-0)
  |
  +-- Read task fields: task_id, title, assigned_to, priority
  +-- ACT-T1: Count child items (subtasks) via Slack API → subtask_count
  +-- Store subtask_count as initial subtasks_remaining
  |
  +-- ACT-T2: Look up assigned_to agent in registry
  |
  +-- IF agent NOT found AND agent_type detectable:
  |       ACT-A1: Auto-register agent
  |       sw1_post_audit: "New agent registered: <name>"
  |
  +-- IF agent NOT found AND not detectable:
  |       OE-05: Alert admin
  |       sw1_post_audit: "ERROR: Unknown agent <id>"
  |       Task stays in current status
  |
  +-- IF agent is IDLE:
  |       Store previous_status
  |       ACT-T4: Execute AT-01 (agent → busy)
  |       sw1_update_fields: subtasks_remaining, status = "In Progress", assigned_at = now
  |       sw1_post_audit: "Status: Assigned → In Progress (TT-02). Agent: <name>"
  |       ACT-T5: Send SURE notification (OE-01)
  |       sw1_post_audit: "OTM → Agent(<name>): Task assigned (OE-01). Awaiting SURE ack."
  |
  +-- IF agent is BUSY:
          Store previous_status
          sw1_update_fields: status = "Pending"
          sw1_post_audit: "Status: Assigned → Pending (TT-03). Agent <name> busy with <current_task>"
```

OTM-3: Handle Subtask Done

| | Detail |
| --- | --- |
| Actors | Agent (reports completion), OTM (processes), Orchestrator (notified on task completion) |
| Inbound events | IE-02: agent subtask completion report via OTM API |
| Actions | ACT-S1: Validate subtask belongs to agent's current task<br>ACT-S2: Decrement counter<br>ACT-S3: Update Slack subtask checkbox (via SW-1) — sets both status AND Col00<br>ACT-S4: Complete task if counter = 0<br>ACT-S5: Free agent (AT-02 via OTM-1)<br>ACT-S6: Promote next pending task (IE-09) |
| Outbound events | OE-02: Orchestrator notification (on task complete)<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-04 (→ Agent Done when counter hits 0) |
| Agent transitions | AT-02 (Busy → Idle) when task completes |

```
RECEIVE subtask completion report (IE-02)
  {task_id, subtask_id, agent_id}
  |
  +-- ACT-S1: Validate:
  |     - subtask belongs to task
  |     - agent is assigned to task
  |     - subtask not already completed (idempotency: check todo_completed field / Col00)
  |       IF already completed: discard, return OK
  |
  +-- ACT-S3: sw1_update_fields: subtask.todo_completed = true (Col00), subtask.status = done
  +-- ACT-S2: Decrement task.subtasks_remaining by 1
  +-- sw1_update_fields: subtasks_remaining
  +-- sw1_post_audit: "Agent(<name>): Subtask done — <title> (IE-02). Remaining: <n>"
  |
  +-- IF subtasks_remaining > 0:
  |       Done. Await next report.
  |
  +-- IF subtasks_remaining = 0 (TASK COMPLETE):
          Store previous_status = "In Progress"
          sw1_update_fields: status = "Agent Done", completed_at = now
          sw1_post_audit: "Status: In Progress → Agent Done (TT-04). All subtasks complete."
          ACT-S5: Execute AT-02 (agent → idle)
          ACT-S6: Notify Orchestrator (OE-02):
            POST /hooks/agent {
              agentId: "main",
              message: "Task ready for review: <title>\nAgent: <name>\nElapsed: <time>\nResult: <result_summary>\nLink: <slack_link>"
            }
          sw1_post_audit: "OTM → Orchestrator: Task ready for review (OE-02)"
          CALL: check_next_task_for_agent(agent_id)  → IE-09
```

check_next_task_for_agent(agent_id):

```
Query Slack List for tasks WHERE:
  assigned_to = agent_id
  AND status = "Pending"
  ORDER BY priority ASC, posted_at ASC
  LIMIT 1
  |
  +-- IF Pending task found:
  |       sw1_update_fields: pending_task.status = "Assigned"
  |       sw1_post_audit on pending task: "Status: Pending → Assigned (TT-05). Agent now available."
  |       (OTM-0 detects Assigned change → OTM-2 fires)
  |
  +-- IF no Pending task:
          Agent remains idle.
```

📌 Idempotency is handled by checking todo_completed (Col00) on the subtask before processing. No separate processed_events table needed. The Slack conversation feed (OE-07) serves as the complete audit trail.
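The ACT-S1/S2 logic above, including this idempotency check, fits in a few lines. In the sketch below the dict field names (`parent`, `assigned_to`) are illustrative stand-ins for the Slack item fields; the real OTM writes through SW-1 rather than mutating dicts.

```python
def handle_subtask_report(task: dict, subtask: dict, agent_id: str) -> str:
    """Process one IE-02 report. Idempotency = check the subtask's
    todo_completed (Col00) before doing anything else."""
    # ACT-S1: subtask belongs to task, and the reporter holds the task.
    if subtask.get("parent") != task["id"] or task["assigned_to"] != agent_id:
        return "invalid"
    if subtask.get("todo_completed"):
        return "duplicate"                   # already processed: discard, return OK
    subtask["todo_completed"] = True         # ACT-S3 (via SW-1 in the real OTM)
    task["subtasks_remaining"] -= 1          # ACT-S2
    # ACT-S4: counter at zero means TT-04 (In Progress → Agent Done).
    return "complete" if task["subtasks_remaining"] == 0 else "ok"
```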

📌 Agent reports directly to OTM API (IE-02), not by ticking checkboxes in Slack. The OTM then updates Slack via SW-1 (ACT-S3). This ensures all writes go through the OTM.

OTM-4: Task Validate / Reject (Orchestrator API)

| | Detail |
| --- | --- |
| Actors | Orchestrator (caller), OTM (executor) |
| Inbound events | IE-04: Orchestrator requests validation<br>IE-05: Orchestrator requests rejection (with reason)<br>IE-06: Orchestrator signals rework ready (subtasks already managed by Orchestrator) |
| Actions | ACT-V1: Verify task status = "Agent Done" or "Rejected"<br>ACT-V2: Execute validation (TT-06)<br>ACT-V3: Execute rejection (TT-07)<br>ACT-V4: Count unfinished subtasks, set counter, change status (TT-08) |
| Outbound events | OE-02: confirmation to Orchestrator<br>OE-06: Slack field update<br>OE-07: audit log entries |
| Task transitions | TT-06 (→ Done), TT-07 (→ Rejected), TT-08 (→ Assigned), TT-10 (→ Cancelled) |

Validate request (IE-04):

```json
{
  "task_id": "<task item ID>",
  "outcome": "validated",
  "comment": "<optional>"
}
```

Processing — validated:

```
ACT-V1: Verify task.status = "Agent Done" (reject API call otherwise)
Store previous_status = "Agent Done"
sw1_post_audit: "Orchestrator → OTM: Validation requested (IE-04)"
ACT-V2: Execute TT-06
  sw1_update_fields: status = "Done", validated_at = now, todo_completed = true (Col00)
  sw1_post_audit: "Status: Agent Done → Done (TT-06). Validated."
  IF comment: sw1_post_audit: "Orchestrator comment: <comment>"
OE-02: Notify Orchestrator: "Task <title> is Done"
```

Reject request (IE-05):

```json
{
  "task_id": "<task item ID>",
  "outcome": "rejected",
  "reason": "<mandatory explanation>"
}
```

Processing — rejected:

```
ACT-V1: Verify task.status = "Agent Done" (reject API call otherwise)
Store previous_status = "Agent Done"
sw1_post_audit: "Orchestrator → OTM: Rejection requested (IE-05). Reason: <reason>"
ACT-V3: Execute TT-07
  sw1_update_fields: status = "Rejected"
  sw1_post_audit: "Status: Agent Done → Rejected (TT-07)"
```

Rework request (IE-06) — submitted after Orchestrator has already prepared subtasks:

The Orchestrator handles all subtask management before calling the OTM:

  1. Orchestrator deletes unnecessary/obsolete subtasks from the Slack List
  2. Orchestrator leaves completed subtasks in place (as a record)
  3. Orchestrator creates new subtasks (first = "Acknowledge rework request: ")
  4. Orchestrator then notifies the OTM that rework is ready:
```json
{
  "task_id": "<task item ID>",
  "action": "rework"
}
```

Processing — rework:

```
Verify task.status = "Rejected"
sw1_post_audit: "Orchestrator → OTM: Rework requested (IE-06)"
ACT-V4: Execute TT-08
  Count unfinished subtasks via Slack API (todo_completed = false / Col00 unchecked)
  Set subtasks_remaining = count of unfinished subtasks
  sw1_update_fields: status = "Assigned", subtasks_remaining
  sw1_post_audit: "Status: Rejected → Assigned (TT-08). Rework: <n> unfinished subtasks."
  → OTM-0 detects Assigned change → OTM-2 fires → SURE notification sent → agent acks
```

📌 Rework flow — separation of concerns: The Orchestrator owns subtask management (create, delete, keep). The OTM owns state management (status transitions, counter recalculation, agent assignment). The OTM never creates or deletes subtasks. On rework, it counts unfinished subtasks to set subtasks_remaining, then triggers the normal assignment flow. The first new subtask is always an acknowledgement subtask — the agent closes it to confirm they understood the rework instructions.
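The counter recalculation in ACT-V4 is just a count over Col00, sketched below with an illustrative `todo_completed` dict key standing in for the Slack checkbox field:

```python
def rework_counter(subtasks: list[dict]) -> int:
    """TT-08: subtasks_remaining = number of subtasks whose Col00
    (todo_completed) is still unchecked after the Orchestrator's edits."""
    return sum(1 for s in subtasks if not s.get("todo_completed"))
```

Completed subtasks left in place as a record are naturally excluded, and the new acknowledgement subtask counts as one unit of remaining work.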

Cancel-after-reject request (IE-06):

{
  "task_id": "<task item ID>",
  "action": "cancel"
}

Verify task.status = "Rejected" → Execute TT-10 (→ Cancelled).

OTM-5: Handle Task Cancelled

| | Detail |
|---|---|
| Actors | Orchestrator/Human (requests), OTM (executes), Agent (notified if was working) |
| Inbound events | IE-01: Human status edit detected by OTM-0<br>IE-06: Orchestrator cancel API call |
| Actions | ACT-C1: Store previous_status<br>ACT-C2: Free agent if applicable (AT-03)<br>ACT-C3: Notify agent of cancellation (OE-04)<br>ACT-C4: Promote next pending task (IE-09) |
| Outbound events | OE-04: cancellation notification to agent<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-11 (→ Cancelled) |
| Agent transitions | AT-03 (Busy → Idle) if agent was working |

RECEIVE cancellation request (IE-01 or IE-06)
  |
  +-- ACT-C1: Store previous_status
  +-- sw1_post_audit: "Cancellation requested by <actor> (IE-xx)"
  +-- sw1_update_fields: status = "Cancelled"
  +-- sw1_post_audit: "Status: <previous> → Cancelled (TT-11)"
  |
  +-- IF task was In Progress or Assigned:
  |       ACT-C2: Execute AT-03 (agent → idle)
  |       ACT-C3: Send cancellation notification (OE-04)
  |       sw1_post_audit: "OTM → Agent(<name>): Task cancelled (OE-04)"
  |       ACT-C4: check_next_task_for_agent(agent_id) → IE-09
  |
  +-- IF task was Pending:
  |       sw1_post_audit: "Removed from queue (no agent notification)"
  |
  +-- IF task was New:
          sw1_post_audit: "Cancelled before assignment"
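A minimal sketch of the branching above, with hypothetical field names; only tasks that were actively held by an agent trigger the free/notify/promote path:

```javascript
// Decide OTM-5 follow-up actions from the task's previous status.
// Field names here are illustrative, not the OTM's real internals.
function cancellationPlan(previousStatus) {
  if (["In Progress", "Assigned"].includes(previousStatus)) {
    // agent was holding the task: free it, tell it, promote next pending
    return { freeAgent: true, notifyAgent: true, promoteNext: true,
             audit: "Agent freed and notified (AT-03, OE-04)" };
  }
  if (previousStatus === "Pending") {
    return { freeAgent: false, notifyAgent: false, promoteNext: false,
             audit: "Removed from queue (no agent notification)" };
  }
  // "New": nothing was ever dispatched
  return { freeAgent: false, notifyAgent: false, promoteNext: false,
           audit: "Cancelled before assignment" };
}
```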

OTM-6: Watchdog

| | Detail |
|---|---|
| Actors | Watchdog cron (detector), OTM (executor), Admin (alerted) |
| Inbound events | IE-08: cron tick (every 60 seconds) |
| Actions | ACT-W1: Check for stale In Progress tasks (>24h, no subtask activity)<br>ACT-W2: Check for orphaned Pending tasks (agent idle but task pending)<br>ACT-W3: Check counter mismatches (subtasks_remaining vs actual)<br>ACT-W4: Check archival candidates (Done/Cancelled >7 days)<br>ACT-W5: Check agent heartbeats (last_seen >2h) |
| Outbound events | OE-05: admin alert (anomalies)<br>OE-06: Slack field update (archival)<br>OE-07: audit log entries |
| Task transitions | Requests TT-12 (→ Archived), requests TT-05 (orphaned Pending → Assigned) |

📌 The Watchdog does NOT write state directly. It calls OTM handler functions to execute transitions.

Checks:

ACT-W1: STALE IN-PROGRESS
  Query tasks In Progress for >24h with no subtask activity in conversation feed
  → OE-05: Alert admin. Do NOT auto-reassign.
  → sw1_post_audit: "Watchdog: Stale task detected (>24h, no activity)"

ACT-W2: ORPHANED PENDING
  Query tasks Pending WHERE assigned agent is Idle in registry
  → Request OTM to re-trigger: call OTM-2 (TT-05 → Assigned)
  → sw1_post_audit: "Watchdog: Orphaned pending — re-triggering assignment"

ACT-W3: COUNTER MISMATCH
  Compare subtasks_remaining with actual unchecked subtask count
  → Recalculate and fix counter via SW-1
  → OE-05: Alert admin
  → sw1_post_audit: "Watchdog: Counter mismatch corrected (<old> → <new>)"

ACT-W4: ARCHIVAL
  Query tasks Done or Cancelled WHERE completed_at/cancelled_at + 7 days < now
  → Request OTM to execute TT-12
  → sw1_update_fields: status = "Archived"
  → sw1_post_audit: "Status: <previous> → Archived (TT-12). Auto-archived after 7 days."

ACT-W5: AGENT HEARTBEAT
  Query agents WHERE last_seen > 2h ago
  → OE-05: Alert admin (agent may be down). No state change.

End-to-End Flow Diagrams

Flow 1a — Task Assigned, Agent Idle

Orchestrator sets assigned_to on New task
  |
[SE-1] receives list_item_updated → forwards raw event to OTM (IE-01)
  |
[OTM-0] parses: assigned_to changed → routes to OTM-2
  |
[OTM-2] checks registry: agent is IDLE
  |  sw1_update_fields: subtasks_remaining, status = "In Progress", assigned_at
  |  sw1_post_audit: "Status: New → Assigned → In Progress (TT-01, TT-02)"
  |  executes AT-01: agent → busy
  |  sends SURE notification (OE-01)
  |  sw1_post_audit: "OTM → Agent(<name>): Task assigned. Awaiting SURE ack."
  |
Agent receives notification
  |  sends acknowledgement (IE-03)
  |
[OTM] receives IE-03
  |  sw1_post_audit: "Agent(<name>): Task acknowledged (IE-03)"
  |
Agent starts work

Flow 1b — Task Assigned, Agent Busy

Orchestrator sets assigned_to on New task
  |
[SE-1] → IE-01 → [OTM-0] → routes to OTM-2
  |
[OTM-2] checks registry: agent is BUSY
  |  sw1_update_fields: status = "Pending"
  |  sw1_post_audit: "Status: Assigned → Pending (TT-03). Agent busy with <current_task>"
  |
Task waits silently. No agent notification.

Flow 2a — Subtask Done (not last)

Agent completes subtask → reports to OTM API (IE-02)
  |
[OTM-3] validates: subtask belongs to task, not already completed
  |  sw1_update_fields: subtask.todo_completed = true (Col00), subtask.status = done
  |  decrements subtasks_remaining (3 → 2)
  |  sw1_update_fields: subtasks_remaining
  |  sw1_post_audit: "Agent(<name>): Subtask done — <title>. Remaining: 2"
  |
Agent continues working.

Flow 2b — Last Subtask (task complete)

Agent completes final subtask → reports to OTM API (IE-02)
  |
[OTM-3] decrements (1 → 0)
  |  sw1_update_fields: status → "Agent Done", completed_at
  |  sw1_post_audit: "Status: In Progress → Agent Done (TT-04). All subtasks complete."
  |  executes AT-02: agent → idle
  |  notifies Orchestrator (OE-02)
  |  sw1_post_audit: "OTM → Orchestrator: Task ready for review"
  |  check_next_task_for_agent()
  |    → Pending task found? → TT-05 → OTM-0 → OTM-2 → Flow 1a
  |    → No pending? → agent stays idle

Flow 3a — Orchestrator Validates

Orchestrator tells OTM to validate task (IE-04)
  |
[OTM-4] verifies status = "Agent Done"
  |  sw1_post_audit: "Orchestrator: Validation requested"
  |  sw1_update_fields: status → "Done", validated_at, todo_completed = true (Col00)
  |  sw1_post_audit: "Status: Agent Done → Done (TT-06). Validated."
  |  notifies Orchestrator: confirmed (OE-02)

Flow 3b — Orchestrator Rejects and Requests Rework

Step 1: Orchestrator tells OTM to reject task (IE-05)
  |
[OTM-4] sw1_post_audit: "Orchestrator: Rejection. Reason: <reason>"
  |  sw1_update_fields: status → "Rejected"
  |  sw1_post_audit: "Status: Agent Done → Rejected (TT-07)"

Step 2: Orchestrator manages subtasks directly in Slack (NOT via OTM):
  |  - Deletes unnecessary/obsolete subtasks
  |  - Leaves completed subtasks in place (as record)
  |  - Creates new subtasks:
  |      1. "Acknowledge rework request: <detailed reason and instructions>"
  |      2. "Fix validation on email field"
  |      3. "Add unit tests for edge cases"

Step 3: Orchestrator tells OTM that rework is ready (IE-06)
  |
[OTM-4] receives rework signal:
  |  Counts unfinished subtasks via Slack API → 3
  |  sw1_update_fields: status → "Assigned", subtasks_remaining = 3
  |  sw1_post_audit: "Status: Rejected → Assigned (TT-08). Rework: 3 unfinished subtasks."

Step 4: Normal assignment flow (TT-02 → SURE → agent works)
  |
[OTM-0] detects Assigned → OTM-2 → agent idle → In Progress
  |  SURE notification posted to Slack task conversation (OE-01)
  |  Agent acknowledges (IE-03)
  |
Agent reads first subtask: "Acknowledge rework request: ..."
  |  Agent closes first subtask to acknowledge (IE-02)
  |  OTM-3 decrements: 3 → 2
  |  sw1_post_audit: "Agent(<name>): Acknowledged rework. Remaining: 2"
  |
Agent works through remaining subtasks normally

Flow 4 — Cancellation

Orchestrator tells OTM to cancel (IE-06) / Human changes status in Slack UI (IE-01)
  |
[OTM-0] routes to OTM-5
  |
[OTM-5] sw1_post_audit: "Cancellation requested by <actor>"
  |  sw1_update_fields: status → "Cancelled", cancelled_at = now
  |  executes AT-03: agent freed if was working
  |  sends cancellation notification (OE-04)
  |  sw1_post_audit: "Status: <previous> → Cancelled (TT-11)"
  |  check_next_task_for_agent() for freed agent

Flow 5 — Auto-Archival (TT-12)

[OTM-6 Watchdog] IE-08: cron tick fires (every 60s)
  |
  +-- ACT-W4: Query tasks WHERE:
  |     (status = "Done" AND validated_at + 7 days < now)
  |     OR (status = "Cancelled" AND cancelled_at + 7 days < now)
  |
  +-- FOR EACH archival candidate:
        Store previous_status
        sw1_update_fields: status → "Archived"
        sw1_post_audit: "Status: <previous> → Archived (TT-12). Auto-archived after 7 days."
        INSERT INTO task_history (snapshot of task fields)
        DELETE task from active tasks (SQLite only — Slack List item untouched)

📌 TT-12 is a watchdog-requested transition (IE-08 trigger). The watchdog detects the 7-day threshold; the OTM executes the archive. Archived tasks are snapshotted to task_history for long-term reporting before being removed from the active tasks table.
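The ACT-W4 threshold test from Flow 5 can be sketched as a pure predicate (field names as used in the spec, timestamps assumed to be ISO 8601 strings):

```javascript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// True when a task qualifies for auto-archival (TT-12): Done tasks age
// from validated_at, Cancelled tasks from cancelled_at.
function isArchivalCandidate(task, now = Date.now()) {
  if (task.status === "Done" && task.validated_at) {
    return now - Date.parse(task.validated_at) > SEVEN_DAYS_MS;
  }
  if (task.status === "Cancelled" && task.cancelled_at) {
    return now - Date.parse(task.cancelled_at) > SEVEN_DAYS_MS;
  }
  return false; // other statuses never auto-archive
}
```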

📌 Slack archive limitation: The OTM sets status = "Archived" in Slack via SW-1, but the actual "Archive item" action in the Slack UI cannot be triggered via API (slackLists.items.archive does not exist). Manual Slack UI archiving is required for items to visually disappear from the default Slack list view.


PART 3 — FILE-BASED PIPELINE

The file-based pipeline is the operational implementation of Systems 1–4. It bridges the Orchestrator's task-creation workflow to the Slack List and agent workspaces, using JSON files as the intermediary to avoid direct Slack API calls by agents.

System 1: Task Creation (otm-create-task.sh)

The Orchestrator (Claudia) creates tasks by running a shell script. This populates a JSON file and atomically increments the task ID counter.

Script Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| --title | yes | | Short, actionable one-liner |
| --agent | yes | | Who works on it: claudia, devdas, archibald, frederic, salvatore, sylvain, rupert |
| --priority | no | normal | critical \| high \| normal \| medium \| low \| batchable |
| --project | no | (none) | Free-text project name (e.g. prj-012, fab-state) |
| --type | no | action | action \| decision \| review |
| --subtask | no | "Confirm that task has been done" | Repeatable. Each value becomes a Slack subtask. If none provided, a default confirmation subtask is auto-added (required for completion detection) |

Examples

# Simple action task
otm-create-task.sh --title "Fix login bug" --agent devdas --priority high --project prj-012

# Decision for Rupert
otm-create-task.sh --title "Approve budget for Q2" --agent rupert --type decision

# Task with subtasks
otm-create-task.sh --title "Build login page" --agent devdas --project prj-012 \
  --subtask "Create login form component" \
  --subtask "Add validation logic" \
  --subtask "Write unit tests"

# Input/acknowledgement pattern (replaces --description)
otm-create-task.sh --title "Review API design" --agent frederic --type review \
  --subtask "input: the API spec is at docs/api-v2.md" \
  --subtask "Check endpoint naming conventions" \
  --subtask "Validate error response format"

# Batchable priority (can wait for batch processing)
otm-create-task.sh --title "Update documentation" --agent archibald --priority batchable

JSON File Format (intermediate)

Written to ~/Library/Application Support/OpenClaw/otm/new-tasks/<timestamp>-<uuid>.json

{
  "id": "uuid",
  "taskId": "T-00035",
  "title": "Short task title",
  "agent": "devdas",
  "createdAt": "2026-03-14T08:20:19Z",
  "priority": "normal",
  "project": "prj-012",
  "subtasks": ["Subtask 1", "Subtask 2"],
  "type": "action",
  "status": "pending"
}

Task ID System

  • Format: T-NNNNN (5 digits, zero-padded)
  • Counter file: ~/Library/Application Support/OpenClaw/otm/next-task-id.json
  • Atomic increment with file locking (flock)
  • Auto-assigned by otm-create-task.sh — agents don't manage IDs manually

Type Values

| Type | Purpose | Primary user |
|---|---|---|
| action | Task to execute (default) | Agents |
| decision | Requires a decision from someone | Rupert |
| review | Needs review / approval | Rupert or agents |

System 2: Task Injection (otm-injector.js)

The injector watches the new-tasks/ directory and publishes JSON task files to the Slack Lists API.

Triggers

| Component | File | Trigger |
|---|---|---|
| Watcher | ai.openclaw.otm-watcher.plist | WatchPaths on new-tasks/ (and task-updates/) |
| Sweeper | ai.openclaw.otm-sweeper.plist | Every 10 min (catches misses) |

Idempotency (Dedup Check)

The injector includes a dedup check before creating tasks in Slack to prevent duplicates on crash/retry:

  1. Before calling slackLists.items.create, the injector fetches all existing items
  2. Scans for a matching Task ID (Col0ALVK2NA1E)
  3. If the Task ID already exists → skips creation, moves file to processed/ with _skipped_duplicate: true
  4. If the dedup API call fails → proceeds with creation (better to duplicate than lose a task)

This prevents duplicate Slack items when the watcher or sweeper re-processes a file already injected (e.g., after a crash, retry, or race condition).
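The dedup scan in step 2 reduces to a simple predicate once the items are fetched. The shape below assumes fields keyed by column ID, which is an illustration rather than the exact API payload:

```javascript
// True if any existing Slack item already carries this Task ID
// in the Task ID column (Col0ALVK2NA1E).
function isDuplicate(existingItems, taskId) {
  return existingItems.some(
    (item) => item.fields && item.fields.Col0ALVK2NA1E === taskId
  );
}
```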

Directories

| Path | Purpose |
|---|---|
| ~/Library/Application Support/OpenClaw/otm/new-tasks/ | Inbox — pending task files |
| ~/Library/Application Support/OpenClaw/otm/task-updates/ | Inbox — update files (System 3b) |
| ~/Library/Application Support/OpenClaw/otm/processed/ | Successfully injected |
| ~/Library/Application Support/OpenClaw/otm/failed/ | Failed (with error metadata) |

System 3: Task Dispatcher (otm-dispatcher.js)

A lightweight Node.js scanner that detects tasks with an assignee and dispatches them to the appropriate agent workspace. Runs every 2 minutes via launchd.

What it does

  1. Fetches all tasks from the Slack list (F0ALE8DCW1F)
  2. Finds tasks where: status = "new" AND assignee is set (not empty)
  3. For each matching task (in this exact order for crash safety):
    1. Writes a dispatch file to the agent's workspace: /Volumes/OPENCLAW/CLAUDIA/rapido-openclaw/workspaces/<agent>-workspace/task-dispatch.json
    2. Updates the task's status in Slack from new → assigned + sets assigned_at (this is the dedup gate — once assigned, future runs skip it)
    3. Triggers the agent session via Gateway WebSocket RPC
  4. After all tasks processed, all agent triggers fire in parallel — agents start concurrently
  5. Writes its own component state file (System 5 heartbeat)

Agent Triggering via Gateway WebSocket RPC

After writing task-dispatch.json, the dispatcher actively triggers each agent via the OpenClaw Gateway WebSocket RPC. This eliminates the passive "wait for heartbeat" gap — agents start working immediately.

Protocol: WebSocket JSON-RPC to ws://127.0.0.1:18789

Dispatcher                    Gateway                     Agents
  │                              │                          │
  │ ws.connect()                 │                          │
  │─────────────────────────────►│                          │
  │                              │                          │
  │ { method: "agent",           │                          │
  │   params: {                  │                          │
  │     agentId: "archibald",    │                          │
  │     message: "Task T-00044   │                          │
  │       dispatched...",        │                          │
  │     idempotencyKey:          │                          │
  │       "otm-T-00044"         │                          │
  │   }}                         │                          │
  │─────────────────────────────►│ ──► archibald session ──►│
  │                              │                          │
  │ { method: "agent", ...       │                          │
  │   agentId: "devdas" }        │                          │
  │─────────────────────────────►│ ──► devdas session ─────►│
  │                              │                          │
  │ { method: "agent", ...       │                          │
  │   agentId: "salvatore" }     │                          │
  │─────────────────────────────►│ ──► salvatore session ──►│
  │                              │                          │
  │ ws.close()                   │     (all 3 run in        │
  │─────────────────────────────►│      parallel)           │
  │                              │                          │
  │  Dispatcher exits.           │  Gateway manages         │
  │  Total time: ~2 seconds.     │  concurrent sessions.    │

RPC call format:

{
  "method": "agent",
  "params": {
    "message": "Task T-00044 dispatched. Check task-dispatch.json and execute it.",
    "agentId": "archibald",
    "idempotencyKey": "otm-T-00044"
  }
}

Response (immediate, non-blocking):

{
  "runId": "otm-T-00044",
  "status": "accepted",
  "acceptedAt": 1773504558104
}

Key properties:

  • Parallel: All agent sessions start concurrently — no serialization
  • Non-blocking: Gateway returns accepted immediately; dispatcher doesn't wait
  • Auth: Gateway token passed via WebSocket connection headers
  • Rupert excluded: Human users are notified via Slack UI, not RPC

⚠️ idempotencyKey is NOT idempotent. Despite the name, the Gateway agent RPC method accepts duplicate calls with the same key — it uses the key as a runId label only. Calling twice with the same key = two separate agent sessions. Idempotency is the dispatcher's responsibility, not the gateway's. The Slack status flip (new → assigned) is the sole dedup mechanism for the dispatcher. Once a task is assigned, it's invisible to future dispatcher runs.

vs. HTTP Webhook alternative (POST /hooks/agent): The webhook approach serializes agent sessions on CommandLane.Nested — agents run one at a time. With 6 agents × ~3 min each = ~18 min sequential vs. ~3 min parallel via WebSocket RPC. WebSocket is the correct choice for multi-agent dispatch.

Dispatch Operation Order (crash safety)

The dispatcher must execute operations in this exact order to prevent duplicate agent triggers:

1. Write task-dispatch.json to agent workspace
2. Update Slack status: new → assigned (+ set assigned_at)
3. Trigger agent via WS RPC

Why this order matters:

| Crash point | Result | Recovery |
|---|---|---|
| After step 1, before step 2 | File written, Slack still new | Next dispatcher run re-dispatches → file already has the task (append-only), agent gets triggered. Safe but duplicate file entry. |
| After step 2, before step 3 | Slack says assigned, agent never woke up | Agent picks up task on next heartbeat or manual trigger. Safe — delayed but not lost. |
| After step 3 | All done | Clean path. |

The dangerous alternative (trigger agent FIRST, then update Slack) risks: crash after trigger → Slack still new → next run triggers agent AGAIN → duplicate work, wasted tokens. This is why Slack update must come before agent trigger.
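The ordering contract can be captured in a small harness. The three step functions are injected here purely so the order is visible and testable; the names are illustrative, not the dispatcher's real internals:

```javascript
// Crash-safe dispatch ordering: file write, then the Slack dedup gate,
// then the agent trigger. Never trigger before flipping the status.
async function dispatchTask(task, { writeDispatchFile, updateSlackStatus, triggerAgent }) {
  await writeDispatchFile(task);             // 1. harmless to redo on retry
  await updateSlackStatus(task, "assigned"); // 2. dedup gate: hides task from next run
  await triggerAgent(task);                  // 3. wake the agent last
}
```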

Dispatch File Format (task-dispatch.json)

{
  "dispatched": [
    {
      "taskId": "T-00033",
      "title": "Build login page",
      "priority": "high",
      "project": "PRJ-012 App",
      "type": "action",
      "subtasks": ["Create form", "Add validation", "Write tests"],
      "dispatchedAt": "2026-03-14T15:00:00Z",
      "slackItemId": "Rec0ALXYZ"
    }
  ]
}

The dispatch file is APPEND-ONLY — new tasks get added to the dispatched array. The agent removes entries when they pick them up (or marks them as "picked": true).

Agent-Side Convention

On session start, agents check for task-dispatch.json. If present, they:

  1. Pick up the highest-priority task
  2. Update their agent-state.json to "working" with the task
  3. Start working on it
  4. When done, mark subtasks complete via otm-update-task.sh

Agent Workspace Mapping

| Agent | Workspace path |
|---|---|
| claudia | claudia-workspace |
| devdas | devdas-workspace |
| archibald | archibald-workspace |
| frederic | frederic-workspace |
| salvatore | salvatore-workspace |
| sylvain | sylvain-workspace |
| rupert | (skip — human, notified via Slack UI) |

Component

| Item | Detail |
|---|---|
| File | otm-dispatcher.js |
| Plist | ai.openclaw.otm-dispatcher.plist |
| Schedule | Every 120 seconds |
| Log | ~/Library/Logs/OpenClaw/otm-dispatcher.log |
| State file | otm-dispatcher-state.json |

System 3b: Task Updates (otm-update-task.sh)

A file-drop mechanism that allows agents to update task status and mark subtasks complete — same security model as task creation (agents never touch the Slack API directly).

Script: otm-update-task.sh

# Mark a subtask done
otm-update-task.sh --task-id T-00033 --subtask-done "Create login form"

# Update task status
otm-update-task.sh --task-id T-00033 --status in_progress

# Report blocked
otm-update-task.sh --task-id T-00033 --status blocked --reason "Waiting on API key"

# Multiple subtasks done at once
otm-update-task.sh --task-id T-00033 \
  --subtask-done "Create login form" \
  --subtask-done "Add validation logic"

Parameters

| Parameter | Required | Description |
|---|---|---|
| --task-id | yes | Task ID (T-NNNNN) |
| --status | no | New status: in_progress \| blocked \| agent_done |
| --subtask-done | no | Subtask title to mark as done (repeatable) |
| --reason | no | Reason text (used with --status blocked) |

At least one of --status or --subtask-done is required.

JSON Format (intermediate)

Written to ~/Library/Application Support/OpenClaw/otm/task-updates/<timestamp>-<uuid>.json

{
  "id": "uuid",
  "taskId": "T-00033",
  "action": "update",
  "createdAt": "2026-03-14T16:00:00Z",
  "status": "in_progress",
  "subtasksDone": ["Create login form"],
  "reason": null
}

Processing

otm-injector.js is extended to also watch the task-updates/ directory:

  1. Reads the update JSON
  2. Looks up the task in Slack by Task ID (scans items, matches Col0ALVK2NA1E)
  3. If status is set → updates the Status column
  4. If subtasksDone is set → finds matching child items by title → sets their status to done AND sets Col00 (checkbox) to true
  5. Moves file to processed/ on success, failed/ on error

Component

| Item | Detail |
|---|---|
| Script | otm-update-task.sh (shared tool) |
| Processor | otm-injector.js (extended) |
| Watcher | ai.openclaw.otm-watcher.plist (updated WatchPaths) |

System 4: Completion Detection (otm-completion-detector.js)

A lightweight scanner that auto-promotes tasks to agent_done when all subtasks are complete.

Every task has at least one subtask: otm-create-task.sh auto-adds "Confirm that task has been done" if no subtasks are provided. This guarantees the completion detector always has a signal.

Flow

Completion Detector (Node.js, launchd every 2 min)
  │
  ├── GET all tasks from Slack list
  ├── FILTER: status = "in_progress" + has child items (subtasks)
  ├── CHECK: all child items have status ∈ {done, agent_done} OR Col00 = true
  │
  └── YES → UPDATE parent task status → "agent_done"
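The promotion check above reduces to a predicate over the parent task and its child items; the shapes below are simplified assumptions, with col00 standing in for the checkbox column:

```javascript
// True when an in_progress task should be promoted to agent_done:
// it has subtasks, and every one is done/agent_done or checked (Col00).
function readyForAgentDone(task) {
  if (task.status !== "in_progress") return false;       // not active
  if (!task.subtasks || task.subtasks.length === 0) return false; // no signal
  return task.subtasks.every(
    (st) => ["done", "agent_done"].includes(st.status) || st.col00 === true
  );
}
```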

Rules

| Condition | Action |
|---|---|
| Task in_progress + all subtasks done/agent_done | → promote to agent_done |
| Task in_progress + some subtasks still open | → skip (work in progress) |
| Task not in_progress | → skip (not active) |

What happens after agent_done

  • Technical tasks: Claudia validates the work → done
  • Business/decision tasks: Claudia creates a review task for Rupert → Rupert approves → done

Component

| Item | Detail |
|---|---|
| File | otm-completion-detector.js |
| Plist | ai.openclaw.otm-completion-detector.plist |
| Schedule | Every 120 seconds |
| Log | ~/Library/Logs/OpenClaw/otm-completion-detector.log |

System 5: Component Heartbeats

Each OTM component writes a state file after every run. These files are watched by the collector and surfaced on the dashboard.

State File Format

{
  "component": {
    "id": "otm-injector",
    "status": "alive",
    "lastRun": "2026-03-14T10:15:00Z",
    "result": "success",
    "details": "Processed 2 tasks, 0 failures"
  }
}

State Files

| Component | File | Written by |
|---|---|---|
| OTM Injector | otm-injector-state.json | otm-injector.js (end of each run) |
| Dispatcher | otm-dispatcher-state.json | otm-dispatcher.js (end of each run) |
| Completion Detector | otm-completion-detector-state.json | otm-completion-detector.js (end of each run) |
| Watcher | otm-watcher-state.json | watcher wrapper |

All files: ~/Library/Application Support/OpenClaw/otm/

Dashboard

The OTM Components card shows each component's status with staleness color coding:

  • 🟢 Green: last run < 5 min ago
  • 🟡 Yellow: last run 5–15 min ago
  • 🔴 Red: last run > 15 min ago (action required)
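The color coding is a pure function of the last-run timestamp; a sketch:

```javascript
// Map a component's lastRun timestamp to the dashboard staleness color.
function stalenessColor(lastRunIso, now = Date.now()) {
  const mins = (now - Date.parse(lastRunIso)) / 60000;
  if (mins < 5) return "green";   // fresh
  if (mins <= 15) return "yellow"; // getting stale
  return "red";                   // action required
}
```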

Observability Stack

State file → collector.js (FSEvents) → SQLite otm_state table
                                             ↓
                               reader.js polls + broadcasts
                                             ↓
                                    Dashboard WebSocket

System 6: DMZ Relay

The DMZ relay bridges the private OpenClaw state to the public Vercel dashboard. It runs on a Synology NAS in the DMZ.

Architecture

OpenClaw VM (private)     Synology NAS (DMZ)           Browser (Vercel dashboard)
  │                         ┌──────────────────┐
  │  collector.js           │  RECEIVER         │
  │  pushToRelay()          │  127.0.0.1:3456   │
  │ ── HTTP POST ─────────► │  + bearer token   │
  │  on state change         │  + atomic write   │
  │                          │                   │
  │                          │  fab-state.json   │ ← shared state file
  │                          │                   │
  │                          │  BROADCASTER      │
  │                          │  0.0.0.0:3457     │
  │                          │  (TLS via proxy)  │ ◄── wss://nas.domain/ws
  │                          │  fs.watch → push  │ ◄── GET /api/state
  │                          └──────────────────┘

Services

| Service | File | Port | Binding | Dependencies |
|---|---|---|---|---|
| Receiver | receiver.js | 3456 | 127.0.0.1 (localhost only) | Zero — pure Node.js |
| Broadcaster | broadcaster.js | 3457 | 0.0.0.0 (behind TLS proxy) | ws package only |

Env Vars

| Service | Var | Required | Description |
|---|---|---|---|
| Receiver | FAB_RELAY_TOKEN | yes | Shared secret bearer token |
| Receiver | STATE_FILE | no | Path to state file (default: ./fab-state.json) |
| Broadcaster | STATE_FILE | no | Same state file path |
| Collector (OpenClaw) | FAB_RELAY_URL | no | If set, enables relay push |
| Collector (OpenClaw) | FAB_RELAY_TOKEN | no | Must match receiver token |

Security Model

| Layer | Protection |
|---|---|
| Bearer token | Constant-time comparison (timing-attack safe) |
| Receiver binding | 127.0.0.1 — not reachable from internet |
| Firewall | Port 3456: allow ONLY from OpenClaw VM IP |
| Broadcaster | Read-only; no auth needed (non-sensitive data) |
| TLS | All external traffic via Synology reverse proxy + Let's Encrypt |
| Atomic write | Receiver writes .tmp → rename; no partial reads |

Collector Integration

// Only runs if FAB_RELAY_URL is set:
pushStateToRelay(); // called after every gateway/agent/OTM state change

The pushToRelay() function in collector.js:

  1. Builds full snapshot from SQLite (gateway + agents + OTM)
  2. POSTs to FAB_RELAY_URL with bearer token
  3. Handles errors gracefully — relay down = warning log, not crash

Files

File Location
receiver.js work/PROJECTS/fab-state/synology-relay/
broadcaster.js work/PROJECTS/fab-state/synology-relay/
package.json work/PROJECTS/fab-state/synology-relay/
SETUP-GUIDE.md work/PROJECTS/fab-state/synology-relay/

See work/PROJECTS/strategy-openclaw-org/docs/FAB-STATE.md System 6 for full documentation.

Full Pipeline Flow

Claudia              Filesystem            Injector         Slack            Dispatcher           Agent Workspace
  │                     │                     │               │                  │                      │
  │ otm-create-task.sh  │                     │               │                  │                      │
  │────────────────────►│ .json               │               │                  │                      │
  │                     │─── watcher ────────►│               │                  │                      │
  │                     │                     │ items.create  │                  │                      │
  │                     │                     │──────────────►│ status=new       │                      │
  │                     │                     │  + subtasks   │                  │                      │
  │                     │ move to processed/  │               │                  │                      │
  │                     │◄────────────────────│               │                  │                      │
  │                     │                     │               │                  │                      │
  │                     │                     │               │ ◄── scan ───────│ (every 2 min)        │
  │                     │                     │               │ new + assignee   │                      │
  │                     │                     │               │ ── update ──────►│                      │
  │                     │                     │               │ status=assigned  │                      │
  │                     │                     │               │                  │ task-dispatch.json   │
  │                     │                     │               │                  │─────────────────────►│
  │                     │                     │               │                  │                      │
  │                     │                     │               │                  │ WS RPC: agent()      │
  │                     │                     │               │                  │──► Gateway ──► Agent │
  │                     │                     │               │                  │   (parallel start)   │
  │                     │                     │               │                  │                      │
  │                     │                     │               │                  │      (agent works)   │
  │                     │                     │               │                  │                      │
  │                     │                     │               │ ◄── scan (completion detector, 2 min)  │
  │                     │                     │               │ in_progress +    │                      │
  │                     │                     │               │ all subtasks done│                      │
  │                     │                     │               │ → agent_done     │                      │

All Pipeline Components Summary

| Component | File | Trigger |
|---|---|---|
| Task creator | otm-create-task.sh | Called by agents |
| Injector | otm-injector.js | Called by watcher/sweeper |
| Watcher | ai.openclaw.otm-watcher.plist | WatchPaths on new-tasks/ + task-updates/ |
| Sweeper | ai.openclaw.otm-sweeper.plist | Every 10 min (catches misses) |
| Dispatcher | otm-dispatcher.js | Every 2 min (launchd) |
| Completion detector | otm-completion-detector.js | Every 2 min (launchd) |

8. Error Monitor (OTM-7)

The Error Monitor is a dedicated OTM component that detects state inconsistencies, traces errors for lessons learned, and triggers corrective actions. It runs as part of the Watchdog cycle (IE-08, every 60s) but is logically separate from the Watchdog's operational checks (OTM-6).

8.1 Error Condition Catalogue

Each error condition is coded ERR-xx. The Error Monitor detects; the OTM corrects.

| Code | Condition | Detection Rule | Severity | Auto-correction | Manual escalation |
|---|---|---|---|---|---|
| ERR-01 | Stale In Progress | Task status = "In Progress" AND no IE-02 subtask report in >10 minutes | Warning | None — alert only | OE-05: Admin notified. May indicate agent crash, stuck task, or slow work. |
| ERR-02 | Agent-Task Mismatch (busy agent, no task) | Agent status = busy AND current_task not found in Slack List (or task status ≠ In Progress) | Critical | Set agent idle (AT-04), call check_next_task_for_agent() | OE-05: Admin alert with details of orphaned agent state |
| ERR-03 | Agent-Task Mismatch (idle agent, active task) | Task status = "In Progress" AND assigned agent status = idle in registry | Critical | Set agent busy, re-send SURE notification (OE-01) | OE-05: Admin alert — state was inconsistent |
| ERR-04 | Orphaned Pending Task | Task status = "Pending" AND assigned agent status = idle | High | Promote task: TT-05 → re-evaluate via OTM-2 | OE-07: Audit log on task |
| ERR-05 | Counter Mismatch | subtasks_remaining ≠ actual count of unchecked subtasks in Slack List | High | Recalculate and fix counter via SW-1 | OE-05: Admin alert with old/new values |
| ERR-06 | SURE Timeout | OE-01 sent >3 minutes ago (1 min + 2 min retries) AND no IE-03 received | Critical | None — task stays In Progress | OE-05: Admin alert. Agent may be unreachable, or gateway may have restarted. |
| ERR-07 | Multiple Active Tasks per Agent | Agent has >1 task with status = "In Progress" assigned to them | Critical | Keep oldest task, move others to Pending | OE-05: Admin alert — invariant violation |
| ERR-08 | Stuck in Assigned | Task status = "Assigned" for >5 minutes (should transition immediately to In Progress or Pending) | High | Re-trigger OTM-2 for the task | OE-05: Admin alert if re-trigger fails |
| ERR-09 | Stuck in Rejected | Task status = "Rejected" for >24 hours (Orchestrator hasn't submitted rework or cancelled) | Warning | None — alert only | OE-05: Remind Orchestrator to act |
| ERR-10 | Ghost Agent | assigned_to field references a Slack user ID not in agent registry AND not auto-registerable | Critical | Task stays in current status | OE-05: Admin alert — unknown agent |
| ERR-11 | Duplicate Subtask Reports | Same subtask_id reported done >1 time (idempotency check caught it) | Info | Silently discarded (idempotent) | Logged in event log (§9) for pattern analysis |
| ERR-12 | Stale Rework (no ack subtask closed) | Task in "In Progress" after TT-08 rework, first subtask ("Acknowledge rework…") not closed within 10 minutes | Warning | None — alert only | OE-05: Agent may not have read rework instructions |
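Several of these corrections reduce to pure data transformations. As one illustration, here is a minimal sketch of the ERR-07 auto-correction (keep the oldest "In Progress" task, demote the rest to Pending); the `ActiveTask` shape and function name are illustrative, not the real OTM-7 implementation:

```typescript
// Hypothetical helper: given all "In Progress" tasks assigned to one agent,
// return the task IDs that should be moved to Pending (ERR-07 correction).
interface ActiveTask {
  task_id: string;
  assigned_at: number; // ms epoch
}

function tasksToDemote(active: ActiveTask[]): string[] {
  if (active.length <= 1) return []; // single-task invariant already holds
  const sorted = [...active].sort((a, b) => a.assigned_at - b.assigned_at);
  return sorted.slice(1).map(t => t.task_id); // everything but the oldest → Pending
}
```

The actual status writes would then go through SW-1, as required by the state-integrity rules.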

8.2 Error Monitor Processing

OTM-7 runs every 60s (piggybacks on IE-08 watchdog cron):
  |
  FOR EACH error check ERR-01 through ERR-12:
    |
    +-- Run detection query (SQLite + Slack API as needed)
    |
    +-- IF condition detected:
    |     1. Log error to event_log table (§9): {error_code, task_id, agent_id, details, timestamp}
    |     2. Log to task conversation (OE-07): "[timestamp] OTM-7 ERROR: ERR-xx detected — <description>"
    |     3. IF auto-correctable: execute correction, log correction action
    |     4. IF manual escalation: send OE-05 alert to admin
    |     5. Increment error counter in error_stats table (§10)
    |
    +-- IF condition NOT detected: skip
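Most detection rules are pure timestamp or state comparisons. A sketch of the ERR-06 (SURE timeout) check, assuming the `sure_pending` row shape from §10.3 — the function name is a hypothetical helper, not the actual OTM-7 code:

```typescript
// ERR-06: OE-01 sent >3 minutes ago AND no IE-03 acknowledgement received.
interface SurePendingRow {
  task_id: string;
  sent_at: number;                // ms epoch of first OE-01
  acknowledged_at: number | null; // set when IE-03 received; NULL = pending
}

const SURE_TIMEOUT_MS = 3 * 60 * 1000; // 1 min first wait + 2 min retry window

function isSureTimedOut(row: SurePendingRow, nowMs: number): boolean {
  if (row.acknowledged_at !== null) return false; // acknowledged — no error
  return nowMs - row.sent_at > SURE_TIMEOUT_MS;
}
```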

8.3 Lessons Learned Pipeline

Errors are not just fixed — they feed a continuous improvement loop.

  1. Error Statistics Table (error_stats in error DB, see §10): Tracks frequency, first/last occurrence, auto-correction success rate per ERR-xx code.
  2. Daily Error Report: OTM-7 generates a summary during the first watchdog cycle after 00:00 each day:
    • Error counts by code (ERR-01 through ERR-12)
    • Most frequent errors
    • Auto-correction success/failure ratio
    • New error patterns (first-time occurrences)
  3. Threshold Alerts: Alert on EVERY error (threshold = 1). During startup and early operation, all anomalies are surfaced immediately via OE-05. The threshold can be raised once the system is stable and baseline error rates are understood.
  4. Root Cause Tagging: Admin can tag errors with root cause via OTM API (POST /api/otm/error/{id}/tag), enabling aggregate analysis.
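The daily report in step 2 can be generated directly from `error_stats` rows. A sketch assuming the row shape from §10.3; the output format here is illustrative only:

```typescript
// Hypothetical report formatter: one line per error code seen in the last
// 24 hours, most frequent first.
interface ErrorStatRow {
  error_code: string;
  last_24h_count: number;
  auto_corrected_count: number;
  escalated_count: number;
}

function dailyErrorReport(stats: ErrorStatRow[]): string[] {
  return stats
    .filter(s => s.last_24h_count > 0)                    // only codes seen today
    .sort((a, b) => b.last_24h_count - a.last_24h_count)  // most frequent first
    .map(s =>
      `${s.error_code}: ${s.last_24h_count} in 24h ` +
      `(auto-corrected ${s.auto_corrected_count}, escalated ${s.escalated_count})`
    );
}
```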

8.4 Corrective Action Summary

| Action | Triggered by | Effect |
|---|---|---|
| Re-send SURE notification | ERR-03 (idle agent, active task) | Resynchronise agent with its task |
| Promote pending task | ERR-04 (orphaned pending) | Unblock queued work |
| Recalculate counter | ERR-05 (counter mismatch) | Fix data integrity |
| Free orphaned agent | ERR-02 (busy agent, no task) | Unblock agent for new work |
| Re-trigger OTM-2 | ERR-08 (stuck in Assigned) | Retry the assignment flow |
| Move excess tasks to Pending | ERR-07 (multiple active tasks) | Restore single-task invariant |

9. Event Logging & Observability

All OTM events are logged to three complementary systems:

  1. Slack task conversation (OE-07) — human-readable, per-task, searchable in Slack (§7)
  2. Internal event log (event_log table in error DB) — structured, queryable, machine-readable
  3. Internal event log files — filesystem mirrors of database writes, for real-time tailing during tests and startup (§9.6)

9.1 Event Log Schema

CREATE TABLE event_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,          -- Unix timestamp (ms precision)
  event_type TEXT NOT NULL,            -- 'inbound' | 'outbound' | 'transition' | 'error' | 'correction' | 'system'
  event_code TEXT NOT NULL,            -- IE-xx, OE-xx, TT-xx, AT-xx, ERR-xx
  task_id TEXT,                        -- Slack List item ID (NULL for system events)
  agent_id TEXT,                       -- slack_user_id (NULL if not agent-related)
  handler TEXT,                        -- OTM-0 through OTM-7, SE-1, SW-1
  source TEXT NOT NULL,                -- 'se1', 'otm_api', 'watchdog', 'error_monitor', 'internal'
  detail TEXT,                         -- JSON blob with event-specific data
  duration_ms INTEGER,                 -- Processing time for this event
  success INTEGER DEFAULT 1,           -- 1 = success, 0 = failure
  error_message TEXT                   -- Error details if success = 0
);

CREATE INDEX idx_event_log_task ON event_log(task_id);
CREATE INDEX idx_event_log_agent ON event_log(agent_id);
CREATE INDEX idx_event_log_type ON event_log(event_type, timestamp);
CREATE INDEX idx_event_log_code ON event_log(event_code, timestamp);
CREATE INDEX idx_event_log_time ON event_log(timestamp);

9.2 What is Logged

Every IE-xx, OE-xx, TT-xx, AT-xx, and ERR-xx event produces one row in event_log. This includes:

| Event Category | Examples | Logged Fields |
|---|---|---|
| Inbound events | IE-01 (Slack event), IE-02 (subtask report), IE-03 (SURE ack) | source, task_id, agent_id, raw payload in detail |
| Outbound events | OE-01 (SURE notification), OE-06 (Slack write) | target, task_id, delivery status, duration_ms |
| Task transitions | TT-01 through TT-15 | from_status, to_status, task_id, requesting_actor |
| Agent transitions | AT-01 through AT-04 | from_status, to_status, agent_id, triggering_task |
| Errors | ERR-01 through ERR-12 | error_code, detection_details, correction_applied |
| System events | Startup, reconciliation, watchdog cycle | cycle_number, checks_run, anomalies_found |

9.3 Observability Queries

The event log enables:

-- Task lifecycle: all events for a specific task
SELECT * FROM event_log WHERE task_id = ? ORDER BY timestamp;

-- Agent activity: all events involving a specific agent
SELECT * FROM event_log WHERE agent_id = ? ORDER BY timestamp;

-- Error frequency: last 24 hours
SELECT event_code, COUNT(*) as count
FROM event_log
WHERE event_type = 'error' AND timestamp > ?
GROUP BY event_code ORDER BY count DESC;

-- Average task completion time
SELECT AVG(e2.timestamp - e1.timestamp) / 1000 / 60 as avg_minutes
FROM event_log e1
JOIN event_log e2 ON e1.task_id = e2.task_id
WHERE e1.event_code = 'TT-02' AND e2.event_code = 'TT-04';

-- Slowest handlers (performance monitoring)
SELECT handler, AVG(duration_ms), MAX(duration_ms), COUNT(*)
FROM event_log
WHERE duration_ms IS NOT NULL
GROUP BY handler ORDER BY AVG(duration_ms) DESC;

-- SURE acknowledgement response times
SELECT AVG(ack.timestamp - notif.timestamp) / 1000 as avg_seconds
FROM event_log notif
JOIN event_log ack ON notif.task_id = ack.task_id
WHERE notif.event_code = 'OE-01' AND ack.event_code = 'IE-03';

9.4 Retention & Historicisation

| Data | Retention | Archive strategy |
|---|---|---|
| event_log (active) | 30 days | Rows older than 30 days → event_log_archive |
| event_log_archive | 1 year | Monthly SQLite dump to filesystem (gzipped) |
| error_stats | Indefinite | Cumulative counters, never purged |
| Slack conversation audit | Indefinite | Lives in Slack (Slack's retention policy applies) |

Maintenance job (runs during OTM-6 watchdog, daily at 03:00):

-- Move events older than 30 days to the archive
-- (timestamps are Unix ms; cutoff computed with SQLite's strftime)
INSERT INTO event_log_archive
  SELECT * FROM event_log
  WHERE timestamp < (strftime('%s', 'now') - 30 * 24 * 60 * 60) * 1000;

DELETE FROM event_log
  WHERE timestamp < (strftime('%s', 'now') - 30 * 24 * 60 * 60) * 1000;

-- Vacuum to reclaim space
VACUUM;

9.5 Why Not Sentry?

Internal structured logging (SQLite) is chosen over Sentry because:

  • No external dependency — OTM is self-contained
  • Queryable — SQL enables arbitrary analysis (Sentry requires its query language)
  • Correlated with task data — same DB, JOIN-able with agent registry
  • Low volume — estimated <10,000 events/day (see §10), no need for distributed tracing
  • Cost — zero (SQLite is free; Sentry has per-event pricing)
  • Privacy — all data stays on the OpenClaw server

If event volume exceeds 100,000/day or distributed tracing across multiple servers becomes needed, Sentry or OpenTelemetry would be reconsidered.

9.6 Filesystem Log File Mirroring

All database writes to event_log and error_stats are mirrored to two filesystem log files in real-time:

| File | Content | Format | Purpose |
|---|---|---|---|
| `{OPENCLAW_DATA_DIR}/otm/logs/otm-events.log` | All event_log inserts | `[ISO-timestamp] [event_code] [handler] [task_id] [agent_id] detail_json` | Monitor all OTM activity via `tail -f` |
| `{OPENCLAW_DATA_DIR}/otm/logs/otm-errors.log` | All ERR-xx detections + corrections | `[ISO-timestamp] [ERR-xx] [severity] [task_id] [agent_id] description [correction: action/none]` | Monitor errors during tests and startup |

Implementation: Every INSERT INTO event_log and every error detection in OTM-7 appends one line to the corresponding log file. This is a synchronous append (negligible overhead at <350 events/day).
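A sketch of the line formatting for `otm-events.log`, following the field layout in the table above; using `-` for absent fields is an assumption, not part of the spec:

```typescript
// Hypothetical mirror-line formatter for otm-events.log.
interface EventLine {
  timestamp: number;       // ms epoch
  event_code: string;      // IE-xx, OE-xx, TT-xx, AT-xx, ERR-xx
  handler: string | null;  // OTM-0..OTM-7, SE-1, SW-1
  task_id: string | null;
  agent_id: string | null;
  detail: string | null;   // JSON blob
}

function formatEventLine(e: EventLine): string {
  const f = (v: string | null) => v ?? "-";
  return `[${new Date(e.timestamp).toISOString()}] [${e.event_code}] ` +
         `[${f(e.handler)}] [${f(e.task_id)}] [${f(e.agent_id)}] ${e.detail ?? "{}"}`;
}
```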

Log rotation: Daily at 03:00, rename to otm-events.log.YYYY-MM-DD and otm-errors.log.YYYY-MM-DD. Keep 30 days of rotated files. Older files deleted automatically.

Usage during development/testing:

# Watch all OTM events in real time
tail -f {OPENCLAW_DATA_DIR}/otm/logs/otm-events.log

# Watch errors only
tail -f {OPENCLAW_DATA_DIR}/otm/logs/otm-errors.log

# Filter for a specific task
tail -f otm-events.log | grep "T-00042"

# Filter for a specific error code
tail -f otm-errors.log | grep "ERR-03"

📌 The log files are append-only mirrors — the database remains the source of truth for queries and analysis. The files exist purely for human monitoring convenience.


10. Database Design (SQLite)

10.1 Overview

The OTM uses two separate SQLite databases:

  1. Main OTM DB (otm.db) — Agent registry, SURE pending, task history. Core operational state.
  2. Error & Event DB (otm-errors.db) — Event log, error statistics. Monitoring and observability. Separated so that error monitoring is independent from the main OTM application and can be analysed, reset, or rebuilt without affecting operations.

Database files:

  • {OPENCLAW_DATA_DIR}/otm/otm.db — Main OTM DB
  • {OPENCLAW_DATA_DIR}/otm/otm-errors.db — Error & Event DB
  • (Sylvain to confirm exact OPENCLAW_DATA_DIR path)

Library: better-sqlite3 (synchronous, fast, WAL mode for both)

10.2 Tables

Main OTM DB (otm.db):

| Table | Purpose | Writer(s) | Reader(s) | Rows (steady state) | Growth rate |
|---|---|---|---|---|---|
| agents | Agent registry (§4.5) | OTM-1 | OTM-0, OTM-2, OTM-3, OTM-5, OTM-6 | 5–15 | Near-zero (new agents rare) |
| task_history | Snapshot of archived tasks (§Flow 5) | OTM-6 (TT-12) | Admin queries, reporting | Growing | ~20–50/month |
| sure_pending | Outstanding SURE notifications awaiting ack (§6) | OTM-2, OTM ack handler | OTM-7 (timeout check) | 0–5 | Transient (cleared on ack/timeout) |

Error & Event DB (otm-errors.db):

| Table | Purpose | Writer(s) | Reader(s) | Rows (steady state) | Growth rate |
|---|---|---|---|---|---|
| event_log | Structured event log (§9.1) | All OTM handlers | OTM-7, admin queries | ~10,000 | ~300/day (rotated after 30 days, §9.4) |
| event_log_archive | Archived events >30 days (§9.4) | Maintenance job | Admin queries only | ~100,000 | ~9,000/month |
| error_stats | Error frequency counters (§8.3) | OTM-7 | OTM-7, daily report | 12 rows (one per ERR-xx) | Fixed |

10.3 Full Schema

Main OTM DB (otm.db):

-- Agent Registry (see §4.5 for column details)
CREATE TABLE agents (
  slack_user_id TEXT PRIMARY KEY,
  otm_display_name TEXT NOT NULL,
  openclaw_agent_id TEXT,
  agent_type TEXT NOT NULL DEFAULT 'ai',
  status TEXT DEFAULT 'idle',
  current_task TEXT,
  task_started_at INTEGER,
  last_seen INTEGER
);

-- Task History (archived tasks snapshot)
CREATE TABLE task_history (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,               -- Original Slack List item ID
  title TEXT NOT NULL,
  assigned_to TEXT,                     -- slack_user_id
  final_status TEXT NOT NULL,          -- "Archived" (from Done or Cancelled)
  previous_status TEXT,                -- Status before archival
  priority INTEGER,
  context TEXT,
  subtask_count INTEGER,               -- Total subtasks at archival time
  created_at INTEGER,                  -- Task creation timestamp
  assigned_at INTEGER,
  completed_at INTEGER,
  validated_at INTEGER,
  cancelled_at INTEGER,
  archived_at INTEGER NOT NULL,        -- When TT-12 executed
  result_summary TEXT,
  total_duration_ms INTEGER,           -- assigned_at → completed_at
  review_duration_ms INTEGER,          -- completed_at → validated_at
  rework_count INTEGER DEFAULT 0,      -- Number of TT-08 rework cycles
  error_count INTEGER DEFAULT 0        -- Number of ERR-xx events during lifecycle
);

CREATE INDEX idx_task_history_agent ON task_history(assigned_to);
CREATE INDEX idx_task_history_status ON task_history(final_status);
CREATE INDEX idx_task_history_archived ON task_history(archived_at);

-- SURE Pending Notifications (see §6)
CREATE TABLE sure_pending (
  task_id TEXT PRIMARY KEY,
  agent_id TEXT NOT NULL,
  notification_type TEXT NOT NULL,     -- 'task_assigned' | 'rework_assigned'
  sent_at INTEGER NOT NULL,            -- First OE-01 sent
  retry_count INTEGER DEFAULT 0,       -- 0, 1, 2, 3 (max)
  last_retry_at INTEGER,
  acknowledged_at INTEGER              -- Set when IE-03 received. NULL = still pending.
);
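The retry columns above imply a small scheduling decision per watchdog cycle. A sketch of one plausible reading of the §6 schedule (re-send after 1 minute, declare ERR-06 after 3 minutes total); the action names are illustrative, and the schema's `retry_count` ceiling of 3 leaves room for a different retry cadence:

```typescript
// Hypothetical SURE scheduler: decide what to do with an unacknowledged
// notification on each 60 s cycle.
type SureAction = "wait" | "retry" | "timeout";

function nextSureAction(sentAt: number, retryCount: number, nowMs: number): SureAction {
  const elapsed = nowMs - sentAt;
  if (elapsed > 3 * 60_000) return "timeout";               // 3 min total → ERR-06
  if (retryCount === 0 && elapsed > 60_000) return "retry"; // first re-send after 1 min
  return "wait";
}
```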

Error & Event DB (otm-errors.db):

-- Event Log (see §9.1)
CREATE TABLE event_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_code TEXT NOT NULL,
  task_id TEXT,
  agent_id TEXT,
  handler TEXT,
  source TEXT NOT NULL,
  detail TEXT,
  duration_ms INTEGER,
  success INTEGER DEFAULT 1,
  error_message TEXT
);

CREATE INDEX idx_event_log_task ON event_log(task_id);
CREATE INDEX idx_event_log_agent ON event_log(agent_id);
CREATE INDEX idx_event_log_type ON event_log(event_type, timestamp);
CREATE INDEX idx_event_log_code ON event_log(event_code, timestamp);
CREATE INDEX idx_event_log_time ON event_log(timestamp);

-- Event Log Archive (identical schema)
CREATE TABLE event_log_archive (
  id INTEGER PRIMARY KEY,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_code TEXT NOT NULL,
  task_id TEXT,
  agent_id TEXT,
  handler TEXT,
  source TEXT NOT NULL,
  detail TEXT,
  duration_ms INTEGER,
  success INTEGER DEFAULT 1,
  error_message TEXT
);

-- Error Statistics (see §8.3)
CREATE TABLE error_stats (
  error_code TEXT PRIMARY KEY,         -- ERR-01 through ERR-12
  total_count INTEGER DEFAULT 0,
  last_24h_count INTEGER DEFAULT 0,    -- Reset daily by maintenance job
  first_seen INTEGER,                  -- Unix timestamp
  last_seen INTEGER,                   -- Unix timestamp
  auto_corrected_count INTEGER DEFAULT 0,
  escalated_count INTEGER DEFAULT 0
);

10.4 Database Maintenance

| Operation | Frequency | Triggered by | Description |
|---|---|---|---|
| Event log rotation | Daily (03:00) | OTM-6 watchdog + time check | Move events >30 days to event_log_archive |
| Archive export | Monthly (1st, 03:30) | OTM-6 watchdog + date check | Dump event_log_archive to gzipped SQL file on disk, then TRUNCATE |
| Error stats reset | Daily (00:00) | OTM-6 watchdog + time check | Reset last_24h_count to 0 for all ERR-xx rows |
| SURE cleanup | Every 60s | OTM-7 error monitor | Remove sure_pending rows where acknowledged_at IS NOT NULL and >1 hour old |
| VACUUM | Weekly (Sunday 03:00) | OTM-6 watchdog + day check | Reclaim disk space after deletions |
| WAL checkpoint | Automatic | SQLite WAL mode | Handled by better-sqlite3 automatically |
| Backup | Daily (04:00) | Sylvain's backup cron | Copy otm.db to backup location (standard server backup) |

10.5 Volume Estimates

Assumptions: 5 active agents, ~10 tasks created/day, ~3 subtasks/task average, watchdog runs 1,440×/day.

Main OTM DB (otm.db):

| Table | Writes/day | Reads/day | Steady-state rows | Disk (est.) |
|---|---|---|---|---|
| agents | ~10 (status flips) | ~500 (every handler checks registry) | 5–15 | <1 KB |
| task_history | ~1–2 (archival events) | ~5 (reporting queries) | ~500/year | ~500 KB |
| sure_pending | ~20 (insert + update on ack) | ~1,440 (timeout checks) | 0–5 (transient) | <1 KB |
| Subtotal | ~30/day | ~1,950/day | ~520 | <1 MB |

Error & Event DB (otm-errors.db):

| Table | Writes/day | Reads/day | Steady-state rows | Disk (est.) |
|---|---|---|---|---|
| event_log | ~300 (all events) | ~50 (error monitor + queries) | ~9,000 (30-day window) | ~5 MB |
| event_log_archive | ~9,000/month (from rotation) | ~5/month (admin queries) | ~100,000 (1-year window) | ~50 MB |
| error_stats | ~20 (counter increments) | ~1,440 (every watchdog cycle) | 12 | <1 KB |
| Subtotal | ~320/day | ~1,500/day | ~109,000 | ~55 MB |

📌 At this scale, SQLite is well within its performance envelope for both databases. The separation means the error DB can be independently analysed, reset, or rebuilt without affecting OTM operations. A weekly VACUUM on each keeps files compact.

10.6 Historicisation Strategy

Main OTM DB (otm.db)
  ├── agents              — live state, small, never archived
  ├── sure_pending        — transient, cleaned hourly
  └── task_history        — growing archive of completed tasks

Error & Event DB (otm-errors.db)
  ├── event_log           — rolling 30-day window
  ├── event_log_archive   — rolling 1-year window
  └── error_stats         — cumulative counters, never purged

Filesystem log files (otm/logs/)
  ├── otm-events.log      — real-time event mirror (rotated daily, 30-day keep)
  └── otm-errors.log      — real-time error mirror (rotated daily, 30-day keep)

Monthly export (filesystem)
  └── {OPENCLAW_DATA_DIR}/otm/archive/
      ├── events-2026-01.sql.gz    — monthly event log dump from otm-errors.db
      ├── events-2026-02.sql.gz
      └── ...

Annual report (generated)
  └── Aggregate stats from task_history (otm.db) + error_stats (otm-errors.db)
      → Feeds into CMMI metrics collection

Non-Functional Requirements

Idempotency

  • All OTM handlers MUST be idempotent
  • Subtask completion: check todo_completed (Col00) field before processing (no separate dedup table)
  • State transitions: verify previous_status matches expected before applying
  • File pipeline dedup: injector checks existing Slack items by Task ID before creating
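The first two rules above can be sketched as pure checks; shapes and names are illustrative, not the OTM's actual API:

```typescript
// Hypothetical helpers for the idempotency rules: skip already-completed
// subtasks (todo_completed, Col00) and verify previous_status before a
// transition is applied.
interface SlackItem {
  todo_completed: boolean;
  status: string;
}

function shouldProcessSubtaskReport(item: SlackItem): boolean {
  return !item.todo_completed; // duplicate report → discard silently (ERR-11)
}

function checkedTransition(item: SlackItem, expectedFrom: string, to: string): SlackItem {
  if (item.status !== expectedFrom) {
    throw new Error(`stale transition: expected "${expectedFrom}", found "${item.status}"`);
  }
  return { ...item, status: to };
}
```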

State Integrity

  • previous_status MUST be set before every status change
  • All status writes go through OTM → SW-1. No direct Slack writes by any actor.
  • Watchdog requests transitions via OTM handler calls, not direct writes
  • todo_completed (Col00) MUST be set alongside status when marking items done

Audit Trail

  • Every OTM event logged in Slack task conversation feed with timestamp (§7)
  • The Slack conversation is the human-readable audit trail; structured events are additionally captured in event_log (§9)
  • All log entries include event code (IE-xx, OE-xx, TT-xx, AT-xx) for traceability

Persistence

  • Agent registry in SQLite
  • SQLite DB location: OpenClaw server data directory
  • Startup reconciliation from openclaw.json + Slack List on OTM restart (§4.6)

Latency

  • Event handling MUST complete within 5 seconds
  • Slack API writes SHOULD complete within 5 seconds
  • Agent notifications SHOULD be sent within 10 seconds
  • SURE acknowledgement timeout: 1 min (first), 2 min (retry), then error (3 min total)
  • Dispatcher runs every 2 min — maximum 2 min delay from task creation to agent trigger
  • Completion detector runs every 2 min — maximum 2 min delay from last subtask to agent_done

Error Handling

  • Slack API calls: retry up to 3 times with exponential backoff
  • Unhandled errors: alert admin via OE-05
  • Tasks MUST NOT be silently lost
  • All errors logged in task conversation feed (OE-07)
  • Dispatcher crash: Slack assigned status is the sole dedup gate (§System 3)
  • Gateway idempotencyKey does NOT provide real idempotency — dispatcher owns dedup

Security

  • OTM-4 (validate/reject) restricted to registered reviewer agents
  • All Slack API calls authenticated via bot tokens
  • DMZ relay uses bearer token + constant-time comparison (§System 6)
  • Receiver bound to 127.0.0.1 only

Technology Stack

| Component | Technology | Owner (who runs it) | Activity frequency | Data volume |
|---|---|---|---|---|
| OTM backend | TypeScript/Node.js, OpenClaw plugin pipeline | Devdas (builds), Sylvain (deploys) | Continuous — handles all events | ~350 events/day processed |
| File pipeline | Bash scripts + Node.js (injector, dispatcher, detector) | Devdas (builds), Sylvain (deploys) | launchd: watcher (FSEvents), sweeper (10 min), dispatcher (2 min), detector (2 min) | ~10 tasks/day through pipeline |
| SQLite DB | better-sqlite3, WAL mode | OTM (sole writer), Sylvain (backups) | ~350 writes/day, ~3,500 reads/day | ~55 MB steady state (see §10.5) |
| SE-1 (event listener) | Slack Events API, socket mode, Bolt SDK | Salvatore's Slack app (lists:read) | ~50 events/day (Slack → OTM) | <1 KB/event payload |
| SW-1 (writer) | Slack Web API (lists:write) | Salvatore's Slack app (called by OTM) | ~200 API calls/day (field updates + audit posts) | <1 KB/call |
| OTM API | OpenClaw hooks / HTTP endpoints | OTM (receives), Orchestrator + Agents (call) | ~100 API calls/day | <1 KB/call |
| Agent notifications | OpenClaw Gateway WS RPC (AI) / Slack task conversation (Human) | OTM (sends), Agents (receive) | ~20 notifications/day | <1 KB/notification |
| Watchdog + Error Monitor | OpenClaw cron (60s interval) | OTM-6 + OTM-7 (automatic) | 1,440 cycles/day | ~20 error checks/cycle |
| Event logging | SQLite event_log table (see §9) | OTM (writes), Admin (queries) | ~300 events/day, 30-day active window | ~5 MB active, ~50 MB archive |
| DMZ relay | Node.js receiver + broadcaster on Synology NAS | Sylvain (deploys) | On every state change | <1 KB/push |
| Testing | Vitest, mock Slack API | Devdas (writes + runs) | CI on every PR | |

Component Ownership Map

┌─────────────────────────────────────────────────────┐
│ Slack (Salvatore's Slack App)                       │
│   SE-1: lists:read (event listener)                 │
│   SW-1: lists:write (field updates + audit posts)   │
└──────────────┬──────────────────────────┬───────────┘
               │ IE-01                    ▲ OE-06, OE-07
               ▼                          │
┌─────────────────────────────────────────────────────┐
│ OTM (OpenClaw Plugin Pipeline)                      │
│   OTM-0: Event Router (internal)                    │
│   OTM-1: Agent Registry (internal)                  │
│   OTM-2: Handle Task Assigned                       │
│   OTM-3: Handle Subtask Done                        │
│   OTM-4: Task Validate/Reject                       │
│   OTM-5: Handle Task Cancelled                      │
│   OTM-6: Watchdog (cron, 60s)                       │
│   OTM-7: Error Monitor (cron, 60s)                  │
│                                                     │
│   SQLite DB: agents, event_log, error_stats,        │
│              task_history, sure_pending              │
└──────────────┬──────────────────────────┬───────────┘
               │ OE-01, OE-02, OE-04     ▲ IE-02, IE-03, IE-04–IE-07
               ▼                          │
┌─────────────────────────────────────────────────────┐
│ OpenClaw Agents                                     │
│   Orchestrator (Claudia): IE-04, IE-05, IE-06, IE-07│
│   Agents (Devdas, etc.): IE-02, IE-03              │
│   Human (Rupert): via Slack UI → SE-1 → IE-01      │
└─────────────────────────────────────────────────────┘

File Pipeline (Part 3):
┌─────────────────────────────────────────────────────┐
│ otm-create-task.sh → new-tasks/ → otm-injector.js  │
│   → Slack List (status=new)                         │
│   → otm-dispatcher.js → task-dispatch.json          │
│   → Gateway WS RPC → Agent sessions (parallel)      │
│   → otm-update-task.sh → task-updates/             │
│   → otm-completion-detector.js → agent_done        │
└─────────────────────────────────────────────────────┘

11. Cost Analysis

11.1 OTM Infrastructure Cost

| Component | Cost | Notes |
|---|---|---|
| Slack Pro | 1 user license | Already paid. Gives API access (SE-1 + SW-1). No per-API-call cost. |
| SQLite | $0 | Open source, embedded. No server, no license. |
| Node.js / TypeScript | $0 | Open source runtime. |
| Bolt SDK | $0 | Open source Slack SDK. |
| better-sqlite3 | $0 | Open source library. |
| OpenClaw | $0 (incremental) | OTM runs as a plugin inside the existing gateway. No additional instance. |
| Filesystem logging | $0 | Append to local files. |
| DMZ relay | $0 (incremental) | Runs on existing Synology NAS. |
| Total OTM cost | $0 incremental | Only pre-existing Slack Pro license required. |

11.2 AI Usage by the OTM

The OTM uses zero AI. It is a deterministic state machine implemented in TypeScript. No LLM calls, no embeddings, no inference. Every decision is rule-based:

  • Routing: field comparison (OTM-0)
  • Agent availability: SQLite lookup (OTM-1)
  • State transitions: precondition checks + status writes (OTM-2 through OTM-5)
  • Error detection: SQL queries against known patterns (OTM-7)
  • Watchdog: timer + threshold checks (OTM-6)
  • File pipeline: filesystem watches + Slack API calls (Systems 1–4)

Token consumption by OTM: 0 tokens.

11.3 AI Usage by Actors (Outside OTM)

The actors that interact with the OTM do consume AI tokens, but this is outside the OTM's scope:

| Actor | AI usage | OTM interaction cost |
|---|---|---|
| Orchestrator (Claudia) | LLM calls for task planning, review, rework design | OTM API calls = HTTP requests, ~0 tokens |
| Agents (Devdas, etc.) | LLM calls for task execution | OTM API calls (IE-02, IE-03) = HTTP requests, ~0 tokens |
| Human (Rupert) | None (uses Slack UI) | Slack events = Slack infrastructure, ~0 tokens |

📌 The OTM API calls (IE-02 through IE-07) are simple HTTP POST requests with JSON payloads. They consume zero AI tokens. The only AI costs are generated by the agents and orchestrator doing their actual work — which they would do regardless of whether the OTM exists.

11.4 Cost Summary

OTM operation cost:     $0/month (zero AI, zero external services)
Slack API cost:         $0/month (included in existing Pro plan)
Infrastructure cost:    $0/month (runs on existing OpenClaw server + Synology NAS)
──────────────────────────────────────────────────────
Total incremental cost: $0/month

12. Project Deliverables

| # | Deliverable | Owner | Description |
|---|---|---|---|
| D-01 | OTM-SPEC (this document) | Claudia | Specification and architecture |
| D-02 | OTM-TESTS | Claudia | Test scenarios document |
| D-03 | OTM implementation | Devdas | TypeScript plugin pipeline (OTM-0 through OTM-7, SE-1, SW-1) |
| D-04 | SQLite schemas + migrations | Devdas | otm.db and otm-errors.db setup |
| D-05 | Unit + integration tests | Devdas | Vitest test suite matching OTM-TESTS scenarios |
| D-06 | Infrastructure setup | Sylvain | DB paths, cron config, backup setup, log rotation |
| D-07 | task-orchestration skill | Claudia | OpenClaw skill for Claudia's Orchestrator role: task creation, project → step → task decomposition, assignment logic, validation/rejection, rework subtask design. This skill encodes the Orchestrator's side of the OTM protocol. |
| D-08 | Slack app config | Salvatore | lists:read + lists:write scopes, socket mode setup |
| D-09 | End-to-end validation | Claudia + Devdas | Full test suite execution on real Slack workspace |
| D-10 | File pipeline scripts | Devdas | otm-create-task.sh, otm-update-task.sh, otm-injector.js, otm-dispatcher.js, otm-completion-detector.js |
| D-11 | DMZ relay deployment | Sylvain | receiver.js + broadcaster.js on Synology NAS, TLS proxy setup |
| D-12 | launchd plists | Sylvain | Watcher, sweeper, dispatcher, completion detector plists |

📌 D-07 (task-orchestration skill) will include Rupert's higher-level instructions on how to break down projects into steps and steps into tasks. It is part of the scope of the full-blown validation tests (D-09).


Open Questions

| # | Question | Status |
|---|---|---|
| 1 | Exact list_item_updated event payload schema | Needs Salvatore to capture sample events |
| 2 | Can socket mode receive list events on Pro? | Needs verification (may need Events API HTTP mode) |
| 3 | Plugin pipeline registration mechanism in OpenClaw | Needs Devdas to investigate |
| 4 | SQLite file location on OpenClaw server | Sylvain to decide |
| 5 | How agents tick subtasks in practice | Resolved in v1.3: Agents call OTM API (IE-02), OTM updates Slack via SW-1. No direct Slack UI interaction. |
| 6 | Slack conversation API for List items — does it exist? | Needs Salvatore to verify (may need workaround) |
| 7 | SURE ack timeout values and gateway restart handling | Resolved in v1.5: Timeouts revised to 1min + 2min + error (3min total). Gateway restart handling defined in §4.7: OTM reconciles from Slack List + openclaw.json on startup, detects SURE timeouts, auto-corrects agent-task mismatches. Gateway restart logged as system event (IE-SYS-01). Orchestrator does not need to re-register agents. |
| 8 | OpenClaw agent → Slack user ID mapping in openclaw.json | Needs Sylvain to confirm config structure |
| 9 | Human user registration protocol | Deferred. v1 hard-codes Rupert + Claudia (§4.6). Future: how are new human users registered? Auto-detect from Slack assigned_to? Manual admin command? Re-registration after OTM restart? What about clients? |
| 10 | Slack archive API — can items be archived programmatically? | Resolved in v1.6: slackLists.items.archive does not exist. Archiving is manual-only via Slack UI. OTM sets status = archived but cannot trigger visual Slack archival. |
| 11 | Gateway idempotencyKey — does it prevent duplicate sessions? | Resolved in v1.8: No. The key is used as a runId label only. Duplicate calls with the same key = duplicate sessions. Dispatcher must use the Slack new → assigned status flip as the sole dedup mechanism. |

Deprecated Items

| Item | ID/Flag | Notes |
|---|---|---|
| Project (old select column) | Col0AL4UJ8BJ8 | Replaced by Project 2 text column (Col0ALZBS9C8Z) — 2026-03-14 |
| Types: implementation, research, etc. | | Removed — only action / decision / review are active |
| --description parameter | | Removed from otm-create-task.sh — use --subtask instead |
| --creator parameter | | Removed — merged with --agent |
| --assignedTo parameter | | Removed — merged with --agent |
| TT-09 (Rejected → New) | | Removed in v1.3 — use TT-10 (cancel) + new task instead |

End of Specification — v1.8 — OpenClaw Task Manager

Last updated: 2026-03-14

Comment by @RupertBarrow (author):

§3.2 Task fields :
it might be interesting to have a last-status field, hidden from the user but useful when the OTM fails and changes the status to failed.
For failure analysis, we need to know what status the task was in before that.

Throughout the document when you say “Claudia” you should actually be saying “the orchestrator”

Transition TT-12 : the watchdog tells the OTM to change the state

§3.3 : ALL transitions requested by Claudia/the orchestrator should go through the OTM. The OTM state transition management system is the only part of the system with the ability to change task status and related fields.

We also need a state machine and state transition table for the agent, which can be idle, busy, et cetera.
So we have two state diagrams : one for tasks and one for agents. We have two sets of transition actions, and two sets of triggering events.

Rename and recode all of these with significant prefixes, update this document to v1.2, and come back to me for review.

Comment by @RupertBarrow (author):

How does the OTM notify an agent to work on a task ?
We need a SURE system which confirms that the agent received the request, and expects an acknowledgement by the agent.
The 2 messages need to be logged by the OTM in the conversation feed of the Task in Slack, with date+time of the request and of the acknowledgement.

All state changed by the OTM will also be date+timed and traced in the same conversation feed.
All requests received by the OTM from the Orchestrator will be date+timed and traced in the same conversation feed.
All changes by a human, detected by the OTM, will be date+timed and traced in the same conversation feed.

§3.2.1, OTM-initiated TT-02 : "Agent registry shows idle" is not a trigger, it is a state. Name precisely (with its code) the triggering event. Ditto for "Agent registry shows busy" in TT-03. TT-13 : how is this "Unrecoverable agent error" detected ? Name precisely (with its code) the triggering event.
TT-12 : ditto, name precisely (with its code) the triggering event "7 days elapsed".

§3.3 : split the last column into 2 : first "Inbound event" then "OTM action". Would it simplify things to add an "Outbound event" to this column ? (ditto §4.3)

§4.5 : name these 3 attributes more explicitly :
agent_id TEXT PRIMARY KEY, --> slack_user_id
agent_name TEXT NOT NULL, --> otm_display_name
openclaw_agent_id TEXT, --> openclaw_agent_id

Presumably, the slack_user_id of OpenClaw agents are already defined in the openclaw.json configuration : can we make this dynamically queried from OpenClaw ? Do this in "startup reconciliation".

Does the OTM only manage automated agents ? Or also human agents (Rupert, client ?) Are humans only identified by their Slack id in our system ?

SE-1 actions "Routes to OTM-2 (TT-01), OTM-3 (TT-04), or OTM-5 (TT-11) based on field change" : for maintenance simplicity, the list_item_updated event should be routed to 1 single entry point of the OTM, which will then do the routing. We want 0 intelligence in the Slack event listener. Move "2. Determines the event type" into the OTM itself.

SE-1 "Required Slack App Scopes": separate concerns. The event listener in the Slack app only requires the lists:read Slack scope.

OTM-1 "triggering event": codify each of the events. "actions": codify each action. Lay out each item on a new line in the Detail column to make it more readable.
OTM-2, OTM-3, OTM-4, OTM-5, OTM-6: ditto.
OTM-3: why do we need a "processed_events" table? How will it grow? How will it be maintained, cleaned up, etc.? "event_id": are events numbered?
Aren't the (human-readable) traces on the Slack tasks' conversations enough?

Flow 2a:

  • "Agent ticks subtask checkbox in Slack": I think the agent should talk directly to the OTM; the OTM can then handle updating the task and the rest of the flow. Same in 2b.
  • OTM-3 "writes to Slack": how? This should mention the "write" part of the Slack app, which does not yet appear as part of the system.
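A sketch of the proposed shape of Flow 2a (the method name and payload are assumptions, and the Slack write client is stubbed):

```python
class Otm:
    """The agent reports subtask completion to the OTM, and the OTM alone
    updates the task state and writes to the Slack conversation feed."""

    def __init__(self, slack_write):
        self.slack_write = slack_write   # the "write" part of the Slack app
        self.subtasks = {}               # (task_id, subtask_id) -> completed?

    def complete_subtask(self, task_id, subtask_id, agent_id):
        # Replaces the agent ticking the checkbox in the Slack UI directly.
        self.subtasks[(task_id, subtask_id)] = True
        self.slack_write(task_id, f"{agent_id} completed subtask {subtask_id}")
        return {"ok": True}
```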

Flow 3a: "Orchestrator reviews task → POST /api/task-validate { outcome: "validated" }"
The Orchestrator should tell the OTM to validate the task. Ditto in 3b.

Flow 3b: we need to rework this: the Orchestrator will ask the OTM to reject the task, and ask it to notify the agent that the task needs reworking. This supposes that the Orchestrator:

  • does not touch the completed subtasks of the task
  • removes the other subtasks and replaces them with a new list of subtasks
  • passes a message explaining why the rework is requested.
    I suggest that the first subtask re-added by the Orchestrator for the rework be labeled something like "acknowledge rework request: (+ detailed explanation)". The agent receiving this rework task should close this first subtask to acknowledge it.
    I'm not sure we need a "rework_action"; the OTM doesn't have to notify the agent that "rework is needed".
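The subtask-list rule above can be sketched as follows (field names are illustrative, not the real schema):

```python
def build_rework_subtasks(existing, new_subtasks, reason):
    """Return the subtask list after a rework request.

    existing: list of {"title": str, "done": bool}
    new_subtasks: replacement subtask titles supplied by the Orchestrator
    reason: the explanation passed along with the rework request
    """
    completed = [s for s in existing if s["done"]]            # never touched
    ack = {"title": f"acknowledge rework request: {reason}", "done": False}
    replacements = [{"title": t, "done": False} for t in new_subtasks]
    # Completed work is preserved; the acknowledgement comes first among
    # the open subtasks, then the Orchestrator's replacement list.
    return completed + [ack] + replacements
```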

We also need an error-control mechanism to detect inconsistencies such as:

  • a task "in progress" or in "rework" that has not been updated in more than 10 minutes
  • "in progress" tasks assigned to an agent who is "idle"
  • etc.
    List all such situations and design the error monitor to detect them, trace errors for lessons learned and quality improvement, and take corrective action: reassigning tasks, unblocking agents, repairing channel communications, etc.
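The first two checks above could be expressed as SQL over the task and agent tables; a minimal sketch (table and column names are assumptions, not the real OTM schema):

```python
import datetime
import sqlite3

STALE_MINUTES = 10  # the "not updated in more than 10 minutes" threshold


def find_inconsistencies(conn, now):
    """Return task ids that are stale, or assigned to an idle agent."""
    cutoff = (now - datetime.timedelta(minutes=STALE_MINUTES)).isoformat()
    stale = conn.execute(
        "SELECT task_id FROM tasks "
        "WHERE status IN ('in progress', 'rework') AND updated_at < ?",
        (cutoff,)).fetchall()
    idle = conn.execute(
        "SELECT t.task_id FROM tasks t "
        "JOIN agents a ON a.slack_user_id = t.assignee "
        "WHERE t.status = 'in progress' AND a.state = 'idle'").fetchall()
    return {"stale": [r[0] for r in stale], "idle_assignee": [r[0] for r in idle]}


# Demo data: one task that is both stale and assigned to an idle agent.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE tasks (task_id TEXT, status TEXT, assignee TEXT, updated_at TEXT);"
    "CREATE TABLE agents (slack_user_id TEXT, state TEXT);")
now = datetime.datetime(2026, 3, 14, 12, 0, 0)
conn.execute("INSERT INTO tasks VALUES ('T-00001', 'in progress', 'U1', ?)",
             ((now - datetime.timedelta(minutes=30)).isoformat(),))
conn.execute("INSERT INTO agents VALUES ('U1', 'idle')")
```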

Also plan a tracking/event-logging system (such as Sentry, or an internal one) to trace all events and actions of the OTM.
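If internal, this could be as simple as an append-only event table; a sketch (the schema is illustrative):

```python
import datetime
import json
import sqlite3

# A minimal internal event trace as a lightweight alternative to an
# external tracker like Sentry (append-only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE otm_events ("
             "ts TEXT NOT NULL, component TEXT NOT NULL, "
             "event_code TEXT NOT NULL, payload TEXT)")


def trace(component, event_code, **payload):
    """Record one OTM event or action with its date+time."""
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    conn.execute("INSERT INTO otm_events VALUES (?, ?, ?, ?)",
                 (ts, component, event_code, json.dumps(payload)))


trace("OTM-3", "TT-04", task_id="T-00001", field="assignee")
```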

Flow 4: "7 days later". Already mentioned above; codify this event.

Technology Stack
Before this chapter, summarise the features, tables, and users of the SQLite database. Explain how it is maintained, flushed, and archived.
Confirm which part of the system uses each of these components.
Estimate the activity, frequency, data volumes, etc. of each component.

Tests (in another doc): write up (if not already done) test scenarios on a mock project in the real system.

Open Question 5 "How agents tick subtasks in practice (API call? Slack UI?)": the answer is: via the OTM.
