@RupertBarrow
Last active March 14, 2026 16:35

OTM-SPEC v1.8 — OpenClaw Task Manager Specification

Based on: Rupert's OTM Spec v1.0 (2026-03-12)
Updated by: Claudia (2026-03-12–14)
Merged by: Claudia (2026-03-14) — consolidates OTM-SPEC v1.5 + OTM-FIELD-MAPPING v1.8


Changelog

Note: v1.0–v1.5 changes were tracked in OTM-SPEC. v1.0–v1.8 field mapping changes were tracked in a separate OTM-FIELD-MAPPING document. Both changelogs are merged here.

| Version | Date | Source | Changes |
|---------|------|--------|---------|
| v1.0 | 2026-03-12 | OTM-SPEC | Initial specification: state machine, SURE protocol, SE-1/SW-1, OTM-0 through OTM-6, actor registry, audit trail |
| v1.0 | 2026-03-14 | OTM-FIELD-MAPPING | Initial field mapping: script params, JSON format, Slack field mapping |
| v1.1 | 2026-03-12 | OTM-SPEC | Added rework flow (TT-08), Rejected state, Orchestrator manages subtasks |
| v1.1 | 2026-03-14 | OTM-FIELD-MAPPING | Added subtask support, task ID system (T-NNNNN) |
| v1.2 | 2026-03-12 | OTM-SPEC | Added Watchdog (OTM-6), archival (TT-12), Pending state (TT-03/TT-05) |
| v1.2 | 2026-03-14 | OTM-FIELD-MAPPING | Added pipeline flow diagram, directories |
| v1.3 | 2026-03-12 | OTM-SPEC | Removed TT-09, added Priority Scale, human actor type. Resolved Q5: agents call the OTM API for subtask completion (not Slack directly). |
| v1.3 | 2026-03-14 | OTM-FIELD-MAPPING | Added completion detection (System 4) |
| v1.4 | 2026-03-12 | OTM-SPEC | Added Error Monitor (OTM-7), error catalogue (ERR-01 through ERR-12), dual-DB design |
| v1.4 | 2026-03-14 | OTM-FIELD-MAPPING | Added component heartbeats (System 5) |
| v1.5 | 2026-03-12 | OTM-SPEC | Human notifications via Slack task conversation (not DM). Error reports daily, threshold=1. Error monitoring in separate otm-errors.db. Only the Orchestrator creates/deletes tasks. subtasks_remaining = count of unfinished subtasks. Added §12 Cost Analysis. SURE timeouts: 1+2 min. Gateway restart detection (§4.7). Log file mirroring. task-orchestration skill added as deliverable. |
| v1.5 | 2026-03-14 | OTM-FIELD-MAPPING | Default confirmation subtask; deprecated fields cleanup |
| v1.6 | 2026-03-14 | OTM-FIELD-MAPPING | Added DMZ relay architecture (System 6), todo_completed (Col00) documentation, project label fixes |
| v1.7 | 2026-03-14 | OTM-FIELD-MAPPING | Added task dispatcher (System 3), task updates (System 3b), full architecture diagram, completion metrics, status lifecycle |
| v1.8 | 2026-03-14 | OTM-FIELD-MAPPING | Dispatcher triggers agents via Gateway WebSocket RPC (parallel, non-blocking); crash-safe operation order (file → Slack → RPC); idempotencyKey is NOT idempotent (design flaw documented); todo_completed checkbox (Col00) documentation; injector idempotency (dedup check); Slack archive API limitation noted |
| v1.8 | 2026-03-14 | MERGED | Consolidated OTM-SPEC v1.5 + OTM-FIELD-MAPPING v1.8 into a single unified specification. Task ID format standardized to T-NNNNN (v1.8 implementation). Added Part 3 (File-Based Pipeline). All Slack column IDs included. |

1. Purpose & Scope

The OpenClaw Task Manager (OTM) orchestrates the execution of tasks by AI and human agents. It uses Slack Lists as the task board, the Slack Events API as the event bus for human-originated changes, and the OTM API as the interface for AI actors.

The system is split into three cooperating layers:

  1. Slack Event Layer (SE-1) — Slack Events API (list_item_updated) detected by our Slack app in Socket Mode. Zero intelligence. Forwards raw events to a single OTM entry point. Requires only the lists:read scope.
  2. Slack Write Layer (SW-1) — Handles all writes to Slack: task field updates and conversation feed audit entries. Requires lists:write scope. Called only by the OTM.
  3. OpenClaw Task Manager (OTM) — Authoritative backend implemented as an OpenClaw plugin pipeline. Owns all state, routing logic, agent registry, and business rules. Persists to SQLite. Sole component with the ability to change task status, agent status, and counter fields.

📌 Design principles:

  • ALL behaviour lives in the OTM. No business logic in SE-1 or SW-1.
  • The OTM is the sole writer of task status and related fields. No actor writes status directly.
  • Only the Orchestrator creates tasks and subtasks. The OTM never creates or deletes tasks — it manages their lifecycle after creation. The Orchestrator also manages subtask lists (creates, deletes, keeps completed) during rework flows; the OTM processes the resulting state changes.
  • ALL events are logged in the Slack task conversation feed with timestamps (§7).
  • Agent notifications use the SURE protocol: request + mandatory acknowledgement (§6).

1.1 Actors

| Actor | Current holder | Type | Role |
|-------|----------------|------|------|
| Orchestrator | Claudia | AI | Creates tasks, sets assignments, validates completed work — always via the OTM API |
| Agent | Devdas, Salvatore, etc. | AI | Executes tasks, reports progress directly to the OTM API |
| Human | Rupert, clients | Human | Creates/edits tasks in the Slack UI; changes detected by SE-1 and forwarded to the OTM |
| OTM | (system) | System | Authoritative state machine, sole writer of all status and counter fields |
| Watchdog | OTM-6 cron | System | Recovery cron — detects anomalies, requests the OTM to execute corrective transitions |

📌 Humans are identified by their Slack user ID. AI agents are identified by both their Slack user ID and their OpenClaw agent ID. Both types are managed in the same agent registry (§4.5).


1.2 System Architecture Overview

The OTM is composed of six cooperating systems (plus the 3b update path), from task creation to dashboard visibility:

┌─────────────────────────────────────────────────────────────────────────────────┐
│                         OTM — OpenClaw Task Manager                            │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 1: Task Creation                                                  │   │
│  │                                                                          │   │
│  │  Claudia (orchestrator)                                                  │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-create-task.sh ──► JSON file ──► ~/…/otm/new-tasks/                │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  next-task-id.json (T-NNNNN counter, flock)                             │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼                                                                        │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 2: Task Injection                                                 │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-watcher (WatchPaths)                                   │   │
│  │  ai.openclaw.otm-sweeper (every 10 min)                                 │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-injector.js ──► Slack Lists API ──► Rapido Task Campaign           │   │
│  │       │                (slackLists.items.create + subtasks)              │   │
│  │       ▼                                                                  │   │
│  │  processed/ or failed/                                                   │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (task exists in Slack with status=new)                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 3: Task Dispatcher                                                │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-dispatcher (every 2 min)                               │   │
│  │       │                                                                  │   │
│  │       ├──► Scan: status=new + assignee set                              │   │
│  │       ├──► Write task-dispatch.json to agent workspace                  │   │
│  │       ├──► Trigger agent via Gateway WS RPC (parallel)                  │   │
│  │       └──► Update Slack: new → assigned + set assigned_at               │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (agent session starts, picks up task, works, reports progress)        │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 3b: Task Updates (agent → Slack feedback)                         │   │
│  │                                                                          │   │
│  │  otm-update-task.sh ──► JSON ──► ~/…/otm/task-updates/                  │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-injector.js ──► Slack Lists API (update status, subtask done)      │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (all subtasks done → auto-promote)                                    │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 4: Completion Detection                                           │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-completion-detector (every 2 min)                      │   │
│  │       │                                                                  │   │
│  │       ├──► Scan: status=in_progress + all subtasks done                 │   │
│  │       ├──► Update: completion % + subtasks_remaining                    │   │
│  │       └──► Promote: in_progress → agent_done                           │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (Claudia validates → done)                                            │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 5: Component Heartbeats                                           │   │
│  │                                                                          │   │
│  │  Each component writes *-state.json after every run                     │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  Collector (FSEvents) → SQLite → Reader (WebSocket) → Dashboard         │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 6: DMZ Relay                                                      │   │
│  │                                                                          │   │
│  │  Collector ──HTTP POST──► Synology Receiver (127.0.0.1:3456)            │   │
│  │                                 │                                        │   │
│  │                                 ▼                                        │   │
│  │                           fab-state.json                                 │   │
│  │                                 │                                        │   │
│  │                                 ▼                                        │   │
│  │                      Broadcaster (0.0.0.0:3457)                         │   │
│  │                           │            │                                 │   │
│  │                    WSS /ws        GET /api/state                         │   │
│  │                           │            │                                 │   │
│  │                           ▼            ▼                                 │   │
│  │                    Vercel Dashboard (browser)                            │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ Task Lifecycle                                                           │   │
│  │                                                                          │   │
│  │  new ──► assigned ──► in_progress ──► agent_done ──► done               │   │
│  │   │         │              │              │                              │   │
│  │   │  (dispatcher)  (agent starts)  (completion     (Claudia            │   │
│  │   │                                  detector)       validates)          │   │
│  │   │                                                                      │   │
│  │   └──► blocked (can happen at any stage)                                │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────────┘
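The dispatcher's step order matters for crash safety: per the v1.8 changelog, the dispatch file is written before Slack is updated and before the Gateway RPC fires, so a crash can never trigger an agent that has no task file on disk. A minimal sketch of one System 3 pass (the step functions are synchronous stand-ins for the real asynchronous operations, not otm-dispatcher internals):

```javascript
// Illustrative sketch of one System 3 dispatcher pass, preserving the
// crash-safe order noted in the v1.8 changelog: file → Slack → RPC.
const steps = [];
const writeDispatchFile = (t) => steps.push(`file:${t.taskId}`);
const updateSlackStatus = (t) => steps.push(`slack:${t.taskId}:assigned`);
const triggerGatewayRpc = (t) => steps.push(`rpc:${t.taskId}`);

function dispatchPass(tasks) {
  // Scan: status=new with an assignee set.
  for (const t of tasks.filter((x) => x.status === "new" && x.assignee)) {
    writeDispatchFile(t);  // 1. task-dispatch.json survives a crash
    updateSlackStatus(t);  // 2. Slack: new → assigned (+ assigned_at)
    triggerGatewayRpc(t);  // 3. trigger the agent via Gateway WS RPC
  }
}
```

If the process dies between steps 1 and 3, the next pass still finds the task at status=new in Slack and re-runs it against an already-written file, which is why the injector-side dedup check matters.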

2. Task Data Model

Each task is a top-level item in the Slack Task Board List. Subtasks are child items (linked via parent_item_id). The OTM is the sole writer of status and counter fields.

Slack List: F0ALE8DCW1F (Rapido Task Campaign v2), workspace: rapidocloud.slack.com

2.1 Task Item Fields

| Field | Type | Writer | Description |
|-------|------|--------|-------------|
| title | text | Orchestrator | Task name |
| task_id | text | OTM | Unique ID (T-NNNNN format — 5 digits, zero-padded) |
| assigned_to | person | Orchestrator | Slack user ID of the assigned agent (AI or Human) |
| status | select | OTM | Current task state (see §3). OTM is sole writer — no exceptions |
| previous_status | select | OTM | Status before the last transition. Critical for failure analysis |
| priority | number | Orchestrator | 0=Critical … 4=Batchable (see §3.4) |
| context | select | Orchestrator | project, research, operations, support, internal |
| subtasks_remaining | number | OTM | Decremented counter, NOT a live count |
| assigned_at | datetime | OTM | When the agent started work |
| completed_at | datetime | OTM | When all subtasks are done |
| validated_at | datetime | OTM | When the Orchestrator validated |
| result_summary | text | Agent | Deliverables/output description |
| input_files | text | Orchestrator | Links to input resources |
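The T-NNNNN format is a fixed-width, zero-padded counter value. A one-line sketch of the formatting rule (the helper name is hypothetical, not from the implementation):

```javascript
// Hypothetical helper: render the next-task-id.json counter value
// as a T-NNNNN task ID (5 digits, zero-padded).
const formatTaskId = (n) => `T-${String(n).padStart(5, "0")}`;
```

For example, a counter value of 42 renders as "T-00042".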

2.2 Subtask Item Fields

| Field | Type | Writer | Description |
|-------|------|--------|-------------|
| title | text | Orchestrator | Subtask description |
| todo_completed | checkbox | OTM (via SW-1) | Built-in Slack Lists checkbox (Col00). Ticked when the subtask is done. Must be set alongside status when marking items as done (see below). |
| parent_item_id | reference | System | Links to the parent task |

📌 subtasks_remaining on the parent task is the canonical completion signal — not a live count.

📌 previous_status is set by the OTM on every transition. It enables post-mortem analysis when a task enters Failed state.

📌 Agents do NOT tick checkboxes directly in Slack. They report subtask completion to the OTM API (IE-02). The OTM then updates Slack via SW-1 — setting both the item status column and Col00 (checkbox).

todo_completed (Col00) Checkbox

The todo_completed field (Col00) is the built-in Slack Lists checkbox. It drives the visual checkmark ✅ in the Slack UI. Setting only the Status column to done does NOT check the box — both must be set explicitly.

| Scenario | Action |
|----------|--------|
| Subtask marked done | Set Col00: checkbox: true on the subtask |
| Parent task marked done | Set Col00: checkbox: true on the parent |
| Parent task at agent_done or in_progress | Do NOT set the checkbox (the task isn't finished yet) |

⚠️ Archive limitation: Slack Lists has a UI "Archive item" action, but slackLists.items.archive does not exist as an API method. Archiving is manual-only via the Slack UI. The OTM sets status to archived via the pipeline, but the actual Slack archive action cannot be automated.
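The two-field rule above can be made concrete. A hedged sketch of the cell updates SW-1 would send when marking an item done — the payload shape is an assumption modeled loosely on the injector's slackLists.items.create call, not a documented API contract:

```javascript
// Sketch: cells to set when marking an item done. Setting only the Status
// column does NOT tick the checkbox — Col00 must be set explicitly.
// Column IDs are the ones listed in §2.3; the payload shape is an assumption.
function doneCellUpdates() {
  return [
    { column_id: "Col0AL1B4UVLJ", select: ["done"] }, // Status column → done
    { column_id: "Col00", checkbox: true },           // built-in ✅ checkbox
  ];
}
```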

2.3 JSON → Slack Field Mapping

This table is operationally critical — it maps the JSON task format (used in the file pipeline) to Slack column IDs.

| JSON Field | Slack Column | Column ID | Slack Type | Notes |
|------------|--------------|-----------|------------|-------|
| title | Title | Col0AKKTBJJKZ | rich_text | Clean one-liner only |
| taskId | Task ID | Col0ALVK2NA1E | rich_text | Format: T-NNNNN (5 digits, zero-padded) |
| type | Type | Col0AKUV4BF6F | select | action \| decision \| review |
| agent | Assignee | Col0AKZ9G5UAJ | select | Covers both agents and humans |
| project | Project 2 | Col0ALZBS9C8Z | rich_text | Free-text (migrated from select 2026-03-14) |
| priority | Priority | Col0ALE8DKWPK | select | See §3.4 priority mapping |
| (auto) | Status | Col0AL1B4UVLJ | select | Always set to new on creation |
| subtasks[] | (child items) | parent_item_id | — | Each entry → child item with title + status new |

Slack Built-in Fields:

| Field | Column ID | Type | Notes |
|-------|-----------|------|-------|
| todo_completed | Col00 | checkbox | Built-in Slack Lists checkbox. Must be set explicitly alongside status. |

Fields NOT mapped to Slack columns (metadata only):

| JSON Field | Purpose |
|------------|---------|
| id | UUID for file tracking / idempotency |
| createdAt | Timestamp, implicit in Slack item creation |
| status | Internal pipeline status (pending → processed) |

Deprecated Slack Columns:

| Item | Column ID | Notes |
|------|-----------|-------|
| Project (old select column) | Col0AL4UJ8BJ8 | Replaced by the Project 2 text column (2026-03-14) |
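Putting the mapping together, a pipeline task file might look like this (all values are illustrative; field names follow the tables above — id, createdAt, and status are the metadata-only fields that never reach a Slack column):

```json
{
  "id": "2f6d0a3e-1c44-4a7b-9b0e-5d8c1e2f3a4b",
  "taskId": "T-00042",
  "title": "Refactor injector error handling",
  "type": "action",
  "agent": "Devdas",
  "project": "OTM",
  "priority": "normal",
  "status": "pending",
  "createdAt": "2026-03-14T10:00:00Z",
  "subtasks": [
    "Audit current error paths",
    "Route failures to failed/"
  ]
}
```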

3. Task State Machine

The status field follows these transitions. The OTM is the sole writer — all transitions are executed by the OTM, regardless of which actor requested them.

3.1 Task States

| Status | Description |
|--------|-------------|
| New | Task created, not yet assigned |
| Assigned | Orchestrator has set an assignee; OTM is evaluating agent availability |
| Pending | Agent is busy; task queued silently (no notification) |
| In Progress | Agent is actively working (SURE acknowledgement received) |
| Agent Done | All subtasks complete; awaiting Orchestrator review |
| Done | Orchestrator validated the work |
| Rejected | Orchestrator rejected the work; Orchestrator preparing rework subtasks |
| Failed | Unrecoverable error during execution |
| Cancelled | Task no longer needed; removed from active work |
| Archived | Terminal state; auto-moved 7 days after Done/Cancelled |

3.2 Task State Transition Diagram

                    Orchestrator creates task
                              |
                           [New]
                              |
              TT-01: Orch requests assignment → OTM executes
                              |
                         [Assigned]
                        /          \
                 TT-02: OTM      TT-03: OTM
              (IE-01 + agent   (IE-01 + agent
               idle in reg)    busy in reg)
                      |              |
               [In Progress]    [Pending]
               (after SURE ack)      |
                      |         TT-05: OTM promotes
         TT-04: OTM receives    (IE-09 + pending
         IE-02 subtask reports    task found)
                      |              |
           subtasks_remaining=0      |
                      |              |
                [Agent Done] <------/
                   /    \
          TT-06: Orch  TT-07: Orch
          validates     rejects
          (IE-04)       (IE-05)
                |           |
             [Done]    [Rejected]
               |        /      \
         TT-12: OTM  TT-08    TT-10
         (IE-08+7d)  (IE-06)  (IE-06)
               |       |        |
          [Archived] [Assigned] [Cancelled]
                     (rework)    (drop)

     At any point before Agent Done:
         TT-11: Orch/Human requests cancel (IE-06/IE-01) → OTM executes
         TT-12: Watchdog requests archive (IE-08 + 7d check) → OTM executes
         TT-13: OTM detects error (IE-10) → [Failed]
         TT-14: Orch requests retry (IE-07) → [New]
         TT-15: Orch/Human requests cancel (IE-06/IE-01) → [Cancelled] (from Failed)

3.2.1 Task Transition Action Index

Each task transition is coded TT-xx. All transitions are executed by the OTM. The "Requesting Actor" is who initiates; the OTM validates and applies.

Orchestrator-requested transitions (executed by OTM):

| Code | From → To | Requesting Actor | Inbound Event | OTM Action | Outbound Event |
|------|-----------|------------------|---------------|------------|----------------|
| TT-01 | New → Assigned | Orchestrator | IE-01: assigned_to field changed | Validate assignment, set status | OE-06, OE-07 |
| TT-06 | Agent Done → Done | Orchestrator | IE-04: validate API call | Set validated_at, change status | OE-06, OE-07, OE-02 |
| TT-07 | Agent Done → Rejected | Orchestrator | IE-05: reject API call | Change status, post reason | OE-06, OE-07 |
| TT-08 | Rejected → Assigned | Orchestrator | IE-06: rework API call (subtasks already prepared by Orchestrator) | Count unfinished subtasks, set counter, change status | OE-06, OE-07, OE-01 |
| TT-10 | Rejected → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Change status | OE-06, OE-07 |
| TT-11 | Any (pre-Agent Done) → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Free agent, change status | OE-06, OE-07, OE-04 |
| TT-14 | Failed → New | Orchestrator | IE-07: retry API call | Reset task, change status | OE-06, OE-07 |
| TT-15 | Failed → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Change status | OE-06, OE-07 |

OTM-initiated transitions (automated):

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
|------|-----------|-------------------------|------------|----------------|
| TT-02 | Assigned → In Progress | IE-01: status changed to Assigned (OTM-0 routes to OTM-2, agent registry query returns Idle) | Execute AT-01, write counter, send SURE notification | OE-01, OE-06, OE-07 |
| TT-03 | Assigned → Pending | IE-01: status changed to Assigned (OTM-0 routes to OTM-2, agent registry query returns Busy) | Queue task silently | OE-06, OE-07 |
| TT-04 | In Progress → Agent Done | IE-02: agent reports subtask done (OTM-3 decrements counter to 0) | Execute AT-02, set completed_at | OE-02, OE-06, OE-07 |
| TT-05 | Pending → Assigned | IE-09: OTM internal — check_next_task_for_agent() finds pending task after AT-02/AT-03/AT-04 | Re-evaluate via OTM-2 path | OE-06, OE-07 |
| TT-13 | In Progress → Failed | IE-10: OTM detects agent error (OpenClaw hook timeout >5 min / agent crash / unhandled exception reported) | Store previous_status, execute AT-04 | OE-05, OE-06, OE-07 |

Watchdog-requested transitions (executed by OTM):

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
|------|-----------|-------------------------|------------|----------------|
| TT-12 | Done/Cancelled → Archived | IE-08: watchdog cron tick (OTM-6 checks completed_at or cancelled_at + 7 days < now) | Archive task | OE-06, OE-07 |

📌 Cross-reference: See §5 for TT-xx ↔ AT-xx ↔ handler mapping. See §6 for SURE protocol. See §7 for audit trail.

3.3 Task Transition Rules

All transitions are submitted to the OTM which validates preconditions and executes the state change. Every transition produces at minimum OE-06 (Slack field update) and OE-07 (audit log entry).

| Code | From | To | Inbound Event | OTM Action | Outbound Event |
|------|------|-----|---------------|------------|----------------|
| TT-01 | New | Assigned | IE-01: assigned_to changed on New task | Validate agent exists in registry, set status | OE-06, OE-07 |
| TT-02 | Assigned | In Progress | IE-01: status=Assigned detected + agent Idle in registry | Set agent busy (AT-01), init counter, send SURE task notification | OE-01, OE-06, OE-07 |
| TT-03 | Assigned | Pending | IE-01: status=Assigned detected + agent Busy in registry | Queue task, no notification | OE-06, OE-07 |
| TT-04 | In Progress | Agent Done | IE-02: subtask completion report + counter decrements to 0 | Set agent idle (AT-02), set completed_at, notify Orchestrator | OE-02, OE-06, OE-07 |
| TT-05 | Pending | Assigned | IE-09: internal check_next_task_for_agent() + pending task found | Re-route to OTM-2 (same as TT-01 path) | OE-06, OE-07 |
| TT-06 | Agent Done | Done | IE-04: Orchestrator validate call | Set validated_at, change status | OE-02, OE-06, OE-07 |
| TT-07 | Agent Done | Rejected | IE-05: Orchestrator reject call with reason | Change status, log reason | OE-06, OE-07 |
| TT-08 | Rejected | Assigned | IE-06: Orchestrator rework call (subtasks already prepared) | Count unfinished subtasks, set counter, change status | OE-06, OE-07 |
| TT-10 | Rejected | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Change status | OE-06, OE-07 |
| TT-11 | New/Assigned/Pending/In Progress | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Free agent if applicable (AT-03), change status | OE-04, OE-06, OE-07 |
| TT-12 | Done/Cancelled | Archived | IE-08: watchdog tick + 7-day check passes | Archive task | OE-06, OE-07 |
| TT-13 | In Progress | Failed | IE-10: agent error detected (hook timeout/crash/exception) | Store previous_status, free agent (AT-04) | OE-05, OE-06, OE-07 |
| TT-14 | Failed | New | IE-07: Orchestrator retry call | Reset task fields, change status | OE-06, OE-07 |
| TT-15 | Failed | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Change status | OE-06, OE-07 |

📌 TT-09 (Rejected → New) removed in v1.3. Reassignment is handled by the Orchestrator cancelling the rejected task (TT-10) and creating a new task for a different agent. This simplifies the state machine.
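Because the OTM is the sole writer, every requested transition can be validated against the TT table before it is applied. A minimal sketch of that guard (an illustrative subset of the table; the map and function names are not from the implementation):

```javascript
// Validate and apply a requested transition. Only codes whose "from"
// precondition matches the current status execute; previous_status is
// recorded on every transition, as §2.1 requires.
const TRANSITIONS = {
  "TT-01": { from: ["New"], to: "Assigned" },
  "TT-02": { from: ["Assigned"], to: "In Progress" },
  "TT-04": { from: ["In Progress"], to: "Agent Done" },
  "TT-06": { from: ["Agent Done"], to: "Done" },
  "TT-11": { from: ["New", "Assigned", "Pending", "In Progress"], to: "Cancelled" },
};

function applyTransition(task, code) {
  const t = TRANSITIONS[code];
  if (!t || !t.from.includes(task.status)) {
    throw new Error(`${code} is not a valid transition from ${task.status}`);
  }
  return { ...task, previousStatus: task.status, status: t.to };
}
```

A request that violates a precondition (e.g. TT-06 on a task still In Progress) is rejected rather than applied, regardless of which actor submitted it.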

3.4 Priority Scale

| Value | Label | Meaning | --priority flag |
|-------|-------|---------|-----------------|
| 0 | Critical | Blocking other work, immediate attention | critical |
| 1 | High | Important, do next | high |
| 2 | Medium/Normal | Normal priority | normal (or medium as an alias) |
| 3 | Low | When bandwidth allows | low |
| 4 | Batchable | Large/expensive work, can run async via the Batch API | batchable |

Queue ordering: priority ASC, posted_at ASC (0 = highest priority, FIFO within same priority).
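The queue ordering rule, sketched as a comparator (field names are illustrative):

```javascript
// Order the pending queue: priority ASC (0 = Critical first),
// then posted_at ASC (FIFO within the same priority).
const queue = [
  { taskId: "T-00003", priority: 2, postedAt: 1770000200 },
  { taskId: "T-00001", priority: 0, postedAt: 1770000300 },
  { taskId: "T-00002", priority: 2, postedAt: 1770000100 },
];
queue.sort((a, b) => a.priority - b.priority || a.postedAt - b.postedAt);
// Resulting order: T-00001 (Critical), then T-00002 and T-00003 (FIFO).
```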


4. Agent State Machine

The OTM maintains agent availability state in the Agent Registry (OTM-1). Agent transitions are coded AT-xx and are distinct from task transitions (TT-xx).

4.1 Agent States

| Status | Description |
|--------|-------------|
| Idle | Agent is available, not working on any task |
| Busy | Agent is actively working on a task (current_task is set) |

4.2 Agent State Transition Diagram

              [Idle]
                |
         AT-01: OTM assigns task
         (triggered by TT-02)
         (SURE notification sent → OE-01)
                |
             [Busy]
                |
         AT-02: task completes (TT-04, IE-02 counter=0)
         AT-03: task cancelled (TT-11, IE-01/IE-06)
         AT-04: task fails (TT-13, IE-10)
                |
             [Idle]
                |
         → OTM calls check_next_task_for_agent() (IE-09)
         → if pending task found: TT-05 → AT-01 again

4.3 Agent Transition Action Index

All agent transitions are executed by the OTM. No actor changes agent status directly.

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
|------|-----------|-------------------------|------------|----------------|
| AT-01 | Idle → Busy | IE-01: Assigned event + agent Idle (during OTM-2) | Set status=busy, current_task=task_id, task_started_at=now | OE-01 (SURE notification), OE-07 (audit) |
| AT-02 | Busy → Idle | IE-02: subtask report + counter=0 (during OTM-3) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-07 (audit) |
| AT-03 | Busy → Idle | IE-01/IE-06: cancellation request (during OTM-5) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-04 (cancel notify), OE-07 (audit) |
| AT-04 | Busy → Idle | IE-10: agent error detected (during OTM error handler) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-05 (admin alert), OE-07 (audit) |

4.4 Agent Transition Rules

| Code | From | To | Inbound Event | OTM Action | Outbound Event |
|------|------|-----|---------------|------------|----------------|
| AT-01 | Idle | Busy | IE-01 (Assigned event) | OTM-2 sets the agent busy before sending the SURE notification | OE-01, OE-07 |
| AT-02 | Busy | Idle | IE-02 (last subtask report) | OTM-3 frees the agent, promotes the next pending task | OE-07 |
| AT-03 | Busy | Idle | IE-01/IE-06 (cancellation) | OTM-5 frees the agent if assigned to the cancelled task | OE-04, OE-07 |
| AT-04 | Busy | Idle | IE-10 (agent error) | OTM error handler frees the agent | OE-05, OE-07 |

📌 Agent ↔ Task coupling: Every AT-xx is triggered by a TT-xx. See §5 for the complete bidirectional mapping.

📌 Watchdog note: OTM-6 monitors agent heartbeats (last_seen >2h) but does NOT change agent state. It alerts the admin (OE-05). Only OTM handlers modify agent status.

4.5 Agent Registry Schema

CREATE TABLE agents (
  slack_user_id TEXT PRIMARY KEY,   -- Slack user ID (e.g., "U0AKEB27HNK")
  otm_display_name TEXT NOT NULL,   -- Display name for logs/UI (e.g., "Devdas")
  openclaw_agent_id TEXT,           -- OpenClaw agent ID (e.g., "devdas"). NULL for human agents.
  agent_type TEXT NOT NULL          -- 'ai' | 'human'
      DEFAULT 'ai',
  status TEXT DEFAULT 'idle',       -- 'idle' | 'busy'  (see §4.1)
  current_task TEXT,                -- task item ID or NULL
  task_started_at INTEGER,          -- Unix timestamp or NULL
  last_seen INTEGER                 -- Unix timestamp
);

Registry operations:

  • register_agent(slack_user_id, otm_display_name, openclaw_agent_id, agent_type) — at startup or on first activity
  • set_busy(slack_user_id, task_id) — AT-01
  • set_idle(slack_user_id) — AT-02, AT-03, AT-04
  • is_busy(slack_user_id) → boolean — checked during TT-02/TT-03
  • get_current_task(slack_user_id) → task_id | null
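A minimal in-memory sketch of these operations (the real registry persists to the SQLite table above; the camelCase names are illustrative, not the implementation's):

```javascript
// In-memory stand-in for the §4.5 agent registry operations.
const agents = new Map();

function registerAgent(slackUserId, displayName, openclawAgentId, agentType) {
  agents.set(slackUserId, {
    displayName, openclawAgentId, agentType,       // openclawAgentId is null for humans
    status: "idle", currentTask: null, taskStartedAt: null,
  });
}
function setBusy(slackUserId, taskId) {            // AT-01
  const a = agents.get(slackUserId);
  a.status = "busy"; a.currentTask = taskId; a.taskStartedAt = Date.now();
}
function setIdle(slackUserId) {                    // AT-02 / AT-03 / AT-04
  const a = agents.get(slackUserId);
  a.status = "idle"; a.currentTask = null; a.taskStartedAt = null;
}
const isBusy = (id) => agents.get(id)?.status === "busy";        // TT-02 / TT-03 check
const getCurrentTask = (id) => agents.get(id)?.currentTask ?? null;
```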

4.6 Startup Reconciliation (Dynamic Registry)

On OTM startup:

  1. Read OpenClaw config — Query openclaw.json agent configurations to populate AI agent entries automatically. Each configured OpenClaw agent that has a slack_user_id mapping is auto-registered with agent_type = 'ai'.
  2. Reconcile from Slack — Query Slack List for all tasks with status In Progress or Pending. Rebuild current_task and status (busy/idle) from those records.
  3. Human agents — Hard-coded for v1. Rupert is pre-seeded in the agent registry at startup with agent_type = 'human', openclaw_agent_id = NULL. Claudia is pre-seeded as the Orchestrator. Dynamic human registration (auto-detect from assigned_to on first interaction) is deferred to a future version.

📌 v1 hard-coded actors:

| Slack user ID | Display name | Type | Role |
|---------------|--------------|------|------|
| U06K407LVCY | Rupert | human | Task assignee / reviewer |
| U0AKEB27HNK | Claudia | ai | Orchestrator (sole) |

Future versions will define a proper human user registration and re-registration protocol (see Open Question 9).

📌 AI agents are notified via OpenClaw /hooks/agent. Human agents are notified via the Slack task conversation thread (posted by SW-1 as an OE-07 audit entry addressed to the human). The notification channel is determined by agent_type.

4.7 Gateway Restart Detection & OTM Resync

The OTM runs as an OpenClaw plugin pipeline inside the gateway process. Several restart scenarios must be handled:

Scenario A: Gateway restarts (OTM restarts with it)

  • OTM startup reconciliation (§4.6) runs automatically
  • All agent states rebuilt from Slack List + openclaw.json
  • SURE pending notifications checked: any outstanding acks >3 min old → ERR-06
  • System event logged: [timestamp] OTM SYSTEM: Gateway restart detected. Reconciliation complete.
  • Logged to both event_log (in error DB) and otm-events.log file

Scenario B: Gateway restarts but OTM was mid-processing

  • SQLite WAL mode ensures no data corruption on crash
  • On restart, OTM-7 (error monitor) runs within 60s and detects any inconsistencies:
    • ERR-02/ERR-03: Agent-task mismatches from interrupted transitions
    • ERR-08: Tasks stuck in Assigned from interrupted OTM-2
    • ERR-04: Orphaned Pending from interrupted promotions
  • All auto-correctable errors are fixed; others escalated

Scenario C: Gateway stops for extended period (>3 min)

  • SE-1 stops receiving Slack events during downtime
  • Agents cannot send IE-02/IE-03 reports (OpenClaw hooks are down)
  • On restart: reconciliation rebuilds state from Slack List (source of truth for task fields)
  • Pending SURE acks will have timed out → ERR-06 logged
  • Agents that were mid-task may have completed work but couldn't report it:
    • OTM-7 detects counter mismatches (ERR-05) on next cycle
    • Watchdog cross-checks subtask completion status in Slack vs subtasks_remaining

Orchestrator re-registration:

  • The Orchestrator does NOT need to re-register agents. The OTM rebuilds the registry from openclaw.json automatically on startup (§4.6).
  • If openclaw.json has changed (new agent added, agent removed), the reconciliation picks up the delta.

Gateway restart logging:

  • Every OTM startup logs a system event: IE-SYS-01: OTM startup with details including:
    • Agents reconciled (count + names)
    • Tasks found in active states (In Progress, Pending, Assigned)
    • SURE timeouts detected
    • Errors found and corrected during reconciliation
  • This event is logged to event_log, otm-events.log, AND posted to Slack #alerts channel (OE-05)

5. Cross-Reference Index

5.1 Task Transition → Agent Transition Mapping

| Task Transition | Triggers Agent Transition | Handler |
| --- | --- | --- |
| TT-02 (Assigned → In Progress) | AT-01 (Idle → Busy) | OTM-2 |
| TT-04 (In Progress → Agent Done) | AT-02 (Busy → Idle) | OTM-3 |
| TT-11 (→ Cancelled) | AT-03 (Busy → Idle) | OTM-5 |
| TT-13 (In Progress → Failed) | AT-04 (Busy → Idle) | OTM error |

5.2 Transition → Handler Mapping

| Transition(s) | Primary Handler | Description |
| --- | --- | --- |
| TT-01, TT-02, TT-03, TT-05 | OTM-2 | Task assignment, availability check, queue management |
| TT-04 | OTM-3 | Subtask completion, counter decrement, task completion |
| TT-06, TT-07, TT-08, TT-10 | OTM-4 | Validation, rejection, rework, cancel-after-reject |
| TT-11 | OTM-5 | Task cancellation |
| TT-12 | OTM-6 | Archival (watchdog requests, OTM executes) |
| TT-13, TT-14, TT-15 | OTM error handler | Failure detection, retry, abandon |
| AT-01 | OTM-2 | Agent set busy |
| AT-02 | OTM-3 | Agent freed on task completion |
| AT-03 | OTM-5 | Agent freed on task cancellation |
| AT-04 | OTM error handler | Agent freed on task failure |

5.3 Inbound Event Index

| Code | Source | Description | Triggered by |
| --- | --- | --- | --- |
| IE-01 | SE-1 | Raw `list_item_updated` event from Slack (Human field edit) | Human edits task in Slack UI |
| IE-02 | Agent | Subtask completion report via OTM API | Agent calls OTM after completing subtask |
| IE-03 | Agent | SURE acknowledgement via OTM API | Agent confirms receipt of task assignment |
| IE-04 | Orchestrator | Task validation request via OTM API | Orchestrator reviews and approves |
| IE-05 | Orchestrator | Task rejection request via OTM API (with reason) | Orchestrator reviews and rejects |
| IE-06 | Orchestrator | Task action request via OTM API (rework/cancel/retry) | Orchestrator requests state change |
| IE-07 | Orchestrator | Task retry request via OTM API (from Failed) | Orchestrator wants to retry failed task |
| IE-08 | Watchdog | Cron tick (every 60 seconds) | Timer fires |
| IE-09 | OTM internal | `check_next_task_for_agent()` result | Triggered after AT-02/AT-03/AT-04 |
| IE-10 | OTM internal | Agent error detection (hook timeout >5min, crash, exception) | OpenClaw health monitoring |

5.4 Outbound Event Index

| Code | Target | Description | Via |
| --- | --- | --- | --- |
| OE-01 | Agent | SURE task notification (requires acknowledgement IE-03) | OpenClaw hooks (AI) / Slack task conversation (Human) |
| OE-02 | Orchestrator | Task completion/validation notification | OpenClaw hooks |
| OE-03 | — | (reserved) | — |
| OE-04 | Agent | Task cancellation notification | OpenClaw hooks (AI) / Slack task conversation (Human) |
| OE-05 | Admin | Alert (anomaly, error, stale task, agent down) | Telegram / Slack #alerts |
| OE-06 | Slack | Task field update (status, counters, timestamps) | SW-1 |
| OE-07 | Slack | Audit log entry in task conversation feed | SW-1 |

5.5 All Code Summaries

  • Task transitions (TT-xx): TT-01 through TT-15 (TT-09 removed in v1.3) — see §3.2.1 and §3.3
  • Agent transitions (AT-xx): AT-01 through AT-04 — see §4.3 and §4.4
  • Inbound events (IE-xx): IE-01 through IE-10 — see §5.3
  • Outbound events (OE-xx): OE-01 through OE-07 — see §5.4

6. SURE Protocol (Send-Understand-Report-Execute)

All task notifications to agents use the SURE protocol to guarantee delivery and acknowledgement.

6.1 Flow

OTM sends task assignment → OE-01
  |
  +-- OTM logs in task conversation (OE-07):
  |     "[2026-03-12 14:30:05] OTM → Agent(Devdas): Task assigned — <title>"
  |
  +-- Agent receives notification
  |
  +-- Agent sends acknowledgement → IE-03
  |
  +-- OTM logs in task conversation (OE-07):
        "[2026-03-12 14:30:12] Agent(Devdas) → OTM: Task acknowledged"

6.2 Timeout

  • Retry 1: If no IE-03 within 1 minute → OTM re-sends the notification (OE-01) and logs the retry.
  • Retry 2: If no IE-03 within 2 more minutes (3 min total) → OTM re-sends again and logs the retry.
  • Error: If no IE-03 after retry 2 → OTM logs an ERR-06 error and alerts the admin (OE-05). The task remains In Progress — the admin decides.

📌 The 1+2 minute schedule is designed to allow time for a gateway restart (~2 min typical). If an acknowledgement is lost because the gateway restarted mid-notification, the OTM resync procedure (§4.7) handles recovery.
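The escalation schedule above can be sketched as a function of elapsed time since the OE-01 send. This is a simplification: the real OTM tracks per-task retry counts rather than deriving the action from wall-clock time, and the function name is illustrative.

```shell
# Map seconds since OE-01 was sent to the SURE escalation action (sketch only)
sure_action() {
  elapsed=$1
  if [ "$elapsed" -lt 60 ]; then
    echo "wait"       # inside the first minute: no action yet
  elif [ "$elapsed" -lt 180 ]; then
    echo "retry"      # retry window: resend OE-01 and log the retry
  else
    echo "err-06"     # 3 min total elapsed: log ERR-06, alert admin (OE-05)
  fi
}

sure_action 90    # → retry
```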

6.3 SURE applies to

| Event | SURE required? |
| --- | --- |
| Task assignment (OE-01) | ✅ Yes |
| Rework task (OE-01 after TT-08) | ✅ Yes |
| Task cancellation (OE-04) | ❌ No (fire-and-forget, agent stops) |
| Admin alert (OE-05) | ❌ No |

6.4 Agent acknowledgement API

POST /api/otm/ack
{
  "task_id": "<task item ID>",
  "agent_id": "<slack_user_id>",
  "type": "task_assigned"
}
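A minimal acknowledgement call might look like the following; the gateway host/port and the concrete task/agent IDs are placeholders, not values defined by the spec.

```shell
# Build the IE-03 acknowledgement payload (IDs and endpoint are illustrative)
task_id="T-00042"
agent_id="U06K407LVCY"
payload=$(printf '{"task_id":"%s","agent_id":"%s","type":"task_assigned"}' \
  "$task_id" "$agent_id")
echo "$payload"

# Send it (commented out: requires a running OTM gateway):
# curl -s -X POST http://127.0.0.1:18789/api/otm/ack \
#   -H "Content-Type: application/json" -d "$payload"
```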

7. Audit Trail

Every event processed by the OTM is logged in the Slack task conversation feed via SW-1. This provides a human-readable, timestamped record of all activity on each task.

7.1 What is logged

| Event type | Log format |
| --- | --- |
| State change by OTM | `[timestamp] OTM: Status changed <previous> → <new> (TT-xx)` |
| Orchestrator request received | `[timestamp] Orchestrator → OTM: <action> requested (IE-xx)` |
| Human change detected | `[timestamp] Human(<name>) change detected: <field> = <value> (IE-01)` |
| Agent notification sent | `[timestamp] OTM → Agent(<name>): <notification type> (OE-xx)` |
| Agent acknowledgement received | `[timestamp] Agent(<name>) → OTM: Acknowledged (IE-03)` |
| Agent subtask report received | `[timestamp] Agent(<name>) → OTM: Subtask done — <title> (IE-02). Remaining: <n>` |
| Watchdog action | `[timestamp] Watchdog: <check description> (IE-08)` |
| Error/alert | `[timestamp] OTM ERROR: <description> (IE-10)` |

7.2 Implementation

All audit entries are posted as replies in the Slack task's conversation thread via SW-1. This means:

  • Every task's conversation is a complete history of its lifecycle
  • No separate log table needed — Slack IS the audit log
  • Human-readable without any tooling
  • Searchable via Slack search

PART 1 — SLACK INTEGRATION LAYER

SE-1: Slack Event Listener

| Detail | |
| --- | --- |
| Actors | Slack Events API (source) |
| Inbound events | IE-01: any `list_item_updated` event on the Task Board List |
| Actions | Forward raw event payload to OTM-0 entry point |
| Outbound events | None — SE-1 has zero intelligence |
| Transitions | None — SE-1 is a passthrough |

The Slack app (Salvatore's app, socket mode) subscribes to list_item_updated events.

SE-1 does exactly one thing:

ON list_item_updated:
  → call otm_handle_event(raw_event_payload)

No routing. No field inspection. No filtering. The OTM decides what to do with the event.

Required Slack App scope for SE-1: lists:read only.

SW-1: Slack Writer

| Detail | |
| --- | --- |
| Actors | OTM (sole caller) |
| Inbound events | OTM handler calls |
| Actions | Write task fields to Slack List, post audit entries to task conversation |
| Outbound events | OE-06: Slack field update, OE-07: audit log entry |

SW-1 is the sole component that writes to Slack. It provides two operations:

  1. sw1_update_fields(task_id, fields) — Updates task item fields (status, counters, timestamps). Produces OE-06.
  2. sw1_post_audit(task_id, message) — Posts a timestamped message to the task's conversation thread. Produces OE-07.

Required Slack App scope for SW-1: lists:write.

📌 SE-1 (lists:read) and SW-1 (lists:write) are separate concerns. They may run in the same Slack app but are logically distinct.

📌 SW-1 does NOT create or delete tasks. Only the Orchestrator creates tasks and subtasks (via Slack API or UI). SW-1's write scope is limited to: updating existing task fields (OE-06) and posting audit entries to task conversations (OE-07). During rework, the Orchestrator manages subtask creation/deletion directly; the OTM then processes the state change via SW-1.


PART 2 — OPENCLAW TASK MANAGER (OTM)

Implemented as an OpenClaw plugin pipeline set. Persists to SQLite. Sole component that writes task status, agent status, and counter fields.

OTM-0: Event Router

| Detail | |
| --- | --- |
| Actors | OTM (internal) |
| Inbound events | IE-01: raw `list_item_updated` from SE-1 |
| Actions | ACT-R1: Parse event payload and identify change type<br>ACT-R2: Route to appropriate handler |
| Outbound events | None directly — delegates to handlers |

Single entry point for all Slack-originated events. Contains the routing logic that was previously in SE-1.

RECEIVE raw_event_payload from SE-1
  |
  +-- Parse: what field(s) changed?
  |
  +-- IF assigned_to changed AND status = "New":
  |       → route to OTM-2 (task assignment)
  |
  +-- IF status changed to "Assigned" (from Pending promotion or rework):
  |       → route to OTM-2 (task re-assignment)
  |
  +-- IF status changed to "Cancelled" by Human:
  |       → route to OTM-5 (cancellation)
  |
  +-- IF other field changed by Human:
  |       → log via sw1_post_audit (OE-07): "Human(<name>) changed <field>"
  |       → no state transition
  |
  +-- ELSE: ignore

📌 All routing intelligence lives here, not in SE-1. SE-1 is a dumb pipe.

OTM-1: Agent Registry

| Detail | |
| --- | --- |
| Actors | OTM (owner/writer) |
| Inbound events | IE-08: startup reconciliation<br>IE-01: new agent detected (auto-register) |
| Actions | ACT-A1: Register new agent (from openclaw.json or first interaction)<br>ACT-A2: Set agent busy (AT-01)<br>ACT-A3: Set agent idle (AT-02, AT-03, AT-04)<br>ACT-A4: Reconcile from Slack List on startup<br>ACT-A5: Reconcile from openclaw.json on startup |
| Outbound events | OE-07: audit log for registration events |
| Transitions | AT-01, AT-02, AT-03, AT-04 |

See §4.5 for schema and §4.6 for startup reconciliation.

OTM-2: Handle Task Assigned

| Detail | |
| --- | --- |
| Actors | OTM (executor), Agent (notified if idle) |
| Inbound events | IE-01: assigned_to changed or status=Assigned (from OTM-0) |
| Actions | ACT-T1: Count subtasks via Slack API<br>ACT-T2: Check agent availability in registry<br>ACT-T3: Set task status (via SW-1)<br>ACT-T4: Set agent busy (AT-01 via OTM-1)<br>ACT-T5: Send SURE notification (OE-01) |
| Outbound events | OE-01: SURE task notification (if agent idle)<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-02 (→ In Progress) or TT-03 (→ Pending) |
| Agent transitions | AT-01 (Idle → Busy) if agent available |

RECEIVE task assignment event (from OTM-0)
  |
  +-- Read task fields: task_id, title, assigned_to, priority
  +-- ACT-T1: Count child items (subtasks) via Slack API → subtask_count
  +-- Store subtask_count as initial subtasks_remaining
  |
  +-- ACT-T2: Look up assigned_to agent in registry
  |
  +-- IF agent NOT found AND agent_type detectable:
  |       ACT-A1: Auto-register agent
  |       sw1_post_audit: "New agent registered: <name>"
  |
  +-- IF agent NOT found AND not detectable:
  |       OE-05: Alert admin
  |       sw1_post_audit: "ERROR: Unknown agent <id>"
  |       Task stays in current status
  |
  +-- IF agent is IDLE:
  |       Store previous_status
  |       ACT-T4: Execute AT-01 (agent → busy)
  |       sw1_update_fields: subtasks_remaining, status = "In Progress", assigned_at = now
  |       sw1_post_audit: "Status: Assigned → In Progress (TT-02). Agent: <name>"
  |       ACT-T5: Send SURE notification (OE-01)
  |       sw1_post_audit: "OTM → Agent(<name>): Task assigned (OE-01). Awaiting SURE ack."
  |
  +-- IF agent is BUSY:
          Store previous_status
          sw1_update_fields: status = "Pending"
          sw1_post_audit: "Status: Assigned → Pending (TT-03). Agent <name> busy with <current_task>"

OTM-3: Handle Subtask Done

| Detail | |
| --- | --- |
| Actors | Agent (reports completion), OTM (processes), Orchestrator (notified on task completion) |
| Inbound events | IE-02: agent subtask completion report via OTM API |
| Actions | ACT-S1: Validate subtask belongs to agent's current task<br>ACT-S2: Decrement counter<br>ACT-S3: Update Slack subtask checkbox (via SW-1) — sets both status AND Col00<br>ACT-S4: Complete task if counter = 0<br>ACT-S5: Free agent (AT-02 via OTM-1)<br>ACT-S6: Promote next pending task (IE-09) |
| Outbound events | OE-02: Orchestrator notification (on task complete)<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-04 (→ Agent Done when counter hits 0) |
| Agent transitions | AT-02 (Busy → Idle) when task completes |

RECEIVE subtask completion report (IE-02)
  {task_id, subtask_id, agent_id}
  |
  +-- ACT-S1: Validate:
  |     - subtask belongs to task
  |     - agent is assigned to task
  |     - subtask not already completed (idempotency: check todo_completed field / Col00)
  |       IF already completed: discard, return OK
  |
  +-- ACT-S3: sw1_update_fields: subtask.todo_completed = true (Col00), subtask.status = done
  +-- ACT-S2: Decrement task.subtasks_remaining by 1
  +-- sw1_update_fields: subtasks_remaining
  +-- sw1_post_audit: "Agent(<name>): Subtask done — <title> (IE-02). Remaining: <n>"
  |
  +-- IF subtasks_remaining > 0:
  |       Done. Await next report.
  |
  +-- IF subtasks_remaining = 0 (TASK COMPLETE):
          Store previous_status = "In Progress"
          sw1_update_fields: status = "Agent Done", completed_at = now
          sw1_post_audit: "Status: In Progress → Agent Done (TT-04). All subtasks complete."
          ACT-S5: Execute AT-02 (agent → idle)
          ACT-S6: Notify Orchestrator (OE-02):
            POST /hooks/agent {
              agentId: "main",
              message: "Task ready for review: <title>\nAgent: <name>\nElapsed: <time>\nResult: <result_summary>\nLink: <slack_link>"
            }
          sw1_post_audit: "OTM → Orchestrator: Task ready for review (OE-02)"
          CALL: check_next_task_for_agent(agent_id)  → IE-09

check_next_task_for_agent(agent_id):

Query Slack List for tasks WHERE:
  assigned_to = agent_id
  AND status = "Pending"
  ORDER BY priority ASC, posted_at ASC
  LIMIT 1
  |
  +-- IF Pending task found:
  |       sw1_update_fields: pending_task.status = "Assigned"
  |       sw1_post_audit on pending task: "Status: Pending → Assigned (TT-05). Agent now available."
  |       (OTM-0 detects Assigned change → OTM-2 fires)
  |
  +-- IF no Pending task:
          Agent remains idle.

📌 Idempotency is handled by checking todo_completed (Col00) on the subtask before processing. No separate processed_events table needed. The Slack conversation feed (OE-07) serves as the complete audit trail.

📌 Agent reports directly to OTM API (IE-02), not by ticking checkboxes in Slack. The OTM then updates Slack via SW-1 (ACT-S3). This ensures all writes go through the OTM.
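The promotion query in check_next_task_for_agent can be sketched in pure shell over a flattened task dump. The field layout (task_id|assigned_to|status|priority|posted_at), the sample rows, and the numeric priority encoding (lower = more urgent) are illustrative; the real query runs against the Slack List.

```shell
# Illustrative task dump: task_id|assigned_to|status|priority|posted_at
tasks='T-00020|devdas|Pending|2|2026-03-12T10:00:00Z
T-00021|devdas|Pending|1|2026-03-12T11:00:00Z
T-00022|devdas|Done|1|2026-03-12T09:00:00Z'

next_pending_for() {            # ORDER BY priority ASC, posted_at ASC LIMIT 1
  printf '%s\n' "$tasks" \
    | awk -F'|' -v a="$1" '$2 == a && $3 == "Pending"' \
    | sort -t'|' -k4,4n -k5,5 \
    | head -n1 | cut -d'|' -f1
}

next_pending_for devdas         # → T-00021 (priority 1 beats earlier posted_at)
```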

OTM-4: Task Validate / Reject (Orchestrator API)

| Detail | |
| --- | --- |
| Actors | Orchestrator (caller), OTM (executor) |
| Inbound events | IE-04: Orchestrator requests validation<br>IE-05: Orchestrator requests rejection (with reason)<br>IE-06: Orchestrator signals rework ready (subtasks already managed by Orchestrator) |
| Actions | ACT-V1: Verify task status = "Agent Done" or "Rejected"<br>ACT-V2: Execute validation (TT-06)<br>ACT-V3: Execute rejection (TT-07)<br>ACT-V4: Count unfinished subtasks, set counter, change status (TT-08) |
| Outbound events | OE-02: confirmation to Orchestrator<br>OE-06: Slack field update<br>OE-07: audit log entries |
| Task transitions | TT-06 (→ Done), TT-07 (→ Rejected), TT-08 (→ Assigned), TT-10 (→ Cancelled) |

Validate request (IE-04):

{
  "task_id": "<task item ID>",
  "outcome": "validated",
  "comment": "<optional>"
}

Processing — validated:

ACT-V1: Verify task.status = "Agent Done" (reject API call otherwise)
Store previous_status = "Agent Done"
sw1_post_audit: "Orchestrator → OTM: Validation requested (IE-04)"
ACT-V2: Execute TT-06
  sw1_update_fields: status = "Done", validated_at = now, todo_completed = true (Col00)
  sw1_post_audit: "Status: Agent Done → Done (TT-06). Validated."
  IF comment: sw1_post_audit: "Orchestrator comment: <comment>"
OE-02: Notify Orchestrator: "Task <title> is Done"

Reject request (IE-05):

{
  "task_id": "<task item ID>",
  "outcome": "rejected",
  "reason": "<mandatory explanation>"
}

Processing — rejected:

ACT-V1: Verify task.status = "Agent Done" (reject API call otherwise)
Store previous_status = "Agent Done"
sw1_post_audit: "Orchestrator → OTM: Rejection requested (IE-05). Reason: <reason>"
ACT-V3: Execute TT-07
  sw1_update_fields: status = "Rejected"
  sw1_post_audit: "Status: Agent Done → Rejected (TT-07)"

Rework request (IE-06) — submitted after Orchestrator has already prepared subtasks:

The Orchestrator handles all subtask management before calling the OTM:

  1. Orchestrator deletes unnecessary/obsolete subtasks from the Slack List
  2. Orchestrator leaves completed subtasks in place (as a record)
  3. Orchestrator creates new subtasks (first = "Acknowledge rework request: ")
  4. Orchestrator then notifies the OTM that rework is ready:
{
  "task_id": "<task item ID>",
  "action": "rework"
}

Processing — rework:

Verify task.status = "Rejected"
sw1_post_audit: "Orchestrator → OTM: Rework requested (IE-06)"
ACT-V4: Execute TT-08
  Count unfinished subtasks via Slack API (todo_completed = false / Col00 unchecked)
  Set subtasks_remaining = count of unfinished subtasks
  sw1_update_fields: status = "Assigned", subtasks_remaining
  sw1_post_audit: "Status: Rejected → Assigned (TT-08). Rework: <n> unfinished subtasks."
  → OTM-0 detects Assigned change → OTM-2 fires → SURE notification sent → agent acks

📌 Rework flow — separation of concerns: The Orchestrator owns subtask management (create, delete, keep). The OTM owns state management (status transitions, counter recalculation, agent assignment). The OTM never creates or deletes subtasks. On rework, it counts unfinished subtasks to set subtasks_remaining, then triggers the normal assignment flow. The first new subtask is always an acknowledgement subtask — the agent closes it to confirm they understood the rework instructions.

Cancel-after-reject request (IE-06):

{
  "task_id": "<task item ID>",
  "action": "cancel"
}

Verify task.status = "Rejected" → Execute TT-10 (→ Cancelled).

OTM-5: Handle Task Cancelled

| Detail | |
| --- | --- |
| Actors | Orchestrator/Human (requests), OTM (executes), Agent (notified if was working) |
| Inbound events | IE-01: Human status edit detected by OTM-0<br>IE-06: Orchestrator cancel API call |
| Actions | ACT-C1: Store previous_status<br>ACT-C2: Free agent if applicable (AT-03)<br>ACT-C3: Notify agent of cancellation (OE-04)<br>ACT-C4: Promote next pending task (IE-09) |
| Outbound events | OE-04: cancellation notification to agent<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-11 (→ Cancelled) |
| Agent transitions | AT-03 (Busy → Idle) if agent was working |

RECEIVE cancellation request (IE-01 or IE-06)
  |
  +-- ACT-C1: Store previous_status
  +-- sw1_post_audit: "Cancellation requested by <actor> (IE-xx)"
  +-- sw1_update_fields: status = "Cancelled"
  +-- sw1_post_audit: "Status: <previous> → Cancelled (TT-11)"
  |
  +-- IF task was In Progress or Assigned:
  |       ACT-C2: Execute AT-03 (agent → idle)
  |       ACT-C3: Send cancellation notification (OE-04)
  |       sw1_post_audit: "OTM → Agent(<name>): Task cancelled (OE-04)"
  |       ACT-C4: check_next_task_for_agent(agent_id) → IE-09
  |
  +-- IF task was Pending:
  |       sw1_post_audit: "Removed from queue (no agent notification)"
  |
  +-- IF task was New:
          sw1_post_audit: "Cancelled before assignment"

OTM-6: Watchdog

| Detail | |
| --- | --- |
| Actors | Watchdog cron (detector), OTM (executor), Admin (alerted) |
| Inbound events | IE-08: cron tick (every 60 seconds) |
| Actions | ACT-W1: Check for stale In Progress tasks (>24h, no subtask activity)<br>ACT-W2: Check for orphaned Pending tasks (agent idle but task pending)<br>ACT-W3: Check counter mismatches (subtasks_remaining vs actual)<br>ACT-W4: Check archival candidates (Done/Cancelled >7 days)<br>ACT-W5: Check agent heartbeats (last_seen >2h) |
| Outbound events | OE-05: admin alert (anomalies)<br>OE-06: Slack field update (archival)<br>OE-07: audit log entries |
| Task transitions | Requests TT-12 (→ Archived), requests TT-05 (orphaned Pending → Assigned) |

📌 The Watchdog does NOT write state directly. It calls OTM handler functions to execute transitions.

Checks:

ACT-W1: STALE IN-PROGRESS
  Query tasks In Progress for >24h with no subtask activity in conversation feed
  → OE-05: Alert admin. Do NOT auto-reassign.
  → sw1_post_audit: "Watchdog: Stale task detected (>24h, no activity)"

ACT-W2: ORPHANED PENDING
  Query tasks Pending WHERE assigned agent is Idle in registry
  → Request OTM to re-trigger: call OTM-2 (TT-05 → Assigned)
  → sw1_post_audit: "Watchdog: Orphaned pending — re-triggering assignment"

ACT-W3: COUNTER MISMATCH
  Compare subtasks_remaining with actual unchecked subtask count
  → Recalculate and fix counter via SW-1
  → OE-05: Alert admin
  → sw1_post_audit: "Watchdog: Counter mismatch corrected (<old> → <new>)"

ACT-W4: ARCHIVAL
  Query tasks Done or Cancelled WHERE completed_at/cancelled_at + 7 days < now
  → Request OTM to execute TT-12
  → sw1_update_fields: status = "Archived"
  → sw1_post_audit: "Status: <previous> → Archived (TT-12). Auto-archived after 7 days."

ACT-W5: AGENT HEARTBEAT
  Query agents WHERE last_seen > 2h ago
  → OE-05: Alert admin (agent may be down). No state change.
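The ACT-W4 selection reduces to a date comparison, and since ISO-8601 dates compare correctly as plain strings, a sketch needs no date library. The sample rows and the fixed "now" are illustrative; the real check runs against the active tasks table.

```shell
# Sketch of ACT-W4's archival candidate selection.
# Rows: task_id|status|terminal timestamp (validated_at or cancelled_at)
now="2026-03-14"
cutoff="2026-03-07"   # now minus 7 days, precomputed for the sketch

tasks='T-00010|Done|2026-03-05
T-00011|Done|2026-03-12
T-00012|Cancelled|2026-03-01'

# ISO-8601 dates compare correctly as strings, so awk needs no date math
candidates=$(printf '%s\n' "$tasks" | awk -F'|' -v c="$cutoff" \
  '($2 == "Done" || $2 == "Cancelled") && $3 < c { print $1 }')
echo "$candidates"    # T-00010 and T-00012 qualify for TT-12
```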

End-to-End Flow Diagrams

Flow 1a — Task Assigned, Agent Idle

Orchestrator sets assigned_to on New task
  |
[SE-1] receives list_item_updated → forwards raw event to OTM (IE-01)
  |
[OTM-0] parses: assigned_to changed → routes to OTM-2
  |
[OTM-2] checks registry: agent is IDLE
  |  sw1_update_fields: subtasks_remaining, status = "In Progress", assigned_at
  |  sw1_post_audit: "Status: New → Assigned → In Progress (TT-01, TT-02)"
  |  executes AT-01: agent → busy
  |  sends SURE notification (OE-01)
  |  sw1_post_audit: "OTM → Agent(<name>): Task assigned. Awaiting SURE ack."
  |
Agent receives notification
  |  sends acknowledgement (IE-03)
  |
[OTM] receives IE-03
  |  sw1_post_audit: "Agent(<name>): Task acknowledged (IE-03)"
  |
Agent starts work

Flow 1b — Task Assigned, Agent Busy

Orchestrator sets assigned_to on New task
  |
[SE-1] → IE-01 → [OTM-0] → routes to OTM-2
  |
[OTM-2] checks registry: agent is BUSY
  |  sw1_update_fields: status = "Pending"
  |  sw1_post_audit: "Status: Assigned → Pending (TT-03). Agent busy with <current_task>"
  |
Task waits silently. No agent notification.

Flow 2a — Subtask Done (not last)

Agent completes subtask → reports to OTM API (IE-02)
  |
[OTM-3] validates: subtask belongs to task, not already completed
  |  sw1_update_fields: subtask.todo_completed = true (Col00), subtask.status = done
  |  decrements subtasks_remaining (3 → 2)
  |  sw1_update_fields: subtasks_remaining
  |  sw1_post_audit: "Agent(<name>): Subtask done — <title>. Remaining: 2"
  |
Agent continues working.

Flow 2b — Last Subtask (task complete)

Agent completes final subtask → reports to OTM API (IE-02)
  |
[OTM-3] decrements (1 → 0)
  |  sw1_update_fields: status → "Agent Done", completed_at
  |  sw1_post_audit: "Status: In Progress → Agent Done (TT-04). All subtasks complete."
  |  executes AT-02: agent → idle
  |  notifies Orchestrator (OE-02)
  |  sw1_post_audit: "OTM → Orchestrator: Task ready for review"
  |  check_next_task_for_agent()
  |    → Pending task found? → TT-05 → OTM-0 → OTM-2 → Flow 1a
  |    → No pending? → agent stays idle

Flow 3a — Orchestrator Validates

Orchestrator tells OTM to validate task (IE-04)
  |
[OTM-4] verifies status = "Agent Done"
  |  sw1_post_audit: "Orchestrator: Validation requested"
  |  sw1_update_fields: status → "Done", validated_at, todo_completed = true (Col00)
  |  sw1_post_audit: "Status: Agent Done → Done (TT-06). Validated."
  |  notifies Orchestrator: confirmed (OE-02)

Flow 3b — Orchestrator Rejects and Requests Rework

Step 1: Orchestrator tells OTM to reject task (IE-05)
  |
[OTM-4] sw1_post_audit: "Orchestrator: Rejection. Reason: <reason>"
  |  sw1_update_fields: status → "Rejected"
  |  sw1_post_audit: "Status: Agent Done → Rejected (TT-07)"

Step 2: Orchestrator manages subtasks directly in Slack (NOT via OTM):
  |  - Deletes unnecessary/obsolete subtasks
  |  - Leaves completed subtasks in place (as record)
  |  - Creates new subtasks:
  |      1. "Acknowledge rework request: <detailed reason and instructions>"
  |      2. "Fix validation on email field"
  |      3. "Add unit tests for edge cases"

Step 3: Orchestrator tells OTM that rework is ready (IE-06)
  |
[OTM-4] receives rework signal:
  |  Counts unfinished subtasks via Slack API → 3
  |  sw1_update_fields: status → "Assigned", subtasks_remaining = 3
  |  sw1_post_audit: "Status: Rejected → Assigned (TT-08). Rework: 3 unfinished subtasks."

Step 4: Normal assignment flow (TT-02 → SURE → agent works)
  |
[OTM-0] detects Assigned → OTM-2 → agent idle → In Progress
  |  SURE notification posted to Slack task conversation (OE-01)
  |  Agent acknowledges (IE-03)
  |
Agent reads first subtask: "Acknowledge rework request: ..."
  |  Agent closes first subtask to acknowledge (IE-02)
  |  OTM-3 decrements: 3 → 2
  |  sw1_post_audit: "Agent(<name>): Acknowledged rework. Remaining: 2"
  |
Agent works through remaining subtasks normally

Flow 4 — Cancellation

Orchestrator tells OTM to cancel (IE-06) / Human changes status in Slack UI (IE-01)
  |
[OTM-0] routes to OTM-5
  |
[OTM-5] sw1_post_audit: "Cancellation requested by <actor>"
  |  sw1_update_fields: status → "Cancelled", cancelled_at = now
  |  executes AT-03: agent freed if was working
  |  sends cancellation notification (OE-04)
  |  sw1_post_audit: "Status: <previous> → Cancelled (TT-11)"
  |  check_next_task_for_agent() for freed agent

Flow 5 — Auto-Archival (TT-12)

[OTM-6 Watchdog] IE-08: cron tick fires (every 60s)
  |
  +-- ACT-W4: Query tasks WHERE:
  |     (status = "Done" AND validated_at + 7 days < now)
  |     OR (status = "Cancelled" AND cancelled_at + 7 days < now)
  |
  +-- FOR EACH archival candidate:
        Store previous_status
        sw1_update_fields: status → "Archived"
        sw1_post_audit: "Status: <previous> → Archived (TT-12). Auto-archived after 7 days."
        INSERT INTO task_history (snapshot of task fields)
        DELETE task from active tasks (SQLite only — Slack List item untouched)

📌 TT-12 is a watchdog-requested transition (IE-08 trigger). The watchdog detects the 7-day threshold; the OTM executes the archive. Archived tasks are snapshotted to task_history for long-term reporting before being removed from the active tasks table.

📌 Slack archive limitation: The OTM sets status = "Archived" in Slack via SW-1, but the actual "Archive item" action in the Slack UI cannot be triggered via API (slackLists.items.archive does not exist). Manual Slack UI archiving is required for items to visually disappear from the default Slack list view.


PART 3 — FILE-BASED PIPELINE

The file-based pipeline is the operational implementation of Systems 1–4. It bridges the Orchestrator's task-creation workflow to the Slack List and agent workspaces, using JSON files as the intermediary to avoid direct Slack API calls by agents.

System 1: Task Creation (otm-create-task.sh)

The Orchestrator (Claudia) creates tasks by running a shell script. This populates a JSON file and atomically increments the task ID counter.

Script Parameters

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| `--title` | | | Short, actionable one-liner |
| `--agent` | | | Who works on it: claudia, devdas, archibald, frederic, salvatore, sylvain, rupert |
| `--priority` | | normal | critical \| high \| normal \| medium \| low \| batchable |
| `--project` | | (none) | Free-text project name (e.g. prj-012, fab-state) |
| `--type` | | action | action \| decision \| review |
| `--subtask` | | "Confirm that task has been done" | Repeatable. Each value becomes a Slack subtask. If none provided, a default confirmation subtask is auto-added (required for completion detection) |

Examples

# Simple action task
otm-create-task.sh --title "Fix login bug" --agent devdas --priority high --project prj-012

# Decision for Rupert
otm-create-task.sh --title "Approve budget for Q2" --agent rupert --type decision

# Task with subtasks
otm-create-task.sh --title "Build login page" --agent devdas --project prj-012 \
  --subtask "Create login form component" \
  --subtask "Add validation logic" \
  --subtask "Write unit tests"

# Input/acknowledgement pattern (replaces --description)
otm-create-task.sh --title "Review API design" --agent frederic --type review \
  --subtask "input: the API spec is at docs/api-v2.md" \
  --subtask "Check endpoint naming conventions" \
  --subtask "Validate error response format"

# Batchable priority (can wait for batch processing)
otm-create-task.sh --title "Update documentation" --agent archibald --priority batchable

JSON File Format (intermediate)

Written to ~/Library/Application Support/OpenClaw/otm/new-tasks/<timestamp>-<uuid>.json

{
  "id": "uuid",
  "taskId": "T-00035",
  "title": "Short task title",
  "agent": "devdas",
  "createdAt": "2026-03-14T08:20:19Z",
  "priority": "normal",
  "project": "prj-012",
  "subtasks": ["Subtask 1", "Subtask 2"],
  "type": "action",
  "status": "pending"
}

Task ID System

  • Format: T-NNNNN (5 digits, zero-padded)
  • Counter file: ~/Library/Application Support/OpenClaw/otm/next-task-id.json
  • Atomic increment with file locking (flock)
  • Auto-assigned by otm-create-task.sh — agents don't manage IDs manually
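The atomic ID allocation can be sketched with flock. The real counter lives in next-task-id.json; a bare integer file and the seed value 35 are used here for brevity, and the function name is illustrative.

```shell
# Sketch of otm-create-task.sh's flock-based task ID allocation
COUNTER="$(mktemp -d)/next-task-id"

next_task_id() {
  (
    flock -x 9                                   # serialize concurrent creators
    n=$(cat "$COUNTER" 2>/dev/null || echo 35)   # read current counter (or seed)
    printf 'T-%05d\n' "$n"                       # emit zero-padded ID, e.g. T-00035
    echo $((n + 1)) > "$COUNTER"                 # persist increment before unlock
  ) 9>"$COUNTER.lock"
}

id=$(next_task_id)
echo "$id"    # first allocation from the seed → T-00035
```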

Type Values

| Type | Purpose | Primary user |
| --- | --- | --- |
| action | Task to execute (default) | Agents |
| decision | Requires a decision from someone | Rupert |
| review | Needs review / approval | Rupert or agents |

System 2: Task Injection (otm-injector.js)

The injector watches the new-tasks/ directory and publishes JSON task files to the Slack Lists API.

Triggers

| Component | File | Trigger |
| --- | --- | --- |
| Watcher | `ai.openclaw.otm-watcher.plist` | WatchPaths on `new-tasks/` (and `task-updates/`) |
| Sweeper | `ai.openclaw.otm-sweeper.plist` | Every 10 min (catches misses) |

Idempotency (Dedup Check)

The injector includes a dedup check before creating tasks in Slack to prevent duplicates on crash/retry:

  1. Before calling slackLists.items.create, the injector fetches all existing items
  2. Scans for a matching Task ID (Col0ALVK2NA1E)
  3. If the Task ID already exists → skips creation, moves file to processed/ with _skipped_duplicate: true
  4. If the dedup API call fails → proceeds with creation (better to duplicate than lose a task)

This prevents duplicate Slack items when the watcher or sweeper re-processes a file already injected (e.g., after a crash, retry, or race condition).
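The dedup gate can be sketched in pure shell. The real check fetches all Slack list items and scans column Col0ALVK2NA1E; here the existing IDs are a hardcoded sample and the function name is illustrative.

```shell
# Pure-shell stand-in for the injector's dedup gate (sample data)
existing_ids=" T-00033 T-00034 T-00035 "

dedup_check() {                 # exit 0 if the task ID is already in Slack
  case "$existing_ids" in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

for id in T-00035 T-00099; do
  if dedup_check "$id"; then
    echo "$id: skip (duplicate; move file to processed/ with _skipped_duplicate)"
  else
    echo "$id: create in Slack"
  fi
done
```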

Directories

| Path | Purpose |
| --- | --- |
| `~/Library/Application Support/OpenClaw/otm/new-tasks/` | Inbox — pending task files |
| `~/Library/Application Support/OpenClaw/otm/task-updates/` | Inbox — update files (System 3b) |
| `~/Library/Application Support/OpenClaw/otm/processed/` | Successfully injected |
| `~/Library/Application Support/OpenClaw/otm/failed/` | Failed (with error metadata) |

System 3: Task Dispatcher (otm-dispatcher.js)

A lightweight Node.js scanner that detects tasks with an assignee and dispatches them to the appropriate agent workspace. Runs every 2 minutes via launchd.

What it does

  1. Fetches all tasks from the Slack list (F0ALE8DCW1F)
  2. Finds tasks where: status = "new" AND assignee is set (not empty)
  3. For each matching task (in this exact order for crash safety):
    1. Writes a dispatch file to the agent's workspace: /Volumes/OPENCLAW/CLAUDIA/rapido-openclaw/workspaces/<agent>-workspace/task-dispatch.json
    2. Updates the task's status in Slack from newassigned + sets assigned_at (this is the dedup gate — once assigned, future runs skip it)
    3. Triggers the agent session via Gateway WebSocket RPC
  4. After all tasks processed, all agent triggers fire in parallel — agents start concurrently
  5. Writes its own component state file (System 5 heartbeat)

Agent Triggering via Gateway WebSocket RPC

After writing task-dispatch.json, the dispatcher actively triggers each agent via the OpenClaw Gateway WebSocket RPC. This eliminates the passive "wait for heartbeat" gap — agents start working immediately.

Protocol: WebSocket JSON-RPC to ws://127.0.0.1:18789

Dispatcher                    Gateway                     Agents
  │                              │                          │
  │ ws.connect()                 │                          │
  │─────────────────────────────►│                          │
  │                              │                          │
  │ { method: "agent",           │                          │
  │   params: {                  │                          │
  │     agentId: "archibald",    │                          │
  │     message: "Task T-00044   │                          │
  │       dispatched...",        │                          │
  │     idempotencyKey:          │                          │
  │       "otm-T-00044"         │                          │
  │   }}                         │                          │
  │─────────────────────────────►│ ──► archibald session ──►│
  │                              │                          │
  │ { method: "agent", ...       │                          │
  │   agentId: "devdas" }        │                          │
  │─────────────────────────────►│ ──► devdas session ─────►│
  │                              │                          │
  │ { method: "agent", ...       │                          │
  │   agentId: "salvatore" }     │                          │
  │─────────────────────────────►│ ──► salvatore session ──►│
  │                              │                          │
  │ ws.close()                   │     (all 3 run in        │
  │─────────────────────────────►│      parallel)           │
  │                              │                          │
  │  Dispatcher exits.           │  Gateway manages         │
  │  Total time: ~2 seconds.     │  concurrent sessions.    │

RPC call format:

{
  "method": "agent",
  "params": {
    "message": "Task T-00044 dispatched. Check task-dispatch.json and execute it.",
    "agentId": "archibald",
    "idempotencyKey": "otm-T-00044"
  }
}

Response (immediate, non-blocking):

{
  "runId": "otm-T-00044",
  "status": "accepted",
  "acceptedAt": 1773504558104
}

Key properties:

  • Parallel: All agent sessions start concurrently — no serialization
  • Non-blocking: Gateway returns accepted immediately; dispatcher doesn't wait
  • Auth: Gateway token passed via WebSocket connection headers
  • Rupert excluded: Human users are notified via Slack UI, not RPC

⚠️ idempotencyKey is NOT idempotent. Despite the name, the Gateway agent RPC method accepts duplicate calls with the same key — it uses the key as a runId label only. Calling twice with the same key = two separate agent sessions. Idempotency is the dispatcher's responsibility, not the gateway's. The Slack status flip (new → assigned) is the sole dedup mechanism for the dispatcher. Once a task is assigned, it's invisible to future dispatcher runs.

vs. HTTP Webhook alternative (POST /hooks/agent): The webhook approach serializes agent sessions on CommandLane.Nested — agents run one at a time. With 6 agents × ~3 min each = ~18 min sequential vs. ~3 min parallel via WebSocket RPC. WebSocket is the correct choice for multi-agent dispatch.
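Constructing the payload is mechanical. A sketch (the message wording follows the RPC format above; the helper name is hypothetical):

```javascript
// Sketch: build the JSON-RPC message the dispatcher sends over ws://127.0.0.1:18789.
function buildAgentRpc(agentId, taskId) {
  return {
    method: 'agent',
    params: {
      message: `Task ${taskId} dispatched. Check task-dispatch.json and execute it.`,
      agentId,
      // NOTE: a runId label only, NOT a dedup key (see the warning above).
      idempotencyKey: `otm-${taskId}`,
    },
  };
}
```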

Dispatch Operation Order (crash safety)

The dispatcher must execute operations in this exact order to prevent duplicate agent triggers:

1. Write task-dispatch.json to agent workspace
2. Update Slack status: new → assigned (+ set assigned_at)
3. Trigger agent via WS RPC

Why this order matters:

| Crash point | Result | Recovery |
| --- | --- | --- |
| After step 1, before step 2 | File written, Slack still new | Next dispatcher run re-dispatches → file already has the task (append-only), agent gets triggered. Safe but duplicate file entry. |
| After step 2, before step 3 | Slack says assigned, agent never woke up | Agent picks up task on next heartbeat or manual trigger. Safe — delayed but not lost. |
| After step 3 | All done | Clean path. |

The dangerous alternative (trigger the agent FIRST, then update Slack) risks this sequence: crash after trigger → Slack still new → next run triggers the agent AGAIN → duplicate work, wasted tokens. This is why the Slack update must come before the agent trigger.

Dispatch File Format (task-dispatch.json)

{
  "dispatched": [
    {
      "taskId": "T-00033",
      "title": "Build login page",
      "priority": "high",
      "project": "PRJ-012 App",
      "type": "action",
      "subtasks": ["Create form", "Add validation", "Write tests"],
      "dispatchedAt": "2026-03-14T15:00:00Z",
      "slackItemId": "Rec0ALXYZ"
    }
  ]
}

The dispatch file is APPEND-ONLY — new tasks get added to the dispatched array. Agents remove entries when they pick them up (or mark them as "picked": true).

Agent-Side Convention

On session start, agents check for task-dispatch.json. If present, they:

  1. Pick up the highest-priority task
  2. Update their agent-state.json to "working" with the task
  3. Start working on it
  4. When done, mark subtasks complete via otm-update-task.sh
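Step 1 ("pick up the highest-priority task") could look like the sketch below. The priority ranking is an assumption: the spec's dispatch example only shows "high", so the other names are illustrative.

```javascript
// ASSUMPTION: priority names beyond "high" are illustrative, not from the spec.
const PRIORITY_RANK = { urgent: 0, high: 1, medium: 2, low: 3 };

// Sketch: choose the highest-priority entry not yet picked, or null if none remain.
function pickNext(dispatched) {
  const open = dispatched.filter(e => !e.picked);
  open.sort((a, b) => (PRIORITY_RANK[a.priority] ?? 9) - (PRIORITY_RANK[b.priority] ?? 9));
  return open[0] ?? null;
}
```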

Agent Workspace Mapping

| Agent | Workspace path |
| --- | --- |
| claudia | claudia-workspace |
| devdas | devdas-workspace |
| archibald | archibald-workspace |
| frederic | frederic-workspace |
| salvatore | salvatore-workspace |
| sylvain | sylvain-workspace |
| rupert | (skip — human, notified via Slack UI) |

Component

| Item | Detail |
| --- | --- |
| File | otm-dispatcher.js |
| Plist | ai.openclaw.otm-dispatcher.plist |
| Schedule | Every 120 seconds |
| Log | ~/Library/Logs/OpenClaw/otm-dispatcher.log |
| State file | otm-dispatcher-state.json |

System 3b: Task Updates (otm-update-task.sh)

A file-drop mechanism that allows agents to update task status and mark subtasks complete — same security model as task creation (agents never touch the Slack API directly).

Script: otm-update-task.sh

# Mark a subtask done
otm-update-task.sh --task-id T-00033 --subtask-done "Create login form"

# Update task status
otm-update-task.sh --task-id T-00033 --status in_progress

# Report blocked
otm-update-task.sh --task-id T-00033 --status blocked --reason "Waiting on API key"

# Multiple subtasks done at once
otm-update-task.sh --task-id T-00033 \
  --subtask-done "Create login form" \
  --subtask-done "Add validation logic"

Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| --task-id | Yes | Task ID (T-NNNNN) |
| --status | See below | New status: in_progress \| blocked \| agent_done |
| --subtask-done | See below | Subtask title to mark as done (repeatable) |
| --reason | No | Reason text (used with --status blocked) |

At least one of --status or --subtask-done is required.

JSON Format (intermediate)

Written to ~/Library/Application Support/OpenClaw/otm/task-updates/<timestamp>-<uuid>.json

{
  "id": "uuid",
  "taskId": "T-00033",
  "action": "update",
  "createdAt": "2026-03-14T16:00:00Z",
  "status": "in_progress",
  "subtasksDone": ["Create login form"],
  "reason": null
}

Processing

otm-injector.js is extended to also watch the task-updates/ directory:

  1. Reads the update JSON
  2. Looks up the task in Slack by Task ID (scans items, matches Col0ALVK2NA1E)
  3. If status is set → updates the Status column
  4. If subtasksDone is set → finds matching child items by title → sets their status to done AND sets Col00 (checkbox) to true
  5. Moves file to processed/ on success, failed/ on error
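Steps 3–4 can be sketched as a pure planning function that turns an update file into Slack operations (the operation shape is illustrative; the Col00 checkbox semantics are from the list above):

```javascript
// Sketch: translate an update JSON into the Slack writes the injector would perform.
function planUpdateOps(update, childItems) {
  const ops = [];
  if (update.status) {
    ops.push({ op: 'setStatus', taskId: update.taskId, status: update.status });
  }
  for (const title of update.subtasksDone ?? []) {
    const child = childItems.find(c => c.title === title); // match child items by title
    if (child) {
      // Real injector sets status=done AND the Col00 checkbox on the child item.
      ops.push({ op: 'completeSubtask', itemId: child.id });
    }
  }
  return ops;
}
```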

Component

| Item | Detail |
| --- | --- |
| Script | otm-update-task.sh (shared tool) |
| Processor | otm-injector.js (extended) |
| Watcher | ai.openclaw.otm-watcher.plist (updated WatchPaths) |

System 4: Completion Detection (otm-completion-detector.js)

A lightweight scanner that auto-promotes tasks to agent_done when all subtasks are complete.

Every task has at least one subtask: otm-create-task.sh auto-adds "Confirm that task has been done" if no subtasks are provided. This guarantees the completion detector always has a signal.

Flow

Completion Detector (Node.js, launchd every 2 min)
  │
  ├── GET all tasks from Slack list
  ├── FILTER: status = "in_progress" + has child items (subtasks)
  ├── CHECK: all child items have status ∈ {done, agent_done} OR Col00 = true
  │
  └── YES → UPDATE parent task status → "agent_done"
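The promotion rule reduces to one predicate. A sketch, using the lowercase status values from the pipeline:

```javascript
// Sketch of the completion predicate from the flow above.
function shouldPromote(task, subtasks) {
  if (task.status !== 'in_progress') return false; // not active: skip
  if (subtasks.length === 0) return false;         // defensive: creation guarantees >= 1 subtask
  // All child items done/agent_done, or their Col00 checkbox set.
  return subtasks.every(s => ['done', 'agent_done'].includes(s.status) || s.checkbox === true);
}
```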

Rules

| Condition | Action |
| --- | --- |
| Task in_progress + all subtasks done/agent_done | Promote to agent_done |
| Task in_progress + some subtasks still open | Skip (work in progress) |
| Task not in_progress | Skip (not active) |

What happens after agent_done

  • Technical tasks: Claudia validates the work → done
  • Business/decision tasks: Claudia creates a review task for Rupert → Rupert approves → done

Component

| Item | Detail |
| --- | --- |
| File | otm-completion-detector.js |
| Plist | ai.openclaw.otm-completion-detector.plist |
| Schedule | Every 120 seconds |
| Log | ~/Library/Logs/OpenClaw/otm-completion-detector.log |

System 5: Component Heartbeats

Each OTM component writes a state file after every run. These files are watched by the collector and surfaced on the dashboard.

State File Format

{
  "component": {
    "id": "otm-injector",
    "status": "alive",
    "lastRun": "2026-03-14T10:15:00Z",
    "result": "success",
    "details": "Processed 2 tasks, 0 failures"
  }
}

State Files

| Component | File | Written by |
| --- | --- | --- |
| OTM Injector | otm-injector-state.json | otm-injector.js (end of each run) |
| Dispatcher | otm-dispatcher-state.json | otm-dispatcher.js (end of each run) |
| Completion Detector | otm-completion-detector-state.json | otm-completion-detector.js (end of each run) |
| Watcher | otm-watcher-state.json | watcher wrapper |

All files: ~/Library/Application Support/OpenClaw/otm/

Dashboard

The OTM Components card shows each component's status with staleness color coding:

  • 🟢 Green: last run < 5 min ago
  • 🟡 Yellow: last run 5–15 min ago
  • 🔴 Red: last run > 15 min ago (action required)
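The color thresholds map directly to a function (a sketch; the dashboard's real implementation may differ):

```javascript
// Sketch: map a component's last-run age to the dashboard staleness color.
function stalenessColor(lastRunMs, nowMs = Date.now()) {
  const ageMinutes = (nowMs - lastRunMs) / 60000;
  if (ageMinutes < 5) return 'green';   // last run < 5 min ago
  if (ageMinutes <= 15) return 'yellow'; // 5–15 min ago
  return 'red';                          // > 15 min ago: action required
}
```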

Observability Stack

State file → collector.js (FSEvents) → SQLite otm_state table
                                             ↓
                               reader.js polls + broadcasts
                                             ↓
                                    Dashboard WebSocket

System 6: DMZ Relay

The DMZ relay bridges the private OpenClaw state to the public Vercel dashboard. It runs on a Synology NAS in the DMZ.

Architecture

OpenClaw VM (private)     Synology NAS (DMZ)           Browser (Vercel dashboard)
  │                         ┌──────────────────┐
  │  collector.js           │  RECEIVER         │
  │  pushToRelay()          │  127.0.0.1:3456   │
  │ ── HTTP POST ─────────► │  + bearer token   │
  │  on state change         │  + atomic write   │
  │                          │                   │
  │                          │  fab-state.json   │ ← shared state file
  │                          │                   │
  │                          │  BROADCASTER      │
  │                          │  0.0.0.0:3457     │
  │                          │  (TLS via proxy)  │ ◄── wss://nas.domain/ws
  │                          │  fs.watch → push  │ ◄── GET /api/state
  │                          └──────────────────┘

Services

| Service | File | Port | Binding | Dependencies |
| --- | --- | --- | --- | --- |
| Receiver | receiver.js | 3456 | 127.0.0.1 (localhost only) | Zero — pure Node.js |
| Broadcaster | broadcaster.js | 3457 | 0.0.0.0 (behind TLS proxy) | ws package only |

Env Vars

| Service | Var | Required | Description |
| --- | --- | --- | --- |
| Receiver | FAB_RELAY_TOKEN | Yes | Shared secret bearer token |
| Receiver | STATE_FILE | No | Path to state file (default: ./fab-state.json) |
| Broadcaster | STATE_FILE | No | Same state file path |
| Collector (OpenClaw) | FAB_RELAY_URL | No | If set, enables relay push |
| Collector (OpenClaw) | FAB_RELAY_TOKEN | Yes (when relay enabled) | Must match receiver token |

Security Model

| Layer | Protection |
| --- | --- |
| Bearer token | Constant-time comparison (timing-attack safe) |
| Receiver binding | 127.0.0.1 — not reachable from internet |
| Firewall | Port 3456: allow ONLY from OpenClaw VM IP |
| Broadcaster | Read-only; no auth needed (non-sensitive data) |
| TLS | All external traffic via Synology reverse proxy + Let's Encrypt |
| Atomic write | Receiver writes .tmp → rename; no partial reads |

Collector Integration

// Only runs if FAB_RELAY_URL is set:
pushStateToRelay(); // called after every gateway/agent/OTM state change

The pushToRelay() function in collector.js:

  1. Builds full snapshot from SQLite (gateway + agents + OTM)
  2. POSTs to FAB_RELAY_URL with bearer token
  3. Handles errors gracefully — relay down = warning log, not crash
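The steps above can be sketched as follows, assuming Node 18+ global fetch (the real pushToRelay() builds its snapshot from SQLite first; return values here are illustrative):

```javascript
// Sketch: graceful relay push. Failures log a warning and never crash the collector.
async function pushStateToRelay(snapshot, { url = process.env.FAB_RELAY_URL,
                                            token = process.env.FAB_RELAY_TOKEN } = {}) {
  if (!url) return 'disabled'; // relay is opt-in: no FAB_RELAY_URL, no push
  try {
    const res = await fetch(url, {
      method: 'POST',
      headers: { authorization: `Bearer ${token}`, 'content-type': 'application/json' },
      body: JSON.stringify(snapshot),
    });
    return res.ok ? 'pushed' : 'rejected';
  } catch (err) {
    console.warn('relay unreachable:', err.message); // relay down = warning log, not crash
    return 'unreachable';
  }
}
```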

Files

| File | Location |
| --- | --- |
| receiver.js | work/PROJECTS/fab-state/synology-relay/ |
| broadcaster.js | work/PROJECTS/fab-state/synology-relay/ |
| package.json | work/PROJECTS/fab-state/synology-relay/ |
| SETUP-GUIDE.md | work/PROJECTS/fab-state/synology-relay/ |

See work/PROJECTS/strategy-openclaw-org/docs/FAB-STATE.md System 6 for full documentation.

Full Pipeline Flow

Claudia              Filesystem            Injector         Slack            Dispatcher           Agent Workspace
  │                     │                     │               │                  │                      │
  │ otm-create-task.sh  │                     │               │                  │                      │
  │────────────────────►│ .json               │               │                  │                      │
  │                     │─── watcher ────────►│               │                  │                      │
  │                     │                     │ items.create  │                  │                      │
  │                     │                     │──────────────►│ status=new       │                      │
  │                     │                     │  + subtasks   │                  │                      │
  │                     │ move to processed/  │               │                  │                      │
  │                     │◄────────────────────│               │                  │                      │
  │                     │                     │               │                  │                      │
  │                     │                     │               │ ◄── scan ───────│ (every 2 min)        │
  │                     │                     │               │ new + assignee   │                      │
  │                     │                     │               │ ── update ──────►│                      │
  │                     │                     │               │ status=assigned  │                      │
  │                     │                     │               │                  │ task-dispatch.json   │
  │                     │                     │               │                  │─────────────────────►│
  │                     │                     │               │                  │                      │
  │                     │                     │               │                  │ WS RPC: agent()      │
  │                     │                     │               │                  │──► Gateway ──► Agent │
  │                     │                     │               │                  │   (parallel start)   │
  │                     │                     │               │                  │                      │
  │                     │                     │               │                  │      (agent works)   │
  │                     │                     │               │                  │                      │
  │                     │                     │               │ ◄── scan (completion detector, 2 min)  │
  │                     │                     │               │ in_progress +    │                      │
  │                     │                     │               │ all subtasks done│                      │
  │                     │                     │               │ → agent_done     │                      │

All Pipeline Components Summary

| Component | File | Trigger |
| --- | --- | --- |
| Task creator | otm-create-task.sh | Called by agents |
| Injector | otm-injector.js | Called by watcher/sweeper |
| Watcher | ai.openclaw.otm-watcher.plist | WatchPaths on new-tasks/ + task-updates/ |
| Sweeper | ai.openclaw.otm-sweeper.plist | Every 10 min (catches misses) |
| Dispatcher | otm-dispatcher.js | Every 2 min (launchd) |
| Completion detector | otm-completion-detector.js | Every 2 min (launchd) |

8. Error Monitor (OTM-7)

The Error Monitor is a dedicated OTM component that detects state inconsistencies, traces errors for lessons learned, and triggers corrective actions. It runs as part of the Watchdog cycle (IE-08, every 60s) but is logically separate from the Watchdog's operational checks (OTM-6).

8.1 Error Condition Catalogue

Each error condition is coded ERR-xx. The Error Monitor detects; the OTM corrects.

| Code | Condition | Detection Rule | Severity | Auto-correction | Manual escalation |
| --- | --- | --- | --- | --- | --- |
| ERR-01 | Stale In Progress | Task status = "In Progress" AND no IE-02 subtask report in >10 minutes | Warning | None — alert only | OE-05: Admin notified. May indicate agent crash, stuck task, or slow work. |
| ERR-02 | Agent-Task Mismatch (busy agent, no task) | Agent status = busy AND current_task not found in Slack List (or task status ≠ In Progress) | Critical | Set agent idle (AT-04), call check_next_task_for_agent() | OE-05: Admin alert with details of orphaned agent state |
| ERR-03 | Agent-Task Mismatch (idle agent, active task) | Task status = "In Progress" AND assigned agent status = idle in registry | Critical | Set agent busy, re-send SURE notification (OE-01) | OE-05: Admin alert — state was inconsistent |
| ERR-04 | Orphaned Pending Task | Task status = "Pending" AND assigned agent status = idle | High | Promote task: TT-05 → re-evaluate via OTM-2 | OE-07: Audit log on task |
| ERR-05 | Counter Mismatch | subtasks_remaining ≠ actual count of unchecked subtasks in Slack List | High | Recalculate and fix counter via SW-1 | OE-05: Admin alert with old/new values |
| ERR-06 | SURE Timeout | OE-01 sent >3 minutes ago (1min + 2min retries) AND no IE-03 received | Critical | None — task stays In Progress | OE-05: Admin alert. Agent may be unreachable, or gateway may have restarted. |
| ERR-07 | Multiple Active Tasks per Agent | Agent has >1 task with status = "In Progress" assigned to them | Critical | Keep oldest task, move others to Pending | OE-05: Admin alert — invariant violation |
| ERR-08 | Stuck in Assigned | Task status = "Assigned" for >5 minutes (should transition immediately to In Progress or Pending) | High | Re-trigger OTM-2 for the task | OE-05: Admin alert if re-trigger fails |
| ERR-09 | Stuck in Rejected | Task status = "Rejected" for >24 hours (Orchestrator hasn't submitted rework or cancelled) | Warning | None — alert only | OE-05: Remind Orchestrator to act |
| ERR-10 | Ghost Agent | assigned_to field references a Slack user ID not in agent registry AND not auto-registerable | Critical | Task stays in current status | OE-05: Admin alert — unknown agent |
| ERR-11 | Duplicate Subtask Reports | Same subtask_id reported done >1 time (idempotency check caught it) | Info | Silently discarded (idempotent) | Logged in event log (§9) for pattern analysis |
| ERR-12 | Stale Rework (no ack subtask closed) | Task in "In Progress" after TT-08 rework, first subtask ("Acknowledge rework…") not closed within 10 minutes | Warning | None — alert only | OE-05: Agent may not have read rework instructions |

8.2 Error Monitor Processing

OTM-7 runs every 60s (piggybacks on IE-08 watchdog cron):
  |
  FOR EACH error check ERR-01 through ERR-12:
    |
    +-- Run detection query (SQLite + Slack API as needed)
    |
    +-- IF condition detected:
    |     1. Log error to event_log table (§9): {error_code, task_id, agent_id, details, timestamp}
    |     2. Log to task conversation (OE-07): "[timestamp] OTM-7 ERROR: ERR-xx detected — <description>"
    |     3. IF auto-correctable: execute correction, log correction action
    |     4. IF manual escalation: send OE-05 alert to admin
    |     5. Increment error counter in error_stats table (§10)
    |
    +-- IF condition NOT detected: skip
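The per-check loop above can be sketched with the detection queries and side effects injected (the check shapes are illustrative, not the OTM-7 internals):

```javascript
// Sketch of one OTM-7 cycle. Each check supplies detect() and, optionally,
// correct() (auto-correction) and escalates (OE-05 admin alert).
function runErrorCycle(checks, { logEvent, logConversation, escalate, bumpStats }) {
  for (const check of checks) {
    const hit = check.detect(); // detection query (SQLite + Slack API in the real monitor)
    if (!hit) continue;         // condition not detected: skip
    logEvent({ error_code: check.code, ...hit }); // 1. event_log row (§9)
    logConversation(check.code, hit);             // 2. OE-07 task conversation entry
    if (check.correct) check.correct(hit);        // 3. auto-correction, if any
    if (check.escalates) escalate(check.code, hit); // 4. OE-05 admin alert
    bumpStats(check.code);                        // 5. error_stats counter (§10)
  }
}
```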

8.3 Lessons Learned Pipeline

Errors are not just fixed — they feed a continuous improvement loop.

  1. Error Statistics Table (error_stats in error DB, see §10): Tracks frequency, first/last occurrence, auto-correction success rate per ERR-xx code.
  2. Daily Error Report: OTM-7 generates a summary during the first watchdog cycle after 00:00 each day:
    • Error counts by code (ERR-01 through ERR-12)
    • Most frequent errors
    • Auto-correction success/failure ratio
    • New error patterns (first-time occurrences)
  3. Threshold Alerts: Alert on EVERY error (threshold = 1). During startup and early operation, all anomalies are surfaced immediately via OE-05. The threshold can be raised once the system is stable and baseline error rates are understood.
  4. Root Cause Tagging: Admin can tag errors with root cause via OTM API (POST /api/otm/error/{id}/tag), enabling aggregate analysis.

8.4 Corrective Action Summary

| Action | Triggered by | Effect |
| --- | --- | --- |
| Re-send SURE notification | ERR-03 (idle agent, active task) | Resynchronise agent with its task |
| Promote pending task | ERR-04 (orphaned pending) | Unblock queued work |
| Recalculate counter | ERR-05 (counter mismatch) | Fix data integrity |
| Free orphaned agent | ERR-02 (busy agent, no task) | Unblock agent for new work |
| Re-trigger OTM-2 | ERR-08 (stuck in Assigned) | Retry the assignment flow |
| Move excess tasks to Pending | ERR-07 (multiple active tasks) | Restore single-task invariant |

9. Event Logging & Observability

All OTM events are logged to three complementary systems:

  1. Slack task conversation (OE-07) — human-readable, per-task, searchable in Slack (§7)
  2. Internal event log (event_log table in error DB) — structured, queryable, machine-readable
  3. Internal event log files — filesystem mirrors of database writes, for real-time tailing during tests and startup (§9.6)

9.1 Event Log Schema

CREATE TABLE event_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,          -- Unix timestamp (ms precision)
  event_type TEXT NOT NULL,            -- 'inbound' | 'outbound' | 'transition' | 'error' | 'correction' | 'system'
  event_code TEXT NOT NULL,            -- IE-xx, OE-xx, TT-xx, AT-xx, ERR-xx
  task_id TEXT,                        -- Slack List item ID (NULL for system events)
  agent_id TEXT,                       -- slack_user_id (NULL if not agent-related)
  handler TEXT,                        -- OTM-0 through OTM-7, SE-1, SW-1
  source TEXT NOT NULL,                -- 'se1', 'otm_api', 'watchdog', 'error_monitor', 'internal'
  detail TEXT,                         -- JSON blob with event-specific data
  duration_ms INTEGER,                 -- Processing time for this event
  success INTEGER DEFAULT 1,           -- 1 = success, 0 = failure
  error_message TEXT                   -- Error details if success = 0
);

CREATE INDEX idx_event_log_task ON event_log(task_id);
CREATE INDEX idx_event_log_agent ON event_log(agent_id);
CREATE INDEX idx_event_log_type ON event_log(event_type, timestamp);
CREATE INDEX idx_event_log_code ON event_log(event_code, timestamp);
CREATE INDEX idx_event_log_time ON event_log(timestamp);

9.2 What is Logged

Every IE-xx, OE-xx, TT-xx, AT-xx, and ERR-xx event produces one row in event_log. This includes:

| Event Category | Examples | Logged Fields |
| --- | --- | --- |
| Inbound events | IE-01 (Slack event), IE-02 (subtask report), IE-03 (SURE ack) | source, task_id, agent_id, raw payload in detail |
| Outbound events | OE-01 (SURE notification), OE-06 (Slack write) | target, task_id, delivery status, duration_ms |
| Task transitions | TT-01 through TT-15 | from_status, to_status, task_id, requesting_actor |
| Agent transitions | AT-01 through AT-04 | from_status, to_status, agent_id, triggering_task |
| Errors | ERR-01 through ERR-12 | error_code, detection_details, correction_applied |
| System events | Startup, reconciliation, watchdog cycle | cycle_number, checks_run, anomalies_found |

9.3 Observability Queries

The event log enables:

-- Task lifecycle: all events for a specific task
SELECT * FROM event_log WHERE task_id = ? ORDER BY timestamp;

-- Agent activity: all events involving a specific agent
SELECT * FROM event_log WHERE agent_id = ? ORDER BY timestamp;

-- Error frequency: last 24 hours
SELECT event_code, COUNT(*) as count
FROM event_log
WHERE event_type = 'error' AND timestamp > ?
GROUP BY event_code ORDER BY count DESC;

-- Average task completion time
SELECT AVG(e2.timestamp - e1.timestamp) / 1000 / 60 as avg_minutes
FROM event_log e1
JOIN event_log e2 ON e1.task_id = e2.task_id
WHERE e1.event_code = 'TT-02' AND e2.event_code = 'TT-04';

-- Slowest handlers (performance monitoring)
SELECT handler, AVG(duration_ms), MAX(duration_ms), COUNT(*)
FROM event_log
WHERE duration_ms IS NOT NULL
GROUP BY handler ORDER BY AVG(duration_ms) DESC;

-- SURE acknowledgement response times
SELECT AVG(ack.timestamp - notif.timestamp) / 1000 as avg_seconds
FROM event_log notif
JOIN event_log ack ON notif.task_id = ack.task_id
WHERE notif.event_code = 'OE-01' AND ack.event_code = 'IE-03';

9.4 Retention & Historicisation

| Data | Retention | Archive strategy |
| --- | --- | --- |
| event_log (active) | 30 days | Rows older than 30 days → event_log_archive |
| event_log_archive | 1 year | Monthly SQLite dump to filesystem (gzipped) |
| error_stats | Indefinite | Cumulative counters, never purged |
| Slack conversation audit | Indefinite | Lives in Slack (Slack's retention policy applies) |

Maintenance job (runs during OTM-6 watchdog, daily at 03:00):

-- Move events older than 30 days to the archive (timestamps are Unix ms, per §9.1)
INSERT INTO event_log_archive
  SELECT * FROM event_log
  WHERE timestamp < (strftime('%s','now') * 1000) - (30 * 86400000);
DELETE FROM event_log
  WHERE timestamp < (strftime('%s','now') * 1000) - (30 * 86400000);

-- Vacuum to reclaim space
VACUUM;

9.5 Why Not Sentry?

Internal structured logging (SQLite) is chosen over Sentry because:

  • No external dependency — OTM is self-contained
  • Queryable — SQL enables arbitrary analysis (Sentry requires its query language)
  • Correlated with task data — same DB, JOIN-able with agent registry
  • Low volume — estimated <10,000 events/day (see §10), no need for distributed tracing
  • Cost — zero (SQLite is free; Sentry has per-event pricing)
  • Privacy — all data stays on the OpenClaw server

If event volume exceeds 100,000/day or distributed tracing across multiple servers becomes needed, Sentry or OpenTelemetry would be reconsidered.

9.6 Filesystem Log File Mirroring

All database writes to event_log and error_stats are mirrored to two filesystem log files in real-time:

| File | Content | Format | Purpose |
| --- | --- | --- | --- |
| {OPENCLAW_DATA_DIR}/otm/logs/otm-events.log | All event_log inserts | [ISO-timestamp] [event_code] [handler] [task_id] [agent_id] detail_json | Monitor all OTM activity via tail -f |
| {OPENCLAW_DATA_DIR}/otm/logs/otm-errors.log | All ERR-xx detections + corrections | [ISO-timestamp] [ERR-xx] [severity] [task_id] [agent_id] description [correction: action/none] | Monitor errors during tests and startup |

Implementation: Every INSERT INTO event_log and every error detection in OTM-7 appends one line to the corresponding log file. This is a synchronous append (negligible overhead at <350 events/day).

Log rotation: Daily at 03:00, rename to otm-events.log.YYYY-MM-DD and otm-errors.log.YYYY-MM-DD. Keep 30 days of rotated files. Older files deleted automatically.

Usage during development/testing:

# Watch all OTM events in real time
tail -f {OPENCLAW_DATA_DIR}/otm/logs/otm-events.log

# Watch errors only
tail -f {OPENCLAW_DATA_DIR}/otm/logs/otm-errors.log

# Filter for a specific task
tail -f otm-events.log | grep "T-00042"

# Filter for a specific error code
tail -f otm-errors.log | grep "ERR-03"

📌 The log files are append-only mirrors — the database remains the source of truth for queries and analysis. The files exist purely for human monitoring convenience.


10. Database Design (SQLite)

10.1 Overview

The OTM uses two separate SQLite databases:

  1. Main OTM DB (otm.db) — Agent registry, SURE pending, task history. Core operational state.
  2. Error & Event DB (otm-errors.db) — Event log, error statistics. Monitoring and observability. Separated so that error monitoring is independent from the main OTM application and can be analysed, reset, or rebuilt without affecting operations.

Database files:

  • {OPENCLAW_DATA_DIR}/otm/otm.db — Main OTM DB
  • {OPENCLAW_DATA_DIR}/otm/otm-errors.db — Error & Event DB
  • (Sylvain to confirm exact OPENCLAW_DATA_DIR path)

Library: better-sqlite3 (synchronous, fast, WAL mode for both)

10.2 Tables

Main OTM DB (otm.db):

| Table | Purpose | Writer(s) | Reader(s) | Rows (steady state) | Growth rate |
| --- | --- | --- | --- | --- | --- |
| agents | Agent registry (§4.5) | OTM-1 | OTM-0, OTM-2, OTM-3, OTM-5, OTM-6 | 5–15 | Near-zero (new agents rare) |
| task_history | Snapshot of archived tasks (§Flow 5) | OTM-6 (TT-12) | Admin queries, reporting | Growing | ~20–50/month |
| sure_pending | Outstanding SURE notifications awaiting ack (§6) | OTM-2, OTM ack handler | OTM-7 (timeout check) | 0–5 | Transient (cleared on ack/timeout) |

Error & Event DB (otm-errors.db):

| Table | Purpose | Writer(s) | Reader(s) | Rows (steady state) | Growth rate |
| --- | --- | --- | --- | --- | --- |
| event_log | Structured event log (§9.1) | All OTM handlers | OTM-7, admin queries | ~10,000 | ~300/day (purged monthly) |
| event_log_archive | Archived events >30 days (§9.4) | Maintenance job | Admin queries only | ~100,000 | ~9,000/month |
| error_stats | Error frequency counters (§8.3) | OTM-7 | OTM-7, daily report | 12 rows (one per ERR-xx) | Fixed |

10.3 Full Schema

Main OTM DB (otm.db):

-- Agent Registry (see §4.5 for column details)
CREATE TABLE agents (
  slack_user_id TEXT PRIMARY KEY,
  otm_display_name TEXT NOT NULL,
  openclaw_agent_id TEXT,
  agent_type TEXT NOT NULL DEFAULT 'ai',
  status TEXT DEFAULT 'idle',
  current_task TEXT,
  task_started_at INTEGER,
  last_seen INTEGER
);

-- Task History (archived tasks snapshot)
CREATE TABLE task_history (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,               -- Original Slack List item ID
  title TEXT NOT NULL,
  assigned_to TEXT,                     -- slack_user_id
  final_status TEXT NOT NULL,          -- "Archived" (from Done or Cancelled)
  previous_status TEXT,                -- Status before archival
  priority INTEGER,
  context TEXT,
  subtask_count INTEGER,               -- Total subtasks at archival time
  created_at INTEGER,                  -- Task creation timestamp
  assigned_at INTEGER,
  completed_at INTEGER,
  validated_at INTEGER,
  cancelled_at INTEGER,
  archived_at INTEGER NOT NULL,        -- When TT-12 executed
  result_summary TEXT,
  total_duration_ms INTEGER,           -- assigned_at → completed_at
  review_duration_ms INTEGER,          -- completed_at → validated_at
  rework_count INTEGER DEFAULT 0,      -- Number of TT-08 rework cycles
  error_count INTEGER DEFAULT 0        -- Number of ERR-xx events during lifecycle
);

CREATE INDEX idx_task_history_agent ON task_history(assigned_to);
CREATE INDEX idx_task_history_status ON task_history(final_status);
CREATE INDEX idx_task_history_archived ON task_history(archived_at);

-- SURE Pending Notifications (see §6)
CREATE TABLE sure_pending (
  task_id TEXT PRIMARY KEY,
  agent_id TEXT NOT NULL,
  notification_type TEXT NOT NULL,     -- 'task_assigned' | 'rework_assigned'
  sent_at INTEGER NOT NULL,            -- First OE-01 sent
  retry_count INTEGER DEFAULT 0,       -- 0, 1, 2, 3 (max)
  last_retry_at INTEGER,
  acknowledged_at INTEGER              -- Set when IE-03 received. NULL = still pending.
);

Error & Event DB (otm-errors.db):

-- Event Log (see §9.1)
CREATE TABLE event_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_code TEXT NOT NULL,
  task_id TEXT,
  agent_id TEXT,
  handler TEXT,
  source TEXT NOT NULL,
  detail TEXT,
  duration_ms INTEGER,
  success INTEGER DEFAULT 1,
  error_message TEXT
);

CREATE INDEX idx_event_log_task ON event_log(task_id);
CREATE INDEX idx_event_log_agent ON event_log(agent_id);
CREATE INDEX idx_event_log_type ON event_log(event_type, timestamp);
CREATE INDEX idx_event_log_code ON event_log(event_code, timestamp);
CREATE INDEX idx_event_log_time ON event_log(timestamp);

-- Event Log Archive (identical schema)
CREATE TABLE event_log_archive (
  id INTEGER PRIMARY KEY,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_code TEXT NOT NULL,
  task_id TEXT,
  agent_id TEXT,
  handler TEXT,
  source TEXT NOT NULL,
  detail TEXT,
  duration_ms INTEGER,
  success INTEGER DEFAULT 1,
  error_message TEXT
);

-- Error Statistics (see §8.3)
CREATE TABLE error_stats (
  error_code TEXT PRIMARY KEY,         -- ERR-01 through ERR-12
  total_count INTEGER DEFAULT 0,
  last_24h_count INTEGER DEFAULT 0,    -- Reset daily by maintenance job
  first_seen INTEGER,                  -- Unix timestamp
  last_seen INTEGER,                   -- Unix timestamp
  auto_corrected_count INTEGER DEFAULT 0,
  escalated_count INTEGER DEFAULT 0
);

10.4 Database Maintenance

Operation Frequency Triggered by Description
Event log rotation Daily (03:00) OTM-6 watchdog + time check Move events >30 days to event_log_archive
Archive export Monthly (1st, 03:30) OTM-6 watchdog + date check Dump event_log_archive to gzipped SQL file on disk, then TRUNCATE
Error stats reset Daily (00:00) OTM-6 watchdog + time check Reset last_24h_count to 0 for all ERR-xx rows
SURE cleanup Every 60s OTM-7 error monitor Remove sure_pending rows where acknowledged_at IS NOT NULL and >1 hour old
VACUUM Weekly (Sunday 03:00) OTM-6 watchdog + day check Reclaim disk space after deletions
WAL checkpoint Automatic SQLite WAL mode Handled by better-sqlite3 automatically
Backup Daily (04:00) Sylvain's backup cron Copy otm.db to backup location (standard server backup)
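
The daily event log rotation can be expressed as a single transaction against the two tables above; the `:cutoff` parameter (now minus 30 days) is supplied by the OTM-6 job. A sketch, not the shipped maintenance code:

```sql
-- Daily rotation (sketch): move rows older than the 30-day cutoff
-- into the archive, then remove them from the hot table.
BEGIN;
INSERT INTO event_log_archive
  SELECT * FROM event_log WHERE timestamp < :cutoff;
DELETE FROM event_log WHERE timestamp < :cutoff;
COMMIT;
```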

10.5 Volume Estimates

Assumptions: 5 active agents, ~10 tasks created/day, ~3 subtasks/task average, watchdog runs 1,440×/day.

Main OTM DB (otm.db):

Table Writes/day Reads/day Steady-state rows Disk (est.)
agents ~10 (status flips) ~500 (every handler checks registry) 5–15 <1 KB
task_history ~1–2 (archival events) ~5 (reporting queries) ~500/year ~500 KB
sure_pending ~20 (insert + update on ack) ~1,440 (timeout checks) 0–5 (transient) <1 KB
Subtotal ~30/day ~1,950/day ~520 <1 MB

Error & Event DB (otm-errors.db):

Table Writes/day Reads/day Steady-state rows Disk (est.)
event_log ~300 (all events) ~50 (error monitor + queries) ~9,000 (30-day window) ~5 MB
event_log_archive ~9,000/month (from rotation) ~5/month (admin queries) ~100,000 (1-year window) ~50 MB
error_stats ~20 (counter increments) ~1,440 (every watchdog cycle) 12 <1 KB
Subtotal ~320/day ~1,500/day ~109,000 ~55 MB

📌 At this scale, SQLite is well within its performance envelope for both databases. The separation means the error DB can be independently analysed, reset, or rebuilt without affecting OTM operations. A weekly VACUUM on each keeps files compact.

10.6 Historicisation Strategy

Main OTM DB (otm.db)
  ├── agents              — live state, small, never archived
  ├── sure_pending        — transient, cleaned hourly
  └── task_history        — growing archive of completed tasks

Error & Event DB (otm-errors.db)
  ├── event_log           — rolling 30-day window
  ├── event_log_archive   — rolling 1-year window
  └── error_stats         — cumulative counters, never purged

Filesystem log files (otm/logs/)
  ├── otm-events.log      — real-time event mirror (rotated daily, 30-day keep)
  └── otm-errors.log      — real-time error mirror (rotated daily, 30-day keep)

Monthly export (filesystem)
  └── {OPENCLAW_DATA_DIR}/otm/archive/
      ├── events-2026-01.sql.gz    — monthly event log dump from otm-errors.db
      ├── events-2026-02.sql.gz
      └── ...

Annual report (generated)
  └── Aggregate stats from task_history (otm.db) + error_stats (otm-errors.db)
      → Feeds into CMMI metrics collection

Non-Functional Requirements

Idempotency

  • All OTM handlers MUST be idempotent
  • Subtask completion: check todo_completed (Col00) field before processing (no separate dedup table)
  • State transitions: verify previous_status matches expected before applying
  • File pipeline dedup: injector checks existing Slack items by Task ID before creating
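
The injector's dedup gate reduces to filtering incoming pipeline tasks against the Task IDs already present in the Slack List. A sketch of that filter (the existing-ID lookup itself is assumed to happen via the Lists API and is not shown):

```typescript
// Illustrative dedup gate: drop incoming tasks whose Task ID already
// exists in the Slack List, so re-running the injector is a no-op.
type PipelineTask = { taskId: string; title: string };

function dedupeTasks(incoming: PipelineTask[], existingIds: Set<string>): PipelineTask[] {
  return incoming.filter((t) => !existingIds.has(t.taskId));
}
```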

State Integrity

  • previous_status MUST be set before every status change
  • All status writes go through OTM → SW-1. No direct Slack writes by any actor.
  • Watchdog requests transitions via OTM handler calls, not direct writes
  • todo_completed (Col00) MUST be set alongside status when marking items done
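
The `previous_status` rule can be enforced as a compare-and-set guard around every write; a sketch with illustrative names:

```typescript
// Compare-and-set transition guard: the write is applied only if the
// task's current status still matches what the requester observed,
// and previous_status is set before the change, per State Integrity.
type Task = { status: string; previous_status: string | null };

function applyTransition(task: Task, expectedCurrent: string, next: string): boolean {
  if (task.status !== expectedCurrent) return false; // stale request, reject
  task.previous_status = task.status;                // set before every change
  task.status = next;
  return true;
}
```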

Audit Trail

  • Every OTM event logged in Slack task conversation feed with timestamp (§7)
  • Slack conversation IS the audit log — no separate log table
  • All log entries include event code (IE-xx, OE-xx, TT-xx, AT-xx) for traceability

Persistence

  • Agent registry in SQLite
  • SQLite DB location: OpenClaw server data directory
  • Startup reconciliation from openclaw.json + Slack List on OTM restart (§4.6)

Latency

  • Event handling MUST complete within 5 seconds
  • Slack API writes SHOULD complete within 5 seconds
  • Agent notifications SHOULD be sent within 10 seconds
  • SURE acknowledgement timeout: 1 min (first), 2 min (retry), then error (3 min total)
  • Dispatcher runs every 2 min — maximum 2 min delay from task creation to agent trigger
  • Completion detector runs every 2 min — maximum 2 min delay from last subtask to agent_done

Error Handling

  • Slack API calls: retry up to 3 times with exponential backoff
  • Unhandled errors: alert admin via OE-05
  • Tasks MUST NOT be silently lost
  • All errors logged in task conversation feed (OE-07)
  • Dispatcher crash: Slack assigned status is the sole dedup gate (System 3, §1.2)
  • Gateway idempotencyKey does NOT provide real idempotency — dispatcher owns dedup
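
The retry policy for Slack API calls might be wrapped as a small helper. The 500 ms base delay is an assumption; the spec fixes only the retry count:

```typescript
// Retry an async Slack call up to `retries` times with exponential
// backoff (baseMs, 2*baseMs, 4*baseMs, ...). On exhaustion the error
// propagates to the caller, which can escalate (e.g. an OE-05 alert).
async function withRetry<T>(fn: () => Promise<T>, retries = 3, baseMs = 500): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= retries) throw err;
      await new Promise((r) => setTimeout(r, baseMs * 2 ** attempt));
    }
  }
}
```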

Security

  • OTM-4 (validate/reject) restricted to registered reviewer agents
  • All Slack API calls authenticated via bot tokens
  • DMZ relay uses bearer token + constant-time comparison (§System 6)
  • Receiver bound to 127.0.0.1 only
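
The bearer token check on the DMZ relay receiver can use Node's `timingSafeEqual`. Hashing both sides first equalises buffer lengths, so a wrong-length token neither throws nor leaks length information. A sketch, not the deployed receiver.js:

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Constant-time bearer token comparison for the DMZ relay receiver.
// Both values are hashed to fixed-length digests before comparing.
function tokenMatches(presented: string, expected: string): boolean {
  const a = createHash("sha256").update(presented).digest();
  const b = createHash("sha256").update(expected).digest();
  return timingSafeEqual(a, b);
}
```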

Technology Stack

Component Technology Owner (who runs it) Activity frequency Data volume
OTM backend TypeScript/Node.js, OpenClaw plugin pipeline Devdas (builds), Sylvain (deploys) Continuous — handles all events ~350 events/day processed
File pipeline Bash scripts + Node.js (injector, dispatcher, detector) Devdas (builds), Sylvain (deploys) launchd: watcher (FSEvents), sweeper (10 min), dispatcher (2 min), detector (2 min) ~10 tasks/day through pipeline
SQLite DB better-sqlite3, WAL mode OTM (sole writer), Sylvain (backups) ~350 writes/day, ~3,500 reads/day ~55 MB steady state (see §10.5)
SE-1 (event listener) Slack Events API, socket mode, Bolt SDK Salvatore's Slack app (lists:read) ~50 events/day (Slack → OTM) <1 KB/event payload
SW-1 (writer) Slack Web API (lists:write) Salvatore's Slack app (called by OTM) ~200 API calls/day (field updates + audit posts) <1 KB/call
OTM API OpenClaw hooks / HTTP endpoints OTM (receives), Orchestrator + Agents (call) ~100 API calls/day <1 KB/call
Agent notifications OpenClaw Gateway WS RPC (AI) / Slack task conversation (Human) OTM (sends), Agents (receive) ~20 notifications/day <1 KB/notification
Watchdog + Error Monitor OpenClaw cron (60s interval) OTM-6 + OTM-7 (automatic) 1,440 cycles/day ~20 error checks/cycle
Event logging SQLite event_log table (see §9) OTM (writes), Admin (queries) ~300 events/day, 30-day active window ~5 MB active, ~50 MB archive
DMZ relay Node.js receiver + broadcaster on Synology NAS Sylvain (deploys) On every state change <1 KB/push
Testing Vitest, mock Slack API Devdas (writes + runs) CI on every PR

Component Ownership Map

┌─────────────────────────────────────────────────────┐
│ Slack (Salvatore's Slack App)                       │
│   SE-1: lists:read (event listener)                 │
│   SW-1: lists:write (field updates + audit posts)   │
└──────────────┬──────────────────────────┬───────────┘
               │ IE-01                    ▲ OE-06, OE-07
               ▼                          │
┌─────────────────────────────────────────────────────┐
│ OTM (OpenClaw Plugin Pipeline)                      │
│   OTM-0: Event Router (internal)                    │
│   OTM-1: Agent Registry (internal)                  │
│   OTM-2: Handle Task Assigned                       │
│   OTM-3: Handle Subtask Done                        │
│   OTM-4: Task Validate/Reject                       │
│   OTM-5: Handle Task Cancelled                      │
│   OTM-6: Watchdog (cron, 60s)                       │
│   OTM-7: Error Monitor (cron, 60s)                  │
│                                                     │
│   SQLite DB: agents, event_log, error_stats,        │
│              task_history, sure_pending              │
└──────────────┬──────────────────────────┬───────────┘
               │ OE-01, OE-02, OE-04     ▲ IE-02, IE-03, IE-04–IE-07
               ▼                          │
┌─────────────────────────────────────────────────────┐
│ OpenClaw Agents                                     │
│   Orchestrator (Claudia): IE-04, IE-05, IE-06, IE-07│
│   Agents (Devdas, etc.): IE-02, IE-03              │
│   Human (Rupert): via Slack UI → SE-1 → IE-01      │
└─────────────────────────────────────────────────────┘

File Pipeline (Part 3):
┌─────────────────────────────────────────────────────┐
│ otm-create-task.sh → new-tasks/ → otm-injector.js  │
│   → Slack List (status=new)                         │
│   → otm-dispatcher.js → task-dispatch.json          │
│   → Gateway WS RPC → Agent sessions (parallel)      │
│   → otm-update-task.sh → task-updates/             │
│   → otm-completion-detector.js → agent_done        │
└─────────────────────────────────────────────────────┘

11. Cost Analysis

11.1 OTM Infrastructure Cost

Component Cost Notes
Slack Pro 1 user license Already paid. Gives API access (SE-1 + SW-1). No per-API-call cost.
SQLite $0 Open source, embedded. No server, no license.
Node.js / TypeScript $0 Open source runtime.
Bolt SDK $0 Open source Slack SDK.
better-sqlite3 $0 Open source library.
OpenClaw $0 (incremental) OTM runs as a plugin inside the existing gateway. No additional instance.
Filesystem logging $0 Append to local files.
DMZ relay $0 (incremental) Runs on existing Synology NAS.
Total OTM cost $0 incremental Only pre-existing Slack Pro license required.

11.2 AI Usage by the OTM

The OTM uses zero AI. It is a deterministic state machine implemented in TypeScript. No LLM calls, no embeddings, no inference. Every decision is rule-based:

  • Routing: field comparison (OTM-0)
  • Agent availability: SQLite lookup (OTM-1)
  • State transitions: precondition checks + status writes (OTM-2 through OTM-5)
  • Error detection: SQL queries against known patterns (OTM-7)
  • Watchdog: timer + threshold checks (OTM-6)
  • File pipeline: filesystem watches + Slack API calls (Systems 1–4)

Token consumption by OTM: 0 tokens.

11.3 AI Usage by Actors (Outside OTM)

The actors that interact with the OTM do consume AI tokens, but this is outside the OTM's scope:

Actor AI usage OTM interaction cost
Orchestrator (Claudia) LLM calls for task planning, review, rework design OTM API calls = HTTP requests, ~0 tokens
Agents (Devdas, etc.) LLM calls for task execution OTM API calls (IE-02, IE-03) = HTTP requests, ~0 tokens
Human (Rupert) None (uses Slack UI) Slack events = Slack infrastructure, ~0 tokens

📌 The OTM API calls (IE-02 through IE-07) are simple HTTP POST requests with JSON payloads. They consume zero AI tokens. The only AI costs are generated by the agents and orchestrator doing their actual work — which they would do regardless of whether the OTM exists.

11.4 Cost Summary

OTM operation cost:     $0/month (zero AI, zero external services)
Slack API cost:         $0/month (included in existing Pro plan)
Infrastructure cost:    $0/month (runs on existing OpenClaw server + Synology NAS)
──────────────────────────────────────────────────────
Total incremental cost: $0/month

12. Project Deliverables

# Deliverable Owner Description
D-01 OTM-SPEC (this document) Claudia Specification and architecture
D-02 OTM-TESTS Claudia Test scenarios document
D-03 OTM implementation Devdas TypeScript plugin pipeline (OTM-0 through OTM-7, SE-1, SW-1)
D-04 SQLite schemas + migrations Devdas otm.db and otm-errors.db setup
D-05 Unit + integration tests Devdas Vitest test suite matching OTM-TESTS scenarios
D-06 Infrastructure setup Sylvain DB paths, cron config, backup setup, log rotation
D-07 task-orchestration skill Claudia OpenClaw skill for Claudia's Orchestrator role: task creation, project → step → task decomposition, assignment logic, validation/rejection, rework subtask design. This skill encodes the Orchestrator's side of the OTM protocol.
D-08 Slack app config Salvatore lists:read + lists:write scopes, socket mode setup
D-09 End-to-end validation Claudia + Devdas Full test suite execution on real Slack workspace
D-10 File pipeline scripts Devdas otm-create-task.sh, otm-update-task.sh, otm-injector.js, otm-dispatcher.js, otm-completion-detector.js
D-11 DMZ relay deployment Sylvain receiver.js + broadcaster.js on Synology NAS, TLS proxy setup
D-12 launchd plists Sylvain Watcher, sweeper, dispatcher, completion detector plists

📌 D-07 (task-orchestration skill) will include Rupert's higher-level instructions on how to break down projects into steps and steps into tasks. It is part of the scope of the full-blown validation tests (D-09).


Open Questions

# Question Status
1 Exact list_item_updated event payload schema Needs Salvatore to capture sample events
2 Can socket mode receive list events on Pro? Needs verification (may need Events API HTTP mode)
3 Plugin pipeline registration mechanism in OpenClaw Needs Devdas to investigate
4 SQLite file location on OpenClaw server Sylvain to decide
5 How agents tick subtasks in practice Resolved in v1.3: Agents call OTM API (IE-02), OTM updates Slack via SW-1. No direct Slack UI interaction.
6 Slack conversation API for List items — does it exist? Needs Salvatore to verify (may need workaround)
7 SURE ack timeout values and gateway restart handling Resolved in v1.5: Timeouts revised to 1min + 2min + error (3min total). Gateway restart handling defined in §4.7: OTM reconciles from Slack List + openclaw.json on startup, detects SURE timeouts, auto-corrects agent-task mismatches. Gateway restart logged as system event (IE-SYS-01). Orchestrator does not need to re-register agents.
8 OpenClaw agent → Slack user ID mapping in openclaw.json Needs Sylvain to confirm config structure
9 Human user registration protocol Deferred. v1 hard-codes Rupert + Claudia (§4.6). Future: how are new human users registered? Auto-detect from Slack assigned_to? Manual admin command? Re-registration after OTM restart? What about clients?
10 Slack archive API — can items be archived programmatically? Resolved in v1.6: slackLists.items.archive does not exist. Archiving is manual-only via Slack UI. OTM sets status = archived but cannot trigger visual Slack archival.
11 Gateway idempotencyKey — does it prevent duplicate sessions? Resolved in v1.8: No. The key is used as a runId label only. Duplicate calls with the same key = duplicate sessions. Dispatcher must use the Slack new → assigned status flip as the sole dedup mechanism.

Deprecated Items

Item ID/Flag Notes
Project (old select column) Col0AL4UJ8BJ8 Replaced by Project 2 text column (Col0ALZBS9C8Z) — 2026-03-14
Types: implementation, research, etc. Removed — only action / decision / review are active
--description parameter Removed from otm-create-task.sh — use --subtask instead
--creator parameter Removed — merged with --agent
--assignedTo parameter Removed — merged with --agent
TT-09 (Rejected → New) Removed in v1.3 — use TT-10 (cancel) + new task instead

End of Specification — v1.8 — OpenClaw Task Manager

Last updated: 2026-03-14

OTM-SPEC v1.8 — OpenClaw Task Manager Specification

Based on: Rupert's OTM Spec v1.0 (2026-03-12) Updated by: Claudia (2026-03-12–14) Merged by: Claudia (2026-03-14) — consolidates OTM-SPEC v1.5 + OTM-FIELD-MAPPING v1.8


Changelog

Note: v1.0–v1.5 changes were tracked in OTM-SPEC. v1.0–v1.8 field mapping changes were tracked in a separate OTM-FIELD-MAPPING document. Both changelogs are merged here.

Version Date Source Changes
v1.0 2026-03-12 OTM-SPEC Initial specification: state machine, SURE protocol, SE-1/SW-1, OTM-0 through OTM-6, actor registry, audit trail
v1.0 2026-03-14 OTM-FIELD-MAPPING Initial field mapping: script params, JSON format, Slack field mapping
v1.1 2026-03-12 OTM-SPEC Added rework flow (TT-08), Rejected state, Orchestrator manages subtasks
v1.1 2026-03-14 OTM-FIELD-MAPPING Added subtask support, task ID system (T-NNNNN)
v1.2 2026-03-12 OTM-SPEC Added Watchdog (OTM-6), archival (TT-12), Pending state (TT-03/TT-05)
v1.2 2026-03-14 OTM-FIELD-MAPPING Added pipeline flow diagram, directories
v1.3 2026-03-12 OTM-SPEC Removed TT-09, added Priority Scale, human actor type. Resolved Q5: agents call OTM API for subtask completion (not Slack directly).
v1.3 2026-03-14 OTM-FIELD-MAPPING Added completion detection (System 4)
v1.4 2026-03-12 OTM-SPEC Added Error Monitor (OTM-7), error catalogue (ERR-01 through ERR-12), dual-DB design
v1.4 2026-03-14 OTM-FIELD-MAPPING Added component heartbeats (System 5)
v1.5 2026-03-12 OTM-SPEC Human notifications via Slack task conversation (not DM). Error reports daily, threshold=1. Error monitoring in separate otm-errors.db. Only Orchestrator creates/deletes tasks. subtasks_remaining = count of unfinished subtasks. Added §12 Cost Analysis. SURE timeouts: 1+2min. Gateway restart detection (§4.7). Log file mirroring. task-orchestration skill added as deliverable.
v1.5 2026-03-14 OTM-FIELD-MAPPING Default confirmation subtask; deprecated fields cleanup
v1.6 2026-03-14 OTM-FIELD-MAPPING Added DMZ relay architecture (System 6), todo_completed (Col00) documentation, project label fixes
v1.7 2026-03-14 OTM-FIELD-MAPPING Added task dispatcher (System 3), task updates (System 3b), full architecture diagram, completion metrics, status lifecycle
v1.8 2026-03-14 OTM-FIELD-MAPPING Dispatcher triggers agents via Gateway WebSocket RPC (parallel, non-blocking); crash-safe operation order (file → Slack → RPC); idempotencyKey is NOT idempotent (design flaw documented); todo_completed checkbox (Col00) documentation; injector idempotency (dedup check); Slack archive API limitation noted
v1.8 2026-03-14 MERGED Consolidated OTM-SPEC v1.5 + OTM-FIELD-MAPPING v1.8 into single unified specification. Task ID format standardized to T-NNNNN (v1.8 implementation). Added Part 3 (File-Based Pipeline). All Slack column IDs included.

1. Purpose & Scope

The OpenClaw Task Manager (OTM) orchestrates the execution of tasks by AI and human agents. It uses Slack Lists as the task board, the Slack Events API as the event bus for human-originated changes, and the OTM API as the interface for AI actors.

The system is split into three cooperating layers:

  1. Slack Event Layer (SE-1) — Slack Events API (list_item_updated) detected by our Slack app in socket mode. Zero intelligence. Forwards raw events to a single OTM entry point. Requires only lists:read scope.
  2. Slack Write Layer (SW-1) — Handles all writes to Slack: task field updates and conversation feed audit entries. Requires lists:write scope. Called only by the OTM.
  3. OpenClaw Task Manager (OTM) — Authoritative backend implemented as an OpenClaw plugin pipeline. Owns all state, routing logic, agent registry, and business rules. Persists to SQLite. Sole component with the ability to change task status, agent status, and counter fields.

📌 Design principles:

  • ALL behaviour lives in the OTM. No business logic in SE-1 or SW-1.
  • The OTM is the sole writer of task status and related fields. No actor writes status directly.
  • Only the Orchestrator creates tasks and subtasks. The OTM never creates or deletes tasks — it manages their lifecycle after creation. The Orchestrator also manages subtask lists (creates, deletes, keeps completed) during rework flows; the OTM processes the resulting state changes.
  • ALL events are logged in the Slack task conversation feed with timestamps (§7).
  • Agent notifications use the SURE protocol: request + mandatory acknowledgement (§6).

1.1 Actors

Actor Current holder Type Role
Orchestrator Claudia AI Creates tasks, sets assignments, validates completed work — always via OTM API
Agent Devdas, Salvatore, etc. AI Executes tasks, reports progress directly to OTM API
Human Rupert, clients Human Creates/edits tasks in Slack UI; changes detected by SE-1 and forwarded to OTM
OTM (system) System Authoritative state machine, sole writer of all status and counter fields
Watchdog OTM-6 cron System Recovery cron — detects anomalies, requests OTM to execute corrective transitions

📌 Humans are identified by their Slack user ID. AI agents are identified by both their Slack user ID and their OpenClaw agent ID. Both types are managed in the same agent registry (§4.5).


1.2 System Architecture Overview

The OTM is composed of six cooperating systems (System 3 includes a 3b update path), spanning task creation through dashboard visibility:

┌─────────────────────────────────────────────────────────────────────────────────┐
│                          OTM — OpenClaw Task Manager                           │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 1: Task Creation                                                  │   │
│  │                                                                          │   │
│  │  Claudia (orchestrator)                                                  │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-create-task.sh ──► JSON file ──► ~/…/otm/new-tasks/                │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  next-task-id.json (T-NNNNN counter, flock)                             │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼                                                                        │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 2: Task Injection                                                 │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-watcher (WatchPaths)                                   │   │
│  │  ai.openclaw.otm-sweeper (every 10 min)                                 │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-injector.js ──► Slack Lists API ──► Rapido Task Campaign           │   │
│  │       │                (slackLists.items.create + subtasks)              │   │
│  │       ▼                                                                  │   │
│  │  processed/ or failed/                                                   │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (task exists in Slack with status=new)                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 3: Task Dispatcher                                                │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-dispatcher (every 2 min)                               │   │
│  │       │                                                                  │   │
│  │       ├──► Scan: status=new + assignee set                              │   │
│  │       ├──► Write task-dispatch.json to agent workspace                  │   │
│  │       ├──► Trigger agent via Gateway WS RPC (parallel)                  │   │
│  │       └──► Update Slack: new → assigned + set assigned_at               │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (agent session starts, picks up task, works, reports progress)        │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 3b: Task Updates (agent → Slack feedback)                         │   │
│  │                                                                          │   │
│  │  otm-update-task.sh ──► JSON ──► ~/…/otm/task-updates/                  │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  otm-injector.js ──► Slack Lists API (update status, subtask done)      │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (all subtasks done → auto-promote)                                    │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 4: Completion Detection                                           │   │
│  │                                                                          │   │
│  │  ai.openclaw.otm-completion-detector (every 2 min)                      │   │
│  │       │                                                                  │   │
│  │       ├──► Scan: status=in_progress + all subtasks done                 │   │
│  │       ├──► Update: completion % + subtasks_remaining                    │   │
│  │       └──► Promote: in_progress → agent_done                           │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│       │                                                                        │
│       ▼  (Claudia validates → done)                                            │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 5: Component Heartbeats                                           │   │
│  │                                                                          │   │
│  │  Each component writes *-state.json after every run                     │   │
│  │       │                                                                  │   │
│  │       ▼                                                                  │   │
│  │  Collector (FSEvents) → SQLite → Reader (WebSocket) → Dashboard         │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ SYSTEM 6: DMZ Relay                                                      │   │
│  │                                                                          │   │
│  │  Collector ──HTTP POST──► Synology Receiver (127.0.0.1:3456)            │   │
│  │                                 │                                        │   │
│  │                                 ▼                                        │   │
│  │                           fab-state.json                                 │   │
│  │                                 │                                        │   │
│  │                                 ▼                                        │   │
│  │                      Broadcaster (0.0.0.0:3457)                         │   │
│  │                           │            │                                 │   │
│  │                    WSS /ws        GET /api/state                         │   │
│  │                           │            │                                 │   │
│  │                           ▼            ▼                                 │   │
│  │                    Vercel Dashboard (browser)                            │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
│                                                                                │
│  ┌──────────────────────────────────────────────────────────────────────────┐   │
│  │ Task Lifecycle                                                           │   │
│  │                                                                          │   │
│  │  new ──► assigned ──► in_progress ──► agent_done ──► done               │   │
│  │   │         │              │              │                              │   │
│  │   │  (dispatcher)  (agent starts)  (completion     (Claudia            │   │
│  │   │                                  detector)       validates)          │   │
│  │   │                                                                      │   │
│  │   └──► blocked (can happen at any stage)                                │   │
│  └──────────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────────┘
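
System 1's task ID allocation reduces to zero-padding the next-task-id.json counter into the T-NNNNN format; the flock-guarded read-increment-write is handled by otm-create-task.sh and omitted here. A sketch of the formatting step only:

```typescript
// Format a counter value from next-task-id.json as a T-NNNNN task ID
// (5 digits, zero-padded), per the task data model.
function formatTaskId(n: number): string {
  if (!Number.isInteger(n) || n < 0 || n > 99999) {
    throw new RangeError(`counter out of range: ${n}`);
  }
  return `T-${String(n).padStart(5, "0")}`;
}
```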

2. Task Data Model

Each task is a top-level item in the Slack Task Board List. Subtasks are child items (linked via parent_item_id). The OTM is the sole writer of status and counter fields.

Slack List: F0ALE8DCW1F (Rapido Task Campaign v2), workspace: rapidocloud.slack.com

2.1 Task Item Fields

Field Type Writer Description
title text Orchestrator Task name
task_id text OTM Unique ID (T-NNNNN format — 5 digits, zero-padded)
assigned_to person Orchestrator Slack user ID of assigned agent (AI or Human)
status select OTM Current task state (see §3). OTM is sole writer — no exceptions
previous_status select OTM Status before the last transition. Critical for failure analysis
priority number Orchestrator 0=Critical … 4=Batchable (see §3.4)
context select Orchestrator project, research, operations, support, internal
subtasks_remaining number OTM Decremented counter, NOT live count
assigned_at datetime OTM When agent started work
completed_at datetime OTM When all subtasks done
validated_at datetime OTM When Orchestrator validated
result_summary text Agent Deliverables/output description
input_files text Orchestrator Links to input resources

2.2 Subtask Item Fields

Field Type Writer Description
title text Orchestrator Subtask description
todo_completed checkbox OTM (via SW-1) Built-in Slack Lists checkbox (Col00). Ticked when subtask is done. Must be set alongside status when marking items as done (see below).
parent_item_id reference System Links to parent task

📌 subtasks_remaining on the parent task is the canonical completion signal — not a live count.

📌 previous_status is set by the OTM on every transition. It enables post-mortem analysis when a task enters Failed state.

📌 Agents do NOT tick checkboxes directly in Slack. They report subtask completion to the OTM API (IE-02). The OTM then updates Slack via SW-1 — setting both the item status column and Col00 (checkbox).

todo_completed (Col00) Checkbox

The todo_completed field (Col00) is the built-in Slack Lists checkbox. It drives the visual checkmark ✅ in the Slack UI. Setting only the Status column to done does NOT check the box — both must be set explicitly.

Scenario Action
Subtask marked done Set Col00: checkbox: true on the subtask
Parent task marked done Set Col00: checkbox: true on the parent
Parent task at agent_done or in_progress Do NOT set checkbox (task isn't finished yet)

⚠️ Archive limitation: Slack Lists has a UI "Archive item" action, but slackLists.items.archive does not exist as an API method. Archiving is manual-only via the Slack UI. The OTM sets status to archived via the pipeline, but the actual Slack archive action cannot be automated.
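
The both-fields rule can be made explicit in the SW-1 payload builder. The cell shapes below are assumptions modelled on the column IDs in this section, not a verified slackLists wire format:

```typescript
// Illustrative SW-1 cell payload: marking an item done sets BOTH the
// Status select column and the built-in Col00 checkbox. Setting the
// status alone leaves the Slack checkmark unticked.
type Cell = { item_id: string; column_id: string; select?: string[]; checkbox?: boolean };

const STATUS_COL = "Col0AL1B4UVLJ"; // Status (select)
const CHECKBOX_COL = "Col00";       // todo_completed (checkbox)

function doneCells(itemId: string): Cell[] {
  return [
    { item_id: itemId, column_id: STATUS_COL, select: ["done"] },
    { item_id: itemId, column_id: CHECKBOX_COL, checkbox: true },
  ];
}
```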

2.3 JSON → Slack Field Mapping

This table is operationally critical — it maps the JSON task format (used in the file pipeline) to Slack column IDs.

JSON Field Slack Column Column ID Slack Type Notes
title Title Col0AKKTBJJKZ rich_text Clean one-liner only
taskId Task ID Col0ALVK2NA1E rich_text Format: T-NNNNN (5 digits, zero-padded)
type Type Col0AKUV4BF6F select action | decision | review
agent Assignee Col0AKZ9G5UAJ select Covers both agents and humans
project Project 2 Col0ALZBS9C8Z rich_text Free-text (migrated from select 2026-03-14)
priority Priority Col0ALE8DKWPK select See §3.4 priority mapping
(auto) Status Col0AL1B4UVLJ select Always set to new on creation
subtasks[] (child items) parent_item_id Each entry → child item with title + status new

Slack Built-in Fields:

Field Column ID Type Notes
todo_completed Col00 checkbox Built-in Slack Lists checkbox. Must be set explicitly alongside status.

Fields NOT mapped to Slack columns (metadata only):

| JSON Field | Purpose |
| --- | --- |
| `id` | UUID for file tracking / idempotency |
| `createdAt` | Timestamp, implicit in Slack item creation |
| `status` | Internal pipeline status (pending → processed) |

Deprecated Slack Columns:

| Item | Column ID | Notes |
| --- | --- | --- |
| Project (old select column) | `Col0AL4UJ8BJ8` | Replaced by Project 2 text column (2026-03-14) |

3. Task State Machine

The status field follows these transitions. The OTM is the sole writer — all transitions are executed by the OTM, regardless of which actor requested them.

3.1 Task States

| Status | Description |
| --- | --- |
| New | Task created, not yet assigned |
| Assigned | Orchestrator has set an assignee; OTM evaluating agent availability |
| Pending | Agent is busy; task queued silently (no notification) |
| In Progress | Agent is actively working (SURE acknowledgement received) |
| Agent Done | All subtasks complete; awaiting Orchestrator review |
| Done | Orchestrator validated the work |
| Rejected | Orchestrator rejected; Orchestrator preparing rework subtasks |
| Failed | Unrecoverable error during execution |
| Cancelled | Task no longer needed; removed from active work |
| Archived | Terminal state; auto-moved 7 days after Done/Cancelled |

3.2 Task State Transition Diagram

```
                    Orchestrator creates task
                              |
                           [New]
                              |
              TT-01: Orch requests assignment → OTM executes
                              |
                         [Assigned]
                        /          \
                 TT-02: OTM      TT-03: OTM
              (IE-01 + agent   (IE-01 + agent
               idle in reg)    busy in reg)
                      |              |
               [In Progress]    [Pending]
               (after SURE ack)      |
                      |         TT-05: OTM promotes
         TT-04: OTM receives    (IE-09 + pending
         IE-02 subtask reports    task found)
                      |              |
           subtasks_remaining=0      |
                      |              |
                [Agent Done] <------/
                   /    \
          TT-06: Orch  TT-07: Orch
          validates     rejects
          (IE-04)       (IE-05)
                |           |
             [Done]    [Rejected]
               |        /      \
         TT-12: OTM  TT-08    TT-10
         (IE-08+7d)  (IE-06)  (IE-06)
               |       |        |
          [Archived] [Assigned] [Cancelled]
                     (rework)    (drop)

     At any point before Agent Done:
         TT-11: Orch/Human requests cancel (IE-06/IE-01) → OTM executes
     From Done/Cancelled, 7 days after completion:
         TT-12: Watchdog requests archive (IE-08 + 7d check) → OTM executes
     From In Progress:
         TT-13: OTM detects error (IE-10) → [Failed]
     From Failed:
         TT-14: Orch requests retry (IE-07) → [New]
         TT-15: Orch/Human requests cancel (IE-06/IE-01) → [Cancelled]
```

3.2.1 Task Transition Action Index

Each task transition is coded TT-xx. All transitions are executed by the OTM. The "Requesting Actor" is who initiates; the OTM validates and applies.

Orchestrator-requested transitions (executed by OTM):

| Code | From → To | Requesting Actor | Inbound Event | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- | --- |
| TT-01 | New → Assigned | Orchestrator | IE-01: assigned_to field changed | Validate assignment, set status | OE-06, OE-07 |
| TT-06 | Agent Done → Done | Orchestrator | IE-04: validate API call | Set validated_at, change status | OE-06, OE-07, OE-02 |
| TT-07 | Agent Done → Rejected | Orchestrator | IE-05: reject API call | Change status, post reason | OE-06, OE-07 |
| TT-08 | Rejected → Assigned | Orchestrator | IE-06: rework API call (subtasks already prepared by Orchestrator) | Count unfinished subtasks, set counter, change status | OE-06, OE-07, OE-01 |
| TT-10 | Rejected → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Change status | OE-06, OE-07 |
| TT-11 | Any (pre-Agent Done) → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Free agent, change status | OE-06, OE-07, OE-04 |
| TT-14 | Failed → New | Orchestrator | IE-07: retry API call | Reset task, change status | OE-06, OE-07 |
| TT-15 | Failed → Cancelled | Orchestrator/Human | IE-06: cancel API / IE-01: Human Slack edit | Change status | OE-06, OE-07 |

OTM-initiated transitions (automated):

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- |
| TT-02 | Assigned → In Progress | IE-01: status changed to Assigned (OTM-0 routes to OTM-2, agent registry query returns Idle) | Execute AT-01, write counter, send SURE notification | OE-01, OE-06, OE-07 |
| TT-03 | Assigned → Pending | IE-01: status changed to Assigned (OTM-0 routes to OTM-2, agent registry query returns Busy) | Queue task silently | OE-06, OE-07 |
| TT-04 | In Progress → Agent Done | IE-02: agent reports subtask done (OTM-3 decrements counter to 0) | Execute AT-02, set completed_at | OE-02, OE-06, OE-07 |
| TT-05 | Pending → Assigned | IE-09: OTM internal — check_next_task_for_agent() finds pending task after AT-02/AT-03/AT-04 | Re-evaluate via OTM-2 path | OE-06, OE-07 |
| TT-13 | In Progress → Failed | IE-10: OTM detects agent error (OpenClaw hook timeout >5min / agent crash / unhandled exception reported) | Store previous_status, execute AT-04 | OE-05, OE-06, OE-07 |

Watchdog-requested transitions (executed by OTM):

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- |
| TT-12 | Done/Cancelled → Archived | IE-08: watchdog cron tick (OTM-6 checks completed_at or cancelled_at + 7 days < now) | Archive task | OE-06, OE-07 |

📌 Cross-reference: See §5 for TT-xx ↔ AT-xx ↔ handler mapping. See §6 for SURE protocol. See §7 for audit trail.

3.3 Task Transition Rules

All transitions are submitted to the OTM which validates preconditions and executes the state change. Every transition produces at minimum OE-06 (Slack field update) and OE-07 (audit log entry).

| Code | From | To | Inbound Event | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- | --- |
| TT-01 | New | Assigned | IE-01: assigned_to changed on New task | Validate agent exists in registry, set status | OE-06, OE-07 |
| TT-02 | Assigned | In Progress | IE-01: status=Assigned detected + agent Idle in registry | Set agent busy (AT-01), init counter, send SURE task notification | OE-01, OE-06, OE-07 |
| TT-03 | Assigned | Pending | IE-01: status=Assigned detected + agent Busy in registry | Queue task, no notification | OE-06, OE-07 |
| TT-04 | In Progress | Agent Done | IE-02: subtask completion report + counter decrements to 0 | Set agent idle (AT-02), set completed_at, notify Orchestrator | OE-02, OE-06, OE-07 |
| TT-05 | Pending | Assigned | IE-09: internal check_next_task_for_agent() + pending task found | Re-route to OTM-2 (same as TT-01 path) | OE-06, OE-07 |
| TT-06 | Agent Done | Done | IE-04: Orchestrator validate call | Set validated_at, change status | OE-02, OE-06, OE-07 |
| TT-07 | Agent Done | Rejected | IE-05: Orchestrator reject call with reason | Change status, log reason | OE-06, OE-07 |
| TT-08 | Rejected | Assigned | IE-06: Orchestrator rework call (subtasks already prepared) | Count unfinished subtasks, set counter, change status | OE-06, OE-07 |
| TT-10 | Rejected | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Change status | OE-06, OE-07 |
| TT-11 | New/Assigned/Pending/In Progress | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Free agent if applicable (AT-03), change status | OE-04, OE-06, OE-07 |
| TT-12 | Done/Cancelled | Archived | IE-08: watchdog tick + 7-day check passes | Archive task | OE-06, OE-07 |
| TT-13 | In Progress | Failed | IE-10: agent error detected (hook timeout/crash/exception) | Store previous_status, free agent (AT-04) | OE-05, OE-06, OE-07 |
| TT-14 | Failed | New | IE-07: Orchestrator retry call | Reset task fields, change status | OE-06, OE-07 |
| TT-15 | Failed | Cancelled | IE-06: Orchestrator cancel call / IE-01: Human status edit | Change status | OE-06, OE-07 |

📌 TT-09 (Rejected → New) removed in v1.3. Reassignment is handled by the Orchestrator cancelling the rejected task (TT-10) and creating a new task for a different agent. This simplifies the state machine.

3.4 Priority Scale

| Value | Label | Meaning | `--priority` flag |
| --- | --- | --- | --- |
| 0 | Critical | Blocking other work, immediate attention | `critical` |
| 1 | High | Important, do next | `high` |
| 2 | Medium/Normal | Normal priority | `normal` (or `medium` as alias) |
| 3 | Low | When bandwidth allows | `low` |
| 4 | Batchable | Large/expensive work, can run async via Batch API | `batchable` |

Queue ordering: priority ASC, posted_at ASC (0 = highest priority, FIFO within same priority).


4. Agent State Machine

The OTM maintains agent availability state in the Agent Registry (OTM-1). Agent transitions are coded AT-xx and are distinct from task transitions (TT-xx).

4.1 Agent States

| Status | Description |
| --- | --- |
| Idle | Agent is available, not working on any task |
| Busy | Agent is actively working on a task (current_task is set) |

4.2 Agent State Transition Diagram

```
              [Idle]
                |
         AT-01: OTM assigns task
         (triggered by TT-02)
         (SURE notification sent → OE-01)
                |
             [Busy]
                |
         AT-02: task completes (TT-04, IE-02 counter=0)
         AT-03: task cancelled (TT-11, IE-01/IE-06)
         AT-04: task fails (TT-13, IE-10)
                |
             [Idle]
                |
         → OTM calls check_next_task_for_agent() (IE-09)
         → if pending task found: TT-05 → AT-01 again
```

4.3 Agent Transition Action Index

All agent transitions are executed by the OTM. No actor changes agent status directly.

| Code | From → To | Inbound Event (trigger) | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- |
| AT-01 | Idle → Busy | IE-01: Assigned event + agent Idle (during OTM-2) | Set status=busy, current_task=task_id, task_started_at=now | OE-01 (SURE notification), OE-07 (audit) |
| AT-02 | Busy → Idle | IE-02: subtask report + counter=0 (during OTM-3) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-07 (audit) |
| AT-03 | Busy → Idle | IE-01/IE-06: cancellation request (during OTM-5) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-04 (cancel notify), OE-07 (audit) |
| AT-04 | Busy → Idle | IE-10: agent error detected (during OTM error handler) | Set status=idle, clear current_task, call check_next_task_for_agent() | OE-05 (admin alert), OE-07 (audit) |

4.4 Agent Transition Rules

| Code | From | To | Inbound Event | OTM Action | Outbound Event |
| --- | --- | --- | --- | --- | --- |
| AT-01 | Idle | Busy | IE-01 (Assigned event) | OTM-2 sets agent busy before sending SURE notification | OE-01, OE-07 |
| AT-02 | Busy | Idle | IE-02 (last subtask report) | OTM-3 frees agent, promotes next pending task | OE-07 |
| AT-03 | Busy | Idle | IE-01/IE-06 (cancellation) | OTM-5 frees agent if assigned to cancelled task | OE-04, OE-07 |
| AT-04 | Busy | Idle | IE-10 (agent error) | OTM error handler frees agent | OE-05, OE-07 |

📌 Agent ↔ Task coupling: Every AT-xx is triggered by a TT-xx. See §5 for the complete bidirectional mapping.

📌 Watchdog note: OTM-6 monitors agent heartbeats (last_seen >2h) but does NOT change agent state. It alerts the admin (OE-05). Only OTM handlers modify agent status.

4.5 Agent Registry Schema

```sql
CREATE TABLE agents (
  slack_user_id TEXT PRIMARY KEY,         -- Slack user ID (e.g., "U0AKEB27HNK")
  otm_display_name TEXT NOT NULL,         -- Display name for logs/UI (e.g., "Devdas")
  openclaw_agent_id TEXT,                 -- OpenClaw agent ID (e.g., "devdas"). NULL for human agents.
  agent_type TEXT NOT NULL DEFAULT 'ai',  -- 'ai' | 'human'
  status TEXT DEFAULT 'idle',             -- 'idle' | 'busy'  (see §4.1)
  current_task TEXT,                      -- task item ID or NULL
  task_started_at INTEGER,                -- Unix timestamp or NULL
  last_seen INTEGER                       -- Unix timestamp
);
```

Registry operations:

  • register_agent(slack_user_id, otm_display_name, openclaw_agent_id, agent_type) — AT startup or on first activity
  • set_busy(slack_user_id, task_id) — AT-01
  • set_idle(slack_user_id) — AT-02, AT-03, AT-04
  • is_busy(slack_user_id) → boolean — checked during TT-02/TT-03
  • get_current_task(slack_user_id) → task_id | null
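A minimal sketch of these registry operations against the §4.5 schema, using SQLite as the spec prescribes. The `open_registry` and `register_agent` helper names are illustrative, not part of the spec.

```python
import sqlite3
import time

# Mirrors the §4.5 agents table (condensed).
SCHEMA = """CREATE TABLE IF NOT EXISTS agents (
  slack_user_id TEXT PRIMARY KEY, otm_display_name TEXT NOT NULL,
  openclaw_agent_id TEXT, agent_type TEXT NOT NULL DEFAULT 'ai',
  status TEXT DEFAULT 'idle', current_task TEXT,
  task_started_at INTEGER, last_seen INTEGER)"""

def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db

def register_agent(db, slack_user_id, name, openclaw_id=None, agent_type="ai"):
    db.execute("INSERT OR IGNORE INTO agents (slack_user_id, otm_display_name, "
               "openclaw_agent_id, agent_type, last_seen) VALUES (?,?,?,?,?)",
               (slack_user_id, name, openclaw_id, agent_type, int(time.time())))

def set_busy(db, slack_user_id, task_id):   # AT-01
    db.execute("UPDATE agents SET status='busy', current_task=?, task_started_at=? "
               "WHERE slack_user_id=?", (task_id, int(time.time()), slack_user_id))

def set_idle(db, slack_user_id):            # AT-02 / AT-03 / AT-04
    db.execute("UPDATE agents SET status='idle', current_task=NULL "
               "WHERE slack_user_id=?", (slack_user_id,))

def is_busy(db, slack_user_id) -> bool:     # checked during TT-02 / TT-03
    row = db.execute("SELECT status FROM agents WHERE slack_user_id=?",
                     (slack_user_id,)).fetchone()
    return row is not None and row[0] == "busy"
```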

4.6 Startup Reconciliation (Dynamic Registry)

On OTM startup:

  1. Read OpenClaw config — Query openclaw.json agent configurations to populate AI agent entries automatically. Each configured OpenClaw agent that has a slack_user_id mapping is auto-registered with agent_type = 'ai'.
  2. Reconcile from Slack — Query Slack List for all tasks with status In Progress or Pending. Rebuild current_task and status (busy/idle) from those records.
  3. Human agents — hard-coded for v1. Rupert is pre-seeded in the agent registry at startup with agent_type = 'human', openclaw_agent_id = NULL. Claudia is pre-seeded as the Orchestrator. Dynamic human registration (auto-detect from assigned_to on first interaction) is deferred to a future version.

📌 v1 hard-coded actors:

| Slack user ID | Display name | Type | Role |
| --- | --- | --- | --- |
| `U06K407LVCY` | Rupert | human | Task assignee / reviewer |
| `U0AKEB27HNK` | Claudia | ai | Orchestrator (sole) |

Future versions will define a proper human user registration and re-registration protocol (see Open Question 9).

📌 AI agents are notified via OpenClaw /hooks/agent. Human agents are notified via the Slack task conversation thread (posted by SW-1 as an OE-07 audit entry addressed to the human). The notification channel is determined by agent_type.

4.7 Gateway Restart Detection & OTM Resync

The OTM runs as an OpenClaw plugin pipeline inside the gateway process. Several restart scenarios must be handled:

Scenario A: Gateway restarts (OTM restarts with it)

  • OTM startup reconciliation (§4.6) runs automatically
  • All agent states rebuilt from Slack List + openclaw.json
  • SURE pending notifications checked: any outstanding acks >3 min old → ERR-06
  • System event logged: [timestamp] OTM SYSTEM: Gateway restart detected. Reconciliation complete.
  • Logged to both event_log (in error DB) and otm-events.log file

Scenario B: Gateway restarts but OTM was mid-processing

  • SQLite WAL mode ensures no data corruption on crash
  • On restart, OTM-7 (error monitor) runs within 60s and detects any inconsistencies:
    • ERR-02/ERR-03: Agent-task mismatches from interrupted transitions
    • ERR-08: Tasks stuck in Assigned from interrupted OTM-2
    • ERR-04: Orphaned Pending from interrupted promotions
  • All auto-correctable errors are fixed; others escalated

Scenario C: Gateway stops for extended period (>3 min)

  • SE-1 stops receiving Slack events during downtime
  • Agents cannot send IE-02/IE-03 reports (OpenClaw hooks are down)
  • On restart: reconciliation rebuilds state from Slack List (source of truth for task fields)
  • Pending SURE acks will have timed out → ERR-06 logged
  • Agents that were mid-task may have completed work but couldn't report it:
    • OTM-7 detects counter mismatches (ERR-05) on next cycle
    • Watchdog cross-checks subtask completion status in Slack vs subtasks_remaining

Orchestrator re-registration:

  • The Orchestrator does NOT need to re-register agents. The OTM rebuilds the registry from openclaw.json automatically on startup (§4.6).
  • If openclaw.json has changed (new agent added, agent removed), the reconciliation picks up the delta.

Gateway restart logging:

  • Every OTM startup logs a system event: IE-SYS-01: OTM startup with details including:
    • Agents reconciled (count + names)
    • Tasks found in active states (In Progress, Pending, Assigned)
    • SURE timeouts detected
    • Errors found and corrected during reconciliation
  • This event is logged to event_log, otm-events.log, AND posted to Slack #alerts channel (OE-05)

5. Cross-Reference Index

5.1 Task Transition → Agent Transition Mapping

| Task Transition | Triggers Agent Transition | Handler |
| --- | --- | --- |
| TT-02 (Assigned → In Progress) | AT-01 (Idle → Busy) | OTM-2 |
| TT-04 (In Progress → Agent Done) | AT-02 (Busy → Idle) | OTM-3 |
| TT-11 (→ Cancelled) | AT-03 (Busy → Idle) | OTM-5 |
| TT-13 (In Progress → Failed) | AT-04 (Busy → Idle) | OTM error handler |

5.2 Transition → Handler Mapping

| Transition(s) | Primary Handler | Description |
| --- | --- | --- |
| TT-01, TT-02, TT-03, TT-05 | OTM-2 | Task assignment, availability check, queue management |
| TT-04 | OTM-3 | Subtask completion, counter decrement, task completion |
| TT-06, TT-07, TT-08, TT-10 | OTM-4 | Validation, rejection, rework, cancel-after-reject |
| TT-11 | OTM-5 | Task cancellation |
| TT-12 | OTM-6 | Archival (watchdog requests, OTM executes) |
| TT-13, TT-14, TT-15 | OTM error handler | Failure detection, retry, abandon |
| AT-01 | OTM-2 | Agent set busy |
| AT-02 | OTM-3 | Agent freed on task completion |
| AT-03 | OTM-5 | Agent freed on task cancellation |
| AT-04 | OTM error handler | Agent freed on task failure |

5.3 Inbound Event Index

| Code | Source | Description | Triggered by |
| --- | --- | --- | --- |
| IE-01 | SE-1 | Raw list_item_updated event from Slack (Human field edit) | Human edits task in Slack UI |
| IE-02 | Agent | Subtask completion report via OTM API | Agent calls OTM after completing subtask |
| IE-03 | Agent | SURE acknowledgement via OTM API | Agent confirms receipt of task assignment |
| IE-04 | Orchestrator | Task validation request via OTM API | Orchestrator reviews and approves |
| IE-05 | Orchestrator | Task rejection request via OTM API (with reason) | Orchestrator reviews and rejects |
| IE-06 | Orchestrator | Task action request via OTM API (rework/cancel/retry) | Orchestrator requests state change |
| IE-07 | Orchestrator | Task retry request via OTM API (from Failed) | Orchestrator wants to retry failed task |
| IE-08 | Watchdog | Cron tick (every 60 seconds) | Timer fires |
| IE-09 | OTM internal | check_next_task_for_agent() result | Triggered after AT-02/AT-03/AT-04 |
| IE-10 | OTM internal | Agent error detection (hook timeout >5min, crash, exception) | OpenClaw health monitoring |

5.4 Outbound Event Index

| Code | Target | Description | Via |
| --- | --- | --- | --- |
| OE-01 | Agent | SURE task notification (requires acknowledgement IE-03) | OpenClaw hooks (AI) / Slack task conversation (Human) |
| OE-02 | Orchestrator | Task completion/validation notification | OpenClaw hooks |
| OE-03 | | (reserved) | |
| OE-04 | Agent | Task cancellation notification | OpenClaw hooks (AI) / Slack task conversation (Human) |
| OE-05 | Admin | Alert (anomaly, error, stale task, agent down) | Telegram / Slack #alerts |
| OE-06 | Slack | Task field update (status, counters, timestamps) | SW-1 |
| OE-07 | Slack | Audit log entry in task conversation feed | SW-1 |

5.5 All Code Summaries

  • Task transitions (TT-xx): TT-01 through TT-15 (TT-09 removed in v1.3) — see §3.2.1 and §3.3
  • Agent transitions (AT-xx): AT-01 through AT-04 — see §4.3 and §4.4
  • Inbound events (IE-xx): IE-01 through IE-10 — see §5.3
  • Outbound events (OE-xx): OE-01 through OE-07 — see §5.4

6. SURE Protocol (Send-Understand-Report-Execute)

All task notifications to agents use the SURE protocol to guarantee delivery and acknowledgement.

6.1 Flow

```
OTM sends task assignment → OE-01
  |
  +-- OTM logs in task conversation (OE-07):
  |     "[2026-03-12 14:30:05] OTM → Agent(Devdas): Task assigned — <title>"
  |
  +-- Agent receives notification
  |
  +-- Agent sends acknowledgement → IE-03
  |
  +-- OTM logs in task conversation (OE-07):
        "[2026-03-12 14:30:12] Agent(Devdas) → OTM: Task acknowledged"
```

6.2 Timeout

  • Retry 1: If no IE-03 within 1 minute → OTM retries notification (OE-01), logs retry.
  • Retry 2: If no IE-03 within 2 more minutes (3 min total) → OTM retries again, logs retry.
  • Error: If no IE-03 after retry 2 → OTM logs ERR-06 error, alerts admin (OE-05). Task remains In Progress — admin decides.

📌 The 1+2 minute schedule is designed to allow time for a gateway restart (~2 min typical). If the gateway restarts but the OTM does not, the OTM resync procedure (§4.7) handles recovery.
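The 1+2 minute schedule reduces to pure timing logic, sketched below. One assumption is called out in the code: the spec only says ERR-06 fires "after retry 2", so the one-minute grace window before escalation is illustrative, not normative.

```python
# SURE retry schedule from §6.2: retry at +1 min, retry again at +3 min total,
# then escalate as ERR-06.
RETRY_OFFSETS = (60, 180)   # seconds after the original OE-01 send
GRACE = 60                  # ASSUMPTION: wait after the last retry before ERR-06

def sure_action(sent_at: int, acked: bool, retries_done: int, now: int) -> str:
    """Decide what the OTM should do for one pending SURE notification."""
    if acked:
        return "done"                         # IE-03 received
    if retries_done < len(RETRY_OFFSETS):
        if now >= sent_at + RETRY_OFFSETS[retries_done]:
            return "retry"                    # resend OE-01, log retry
        return "wait"
    if now >= sent_at + RETRY_OFFSETS[-1] + GRACE:
        return "err06"                        # log ERR-06, alert admin (OE-05)
    return "wait"
```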

6.3 SURE applies to

| Event | SURE required? |
| --- | --- |
| Task assignment (OE-01) | ✅ Yes |
| Rework task (OE-01 after TT-08) | ✅ Yes |
| Task cancellation (OE-04) | ❌ No (fire-and-forget, agent stops) |
| Admin alert (OE-05) | ❌ No |

6.4 Agent acknowledgement API

```
POST /api/otm/ack
{
  "task_id": "<task item ID>",
  "agent_id": "<slack_user_id>",
  "type": "task_assigned"
}
```

7. Audit Trail

Every event processed by the OTM is logged in the Slack task conversation feed via SW-1. This provides a human-readable, timestamped record of all activity on each task.

7.1 What is logged

| Event type | Log format |
| --- | --- |
| State change by OTM | `[timestamp] OTM: Status changed <previous> → <new> (TT-xx)` |
| Orchestrator request received | `[timestamp] Orchestrator → OTM: <action> requested (IE-xx)` |
| Human change detected | `[timestamp] Human(<name>) change detected: <field> = <value> (IE-01)` |
| Agent notification sent | `[timestamp] OTM → Agent(<name>): <notification type> (OE-xx)` |
| Agent acknowledgement received | `[timestamp] Agent(<name>) → OTM: Acknowledged (IE-03)` |
| Agent subtask report received | `[timestamp] Agent(<name>) → OTM: Subtask done — <title> (IE-02). Remaining: <n>` |
| Watchdog action | `[timestamp] Watchdog: <check description> (IE-08)` |
| Error/alert | `[timestamp] OTM ERROR: <description> (IE-10)` |

7.2 Implementation

All audit entries are posted as replies in the Slack task's conversation thread via SW-1. This means:

  • Every task's conversation is a complete history of its lifecycle
  • No separate log table needed — Slack IS the audit log
  • Human-readable without any tooling
  • Searchable via Slack search

PART 1 — SLACK INTEGRATION LAYER

SE-1: Slack Event Listener

| | Detail |
| --- | --- |
| Actors | Slack Events API (source) |
| Inbound events | IE-01: any list_item_updated event on the Task Board List |
| Actions | Forward raw event payload to OTM-0 entry point |
| Outbound events | None — SE-1 has zero intelligence |
| Transitions | None — SE-1 is a passthrough |

The Slack app (Salvatore's app, socket mode) subscribes to list_item_updated events.

SE-1 does exactly one thing:

```
ON list_item_updated:
  → call otm_handle_event(raw_event_payload)
```

No routing. No field inspection. No filtering. The OTM decides what to do with the event.

Required Slack App scope for SE-1: lists:read only.

SW-1: Slack Writer

| | Detail |
| --- | --- |
| Actors | OTM (sole caller) |
| Inbound events | OTM handler calls |
| Actions | Write task fields to Slack List, post audit entries to task conversation |
| Outbound events | OE-06: Slack field update, OE-07: audit log entry |

SW-1 is the sole component that writes to Slack. It provides two operations:

  1. sw1_update_fields(task_id, fields) — Updates task item fields (status, counters, timestamps). Produces OE-06.
  2. sw1_post_audit(task_id, message) — Posts a timestamped message to the task's conversation thread. Produces OE-07.

Required Slack App scope for SW-1: lists:write.

📌 SE-1 (lists:read) and SW-1 (lists:write) are separate concerns. They may run in the same Slack app but are logically distinct.

📌 SW-1 does NOT create or delete tasks. Only the Orchestrator creates tasks and subtasks (via Slack API or UI). SW-1's write scope is limited to: updating existing task fields (OE-06) and posting audit entries to task conversations (OE-07). During rework, the Orchestrator manages subtask creation/deletion directly; the OTM then processes the state change via SW-1.


PART 2 — OPENCLAW TASK MANAGER (OTM)

Implemented as an OpenClaw plugin pipeline set. Persists to SQLite. Sole component that writes task status, agent status, and counter fields.

OTM-0: Event Router

| | Detail |
| --- | --- |
| Actors | OTM (internal) |
| Inbound events | IE-01: raw list_item_updated from SE-1 |
| Actions | ACT-R1: Parse event payload and identify change type<br>ACT-R2: Route to appropriate handler |
| Outbound events | None directly — delegates to handlers |

Single entry point for all Slack-originated events. Contains the routing logic that was previously in SE-1.

```
RECEIVE raw_event_payload from SE-1
  |
  +-- Parse: what field(s) changed?
  |
  +-- IF assigned_to changed AND status = "New":
  |       → route to OTM-2 (task assignment)
  |
  +-- IF status changed to "Assigned" (from Pending promotion or rework):
  |       → route to OTM-2 (task re-assignment)
  |
  +-- IF status changed to "Cancelled" by Human:
  |       → route to OTM-5 (cancellation)
  |
  +-- IF other field changed by Human:
  |       → log via sw1_post_audit (OE-07): "Human(<name>) changed <field>"
  |       → no state transition
  |
  +-- ELSE: ignore
```

📌 All routing intelligence lives here, not in SE-1. SE-1 is a dumb pipe.

OTM-1: Agent Registry

| | Detail |
| --- | --- |
| Actors | OTM (owner/writer) |
| Inbound events | IE-08: startup reconciliation<br>IE-01: new agent detected (auto-register) |
| Actions | ACT-A1: Register new agent (from openclaw.json or first interaction)<br>ACT-A2: Set agent busy (AT-01)<br>ACT-A3: Set agent idle (AT-02, AT-03, AT-04)<br>ACT-A4: Reconcile from Slack List on startup<br>ACT-A5: Reconcile from openclaw.json on startup |
| Outbound events | OE-07: audit log for registration events |
| Transitions | AT-01, AT-02, AT-03, AT-04 |

See §4.5 for schema and §4.6 for startup reconciliation.

OTM-2: Handle Task Assigned

| | Detail |
| --- | --- |
| Actors | OTM (executor), Agent (notified if idle) |
| Inbound events | IE-01: assigned_to changed or status=Assigned (from OTM-0) |
| Actions | ACT-T1: Count subtasks via Slack API<br>ACT-T2: Check agent availability in registry<br>ACT-T3: Set task status (via SW-1)<br>ACT-T4: Set agent busy (AT-01 via OTM-1)<br>ACT-T5: Send SURE notification (OE-01) |
| Outbound events | OE-01: SURE task notification (if agent idle)<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-02 (→ In Progress) or TT-03 (→ Pending) |
| Agent transitions | AT-01 (Idle → Busy) if agent available |

```
RECEIVE task assignment event (from OTM-0)
  |
  +-- Read task fields: task_id, title, assigned_to, priority
  +-- ACT-T1: Count child items (subtasks) via Slack API → subtask_count
  +-- Store subtask_count as initial subtasks_remaining
  |
  +-- ACT-T2: Look up assigned_to agent in registry
  |
  +-- IF agent NOT found AND agent_type detectable:
  |       ACT-A1: Auto-register agent
  |       sw1_post_audit: "New agent registered: <name>"
  |
  +-- IF agent NOT found AND not detectable:
  |       OE-05: Alert admin
  |       sw1_post_audit: "ERROR: Unknown agent <id>"
  |       Task stays in current status
  |
  +-- IF agent is IDLE:
  |       Store previous_status
  |       ACT-T4: Execute AT-01 (agent → busy)
  |       sw1_update_fields: subtasks_remaining, status = "In Progress", assigned_at = now
  |       sw1_post_audit: "Status: Assigned → In Progress (TT-02). Agent: <name>"
  |       ACT-T5: Send SURE notification (OE-01)
  |       sw1_post_audit: "OTM → Agent(<name>): Task assigned (OE-01). Awaiting SURE ack."
  |
  +-- IF agent is BUSY:
          Store previous_status
          sw1_update_fields: status = "Pending"
          sw1_post_audit: "Status: Assigned → Pending (TT-03). Agent <name> busy with <current_task>"
```

OTM-3: Handle Subtask Done

| | Detail |
| --- | --- |
| Actors | Agent (reports completion), OTM (processes), Orchestrator (notified on task completion) |
| Inbound events | IE-02: agent subtask completion report via OTM API |
| Actions | ACT-S1: Validate subtask belongs to agent's current task<br>ACT-S2: Decrement counter<br>ACT-S3: Update Slack subtask checkbox (via SW-1) — sets both status AND Col00<br>ACT-S4: Complete task if counter = 0<br>ACT-S5: Free agent (AT-02 via OTM-1)<br>ACT-S6: Promote next pending task (IE-09) |
| Outbound events | OE-02: Orchestrator notification (on task complete)<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-04 (→ Agent Done when counter hits 0) |
| Agent transitions | AT-02 (Busy → Idle) when task completes |

```
RECEIVE subtask completion report (IE-02)
  {task_id, subtask_id, agent_id}
  |
  +-- ACT-S1: Validate:
  |     - subtask belongs to task
  |     - agent is assigned to task
  |     - subtask not already completed (idempotency: check todo_completed field / Col00)
  |       IF already completed: discard, return OK
  |
  +-- ACT-S3: sw1_update_fields: subtask.todo_completed = true (Col00), subtask.status = done
  +-- ACT-S2: Decrement task.subtasks_remaining by 1
  +-- sw1_update_fields: subtasks_remaining
  +-- sw1_post_audit: "Agent(<name>): Subtask done — <title> (IE-02). Remaining: <n>"
  |
  +-- IF subtasks_remaining > 0:
  |       Done. Await next report.
  |
  +-- IF subtasks_remaining = 0 (TASK COMPLETE):
          Store previous_status = "In Progress"
          sw1_update_fields: status = "Agent Done", completed_at = now
          sw1_post_audit: "Status: In Progress → Agent Done (TT-04). All subtasks complete."
          ACT-S5: Execute AT-02 (agent → idle)
          ACT-S6: Notify Orchestrator (OE-02):
            POST /hooks/agent {
              agentId: "main",
              message: "Task ready for review: <title>\nAgent: <name>\nElapsed: <time>\nResult: <result_summary>\nLink: <slack_link>"
            }
          sw1_post_audit: "OTM → Orchestrator: Task ready for review (OE-02)"
          CALL: check_next_task_for_agent(agent_id)  → IE-09
```

check_next_task_for_agent(agent_id):

```
Query Slack List for tasks WHERE:
  assigned_to = agent_id
  AND status = "Pending"
  ORDER BY priority ASC, posted_at ASC
  LIMIT 1
  |
  +-- IF Pending task found:
  |       sw1_update_fields: pending_task.status = "Assigned"
  |       sw1_post_audit on pending task: "Status: Pending → Assigned (TT-05). Agent now available."
  |       (OTM-0 detects Assigned change → OTM-2 fires)
  |
  +-- IF no Pending task:
          Agent remains idle.
```

📌 Idempotency is handled by checking todo_completed (Col00) on the subtask before processing. No separate processed_events table needed. The Slack conversation feed (OE-07) serves as the complete audit trail.
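The ACT-S1/S2 logic above, including this idempotency check, fits in a few lines. In the sketch below the dict field names (`parent`, `assigned_to`) are illustrative stand-ins for the Slack item fields; the real OTM writes through SW-1 rather than mutating dicts.

```python
def handle_subtask_report(task: dict, subtask: dict, agent_id: str) -> str:
    """Process one IE-02 report. Idempotency = check the subtask's
    todo_completed (Col00) before doing anything else."""
    # ACT-S1: subtask belongs to task, and the reporter holds the task.
    if subtask.get("parent") != task["id"] or task["assigned_to"] != agent_id:
        return "invalid"
    if subtask.get("todo_completed"):
        return "duplicate"                   # already processed: discard, return OK
    subtask["todo_completed"] = True         # ACT-S3 (via SW-1 in the real OTM)
    task["subtasks_remaining"] -= 1          # ACT-S2
    # ACT-S4: counter at zero means TT-04 (In Progress → Agent Done).
    return "complete" if task["subtasks_remaining"] == 0 else "ok"
```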

📌 Agent reports directly to OTM API (IE-02), not by ticking checkboxes in Slack. The OTM then updates Slack via SW-1 (ACT-S3). This ensures all writes go through the OTM.

OTM-4: Task Validate / Reject (Orchestrator API)

| | Detail |
| --- | --- |
| Actors | Orchestrator (caller), OTM (executor) |
| Inbound events | IE-04: Orchestrator requests validation<br>IE-05: Orchestrator requests rejection (with reason)<br>IE-06: Orchestrator signals rework ready (subtasks already managed by Orchestrator) |
| Actions | ACT-V1: Verify task status = "Agent Done" or "Rejected"<br>ACT-V2: Execute validation (TT-06)<br>ACT-V3: Execute rejection (TT-07)<br>ACT-V4: Count unfinished subtasks, set counter, change status (TT-08) |
| Outbound events | OE-02: confirmation to Orchestrator<br>OE-06: Slack field update<br>OE-07: audit log entries |
| Task transitions | TT-06 (→ Done), TT-07 (→ Rejected), TT-08 (→ Assigned), TT-10 (→ Cancelled) |

Validate request (IE-04):

```json
{
  "task_id": "<task item ID>",
  "outcome": "validated",
  "comment": "<optional>"
}
```

Processing — validated:

```
ACT-V1: Verify task.status = "Agent Done" (reject API call otherwise)
Store previous_status = "Agent Done"
sw1_post_audit: "Orchestrator → OTM: Validation requested (IE-04)"
ACT-V2: Execute TT-06
  sw1_update_fields: status = "Done", validated_at = now, todo_completed = true (Col00)
  sw1_post_audit: "Status: Agent Done → Done (TT-06). Validated."
  IF comment: sw1_post_audit: "Orchestrator comment: <comment>"
OE-02: Notify Orchestrator: "Task <title> is Done"
```

Reject request (IE-05):

```json
{
  "task_id": "<task item ID>",
  "outcome": "rejected",
  "reason": "<mandatory explanation>"
}
```

Processing — rejected:

```
ACT-V1: Verify task.status = "Agent Done" (reject API call otherwise)
Store previous_status = "Agent Done"
sw1_post_audit: "Orchestrator → OTM: Rejection requested (IE-05). Reason: <reason>"
ACT-V3: Execute TT-07
  sw1_update_fields: status = "Rejected"
  sw1_post_audit: "Status: Agent Done → Rejected (TT-07)"
```

Rework request (IE-06) — submitted after Orchestrator has already prepared subtasks:

The Orchestrator handles all subtask management before calling the OTM:

  1. Orchestrator deletes unnecessary/obsolete subtasks from the Slack List
  2. Orchestrator leaves completed subtasks in place (as a record)
  3. Orchestrator creates new subtasks (first = "Acknowledge rework request: ")
  4. Orchestrator then notifies the OTM that rework is ready:
```json
{
  "task_id": "<task item ID>",
  "action": "rework"
}
```

Processing — rework:

```
Verify task.status = "Rejected"
sw1_post_audit: "Orchestrator → OTM: Rework requested (IE-06)"
ACT-V4: Execute TT-08
  Count unfinished subtasks via Slack API (todo_completed = false / Col00 unchecked)
  Set subtasks_remaining = count of unfinished subtasks
  sw1_update_fields: status = "Assigned", subtasks_remaining
  sw1_post_audit: "Status: Rejected → Assigned (TT-08). Rework: <n> unfinished subtasks."
  → OTM-0 detects Assigned change → OTM-2 fires → SURE notification sent → agent acks
```

📌 Rework flow — separation of concerns: The Orchestrator owns subtask management (create, delete, keep). The OTM owns state management (status transitions, counter recalculation, agent assignment). The OTM never creates or deletes subtasks. On rework, it counts unfinished subtasks to set subtasks_remaining, then triggers the normal assignment flow. The first new subtask is always an acknowledgement subtask — the agent closes it to confirm they understood the rework instructions.
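The counter recalculation in ACT-V4 is just a count over Col00, sketched below with an illustrative `todo_completed` dict key standing in for the Slack checkbox field:

```python
def rework_counter(subtasks: list[dict]) -> int:
    """TT-08: subtasks_remaining = number of subtasks whose Col00
    (todo_completed) is still unchecked after the Orchestrator's edits."""
    return sum(1 for s in subtasks if not s.get("todo_completed"))
```

Completed subtasks left in place as a record are naturally excluded, and the new acknowledgement subtask counts as one unit of remaining work.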

Cancel-after-reject request (IE-06):

{
  "task_id": "<task item ID>",
  "action": "cancel"
}

Verify task.status = "Rejected" → Execute TT-10 (→ Cancelled).

OTM-5: Handle Task Cancelled

| | Detail |
|---|---|
| Actors | Orchestrator/Human (requests), OTM (executes), Agent (notified if was working) |
| Inbound events | IE-01: Human status edit detected by OTM-0<br>IE-06: Orchestrator cancel API call |
| Actions | ACT-C1: Store previous_status<br>ACT-C2: Free agent if applicable (AT-03)<br>ACT-C3: Notify agent of cancellation (OE-04)<br>ACT-C4: Promote next pending task (IE-09) |
| Outbound events | OE-04: cancellation notification to agent<br>OE-06: Slack field update<br>OE-07: audit log entry |
| Task transitions | TT-11 (→ Cancelled) |
| Agent transitions | AT-03 (Busy → Idle) if agent was working |

RECEIVE cancellation request (IE-01 or IE-06)
  |
  +-- ACT-C1: Store previous_status
  +-- sw1_post_audit: "Cancellation requested by <actor> (IE-xx)"
  +-- sw1_update_fields: status = "Cancelled"
  +-- sw1_post_audit: "Status: <previous> → Cancelled (TT-11)"
  |
  +-- IF task was In Progress or Assigned:
  |       ACT-C2: Execute AT-03 (agent → idle)
  |       ACT-C3: Send cancellation notification (OE-04)
  |       sw1_post_audit: "OTM → Agent(<name>): Task cancelled (OE-04)"
  |       ACT-C4: check_next_task_for_agent(agent_id) → IE-09
  |
  +-- IF task was Pending:
  |       sw1_post_audit: "Removed from queue (no agent notification)"
  |
  +-- IF task was New:
          sw1_post_audit: "Cancelled before assignment"
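A minimal sketch of the branching above, with hypothetical field names; only tasks that were actively held by an agent trigger the free/notify/promote path:

```javascript
// Decide OTM-5 follow-up actions from the task's previous status.
// Field names here are illustrative, not the OTM's real internals.
function cancellationPlan(previousStatus) {
  if (["In Progress", "Assigned"].includes(previousStatus)) {
    // agent was holding the task: free it, tell it, promote next pending
    return { freeAgent: true, notifyAgent: true, promoteNext: true,
             audit: "Agent freed and notified (AT-03, OE-04)" };
  }
  if (previousStatus === "Pending") {
    return { freeAgent: false, notifyAgent: false, promoteNext: false,
             audit: "Removed from queue (no agent notification)" };
  }
  // "New": nothing was ever dispatched
  return { freeAgent: false, notifyAgent: false, promoteNext: false,
           audit: "Cancelled before assignment" };
}
```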

OTM-6: Watchdog

| | Detail |
|---|---|
| Actors | Watchdog cron (detector), OTM (executor), Admin (alerted) |
| Inbound events | IE-08: cron tick (every 60 seconds) |
| Actions | ACT-W1: Check for stale In Progress tasks (>24h, no subtask activity)<br>ACT-W2: Check for orphaned Pending tasks (agent idle but task pending)<br>ACT-W3: Check counter mismatches (subtasks_remaining vs actual)<br>ACT-W4: Check archival candidates (Done/Cancelled >7 days)<br>ACT-W5: Check agent heartbeats (last_seen >2h) |
| Outbound events | OE-05: admin alert (anomalies)<br>OE-06: Slack field update (archival)<br>OE-07: audit log entries |
| Task transitions | Requests TT-12 (→ Archived), requests TT-05 (orphaned Pending → Assigned) |

📌 The Watchdog does NOT write state directly. It calls OTM handler functions to execute transitions.

Checks:

ACT-W1: STALE IN-PROGRESS
  Query tasks In Progress for >24h with no subtask activity in conversation feed
  → OE-05: Alert admin. Do NOT auto-reassign.
  → sw1_post_audit: "Watchdog: Stale task detected (>24h, no activity)"

ACT-W2: ORPHANED PENDING
  Query tasks Pending WHERE assigned agent is Idle in registry
  → Request OTM to re-trigger: call OTM-2 (TT-05 → Assigned)
  → sw1_post_audit: "Watchdog: Orphaned pending — re-triggering assignment"

ACT-W3: COUNTER MISMATCH
  Compare subtasks_remaining with actual unchecked subtask count
  → Recalculate and fix counter via SW-1
  → OE-05: Alert admin
  → sw1_post_audit: "Watchdog: Counter mismatch corrected (<old> → <new>)"

ACT-W4: ARCHIVAL
  Query tasks Done or Cancelled WHERE completed_at/cancelled_at + 7 days < now
  → Request OTM to execute TT-12
  → sw1_update_fields: status = "Archived"
  → sw1_post_audit: "Status: <previous> → Archived (TT-12). Auto-archived after 7 days."

ACT-W5: AGENT HEARTBEAT
  Query agents WHERE last_seen > 2h ago
  → OE-05: Alert admin (agent may be down). No state change.

End-to-End Flow Diagrams

Flow 1a — Task Assigned, Agent Idle

Orchestrator sets assigned_to on New task
  |
[SE-1] receives list_item_updated → forwards raw event to OTM (IE-01)
  |
[OTM-0] parses: assigned_to changed → routes to OTM-2
  |
[OTM-2] checks registry: agent is IDLE
  |  sw1_update_fields: subtasks_remaining, status = "In Progress", assigned_at
  |  sw1_post_audit: "Status: New → Assigned → In Progress (TT-01, TT-02)"
  |  executes AT-01: agent → busy
  |  sends SURE notification (OE-01)
  |  sw1_post_audit: "OTM → Agent(<name>): Task assigned. Awaiting SURE ack."
  |
Agent receives notification
  |  sends acknowledgement (IE-03)
  |
[OTM] receives IE-03
  |  sw1_post_audit: "Agent(<name>): Task acknowledged (IE-03)"
  |
Agent starts work

Flow 1b — Task Assigned, Agent Busy

Orchestrator sets assigned_to on New task
  |
[SE-1] → IE-01 → [OTM-0] → routes to OTM-2
  |
[OTM-2] checks registry: agent is BUSY
  |  sw1_update_fields: status = "Pending"
  |  sw1_post_audit: "Status: Assigned → Pending (TT-03). Agent busy with <current_task>"
  |
Task waits silently. No agent notification.

Flow 2a — Subtask Done (not last)

Agent completes subtask → reports to OTM API (IE-02)
  |
[OTM-3] validates: subtask belongs to task, not already completed
  |  sw1_update_fields: subtask.todo_completed = true (Col00), subtask.status = done
  |  decrements subtasks_remaining (3 → 2)
  |  sw1_update_fields: subtasks_remaining
  |  sw1_post_audit: "Agent(<name>): Subtask done — <title>. Remaining: 2"
  |
Agent continues working.

Flow 2b — Last Subtask (task complete)

Agent completes final subtask → reports to OTM API (IE-02)
  |
[OTM-3] decrements (1 → 0)
  |  sw1_update_fields: status → "Agent Done", completed_at
  |  sw1_post_audit: "Status: In Progress → Agent Done (TT-04). All subtasks complete."
  |  executes AT-02: agent → idle
  |  notifies Orchestrator (OE-02)
  |  sw1_post_audit: "OTM → Orchestrator: Task ready for review"
  |  check_next_task_for_agent()
  |    → Pending task found? → TT-05 → OTM-0 → OTM-2 → Flow 1a
  |    → No pending? → agent stays idle

Flow 3a — Orchestrator Validates

Orchestrator tells OTM to validate task (IE-04)
  |
[OTM-4] verifies status = "Agent Done"
  |  sw1_post_audit: "Orchestrator: Validation requested"
  |  sw1_update_fields: status → "Done", validated_at, todo_completed = true (Col00)
  |  sw1_post_audit: "Status: Agent Done → Done (TT-06). Validated."
  |  notifies Orchestrator: confirmed (OE-02)

Flow 3b — Orchestrator Rejects and Requests Rework

Step 1: Orchestrator tells OTM to reject task (IE-05)
  |
[OTM-4] sw1_post_audit: "Orchestrator: Rejection. Reason: <reason>"
  |  sw1_update_fields: status → "Rejected"
  |  sw1_post_audit: "Status: Agent Done → Rejected (TT-07)"

Step 2: Orchestrator manages subtasks directly in Slack (NOT via OTM):
  |  - Deletes unnecessary/obsolete subtasks
  |  - Leaves completed subtasks in place (as record)
  |  - Creates new subtasks:
  |      1. "Acknowledge rework request: <detailed reason and instructions>"
  |      2. "Fix validation on email field"
  |      3. "Add unit tests for edge cases"

Step 3: Orchestrator tells OTM that rework is ready (IE-06)
  |
[OTM-4] receives rework signal:
  |  Counts unfinished subtasks via Slack API → 3
  |  sw1_update_fields: status → "Assigned", subtasks_remaining = 3
  |  sw1_post_audit: "Status: Rejected → Assigned (TT-08). Rework: 3 unfinished subtasks."

Step 4: Normal assignment flow (TT-02 → SURE → agent works)
  |
[OTM-0] detects Assigned → OTM-2 → agent idle → In Progress
  |  SURE notification posted to Slack task conversation (OE-01)
  |  Agent acknowledges (IE-03)
  |
Agent reads first subtask: "Acknowledge rework request: ..."
  |  Agent closes first subtask to acknowledge (IE-02)
  |  OTM-3 decrements: 3 → 2
  |  sw1_post_audit: "Agent(<name>): Acknowledged rework. Remaining: 2"
  |
Agent works through remaining subtasks normally

Flow 4 — Cancellation

Orchestrator tells OTM to cancel (IE-06) / Human changes status in Slack UI (IE-01)
  |
[OTM-0] routes to OTM-5
  |
[OTM-5] sw1_post_audit: "Cancellation requested by <actor>"
  |  sw1_update_fields: status → "Cancelled", cancelled_at = now
  |  executes AT-03: agent freed if was working
  |  sends cancellation notification (OE-04)
  |  sw1_post_audit: "Status: <previous> → Cancelled (TT-11)"
  |  check_next_task_for_agent() for freed agent

Flow 5 — Auto-Archival (TT-12)

[OTM-6 Watchdog] IE-08: cron tick fires (every 60s)
  |
  +-- ACT-W4: Query tasks WHERE:
  |     (status = "Done" AND validated_at + 7 days < now)
  |     OR (status = "Cancelled" AND cancelled_at + 7 days < now)
  |
  +-- FOR EACH archival candidate:
        Store previous_status
        sw1_update_fields: status → "Archived"
        sw1_post_audit: "Status: <previous> → Archived (TT-12). Auto-archived after 7 days."
        INSERT INTO task_history (snapshot of task fields)
        DELETE task from active tasks (SQLite only — Slack List item untouched)

📌 TT-12 is a watchdog-requested transition (IE-08 trigger). The watchdog detects the 7-day threshold; the OTM executes the archive. Archived tasks are snapshotted to task_history for long-term reporting before being removed from the active tasks table.
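The ACT-W4 threshold test from Flow 5 can be sketched as a pure predicate (field names as used in the spec, timestamps assumed to be ISO 8601 strings):

```javascript
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

// True when a task qualifies for auto-archival (TT-12): Done tasks age
// from validated_at, Cancelled tasks from cancelled_at.
function isArchivalCandidate(task, now = Date.now()) {
  if (task.status === "Done" && task.validated_at) {
    return now - Date.parse(task.validated_at) > SEVEN_DAYS_MS;
  }
  if (task.status === "Cancelled" && task.cancelled_at) {
    return now - Date.parse(task.cancelled_at) > SEVEN_DAYS_MS;
  }
  return false; // other statuses never auto-archive
}
```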

📌 Slack archive limitation: The OTM sets status = "Archived" in Slack via SW-1, but the actual "Archive item" action in the Slack UI cannot be triggered via API (slackLists.items.archive does not exist). Manual Slack UI archiving is required for items to visually disappear from the default Slack list view.


PART 3 — FILE-BASED PIPELINE

The file-based pipeline is the operational implementation of Systems 1–4. It bridges the Orchestrator's task-creation workflow to the Slack List and agent workspaces, using JSON files as the intermediary to avoid direct Slack API calls by agents.

System 1: Task Creation (otm-create-task.sh)

The Orchestrator (Claudia) creates tasks by running a shell script. This populates a JSON file and atomically increments the task ID counter.

Script Parameters

| Parameter | Required | Default | Description |
|---|---|---|---|
| --title | yes | | Short, actionable one-liner |
| --agent | yes | | Who works on it: claudia, devdas, archibald, frederic, salvatore, sylvain, rupert |
| --priority | no | normal | critical \| high \| normal \| medium \| low \| batchable |
| --project | no | (none) | Free-text project name (e.g. prj-012, fab-state) |
| --type | no | action | action \| decision \| review |
| --subtask | no | "Confirm that task has been done" | Repeatable. Each value becomes a Slack subtask. If none provided, a default confirmation subtask is auto-added (required for completion detection) |

Examples

# Simple action task
otm-create-task.sh --title "Fix login bug" --agent devdas --priority high --project prj-012

# Decision for Rupert
otm-create-task.sh --title "Approve budget for Q2" --agent rupert --type decision

# Task with subtasks
otm-create-task.sh --title "Build login page" --agent devdas --project prj-012 \
  --subtask "Create login form component" \
  --subtask "Add validation logic" \
  --subtask "Write unit tests"

# Input/acknowledgement pattern (replaces --description)
otm-create-task.sh --title "Review API design" --agent frederic --type review \
  --subtask "input: the API spec is at docs/api-v2.md" \
  --subtask "Check endpoint naming conventions" \
  --subtask "Validate error response format"

# Batchable priority (can wait for batch processing)
otm-create-task.sh --title "Update documentation" --agent archibald --priority batchable

JSON File Format (intermediate)

Written to ~/Library/Application Support/OpenClaw/otm/new-tasks/<timestamp>-<uuid>.json

{
  "id": "uuid",
  "taskId": "T-00035",
  "title": "Short task title",
  "agent": "devdas",
  "createdAt": "2026-03-14T08:20:19Z",
  "priority": "normal",
  "project": "prj-012",
  "subtasks": ["Subtask 1", "Subtask 2"],
  "type": "action",
  "status": "pending"
}

Task ID System

  • Format: T-NNNNN (5 digits, zero-padded)
  • Counter file: ~/Library/Application Support/OpenClaw/otm/next-task-id.json
  • Atomic increment with file locking (flock)
  • Auto-assigned by otm-create-task.sh — agents don't manage IDs manually

Type Values

| Type | Purpose | Primary user |
|---|---|---|
| action | Task to execute (default) | Agents |
| decision | Requires a decision from someone | Rupert |
| review | Needs review / approval | Rupert or agents |

System 2: Task Injection (otm-injector.js)

The injector watches the new-tasks/ directory and publishes JSON task files to the Slack Lists API.

Triggers

| Component | File | Trigger |
|---|---|---|
| Watcher | ai.openclaw.otm-watcher.plist | WatchPaths on new-tasks/ (and task-updates/) |
| Sweeper | ai.openclaw.otm-sweeper.plist | Every 10 min (catches misses) |

Idempotency (Dedup Check)

The injector includes a dedup check before creating tasks in Slack to prevent duplicates on crash/retry:

  1. Before calling slackLists.items.create, the injector fetches all existing items
  2. Scans for a matching Task ID (Col0ALVK2NA1E)
  3. If the Task ID already exists → skips creation, moves file to processed/ with _skipped_duplicate: true
  4. If the dedup API call fails → proceeds with creation (better to duplicate than lose a task)

This prevents duplicate Slack items when the watcher or sweeper re-processes a file already injected (e.g., after a crash, retry, or race condition).
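The dedup scan in step 2 reduces to a simple predicate once the items are fetched. The shape below assumes fields keyed by column ID, which is an illustration rather than the exact API payload:

```javascript
// True if any existing Slack item already carries this Task ID
// in the Task ID column (Col0ALVK2NA1E).
function isDuplicate(existingItems, taskId) {
  return existingItems.some(
    (item) => item.fields && item.fields.Col0ALVK2NA1E === taskId
  );
}
```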

Directories

| Path | Purpose |
|---|---|
| ~/Library/Application Support/OpenClaw/otm/new-tasks/ | Inbox — pending task files |
| ~/Library/Application Support/OpenClaw/otm/task-updates/ | Inbox — update files (System 3b) |
| ~/Library/Application Support/OpenClaw/otm/processed/ | Successfully injected |
| ~/Library/Application Support/OpenClaw/otm/failed/ | Failed (with error metadata) |

System 3: Task Dispatcher (otm-dispatcher.js)

A lightweight Node.js scanner that detects tasks with an assignee and dispatches them to the appropriate agent workspace. Runs every 2 minutes via launchd.

What it does

  1. Fetches all tasks from the Slack list (F0ALE8DCW1F)
  2. Finds tasks where: status = "new" AND assignee is set (not empty)
  3. For each matching task (in this exact order for crash safety):
    1. Writes a dispatch file to the agent's workspace: /Volumes/OPENCLAW/CLAUDIA/rapido-openclaw/workspaces/<agent>-workspace/task-dispatch.json
    2. Updates the task's status in Slack from new → assigned + sets assigned_at (this is the dedup gate — once assigned, future runs skip it)
    3. Triggers the agent session via Gateway WebSocket RPC
  4. After all tasks processed, all agent triggers fire in parallel — agents start concurrently
  5. Writes its own component state file (System 5 heartbeat)

Agent Triggering via Gateway WebSocket RPC

After writing task-dispatch.json, the dispatcher actively triggers each agent via the OpenClaw Gateway WebSocket RPC. This eliminates the passive "wait for heartbeat" gap — agents start working immediately.

Protocol: WebSocket JSON-RPC to ws://127.0.0.1:18789

Dispatcher                    Gateway                     Agents
  │                              │                          │
  │ ws.connect()                 │                          │
  │─────────────────────────────►│                          │
  │                              │                          │
  │ { method: "agent",           │                          │
  │   params: {                  │                          │
  │     agentId: "archibald",    │                          │
  │     message: "Task T-00044   │                          │
  │       dispatched...",        │                          │
  │     idempotencyKey:          │                          │
  │       "otm-T-00044"         │                          │
  │   }}                         │                          │
  │─────────────────────────────►│ ──► archibald session ──►│
  │                              │                          │
  │ { method: "agent", ...       │                          │
  │   agentId: "devdas" }        │                          │
  │─────────────────────────────►│ ──► devdas session ─────►│
  │                              │                          │
  │ { method: "agent", ...       │                          │
  │   agentId: "salvatore" }     │                          │
  │─────────────────────────────►│ ──► salvatore session ──►│
  │                              │                          │
  │ ws.close()                   │     (all 3 run in        │
  │─────────────────────────────►│      parallel)           │
  │                              │                          │
  │  Dispatcher exits.           │  Gateway manages         │
  │  Total time: ~2 seconds.     │  concurrent sessions.    │

RPC call format:

{
  "method": "agent",
  "params": {
    "message": "Task T-00044 dispatched. Check task-dispatch.json and execute it.",
    "agentId": "archibald",
    "idempotencyKey": "otm-T-00044"
  }
}

Response (immediate, non-blocking):

{
  "runId": "otm-T-00044",
  "status": "accepted",
  "acceptedAt": 1773504558104
}

Key properties:

  • Parallel: All agent sessions start concurrently — no serialization
  • Non-blocking: Gateway returns accepted immediately; dispatcher doesn't wait
  • Auth: Gateway token passed via WebSocket connection headers
  • Rupert excluded: Human users are notified via Slack UI, not RPC

⚠️ idempotencyKey is NOT idempotent. Despite the name, the Gateway agent RPC method accepts duplicate calls with the same key — it uses the key as a runId label only. Calling twice with the same key = two separate agent sessions. Idempotency is the dispatcher's responsibility, not the gateway's. The Slack status flip (new → assigned) is the sole dedup mechanism for the dispatcher. Once a task is assigned, it's invisible to future dispatcher runs.

vs. HTTP Webhook alternative (POST /hooks/agent): The webhook approach serializes agent sessions on CommandLane.Nested — agents run one at a time. With 6 agents × ~3 min each = ~18 min sequential vs. ~3 min parallel via WebSocket RPC. WebSocket is the correct choice for multi-agent dispatch.

Dispatch Operation Order (crash safety)

The dispatcher must execute operations in this exact order to prevent duplicate agent triggers:

1. Write task-dispatch.json to agent workspace
2. Update Slack status: new → assigned (+ set assigned_at)
3. Trigger agent via WS RPC

Why this order matters:

| Crash point | Result | Recovery |
|---|---|---|
| After step 1, before step 2 | File written, Slack still new | Next dispatcher run re-dispatches → file already has the task (append-only), agent gets triggered. Safe but duplicate file entry. |
| After step 2, before step 3 | Slack says assigned, agent never woke up | Agent picks up task on next heartbeat or manual trigger. Safe — delayed but not lost. |
| After step 3 | All done | Clean path. |

The dangerous alternative (trigger agent FIRST, then update Slack) risks: crash after trigger → Slack still new → next run triggers agent AGAIN → duplicate work, wasted tokens. This is why Slack update must come before agent trigger.
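The ordering contract can be captured in a small harness. The three step functions are injected here purely so the order is visible and testable; the names are illustrative, not the dispatcher's real internals:

```javascript
// Crash-safe dispatch ordering: file write, then the Slack dedup gate,
// then the agent trigger. Never trigger before flipping the status.
async function dispatchTask(task, { writeDispatchFile, updateSlackStatus, triggerAgent }) {
  await writeDispatchFile(task);             // 1. harmless to redo on retry
  await updateSlackStatus(task, "assigned"); // 2. dedup gate: hides task from next run
  await triggerAgent(task);                  // 3. wake the agent last
}
```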

Dispatch File Format (task-dispatch.json)

{
  "dispatched": [
    {
      "taskId": "T-00033",
      "title": "Build login page",
      "priority": "high",
      "project": "PRJ-012 App",
      "type": "action",
      "subtasks": ["Create form", "Add validation", "Write tests"],
      "dispatchedAt": "2026-03-14T15:00:00Z",
      "slackItemId": "Rec0ALXYZ"
    }
  ]
}

The dispatch file is APPEND-ONLY — new tasks get added to the dispatched array. The agent removes entries when they pick them up (or marks them as "picked": true).

Agent-Side Convention

On session start, agents check for task-dispatch.json. If present, they:

  1. Pick up the highest-priority task
  2. Update their agent-state.json to "working" with the task
  3. Start working on it
  4. When done, mark subtasks complete via otm-update-task.sh

Agent Workspace Mapping

| Agent | Workspace path |
|---|---|
| claudia | claudia-workspace |
| devdas | devdas-workspace |
| archibald | archibald-workspace |
| frederic | frederic-workspace |
| salvatore | salvatore-workspace |
| sylvain | sylvain-workspace |
| rupert | (skip — human, notified via Slack UI) |

Component

| Item | Detail |
|---|---|
| File | otm-dispatcher.js |
| Plist | ai.openclaw.otm-dispatcher.plist |
| Schedule | Every 120 seconds |
| Log | ~/Library/Logs/OpenClaw/otm-dispatcher.log |
| State file | otm-dispatcher-state.json |

System 3b: Task Updates (otm-update-task.sh)

A file-drop mechanism that allows agents to update task status and mark subtasks complete — same security model as task creation (agents never touch the Slack API directly).

Script: otm-update-task.sh

# Mark a subtask done
otm-update-task.sh --task-id T-00033 --subtask-done "Create login form"

# Update task status
otm-update-task.sh --task-id T-00033 --status in_progress

# Report blocked
otm-update-task.sh --task-id T-00033 --status blocked --reason "Waiting on API key"

# Multiple subtasks done at once
otm-update-task.sh --task-id T-00033 \
  --subtask-done "Create login form" \
  --subtask-done "Add validation logic"

Parameters

| Parameter | Required | Description |
|---|---|---|
| --task-id | yes | Task ID (T-NNNNN) |
| --status | no | New status: in_progress \| blocked \| agent_done |
| --subtask-done | no | Subtask title to mark as done (repeatable) |
| --reason | no | Reason text (used with --status blocked) |

At least one of --status or --subtask-done is required.

JSON Format (intermediate)

Written to ~/Library/Application Support/OpenClaw/otm/task-updates/<timestamp>-<uuid>.json

{
  "id": "uuid",
  "taskId": "T-00033",
  "action": "update",
  "createdAt": "2026-03-14T16:00:00Z",
  "status": "in_progress",
  "subtasksDone": ["Create login form"],
  "reason": null
}

Processing

otm-injector.js is extended to also watch the task-updates/ directory:

  1. Reads the update JSON
  2. Looks up the task in Slack by Task ID (scans items, matches Col0ALVK2NA1E)
  3. If status is set → updates the Status column
  4. If subtasksDone is set → finds matching child items by title → sets their status to done AND sets Col00 (checkbox) to true
  5. Moves file to processed/ on success, failed/ on error

Component

| Item | Detail |
|---|---|
| Script | otm-update-task.sh (shared tool) |
| Processor | otm-injector.js (extended) |
| Watcher | ai.openclaw.otm-watcher.plist (updated WatchPaths) |

System 4: Completion Detection (otm-completion-detector.js)

A lightweight scanner that auto-promotes tasks to agent_done when all subtasks are complete.

Every task has at least one subtask: otm-create-task.sh auto-adds "Confirm that task has been done" if no subtasks are provided. This guarantees the completion detector always has a signal.

Flow

Completion Detector (Node.js, launchd every 2 min)
  │
  ├── GET all tasks from Slack list
  ├── FILTER: status = "in_progress" + has child items (subtasks)
  ├── CHECK: all child items have status ∈ {done, agent_done} OR Col00 = true
  │
  └── YES → UPDATE parent task status → "agent_done"
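The promotion check above reduces to a predicate over the parent task and its child items; the shapes below are simplified assumptions, with col00 standing in for the checkbox column:

```javascript
// True when an in_progress task should be promoted to agent_done:
// it has subtasks, and every one is done/agent_done or checked (Col00).
function readyForAgentDone(task) {
  if (task.status !== "in_progress") return false;       // not active
  if (!task.subtasks || task.subtasks.length === 0) return false; // no signal
  return task.subtasks.every(
    (st) => ["done", "agent_done"].includes(st.status) || st.col00 === true
  );
}
```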

Rules

| Condition | Action |
|---|---|
| Task in_progress + all subtasks done/agent_done | → promote to agent_done |
| Task in_progress + some subtasks still open | → skip (work in progress) |
| Task not in_progress | → skip (not active) |

What happens after agent_done

  • Technical tasks: Claudia validates the work → done
  • Business/decision tasks: Claudia creates a review task for Rupert → Rupert approves → done

Component

| Item | Detail |
|---|---|
| File | otm-completion-detector.js |
| Plist | ai.openclaw.otm-completion-detector.plist |
| Schedule | Every 120 seconds |
| Log | ~/Library/Logs/OpenClaw/otm-completion-detector.log |

System 5: Component Heartbeats

Each OTM component writes a state file after every run. These files are watched by the collector and surfaced on the dashboard.

State File Format

{
  "component": {
    "id": "otm-injector",
    "status": "alive",
    "lastRun": "2026-03-14T10:15:00Z",
    "result": "success",
    "details": "Processed 2 tasks, 0 failures"
  }
}

State Files

| Component | File | Written by |
|---|---|---|
| OTM Injector | otm-injector-state.json | otm-injector.js (end of each run) |
| Dispatcher | otm-dispatcher-state.json | otm-dispatcher.js (end of each run) |
| Completion Detector | otm-completion-detector-state.json | otm-completion-detector.js (end of each run) |
| Watcher | otm-watcher-state.json | watcher wrapper |

All files: ~/Library/Application Support/OpenClaw/otm/

Dashboard

The OTM Components card shows each component's status with staleness color coding:

  • 🟢 Green: last run < 5 min ago
  • 🟡 Yellow: last run 5–15 min ago
  • 🔴 Red: last run > 15 min ago (action required)
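The color coding is a pure function of the last-run timestamp; a sketch:

```javascript
// Map a component's lastRun timestamp to the dashboard staleness color.
function stalenessColor(lastRunIso, now = Date.now()) {
  const mins = (now - Date.parse(lastRunIso)) / 60000;
  if (mins < 5) return "green";   // fresh
  if (mins <= 15) return "yellow"; // getting stale
  return "red";                   // action required
}
```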

Observability Stack

State file → collector.js (FSEvents) → SQLite otm_state table
                                             ↓
                               reader.js polls + broadcasts
                                             ↓
                                    Dashboard WebSocket

System 6: DMZ Relay

The DMZ relay bridges the private OpenClaw state to the public Vercel dashboard. It runs on a Synology NAS in the DMZ.

Architecture

OpenClaw VM (private)     Synology NAS (DMZ)           Browser (Vercel dashboard)
  │                         ┌──────────────────┐
  │  collector.js           │  RECEIVER         │
  │  pushToRelay()          │  127.0.0.1:3456   │
  │ ── HTTP POST ─────────► │  + bearer token   │
  │  on state change         │  + atomic write   │
  │                          │                   │
  │                          │  fab-state.json   │ ← shared state file
  │                          │                   │
  │                          │  BROADCASTER      │
  │                          │  0.0.0.0:3457     │
  │                          │  (TLS via proxy)  │ ◄── wss://nas.domain/ws
  │                          │  fs.watch → push  │ ◄── GET /api/state
  │                          └──────────────────┘

Services

| Service | File | Port | Binding | Dependencies |
|---|---|---|---|---|
| Receiver | receiver.js | 3456 | 127.0.0.1 (localhost only) | Zero — pure Node.js |
| Broadcaster | broadcaster.js | 3457 | 0.0.0.0 (behind TLS proxy) | ws package only |

Env Vars

| Service | Var | Required | Description |
|---|---|---|---|
| Receiver | FAB_RELAY_TOKEN | yes | Shared secret bearer token |
| Receiver | STATE_FILE | no | Path to state file (default: ./fab-state.json) |
| Broadcaster | STATE_FILE | no | Same state file path |
| Collector (OpenClaw) | FAB_RELAY_URL | no | If set, enables relay push |
| Collector (OpenClaw) | FAB_RELAY_TOKEN | no | Must match receiver token |

Security Model

| Layer | Protection |
|---|---|
| Bearer token | Constant-time comparison (timing-attack safe) |
| Receiver binding | 127.0.0.1 — not reachable from internet |
| Firewall | Port 3456: allow ONLY from OpenClaw VM IP |
| Broadcaster | Read-only; no auth needed (non-sensitive data) |
| TLS | All external traffic via Synology reverse proxy + Let's Encrypt |
| Atomic write | Receiver writes .tmp → rename; no partial reads |

Collector Integration

// Only runs if FAB_RELAY_URL is set:
pushStateToRelay(); // called after every gateway/agent/OTM state change

The pushToRelay() function in collector.js:

  1. Builds full snapshot from SQLite (gateway + agents + OTM)
  2. POSTs to FAB_RELAY_URL with bearer token
  3. Handles errors gracefully — relay down = warning log, not crash

Files

File Location
receiver.js work/PROJECTS/fab-state/synology-relay/
broadcaster.js work/PROJECTS/fab-state/synology-relay/
package.json work/PROJECTS/fab-state/synology-relay/
SETUP-GUIDE.md work/PROJECTS/fab-state/synology-relay/

See work/PROJECTS/strategy-openclaw-org/docs/FAB-STATE.md System 6 for full documentation.

Full Pipeline Flow

Claudia              Filesystem            Injector         Slack            Dispatcher           Agent Workspace
  │                     │                     │               │                  │                      │
  │ otm-create-task.sh  │                     │               │                  │                      │
  │────────────────────►│ .json               │               │                  │                      │
  │                     │─── watcher ────────►│               │                  │                      │
  │                     │                     │ items.create  │                  │                      │
  │                     │                     │──────────────►│ status=new       │                      │
  │                     │                     │  + subtasks   │                  │                      │
  │                     │ move to processed/  │               │                  │                      │
  │                     │◄────────────────────│               │                  │                      │
  │                     │                     │               │                  │                      │
  │                     │                     │               │ ◄── scan ───────│ (every 2 min)        │
  │                     │                     │               │ new + assignee   │                      │
  │                     │                     │               │ ── update ──────►│                      │
  │                     │                     │               │ status=assigned  │                      │
  │                     │                     │               │                  │ task-dispatch.json   │
  │                     │                     │               │                  │─────────────────────►│
  │                     │                     │               │                  │                      │
  │                     │                     │               │                  │ WS RPC: agent()      │
  │                     │                     │               │                  │──► Gateway ──► Agent │
  │                     │                     │               │                  │   (parallel start)   │
  │                     │                     │               │                  │                      │
  │                     │                     │               │                  │      (agent works)   │
  │                     │                     │               │                  │                      │
  │                     │                     │               │ ◄── scan (completion detector, 2 min)  │
  │                     │                     │               │ in_progress +    │                      │
  │                     │                     │               │ all subtasks done│                      │
  │                     │                     │               │ → agent_done     │                      │

All Pipeline Components Summary

| Component | File | Trigger |
|---|---|---|
| Task creator | otm-create-task.sh | Called by agents |
| Injector | otm-injector.js | Called by watcher/sweeper |
| Watcher | ai.openclaw.otm-watcher.plist | WatchPaths on new-tasks/ + task-updates/ |
| Sweeper | ai.openclaw.otm-sweeper.plist | Every 10 min (catches misses) |
| Dispatcher | otm-dispatcher.js | Every 2 min (launchd) |
| Completion detector | otm-completion-detector.js | Every 2 min (launchd) |

8. Error Monitor (OTM-7)

The Error Monitor is a dedicated OTM component that detects state inconsistencies, traces errors for lessons learned, and triggers corrective actions. It runs as part of the Watchdog cycle (IE-08, every 60s) but is logically separate from the Watchdog's operational checks (OTM-6).

8.1 Error Condition Catalogue

Each error condition is coded ERR-xx. The Error Monitor detects; the OTM corrects.

| Code | Condition | Detection Rule | Severity | Auto-correction | Manual escalation |
|---|---|---|---|---|---|
| ERR-01 | Stale In Progress | Task status = "In Progress" AND no IE-02 subtask report in >10 minutes | Warning | None — alert only | OE-05: Admin notified. May indicate agent crash, stuck task, or slow work. |
| ERR-02 | Agent-Task Mismatch (busy agent, no task) | Agent status = busy AND current_task not found in Slack List (or task status ≠ In Progress) | Critical | Set agent idle (AT-04), call check_next_task_for_agent() | OE-05: Admin alert with details of orphaned agent state |
| ERR-03 | Agent-Task Mismatch (idle agent, active task) | Task status = "In Progress" AND assigned agent status = idle in registry | Critical | Set agent busy, re-send SURE notification (OE-01) | OE-05: Admin alert — state was inconsistent |
| ERR-04 | Orphaned Pending Task | Task status = "Pending" AND assigned agent status = idle | High | Promote task: TT-05 → re-evaluate via OTM-2 | OE-07: Audit log on task |
| ERR-05 | Counter Mismatch | subtasks_remaining ≠ actual count of unchecked subtasks in Slack List | High | Recalculate and fix counter via SW-1 | OE-05: Admin alert with old/new values |
| ERR-06 | SURE Timeout | OE-01 sent >3 minutes ago (1 min + 2 min retries) AND no IE-03 received | Critical | None — task stays In Progress | OE-05: Admin alert. Agent may be unreachable, or gateway may have restarted. |
| ERR-07 | Multiple Active Tasks per Agent | Agent has >1 task with status = "In Progress" assigned to them | Critical | Keep oldest task, move others to Pending | OE-05: Admin alert — invariant violation |
| ERR-08 | Stuck in Assigned | Task status = "Assigned" for >5 minutes (should transition immediately to In Progress or Pending) | High | Re-trigger OTM-2 for the task | OE-05: Admin alert if re-trigger fails |
| ERR-09 | Stuck in Rejected | Task status = "Rejected" for >24 hours (Orchestrator hasn't submitted rework or cancelled) | Warning | None — alert only | OE-05: Remind Orchestrator to act |
| ERR-10 | Ghost Agent | assigned_to field references a Slack user ID not in agent registry AND not auto-registerable | Critical | Task stays in current status | OE-05: Admin alert — unknown agent |
| ERR-11 | Duplicate Subtask Reports | Same subtask_id reported done >1 time (idempotency check caught it) | Info | Silently discarded (idempotent) | Logged in event log (§9) for pattern analysis |
| ERR-12 | Stale Rework (no ack subtask closed) | Task in "In Progress" after TT-08 rework, first subtask ("Acknowledge rework…") not closed within 10 minutes | Warning | None — alert only | OE-05: Agent may not have read rework instructions |
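Several of these corrections reduce to pure data transformations. As one illustration, here is a minimal sketch of the ERR-07 auto-correction (keep the oldest "In Progress" task, demote the rest to Pending); the `ActiveTask` shape and function name are illustrative, not the real OTM-7 implementation:

```typescript
// Hypothetical helper: given all "In Progress" tasks assigned to one agent,
// return the task IDs that should be moved to Pending (ERR-07 correction).
interface ActiveTask {
  task_id: string;
  assigned_at: number; // ms epoch
}

function tasksToDemote(active: ActiveTask[]): string[] {
  if (active.length <= 1) return []; // single-task invariant already holds
  const sorted = [...active].sort((a, b) => a.assigned_at - b.assigned_at);
  return sorted.slice(1).map(t => t.task_id); // everything but the oldest → Pending
}
```

The actual status writes would then go through SW-1, as required by the state-integrity rules.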

8.2 Error Monitor Processing

OTM-7 runs every 60s (piggybacks on IE-08 watchdog cron):
  |
  FOR EACH error check ERR-01 through ERR-12:
    |
    +-- Run detection query (SQLite + Slack API as needed)
    |
    +-- IF condition detected:
    |     1. Log error to event_log table (§9): {error_code, task_id, agent_id, details, timestamp}
    |     2. Log to task conversation (OE-07): "[timestamp] OTM-7 ERROR: ERR-xx detected — <description>"
    |     3. IF auto-correctable: execute correction, log correction action
    |     4. IF manual escalation: send OE-05 alert to admin
    |     5. Increment error counter in error_stats table (§10)
    |
    +-- IF condition NOT detected: skip
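Most detection rules are pure timestamp or state comparisons. A sketch of the ERR-06 (SURE timeout) check, assuming the `sure_pending` row shape from §10.3 — the function name is a hypothetical helper, not the actual OTM-7 code:

```typescript
// ERR-06: OE-01 sent >3 minutes ago AND no IE-03 acknowledgement received.
interface SurePendingRow {
  task_id: string;
  sent_at: number;                // ms epoch of first OE-01
  acknowledged_at: number | null; // set when IE-03 received; NULL = pending
}

const SURE_TIMEOUT_MS = 3 * 60 * 1000; // 1 min first wait + 2 min retry window

function isSureTimedOut(row: SurePendingRow, nowMs: number): boolean {
  if (row.acknowledged_at !== null) return false; // acknowledged — no error
  return nowMs - row.sent_at > SURE_TIMEOUT_MS;
}
```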

8.3 Lessons Learned Pipeline

Errors are not just fixed — they feed a continuous improvement loop.

  1. Error Statistics Table (error_stats in error DB, see §10): Tracks frequency, first/last occurrence, auto-correction success rate per ERR-xx code.
  2. Daily Error Report: OTM-7 generates a summary during the first watchdog cycle after 00:00 each day:
    • Error counts by code (ERR-01 through ERR-12)
    • Most frequent errors
    • Auto-correction success/failure ratio
    • New error patterns (first-time occurrences)
  3. Threshold Alerts: Alert on EVERY error (threshold = 1). During startup and early operation, all anomalies are surfaced immediately via OE-05. The threshold can be raised once the system is stable and baseline error rates are understood.
  4. Root Cause Tagging: Admin can tag errors with root cause via OTM API (POST /api/otm/error/{id}/tag), enabling aggregate analysis.
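The daily report in step 2 can be generated directly from `error_stats` rows. A sketch assuming the row shape from §10.3; the output format here is illustrative only:

```typescript
// Hypothetical report formatter: one line per error code seen in the last
// 24 hours, most frequent first.
interface ErrorStatRow {
  error_code: string;
  last_24h_count: number;
  auto_corrected_count: number;
  escalated_count: number;
}

function dailyErrorReport(stats: ErrorStatRow[]): string[] {
  return stats
    .filter(s => s.last_24h_count > 0)                    // only codes seen today
    .sort((a, b) => b.last_24h_count - a.last_24h_count)  // most frequent first
    .map(s =>
      `${s.error_code}: ${s.last_24h_count} in 24h ` +
      `(auto-corrected ${s.auto_corrected_count}, escalated ${s.escalated_count})`
    );
}
```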

8.4 Corrective Action Summary

| Action | Triggered by | Effect |
|---|---|---|
| Re-send SURE notification | ERR-03 (idle agent, active task) | Resynchronise agent with its task |
| Promote pending task | ERR-04 (orphaned pending) | Unblock queued work |
| Recalculate counter | ERR-05 (counter mismatch) | Fix data integrity |
| Free orphaned agent | ERR-02 (busy agent, no task) | Unblock agent for new work |
| Re-trigger OTM-2 | ERR-08 (stuck in Assigned) | Retry the assignment flow |
| Move excess tasks to Pending | ERR-07 (multiple active tasks) | Restore single-task invariant |

9. Event Logging & Observability

All OTM events are logged to three complementary systems:

  1. Slack task conversation (OE-07) — human-readable, per-task, searchable in Slack (§7)
  2. Internal event log (event_log table in error DB) — structured, queryable, machine-readable
  3. Internal event log files — filesystem mirrors of database writes, for real-time tailing during tests and startup (§9.6)

9.1 Event Log Schema

CREATE TABLE event_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,          -- Unix timestamp (ms precision)
  event_type TEXT NOT NULL,            -- 'inbound' | 'outbound' | 'transition' | 'error' | 'correction' | 'system'
  event_code TEXT NOT NULL,            -- IE-xx, OE-xx, TT-xx, AT-xx, ERR-xx
  task_id TEXT,                        -- Slack List item ID (NULL for system events)
  agent_id TEXT,                       -- slack_user_id (NULL if not agent-related)
  handler TEXT,                        -- OTM-0 through OTM-7, SE-1, SW-1
  source TEXT NOT NULL,                -- 'se1', 'otm_api', 'watchdog', 'error_monitor', 'internal'
  detail TEXT,                         -- JSON blob with event-specific data
  duration_ms INTEGER,                 -- Processing time for this event
  success INTEGER DEFAULT 1,           -- 1 = success, 0 = failure
  error_message TEXT                   -- Error details if success = 0
);

CREATE INDEX idx_event_log_task ON event_log(task_id);
CREATE INDEX idx_event_log_agent ON event_log(agent_id);
CREATE INDEX idx_event_log_type ON event_log(event_type, timestamp);
CREATE INDEX idx_event_log_code ON event_log(event_code, timestamp);
CREATE INDEX idx_event_log_time ON event_log(timestamp);

9.2 What is Logged

Every IE-xx, OE-xx, TT-xx, AT-xx, and ERR-xx event produces one row in event_log. This includes:

| Event Category | Examples | Logged Fields |
|---|---|---|
| Inbound events | IE-01 (Slack event), IE-02 (subtask report), IE-03 (SURE ack) | source, task_id, agent_id, raw payload in detail |
| Outbound events | OE-01 (SURE notification), OE-06 (Slack write) | target, task_id, delivery status, duration_ms |
| Task transitions | TT-01 through TT-15 | from_status, to_status, task_id, requesting_actor |
| Agent transitions | AT-01 through AT-04 | from_status, to_status, agent_id, triggering_task |
| Errors | ERR-01 through ERR-12 | error_code, detection_details, correction_applied |
| System events | Startup, reconciliation, watchdog cycle | cycle_number, checks_run, anomalies_found |

9.3 Observability Queries

The event log enables:

-- Task lifecycle: all events for a specific task
SELECT * FROM event_log WHERE task_id = ? ORDER BY timestamp;

-- Agent activity: all events involving a specific agent
SELECT * FROM event_log WHERE agent_id = ? ORDER BY timestamp;

-- Error frequency: last 24 hours
SELECT event_code, COUNT(*) as count
FROM event_log
WHERE event_type = 'error' AND timestamp > ?
GROUP BY event_code ORDER BY count DESC;

-- Average task completion time
SELECT AVG(e2.timestamp - e1.timestamp) / 1000 / 60 as avg_minutes
FROM event_log e1
JOIN event_log e2 ON e1.task_id = e2.task_id
WHERE e1.event_code = 'TT-02' AND e2.event_code = 'TT-04';

-- Slowest handlers (performance monitoring)
SELECT handler, AVG(duration_ms), MAX(duration_ms), COUNT(*)
FROM event_log
WHERE duration_ms IS NOT NULL
GROUP BY handler ORDER BY AVG(duration_ms) DESC;

-- SURE acknowledgement response times
SELECT AVG(ack.timestamp - notif.timestamp) / 1000 as avg_seconds
FROM event_log notif
JOIN event_log ack ON notif.task_id = ack.task_id
WHERE notif.event_code = 'OE-01' AND ack.event_code = 'IE-03';

9.4 Retention & Historicisation

| Data | Retention | Archive strategy |
|---|---|---|
| event_log (active) | 30 days | Rows older than 30 days → event_log_archive |
| event_log_archive | 1 year | Monthly SQLite dump to filesystem (gzipped) |
| error_stats | Indefinite | Cumulative counters, never purged |
| Slack conversation audit | Indefinite | Lives in Slack (Slack's retention policy applies) |

Maintenance job (runs during OTM-6 watchdog, daily at 03:00):

-- Move events older than 30 days to the archive
-- (timestamps are Unix ms; cutoff computed with SQLite's strftime)
INSERT INTO event_log_archive
  SELECT * FROM event_log
  WHERE timestamp < (strftime('%s', 'now') - 30 * 24 * 60 * 60) * 1000;

DELETE FROM event_log
  WHERE timestamp < (strftime('%s', 'now') - 30 * 24 * 60 * 60) * 1000;

-- Vacuum to reclaim space
VACUUM;

9.5 Why Not Sentry?

Internal structured logging (SQLite) is chosen over Sentry because:

  • No external dependency — OTM is self-contained
  • Queryable — SQL enables arbitrary analysis (Sentry requires its query language)
  • Correlated with task data — same DB, JOIN-able with agent registry
  • Low volume — estimated <10,000 events/day (see §10), no need for distributed tracing
  • Cost — zero (SQLite is free; Sentry has per-event pricing)
  • Privacy — all data stays on the OpenClaw server

If event volume exceeds 100,000/day or distributed tracing across multiple servers becomes needed, Sentry or OpenTelemetry would be reconsidered.

9.6 Filesystem Log File Mirroring

All database writes to event_log and error_stats are mirrored to two filesystem log files in real-time:

| File | Content | Format | Purpose |
|---|---|---|---|
| `{OPENCLAW_DATA_DIR}/otm/logs/otm-events.log` | All event_log inserts | `[ISO-timestamp] [event_code] [handler] [task_id] [agent_id] detail_json` | Monitor all OTM activity via `tail -f` |
| `{OPENCLAW_DATA_DIR}/otm/logs/otm-errors.log` | All ERR-xx detections + corrections | `[ISO-timestamp] [ERR-xx] [severity] [task_id] [agent_id] description [correction: action/none]` | Monitor errors during tests and startup |

Implementation: Every INSERT INTO event_log and every error detection in OTM-7 appends one line to the corresponding log file. This is a synchronous append (negligible overhead at <350 events/day).
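A sketch of the line formatting for `otm-events.log`, following the field layout in the table above; using `-` for absent fields is an assumption, not part of the spec:

```typescript
// Hypothetical mirror-line formatter for otm-events.log.
interface EventLine {
  timestamp: number;       // ms epoch
  event_code: string;      // IE-xx, OE-xx, TT-xx, AT-xx, ERR-xx
  handler: string | null;  // OTM-0..OTM-7, SE-1, SW-1
  task_id: string | null;
  agent_id: string | null;
  detail: string | null;   // JSON blob
}

function formatEventLine(e: EventLine): string {
  const f = (v: string | null) => v ?? "-";
  return `[${new Date(e.timestamp).toISOString()}] [${e.event_code}] ` +
         `[${f(e.handler)}] [${f(e.task_id)}] [${f(e.agent_id)}] ${e.detail ?? "{}"}`;
}
```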

Log rotation: Daily at 03:00, rename to otm-events.log.YYYY-MM-DD and otm-errors.log.YYYY-MM-DD. Keep 30 days of rotated files. Older files deleted automatically.

Usage during development/testing:

# Watch all OTM events in real time
tail -f {OPENCLAW_DATA_DIR}/otm/logs/otm-events.log

# Watch errors only
tail -f {OPENCLAW_DATA_DIR}/otm/logs/otm-errors.log

# Filter for a specific task
tail -f otm-events.log | grep "T-00042"

# Filter for a specific error code
tail -f otm-errors.log | grep "ERR-03"

📌 The log files are append-only mirrors — the database remains the source of truth for queries and analysis. The files exist purely for human monitoring convenience.


10. Database Design (SQLite)

10.1 Overview

The OTM uses two separate SQLite databases:

  1. Main OTM DB (otm.db) — Agent registry, SURE pending, task history. Core operational state.
  2. Error & Event DB (otm-errors.db) — Event log, error statistics. Monitoring and observability. Separated so that error monitoring is independent from the main OTM application and can be analysed, reset, or rebuilt without affecting operations.

Database files:

  • {OPENCLAW_DATA_DIR}/otm/otm.db — Main OTM DB
  • {OPENCLAW_DATA_DIR}/otm/otm-errors.db — Error & Event DB
  • (Sylvain to confirm exact OPENCLAW_DATA_DIR path)

Library: better-sqlite3 (synchronous, fast, WAL mode for both)

10.2 Tables

Main OTM DB (otm.db):

| Table | Purpose | Writer(s) | Reader(s) | Rows (steady state) | Growth rate |
|---|---|---|---|---|---|
| agents | Agent registry (§4.5) | OTM-1 | OTM-0, OTM-2, OTM-3, OTM-5, OTM-6 | 5–15 | Near-zero (new agents rare) |
| task_history | Snapshot of archived tasks (§Flow 5) | OTM-6 (TT-12) | Admin queries, reporting | Growing | ~20–50/month |
| sure_pending | Outstanding SURE notifications awaiting ack (§6) | OTM-2, OTM ack handler | OTM-7 (timeout check) | 0–5 | Transient (cleared on ack/timeout) |

Error & Event DB (otm-errors.db):

| Table | Purpose | Writer(s) | Reader(s) | Rows (steady state) | Growth rate |
|---|---|---|---|---|---|
| event_log | Structured event log (§9.1) | All OTM handlers | OTM-7, admin queries | ~10,000 | ~300/day (rotated after 30 days, §9.4) |
| event_log_archive | Archived events >30 days (§9.4) | Maintenance job | Admin queries only | ~100,000 | ~9,000/month |
| error_stats | Error frequency counters (§8.3) | OTM-7 | OTM-7, daily report | 12 rows (one per ERR-xx) | Fixed |

10.3 Full Schema

Main OTM DB (otm.db):

-- Agent Registry (see §4.5 for column details)
CREATE TABLE agents (
  slack_user_id TEXT PRIMARY KEY,
  otm_display_name TEXT NOT NULL,
  openclaw_agent_id TEXT,
  agent_type TEXT NOT NULL DEFAULT 'ai',
  status TEXT DEFAULT 'idle',
  current_task TEXT,
  task_started_at INTEGER,
  last_seen INTEGER
);

-- Task History (archived tasks snapshot)
CREATE TABLE task_history (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  task_id TEXT NOT NULL,               -- Original Slack List item ID
  title TEXT NOT NULL,
  assigned_to TEXT,                     -- slack_user_id
  final_status TEXT NOT NULL,          -- "Archived" (from Done or Cancelled)
  previous_status TEXT,                -- Status before archival
  priority INTEGER,
  context TEXT,
  subtask_count INTEGER,               -- Total subtasks at archival time
  created_at INTEGER,                  -- Task creation timestamp
  assigned_at INTEGER,
  completed_at INTEGER,
  validated_at INTEGER,
  cancelled_at INTEGER,
  archived_at INTEGER NOT NULL,        -- When TT-12 executed
  result_summary TEXT,
  total_duration_ms INTEGER,           -- assigned_at → completed_at
  review_duration_ms INTEGER,          -- completed_at → validated_at
  rework_count INTEGER DEFAULT 0,      -- Number of TT-08 rework cycles
  error_count INTEGER DEFAULT 0        -- Number of ERR-xx events during lifecycle
);

CREATE INDEX idx_task_history_agent ON task_history(assigned_to);
CREATE INDEX idx_task_history_status ON task_history(final_status);
CREATE INDEX idx_task_history_archived ON task_history(archived_at);

-- SURE Pending Notifications (see §6)
CREATE TABLE sure_pending (
  task_id TEXT PRIMARY KEY,
  agent_id TEXT NOT NULL,
  notification_type TEXT NOT NULL,     -- 'task_assigned' | 'rework_assigned'
  sent_at INTEGER NOT NULL,            -- First OE-01 sent
  retry_count INTEGER DEFAULT 0,       -- 0, 1, 2, 3 (max)
  last_retry_at INTEGER,
  acknowledged_at INTEGER              -- Set when IE-03 received. NULL = still pending.
);
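The retry columns above imply a small scheduling decision per watchdog cycle. A sketch of one plausible reading of the §6 schedule (re-send after 1 minute, declare ERR-06 after 3 minutes total); the action names are illustrative, and the schema's `retry_count` ceiling of 3 leaves room for a different retry cadence:

```typescript
// Hypothetical SURE scheduler: decide what to do with an unacknowledged
// notification on each 60 s cycle.
type SureAction = "wait" | "retry" | "timeout";

function nextSureAction(sentAt: number, retryCount: number, nowMs: number): SureAction {
  const elapsed = nowMs - sentAt;
  if (elapsed > 3 * 60_000) return "timeout";               // 3 min total → ERR-06
  if (retryCount === 0 && elapsed > 60_000) return "retry"; // first re-send after 1 min
  return "wait";
}
```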

Error & Event DB (otm-errors.db):

-- Event Log (see §9.1)
CREATE TABLE event_log (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_code TEXT NOT NULL,
  task_id TEXT,
  agent_id TEXT,
  handler TEXT,
  source TEXT NOT NULL,
  detail TEXT,
  duration_ms INTEGER,
  success INTEGER DEFAULT 1,
  error_message TEXT
);

CREATE INDEX idx_event_log_task ON event_log(task_id);
CREATE INDEX idx_event_log_agent ON event_log(agent_id);
CREATE INDEX idx_event_log_type ON event_log(event_type, timestamp);
CREATE INDEX idx_event_log_code ON event_log(event_code, timestamp);
CREATE INDEX idx_event_log_time ON event_log(timestamp);

-- Event Log Archive (identical schema)
CREATE TABLE event_log_archive (
  id INTEGER PRIMARY KEY,
  timestamp INTEGER NOT NULL,
  event_type TEXT NOT NULL,
  event_code TEXT NOT NULL,
  task_id TEXT,
  agent_id TEXT,
  handler TEXT,
  source TEXT NOT NULL,
  detail TEXT,
  duration_ms INTEGER,
  success INTEGER DEFAULT 1,
  error_message TEXT
);

-- Error Statistics (see §8.3)
CREATE TABLE error_stats (
  error_code TEXT PRIMARY KEY,         -- ERR-01 through ERR-12
  total_count INTEGER DEFAULT 0,
  last_24h_count INTEGER DEFAULT 0,    -- Reset daily by maintenance job
  first_seen INTEGER,                  -- Unix timestamp
  last_seen INTEGER,                   -- Unix timestamp
  auto_corrected_count INTEGER DEFAULT 0,
  escalated_count INTEGER DEFAULT 0
);

10.4 Database Maintenance

| Operation | Frequency | Triggered by | Description |
|---|---|---|---|
| Event log rotation | Daily (03:00) | OTM-6 watchdog + time check | Move events >30 days to event_log_archive |
| Archive export | Monthly (1st, 03:30) | OTM-6 watchdog + date check | Dump event_log_archive to gzipped SQL file on disk, then TRUNCATE |
| Error stats reset | Daily (00:00) | OTM-6 watchdog + time check | Reset last_24h_count to 0 for all ERR-xx rows |
| SURE cleanup | Every 60s | OTM-7 error monitor | Remove sure_pending rows where acknowledged_at IS NOT NULL and >1 hour old |
| VACUUM | Weekly (Sunday 03:00) | OTM-6 watchdog + day check | Reclaim disk space after deletions |
| WAL checkpoint | Automatic | SQLite WAL mode | Handled by better-sqlite3 automatically |
| Backup | Daily (04:00) | Sylvain's backup cron | Copy otm.db to backup location (standard server backup) |

10.5 Volume Estimates

Assumptions: 5 active agents, ~10 tasks created/day, ~3 subtasks/task average, watchdog runs 1,440×/day.

Main OTM DB (otm.db):

| Table | Writes/day | Reads/day | Steady-state rows | Disk (est.) |
|---|---|---|---|---|
| agents | ~10 (status flips) | ~500 (every handler checks registry) | 5–15 | <1 KB |
| task_history | ~1–2 (archival events) | ~5 (reporting queries) | ~500/year | ~500 KB |
| sure_pending | ~20 (insert + update on ack) | ~1,440 (timeout checks) | 0–5 (transient) | <1 KB |
| Subtotal | ~30/day | ~1,950/day | ~520 | <1 MB |

Error & Event DB (otm-errors.db):

| Table | Writes/day | Reads/day | Steady-state rows | Disk (est.) |
|---|---|---|---|---|
| event_log | ~300 (all events) | ~50 (error monitor + queries) | ~9,000 (30-day window) | ~5 MB |
| event_log_archive | ~9,000/month (from rotation) | ~5/month (admin queries) | ~100,000 (1-year window) | ~50 MB |
| error_stats | ~20 (counter increments) | ~1,440 (every watchdog cycle) | 12 | <1 KB |
| Subtotal | ~320/day | ~1,500/day | ~109,000 | ~55 MB |

📌 At this scale, SQLite is well within its performance envelope for both databases. The separation means the error DB can be independently analysed, reset, or rebuilt without affecting OTM operations. A weekly VACUUM on each keeps files compact.

10.6 Historicisation Strategy

Main OTM DB (otm.db)
  ├── agents              — live state, small, never archived
  ├── sure_pending        — transient, cleaned hourly
  └── task_history        — growing archive of completed tasks

Error & Event DB (otm-errors.db)
  ├── event_log           — rolling 30-day window
  ├── event_log_archive   — rolling 1-year window
  └── error_stats         — cumulative counters, never purged

Filesystem log files (otm/logs/)
  ├── otm-events.log      — real-time event mirror (rotated daily, 30-day keep)
  └── otm-errors.log      — real-time error mirror (rotated daily, 30-day keep)

Monthly export (filesystem)
  └── {OPENCLAW_DATA_DIR}/otm/archive/
      ├── events-2026-01.sql.gz    — monthly event log dump from otm-errors.db
      ├── events-2026-02.sql.gz
      └── ...

Annual report (generated)
  └── Aggregate stats from task_history (otm.db) + error_stats (otm-errors.db)
      → Feeds into CMMI metrics collection

Non-Functional Requirements

Idempotency

  • All OTM handlers MUST be idempotent
  • Subtask completion: check todo_completed (Col00) field before processing (no separate dedup table)
  • State transitions: verify previous_status matches expected before applying
  • File pipeline dedup: injector checks existing Slack items by Task ID before creating
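The first two rules above can be sketched as pure checks; shapes and names are illustrative, not the OTM's actual API:

```typescript
// Hypothetical helpers for the idempotency rules: skip already-completed
// subtasks (todo_completed, Col00) and verify previous_status before a
// transition is applied.
interface SlackItem {
  todo_completed: boolean;
  status: string;
}

function shouldProcessSubtaskReport(item: SlackItem): boolean {
  return !item.todo_completed; // duplicate report → discard silently (ERR-11)
}

function checkedTransition(item: SlackItem, expectedFrom: string, to: string): SlackItem {
  if (item.status !== expectedFrom) {
    throw new Error(`stale transition: expected "${expectedFrom}", found "${item.status}"`);
  }
  return { ...item, status: to };
}
```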

State Integrity

  • previous_status MUST be set before every status change
  • All status writes go through OTM → SW-1. No direct Slack writes by any actor.
  • Watchdog requests transitions via OTM handler calls, not direct writes
  • todo_completed (Col00) MUST be set alongside status when marking items done

Audit Trail

  • Every OTM event logged in Slack task conversation feed with timestamp (§7)
  • The Slack conversation is the human-readable audit trail; structured events are additionally captured in event_log (§9)
  • All log entries include event code (IE-xx, OE-xx, TT-xx, AT-xx) for traceability

Persistence

  • Agent registry in SQLite
  • SQLite DB location: OpenClaw server data directory
  • Startup reconciliation from openclaw.json + Slack List on OTM restart (§4.6)

Latency

  • Event handling MUST complete within 5 seconds
  • Slack API writes SHOULD complete within 5 seconds
  • Agent notifications SHOULD be sent within 10 seconds
  • SURE acknowledgement timeout: 1 min (first), 2 min (retry), then error (3 min total)
  • Dispatcher runs every 2 min — maximum 2 min delay from task creation to agent trigger
  • Completion detector runs every 2 min — maximum 2 min delay from last subtask to agent_done

Error Handling

  • Slack API calls: retry up to 3 times with exponential backoff
  • Unhandled errors: alert admin via OE-05
  • Tasks MUST NOT be silently lost
  • All errors logged in task conversation feed (OE-07)
  • Dispatcher crash: Slack assigned status is the sole dedup gate (§System 3)
  • Gateway idempotencyKey does NOT provide real idempotency — dispatcher owns dedup

Security

  • OTM-4 (validate/reject) restricted to registered reviewer agents
  • All Slack API calls authenticated via bot tokens
  • DMZ relay uses bearer token + constant-time comparison (§System 6)
  • Receiver bound to 127.0.0.1 only

Technology Stack

| Component | Technology | Owner (who runs it) | Activity frequency | Data volume |
|---|---|---|---|---|
| OTM backend | TypeScript/Node.js, OpenClaw plugin pipeline | Devdas (builds), Sylvain (deploys) | Continuous — handles all events | ~350 events/day processed |
| File pipeline | Bash scripts + Node.js (injector, dispatcher, detector) | Devdas (builds), Sylvain (deploys) | launchd: watcher (FSEvents), sweeper (10 min), dispatcher (2 min), detector (2 min) | ~10 tasks/day through pipeline |
| SQLite DB | better-sqlite3, WAL mode | OTM (sole writer), Sylvain (backups) | ~350 writes/day, ~3,500 reads/day | ~55 MB steady state (see §10.5) |
| SE-1 (event listener) | Slack Events API, socket mode, Bolt SDK | Salvatore's Slack app (lists:read) | ~50 events/day (Slack → OTM) | <1 KB/event payload |
| SW-1 (writer) | Slack Web API (lists:write) | Salvatore's Slack app (called by OTM) | ~200 API calls/day (field updates + audit posts) | <1 KB/call |
| OTM API | OpenClaw hooks / HTTP endpoints | OTM (receives), Orchestrator + Agents (call) | ~100 API calls/day | <1 KB/call |
| Agent notifications | OpenClaw Gateway WS RPC (AI) / Slack task conversation (Human) | OTM (sends), Agents (receive) | ~20 notifications/day | <1 KB/notification |
| Watchdog + Error Monitor | OpenClaw cron (60s interval) | OTM-6 + OTM-7 (automatic) | 1,440 cycles/day | ~20 error checks/cycle |
| Event logging | SQLite event_log table (see §9) | OTM (writes), Admin (queries) | ~300 events/day, 30-day active window | ~5 MB active, ~50 MB archive |
| DMZ relay | Node.js receiver + broadcaster on Synology NAS | Sylvain (deploys) | On every state change | <1 KB/push |
| Testing | Vitest, mock Slack API | Devdas (writes + runs) | CI on every PR | |

Component Ownership Map

┌─────────────────────────────────────────────────────┐
│ Slack (Salvatore's Slack App)                       │
│   SE-1: lists:read (event listener)                 │
│   SW-1: lists:write (field updates + audit posts)   │
└──────────────┬──────────────────────────┬───────────┘
               │ IE-01                    ▲ OE-06, OE-07
               ▼                          │
┌─────────────────────────────────────────────────────┐
│ OTM (OpenClaw Plugin Pipeline)                      │
│   OTM-0: Event Router (internal)                    │
│   OTM-1: Agent Registry (internal)                  │
│   OTM-2: Handle Task Assigned                       │
│   OTM-3: Handle Subtask Done                        │
│   OTM-4: Task Validate/Reject                       │
│   OTM-5: Handle Task Cancelled                      │
│   OTM-6: Watchdog (cron, 60s)                       │
│   OTM-7: Error Monitor (cron, 60s)                  │
│                                                     │
│   SQLite DB: agents, event_log, error_stats,        │
│              task_history, sure_pending              │
└──────────────┬──────────────────────────┬───────────┘
               │ OE-01, OE-02, OE-04     ▲ IE-02, IE-03, IE-04–IE-07
               ▼                          │
┌─────────────────────────────────────────────────────┐
│ OpenClaw Agents                                     │
│   Orchestrator (Claudia): IE-04, IE-05, IE-06, IE-07│
│   Agents (Devdas, etc.): IE-02, IE-03              │
│   Human (Rupert): via Slack UI → SE-1 → IE-01      │
└─────────────────────────────────────────────────────┘

File Pipeline (Part 3):
┌─────────────────────────────────────────────────────┐
│ otm-create-task.sh → new-tasks/ → otm-injector.js  │
│   → Slack List (status=new)                         │
│   → otm-dispatcher.js → task-dispatch.json          │
│   → Gateway WS RPC → Agent sessions (parallel)      │
│   → otm-update-task.sh → task-updates/             │
│   → otm-completion-detector.js → agent_done        │
└─────────────────────────────────────────────────────┘

11. Cost Analysis

11.1 OTM Infrastructure Cost

| Component | Cost | Notes |
|---|---|---|
| Slack Pro | 1 user license | Already paid. Gives API access (SE-1 + SW-1). No per-API-call cost. |
| SQLite | $0 | Open source, embedded. No server, no license. |
| Node.js / TypeScript | $0 | Open source runtime. |
| Bolt SDK | $0 | Open source Slack SDK. |
| better-sqlite3 | $0 | Open source library. |
| OpenClaw | $0 (incremental) | OTM runs as a plugin inside the existing gateway. No additional instance. |
| Filesystem logging | $0 | Append to local files. |
| DMZ relay | $0 (incremental) | Runs on existing Synology NAS. |
| Total OTM cost | $0 incremental | Only pre-existing Slack Pro license required. |

11.2 AI Usage by the OTM

The OTM uses zero AI. It is a deterministic state machine implemented in TypeScript. No LLM calls, no embeddings, no inference. Every decision is rule-based:

  • Routing: field comparison (OTM-0)
  • Agent availability: SQLite lookup (OTM-1)
  • State transitions: precondition checks + status writes (OTM-2 through OTM-5)
  • Error detection: SQL queries against known patterns (OTM-7)
  • Watchdog: timer + threshold checks (OTM-6)
  • File pipeline: filesystem watches + Slack API calls (Systems 1–4)

Token consumption by OTM: 0 tokens.

11.3 AI Usage by Actors (Outside OTM)

The actors that interact with the OTM do consume AI tokens, but this is outside the OTM's scope:

| Actor | AI usage | OTM interaction cost |
|---|---|---|
| Orchestrator (Claudia) | LLM calls for task planning, review, rework design | OTM API calls = HTTP requests, ~0 tokens |
| Agents (Devdas, etc.) | LLM calls for task execution | OTM API calls (IE-02, IE-03) = HTTP requests, ~0 tokens |
| Human (Rupert) | None (uses Slack UI) | Slack events = Slack infrastructure, ~0 tokens |

📌 The OTM API calls (IE-02 through IE-07) are simple HTTP POST requests with JSON payloads. They consume zero AI tokens. The only AI costs are generated by the agents and orchestrator doing their actual work — which they would do regardless of whether the OTM exists.

11.4 Cost Summary

OTM operation cost:     $0/month (zero AI, zero external services)
Slack API cost:         $0/month (included in existing Pro plan)
Infrastructure cost:    $0/month (runs on existing OpenClaw server + Synology NAS)
──────────────────────────────────────────────────────
Total incremental cost: $0/month

12. Project Deliverables

| # | Deliverable | Owner | Description |
|---|---|---|---|
| D-01 | OTM-SPEC (this document) | Claudia | Specification and architecture |
| D-02 | OTM-TESTS | Claudia | Test scenarios document |
| D-03 | OTM implementation | Devdas | TypeScript plugin pipeline (OTM-0 through OTM-7, SE-1, SW-1) |
| D-04 | SQLite schemas + migrations | Devdas | otm.db and otm-errors.db setup |
| D-05 | Unit + integration tests | Devdas | Vitest test suite matching OTM-TESTS scenarios |
| D-06 | Infrastructure setup | Sylvain | DB paths, cron config, backup setup, log rotation |
| D-07 | task-orchestration skill | Claudia | OpenClaw skill for Claudia's Orchestrator role: task creation, project → step → task decomposition, assignment logic, validation/rejection, rework subtask design. This skill encodes the Orchestrator's side of the OTM protocol. |
| D-08 | Slack app config | Salvatore | lists:read + lists:write scopes, socket mode setup |
| D-09 | End-to-end validation | Claudia + Devdas | Full test suite execution on real Slack workspace |
| D-10 | File pipeline scripts | Devdas | otm-create-task.sh, otm-update-task.sh, otm-injector.js, otm-dispatcher.js, otm-completion-detector.js |
| D-11 | DMZ relay deployment | Sylvain | receiver.js + broadcaster.js on Synology NAS, TLS proxy setup |
| D-12 | launchd plists | Sylvain | Watcher, sweeper, dispatcher, completion detector plists |

📌 D-07 (task-orchestration skill) will include Rupert's higher-level instructions on how to break down projects into steps and steps into tasks. It is part of the scope of the full-blown validation tests (D-09).


Open Questions

| # | Question | Status |
|---|---|---|
| 1 | Exact list_item_updated event payload schema | Needs Salvatore to capture sample events |
| 2 | Can socket mode receive list events on Pro? | Needs verification (may need Events API HTTP mode) |
| 3 | Plugin pipeline registration mechanism in OpenClaw | Needs Devdas to investigate |
| 4 | SQLite file location on OpenClaw server | Sylvain to decide |
| 5 | How agents tick subtasks in practice | Resolved in v1.3: Agents call OTM API (IE-02), OTM updates Slack via SW-1. No direct Slack UI interaction. |
| 6 | Slack conversation API for List items — does it exist? | Needs Salvatore to verify (may need workaround) |
| 7 | SURE ack timeout values and gateway restart handling | Resolved in v1.5: Timeouts revised to 1min + 2min + error (3min total). Gateway restart handling defined in §4.7: OTM reconciles from Slack List + openclaw.json on startup, detects SURE timeouts, auto-corrects agent-task mismatches. Gateway restart logged as system event (IE-SYS-01). Orchestrator does not need to re-register agents. |
| 8 | OpenClaw agent → Slack user ID mapping in openclaw.json | Needs Sylvain to confirm config structure |
| 9 | Human user registration protocol | Deferred. v1 hard-codes Rupert + Claudia (§4.6). Future: how are new human users registered? Auto-detect from Slack assigned_to? Manual admin command? Re-registration after OTM restart? What about clients? |
| 10 | Slack archive API — can items be archived programmatically? | Resolved in v1.6: slackLists.items.archive does not exist. Archiving is manual-only via Slack UI. OTM sets status = archived but cannot trigger visual Slack archival. |
| 11 | Gateway idempotencyKey — does it prevent duplicate sessions? | Resolved in v1.8: No. The key is used as a runId label only. Duplicate calls with the same key = duplicate sessions. Dispatcher must use the Slack new → assigned status flip as the sole dedup mechanism. |

Deprecated Items

| Item | ID/Flag | Notes |
|---|---|---|
| Project (old select column) | Col0AL4UJ8BJ8 | Replaced by Project 2 text column (Col0ALZBS9C8Z) — 2026-03-14 |
| Types: implementation, research, etc. | | Removed — only action / decision / review are active |
| --description parameter | | Removed from otm-create-task.sh — use --subtask instead |
| --creator parameter | | Removed — merged with --agent |
| --assignedTo parameter | | Removed — merged with --agent |
| TT-09 (Rejected → New) | | Removed in v1.3 — use TT-10 (cancel) + new task instead |

End of Specification — v1.8 — OpenClaw Task Manager

Last updated: 2026-03-14

Comment by @RupertBarrow (author):

§3.2 Task fields :
it might be interesting to have a last-status field, hidden from the user but useful when the OTM fails and changes the status to failed.
For failure analysis, we need to know what status the task was in before that.

Throughout the document when you say “Claudia” you should actually be saying “the orchestrator”

Transition TT-12 : the watchdog tells the OTM to change the state

§3.3 : ALL transitions requested by Claudia/the orchestrator should go through the OTM. The OTM state transition management system is the only part of the system with the ability to change task status and related fields.

We also need a state machine and state transition table for the agent, which can be idle, busy, et cetera.
So we have two state diagrams : one for tasks and one for agents. We have two sets of transition actions, and two sets of triggering events.

Rename and recode all of these with significant prefixes, update this document to v1.2, and come back to me for review.

Comment by @RupertBarrow (author):

How does the OTM notify an agent to work on a task ?
We need a SURE system which confirms that the agent received the request, and expects an acknowledgement by the agent.
The 2 messages need to be logged by the OTM in the conversation feed of the Task in Slack, with date+time of the request and of the acknowledgement.

All state changed by the OTM will also be date+timed and traced in the same conversation feed.
All requests received by the OTM from the Orchestrator will be date+timed and traced in the same conversation feed.
All changes by a human, detected by the OTM, will be date+timed and traced in the same conversation feed.

§3.2.1, OTM-initiated TT-02 : "Agent registry shows idle" is not a trigger, it is a state. Name precisely (with its code) the triggering event. Ditto for "Agent registry shows busy" in TT-03. TT-13 : how is this "Unrecoverable agent error" detected ? Name precisely (with its code) the triggering event.
TT-12 : ditto, name precisely (with its code) the triggering event "7 days elapsed".

§3.3 : split the last column into 2 : first "Inbound event" then "OTM action". Would it simplify things to add an "Outbound event" to this column ? (ditto §4.3)

§4.5 : name these 3 attributes more explicitly :
agent_id TEXT PRIMARY KEY, --> slack_user_id
agent_name TEXT NOT NULL, --> otm_display_name
openclaw_agent_id TEXT, --> openclaw_agent_id

Presumably, the slack_user_id of OpenClaw agents are already defined in the openclaw.json configuration : can we make this dynamically queried from OpenClaw ? Do this in "startup reconciliation".

Does the OTM only manage automated agents ? Or also human agents (Rupert, client ?) Are humans only identified by their Slack id in our system ?

SE-1 actions "Routes to OTM-2 (TT-01), OTM-3 (TT-04), or OTM-5 (TT-11) based on field change" : for maintenance simplicity, the list_item_updated event should be routed to 1 single entry point of the OTM, which will then do the routing. We want 0 intelligence in the Slack event listener. Move "2. Determines the event type" into the OTM itself.

SE-1 "Required Slack App Scopes": separate concerns. The event listener in the Slack app only requires the lists:read Slack scope.

OTM-1 "triggering event": codify each of the events. "actions": codify each action. Lay out each item on a new line in the Detail column to make it more readable.
OTM-2, OTM-3, OTM-4, OTM-5, OTM-6: ditto.
OTM-3: why do we need a "processed_events" table? How will it grow? How will it be maintained, cleaned up, etc.? "event_id": are events numbered?
Aren't the (human-readable) traces on the Slack tasks' conversations enough?

Flow 2a:

  • "Agent ticks subtask checkbox in Slack": I think the agent should talk directly to the OTM; the OTM can then handle updating the task and the rest of the flow. Same in 2b.
  • OTM-3 "writes to Slack": how? This should mention the "write" part of the Slack app, which does not yet appear as part of the system.
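A sketch of the proposed shape of Flow 2a (the method name and payload are assumptions, and the Slack write client is stubbed):

```python
class Otm:
    """The agent reports subtask completion to the OTM, and the OTM alone
    updates the task state and writes to the Slack conversation feed."""

    def __init__(self, slack_write):
        self.slack_write = slack_write   # the "write" part of the Slack app
        self.subtasks = {}               # (task_id, subtask_id) -> completed?

    def complete_subtask(self, task_id, subtask_id, agent_id):
        # Replaces the agent ticking the checkbox in the Slack UI directly.
        self.subtasks[(task_id, subtask_id)] = True
        self.slack_write(task_id, f"{agent_id} completed subtask {subtask_id}")
        return {"ok": True}
```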

Flow 3a: "Orchestrator reviews task → POST /api/task-validate { outcome: "validated" }"
The Orchestrator should tell the OTM to validate the task. Ditto in 3b.

Flow 3b: we need to rework this: the Orchestrator will ask the OTM to reject the task, and ask it to notify the agent that the task needs reworking. This supposes that the Orchestrator:

  • does not touch the completed subtasks of the task
  • removes the other subtasks and replaces them with a new list of subtasks
  • passes a message explaining why the rework is requested.
    I suggest that the first subtask re-added by the Orchestrator for the rework be labeled something like "acknowledge rework request: (+ detailed explanation)". The agent receiving this rework task should close this first subtask to acknowledge it.
    I'm not sure we need a "rework_action"; the OTM doesn't have to notify the agent that "rework is needed".
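The subtask-list rule above can be sketched as follows (field names are illustrative, not the real schema):

```python
def build_rework_subtasks(existing, new_subtasks, reason):
    """Return the subtask list after a rework request.

    existing: list of {"title": str, "done": bool}
    new_subtasks: replacement subtask titles supplied by the Orchestrator
    reason: the explanation passed along with the rework request
    """
    completed = [s for s in existing if s["done"]]            # never touched
    ack = {"title": f"acknowledge rework request: {reason}", "done": False}
    replacements = [{"title": t, "done": False} for t in new_subtasks]
    # Completed work is preserved; the acknowledgement comes first among
    # the open subtasks, then the Orchestrator's replacement list.
    return completed + [ack] + replacements
```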

We also need an error-control mechanism to detect inconsistencies such as:

  • a task "in progress" or in "rework" that has not been updated in more than 10 minutes
  • "in progress" tasks assigned to an agent who is "idle"
  • etc.
    List all such situations and design the error monitor to detect them, trace errors for lessons learned and quality improvement, and take corrective action: reassigning tasks, unblocking agents, repairing channel communications, etc.
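The first two checks above could be expressed as SQL over the task and agent tables; a minimal sketch (table and column names are assumptions, not the real OTM schema):

```python
import datetime
import sqlite3

STALE_MINUTES = 10  # the "not updated in more than 10 minutes" threshold


def find_inconsistencies(conn, now):
    """Return task ids that are stale, or assigned to an idle agent."""
    cutoff = (now - datetime.timedelta(minutes=STALE_MINUTES)).isoformat()
    stale = conn.execute(
        "SELECT task_id FROM tasks "
        "WHERE status IN ('in progress', 'rework') AND updated_at < ?",
        (cutoff,)).fetchall()
    idle = conn.execute(
        "SELECT t.task_id FROM tasks t "
        "JOIN agents a ON a.slack_user_id = t.assignee "
        "WHERE t.status = 'in progress' AND a.state = 'idle'").fetchall()
    return {"stale": [r[0] for r in stale], "idle_assignee": [r[0] for r in idle]}


# Demo data: one task that is both stale and assigned to an idle agent.
conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE tasks (task_id TEXT, status TEXT, assignee TEXT, updated_at TEXT);"
    "CREATE TABLE agents (slack_user_id TEXT, state TEXT);")
now = datetime.datetime(2026, 3, 14, 12, 0, 0)
conn.execute("INSERT INTO tasks VALUES ('T-00001', 'in progress', 'U1', ?)",
             ((now - datetime.timedelta(minutes=30)).isoformat(),))
conn.execute("INSERT INTO agents VALUES ('U1', 'idle')")
```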

Also plan a tracking/event-logging system (such as Sentry, or an internal one) to trace all events and actions of the OTM.
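If internal, this could be as simple as an append-only event table; a sketch (the schema is illustrative):

```python
import datetime
import json
import sqlite3

# A minimal internal event trace as a lightweight alternative to an
# external tracker like Sentry (append-only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE otm_events ("
             "ts TEXT NOT NULL, component TEXT NOT NULL, "
             "event_code TEXT NOT NULL, payload TEXT)")


def trace(component, event_code, **payload):
    """Record one OTM event or action with its date+time."""
    ts = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    conn.execute("INSERT INTO otm_events VALUES (?, ?, ?, ?)",
                 (ts, component, event_code, json.dumps(payload)))


trace("OTM-3", "TT-04", task_id="T-00001", field="assignee")
```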

Flow 4: "7 days later". Already mentioned above; codify this event.

Technology Stack
Before this chapter, summarise the features, tables, and users of the SQLite database. Explain how it is maintained, flushed, and archived.
Confirm which part of the system uses each of these components.
Estimate the activity, frequency, data volumes, etc. of each component.

Tests (in another doc): write up (if not already done) test scenarios on a mock project in the real system.

Open Question 5 "How agents tick subtasks in practice (API call? Slack UI?)": the answer is: via the OTM.
