Source: /Users/mike/code/docs/sqlite-adoption-and-storage-audit.md
Date: 2026-03-14
Comprehensive inventory of how the arthack monorepo stores state, config, and data on disk -- plus a proposed SQLite convention for consolidating file-based state.
Three databases already exist, each with its own patterns.
- Path:
~/.local/share/telegramctl/messages.db - Stack: SQLModel + SQLAlchemy
- Source:
apps/telegramctl/telegramctl/db.py,models.py - Pragmas: WAL, busy_timeout=5000
- Schema:
messagestable with 13 columns, soft deletes (deleted_at), 5 indexes - Migrations: Schema introspection (
PRAGMA table_info) before ALTER TABLE - Features: Upsert on edit_date, soft delete with batch staleness detection, singleton engine
- Cross-CLI reader: linksctl opens this DB read-only via URI mode to extract URLs
- Path:
~/.local/share/remindctl/reminders.db - Stack: Raw sqlite3
- Source:
apps/remindctl/remindctl/db.py(45 lines total) - Pragmas: WAL
- Schema: Single
reminderstable (id, message, alert, due_at, created_at, delivered_at) - Migrations:
contextlib.suppress(sqlite3.OperationalError)for idempotent ALTER TABLE - Features: sqlite3.Row factory for dict-like access
- Path:
~/.local/state/claude/hook-events.db(overridable viaHOOK_EVENTS_DB) - Stack: Raw sqlite3
- Source:
claude/plugins/claude/hooks/hook-events-db.py - Pragmas: WAL, timeout=5
- Schema:
eventstable with 11 columns + full JSON blob, 5 indexes - Usage: Write-only from hook script. Reads are manual/ad-hoc.
- Path:
~/.local/state/arthack/llm-sessions/{session_id}/logs.db - Source:
claude/plugins/arthack/hooks/rewrite_llm.py - Stack: External
llmtool manages schema and writes; hook only injects-d {path}flag - Per-session isolation: One DB per Claude session
- Paths:
~/.local/share/knowctl/data/chromadb/,~/.local/share/claudectl/data/ - Stack: ChromaDB (Haystack) + SentenceTransformers via vectorctl daemon
- IPC: ZMQ over
~/.local/share/vectorctl/daemon.sock - Not a migration target -- separate concern (vector search)
These apps store each record as a separate YAML file and glob the directory to list or search. Highest-value SQLite migration candidates.
| App | Path | Record Type | Access Pattern |
|---|---|---|---|
| kanbanctl | ~/.local/share/kanbanctl/tasks/{task_id}.yaml |
Task with stage, history, metadata | glob *.yaml, parse all, filter by stage, sort by created_at |
| skillctl | ~/.local/share/skillctl/skill-runs/{ts}-{id}.yaml |
SkillRun dataclass (10 fields) | glob *.yaml, parse all, find by ID prefix |
| skillctl | ~/.local/share/skillctl/pipeline-runs/{ts}-{id}.yaml |
PipelineRun dataclass + steps | glob *.yaml, parse all, find by ID prefix |
| telegramctl | ~/.local/share/telegramctl/tasks/{message_id}.yaml |
Message-to-task mapping | Load by message_id |
| pairctl | ~/.local/share/pairctl/projects/{slug}/chats/{id}/chat.yaml |
Chat with messages list | Nested dirs, growing message arrays |
| usagectl | ~/.local/state/usagectl/runs/{run_id}/{preset}.yaml |
Usage run data | Glob runs, wrong XDG dir |
| App | Path | Format | Contents |
|---|---|---|---|
| voicectl | ~/.local/share/voicectl/transcribed_voices.yaml |
YAML dict | chat_id:message_id -> transcribed_at, grows indefinitely |
| telegraphctl | ~/.local/share/telegraphctl/account.json |
JSON | Single Telegraph account (access_token, etc.) |
| telegraphctl | ~/.local/share/telegraphctl/pages.json |
JSON | Array of published pages, scanned by find_page() |
| deployctl | ~/.local/share/deployctl/ci-state.yaml |
YAML | tracked_runs dict keyed by run ID |
| deployctl | ~/.local/share/deployctl/deploys-state.yaml |
YAML | tracked_runs dict keyed by run ID |
| claudectl | ~/.local/share/claudectl/sync-state.json |
JSON | Daemon sync checkpoint |
- Path:
~/.local/share/telegramctl/plans/{plan_id}/metadata.yaml+v{n}.md - Format: YAML metadata + versioned markdown files
- Purpose: Plan history tracking with Telegraph URL and Telegram message ID references
The load_daemon_state() / save_daemon_state() pattern manages small YAML files with last_run_at timestamps. Used by ~8 daemons:
| App | State File Location |
|---|---|
| claudectl (watch-prompts) | ~/.local/share/claudectl/ |
| claudectl (check-upgrade) | ~/.local/share/claudectl/ |
| claudectl (daily-report) | ~/.local/state/claudectl/daily-report-log.yaml |
| gogctl (check-auth) | ~/.local/share/gogctl/ |
| boxctl (check-artbird) | ~/.local/share/boxctl/ |
| qmdctl (check-health) | ~/.local/share/qmdctl/ |
| knowctl (sync-topics) | ~/.local/share/knowctl/sync-state.yaml |
| linksctl (sync-links) | ~/.local/share/linksctl/sync-state.yaml |
Verdict: Not worth migrating to SQLite. Trivial data (1-5 keys), harmless to lose, infrequent writes.
Config files stay as YAML/env files per repo convention. Not SQLite candidates.
| App | File | Format | Contents |
|---|---|---|---|
| arthack | hosts.yaml |
YAML | Host configuration for SSH/Tailscale (loaded by cli_common.hosts) |
| telegramctl | config.yaml |
YAML | Bot credentials, API keys, chat IDs, authorized users |
| linksctl | config.yaml |
YAML | Content directory path |
| deployctl | config.yaml |
YAML | GitHub username, deploy target configs |
| claudectl | config.yaml |
YAML | Sync settings, embedding batch size, max workers |
| phonectl | config.yaml |
YAML | Phone server URL |
| notifyctl | config.yaml |
YAML | Notification settings |
| tmuxctl | layouts/*.yaml |
YAML | Session layout templates |
| keepctl | env |
Plain text | GKEEP_EMAIL, GKEEP_MASTER_TOKEN |
| voicectl | env |
Plain text | OPENAI_STT_API_KEY, OPENAI_TTS_API_KEY |
| searchctl | env |
Plain text | PERPLEXITY_API_KEY |
| mcpctl | registered/{alias}/env |
Plain text | Per-MCP-server environment variables |
| App | Path | Purpose |
|---|---|---|
| knowctl | apps/knowctl/config/knowctl/config.yaml |
Embedding, chunking, search settings (deep-merged with defaults) |
| pairctl | apps/pairctl/config/{claude,codex,gemini}.yaml |
Partner CLI definitions |
| scrapectl | apps/scrapectl/config/presets/*.yaml |
Scraping preset templates |
| App | Path | Contents | Regenerable? |
|---|---|---|---|
| claudectl | ~/.cache/claudectl/claude-code-system-prompts/ |
Cloned system prompts repo | Yes (git clone) |
| mcpctl | ~/.cache/mcpctl/*.json |
Cached MCP server metadata | Yes |
| summaryctl | ~/.local/share/summaryctl/cache/ |
Summary cache by content hash | Yes |
| knowctl | ~/.local/share/knowctl/scraped-topic-urls-cache/ |
Scraped URL caches | Yes |
| knowctl | ~/.local/share/knowctl/youtube-cache/ |
YouTube transcript cache | Yes |
| App | Path Pattern | Module | Cleanup | Purpose |
|---|---|---|---|---|
| browserctl | /tmp/browserctl/uploads/{uuid}/{filename} |
server/api.py handle_upload |
Session GC | File uploads for CDP file input |
| browserctl | /tmp/browserctl/pool |
server/config.py PathsConfig |
Manual (provisioning) | Chrome working directory on artbird |
| brandctl | /tmp/{uuid}.svg |
run_generate_images.py |
Explicit unlink in finally | SVG intermediate for rsvg-convert |
| brandctl | /tmp/{uuid}.png |
run_generate_images.py |
Explicit unlink in finally | PNG intermediate for PIL processing |
| brandctl | /tmp/{uuid}/AppIcon.iconset/ |
run_generate_images.py |
Auto (TemporaryDirectory) | macOS .icns staging for iconutil |
| knowctl | /tmp/knowctl-db-{uuid}/ |
run_sync_topics.py |
Explicit rmtree after swap | ChromaDB atomic refresh staging |
| promptctl | /tmp/{template_name}.md |
run_render_demo.py |
Explicit unlink in finally | Temp file for gh gist create |
| promptctl | /tmp/promptctl-render-demo.md |
run_render_demo.py |
Explicit unlink in finally | Summary gist output |
| voicectl | /tmp/voicectl-{uuid}/ |
api.py transcribe_file |
Auto (TemporaryDirectory) | Audio format conversion workspace |
| telegramctl | /tmp/{uuid}/voice |
run_transcribe_voice.py |
Auto (TemporaryDirectory) | Voice message download buffer |
| pairctl | /tmp/pairctl-{uuid}.md |
helpers.py invoke_cli |
Explicit unlink in except/finally | Partner CLI output capture |
| gitctl | /tmp/gitctl-lint-{ts}.txt |
helpers.py open_fix_session |
None (orphaned) | Lint errors passed to tmux Claude session |
| App | Path Pattern | Module | Created By | Purpose |
|---|---|---|---|---|
| kanbanctl | /tmp/arthack-claude/panes/{server_uuid}/{pane_id}.json |
task_helpers.py find_pane_for_session |
Claude Code binary | Session-to-pane discovery for plan approval |
- No accumulation issues. All temp files use either explicit cleanup or context managers, except gitctl lint files which are intentionally left for tmux sessions.
- Secure patterns. All use
tempfilemodule (not hardcoded names), except browserctl's fixed/tmp/browserctl/tree and the Claude pane discovery path. - One orphan pattern.
/tmp/gitctl-lint-{timestamp}.txtfiles are never cleaned up. Low impact -- small text files, cleaned on reboot.
| Type | Path | App | Purpose |
|---|---|---|---|
| Unix socket | ~/.local/share/vectorctl/daemon.sock |
vectorctl | ZMQ IPC for embedding daemon |
| Unix socket | ~/.local/share/telegramctl/*.sock |
telegramctl | Bot-to-CLI IPC |
| fcntl lock | ~/.local/state/locks/{name}.lock |
cli_common.daemon | Exclusive daemon instance lock |
| Lock file | ~/.local/state/gitctl/commit-{hash}.lock |
gitctl | Per-project commit lock |
| PID file | ~/.local/share/remindctl/check-reminders.pid |
remindctl | Daemon PID tracking |
| Item | Reason |
|---|---|
All config files (~/.config/*/) |
YAML config stays per repo convention |
| Daemon state YAML (last_run_at) | Trivial, harmless to lose, 8 CLIs share the pattern |
| summaryctl cache | Content-hash keyed, regenerable, can be blown away |
| knowctl caches (scraped URLs, YouTube) | Regenerable content caches |
| usagectl markdown reports | Human-readable output alongside data files |
| spoolctl queue YAML | Filesystem IS the queue (create = enqueue, delete = dequeue). 87-line module. |
| skillctl active-pipeline.yaml | Ephemeral runtime IPC signal (written, read, deleted within one pipeline run) |
| skillctl interrupt.yaml | Ephemeral IPC signal file |
| assistctl agent-prompts | Transient prompt staging |
| ChromaDB stores | Separate vector search concern, managed by vectorctl daemon |
| Current Path | Correct Path | Issue |
|---|---|---|
~/.local/state/usagectl/runs/ |
~/.local/share/usagectl/ |
Run data is persistent, not ephemeral state |
~/.local/state/assistctl/conversations/ |
~/.local/share/assistctl/ |
Conversation history is persistent data |
~/.local/state/claudectl/daily-report-log.yaml |
~/.local/share/claudectl/ |
Accumulator data, not transient state |
~/.local/state/claude/hook-events.db |
~/.local/share/claude/ |
Database is persistent data (low priority -- hook script, not a CLI) |
Each CLI that needs structured persistence gets ~/.local/share/<cli>/<cli>.db. No shared database. Respects CLI boundary rules.
The default stack is raw sqlite3 with sqlite3.Row factory. telegramctl keeps SQLModel as the exception (it earned it with complex ORM queries and soft deletes).
Why raw: Most CLIs have 1-3 tables with simple CRUD. sqlite3 is stdlib (zero new deps). remindctl proves the pattern works in 45 lines.
New shared module providing boilerplate that each CLI's db.py currently reinvents:
def connect(cli_name: str) -> sqlite3.Connection
# Path: ~/.local/share/{cli_name}/{cli_name}.db
# Sets: WAL, busy_timeout=5000, foreign_keys=ON, row_factory=Row
# Creates parent dirs if needed
def migrate(conn: sqlite3.Connection, migrations: list[str]) -> None
# Apply ALTER TABLE migrations with suppress(OperationalError)
def connect_readonly(db_path: str | Path) -> sqlite3.Connection
# URI mode with ?mode=ro for cross-CLI reads"""SQLite database for <cli>."""
_SCHEMA = """\
CREATE TABLE IF NOT EXISTS <table> (
id INTEGER PRIMARY KEY AUTOINCREMENT,
...
created_at TEXT NOT NULL
);
"""
_MIGRATIONS: list[str] = []
def connect() -> sqlite3.Connection:
from cli_common.db import connect as _connect, migrate
conn = _connect("<cli>")
conn.executescript(_SCHEMA)
migrate(conn, _MIGRATIONS)
return connEach CLI handles one-time YAML-to-SQLite migration in its connect():
- Check if old data files exist
- If DB is empty and old files exist, import them
- Delete old files after successful import
Create cli_common/db.py with connect(), migrate(), connect_readonly(). No CLI changes yet.
Apps that glob YAML files to list/search/filter -- biggest wins from SQLite.
2a. kanbanctl -- task_helpers.py globs *.yaml to list tasks. tasks table with JSON history column. Direct mapping from save/load/list/delete/advance functions to SQL.
2b. skillctl -- skill_store.py and pipeline_store.py both glob *.yaml with identical patterns. skill_runs and pipeline_runs tables. pipeline_steps gets FK to pipeline_runs.
2c. telegraphctl -- Two JSON files. accounts (single-row) and pages tables.
2d. voicectl -- Growing YAML dict loaded and re-saved in full on every transcription. Simple transcribed_voices table with composite PK.
2e. telegramctl tasks (alongside 2a) -- Add telegram_tasks table to existing messages.db. These are SQLModel tables since telegramctl already uses that stack.
3a. pairctl -- Nested YAML chat files. chats + messages tables.
3b. deployctl -- Two YAML state files. Single tracked_runs table with type column.
3c. usagectl -- YAML run files in wrong XDG dir. runs table + fix XDG path.
4a. remindctl -- Switch to cli_common.db.connect("remindctl"). Removes ~10 lines of boilerplate.
4b. hook-events -- Leave as-is. Standalone hook script, write-only, no benefit from shared infra.
~/.config/
arthack/hosts.yaml
claudectl/config.yaml
deployctl/config.yaml
keepctl/env
linksctl/config.yaml
mcpctl/registered/*/env
notifyctl/config.yaml
phonectl/config.yaml
searchctl/env
telegramctl/config.yaml
tmuxctl/layouts/*.yaml
voicectl/env
~/.local/share/
claudectl/sync-state.json, data/ (ChromaDB)
deployctl/ci-state.yaml, deploys-state.yaml
kanbanctl/tasks/*.yaml
knowctl/sync-state.yaml, data/, scraped-topic-urls-cache/, youtube-cache/
linksctl/sync-state.yaml
pairctl/projects/*/chats/*/chat.yaml
remindctl/reminders.db, check-reminders.pid
skillctl/skill-runs/*.yaml, pipeline-runs/*.yaml
spoolctl/queue/*.yaml, failed/*.yaml
summaryctl/cache/
telegramctl/messages.db, tasks/*.yaml, plans/*/, *.sock
telegraphctl/account.json, pages.json
vectorctl/daemon.sock
voicectl/transcribed_voices.yaml
~/.local/state/
assistctl/conversations/, agent-prompts/
claude/hook-events.db
claudectl/sync-sessions.log, daily-report-log.yaml
gitctl/commit-*.lock
locks/*.lock
skillctl/active-pipeline.yaml, interrupt.yaml
usagectl/runs/*/*.yaml, latest/*.yaml
~/.cache/
claudectl/claude-code-system-prompts/
mcpctl/*.json
/tmp/
arthack-claude/panes/*/ (Claude Code, read by kanbanctl)
browserctl/uploads/*/ (browserctl)
browserctl/pool (browserctl, artbird only)
knowctl-db-*/ (knowctl, transient)
voicectl-*/ (voicectl, transient)
pairctl-*.md (pairctl, transient)
gitctl-lint-*.txt (gitctl, orphaned)
*.svg, *.png (brandctl, transient)
promptctl-render-demo.md (promptctl, transient)