Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save possibilities/d9df01aab453a6b9203e2dbffd68399a to your computer and use it in GitHub Desktop.

Select an option

Save possibilities/d9df01aab453a6b9203e2dbffd68399a to your computer and use it in GitHub Desktop.
Storage Audit and SQLite Adoption Strategy for arthack monorepo

Storage Audit and SQLite Adoption Strategy

Source: /Users/mike/code/docs/sqlite-adoption-and-storage-audit.md Date: 2026-03-14

Comprehensive inventory of how the arthack monorepo stores state, config, and data on disk -- plus a proposed SQLite convention for consolidating file-based state.


1. Existing SQLite Databases

Three databases already exist, each with its own patterns.

telegramctl -- Message Archive

  • Path: ~/.local/share/telegramctl/messages.db
  • Stack: SQLModel + SQLAlchemy
  • Source: apps/telegramctl/telegramctl/db.py, models.py
  • Pragmas: WAL, busy_timeout=5000
  • Schema: messages table with 13 columns, soft deletes (deleted_at), 5 indexes
  • Migrations: Schema introspection (PRAGMA table_info) before ALTER TABLE
  • Features: Upsert on edit_date, soft delete with batch staleness detection, singleton engine
  • Cross-CLI reader: linksctl opens this DB read-only via URI mode to extract URLs

remindctl -- Reminders

  • Path: ~/.local/share/remindctl/reminders.db
  • Stack: Raw sqlite3
  • Source: apps/remindctl/remindctl/db.py (45 lines total)
  • Pragmas: WAL
  • Schema: Single reminders table (id, message, alert, due_at, created_at, delivered_at)
  • Migrations: contextlib.suppress(sqlite3.OperationalError) for idempotent ALTER TABLE
  • Features: sqlite3.Row factory for dict-like access

hook-events -- Claude Hook Analytics

  • Path: ~/.local/state/claude/hook-events.db (overridable via HOOK_EVENTS_DB)
  • Stack: Raw sqlite3
  • Source: claude/plugins/claude/hooks/hook-events-db.py
  • Pragmas: WAL, timeout=5
  • Schema: events table with 11 columns + full JSON blob, 5 indexes
  • Usage: Write-only from hook script. Reads are manual/ad-hoc.

LLM Session Logs (external)

  • Path: ~/.local/state/arthack/llm-sessions/{session_id}/logs.db
  • Source: claude/plugins/arthack/hooks/rewrite_llm.py
  • Stack: External llm tool manages schema and writes; hook only injects -d {path} flag
  • Per-session isolation: One DB per Claude session

ChromaDB (not SQLite)

  • Paths: ~/.local/share/knowctl/data/chromadb/, ~/.local/share/claudectl/data/
  • Stack: ChromaDB (Haystack) + SentenceTransformers via vectorctl daemon
  • IPC: ZMQ over ~/.local/share/vectorctl/daemon.sock
  • Not a migration target -- separate concern (vector search)

2. File-Based State Stores

YAML-per-record (glob to list/search)

These apps store each record as a separate YAML file and glob the directory to list or search. Highest-value SQLite migration candidates.

App Path Record Type Access Pattern
kanbanctl ~/.local/share/kanbanctl/tasks/{task_id}.yaml Task with stage, history, metadata glob *.yaml, parse all, filter by stage, sort by created_at
skillctl ~/.local/share/skillctl/skill-runs/{ts}-{id}.yaml SkillRun dataclass (10 fields) glob *.yaml, parse all, find by ID prefix
skillctl ~/.local/share/skillctl/pipeline-runs/{ts}-{id}.yaml PipelineRun dataclass + steps glob *.yaml, parse all, find by ID prefix
telegramctl ~/.local/share/telegramctl/tasks/{message_id}.yaml Message-to-task mapping Load by message_id
pairctl ~/.local/share/pairctl/projects/{slug}/chats/{id}/chat.yaml Chat with messages list Nested dirs, growing message arrays
usagectl ~/.local/state/usagectl/runs/{run_id}/{preset}.yaml Usage run data Glob runs, wrong XDG dir

Single-file state (load-all, modify, re-save)

App Path Format Contents
voicectl ~/.local/share/voicectl/transcribed_voices.yaml YAML dict chat_id:message_id -> transcribed_at, grows indefinitely
telegraphctl ~/.local/share/telegraphctl/account.json JSON Single Telegraph account (access_token, etc.)
telegraphctl ~/.local/share/telegraphctl/pages.json JSON Array of published pages, scanned by find_page()
deployctl ~/.local/share/deployctl/ci-state.yaml YAML tracked_runs dict keyed by run ID
deployctl ~/.local/share/deployctl/deploys-state.yaml YAML tracked_runs dict keyed by run ID
claudectl ~/.local/share/claudectl/sync-state.json JSON Daemon sync checkpoint

Plan versioning (telegramctl)

  • Path: ~/.local/share/telegramctl/plans/{plan_id}/metadata.yaml + v{n}.md
  • Format: YAML metadata + versioned markdown files
  • Purpose: Plan history tracking with Telegraph URL and Telegram message ID references

Daemon state (via cli_common.daemon)

The load_daemon_state() / save_daemon_state() pattern manages small YAML files with last_run_at timestamps. Used by ~8 daemons:

App State File Location
claudectl (watch-prompts) ~/.local/share/claudectl/
claudectl (check-upgrade) ~/.local/share/claudectl/
claudectl (daily-report) ~/.local/state/claudectl/daily-report-log.yaml
gogctl (check-auth) ~/.local/share/gogctl/
boxctl (check-artbird) ~/.local/share/boxctl/
qmdctl (check-health) ~/.local/share/qmdctl/
knowctl (sync-topics) ~/.local/share/knowctl/sync-state.yaml
linksctl (sync-links) ~/.local/share/linksctl/sync-state.yaml

Verdict: Not worth migrating to SQLite. Trivial data (1-5 keys), harmless to lose, infrequent writes.


3. Config Files

Config files stay as YAML/env files per repo convention. Not SQLite candidates.

~/.config/ directory

App File Format Contents
arthack hosts.yaml YAML Host configuration for SSH/Tailscale (loaded by cli_common.hosts)
telegramctl config.yaml YAML Bot credentials, API keys, chat IDs, authorized users
linksctl config.yaml YAML Content directory path
deployctl config.yaml YAML GitHub username, deploy target configs
claudectl config.yaml YAML Sync settings, embedding batch size, max workers
phonectl config.yaml YAML Phone server URL
notifyctl config.yaml YAML Notification settings
tmuxctl layouts/*.yaml YAML Session layout templates
keepctl env Plain text GKEEP_EMAIL, GKEEP_MASTER_TOKEN
voicectl env Plain text OPENAI_STT_API_KEY, OPENAI_TTS_API_KEY
searchctl env Plain text PERPLEXITY_API_KEY
mcpctl registered/{alias}/env Plain text Per-MCP-server environment variables

Built-in configs (shipped in repo)

App Path Purpose
knowctl apps/knowctl/config/knowctl/config.yaml Embedding, chunking, search settings (deep-merged with defaults)
pairctl apps/pairctl/config/{claude,codex,gemini}.yaml Partner CLI definitions
scrapectl apps/scrapectl/config/presets/*.yaml Scraping preset templates

4. Cache Directories

App Path Contents Regenerable?
claudectl ~/.cache/claudectl/claude-code-system-prompts/ Cloned system prompts repo Yes (git clone)
mcpctl ~/.cache/mcpctl/*.json Cached MCP server metadata Yes
summaryctl ~/.local/share/summaryctl/cache/ Summary cache by content hash Yes
knowctl ~/.local/share/knowctl/scraped-topic-urls-cache/ Scraped URL caches Yes
knowctl ~/.local/share/knowctl/youtube-cache/ YouTube transcript cache Yes

5. /tmp File Inventory

Files created by arthack apps

App Path Pattern Module Cleanup Purpose
browserctl /tmp/browserctl/uploads/{uuid}/{filename} server/api.py handle_upload Session GC File uploads for CDP file input
browserctl /tmp/browserctl/pool server/config.py PathsConfig Manual (provisioning) Chrome working directory on artbird
brandctl /tmp/{uuid}.svg run_generate_images.py Explicit unlink in finally SVG intermediate for rsvg-convert
brandctl /tmp/{uuid}.png run_generate_images.py Explicit unlink in finally PNG intermediate for PIL processing
brandctl /tmp/{uuid}/AppIcon.iconset/ run_generate_images.py Auto (TemporaryDirectory) macOS .icns staging for iconutil
knowctl /tmp/knowctl-db-{uuid}/ run_sync_topics.py Explicit rmtree after swap ChromaDB atomic refresh staging
promptctl /tmp/{template_name}.md run_render_demo.py Explicit unlink in finally Temp file for gh gist create
promptctl /tmp/promptctl-render-demo.md run_render_demo.py Explicit unlink in finally Summary gist output
voicectl /tmp/voicectl-{uuid}/ api.py transcribe_file Auto (TemporaryDirectory) Audio format conversion workspace
telegramctl /tmp/{uuid}/voice run_transcribe_voice.py Auto (TemporaryDirectory) Voice message download buffer
pairctl /tmp/pairctl-{uuid}.md helpers.py invoke_cli Explicit unlink in except/finally Partner CLI output capture
gitctl /tmp/gitctl-lint-{ts}.txt helpers.py open_fix_session None (orphaned) Lint errors passed to tmux Claude session

Files read (not created) by arthack apps

App Path Pattern Module Created By Purpose
kanbanctl /tmp/arthack-claude/panes/{server_uuid}/{pane_id}.json task_helpers.py find_pane_for_session Claude Code binary Session-to-pane discovery for plan approval

Observations

  • No accumulation issues. All temp files use either explicit cleanup or context managers, except gitctl lint files which are intentionally left for tmux sessions.
  • Secure patterns. All use tempfile module (not hardcoded names), except browserctl's fixed /tmp/browserctl/ tree and the Claude pane discovery path.
  • One orphan pattern. /tmp/gitctl-lint-{timestamp}.txt files are never cleaned up. Low impact -- small text files, cleaned on reboot.

6. Sockets and Locks

Type Path App Purpose
Unix socket ~/.local/share/vectorctl/daemon.sock vectorctl ZMQ IPC for embedding daemon
Unix socket ~/.local/share/telegramctl/*.sock telegramctl Bot-to-CLI IPC
fcntl lock ~/.local/state/locks/{name}.lock cli_common.daemon Exclusive daemon instance lock
Lock file ~/.local/state/gitctl/commit-{hash}.lock gitctl Per-project commit lock
PID file ~/.local/share/remindctl/check-reminders.pid remindctl Daemon PID tracking

7. Stores That Should NOT Migrate to SQLite

Item Reason
All config files (~/.config/*/) YAML config stays per repo convention
Daemon state YAML (last_run_at) Trivial, harmless to lose, 8 CLIs share the pattern
summaryctl cache Content-hash keyed, regenerable, can be blown away
knowctl caches (scraped URLs, YouTube) Regenerable content caches
usagectl markdown reports Human-readable output alongside data files
spoolctl queue YAML Filesystem IS the queue (create = enqueue, delete = dequeue). 87-line module.
skillctl active-pipeline.yaml Ephemeral runtime IPC signal (written, read, deleted within one pipeline run)
skillctl interrupt.yaml Ephemeral IPC signal file
assistctl agent-prompts Transient prompt staging
ChromaDB stores Separate vector search concern, managed by vectorctl daemon

8. XDG Misplacements

Current Path Correct Path Issue
~/.local/state/usagectl/runs/ ~/.local/share/usagectl/ Run data is persistent, not ephemeral state
~/.local/state/assistctl/conversations/ ~/.local/share/assistctl/ Conversation history is persistent data
~/.local/state/claudectl/daily-report-log.yaml ~/.local/share/claudectl/ Accumulator data, not transient state
~/.local/state/claude/hook-events.db ~/.local/share/claude/ Database is persistent data (low priority -- hook script, not a CLI)

9. Proposed SQLite Convention

One DB per CLI

Each CLI that needs structured persistence gets ~/.local/share/<cli>/<cli>.db. No shared database. Respects CLI boundary rules.

Raw sqlite3 by default

The default stack is raw sqlite3 with sqlite3.Row factory. telegramctl keeps SQLModel as the exception (it earned it with complex ORM queries and soft deletes).

Why raw: Most CLIs have 1-3 tables with simple CRUD. sqlite3 is stdlib (zero new deps). remindctl proves the pattern works in 45 lines.

cli_common/db.py

New shared module providing boilerplate that each CLI's db.py currently reinvents:

def connect(cli_name: str) -> sqlite3.Connection
    # Path: ~/.local/share/{cli_name}/{cli_name}.db
    # Sets: WAL, busy_timeout=5000, foreign_keys=ON, row_factory=Row
    # Creates parent dirs if needed

def migrate(conn: sqlite3.Connection, migrations: list[str]) -> None
    # Apply ALTER TABLE migrations with suppress(OperationalError)

def connect_readonly(db_path: str | Path) -> sqlite3.Connection
    # URI mode with ?mode=ro for cross-CLI reads

Per-CLI db.py template

"""SQLite database for <cli>."""
_SCHEMA = """\
CREATE TABLE IF NOT EXISTS <table> (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ...
    created_at TEXT NOT NULL
);
"""
_MIGRATIONS: list[str] = []

def connect() -> sqlite3.Connection:
    from cli_common.db import connect as _connect, migrate
    conn = _connect("<cli>")
    conn.executescript(_SCHEMA)
    migrate(conn, _MIGRATIONS)
    return conn

Data migration pattern

Each CLI handles one-time YAML-to-SQLite migration in its connect():

  1. Check if old data files exist
  2. If DB is empty and old files exist, import them
  3. Delete old files after successful import

10. Migration Phases

Phase 1: Foundation

Create cli_common/db.py with connect(), migrate(), connect_readonly(). No CLI changes yet.

Phase 2: High-value migrations

Apps that glob YAML files to list/search/filter -- biggest wins from SQLite.

2a. kanbanctl -- task_helpers.py globs *.yaml to list tasks. tasks table with JSON history column. Direct mapping from save/load/list/delete/advance functions to SQL.

2b. skillctl -- skill_store.py and pipeline_store.py both glob *.yaml with identical patterns. skill_runs and pipeline_runs tables. pipeline_steps gets FK to pipeline_runs.

2c. telegraphctl -- Two JSON files. accounts (single-row) and pages tables.

2d. voicectl -- Growing YAML dict loaded and re-saved in full on every transcription. Simple transcribed_voices table with composite PK.

2e. telegramctl tasks (alongside 2a) -- Add telegram_tasks table to existing messages.db. These are SQLModel tables since telegramctl already uses that stack.

Phase 3: Medium-value migrations

3a. pairctl -- Nested YAML chat files. chats + messages tables.

3b. deployctl -- Two YAML state files. Single tracked_runs table with type column.

3c. usagectl -- YAML run files in wrong XDG dir. runs table + fix XDG path.

Phase 4: Existing DB alignment

4a. remindctl -- Switch to cli_common.db.connect("remindctl"). Removes ~10 lines of boilerplate.

4b. hook-events -- Leave as-is. Standalone hook script, write-only, no benefit from shared infra.


11. Complete Directory Map

~/.config/ (12 apps)

~/.config/
  arthack/hosts.yaml
  claudectl/config.yaml
  deployctl/config.yaml
  keepctl/env
  linksctl/config.yaml
  mcpctl/registered/*/env
  notifyctl/config.yaml
  phonectl/config.yaml
  searchctl/env
  telegramctl/config.yaml
  tmuxctl/layouts/*.yaml
  voicectl/env

~/.local/share/ (14 apps)

~/.local/share/
  claudectl/sync-state.json, data/ (ChromaDB)
  deployctl/ci-state.yaml, deploys-state.yaml
  kanbanctl/tasks/*.yaml
  knowctl/sync-state.yaml, data/, scraped-topic-urls-cache/, youtube-cache/
  linksctl/sync-state.yaml
  pairctl/projects/*/chats/*/chat.yaml
  remindctl/reminders.db, check-reminders.pid
  skillctl/skill-runs/*.yaml, pipeline-runs/*.yaml
  spoolctl/queue/*.yaml, failed/*.yaml
  summaryctl/cache/
  telegramctl/messages.db, tasks/*.yaml, plans/*/, *.sock
  telegraphctl/account.json, pages.json
  vectorctl/daemon.sock
  voicectl/transcribed_voices.yaml

~/.local/state/ (7 apps)

~/.local/state/
  assistctl/conversations/, agent-prompts/
  claude/hook-events.db
  claudectl/sync-sessions.log, daily-report-log.yaml
  gitctl/commit-*.lock
  locks/*.lock
  skillctl/active-pipeline.yaml, interrupt.yaml
  usagectl/runs/*/*.yaml, latest/*.yaml

~/.cache/ (2 apps)

~/.cache/
  claudectl/claude-code-system-prompts/
  mcpctl/*.json

/tmp/ (7 apps)

/tmp/
  arthack-claude/panes/*/     (Claude Code, read by kanbanctl)
  browserctl/uploads/*/       (browserctl)
  browserctl/pool             (browserctl, artbird only)
  knowctl-db-*/               (knowctl, transient)
  voicectl-*/                 (voicectl, transient)
  pairctl-*.md                (pairctl, transient)
  gitctl-lint-*.txt           (gitctl, orphaned)
  *.svg, *.png                (brandctl, transient)
  promptctl-render-demo.md    (promptctl, transient)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment