
Building a Unified Memory System for AI Agents in Production

How we built a 3-layer memory architecture that bridges OpenClaw and Claude Code into a single brain — with real numbers from 33 days of operation.

The Problem

Most AI agent setups have a memory problem: they either forget everything between sessions (stateless) or accumulate noise until the context window overflows. RAG helps with retrieval but doesn't build understanding. The LLM rediscovers knowledge from scratch on every query.

Karpathy's LLM Wiki proposes a compelling alternative: a persistent, compounding wiki maintained by the LLM. Great idea — but designed for a researcher browsing Obsidian. We needed something for an operational AI agent running a business with 8 stores, 20 cron jobs, 7 services, and two different AI platforms (OpenClaw + Claude Code).

This document describes what we built, what worked, what didn't, and the decisions behind each choice.

Architecture Overview

┌────────────────────────────────────────────────────────┐
│                    TWO ENTRY POINTS                    │
│                                                        │
│   OpenClaw (Telegram)          Claude Code (VS Code)   │
│         │                              │               │
│         ▼                              ▼               │
│   JSONL transcriptions         claude-mem plugin       │
│         │                       (auto-capture)         │
│         ▼                              │               │
│   Bridge (cron */30)                   │               │
│         │                              │               │
│         └──────────┐    ┌──────────────┘               │
│                    ▼    ▼                              │
│              ┌──────────────┐                          │
│              │  claude-mem  │  Layer 1: Subconscious   │
│              │  SQLite DB   │  1,493 observations      │
│              │  Chroma 46MB │  Vector search           │
│              └──────┬───────┘                          │
│                     │                                  │
│       auto-precompact (Sonnet, daily 23h)              │
│                     │                                  │
│              ┌──────▼───────┐                          │
│              │ Workspace MD │  Layer 2: Conscious      │
│              │ decisions.md │  Curated, structured     │
│              │ lessons.md   │  Read at boot            │
│              │ pending.md   │                          │
│              │ wip.md       │                          │
│              └──────┬───────┘                          │
│                     │                                  │
│              ┌──────▼───────┐                          │
│              │ Auto-memory  │  Layer 3: Persistent     │
│              │ 16 .md files │  Cross-session identity  │
│              │ (CC native)  │                          │
│              └──────────────┘                          │
│                                                        │
│              ┌──────────────┐                          │
│              │ Schema files │  How to operate          │
│              │ SOUL.md      │  Who I am                │
│              │ AGENTS.md    │  Rules & protocols       │
│              │ CLAUDE.md    │  Boot sequence           │
│              └──────────────┘                          │
└────────────────────────────────────────────────────────┘

The Three Layers

Layer 1: Subconscious (claude-mem + Chroma)

What: Automatic capture of everything that happens in every session. No manual intervention.

How: The claude-mem plugin hooks into Claude Code's tool calls and extracts observations — structured summaries of what happened, what was decided, what changed. These go into a SQLite database and are indexed in ChromaDB for vector search.

Bridge: OpenClaw sessions happen in Telegram, not Claude Code. A Python bridge (openclaw_mem_bridge.py) reads OpenClaw's JSONL transcriptions and inserts observations into the same SQLite DB. Runs every 30 minutes via cron. Zero LLM tokens — pure heuristic extraction.

Numbers:

  • 1,493 observations in 33 days
  • 46 MB of vector embeddings
  • 4 projects tracked (workspace, openclaw, root, tmux-executor)

Key decision: We chose claude-mem over OpenClaw's built-in memory-core because claude-mem had 1,200+ observations vs memory-core's empty database (Gemini embeddings were failing silently). OpenClaw's memory slot is exclusive — only one memory plugin can run at a time.

Layer 2: Conscious (Workspace Markdown)

What: Curated, structured files that both OpenClaw and Claude Code read at boot. This is the shared brain.

| File | Purpose | Size |
|------|---------|------|
| MEMORY.md | Index — points to everything else | 85 lines |
| memory/decisions.md | Permanent decisions — never revisit | 42 KB |
| memory/lessons.md | Lessons learned (strategic + tactical) | 89 KB |
| memory/pending.md | Open items waiting for action | 7 KB |
| memory/wip.md | Work in progress — where we stopped | ~1 KB |
| memory/people.md | Who is who | 2 KB |
| memory/projects.md | Project status | varies |
| memory/YYYY-MM-DD.md | Daily logs (39 files) | varies |
| feedback/approved.json | Patterns to repeat | 4 KB |
| feedback/rejected.json | Patterns to never repeat | 2 KB |

Key insight: These files sit on disk in the OpenClaw workspace. Both platforms read the same files. One source of truth, two consumers.

Layer 3: Persistent (Claude Code Auto-Memory)

What: Claude Code's native memory system — 16 markdown files with YAML frontmatter that persist across conversations.

| Type | Count | Examples |
|------|-------|----------|
| user | 2 | Agent identity, user profile |
| feedback | 6 | "crons use bash not LLM", "backtest before deploy" |
| project | 4 | Credit scoring v3, CRM portal status |
| reference | 3 | Infrastructure ports, fiscal structure |
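To make the shape concrete, a feedback-type file might look like the following. The frontmatter keys shown are an assumption about the format, not a documented schema:

```markdown
---
type: feedback
created: 2026-03-19
---

# Crons use bash, not LLM

Scheduled jobs must be deterministic bash wrappers. Only the daily
auto-precompact is allowed to call an LLM.
```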

Key insight: This layer stores how to behave, not what happened. It tells future sessions things like "this user has ADHD — keep responses short" or "never mock the database in tests."

Automated Processes

Auto-Precompact (Daily, 23h BRT)

The most important automation. Runs via claude --print (Claude Sonnet, subscription — zero extra cost) with Gemini Flash as fallback.

What it does:

  1. Reads last 12 hours of observations from claude-mem (both Claude Code AND OpenClaw sessions)
  2. Reads today's daily log and current state of decisions/lessons/pending
  3. Asks Sonnet to extract: decisions, lessons, resolved items, new pending items, work in progress
  4. Writes results to the appropriate files with deduplication
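The extraction step can be sketched as a thin wrapper around the CLI. `claude --print` is the real flag described above; the prompt wording, function names, and fallback behavior here are illustrative assumptions:

```python
import subprocess

def build_prompt(observations: str, current_state: str) -> str:
    """Prompt asking the model to extract memory updates (wording illustrative)."""
    return (
        "You maintain an AI agent's long-term memory. From the observations "
        "below, extract at most 2 permanent decisions, at most 3 lessons "
        "(errors that cost time), at most 3 new pending items, and up to 3 "
        "work-in-progress tasks.\n\n"
        f"## Observations (last 12h)\n{observations}\n\n"
        f"## Current memory state\n{current_state}\n"
    )

def precompact(observations: str, current_state: str) -> str:
    """Run the extraction through the claude CLI (subscription, no API key)."""
    result = subprocess.run(
        ["claude", "--print", build_prompt(observations, current_state)],
        capture_output=True, text=True, timeout=300,
    )
    if result.returncode != 0:
        # The real pipeline falls back to Gemini Flash at this point.
        raise RuntimeError(f"claude CLI failed: {result.stderr.strip()}")
    return result.stdout
```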

What it extracts:

| Field | Max per run | Filter |
|-------|-------------|--------|
| Decisions | 2 | Must be permanent ("always do X"). Most sessions have 0. |
| Lessons | 3 | Must be an error that cost time. Not things that worked. |
| Pending (new) | 3 | Must not exist already. Key-phrase dedup. |
| Pending (resolved) | unlimited | Exact text match against existing items. |
| Work in progress | 3 | Tasks started but not finished. Preserves continuity. |
| Feedback | unlimited | Only explicit user approval. |
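Both dedup rules are cheap to implement without an LLM. A sketch, where the key-phrase heuristic (first few significant words, lowercased) is an assumption about one reasonable approach rather than the system's exact logic:

```python
STOPWORDS = {"the", "a", "an", "to", "of", "in", "for", "and", "on", "with"}

def key_phrase(item: str, n: int = 3) -> str:
    """First n significant words, lowercased — a cheap dedup key."""
    words = [w for w in item.lower().split() if w not in STOPWORDS]
    return " ".join(words[:n])

def add_pending(existing: list[str], new_items: list[str], cap: int = 3) -> list[str]:
    """Add up to `cap` new pending items, skipping key-phrase duplicates."""
    seen = {key_phrase(i) for i in existing}
    added = []
    for item in new_items:
        if key_phrase(item) not in seen and len(added) < cap:
            seen.add(key_phrase(item))
            added.append(item)
    return existing + added

def resolve_pending(existing: list[str], resolved: list[str]) -> list[str]:
    """Remove items by exact text match (the 'pending resolved' rule)."""
    return [i for i in existing if i not in set(resolved)]
```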

Why not the OpenClaw gateway? Anthropic blocked OAuth-based API proxying. claude --print is the official CLI, uses the subscription directly, and produces better results (Sonnet > Haiku).

Prune Lessons (Weekly, Sunday 03:30 UTC)

Tactical lessons (marked with ⏳) expire after 30 days. Strategic lessons (marked with 🔒) are permanent. The pruner also removes tactical lessons that duplicate strategic ones (60% word overlap threshold).
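The pruning rules above fit in a few lines. A sketch, assuming each tactical lesson carries the date it was added (the data shape is illustrative, not the pruner's actual format):

```python
from datetime import date

def word_overlap(a: str, b: str) -> float:
    """Fraction of the smaller lesson's word set shared with the other."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))

def prune(tactical: list[tuple[date, str]], strategic: list[str],
          today: date, max_age_days: int = 30,
          threshold: float = 0.6) -> list[tuple[date, str]]:
    """Drop tactical (⏳) lessons that expired or duplicate a strategic (🔒) one."""
    kept = []
    for added, text in tactical:
        if (today - added).days > max_age_days:
            continue  # expired
        if any(word_overlap(text, s) >= threshold for s in strategic):
            continue  # duplicates a permanent lesson
        kept.append((added, text))
    return kept
```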

Bridge (Every 30 minutes)

Imports OpenClaw Telegram sessions into claude-mem. Heuristic extraction from JSONL — no LLM tokens consumed.

Boot Sequence

Every session starts by reading (in order):

  1. SOUL.md — agent identity and values
  2. AGENTS.md — operational rules
  3. MEMORY.md — index of everything
  4. memory/YYYY-MM-DD.md — today's log
  5. memory/pending.md — open items
  6. memory/wip.md — where we left off
  7. feedback/approved.json — patterns to repeat
  8. feedback/rejected.json — patterns to avoid

Token budget: ~8-10K tokens. Down from ~20K after pruning historical deliveries out of MEMORY.md.
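In our setup the boot is driven by instructions in CLAUDE.md rather than code, but the sequence reduces to a simple loader. A sketch, assuming missing files (e.g. a daily log that doesn't exist yet) are simply skipped:

```python
from pathlib import Path

BOOT_FILES = [
    "SOUL.md", "AGENTS.md", "MEMORY.md",
    # the date-stamped daily log is inserted here at call time
    "memory/pending.md", "memory/wip.md",
    "feedback/approved.json", "feedback/rejected.json",
]

def boot_context(workspace: str, today_log: str) -> str:
    """Concatenate boot files in order, skipping any that don't exist yet."""
    order = BOOT_FILES[:3] + [today_log] + BOOT_FILES[3:]
    parts = []
    for name in order:
        path = Path(workspace) / name
        if path.exists():
            parts.append(f"<!-- {name} -->\n{path.read_text()}")
    return "\n\n".join(parts)
```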

Work in Progress: The Missing Piece

The insight that changed our architecture came from comparing Karpathy's LLM Wiki with OriginMind's Creative DNA critique.

Karpathy says: good query answers should be filed back into the wiki. OriginMind says: systems should preserve momentum, not just artifacts.

Both point to the same gap: what happens between sessions?

Our system captured what was decided and what was learned, but not what was in progress. Every new session started from zero conceptual state.

The fix: wip.md — a file that the auto-precompact overwrites each run with whatever was being worked on but not finished. The next session reads it at boot and knows exactly where to pick up.

# Work in Progress
*Updated: 2026-04-08 18:31 UTC — generated by auto-precompact*

## Agent Cris — integration tests post-boot
- **Status:** Systemd service running (PID 850282, port 8001), test list was cut before completion
- **Next step:** Test CV reception via Evolution API webhook and validate full screening flow
- **Context:** /etc/systemd/system/rh-agente-cris.service, http://0.0.0.0:8001

Decisions and Trade-offs

What we chose NOT to do

| Idea | Why we rejected it |
|------|--------------------|
| Obsidian as UI | Our primary consumer is the LLM, not a human browsing. No one clicks wiki links. |
| Cross-referencing between files | Chroma does semantic cross-referencing on demand. Explicit [[links]] in markdown would be written but never read. |
| Chroma in boot | Boot runs before knowing what the user wants. Injecting random observations adds noise, not value. Chroma serves on-demand queries. |
| Entity pages (one per person/system) | Doesn't scale for an operation with 130 employees. One people.md file is enough. |
| LLM-powered dreaming/consolidation | OpenClaw's built-in dreaming ran 3 times, promoted 0 insights. Auto-precompact with Sonnet does better extraction. Dreaming delegated back to OpenClaw. |
| Flashcards (SM-2) | Built 453 cards. Nobody reviewed them. The knowledge already exists in decisions.md and lessons.md. Archived. |

What we chose TO do

| Decision | Why |
|----------|-----|
| claude --print over API keys | Official client, uses subscription, zero extra cost, better model (Sonnet). |
| Bash wrappers over LLM for crons | 19 of 20 crons are deterministic. Zero tokens, guaranteed execution. Only precompact needs LLM. |
| Single pending.md | Had two files diverging silently. One source of truth, no symlinks. |
| Tactical lessons expire | lessons.md was 89 KB and growing. Tactical items (⏳) auto-prune after 30 days. Strategic items (🔒) are permanent. |
| Include OpenClaw observations | The user makes decisions in Telegram too. Filtering them out meant losing context. |

Numbers After 33 Days

| Metric | Value |
|--------|-------|
| Total observations | 1,493 |
| Permanent decisions | 39 |
| Lessons learned | 64 (39 strategic + 25 tactical) |
| Daily logs | 39 files |
| Auto-memory files | 16 |
| Vector embeddings | 46 MB |
| Boot tokens | ~8-10K |
| Crons running | 20 (1 uses LLM) |
| Services running | 7 |
| Cost of memory system | $0 (subscription + Gemini free tier) |

What's Next

  1. Monitor WIP quality — new feature, needs validation over 5-10 sessions
  2. Lint for contradictions — pending.md says "CRM 0% execution" while projects.md says "CRM operational"
  3. Query → file back — when the agent synthesizes a good answer, save it as a wiki page (Karpathy's insight)

How to Replicate

The architecture is platform-agnostic. You need:

  1. A capture layer — claude-mem, or any system that records what happens in sessions
  2. A consolidation process — our auto-precompact runs daily, uses claude --print for extraction
  3. Structured markdown files — decisions, lessons, pending, WIP
  4. A boot sequence — CLAUDE.md or equivalent that tells the agent what to read on startup
  5. A feedback loop — approved.json/rejected.json so the system learns from corrections

The key principle: the LLM is the primary consumer of memory, not the human. Design for machine reading (structured, deduplicated, minimal), not human browsing (interlinked, visual, explorable).


Built with OpenClaw + Claude Code + claude-mem. Running in production since March 7, 2026.
