
Building a Unified Memory System for AI Agents in Production

How we built a 3-layer memory architecture that bridges OpenClaw and Claude Code into a single brain — with real numbers from 33 days of operation.

The Problem

Most AI agent setups have a memory problem: they either forget everything between sessions (stateless) or accumulate noise until the context window overflows. RAG helps with retrieval but doesn't build understanding. The LLM rediscovers knowledge from scratch on every query.

Karpathy's LLM Wiki proposes a compelling alternative: a persistent, compounding wiki maintained by the LLM. Great idea — but designed for a researcher browsing Obsidian. We needed something for an operational AI agent running a business with 8 stores, 20 cron jobs, 7 services, and two different AI platforms (OpenClaw + Claude Code).

This document describes what we built, what worked, what didn't, and the decisions behind each choice.

Architecture Overview

┌────────────────────────────────────────────────────────┐
│                    TWO ENTRY POINTS                    │
│                                                        │
│   OpenClaw (Telegram)          Claude Code (VS Code)   │
│         │                              │               │
│         ▼                              ▼               │
│   JSONL transcriptions         claude-mem plugin       │
│         │                       (auto-capture)         │
│         ▼                              │               │
│   Bridge (cron */30)                   │               │
│         │                              │               │
│         └──────────┐    ┌──────────────┘               │
│                    ▼    ▼                              │
│              ┌──────────────┐                          │
│              │  claude-mem  │  Layer 1: Subconscious   │
│              │  SQLite DB   │  1,493 observations      │
│              │  Chroma 46MB │  Vector search           │
│              └──────┬───────┘                          │
│                     │                                  │
│       auto-precompact (Sonnet, daily 23h)              │
│                     │                                  │
│              ┌──────▼───────┐                          │
│              │ Workspace MD │  Layer 2: Conscious      │
│              │ decisions.md │  Curated, structured     │
│              │ lessons.md   │  Read at boot            │
│              │ pending.md   │                          │
│              │ wip.md       │                          │
│              └──────┬───────┘                          │
│                     │                                  │
│              ┌──────▼───────┐                          │
│              │ Auto-memory  │  Layer 3: Persistent     │
│              │ 16 .md files │  Cross-session identity  │
│              │ (CC native)  │                          │
│              └──────────────┘                          │
│                                                        │
│              ┌──────────────┐                          │
│              │ Schema files │  How to operate          │
│              │ SOUL.md      │  Who I am                │
│              │ AGENTS.md    │  Rules & protocols       │
│              │ CLAUDE.md    │  Boot sequence           │
│              └──────────────┘                          │
└────────────────────────────────────────────────────────┘

The Three Layers

Layer 1: Subconscious (claude-mem + Chroma)

What: Automatic capture of everything that happens in every session. No manual intervention.

How: The claude-mem plugin hooks into Claude Code's tool calls and extracts observations — structured summaries of what happened, what was decided, what changed. These go into a SQLite database and are indexed in ChromaDB for vector search.

Bridge: OpenClaw sessions happen in Telegram, not Claude Code. A Python bridge (openclaw_mem_bridge.py) reads OpenClaw's JSONL transcriptions and inserts observations into the same SQLite DB. Runs every 30 minutes via cron. Zero LLM tokens — pure heuristic extraction.

Numbers:

  • 1,493 observations in 33 days
  • 46 MB of vector embeddings
  • 4 projects tracked (workspace, openclaw, root, tmux-executor)

Key decision: We chose claude-mem over OpenClaw's built-in memory-core because claude-mem had 1,200+ observations vs memory-core's empty database (Gemini embeddings were failing silently). OpenClaw's memory slot is exclusive — only one memory plugin can run at a time.

Layer 2: Conscious (Workspace Markdown)

What: Curated, structured files that both OpenClaw and Claude Code read at boot. This is the shared brain.

| File | Purpose | Size |
|------|---------|------|
| MEMORY.md | Index — points to everything else | 85 lines |
| memory/decisions.md | Permanent decisions — never revisit | 42 KB |
| memory/lessons.md | Lessons learned (strategic + tactical) | 89 KB |
| memory/pending.md | Open items waiting for action | 7 KB |
| memory/wip.md | Work in progress — where we stopped | ~1 KB |
| memory/people.md | Who is who | 2 KB |
| memory/projects.md | Project status | varies |
| memory/YYYY-MM-DD.md | Daily logs (39 files) | varies |
| feedback/approved.json | Patterns to repeat | 4 KB |
| feedback/rejected.json | Patterns to never repeat | 2 KB |

Key insight: These files sit on disk in the OpenClaw workspace. Both platforms read the same files. One source of truth, two consumers.

Layer 3: Persistent (Claude Code Auto-Memory)

What: Claude Code's native memory system — 16 markdown files with YAML frontmatter that persist across conversations.

| Type | Count | Examples |
|------|-------|----------|
| user | 2 | Agent identity, user profile |
| feedback | 6 | "crons use bash not LLM", "backtest before deploy" |
| project | 4 | Credit scoring v3, CRM portal status |
| reference | 3 | Infrastructure ports, fiscal structure |
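To make the shape concrete, a feedback-type file might look like the following. The frontmatter keys shown are an assumption about the format, not a documented schema:

```markdown
---
type: feedback
created: 2026-03-19
---

# Crons use bash, not LLM

Scheduled jobs must be deterministic bash wrappers. Only the daily
auto-precompact is allowed to call an LLM.
```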

Key insight: This layer stores how to behave, not what happened. It tells future sessions things like "this user has ADHD — keep responses short" or "never mock the database in tests."

Automated Processes

Auto-Precompact (Daily, 23h BRT)

The most important automation. Runs via claude --print (Claude Sonnet, subscription — zero extra cost) with Gemini Flash as fallback.

What it does:

  1. Reads last 12 hours of observations from claude-mem (both Claude Code AND OpenClaw sessions)
  2. Reads today's daily log and current state of decisions/lessons/pending
  3. Asks Sonnet to extract: decisions, lessons, resolved items, new pending items, work in progress
  4. Writes results to the appropriate files with deduplication
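The extraction step can be sketched as a thin wrapper around the CLI. `claude --print` is the real flag described above; the prompt wording, function names, and fallback behavior here are illustrative assumptions:

```python
import subprocess

def build_prompt(observations: str, current_state: str) -> str:
    """Prompt asking the model to extract memory updates (wording illustrative)."""
    return (
        "You maintain an AI agent's long-term memory. From the observations "
        "below, extract at most 2 permanent decisions, at most 3 lessons "
        "(errors that cost time), at most 3 new pending items, and up to 3 "
        "work-in-progress tasks.\n\n"
        f"## Observations (last 12h)\n{observations}\n\n"
        f"## Current memory state\n{current_state}\n"
    )

def precompact(observations: str, current_state: str) -> str:
    """Run the extraction through the claude CLI (subscription, no API key)."""
    result = subprocess.run(
        ["claude", "--print", build_prompt(observations, current_state)],
        capture_output=True, text=True, timeout=300,
    )
    if result.returncode != 0:
        # The real pipeline falls back to Gemini Flash at this point.
        raise RuntimeError(f"claude CLI failed: {result.stderr.strip()}")
    return result.stdout
```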

What it extracts:

| Field | Max per run | Filter |
|-------|-------------|--------|
| Decisions | 2 | Must be permanent ("always do X"). Most sessions have 0. |
| Lessons | 3 | Must be an error that cost time. Not things that worked. |
| Pending (new) | 3 | Must not exist already. Key-phrase dedup. |
| Pending (resolved) | unlimited | Exact text match against existing items. |
| Work in progress | 3 | Tasks started but not finished. Preserves continuity. |
| Feedback | unlimited | Only explicit user approval. |
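Both dedup rules are cheap to implement without an LLM. A sketch, where the key-phrase heuristic (first few significant words, lowercased) is an assumption about one reasonable approach rather than the system's exact logic:

```python
STOPWORDS = {"the", "a", "an", "to", "of", "in", "for", "and", "on", "with"}

def key_phrase(item: str, n: int = 3) -> str:
    """First n significant words, lowercased — a cheap dedup key."""
    words = [w for w in item.lower().split() if w not in STOPWORDS]
    return " ".join(words[:n])

def add_pending(existing: list[str], new_items: list[str], cap: int = 3) -> list[str]:
    """Add up to `cap` new pending items, skipping key-phrase duplicates."""
    seen = {key_phrase(i) for i in existing}
    added = []
    for item in new_items:
        if key_phrase(item) not in seen and len(added) < cap:
            seen.add(key_phrase(item))
            added.append(item)
    return existing + added

def resolve_pending(existing: list[str], resolved: list[str]) -> list[str]:
    """Remove items by exact text match (the 'pending resolved' rule)."""
    return [i for i in existing if i not in set(resolved)]
```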

Why not the OpenClaw gateway? Anthropic blocked OAuth-based API proxying. claude --print is the official CLI, uses the subscription directly, and produces better results (Sonnet > Haiku).

Prune Lessons (Weekly, Sunday 03:30 UTC)

Tactical lessons (marked with ⏳) expire after 30 days. Strategic lessons (marked with 🔒) are permanent. The pruner also removes tactical lessons that duplicate strategic ones (60% word overlap threshold).
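The pruning rules above fit in a few lines. A sketch, assuming each tactical lesson carries the date it was added (the data shape is illustrative, not the pruner's actual format):

```python
from datetime import date

def word_overlap(a: str, b: str) -> float:
    """Fraction of the smaller lesson's word set shared with the other."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))

def prune(tactical: list[tuple[date, str]], strategic: list[str],
          today: date, max_age_days: int = 30,
          threshold: float = 0.6) -> list[tuple[date, str]]:
    """Drop tactical (⏳) lessons that expired or duplicate a strategic (🔒) one."""
    kept = []
    for added, text in tactical:
        if (today - added).days > max_age_days:
            continue  # expired
        if any(word_overlap(text, s) >= threshold for s in strategic):
            continue  # duplicates a permanent lesson
        kept.append((added, text))
    return kept
```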

Bridge (Every 30 minutes)

Imports OpenClaw Telegram sessions into claude-mem. Heuristic extraction from JSONL — no LLM tokens consumed.

Boot Sequence

Every session starts by reading (in order):

  1. SOUL.md — agent identity and values
  2. AGENTS.md — operational rules
  3. MEMORY.md — index of everything
  4. memory/YYYY-MM-DD.md — today's log
  5. memory/pending.md — open items
  6. memory/wip.md — where we left off
  7. feedback/approved.json — patterns to repeat
  8. feedback/rejected.json — patterns to avoid

Token budget: ~8-10K tokens. Down from ~20K after pruning historical deliveries out of MEMORY.md.
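In our setup the boot is driven by instructions in CLAUDE.md rather than code, but the sequence reduces to a simple loader. A sketch, assuming missing files (e.g. a daily log that doesn't exist yet) are simply skipped:

```python
from pathlib import Path

BOOT_FILES = [
    "SOUL.md", "AGENTS.md", "MEMORY.md",
    # the date-stamped daily log is inserted here at call time
    "memory/pending.md", "memory/wip.md",
    "feedback/approved.json", "feedback/rejected.json",
]

def boot_context(workspace: str, today_log: str) -> str:
    """Concatenate boot files in order, skipping any that don't exist yet."""
    order = BOOT_FILES[:3] + [today_log] + BOOT_FILES[3:]
    parts = []
    for name in order:
        path = Path(workspace) / name
        if path.exists():
            parts.append(f"<!-- {name} -->\n{path.read_text()}")
    return "\n\n".join(parts)
```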

Work in Progress: The Missing Piece

The insight that changed our architecture came from comparing Karpathy's LLM Wiki with OriginMind's Creative DNA critique.

Karpathy says: good query answers should be filed back into the wiki. OriginMind says: systems should preserve momentum, not just artifacts.

Both point to the same gap: what happens between sessions?

Our system captured what was decided and what was learned, but not what was in progress. Every new session started from zero conceptual state.

The fix: wip.md — a file that the auto-precompact overwrites each run with whatever was being worked on but not finished. The next session reads it at boot and knows exactly where to pick up.

# Work in Progress
*Updated: 2026-04-08 18:31 UTC — generated by auto-precompact*

## Agent Cris — integration tests post-boot
- **Status:** Systemd service running (PID 850282, port 8001), test list was cut before completion
- **Next step:** Test CV reception via Evolution API webhook and validate full screening flow
- **Context:** /etc/systemd/system/rh-agente-cris.service, http://0.0.0.0:8001

Decisions and Trade-offs

What we chose NOT to do

| Idea | Why we rejected it |
|------|--------------------|
| Obsidian as UI | Our primary consumer is the LLM, not a human browsing. No one clicks wiki links. |
| Cross-referencing between files | Chroma does semantic cross-referencing on demand. Explicit [[links]] in markdown would be written but never read. |
| Chroma in boot | Boot runs before knowing what the user wants. Injecting random observations adds noise, not value. Chroma serves on-demand queries. |
| Entity pages (one per person/system) | Doesn't scale for an operation with 130 employees. One people.md file is enough. |
| LLM-powered dreaming/consolidation | OpenClaw's built-in dreaming ran 3 times, promoted 0 insights. Auto-precompact with Sonnet does better extraction. Dreaming delegated back to OpenClaw. |
| Flashcards (SM-2) | Built 453 cards. Nobody reviewed them. The knowledge already exists in decisions.md and lessons.md. Archived. |

What we chose TO do

| Decision | Why |
|----------|-----|
| claude --print over API keys | Official client, uses subscription, zero extra cost, better model (Sonnet). |
| Bash wrappers over LLM for crons | 19 of 20 crons are deterministic. Zero tokens, guaranteed execution. Only precompact needs LLM. |
| Single pending.md | Had two files diverging silently. One source of truth, no symlinks. |
| Tactical lessons expire | lessons.md was 89 KB and growing. Tactical items (⏳) auto-prune after 30 days. Strategic items (🔒) are permanent. |
| Include OpenClaw observations | The user makes decisions in Telegram too. Filtering them out meant losing context. |

Numbers After 33 Days

| Metric | Value |
|--------|-------|
| Total observations | 1,493 |
| Permanent decisions | 39 |
| Lessons learned | 64 (39 strategic + 25 tactical) |
| Daily logs | 39 files |
| Auto-memory files | 16 |
| Vector embeddings | 46 MB |
| Boot tokens | ~8-10K |
| Crons running | 20 (1 uses LLM) |
| Services running | 7 |
| Cost of memory system | $0 (subscription + Gemini free tier) |

What's Next

  1. Monitor WIP quality — new feature, needs validation over 5-10 sessions
  2. Lint for contradictions — pending.md says "CRM 0% execution" while projects.md says "CRM operational"
  3. Query → file back — when the agent synthesizes a good answer, save it as a wiki page (Karpathy's insight)

How to Replicate

The architecture is platform-agnostic. You need:

  1. A capture layer — claude-mem, or any system that records what happens in sessions
  2. A consolidation process — our auto-precompact runs daily, uses claude --print for extraction
  3. Structured markdown files — decisions, lessons, pending, WIP
  4. A boot sequence — CLAUDE.md or equivalent that tells the agent what to read on startup
  5. A feedback loop — approved.json/rejected.json so the system learns from corrections

The key principle: the LLM is the primary consumer of memory, not the human. Design for machine reading (structured, deduplicated, minimal), not human browsing (interlinked, visual, explorable).


Built with OpenClaw + Claude Code + claude-mem. Running in production since March 7, 2026.
