This document specifies a production-oriented implementation plan for a memory orchestration layer that sits in front of Mnemon and improves how agents retrieve, refine, and use graph memory. Mnemon should remain the persistent memory backend, while the new component acts as a query planner, retrieval orchestrator, re-ranker, and context packager for downstream agents.(Mnemon GitHub)(MCP architecture spec)
The recommended implementation target is an MCP server with a thin CLI for local debugging and batch experiments. MCP is designed around isolated, composable servers that expose focused capabilities to clients, which maps well to a memory orchestration service.(MCP architecture spec)(MCP memory server example)
Build a standalone service that helps agents use Mnemon more effectively, without modifying Mnemon core in the first implementation. The system should improve recall quality, reduce irrelevant retrievals, support iterative memory expansion, and return compact, task-shaped context blocks suitable for agent consumption.(Mnemon GitHub)(Pragmatic Leader: Architecting Memory with MCP Servers)
Graph memory systems are good at persistent storage and graph-structured recall, but agents still need help deciding what to ask for, how broadly to expand, which memories to trust, and how to compress retrieved results into actionable context. A separate orchestrator can provide these capabilities while keeping the storage and retrieval substrate stable and reusable.(Mnemon GitHub)(Pragmatic Leader: Architecting Memory with MCP Servers)(Mnemis paper)
- Keep Mnemon as the source of truth for memory persistence and base retrieval.(Mnemon GitHub)
- Keep orchestration logic outside Mnemon until value is proven by benchmarks and production usage.(Mnemon GitHub)
- Expose capabilities as MCP tools first, because this makes the system easy to plug into agent runtimes that support MCP.(MCP architecture spec)(MCP memory server example)
- Preserve explainability by returning retrieval traces, scores, and reasons for inclusion when possible.(Pragmatic Leader: Architecting Memory with MCP Servers)
- Prefer deterministic retrieval pipelines before experimenting with learned or highly adaptive routing.
Primary users are developers building agent systems that need persistent memory, graph recall, and better memory utilization. Secondary users are researchers experimenting with retrieval orchestration, graph expansion, and memory-aware agent loops.(Mnemon GitHub)(Pragmatic Leader: Architecting Memory with MCP Servers)
- MCP server exposing memory orchestration tools.
- Thin CLI that calls the same internal orchestration library.
- Mnemon adapter module for querying and writing memory.
- Query rewrite and decomposition.
- Multi-stage retrieval pipeline.
- Optional graph neighborhood expansion.
- Re-ranking and deduplication.
- Context packaging for agents.
- Basic observability and evaluation harness.
- Replacing Mnemon storage.
- Forking Mnemon in v1.
- Training a custom neural model in v1.
- Building a full autonomous planner.
- UI beyond basic CLI diagnostics.
The component should be implemented as a wrapper in front of Mnemon rather than as a direct patch to Mnemon core. That allows independent iteration, benchmarking, rollback, and support for alternate memory backends later if desired.(Mnemon GitHub)(MCP architecture spec)
Agent / MCP Client
|
v
Memory Orchestrator MCP Server
- Query planner
- Retrieval pipeline
- Graph expander
- Re-ranker
- Context packager
- Trace/logger
|
v
Mnemon Adapter
|
v
Mnemon backend / storage
The orchestration server should expose stable tool interfaces while hiding internal retrieval complexity. Internally it should be built as a modular pipeline so each stage can be independently tested and replaced.(Mnemon GitHub)(MCP architecture spec)
The MCP architecture defines hosts, clients, and focused servers, which makes it a natural fit for a memory service that agents can call as a tool. Building this as MCP first makes the service reusable across multiple agent environments rather than coupling it to one custom runtime.(MCP architecture spec)
The agent asks for memory relevant to a current task. The orchestrator rewrites the query, retrieves candidates from Mnemon, re-ranks them, and returns a compact context package.
The agent asks about a person, project, or concept. The orchestrator retrieves seed memories, expands through graph neighbors, filters noisy branches, and returns a compressed set of linked evidence.
The agent asks what happened recently for an entity or topic. The orchestrator retrieves related memories over a time window, groups them, deduplicates them, and emits a chronological summary block.
The agent wants to write new memory. The orchestrator checks for near-duplicates, links the new memory to existing nodes if appropriate, and forwards the write to Mnemon.
The system must expose an MCP server with tools for retrieval, memory write, memory explain, and diagnostics. The server must be stateless at the protocol level, with request-scoped execution and optional shared caches behind the scenes.(MCP architecture spec)(MCP memory server example)
Required tools:
- memory_search
- memory_expand
- memory_context
- memory_write
- memory_explain
- memory_health
The system must include a CLI that exercises the same internal modules as the MCP server. The CLI should support debugging pipelines locally, replaying queries from logs, and running evaluation batches.
Example commands:
orchestrator search --query "what do we know about vendor X?"
orchestrator context --task "draft reply to customer" --query "recent issues with vendor X"
orchestrator explain --query "project atlas blockers"
orchestrator eval --dataset eval/queries.jsonl
The system must isolate all Mnemon interactions behind an adapter interface. This adapter should support candidate retrieval, graph neighbor expansion, entity lookup, memory write, and metadata fetch so the rest of the orchestrator is not tightly coupled to Mnemon internals.(Mnemon GitHub)
Suggested interface:
from typing import Protocol

class MemoryBackend(Protocol):
    def search(self, query, top_k, filters=None): ...
    def neighbors(self, node_ids, edge_types=None, hops=1, limit=50): ...
    def get_memories(self, memory_ids): ...
    def write(self, memory_record): ...
    def link(self, source_id, target_id, relation): ...
    def health(self): ...

The orchestrator must transform the raw agent request into retrieval-ready subqueries. This includes normalization, entity extraction, intent detection, optional time-window inference, and decomposition into parallel retrieval steps.
Inputs:
- user task
- optional conversation context
- optional filters
Outputs:
- canonical query
- subqueries
- retrieval strategy
- hop budget
- confidence estimate
The retrieval pipeline must support at least three stages:
- Initial candidate retrieval from Mnemon.
- Optional graph expansion around promising candidates.
- Re-ranking and pruning into a final result set.
The retrieval strategy should be configurable per request. Initial implementation should prefer simple heuristics and weighted scoring over experimental learned routing.(Pragmatic Leader: Architecting Memory with MCP Servers)(Mnemis paper)
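The three stages can be composed as one function over any `MemoryBackend`-shaped object. This is a sketch under assumed dict-shaped candidates with `memory_id` and `score` keys; the final sort is a stand-in for the weighted reranker.

```python
def run_pipeline(backend, plan, top_k=10, expand=False):
    """Three-stage retrieval: candidates -> optional expansion -> rerank/prune.

    `backend` is any object exposing MemoryBackend-style search/neighbors
    methods; candidate shape ({"memory_id", "score"}) is an assumption.
    """
    # Stage 1: initial candidate retrieval, one backend call per subquery.
    candidates = []
    for sq in plan.subqueries or [plan.canonical_query]:
        candidates.extend(backend.search(sq, top_k=top_k))

    # Stage 2: optional graph expansion around the strongest seeds.
    if expand:
        seed_ids = [c["memory_id"] for c in candidates[:3]]
        candidates.extend(backend.neighbors(seed_ids, hops=1))

    # Stage 3: dedupe by id, rerank by score, prune to top_k.
    unique = {c["memory_id"]: c for c in candidates}
    ranked = sorted(unique.values(), key=lambda c: c.get("score", 0.0), reverse=True)
    return ranked[:top_k]
```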
The orchestrator must score candidates using a transparent weighted formula. The formula should combine relevance, recency, graph proximity, memory type, and duplication penalties.
Suggested v1 scoring model:
score =
w_relevance * semantic_score +
w_recency * recency_score +
w_graph * graph_proximity_score +
w_type * type_match_score -
w_dup * duplication_penalty -
w_noise * noise_penalty
Weights must be configurable. The system must return per-item score breakdowns in explain mode.
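A direct translation of the v1 formula, returning both the total and the per-term breakdown required by explain mode. The weight values are illustrative defaults, not tuned recommendations.

```python
# Illustrative default weights; real values come from configuration.
DEFAULT_WEIGHTS = {
    "relevance": 0.45, "recency": 0.20, "graph": 0.15,
    "type": 0.10, "dup": 0.25, "noise": 0.15,
}

def score_candidate(signals, weights=DEFAULT_WEIGHTS):
    """Apply the v1 weighted formula; return (score, per-term breakdown)."""
    breakdown = {
        "relevance": weights["relevance"] * signals.get("semantic_score", 0.0),
        "recency":   weights["recency"]   * signals.get("recency_score", 0.0),
        "graph":     weights["graph"]     * signals.get("graph_proximity_score", 0.0),
        "type":      weights["type"]      * signals.get("type_match_score", 0.0),
        "dup":      -weights["dup"]       * signals.get("duplication_penalty", 0.0),
        "noise":    -weights["noise"]     * signals.get("noise_penalty", 0.0),
    }
    return sum(breakdown.values()), breakdown
```

Returning the breakdown alongside the total means explain mode gets per-item score attribution for free.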
The orchestrator must convert the final memory set into a compact context object optimized for agent consumption. This object should include concise summaries, source IDs, entity links, and optional chronological or topical grouping.
Suggested output schema:
{
"query": "...",
"strategy": "direct|expanded|episodic",
"summary": "short synthesized memory context",
"items": [
{
"memory_id": "...",
"summary": "...",
"score": 0.91,
"reasons": ["entity match", "recent", "same project cluster"],
"linked_entities": ["..."],
"timestamp": "..."
}
],
"trace": {
"subqueries": ["..."],
"expanded_from": ["..."],
"dropped": [{"id": "...", "reason": "duplicate"}]
}
}

The system must support an explain mode that shows query rewrites, retrieved candidates, expansion decisions, ranking scores, and drop reasons. This is required for debugging and for tuning retrieval quality.(Pragmatic Leader: Architecting Memory with MCP Servers)
The system must support writing new memory through the orchestrator. Before writing, it should perform duplicate detection and optional relation linking against nearby memories to avoid uncontrolled memory bloat.
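The duplicate gate in the write path can be sketched with a plain string-similarity check. Production use would more likely compare embeddings via Mnemon; `difflib.SequenceMatcher` here just illustrates where the gate sits, and the threshold is an assumed default.

```python
from difflib import SequenceMatcher

def find_near_duplicates(new_text, existing, threshold=0.85):
    """Return (memory_id, similarity) pairs for stored memories whose text is
    nearly identical to the incoming record.

    `existing` is assumed to be a list of {"memory_id", "text"} dicts fetched
    from a candidate search against the backend.
    """
    matches = []
    for mem in existing:
        ratio = SequenceMatcher(None, new_text.lower(), mem["text"].lower()).ratio()
        if ratio >= threshold:
            matches.append((mem["memory_id"], round(ratio, 3)))
    return matches
```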
The system must support YAML or TOML configuration for:
- backend endpoint
- top-k limits
- hop budgets
- scoring weights
- recency decay
- allowed edge types
- deduplication thresholds
- trace verbosity
- cache TTLs
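A minimal YAML sketch covering these keys. All key names and values are illustrative assumptions, not a fixed schema:

```yaml
backend:
  endpoint: "http://localhost:8000"   # Mnemon adapter target (placeholder)
retrieval:
  top_k: 10
  hop_budget: 2
  allowed_edge_types: ["mentions", "part_of", "follows"]
scoring:
  weights: {relevance: 0.45, recency: 0.20, graph: 0.15, type: 0.10, dup: 0.25, noise: 0.15}
  recency_decay_halflife_days: 30
dedup:
  similarity_threshold: 0.85
trace:
  verbosity: "summary"   # summary | full
cache:
  ttl_seconds: 300
```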
- P50 memory search under 400 ms for direct recall in local deployments.
- P95 under 1.5 s for expanded multi-hop recall.
- Support configurable timeouts and partial-result fallback.
- Graceful degradation if graph expansion fails.
- Retry with bounded backoff for backend transient failures.
- Return partial trace and partial results rather than hard-failing whenever possible.
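The bounded-backoff requirement can be sketched as a small retry wrapper. Catching only `ConnectionError` is illustrative; a real implementation would classify which backend errors are transient, and callers would fall back to partial results plus a trace annotation once retries are exhausted.

```python
import time

def with_retries(fn, attempts=3, base_delay=0.2, max_delay=2.0, sleep=time.sleep):
    """Retry a backend call with bounded exponential backoff.

    Re-raises the last error once attempts are exhausted. `sleep` is
    injectable so tests can run without real delays.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)  # bounded, not unbounded, growth
```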
- Structured logs per request.
- Metrics: latency, recall depth, candidate counts, dedup ratio, expansion rate, cache hit rate.
- Query replay support from stored traces.
- No uncontrolled shell execution.
- Validate and sanitize all incoming tool parameters.
- Support access control if deployed as a shared service.
memory_search
Purpose: retrieve relevant memory candidates.
Input:
{
"query": "string",
"task": "optional string",
"filters": {"entity_ids": [], "time_range": {}, "memory_types": []},
"top_k": 10,
"expand": false
}

Output:
{
"items": [...],
"summary": "...",
"trace_id": "..."
}

memory_context
Purpose: produce a compact agent-ready context package.
Input:
{
"query": "string",
"task": "string",
"response_budget": {"max_items": 8, "max_chars": 3000},
"filters": {}
}

Output:
{
"summary": "...",
"items": [...],
"context_block": "...",
"trace_id": "..."
}

memory_expand
Purpose: expand around memory or entity seeds using graph neighbors.
memory_write
Purpose: store memory with deduplication and optional linking.
memory_explain
Purpose: return debug trace for a prior request or fresh query execution.
memory_health
Purpose: verify backend connectivity and basic retrieval readiness.
Recommended package structure:
memory_orchestrator/
app/
mcp_server.py
cli.py
core/
planner.py
retriever.py
expander.py
reranker.py
deduper.py
packager.py
explainer.py
backends/
mnemon_adapter.py
base.py
models/
schemas.py
config/
settings.py
eval/
runner.py
datasets/
tests/
Recommended stack:
- Python 3.11+
- FastMCP or another MCP server framework compatible with the target runtime
- Pydantic for schemas
- Typer or Click for CLI
- httpx for backend calls
- pytest for tests
- structlog or standard JSON logging for traces
Implement a deterministic orchestration layer first. Do not implement the complex-valued neural routing concept in v1. The first milestone should prove that a structured wrapper already improves Mnemon-backed retrieval.
Deliverables:
- MCP server
- CLI
- Mnemon adapter
- query planner
- retrieval pipeline
- heuristic reranker
- context packager
- explain mode
- tests and example config
Implement offline evaluation to compare:
- raw Mnemon retrieval
- orchestrated retrieval without expansion
- orchestrated retrieval with expansion
Metrics:
- precision@k
- recall@k
- nDCG@k
- duplicate rate
- average context length
- user-rated usefulness if a labeled dataset exists
Only after deterministic improvements are measured should the system add an experimental adaptive controller inspired by the proposed “wrapper magic.” This controller may adjust query expansion, reranking, or retrieval routing, but it must remain optional and behind a feature flag.
The controller should not replace Mnemon and should not be required for baseline operation.
Suggested scoring inputs:
- semantic similarity from Mnemon or embedding score
- entity overlap with extracted query entities
- relation overlap with requested task type
- recency decay for time-sensitive queries
- graph distance penalty for far expansions
- source reliability or confidence metadata if available
- duplicate and near-duplicate suppression
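The recency-decay input can be made concrete with a half-life formulation, matching the `recency decay` configuration item. The 30-day half-life is an assumed tunable, not a recommendation.

```python
def recency_score(age_days: float, half_life_days: float = 30.0) -> float:
    """Exponential recency decay: 1.0 for a fresh memory, 0.5 at one half-life.

    Feeds the recency term of the weighted scoring formula; half_life_days
    would come from configuration for time-sensitive query types.
    """
    return 0.5 ** (age_days / half_life_days)
```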
The agent-ready context block should:
- be concise
- include only high-value items
- preserve identifiers for traceability
- group by entity, time, or task when useful
- avoid dumping raw memory text when a synthesized summary is enough
Example format:
Memory context for task: draft vendor update
Relevant entities:
- Vendor X
- Project Atlas
- Infra team
Key recalled facts:
1. Vendor X missed the March delivery milestone and cited firmware instability.
2. Atlas depends on the delayed module integration.
3. Infra team escalated procurement risk in two recent episodes.
Supporting memory IDs:
- mem_1021
- mem_1044
- mem_1092
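A packager rendering this format can be a straightforward string builder. The function name and argument shapes are illustrative; in practice the entities, facts, and IDs would come from the reranked item set.

```python
def build_context_block(task, entities, facts, memory_ids):
    """Render the agent-facing context block in the plain-text format above."""
    lines = [f"Memory context for task: {task}", "", "Relevant entities:"]
    lines += [f"- {e}" for e in entities]
    lines += ["", "Key recalled facts:"]
    lines += [f"{i}. {fact}" for i, fact in enumerate(facts, 1)]
    lines += ["", "Supporting memory IDs:"]
    lines += [f"- {mid}" for mid in memory_ids]
    return "\n".join(lines)
```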
The orchestrator must avoid leaking Mnemon-specific implementation details into MCP responses except for stable identifiers and optional backend metadata. This keeps the wrapper portable and reduces coupling to Mnemon internals.(Mnemon GitHub)
The implementation must handle:
- backend unavailable
- empty retrieval set
- excessively broad graph expansion
- duplicate explosion
- stale or contradictory memories
- oversized context results
Fallback behavior should prefer smaller, safer outputs and explicit trace annotations.
The v1 implementation is acceptable when all of the following are true:
- An agent can call the MCP server to retrieve memory context backed by Mnemon.(Mnemon GitHub)(MCP architecture spec)
- A CLI can run the same retrieval path locally.
- Query planning, expansion, reranking, and packaging are modular and tested.
- Explain mode shows why memories were selected or dropped.
- Benchmark runs show equal or better precision/utility than direct Mnemon retrieval on a representative eval set.
- The system can be disabled and agents can still fall back to raw Mnemon retrieval.
1. Basic MCP server + Mnemon adapter + direct retrieval + packaging.
2. Query decomposition + explain mode + deduplication + CLI.
3. Graph expansion + reranking + evaluation harness.
4. Feature-flagged adaptive controller experiments.
5. Decide whether any proven retrieval features should be upstreamed into Mnemon core.
- Wrapper in front of Mnemon rather than a core patch: faster iteration, lower coupling, easier benchmarking, safer rollback, and better alignment with MCP’s composable server model.(Mnemon GitHub)(MCP architecture spec)
- MCP server plus thin CLI: MCP is the runtime interface for agents, while the CLI is the developer interface for debugging and evaluation.(MCP architecture spec)(MCP memory server example)
- Deterministic pipeline first: retrieval quality should be improved with transparent orchestration before adding experimental adaptive logic.
Implement a Python project called mnemon-memory-orchestrator that exposes an MCP server and a CLI. Use Mnemon as the backend through an adapter layer. Build a deterministic retrieval orchestrator with query planning, optional graph expansion, heuristic reranking, deduplication, context packaging, explain mode, configuration, tests, and an evaluation harness. Keep the code modular so that an experimental adaptive controller can be added later behind a feature flag. Do not modify Mnemon core in v1. Provide a runnable local setup, example config, and integration tests for the main MCP tools.(Mnemon GitHub)(MCP architecture spec)