Give your AI agents fast, trustworthy memory — without standing up a vector database.
ruLake is the layer between your agents and the data they remember. Plug in the storage you already have (S3, BigQuery, Snowflake, Parquet, files), expose it through one MCP tool, and every agent on every host gets the same low-latency, content-addressed view of memory.
Created by rUv. Part of the RuVector ecosystem alongside `ruvector-rabitq` (1-bit compression kernel) and RVF (durable segment format). Repo: `ruvnet/RuLake`.
Agentic systems are built on contrastive AI — embeddings that put similar things close together and different things far apart. Every "what does the agent remember about X?" query is, underneath, a contrast: rank the corpus by distance to X. ruLake is the place where those contrasts run. It keeps a compressed copy of your vectors in RAM, serves hits at ≈1.02× raw library speed (essentially free abstraction), and refreshes cold entries from whatever cloud or file store actually owns the bytes. Each cached entry is anchored by a cryptographic witness, so an answer is verifiable across processes, hosts, and time.
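The portability of the witness can be sketched with Python's standard `hashlib` (illustrative only — ruLake defines the actual witness format, but SHAKE-256 determinism is what makes it stable across processes, hosts, and time):

```python
import hashlib

def witness(bundle: bytes) -> str:
    # SHAKE-256 is an extendable-output function; 32 bytes of output
    # yields a 64-char hex witness for the bundle.
    return hashlib.shake_256(bundle).hexdigest(32)

# Same bytes on any host, in any process, at any time -> same witness.
w1 = witness(b"vectors-v1")
w2 = witness(b"vectors-v1")
assert w1 == w2 and len(w1) == 64
assert witness(b"vectors-v2") != w1  # any byte change changes the witness
```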
- **One MCP tool, one decision layer.** `rulake-mcp` speaks the Model Context Protocol. Claude Desktop, Cursor, Cline, Continue, agentic-flow — they all get a single `rulake_query` tool that takes intent (search/verify/explain/refresh), risk, freshness budget, and policy, and returns the answer plus a decision trace (`chosen_action`, `reason_code`, `backends_used`, `refusals`). The agent says what it wants; ruLake decides where to look, how strict to be, and whether to refuse.
- **Trust by witness, not vibes.** Every result carries the SHAKE-256 hex of the underlying bundle. Two agents, two hosts, same data → same witness → same answer, byte-exact.
- **Honest refusals beat confident lies.** Stale cache plus a missing remote witness? `WITNESS_MISMATCH_REFUSED`, empty data, and the agent retries with a narrower query. Better than serving a stale answer with a high score.
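As a sketch, a refusal from `rulake_query` might look like this. The field names (`chosen_action`, `reason_code`, `backends_used`, `refusals`) and the `WITNESS_MISMATCH_REFUSED` code come from the decision trace described above; the payload shape and the specific values are illustrative, not the actual wire format:

```json
{
  "data": [],
  "decision": {
    "chosen_action": "refuse",
    "reason_code": "WITNESS_MISMATCH_REFUSED",
    "backends_used": ["lake-prod"],
    "refusals": ["memories.public.notes"]
  }
}
```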
| Metric | What it delivers |
|---|---|
| Latency | 1.02× raw RaBitQ; ~1 ms cache hit at n=100k, D=128. Measured. |
| Throughput | 957 QPS single-thread, 2,854 QPS concurrent (Arc-drop-lock + AVX-512 VPOPCNTDQ). |
| Compression | 1-bit RaBitQ — 32× smaller than f32 vectors at D=128. |
| Cost | $0. MIT/Apache-2.0, no service to host, no per-query fee. |
| Surfaces | Rust crate · Python wheel · Node.js · rulake-mcp binary · Docker image. |
Edge / browser: WASM SDKs are on the roadmap (v0.5) — `@ruvector/rulake-wasm` for Cloudflare Workers, Deno, Bun, and browsers. Today the binary is small enough to run on a Raspberry Pi or an EC2 t4g.nano.
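The compression and distance claims in the table follow from arithmetic you can check (a sketch; D=128 and the f32 baseline come from the table above):

```python
# 32x compression: a D=128 f32 vector is 512 bytes; a 1-bit code is 16 bytes.
D = 128
f32_bytes = D * 4        # 4 bytes per float dimension
onebit_bytes = D // 8    # 1 bit per dimension
assert f32_bytes // onebit_bytes == 32

# Distance between 1-bit codes is popcount(xor) — the operation that
# AVX-512 VPOPCNTDQ vectorizes. Scalar equivalent on two 8-bit fragments:
a, b = 0b10110010, 0b00110110
hamming = bin(a ^ b).count("1")
assert hamming == 2
```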
```sh
# Rust — the core crate
cargo add ruvector-rulake

# Python — PyO3 + ABI3 wheels (NumPy zero-copy)
pip install ruvector-rulake

# Node.js / TypeScript — napi-rs prebuilt binaries (Float32Array zero-copy)
npm install @ruvector/rulake

# MCP server binary — agent-callable governed memory
cargo install --path mcp-server   # or grab a release binary
```

Wire to Claude Desktop or any MCP client over stdio:

```json
{
  "mcpServers": {
    "rulake": {
      "command": "rulake-mcp",
      "args": ["stdio", "--config", "/etc/rulake/mcp.toml"]
    }
  }
}
```

Wire a remote agent over Streamable HTTP:
```json
{
  "mcpServers": {
    "rulake": {
      "transport": "streamable-http",
      "url": "https://rulake.example.com/mcp",
      "headers": { "Authorization": "Bearer <token>" }
    }
  }
}
```

ruLake's MCP server enforces capability and per-collection RBAC at every tool call. Configure it once in `mcp.toml`:
```toml
[[backends]]
type = "fs"
id = "lake-prod"
root = "/srv/rulake"

[[allow]]
backend = "lake-prod"
collection = "memories.public.*"   # anchored regex
caps = ["read"]

[[allow]]
backend = "lake-prod"
collection = "memories.policies"
caps = ["read", "publish"]
```

```sh
# Read-only agent (default), bearer auth on loopback:
rulake-mcp http --bind 127.0.0.1:7440 --auth bearer --bearer-token-file /etc/rulake/token

# Production: JWT via OAuth scope-to-capability mapping
#   mcp:rulake:read    → Read
#   mcp:rulake:publish → Read + Publish
#   mcp:rulake:admin   → Read + Publish + Admin
rulake-mcp http --auth jwt --capabilities read,publish \
  --audit-file /var/log/rulake-mcp/audit.jsonl
```

Every tool call writes one JSONL audit line with `policy_decision` (`capability_required`, `capability_granted`) plus `decision` (`chosen_action`, `reason_code`, `backends_used`, `refusals`) — the "every audit line explains itself" gate from ADR-004 §7.
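The allow-list semantics can be sketched in a few lines of Python (hypothetical logic — the real matcher lives in `mcp-server`; `re.fullmatch` stands in for the "anchored regex" note, and the entries mirror the `[[allow]]` blocks above):

```python
import re

# Mirrors the [[allow]] entries: (backend, collection pattern, capabilities).
ALLOW = [
    ("lake-prod", "memories.public.*", {"read"}),
    ("lake-prod", "memories.policies", {"read", "publish"}),
]

def granted(backend: str, collection: str, cap: str) -> bool:
    # "Anchored" means the pattern must match the whole collection name,
    # not merely a prefix or substring.
    return any(
        b == backend and re.fullmatch(pattern, collection) and cap in caps
        for b, pattern, caps in ALLOW
    )

assert granted("lake-prod", "memories.public.notes", "read")
assert granted("lake-prod", "memories.policies", "publish")
assert not granted("lake-prod", "memories.public.notes", "publish")
assert not granted("other-backend", "memories.policies", "read")
```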
```text
               ┌──────────────────── ruLake ─────────────────────┐
agent ──MCP──▶ │                                                 │
               │  ┌─── planner ───┐   ┌─── audit (JSONL) ───┐    │
               │  │ intent →      │   │ policy_decision +   │    │
               │  │ plan →        │   │ decision per call   │    │
               │  │ refusals      │   └─────────────────────┘    │
               │  └───────┬───────┘                              │
               │          ▼                                      │
               │  ┌─── VectorCache (Arc'd) ─────────────────┐    │
               │  │ witness → RaBitQ index (1-bit)          │    │
               │  │ pointers: (be, collection) → witness    │    │
               │  └──────────────────▲──────────────────────┘    │
               │                     │ prime (on miss)           │
               │  ┌────── BackendAdapter ─────────────────┐      │
               │  │ GCS Parquet · S3 · BigQuery · Delta   │      │
               │  │ Iceberg · RVF · FsBackend · Local     │      │
               │  └───────────────────────────────────────┘      │
               └─────────────────────────────────────────────────┘
```
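The miss-then-prime path in the diagram can be stubbed out to show the control flow (hypothetical classes — the real `VectorCache` and `BackendAdapter` APIs live in the Rust crate):

```python
class FsBackend:
    """Stub BackendAdapter: the store that actually owns the bytes."""
    def __init__(self, store):
        self.store = store
        self.fetches = 0
    def fetch(self, collection):
        self.fetches += 1
        return self.store[collection]

class VectorCache:
    """Stub cache: entries are primed from a backend on first miss."""
    def __init__(self):
        self.entries = {}
    def get_or_prime(self, collection, backend):
        if collection not in self.entries:                        # miss
            self.entries[collection] = backend.fetch(collection)  # prime
        return self.entries[collection]                           # hit

be = FsBackend({"memories.public.notes": b"bundle-bytes"})
cache = VectorCache()
cache.get_or_prime("memories.public.notes", be)  # cold: hits the backend
cache.get_or_prime("memories.public.notes", be)  # warm: served from RAM
assert be.fetches == 1                           # backend touched only once
```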
| Crate | What it is | Tests |
|---|---|---|
| `ruvector-rulake` (root) | Core cache + router + RaBitQ kernel | 43 |
| `python/` | Python SDK (PyO3, ABI3 wheels) | 14 |
| `node/` | Node.js / TypeScript SDK (napi-rs) | 10 |
| `mcp-server/` | `rulake-mcp` — MCP server with RBAC + JWT + audit | 38 |
| `gcs-backend/` | Parquet-on-GCS BackendAdapter | 4 |

109 tests across the repo. Every crate builds silently in `--release`.
| | ruLake | Pinecone / Weaviate | BigQuery Vector Search | RaBitQ direct |
|---|---|---|---|---|
| Owns your data | ❌ | ✅ | — | — |
| 1.02× direct-library speed | ✅ | ❌ | ❌ | ✅ |
| Witness-anchored (cross-process) | ✅ | ❌ | ❌ | ❌ |
| MCP-native for agents | ✅ | ❌ | ❌ | ❌ |
| Per-collection RBAC | ✅ | partial | partial | ❌ |
| OAuth-style JWT auth | ✅ | ✅ | ✅ | ❌ |
| Per-call audit trace | ✅ | partial | ✅ | ❌ |
| $0 to run | ✅ | ❌ | ❌ | ✅ |
- ADR-155 — cache-first, federated refill, M2–M5 backend roadmap
- ADR-156 — agent-memory substrate framing
- ADR-001 — submodule + sibling-crate layout
- ADR-002 — Python SDK (PyO3, ABI3 wheels)
- ADR-003 — Node.js SDK (napi-rs)
- ADR-004 — MCP server (1340 lines, the decision-layer + RBAC + auth spec)
- ✅ M1 + M1.5 — core cache + bundle protocol + 3 consistency modes + persist end-to-end
- ✅ Audience shells — Python SDK, Node.js SDK, MCP server (4 intents, 4 capability tiers, 7 tools)
- ✅ First cloud backend — GCS Parquet (with cheap `current_bundle()` for the resource path)
- ✅ Auth + RBAC — bearer (dev-only), JWT with OAuth-style scope mapping, per-collection allow-lists, JSONL audit, replay-protection module
- 🚧 v0.4 tail — wire JWT/replay into HTTP middleware, layered rate limiting on the MCP path, `tools/list` filter by capability
- 🗺 Roadmap — M2 ParquetBackend on S3, M3 BigQueryBackend with push-down, M4 governance (RBAC/PII/lineage at the cache layer, OpenLineage emission), M5 Delta/Iceberg, WASM SDKs for browser/Cloudflare/Deno, GPU RaBitQ kernel, Java SDK
Open source. ❤️ Free forever. MIT OR Apache-2.0.