@karpathy
Created April 4, 2026 16:25
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

The idea here is different. Instead of just retrieving from raw documents at query time, the LLM incrementally builds and maintains a persistent wiki — a structured, interlinked collection of markdown files that sits between you and the raw sources. When you add a new source, the LLM doesn't just index it for later retrieval. It reads it, extracts the key information, and integrates it into the existing wiki — updating entity pages, revising topic summaries, noting where new data contradicts old claims, strengthening or challenging the evolving synthesis. The knowledge is compiled once and then kept current, not re-derived on every query.

This is the key difference: the wiki is a persistent, compounding artifact. The cross-references are already there. The contradictions have already been flagged. The synthesis already reflects everything you've read. The wiki keeps getting richer with every source you add and every question you ask.

You never (or rarely) write the wiki yourself — the LLM writes and maintains all of it. You're in charge of sourcing, exploration, and asking the right questions. The LLM does all the grunt work — the summarizing, cross-referencing, filing, and bookkeeping that makes a knowledge base actually useful over time. In practice, I have the LLM agent open on one side and Obsidian open on the other. The LLM makes edits based on our conversation, and I browse the results in real time — following links, checking the graph view, reading the updated pages. Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase.

This can apply to a lot of different contexts. A few examples:

  • Personal: tracking your own goals, health, psychology, self-improvement — filing journal entries, articles, podcast notes, and building up a structured picture of yourself over time.
  • Research: going deep on a topic over weeks or months — reading papers, articles, reports, and incrementally building a comprehensive wiki with an evolving thesis.
  • Reading a book: filing each chapter as you go, building out pages for characters, themes, plot threads, and how they connect. By the end you have a rich companion wiki. Think of fan wikis like Tolkien Gateway — thousands of interlinked pages covering characters, places, events, languages, built by a community of volunteers over years. You could build something like that personally as you read, with the LLM doing all the cross-referencing and maintenance.
  • Business/team: an internal wiki maintained by LLMs, fed by Slack threads, meeting transcripts, project documents, customer calls. Possibly with humans in the loop reviewing updates. The wiki stays current because the LLM does the maintenance that no one on the team wants to do.
  • Competitive analysis, due diligence, trip planning, course notes, hobby deep-dives — anything where you're accumulating knowledge over time and want it organized rather than scattered.

Architecture

There are three layers:

Raw sources — your curated collection of source documents. Articles, papers, images, data files. These are immutable — the LLM reads from them but never modifies them. This is your source of truth.

The wiki — a directory of LLM-generated markdown files. Summaries, entity pages, concept pages, comparisons, an overview, a synthesis. The LLM owns this layer entirely. It creates pages, updates them when new sources arrive, maintains cross-references, and keeps everything consistent. You read it; the LLM writes it.

The schema — a document (e.g. CLAUDE.md for Claude Code or AGENTS.md for Codex) that tells the LLM how the wiki is structured, what the conventions are, and what workflows to follow when ingesting sources, answering questions, or maintaining the wiki. This is the key configuration file — it's what makes the LLM a disciplined wiki maintainer rather than a generic chatbot. You and the LLM co-evolve this over time as you figure out what works for your domain.
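As a concrete illustration, a schema file might start out as small as this. Everything here is hypothetical: the directory names, page types, and ingest steps are one possible set of conventions, not a prescribed format:

```markdown
# AGENTS.md (illustrative sketch of a wiki schema)

## Layout
- raw/      : immutable source documents (read-only, never edit)
- wiki/     : LLM-maintained markdown pages
- index.md  : catalog of every wiki page
- log.md    : append-only activity log

## Page types
- Source summary: one page per ingested source, citing back to raw/
- Entity / concept page: one page per recurring person, term, or idea
- Synthesis: the evolving top-level picture; revise on every ingest

## On ingest
1. Read the new source in raw/ and discuss key takeaways with the user.
2. Write a summary page; update affected entity and concept pages.
3. Flag contradictions with existing claims rather than silently overwriting.
4. Update index.md and append an entry to log.md.
```

You would then grow this file session by session as you and the LLM discover which conventions actually hold up for your domain.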

Operations

Ingest. You drop a new source into the raw collection and tell the LLM to process it. An example flow: the LLM reads the source, discusses key takeaways with you, writes a summary page in the wiki, updates the index, updates relevant entity and concept pages across the wiki, and appends an entry to the log. A single source might touch 10-15 wiki pages. Personally I prefer to ingest sources one at a time and stay involved — I read the summaries, check the updates, and guide the LLM on what to emphasize. But you could also batch-ingest many sources at once with less supervision. It's up to you to develop the workflow that fits your style and document it in the schema for future sessions.

Query. You ask questions against the wiki. The LLM searches for relevant pages, reads them, and synthesizes an answer with citations. Answers can take different forms depending on the question — a markdown page, a comparison table, a slide deck (Marp), a chart (matplotlib), a canvas. The important insight: good answers can be filed back into the wiki as new pages. A comparison you asked for, an analysis, a connection you discovered — these are valuable and shouldn't disappear into chat history. This way your explorations compound in the knowledge base just like ingested sources do.

Lint. Periodically, ask the LLM to health-check the wiki. Look for: contradictions between pages, stale claims that newer sources have superseded, orphan pages with no inbound links, important concepts mentioned but lacking their own page, missing cross-references, data gaps that could be filled with a web search. The LLM is good at suggesting new questions to investigate and new sources to look for. This keeps the wiki healthy as it grows.

Indexing and logging

Two special files help the LLM (and you) navigate the wiki as it grows. They serve different purposes:

index.md is content-oriented. It's a catalog of everything in the wiki — each page listed with a link, a one-line summary, and optionally metadata like date or source count. Organized by category (entities, concepts, sources, etc.). The LLM updates it on every ingest. When answering a query, the LLM reads the index first to find relevant pages, then drills into them. This works surprisingly well at moderate scale (~100 sources, ~hundreds of pages) and avoids the need for embedding-based RAG infrastructure.
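A sketch of what such an index might look like; the categories, page names, and metadata fields are illustrative:

```markdown
# Index

## Entities
- [vannevar-bush](wiki/entities/vannevar-bush.md): originator of the Memex; 2 sources

## Concepts
- [associative-trails](wiki/concepts/associative-trails.md): links between documents as first-class knowledge

## Sources
- [as-we-may-think](wiki/sources/as-we-may-think.md): 1945 essay introducing the Memex; ingested 2026-04-02
```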

log.md is chronological. It's an append-only record of what happened and when — ingests, queries, lint passes. A useful tip: if each entry starts with a consistent prefix (e.g. ## [2026-04-02] ingest | Article Title), the log becomes parseable with simple unix tools — grep "^## \[" log.md | tail -5 gives you the last 5 entries. The log gives you a timeline of the wiki's evolution and helps the LLM understand what's been done recently.
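A minimal sketch of that convention in action. The entry dates, titles, and the /tmp path are made up for illustration:

```shell
# Build an illustrative log.md that follows the "## [date] kind | title" prefix
cat > /tmp/log.md <<'EOF'
## [2026-03-30] ingest | First Article
Summary page created; 3 entity pages updated.
## [2026-04-01] query | Compare sources A and B
Answer filed as wiki/comparisons/a-vs-b.md.
## [2026-04-02] ingest | Article Title
Contradiction flagged against First Article.
EOF

# The consistent prefix makes the log parseable with plain unix tools:
grep "^## \[" /tmp/log.md | tail -2   # show the 2 most recent entries
```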

Optional: CLI tools

At some point you may want to build small tools that help the LLM operate on the wiki more efficiently. A search engine over the wiki pages is the most obvious one — at small scale the index file is enough, but as the wiki grows you want proper search. qmd is a good option: it's a local search engine for markdown files with hybrid BM25/vector search and LLM re-ranking, all on-device. It has both a CLI (so the LLM can shell out to it) and an MCP server (so the LLM can use it as a native tool). You could also build something simpler yourself — the LLM can help you vibe-code a naive search script as the need arises.
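As a sketch of what such a vibe-coded naive search could look like, here is a grep-based ranker; the sample pages and /tmp paths are made up, so point the loop at your real wiki directory:

```shell
# Naive wiki search: rank pages by how many lines match a query (case-insensitive).
# The two sample pages below exist only to make the sketch runnable.
mkdir -p /tmp/wiki
printf 'The Memex inspired this wiki.\nMemex again.\n' > /tmp/wiki/memex.md
printf 'Nothing relevant here.\n' > /tmp/wiki/other.md

query="memex"
for f in /tmp/wiki/*.md; do
  n=$(grep -ic -- "$query" "$f" || true)   # count of matching lines (0 if none)
  if [ "$n" -gt 0 ]; then printf '%4d %s\n' "$n" "$f"; fi
done | sort -rn | head -10
```

This is deliberately crude (no ranking beyond match counts, no stemming), but at a few hundred pages it is often all you need before reaching for a real search engine.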

Tips and tricks

  • Obsidian Web Clipper is a browser extension that converts web articles to markdown. Very useful for quickly getting sources into your raw collection.
  • Download images locally. In Obsidian Settings → Files and links, set "Attachment folder path" to a fixed directory (e.g. raw/assets/). Then in Settings → Hotkeys, search for "Download" to find "Download attachments for current file" and bind it to a hotkey (e.g. Ctrl+Shift+D). After clipping an article, hit the hotkey and all images get downloaded to local disk. This is optional but useful — it lets the LLM view and reference images directly instead of relying on URLs that may break. Note that LLMs can't natively read markdown with inline images in one pass — the workaround is to have the LLM read the text first, then view some or all of the referenced images separately to gain additional context. It's a bit clunky but works well enough.
  • Obsidian's graph view is the best way to see the shape of your wiki — what's connected to what, which pages are hubs, which are orphans.
  • Marp is a markdown-based slide deck format. Obsidian has a plugin for it. Useful for generating presentations directly from wiki content.
  • Dataview is an Obsidian plugin that runs queries over page frontmatter. If your LLM adds YAML frontmatter to wiki pages (tags, dates, source counts), Dataview can generate dynamic tables and lists.
  • The wiki is just a git repo of markdown files. You get version history, branching, and collaboration for free.
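Regarding the Dataview tip above: assuming your LLM writes frontmatter fields like `date` and `tags` (a schema choice on your part, not a given), a Dataview query block such as this renders a live table of recently updated pages:

```dataview
TABLE date AS "Updated", tags
FROM "wiki"
SORT date DESC
LIMIT 10
```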

Why this works

The tedious part of maintaining a knowledge base is not the reading or the thinking — it's the bookkeeping. Updating cross-references, keeping summaries current, noting when new data contradicts old claims, maintaining consistency across dozens of pages. Humans abandon wikis because the maintenance burden grows faster than the value. LLMs don't get bored, don't forget to update a cross-reference, and can touch 15 files in one pass. The wiki stays maintained because the cost of maintenance is near zero.

The human's job is to curate sources, direct the analysis, ask good questions, and think about what it all means. The LLM's job is everything else.

The idea is related in spirit to Vannevar Bush's Memex (1945) — a personal, curated knowledge store with associative trails between documents. Bush's vision was closer to this than to what the web became: private, actively curated, with the connections between documents as valuable as the documents themselves. The part he couldn't solve was who does the maintenance. The LLM handles that.

Note

This document is intentionally abstract. It describes the idea, not a specific implementation. The exact directory structure, the schema conventions, the page formats, the tooling — all of that will depend on your domain, your preferences, and your LLM of choice. Everything mentioned above is optional and modular — pick what's useful, ignore what isn't. For example: your sources might be text-only, so you don't need image handling at all. Your wiki might be small enough that the index file is all you need, no search engine required. You might not care about slide decks and just want markdown pages. You might want a completely different set of output formats. The right way to use this is to share it with your LLM agent and work together to instantiate a version that fits your needs. The document's only job is to communicate the pattern. Your LLM can figure out the rest.

@earaizapowerera

A year and a half ago my mother passed away. And, like many, I was left with pending conversations… questions never asked… and answers that only time begins to reveal.

My mother wrote a book: "El Parkinson, mi amigo, mi maestro" ("Parkinson's, My Friend, My Teacher"). A deep, brave testimony full of lessons about life, resilience, and meaning.

A few months ago I made an unconventional decision: I uploaded her book to a RAG-style AI system. Here I share my post about it: https://www.linkedin.com/posts/andres-felipe-zuluaga-echeverry-a5185421_ia-ia-ia-activity-7347605447505244161-PDd1?utm_source=social_share_send&utm_medium=member_desktop_web&rcm=ACoAAASTJd0BVlK3PwZxp0sMR36aXx8EE3X-qNE

And something powerful started to happen.

I began to have "conversations" with that knowledge. I asked it questions… and somehow, my mother, through her words, answered me, challenged me, grounded me.

It wasn't magic. It was structured memory.

But now, with the evolution toward patterns like LLM Wiki and tools like ElevenLabs, I realize something even deeper:

👉 It is no longer just about querying information. 👉 It is about reconstructing a presence.

Today I see the possibility of going beyond the book:

* integrating her videos

* including voice messages

* emails

* loose reflections

* even everyday moments

And turning all of that into a "living wiki" of her thinking, her essence, and her way of seeing the world.

Not to replace her. That is impossible.

But to preserve something invaluable: her way of asking questions, her way of interpreting life, her inner voice.

This opens a much bigger conversation:

What if technology doesn't just serve to automate… but to amplify the most human things we have?

What if we can build living legacies that keep inspiring, questioning, and accompanying those who come after?

For me, this is no longer just technology. It is a new form of memory. A new form of connection. And, somehow… a new form of love.

Very interesting. Does it split the book into chapters or smaller "chunks", or how did you structure the wiki? Agreed that programming/automation is only one of a thousand applications.

@earaizapowerera

wdyt about this? Sounds like a neat implementation of the principles: https://github.com/milla-jovovich/mempalace

Clever project, but it solves a different problem. MemPalace is about recall ("what did I say 3 months ago?"): it stores conversations verbatim and searches them. Karpathy's approach is about compiled knowledge: the LLM doesn't just store, it builds structured understanding with connections and summaries. That's a fundamentally different thing. I've been building along Karpathy's line but for teams: hierarchical knowledge with automatic inheritance, where every element knows its place in the structure. Quick explanation at https://waykee.com/ (open source in a few days).

@aarora79

aarora79 commented Apr 8, 2026

My take on this idea: https://github.com/aarora79/personal-knowledge-base. It extends the pattern by having a Claude Skill do the raw → wiki conversion plus query and lint, and generates a visual graph linking the concepts.

@arpitnath

I have been running a similar version of this pattern. It started from a DNS analogy: instead of everything being a blob of text, each record has a type (SUMMARY, META, SOURCE, ALIAS, COLLECTION) that tells the agent how to consume it. So when the agent searches for "obsidian sync", the library knows the introduction file is the canonical answer, not one of the 42 other files that mention it.

Ran benchmarks on 3 public corpora (quartz: 76 files, obsidian help: 171 files, mdn: 14k files). On mdn, grep returns an average of 1,212 unranked files per query, because common terms like "Promise" appear in 1,314 files and "DOM" in 9,363; blink-query returns the top 5 ranked results in 10ms, so the agent reads ~242x fewer files to find the answer. The speed gap grows with corpus size: 28x faster on small wikis, 83x on mdn.

Where it currently breaks or struggles: entity queries on very common terms where BM25 can't pick the canonical page without graph-aware signals.

Whole benchmark is one command: npm run benchmark
https://github.com/arpitnath/blink-query

@xoai

xoai commented Apr 8, 2026

Here are some updates from sage-wiki as I work on building a comprehensive tool based on this idea.

(Screenshots: sage-wiki TUI and autoresearch-loop dashboard.)
  • TUI (Text User Interface): In addition to using Obsidian as a viewer for your wiki, you now have two built-in alternatives: a web UI and a TUI. The TUI offers a four-tab terminal dashboard, allowing you to browse articles with rendered Markdown, perform fuzzy searches with previews, engage in streaming Q&A with citations, and access a live compile dashboard that monitors your sources and automatically recompiles them. Remember, this is your data and your tool, so you are free to choose whichever viewer you feel most comfortable with.

  • Cost Optimization: This feature is particularly beneficial for those with a large vault of documents (for example, 10,000 or more). It includes prompt caching (saving 50-90% on input tokens from providers like Anthropic, Gemini, or OpenAI), batch API support (using compile --batch for a 50% discount via asynchronous processing), and cost tracking that provides a breakdown after every compile. You can also use compile --estimate to preview costs before committing. Additionally, there's an auto-batch mode that activates when you have more than ten sources to process. The compile pipeline now clearly shows what you're spending and where your costs are coming from, which is crucial once your wiki expands beyond just a few dozen sources.

sage-wiki is a single, cross-platform binary that works with any provider. Just drop your files into a folder, and you'll have a wiki ready to go. You can even turn it into an MCP so any LLM can work with your "second brain" easily.

Feel free to provide feedback and contribute more.

@marciopuga

Amazing thinking as usual @karpathy!
I particularly loved the Memex reference

The Memex was a hypothetical device — envisioned as a mechanized desk with microfilm storage — that would let a person store all their books, records, and communications, then retrieve and link them together through associative "trails." Bush argued that the human mind works by association rather than indexing, and that our tools for managing knowledge should reflect that.

This was my take on Personal Knowledge over text: https://github.com/marciopuga/cog

@Pratiyush

https://github.com/Pratiyush/llm-wiki - Work in progress - HELP in Issues and Suggestions Needed

@anzal1

anzal1 commented Apr 8, 2026

Took this pattern and built it into a zero-config CLI: npx quicky-wiki init auto-detects your API keys and picks the best model. Full pipeline — ingest, query, lint, prune, serve.

A few things I added beyond the core pattern:

  • Confidence-scored claims — every fact gets a confidence score and source citation. Single-source claims stay low-confidence; corroborated claims across sources get promoted. Helps with @asong56's hallucination concern — contested claims are surfaced, not buried.
  • Temporal tracking — claims are timestamped so you can see knowledge evolution and flag stale facts.
  • Live dashboard — Obsidian-style force-directed graph (Canvas 2D with level-of-detail for performance at 300+ nodes), plus built-in LLM chat for querying the wiki directly.
  • Multi-provider — Anthropic, OpenAI, Gemini, Ollama, or any openai-compatible endpoint (Groq, Together, vLLM, LM Studio).

Works with markdown files, URLs, or any text source. One command to get started:

npx quicky-wiki init

https://github.com/anzal1/quicky-wiki

@dolzenko

dolzenko commented Apr 8, 2026

Is there any tool (or will it even make sense at all) to route all my recorded codex cli sessions to something like this to build the KB out of months of work with the agent?

@Bytekron

Bytekron commented Apr 8, 2026

This is one of the first writeups on “LLM + knowledge base” that actually clicks for me, because it shifts the focus away from pure retrieval and toward accumulation. The line of thinking that stood out most is that most document workflows keep forcing the model to rediscover the same patterns over and over again, while a maintained wiki turns that repeated effort into a durable asset. That feels much closer to how people actually build expertise.

What I like here is that this is not just “RAG but nicer.” The important difference is the idea of synthesis as a first-class artifact. Instead of treating every answer as disposable chat output, the useful parts get promoted into pages, relationships, summaries, contradictions, and cross-links. That is a much better mental model for long-term work, especially when the source material is messy, repetitive, or constantly changing.

I also think this pattern becomes especially powerful in narrow domains where there is a lot of semi-structured information and a lot of recurring questions. For example, I run projects in the Minecraft ecosystem like Minelist and MinecraftServer.buzz, and one thing that becomes obvious very quickly is how much information piles up around servers, versions, gamemodes, metadata quality, vote systems, SEO content, duplicate detection, moderation notes, and historical changes. A traditional search layer helps you retrieve fragments, but it does not really “understand the estate” over time. A maintained wiki layer could.

In that kind of setting, an LLM-maintained wiki could become the connective tissue between raw scraped data, editorial notes, taxonomy decisions, and user-facing content. One page could track how a specific server evolved over time. Another could map tag ambiguity across categories like SMP, survival, vanilla, or modded. Another could explain why certain duplicate-host patterns appear across listings. Over weeks or months, that becomes much more valuable than a pile of disconnected documents or one-off prompts.

I also really agree with the point that the hardest part of knowledge systems is not storing information, it is maintenance. Humans are usually willing to create a page once, but they are much less willing to update ten related pages, fix broken links, revise old claims, and keep a taxonomy coherent. That is exactly the kind of repetitive but context-sensitive work LLMs are surprisingly well suited for. Not because they are always right, but because they make the cost of maintaining structure low enough that the structure can actually survive.

The “wiki is the codebase” analogy is also very strong. It suggests a workflow where the human’s role is curation, judgment, and direction, while the model handles the mechanical burden of integration and refactoring. That feels like a more realistic and productive division of labor than pretending the model should simply answer everything on demand from a pile of uploads.

One thing I would be very interested in is how people handle quality control once the wiki grows beyond a hobby-sized vault. For example, how do you best represent confidence, source freshness, disagreements between sources, and unresolved ambiguities without turning the whole thing into bureaucratic overhead? There is probably a sweet spot where the schema is structured enough to keep the system disciplined, but not so rigid that the workflow becomes annoying.

I also wonder whether the best implementations of this idea will end up being domain-specific rather than universal. A personal research wiki, a company knowledge base, and a vertical operational system probably want different page types, different update policies, and different notions of truth. The pattern feels general, but the actual payoff probably comes from tailoring it hard to a specific domain.

Either way, this gist describes something much more interesting than the usual "chat with your docs" framing. It treats knowledge work as something cumulative, revisable, and alive. I'm going to use it as a reference for a blog post on Minelist and MinecraftServer.buzz about LLMs and Minecraft. That feels much closer to how serious research, operations, and even niche content businesses actually work in practice.

@MehmetGoekce

Built an implementation using Claude Code + Logseq/Obsidian with a two-layer cache architecture: L1 (auto-loaded rules in Claude's memory) + L2 (on-demand wiki in Logseq/Obsidian). The key insight was that not all knowledge belongs in the wiki — critical rules must be auto-loaded every session.
Includes a /wiki skill with ingest, query, lint, and a schema that enforces page types and cross-references. Setup in 5 minutes via ./setup.sh.
Full write-up: https://mehmetgoekce.substack.com/p/i-built-karpathys-llm-wiki-with-claude
Repo: https://github.com/MehmetGoekce/llm-wiki

@jakob1379

Is there any tool (or will it even make sense at all) to route all my recorded codex cli sessions to something like this to build the KB out of months of work with the agent?

bash?

for convo in $(insert command that yields each conversation to an array); do <codex|claude|...> add this to my wiki; done

@shibing624

Great writeup! Re: the CLI tools section where you mention qmd as a local search engine for the wiki — wanted to share an alternative approach we've been working on: TreeSearch.

The core difference: two fundamentally different retrieval philosophies.

QMD takes the RAG-enhanced route: chunk documents → BM25 + vector search → LLM query expansion → LLM re-ranking. It runs 3 local models (~2GB) and gets strong semantic results, but at the cost of model loading and inference latency.

TreeSearch takes the structure-first route: no chunking, no embeddings, no models at all. Instead of splitting documents into fixed-size chunks and retrieving by vector similarity (which destroys heading hierarchy), it parses documents into tree structures based on their natural heading hierarchy, then uses SQLite FTS5 keyword matching with structure-aware scoring (title match, term overlap, IDF weighting, generic section demotion). Zero models, pure CPU, millisecond latency.

Quick comparison:

|                  | QMD                                 | TreeSearch                                         |
|------------------|-------------------------------------|----------------------------------------------------|
| Core approach    | BM25 + vector + LLM reranking       | Structure-aware tree search, no embeddings         |
| File formats     | Markdown only                       | MD, code (Python AST + regex), PDF, DOCX, JSON, HTML, XML, CSV (10+ types) |
| Model dependency | 3 local models (~2GB)               | Zero (pure heuristic scoring)                      |
| Code search      | Not supported                       | Supported (CodeSearchNet MRR 0.91)                 |
| Query latency    | Seconds (model inference)           | Milliseconds (5,000 docs < 10ms)                   |
| Best for         | "I don't remember exactly what I wrote" (fuzzy semantic queries) | "The doc has clear structure and keywords can anchor position" (structured queries) |

For the wiki pattern specifically, TreeSearch is a good fit because wiki pages are inherently well-structured markdown with heading hierarchies — exactly the kind of documents where structure-aware retrieval shines. And since it's zero-dependency (just SQLite), it adds no infrastructure overhead to the wiki setup.

pip install pytreesearch
treesearch "How does auth work?" wiki/

Both tools are complementary — QMD for when you need deep semantic understanding, TreeSearch for when structure and speed matter most. The right choice depends on your wiki's size and query patterns.

@a-ml

a-ml commented Apr 8, 2026

Been thinking about this a lot lately. We've been trying to do this with cognition. Not the things you know, but the way you actually think. The heuristics you apply without noticing, the tensions between things you believe, the mental models that shape every decision before you're even aware you're making one.

The hard part isn't storage, it's extraction. You can't just ask someone what their values are. You have to start from a real decision. What did you reject? What tradeoff actually mattered to you? What rule did you apply on instinct? Our approach, an LLM reads through conversation transcripts on a schedule and classifies what it finds against a strict hierarchy of types. Decision rule, framework, tension, preference. "Idea" is last resort. Everything gets a confidence score and an epistemic tag so the system knows the difference between something you're sure about and something you're still working out.

Typed edges rather than a flat list. Supports, contradicts, evolved_into, depends_on. That's what makes it traversable rather than just searchable. An agent can walk the contradictions in your own reasoning, find connections between domains you never explicitly linked, or surface something you've been circling for weeks without naming it.

Nodes decay too, which felt important. Values hold. Ideas fade fast. The graph is supposed to model what's live in your thinking right now, not accumulate everything you've ever said, but that's probably a personal choice.

Mine has 8,000+ nodes at this point, 16 MCP tools, runs as an npx server. Curious whether the decay model resonates with you or whether you'd approach that part differently.

https://github.com/multimail-dev/thinking-mcp

Very interesting

@xoai

xoai commented Apr 8, 2026

Is there any tool (or will it even make sense at all) to route all my recorded codex cli sessions to something like this to build the KB out of months of work with the agent?

sage-wiki can act as an MCP (Model Context Protocol) server, letting you save knowledge directly from your AI conversations into your wiki. Instead of losing insights when a chat session ends, you can tell your AI to capture them.

Say you're debugging a performance issue with your AI and discover that the bottleneck is in the database connection pool, not the query itself. At the end of the session:

"Capture the key findings from this debugging session. Tag with postgres, performance."

The AI extracts items like:

  • "connection-pool-bottleneck" - The actual performance issue was exhausted connections, not slow queries
  • "pgbouncer-transaction-mode" - Transaction-level pooling resolved the issue; session-level was causing connection hoarding

These become source files that the compiler weaves into your wiki's knowledge graph. For old conversations, you can export data from ChatGPT or Claude and put it in your wiki folder.

@Helleeni

Helleeni commented Apr 8, 2026

This is so brilliant! I built a personal wiki containing my programming projects over a lunch hour (though I burnt through my tokens for one Claude Code session :-). Anyway, great idea and so easy to implement. Just sharing the prompt! Thank you so much!

@ZimoLiao

ZimoLiao commented Apr 8, 2026

This resonates deeply — and it’s exciting to see this idea articulated so clearly.

We’ve actually been building something along these lines with ScholarAIO, but pushing it one step further toward a fully executable system.

The core alignment is strong: instead of treating knowledge as something to retrieve at query time, we treat it as something to compile, structure, and continuously evolve into a persistent, navigable knowledge base. In practice, this looks very much like an LLM-maintained wiki layer that grows over time.

Where ScholarAIO goes beyond the “LLM Wiki” concept is in closing the loop between knowledge and action.

  • The wiki is not just a passive memory — it becomes an operational substrate for agents.
  • Knowledge doesn’t stop at summaries or cross-references — it is directly translated into executable workflows, scripts, and tool interactions.
  • Every interaction (successful or failed) can be written back, turning the system into a self-improving research environment, not just a knowledge store.

In other words, instead of:

sources → wiki → answers

we are building toward:

sources → evolving wiki → agents → tools → results → wiki

Another key difference is scalability. Because the system is built around modular ingestion + schema-driven structuring + tool abstraction, it can expand to new domains at near-zero marginal cost. Adding a new field is no longer a matter of rebuilding pipelines — it’s simply a matter of plugging in new documentation and tool interfaces, and letting the system compile itself.

What emerges is less a “better RAG” and more a domain-agnostic knowledge-to-action engine.

Really exciting direction — it feels like this pattern (LLM as compiler of living knowledge systems) is going to underpin a lot of the next generation of agentic software.

@KaifAhmad1

@karpathy
This framing of compilation over retrieval really resonates.
We’ve been building something similar with Semantica — a semantic layer that turns unstructured data into structured, explainable knowledge graphs with provenance and reasoning.
Feels like this could become a core layer for agent systems.
https://github.com/Hawksight-AI/semantica

@realaaa

realaaa commented Apr 8, 2026

this is a great concept - thanks for sharing! I was thinking along the lines of doing something like this for my personal PKI based on TiddlyWiki,

plus also for commercial ones - in my case those would be Nextcloud Collectives-type wikis

@Foroutsweg

Nice

@aaronmrosenthal

aaronmrosenthal commented Apr 8, 2026

Added to ToolKode, thank you. https://www.npmjs.com/package/@toolkit-cli/toolkode
WikiGraph engine. Knowledge compounds across sessions.

@grishasen

grishasen commented Apr 8, 2026

The approach resonates deeply and seems very promising. However, after spending two days building the library from documentation and team message threads, it appears too niche compared to building a local RAG system in a single day and using it as a knowledge base. RAG is immediately useful, whereas the wiki build feels far from complete and consumes a significant number of tokens. It's a great idea indeed; I just feel that in practice it may not be so different from creating a wiki yourself.

@Thrimbda

Thrimbda commented Apr 8, 2026

thank you for your amazing idea, here's a skill I've created based on this gist: llm-wiki

@jurajskuska

jurajskuska commented Apr 8, 2026

Is there any tool (or would it even make sense) to route all my recorded Codex CLI sessions into something like this, to build the KB out of months of work with the agent?

Yes - there is a locally saved JSONL file for each session in .claude. I am indexing it and using it as a second, deeper level of the sessions, so Claude can always, when required, go through and see not only what you asked but also what it responded. I am also using the ctx MCP, which saves tokens heavily, and I included SQLite and Obsidian in there, so it works very natively with the same sources as Claude, kept in sync. I also appreciated that non-AI specialists can work with md files in Obsidian, so the whole QA team can be involved with the same sources and knowledge.

@emailhuynhhuy

Thank you for sharing. Your post gave me the courage to share my own 'raw' progress — and helped me understand why what I built actually works.

The problem that broke my trust in generation: Using cloud LLMs or NotebookLM to build n8n automation workflows kept producing the same failure mode: plausible-looking JSON that missed critical execution details. The logic looked right. It failed silently in production. For complex automation, "mostly correct" isn't a degraded state — it's a broken state.

What I built instead — a Deterministic Retrieval System:

I organized thousands of validated n8n workflow JSONs on a local NAS. Each is mapped to an Obsidian MD file with rich metadata: tags, process steps, and a direct pointer to the source JSON.

It maps directly to your three-layer architecture:

  • Raw sources: validated JSONs — immutable, never touched by the LLM
  • Wiki layer: Obsidian MD files — not for reading, but for navigation
  • Schema: the local AI acts purely as a router. It traverses the graph, finds the right metadata pointer, and retrieves the pre-validated JSON for the team to paste and run.

Instead of asking an LLM to generate a workflow, we ask it to find one. 100% reliable. No hallucinated logic.
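The routing step described above is simple enough to sketch: match the request's tags against each note's metadata and return the pointer to the pre-validated payload. The field names (`tags`, `source`) and the `nas://` paths are illustrative assumptions, not the author's actual schema.

```python
# Sketch of the pointer pattern: the note is navigation metadata only;
# the payload is a pre-validated workflow JSON the note points to.
# Field names and paths are hypothetical.
def find_workflows(notes, required_tags):
    """Return source pointers of notes whose tags cover all required_tags."""
    want = set(required_tags)
    return [n["source"] for n in notes if want <= set(n["tags"])]

notes = [
    {"title": "Invoice OCR",  "tags": ["ocr", "finance"],  "source": "nas://workflows/invoice_ocr.json"},
    {"title": "Slack digest", "tags": ["slack", "digest"], "source": "nas://workflows/slack_digest.json"},
]
print(find_workflows(notes, ["ocr"]))  # ['nas://workflows/invoice_ocr.json']
```

Because the function only ever returns pointers to existing, validated files, there is nothing for the LLM to hallucinate at this layer.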

Your framing of the wiki as a "persistent, compounding artifact" is what made this click. The Obsidian graph is my fast navigation layer — seeing how workflows connect, identifying direction. The NAS is the deep execution layer — deterministic, no surprises.

Where I'm taking this next:

I'm now applying this same pointer-based pattern to other knowledge bases beyond workflows — testing whether the same reliability holds when the "source of truth" is less structured than JSON (documentation, SOPs, client briefs). The hypothesis is that the pattern generalizes: as long as the retrieval layer is deterministic and the wiki layer handles navigation, generation becomes optional rather than necessary.

The tension I can't fully resolve yet:

Pointer-based retrieval works perfectly when there's a match. But when a novel request arrives — something that doesn't exist in the library — the system is blind. Falling back to generation breaks the reliability I've built. Staying purely deterministic means the system can't grow into genuinely new territory.

Your wiki pattern handles novelty well because the LLM can still synthesize across existing pages. I'm wondering if there's a hybrid path: deterministic retrieval for known cases, but a wiki-style synthesis layer that absorbs novel cases over time — and promotes them into validated sources once tested in production.

Do you see a way to maintain that level of reliability at the retrieval layer while keeping the system fluid at the edges?

🚀 From Imagination to Reflex: Why Your AI Strategy Needs a Predictability Generation Engine
(Following my previous post on building a Deterministic Retrieval System to eliminate AI randomness)
In the last post, we discussed how AI "finds the right" data through Pointers. The next question is: How do we ensure AI "does the right thing" when executing that data?
The answer lies in a critical technical argument from Anthropic's Agent Skills design philosophy for Claude. It explains why a high-performance system is never an accident:

  1. "Run, don't Read"
    In Claude's Agent Skills documentation, Anthropic provides a golden rule:
    "Scripts in your skill directory can run without loading their contents into context. The script executes and only the output consumes tokens. The key instruction to include in your SKILL.md is to tell Claude to run the script, not read it."
    Why does this matter?
    The Old Way (Read): You ask AI to generate a PDF or Word report. It might look okay, but the formatting often breaches company standards, forcing you to edit it manually.
    My Approach (Run): AI acts as a Router. Upon receiving a request, it points directly to a Tested Code script—such as a standardized Word Template—and executes it immediately.
  2. Three Core Benefits of "Generate on Templates"
    Moving from Stochastic Generation (probabilistic content) to Predictability Generation (deterministic results) brings transformative efficiency:
    Absolute Stability: Tested code always runs correctly; refined corporate knowledge remains deterministic (e.g., standard tables). It is far easier and more reliable to have AI execute a pre-defined process than to let it "hallucinate" a new one and hope for the best.
    Resource Optimization: Only the final output consumes tokens. You no longer pay for the AI to "imagine" or redefine things that should be immutable standards in your business.
    Environment Validation: Execution scripts can verify prerequisites—such as "Is the API active?"—before running. This level of operational control is something pure AI-generated text can never achieve.
  3. AI as a Lever, Not a Replacement
    I believe AI's progress isn't about how "smart" it is to replace humans, but how accurately it executes the knowledge humans have standardized. This is where humans become irreplaceable: AI cannot intuitively know your specific "standards."
    Instead of letting AI "swim" in a sea of uncertain data, build it a system of "rails" (Pointers) and "execution stations" (Scripts). This is the only way to turn AI into a true lever for your capabilities, allowing us to perform complex tasks that far exceed our inherent skills.
    👉 The Takeaway: Don't expect AI to "think" correctly for you every time. Design it so AI "does" exactly what you have already validated.
    🔗 References:
    Andrej Karpathy’s insights on sustainable knowledge systems.
Anthropic's Agent Skills course for Claude.
deterministic generation
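The "run, don't read" rule above can be sketched in a few lines: execute a validated script in a subprocess and hand back only its stdout, so the script body itself never enters the model's context. This is a minimal illustration, not Anthropic's implementation; the inline script stands in for a tested file on disk.

```python
# Sketch of "run, don't read": execute a validated script and capture
# only its output - the script's source never reaches the model.
# The inline code string stands in for a hypothetical tested script file.
import subprocess
import sys

def run_skill_script(code: str) -> str:
    """Run a script in a fresh interpreter; return only its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

out = run_skill_script("print(2 + 2)")
print(out)  # 4
```

Only `out` would be fed back into context, which is the token-saving and determinism argument in one move.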

@lkishfy

lkishfy commented Apr 8, 2026

One disadvantage might be that AI hallucinations can become permanently embedded as facts, causing errors to propagate. It also carries a maintenance burden: you have to check and clean the notes.

Good point. I handle this with an actor-network-inspired graph (Re: Bruno Latour) where nodes connect through typed associations. I then have a retrieval system that prioritizes based on network weight, centrality, freshness, controversy signals, and gateway bottlenecks—so what surfaces is what the graph actively supports, not every stale claim equally.

Errors can still enter, but they won't propagate as truth unless the graph keeps reinforcing them. Day-to-day knowledge work and capture becomes triage on noisy areas, not endless manual cleanup; though some linting never hurts.
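A toy version of the scoring idea above: rank nodes by what the graph supports, rewarding inbound support and freshness while demoting controversy signals. The specific weights are arbitrary illustrations, not the commenter's tuned values.

```python
# Toy retrieval scoring: well-supported, fresh nodes surface first;
# contested nodes are demoted. Weights are arbitrary for illustration.
def score(node):
    return (2.0 * node["degree"]          # inbound support from other notes
            + 1.0 * node["freshness"]     # 0..1, recently touched scores higher
            - 3.0 * node["controversy"])  # contradiction signals demote

nodes = [
    {"id": "pool-bottleneck", "degree": 4, "freshness": 0.9, "controversy": 0.0},
    {"id": "stale-claim",     "degree": 5, "freshness": 0.1, "controversy": 0.8},
]
ranked = sorted(nodes, key=score, reverse=True)
print([n["id"] for n in ranked])  # ['pool-bottleneck', 'stale-claim']
```

Note how the stale, contested claim ranks below the supported one despite having more raw links, which is the point: errors can exist in the graph without being what retrieval surfaces.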

@Pratiyush

Pratiyush commented Apr 8, 2026

@waydelyle

SwarmVault update — quick follow-up from my earlier comment. We've been shipping steadily since then and just hit v0.1.27. Some highlights:

  • Parser-backed code analysis across 12+ languages (JS/TS, Python, Go, Rust, Java, C#, C/C++, Ruby, PHP, PowerShell) — the knowledge graph now understands module boundaries, exports, and call relationships, not just text
  • swarmvault add for capturing arXiv papers, DOIs, tweets, and articles with normalized frontmatter — research workflows feed directly into the vault
  • Semantic similarity edges + hyperedges in the graph, with embedding caching so local queries stay fast
  • Interactive graph viewer with search, filters, and export to HTML/SVG/GraphML/Cypher
  • Repo-aware watch mode with git hooks (post-commit/post-checkout) — the vault stays current as your codebase evolves
  • Fully offline-capable — graph traversal, search, and the viewer all work locally. Remote assets are localized on ingest

The core philosophy hasn't changed: every operation (ingest, compile, query, lint) writes durable artifacts that compound over time. The vault is the product, not ephemeral chat sessions.

Still provider-agnostic — works with OpenAI, Anthropic, Gemini, Ollama, OpenRouter, Groq, Together, xAI, Cerebras, or fully offline with the heuristic provider.

Would love feedback from anyone building on top of the LLM Wiki pattern. PRs and issues welcome.

https://github.com/swarmclawai/swarmvault

@1024205457-boop

Hi Andrej! Your course was my introduction to AI — it's been incredibly inspiring to follow your work since then.

I built a Venn diagram + note-taking tool powered by AI about a week ago. When I saw you publish this LLM Wiki pattern, I couldn't wait to integrate it. The result: instead of markdown files, concepts live as interactive nested Venn diagrams with a bidirectional Wiki panel. Each node is both a circle in the diagram and a Wiki page. Also added AI-powered Lint (contradiction/duplicate detection) and diagram merge, inspired by the Lint operation you described.

https://github.com/1024205457-boop/Venn

Thank you for sharing this pattern!

@ClayGendron

This is a great explanation of a problem I have been working to solve with a project called grover.

grover is an in-process agentic file system, and my hope is that it becomes the virtual file system organizations use to engineer their own knowledge bases — something like an "AI semantic layer."

I came to this through trying to build an agent that could navigate and understand my organization's database metadata to enable code generation, and it was immediately clear that graph relationships were essential context. grover grew out of trying to blend concepts from file systems, graph traversal, and MCP into a single CLI-driven interface.

As I continue to build out grover, I believe it could be a tool that is used to implement this LLM Wiki concept.

  • Read-only sources, writable synthesis, one file system. A grover mount has directory-level permissions, so you can create a /wiki directory where /wiki/raw/* is read-only (human-curated, immutable, the source of truth) while /wiki/synthesis/* or other directories are LLM-writable. Answers, comparisons, and explorations from user conversations get filed back as new pages under /synthesis/, with explicit links to the raw sources they cite or to prior syntheses. The raw layer stays trustworthy and the synthesis layer compounds from humans + LLM interactions.
  • Cross-references are first-class edges, not text. grover stores links between files as persistent records (/.connections), separate from document content. That means lint becomes a graph query — orphans, hubs, missing cross-refs, and stale claims fall out of pagerank / neighborhood / meeting_subgraph instead of an LLM crawl over hundreds of files. A piece I am now planning to add next is a markdown analyzer that parses [[wikilinks]] from page content on write/edit and auto-generates the connections.
  • Unix primitives the agent already knows. CRUD, glob/grep, semantic + lexical + vector search, and graph traversal share one composable CLI and result type, exposed through a single MCP tool. Your grep "^## \[" log.md | tail -5 trick generalizes — no need for a specific tool per workflow.
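The "lint becomes a graph query" point can be illustrated on plain markdown: parse `[[wikilinks]]` out of page bodies and report orphans, i.e. pages nothing links to. This is just a sketch of the check, assuming wikilink syntax; grover itself stores edges separately, so it would query `/.connections` rather than re-parse content.

```python
# Sketch: find orphan pages (no inbound [[wikilinks]]) from page bodies.
# grover stores edges as persistent records; this re-derives them from
# markdown purely for illustration.
import re

def orphans(pages: dict[str, str]) -> set[str]:
    """Return names of pages that no other page links to."""
    linked = set()
    for name, text in pages.items():
        for target in re.findall(r"\[\[([^\]|]+)", text):
            if target.strip() != name:
                linked.add(target.strip())
    return set(pages) - linked

pages = {
    "index":     "See [[postgres]] and [[pgbouncer]].",
    "postgres":  "Links to [[pgbouncer]].",
    "pgbouncer": "No outbound links.",
    "scratch":   "Nobody links here.",
}
print(sorted(orphans(pages)))  # ['index', 'scratch']
```

Hubs, missing cross-refs, and stale claims fall out of the same kind of traversal once links are first-class records instead of text.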

Last thing: this repo is only about a month old and I have a ton of work I still want to do, but I wanted to share it as at least a thought experiment for this discussion. Thanks for reading!

pip install grover
