Inspired by Andrej Karpathy’s “LLM Wiki” idea.
We’re building Semantica — a semantic layer that transforms unstructured data into persistent, structured knowledge graphs with reasoning and provenance.
The goal is simple:
Move beyond retrieval systems → toward systems that build and maintain understanding
🔗 https://github.com/Hawksight-AI/semantica
Most LLM systems today rely on retrieval (RAG):
- Upload documents
- Retrieve relevant chunks at query time
- Generate answers
This works — but it has a fundamental limitation:
The system re-discovers knowledge from scratch on every query.
There is no accumulation. No memory. No evolving understanding.
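That loop can be sketched in a few lines of Python. This is a toy: the retriever scores chunks by word overlap and the LLM call is omitted, but it makes the limitation concrete — state lives only inside one call, and every query re-scans the raw chunks from scratch.

```python
# Minimal sketch of a stateless RAG loop (toy keyword retriever;
# the generation step is omitted -- only retrieval is shown).
def retrieve(query, chunks, k=2):
    # Score each chunk by word overlap with the query, re-scanning
    # every raw chunk on every call: nothing is remembered between queries.
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

chunks = [
    "Semantica builds knowledge graphs from documents.",
    "RAG retrieves relevant chunks at query time.",
    "Provenance records why a fact is believed true.",
]
top = retrieve("how does RAG retrieve chunks", chunks)
print(top[0])  # best-matching chunk for this one query
```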
As Andrej Karpathy points out, the real shift is:
From retrieving fragments → to compiling knowledge
Instead of repeatedly searching raw documents, systems should:
- Extract entities and relationships
- Build structured representations
- Maintain cross-references
- Continuously update knowledge
The system evolves from:
documents → retrieval → answers
to:
documents → structured knowledge → evolving understanding → answers
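The "compile" step above can be sketched with a toy extractor — here a naive `X is a Y` pattern match stands in for real LLM- or NER-based entity extraction; the structure, not the extractor, is the point:

```python
# Sketch of "compiling" documents into a persistent graph: each new
# document merges into shared structure instead of being re-read later.
import re
from collections import defaultdict

graph = defaultdict(set)  # entity -> set of (relation, entity) edges

def compile_document(text):
    # Toy extraction: pull "X is a Y" statements out of raw text
    # and merge them as is_a edges in the accumulated graph.
    for subj, obj in re.findall(r"(\w+) is a (\w+)", text):
        graph[subj].add(("is_a", obj))

compile_document("Semantica is a framework.")
compile_document("Semantica is a layer.")  # knowledge accumulates, not restarts
print(sorted(graph["Semantica"]))
```

Later queries read the graph directly — the raw documents are consulted once, at compile time.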
Semantica is built around this exact paradigm.
Instead of chunk-based retrieval, it focuses on:
- Entity extraction & linking
- Graph-based knowledge representation
- Provenance tracking (why something is true)
- Reasoning over structured context
This creates a persistent semantic layer that LLMs can rely on — rather than re-deriving context every time.
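Provenance tracking can be illustrated with a minimal fact record — the field names below are illustrative, not Semantica's actual schema:

```python
# Sketch of a provenance-tracked fact: every edge records where it
# came from, so "why is this true?" is always answerable.
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    subject: str
    relation: str
    obj: str
    source: str        # document the fact was extracted from
    confidence: float  # extractor confidence at extraction time

facts = [
    Fact("Semantica", "built_on", "knowledge graphs",
         source="README.md", confidence=0.95),
]

def why(subject, relation, obj):
    # Provenance query: return the sources supporting a claim.
    return [f.source for f in facts
            if (f.subject, f.relation, f.obj) == (subject, relation, obj)]

print(why("Semantica", "built_on", "knowledge graphs"))
```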
The hardest part of knowledge systems isn’t storage — it’s maintenance.
Humans don’t scale at:
- updating cross-references
- resolving inconsistencies
- keeping knowledge fresh
LLMs can.
This is where compiled systems win.
RAG retrieves. Semantica builds understanding.
We’re moving from:
“find relevant text”
to:
“construct and evolve knowledge”
If you’re exploring this direction or building similar systems, we’d love to connect.