This note summarizes a long discussion about Claude Opus 4.7, 1M-token context, open-source and commercial SOTA RAG, LLM Wiki-style knowledge compilation, Cognee, Letta, and a production architecture for a mid-sized legal/government/community agentic organization.
The core conclusion is simple:
Do not build "a chatbot with a CRM."
Build an Agentic Case + Knowledge OS.
For a roughly 300-daily-user organization doing legal, government, and community-accessible operations, the system should have four separate cores:
1. CRM / capability graph
2. Grounded knowledge layer
3. Workflow / task engine
4. Agent runtime with approvals, memory, and observability
The LLM should not be the system of record. It should act as a planner, drafter, router, summarizer, analyst, and monitor. Deterministic systems should own identity, permissions, CRM state, task state, rules, audit logs, and high-impact approvals.
Claude Opus 4.7's 1M-token context is best understood as a large context window, not persistent session memory. Technically, serving long context relies heavily on KV cache management: transformers cache key/value tensors for prior tokens so future tokens can attend to them without recomputing everything.
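To make the KV-cache point concrete, here is a toy numpy sketch (an illustration only, not how any production inference server works; `d_model`, `kv_cache`, and `decode_step` are invented names). Keys and values for already-processed tokens are computed once and stored, so each new token only attends over the cache instead of reprocessing the whole prefix.

```python
import numpy as np

d_model = 64
rng = np.random.default_rng(0)
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(3))

# Cache grows by one row of keys and one row of values per generated token.
kv_cache = {"K": np.empty((0, d_model)), "V": np.empty((0, d_model))}

def decode_step(x_new: np.ndarray) -> np.ndarray:
    """Attend from one new token embedding over all cached tokens."""
    q = x_new @ W_q                                   # query for the new token only
    kv_cache["K"] = np.vstack([kv_cache["K"], x_new @ W_k])
    kv_cache["V"] = np.vstack([kv_cache["V"], x_new @ W_v])
    scores = q @ kv_cache["K"].T / np.sqrt(d_model)   # attend over cached keys
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ kv_cache["V"]                    # cost grows with cache length

for _ in range(5):                                    # five toy decoding steps
    context_vector = decode_step(rng.standard_normal(d_model))
```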
A 1M-token window can fit most books and many large document packets. But fitting text is not the same as reliably using it. Long context still suffers from effective-context issues such as:
- lost-in-the-middle behavior
- distractor sensitivity
- weak multi-hop retrieval over long contexts
- poorer performance when the input is bloated or poorly structured
Practical conclusion:
Use long context for synthesis over curated evidence.
Do not use long context as your primary search strategy.
The robust SOTA architecture is:
parse
-> hybrid retrieve
-> rerank / late interaction
-> evidence pack
-> long-context synthesis
-> continuous eval
In practice:
raw corpus
-> structured parsing
-> dense + sparse/BM25 search
-> top 100-300 candidate retrieval
-> rerank to top 20-60
-> pack final evidence with source IDs
-> answer with citations
-> evaluate retrieval, grounding, and task outcomes
The best systems shrink the haystack before asking the model to reason over it.
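A minimal, self-contained sketch of that flow, under stated assumptions: `sparse_score` and `dense_score` are crude stand-ins for BM25 and embedding similarity, and the rerank stage is left as a comment where a real cross-encoder (e.g. a BGE or Qwen3 reranker) would be called.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    source_id: str
    page: int
    text: str

def sparse_score(query: str, chunk: Chunk) -> float:
    """Crude stand-in for BM25: plain term overlap."""
    q_terms, c_terms = set(query.lower().split()), set(chunk.text.lower().split())
    return len(q_terms & c_terms) / (len(q_terms) or 1)

def dense_score(query: str, chunk: Chunk) -> float:
    """Stand-in for embedding similarity; replace with a real embedding model."""
    return sparse_score(query, chunk)

def retrieve(query: str, corpus: list[Chunk],
             k_candidates: int = 200, k_final: int = 30) -> list[Chunk]:
    # 1) hybrid scoring over the corpus (dense + sparse), keep 100-300 candidates
    scored = sorted(corpus,
                    key=lambda c: 0.5 * sparse_score(query, c) + 0.5 * dense_score(query, c),
                    reverse=True)[:k_candidates]
    # 2) a cross-encoder reranker would re-order `scored` here; omitted in this sketch
    return scored[:k_final]

def evidence_pack(chunks: list[Chunk]) -> str:
    """Pack the final evidence with source IDs so the answer can cite them."""
    return "\n\n".join(f"[{c.source_id} p.{c.page}] {c.text}" for c in chunks)
```

The important property is structural: the model only ever sees the packed, source-tagged evidence, never the raw corpus.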
There are two different requirements:
Knowledge that compounds
= organizational understanding that improves over time.
Answers that must be grounded
= claims tied back to auditable source evidence.
These should not be collapsed into one layer.
Recommended split:
Compiled knowledge layer:
LLM Wiki, graph memory, summaries, strategy notes, decision records
Ground-truth layer:
immutable raw sources, parsed spans, page/section IDs, citations, provenance
Rule:
Compiled memory helps the organization think.
Raw source citations establish what is true.
Karpathy's LLM Wiki idea is best understood as knowledge compilation:
raw sources
-> LLM-maintained markdown/wiki pages
-> entity pages, topic pages, contradictions, summaries, decision records
-> query the compiled layer
-> fall back to raw sources for exact evidence
This is excellent for research, strategy, competitive intelligence, organizational memory, and long-running projects.
Production caveat:
The wiki is a lossy derived artifact.
It must not be the only source of truth.
Best production form:
raw/ immutable source documents
wiki/ reviewed markdown knowledge base
index/ hybrid search over raw + wiki
kg/ typed relationships and temporal facts
eval/ stale-claim, citation, contradiction checks
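One compilation step from the pipeline above, sketched under the assumption that the wiki lives in a Git-backed directory and that drafts go through human review before replacing live pages (`recompile_entity_page` and `call_llm` are illustrative names, not a real API):

```python
from pathlib import Path
from typing import Callable

def recompile_entity_page(entity: str,
                          raw_texts: list[str],
                          wiki_dir: Path,
                          call_llm: Callable[[str], str]) -> Path:
    """Draft an updated entity page from raw sources; a human reviews the draft."""
    prompt = (
        f"Update the wiki page for '{entity}'.\n"
        "State only what the sources below support, keep [source_id] tags next to "
        "every claim, and list open contradictions explicitly.\n\n"
        + "\n\n".join(raw_texts)
    )
    draft_dir = wiki_dir / "drafts"
    draft_dir.mkdir(parents=True, exist_ok=True)
    draft_path = draft_dir / f"{entity}.md"
    draft_path.write_text(call_llm(prompt), encoding="utf-8")
    return draft_path  # the live page under wiki/ only changes after review
```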
Cognee is a strong candidate for the graph/vector memory layer. It is useful because it can organize data into a graph of raw information, extracted concepts, summaries, entities, and relationships.
Where Cognee fits well:
- relationship-aware memory
- entity and concept extraction
- semantic + graph retrieval
- enrichment over time
- organizational knowledge that compounds
Where Cognee still needs engineering:
- exact source-span citation resolution
- audit-grade page/section provenance
- claim-level verification
- human review for high-stakes graph mutations
- retrieval and groundedness evals
Best framing:
Cognee = compiled graph/vector memory layer
Raw source store = truth layer
Citation resolver = trust boundary
LLM = synthesis layer
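A hedged sketch of the citation resolver as a trust boundary: a claim from the compiled layer (or from the LLM) is only surfaced if its quoted span resolves verbatim against the immutable raw store. All names here are illustrative.

```python
RAW_STORE: dict[str, str] = {}  # source_id -> full immutable source text

def resolve_citation(source_id: str, quoted_span: str) -> bool:
    """A citation resolves only if its quoted span exists verbatim in the raw source."""
    raw = RAW_STORE.get(source_id)
    return raw is not None and quoted_span in raw

def filter_grounded(claims: list[dict]) -> list[dict]:
    """Keep claims whose citations all resolve; flag the rest for human review."""
    grounded = []
    for claim in claims:
        if claim["citations"] and all(
            resolve_citation(c["source_id"], c["span"]) for c in claim["citations"]
        ):
            grounded.append(claim)
        else:
            claim["status"] = "needs_review"  # never silently shown as fact
    return grounded
```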
Letta is meaningfully open source and self-hostable for the core agent memory/server use case. The hosted letta.com offering adds cloud/API convenience, a managed UX, and additional product features.
Best role for Letta:
persistent agent memory
agent identity/persona
long-running task context
user/team preferences
stateful coding/research agents
Not enough by itself for:
legal/government-grade grounding
source citations
case-level provenance
enterprise knowledge graph truth
high-stakes auditability
Recommended use:
Letta = persistent agent memory
Cognee/Graphiti/GraphRAG = organizational knowledge graph
Hybrid RAG = grounded answers
Workflow engine = task execution state
For a community/legal/government organization, a normal CRM is not enough. The system must know who people are, what they can do, what they are allowed to do, and when they are available.
Core entities:
Person
Organization
Team
Role
Capability
Credential
Jurisdiction
Language
Location
Availability
Consent
Case
Task
Interaction
Outcome
Trust / reliability signal
Conflict / restriction
Training / certification
Routing is relational:
Find someone who can do this task,
in this jurisdiction,
in this language,
with this credential,
with user consent,
without a conflict,
within this SLA,
who is not overloaded,
and who has a reliable completion history.
This should be implemented as hard filters plus ranking; a minimal sketch follows the two lists below.
Hard filters:
permission
consent
jurisdiction
credential requirement
conflict of interest
case sensitivity
language minimum
availability
legal/age constraints if relevant
data access rights
Ranking features:
skill match
jurisdiction match
language match
availability
current workload
historical reliability
relationship to community/org
geographic proximity
training goals
urgency/SLA
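A minimal sketch of that split, with invented field names; the real capability data lives in the CRM/capability graph, and the ranking weights are policy choices, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    person_id: str
    jurisdictions: set[str]
    languages: set[str]
    credentials: set[str]
    consented: bool
    has_conflict: bool
    available: bool
    open_tasks: int
    reliability: float   # 0..1, from historical completion quality
    skill_match: float   # 0..1, similarity to the task profile

def passes_hard_filters(c: Candidate, task: dict) -> bool:
    """Every condition here is a hard gate; failing any one removes the candidate."""
    return (c.consented
            and not c.has_conflict
            and c.available
            and task["jurisdiction"] in c.jurisdictions
            and task["language"] in c.languages
            and task["required_credentials"] <= c.credentials)

def rank(c: Candidate) -> float:
    """Ranking only applies to candidates that already passed every hard filter."""
    workload_penalty = min(c.open_tasks / 10, 1.0)
    return 0.5 * c.skill_match + 0.4 * c.reliability - 0.3 * workload_penalty

def route(task: dict, candidates: list[Candidate]) -> list[Candidate]:
    eligible = [c for c in candidates if passes_hard_filters(c, task)]
    return sorted(eligible, key=rank, reverse=True)
```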
The agent should not autonomously make rights-affecting, legal, eligibility, denial, or government-status decisions.
Recommended action-risk ladder:
| Tier | Agent may do | Approval needed? |
|---|---|---|
| T0: Read-only | Search knowledge, summarize approved docs, answer with citations | Usually no, but log |
| T1: Draft | Draft emails, memos, task notes, summaries | Review before sending |
| T2: Internal update | Create tasks, update CRM fields, classify cases | Policy-dependent; reversible queue preferred |
| T3: User-facing action | Message users, request docs, offer tasks, schedule calls | Approval for sensitive contexts |
| T4: High-impact action | Submit forms, legal/government recommendations, escalation | Human professional approval required |
| T5: Restricted | Deny services, make final legal conclusions, impersonate authority | No autonomous execution |
Principle:
Agents propose, draft, route, and monitor.
Humans approve high-impact actions.
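A small sketch of the ladder as an enforcement point rather than a guideline. The tool-to-tier mapping is an example policy table, and this version is stricter than the table above in that every T3 action waits for approval, not only sensitive ones:

```python
from enum import IntEnum

class Tier(IntEnum):
    T0_READ_ONLY = 0
    T1_DRAFT = 1
    T2_INTERNAL_UPDATE = 2
    T3_USER_FACING = 3
    T4_HIGH_IMPACT = 4
    T5_RESTRICTED = 5

TOOL_TIERS = {                       # example policy table, not a recommendation
    "search_knowledge": Tier.T0_READ_ONLY,
    "draft_email": Tier.T1_DRAFT,
    "update_crm_field": Tier.T2_INTERNAL_UPDATE,
    "message_user": Tier.T3_USER_FACING,
    "submit_form": Tier.T4_HIGH_IMPACT,
    "deny_service": Tier.T5_RESTRICTED,
}

def gate(tool_name: str, human_approved: bool) -> str:
    tier = TOOL_TIERS[tool_name]
    if tier >= Tier.T5_RESTRICTED:
        return "blocked"                  # never executed autonomously
    if tier >= Tier.T3_USER_FACING and not human_approved:
        return "queued_for_approval"      # waits in the human approval queue
    return "execute_and_log"              # T0-T2 run, but every call is logged
```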
+---------------------------------------------------------------+
| User Interfaces |
| public portal | internal cockpit | mobile | WhatsApp/Slack/etc |
+-------------------------------+-------------------------------+
|
+-------------------------------v-------------------------------+
| Identity, Consent, Permissions |
| Keycloak / SSO / OpenFGA / role + case ACLs |
+-------------------------------+-------------------------------+
|
+-------------------------------v-------------------------------+
| CRM + Capability + Case System |
| people | orgs | roles | skills | credentials | availability |
| geography | language | trust | consent | workload | history |
+-------------------------------+-------------------------------+
|
+-------------------------------v-------------------------------+
| Workflow + Task Engine |
| durable workflows | approvals | SLAs | reminders | audit logs |
+-------------------------------+-------------------------------+
|
+-------------------------------v-------------------------------+
| Agent Runtime |
| planner | retriever | drafter | router | verifier | monitor |
| human gates | tool permissions | memory | traces |
+-------------------------------+-------------------------------+
|
+-------------------------------v-------------------------------+
| Knowledge System |
| raw sources | parsed docs | hybrid search | rerankers | graph |
| wiki/memory | rules-as-code | citations | contradiction tracking|
+-------------------------------+-------------------------------+
|
+-------------------------------v-------------------------------+
| Observability, Evals, Governance |
| groundedness | retrieval recall | citation precision | audit |
| safety tests | prompt/tool traces | model/version lineage |
+---------------------------------------------------------------+
| Layer | Recommended default | Why |
|---|---|---|
| Identity / SSO | Keycloak | Mature open-source identity and access management |
| Fine-grained auth | OpenFGA | Relationship-based authorization for cases, docs, tasks, teams |
| CRM / civic ops | CiviCRM | Strong fit for nonprofits, civic orgs, contacts, campaigns, cases, activities |
| Volunteer/task ops | CiviVolunteer + custom capability graph | Skills, roles, shifts, availability, routing |
| Durable workflows | Temporal | Long-running workflows, retries, crash recovery, human pauses |
| Human/BPMN workflows | Camunda | Useful for visible human tasks, BPMN/DMN, approval processes |
| Agent orchestration | LangGraph | Stateful, multi-step agents with human-in-the-loop support |
| Parsing | Docling | Strong open-source document parsing for PDFs and structured docs |
| RAG platform | Haystack, RAGFlow, or thin custom layer | Production RAG pipelines and document-centric retrieval |
| Search | Qdrant and/or OpenSearch | Hybrid vector + sparse/BM25 search |
| Embeddings | BGE-M3 or Qwen3-Embedding | Strong multilingual and hybrid retrieval baseline |
| Reranking | BGE-reranker, Qwen3-Reranker, Mixedbread | Often the biggest retrieval-quality jump |
| Late interaction | ColBERT / RAGatouille / PyLate | For legal, code, and scientific retrieval where exact terms matter and distractors are common |
| Compounding memory | Cognee and/or Graphiti | Graph/vector memory and temporal knowledge tracking |
| Agent memory | Letta | Persistent agent state and memory |
| Rules-as-code | docassemble, OpenFisca, DMN/Drools | Deterministic legal/government logic and guided interviews |
| Observability/eval | Phoenix, Ragas, DeepEval, custom harness | Tracing, groundedness, retrieval, citation, regression tests |
| Community | Discourse / Zulip | Public community knowledge and internal threaded operations |
A pragmatic path is hybrid: open-source control over core state, commercial frontier LLMs where quality matters.
CRM/case/community:
CiviCRM + CiviCase + CiviVolunteer
custom capability graph service
Identity and permissions:
Keycloak + OpenFGA
Workflow:
Temporal for durable workflows
LangGraph for agent workflows
optional Camunda for BPMN/DMN governance
Knowledge:
Docling for parsing
Haystack or RAGFlow for RAG pipeline
Qdrant for hybrid vector/sparse retrieval
OpenSearch if legal/keyword search is central
BGE-M3 / Qwen embeddings
reranker service
Cognee or Graphiti for graph/memory
Git-backed LLM Wiki for reviewed strategic knowledge
Rules/legal:
docassemble for guided interviews and document assembly
OpenFisca/PolicyEngine or DMN for eligibility/process rules (see the eligibility sketch after this stack list)
Observability:
Phoenix
Ragas or DeepEval
custom golden datasets
OpenTelemetry traces
LLM:
commercial frontier model early for quality
open-weight/private serving later for cost/privacy-sensitive workloads
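The eligibility sketch referenced above shows what "rules-as-code" means in practice: a deterministic, testable function with an auditable reason string. The rule and thresholds are invented for illustration; real rules would be encoded from statute/policy in docassemble, OpenFisca/PolicyEngine, or DMN.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    household_size: int
    monthly_income: float
    jurisdiction: str

INCOME_LIMITS = {1: 1500.0, 2: 2000.0, 3: 2500.0}  # hypothetical thresholds

def eligible_for_program(a: Applicant) -> tuple[bool, str]:
    """Return the decision plus a human-readable reason for the audit log."""
    limit = INCOME_LIMITS.get(a.household_size,
                              2500.0 + 400.0 * (a.household_size - 3))
    if a.jurisdiction != "EXAMPLE_REGION":
        return False, "outside covered jurisdiction"
    if a.monthly_income > limit:
        return False, f"income {a.monthly_income:.0f} exceeds limit {limit:.0f}"
    return True, "meets jurisdiction and income criteria"
```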
Agent capabilities, by role:
Intake:
collects structured facts
classifies request
detects urgency/risk
asks missing questions
creates or updates CRM/case record
routes to the right workflow
Grounded Q&A:
answers questions from approved sources
shows citations
states uncertainty
refuses when evidence is insufficient
Task routing:
matches tasks to people
applies hard filters
ranks candidates
explains assignment
offers tasks
monitors acceptance and completion
Case support:
summarizes case history
drafts next actions
prepares documents
checks deadlines
flags contradictions
suggests escalation
Strategy and insight:
summarizes weekly patterns
detects bottlenecks
tracks community needs
finds repeated legal/government issues
updates LLM Wiki / knowledge graph
prepares leadership memos
Recommended layers:
1. Raw source layer
- immutable files, laws, policies, transcripts, emails, case notes
- source ID, hash, date, author, jurisdiction, ACL
2. Parsed / indexed layer
- parsed text
- chunks with page, section, span
- BM25 + vector search
- reranking
3. Compiled knowledge layer
- LLM Wiki pages
- entity pages
- concept pages
- procedure pages
- contradiction pages
- decision records
- strategy memos
4. Graph / memory layer
- entities
- relationships
- temporal facts
- provenance
- capabilities
- cases
- policy changes
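A sketch of minimal record types for these layers; the field names are assumptions, not a fixed schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RawSource:                  # raw source layer
    source_id: str                # stable ID referenced by every citation
    sha256: str                   # hash of the immutable file
    date: str
    author: str
    jurisdiction: str
    acl: tuple[str, ...]          # who may read it

@dataclass(frozen=True)
class ParsedChunk:                # parsed / indexed layer
    source_id: str
    page: int
    section: str
    span: tuple[int, int]         # character offsets into the parsed text
    text: str

@dataclass(frozen=True)
class TemporalFact:               # graph / memory layer
    subject: str
    relation: str
    object: str
    valid_from: str
    valid_to: str | None
    provenance: tuple[str, ...]   # source_ids that support the fact
```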
Recommended wiki structure:
/wiki/entities/
/wiki/legal_topics/
/wiki/government_programs/
/wiki/procedures/
/wiki/community_needs/
/wiki/contradictions/
/wiki/strategy/
/wiki/decision_records/
/wiki/open_questions/
Every wiki page should include:
status: draft / reviewed / approved / stale
owner: person/team
last_verified: date
source_ids: [...]
jurisdiction: [...]
confidence: high/medium/low
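Assuming those fields are stored as YAML front matter at the top of each page, a stale-claim check can stay very simple; this sketch uses PyYAML and an example 180-day staleness policy:

```python
from datetime import date, timedelta
from pathlib import Path
import yaml

MAX_AGE = timedelta(days=180)        # example staleness policy

def front_matter(page: Path) -> dict:
    """Parse the YAML block between the leading '---' fences, if present."""
    text = page.read_text(encoding="utf-8")
    if not text.startswith("---"):
        return {}
    return yaml.safe_load(text.split("---", 2)[1]) or {}

def stale_pages(wiki_root: Path) -> list[Path]:
    """Pages with no last_verified date, or one older than MAX_AGE, need review."""
    cutoff = date.today() - MAX_AGE
    stale = []
    for page in wiki_root.rglob("*.md"):
        verified = front_matter(page).get("last_verified")
        if not verified or date.fromisoformat(str(verified)) < cutoff:
            stale.append(page)
    return stale
```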
| Research direction | Practical implication |
|---|---|
| ReAct | Agents should reason, call tools, observe results, then continue |
| Toolformer | Design clean tools/APIs that agents can call safely |
| Reflexion | Store after-action reviews, but do not treat them as truth |
| Generative Agents | Persistent memory helps long-running agents model context over time |
| MemGPT / Letta | Treat context as memory hierarchy; page between active context and storage |
| Contextual Retrieval | Add document-specific context to chunks before retrieval |
| RAPTOR | Use hierarchical summary trees for long documents and legal corpora |
| GraphRAG | Extract entities/relationships for multi-hop and relational questions |
| LightRAG | Combine graph + vector retrieval with incremental updates |
| HippoRAG | Explore graph-based multi-hop retrieval over related facts |
| RAFT | Fine-tune later so models use retrieved evidence better amid distractors |
| RULER | Evaluate long-context and multi-hop behavior beyond simple needle tests |
Hyperscaler-native options:
Azure:
Azure AI Search + Azure AI Foundry + Azure OpenAI + Document Intelligence
Google:
Vertex AI Agent Builder + Vector Search + Ranking API + Gemini grounding
AWS:
Bedrock Knowledge Bases + Bedrock rerankers + OpenSearch/Aurora/Pinecone/Redis + Textract
Databricks:
Mosaic AI Vector Search + Unity Catalog + Agent Framework + MLflow/Agent Evaluation
Turnkey enterprise options:
Glean:
enterprise workplace search and knowledge assistant
Moveworks:
employee support and workflow automation
Coveo:
enterprise/search-first relevance and generated answers
Vectara:
turnkey grounded RAG API
Best-of-breed commercial stack:
LlamaParse or Unstructured
-> Pinecone / Elastic / Zilliz / MongoDB Atlas / Weaviate Cloud
-> Cohere or Voyage embeddings/rerankers
-> Claude / GPT / Gemini / Mistral / Cohere
-> LangSmith / Arize / Braintrust / Galileo
Build open-source/hybrid if:
your workflows are unique
you need source-level grounding
you need control over legal/government logic
you need data sovereignty
you want organizational knowledge as a strategic asset
Buy commercial if:
you mainly need CRM + support workflows
you already use Salesforce/Microsoft deeply
you lack engineering capacity
time-to-market is more important than architecture control
Best recommendation for this case:
Hybrid leaning open-source.
Use commercial LLMs where quality is strategic, but keep identity, permissions, workflows, CRM, source store, and eval logs under your control.
Phase 1 (foundations):
define risk tiers
define data model
choose CRM
choose identity/permission model
choose 2-3 pilot workflows
collect source corpus
build golden evaluation set
design approval policy
Deliverables:
capability profile schema
case schema
task schema
permission model
source/citation model
agent action policy
Phase 2 (pilot launch):
launch internal knowledge Q&A
launch case intake assistant
launch task routing recommendations
integrate CRM and workflow engine
add human approval queue
instrument traces/evals
Deliverables:
internal agent cockpit
retrieval/citation pipeline
CRM integration
task recommendation engine
approval workflow
basic dashboards
Phase 3 (compounding knowledge):
add Cognee or Graphiti
create LLM Wiki
add weekly strategy memos
add contradiction/stale-claim checks
add rule engine for one workflow
add user-facing portal beta
Deliverables:
knowledge graph
reviewed wiki pages
decision records
rules-as-code prototype
public Q&A beta with citations
Knowledge and grounding metrics:
retrieval recall@k
citation precision
grounded answer rate
unsupported-claim rate
stale-source rate
contradiction detection rate
time-to-update after policy change
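A minimal sketch of how the first two metrics above could be computed against a golden set of (query, relevant source IDs) pairs; everything here is illustrative:

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the golden relevant sources found in the top-k retrieval."""
    if not relevant_ids:
        return 0.0
    return len(set(retrieved_ids[:k]) & relevant_ids) / len(relevant_ids)

def citation_precision(cited_ids: list[str], supported_ids: set[str]) -> float:
    """Fraction of an answer's citations that actually support its claims."""
    if not cited_ids:
        return 0.0
    return sum(1 for c in cited_ids if c in supported_ids) / len(cited_ids)
```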
Case operations metrics:
time to triage
time to first human response
time to resolution
SLA breach rate
reopen rate
escalation rate
caseworker hours saved
Task routing and workload metrics:
assignment acceptance rate
completion rate
on-time rate
quality review score
wrong-assignment rate
workload balance
volunteer/staff burnout indicators
Community and access metrics:
accessibility issues
language coverage
geographic coverage
user satisfaction
intake drop-off
successful referrals
Safety and governance metrics:
approval bypass attempts
policy violations blocked
sensitive tool calls
permission-denied events
audit completeness
model/version regression failures
CRM decides who exists and what they may do.
Workflow engine decides what must happen.
Rules engine decides deterministic legal/government logic.
Knowledge system provides grounded evidence.
Agent proposes, drafts, routes, monitors, and explains.
Humans approve high-impact actions.
Evals and audit decide whether the system is trustworthy.
The strategic takeaway:
Long context is a capability.
RAG is a grounding mechanism.
LLM Wiki is a compounding-memory pattern.
Cognee/Graphiti are graph-memory infrastructure.
Letta is persistent agent memory.
Temporal/LangGraph are execution infrastructure.
CiviCRM/OpenFGA/Keycloak are organizational state and control.
Rules-as-code keeps high-stakes logic deterministic.
The product should combine all of these into an Agentic Knowledge + Case OS.