Stef Enigmatik bananemure

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file: it is designed to be copy-pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.
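The retrieve-at-query-time loop described above can be sketched in a few lines. This is a toy, not how NotebookLM or any particular product works: it assumes fixed-size word chunks and scores them with bag-of-words cosine similarity as a stand-in for the embedding search real RAG systems typically use. The helper names (`chunk`, `tokenize`, `cosine`, `retrieve`) are hypothetical.

```python
import math
import re
from collections import Counter


def chunk(text: str, size: int = 40) -> list[str]:
    """Split a document into fixed-size word windows (an assumed chunking scheme)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def tokenize(text: str) -> Counter:
    """Lowercased bag of alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, documents: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most similar to the question.

    Runs from scratch on every call: nothing learned from earlier
    questions is stored or reused.
    """
    chunks = [c for doc in documents for c in chunk(doc)]
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    docs = ["the cat sat on the mat", "stock prices rose sharply today"]
    # The retrieved chunks would then be pasted into the LLM's prompt.
    print(retrieve("where did the cat sit", docs, k=1))
```

For brevity the sketch rebuilds the chunks on every call (real systems usually cache an index). Either way, nothing the model synthesizes from those chunks is written back, so the next question starts from raw fragments again.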

So you want to write a sync system for a web app with offline and realtime support? Good luck. You might find the following resources useful.

Overview articles

Database in a browser, a spec (Stepan Parunashvili)

What problem are we trying to solve with a sync system?
The web of tomorrow (Nikita Prokopov)

	"""
	The most atomic way to train and inference a GPT in pure, dependency-free Python.
	This file is the complete algorithm.
	Everything else is just efficiency.

	@karpathy
	"""

	import os # os.path.exists
	import math # math.log, math.exp

	from typing import Dict, Union
	from huggingface_hub import get_safetensors_metadata
	import argparse
	import sys

	# Example:
	# python get_gpu_memory.py Qwen/Qwen2.5-7B-Instruct

	# Dictionary mapping dtype strings to their byte sizes
	bytes_per_dtype: Dict[str, float] = {

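The preview above cuts off at the dtype table. Below is a minimal sketch of the estimate such a script presumably makes, assuming the usual approach of multiplying each dtype's parameter count by its byte width. The dtype subset, the GB unit (10^9 bytes), and the function name `weights_memory_gb` are assumptions, not the gist's code. It counts raw weights only, so KV cache, activations, and framework overhead come on top.

```python
from typing import Dict

# Byte width of each safetensors dtype string (an assumed subset; extend as needed).
bytes_per_dtype: Dict[str, float] = {
    "F64": 8, "F32": 4, "F16": 2, "BF16": 2,
    "I64": 8, "I32": 4, "I16": 2, "I8": 1, "U8": 1, "BOOL": 1,
    "F8_E4M3": 1, "F8_E5M2": 1,
}


def weights_memory_gb(parameter_count: Dict[str, int]) -> float:
    """Memory needed to hold the raw weights, in GB (1e9 bytes).

    `parameter_count` maps a dtype string to the number of parameters stored
    in that dtype. With huggingface_hub it could come from
    get_safetensors_metadata(repo_id).parameter_count (requires network access).
    """
    total_bytes = 0.0
    for dtype, count in parameter_count.items():
        if dtype not in bytes_per_dtype:
            raise ValueError(f"unknown dtype: {dtype}")
        total_bytes += count * bytes_per_dtype[dtype]
    return total_bytes / 1e9


if __name__ == "__main__":
    # Illustrative input, not a real model's parameter count.
    print(weights_memory_gb({"BF16": 7_000_000_000}))
```

Raising on an unknown dtype, rather than silently skipping it, keeps the estimate from under-counting a checkpoint that uses a format missing from the table.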
	import asyncio
	import base64
	import json
	import os
	import pyaudio
	from websockets.asyncio.client import connect


	class SimpleGeminiVoice:
	    def __init__(self):

	# Taken from: https://gist.github.com/sayakpaul/23862a2e7f5ab73dfdcc513751289bea

	from diffusers import FluxPipeline, FluxTransformer2DModel
	from transformers import T5EncoderModel
	import torch
	import gc


	def flush():
	    gc.collect()

	# requires sentence_transformers>=3.2.0
	from sentence_transformers import SentenceTransformer, export_optimized_onnx_model, export_dynamic_quantized_onnx_model

	# The model to export to ONNX (+ optimize, quantize), OpenVINO
	model_id = "mixedbread-ai/mxbai-embed-large-v1"
	# Where to save the exported models locally
	output_dir = model_id.replace("/", "-")

	onnx_model = SentenceTransformer(model_id, backend="onnx", model_kwargs={"export": True})
	onnx_model.save_pretrained(output_dir)

	- Understand the Task: Grasp the main objective, goals, requirements, constraints, and expected output.
	- Minimal Changes: If an existing prompt is provided, improve it only if it's simple. For complex prompts, enhance clarity and add missing elements without altering the original structure.
	- Reasoning Before Conclusions: Encourage reasoning steps before any conclusions are reached. ATTENTION! If the user provides examples where the reasoning happens afterward, REVERSE the order! NEVER START EXAMPLES WITH CONCLUSIONS!
	- Reasoning Order: Call out reasoning portions of the prompt and conclusion parts (specific fields by name). For each, determine the ORDER in which this is done, and whether it needs to be reversed.
	- Conclusion, classifications, or results should ALWAYS appear last.
	- Examples: Include high-quality examples if helpful, using placeholders [in brackets] for complex elements.
	- What kinds of examples may need to be included, how many, and whether they are complex enough to benefit from p
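The "Reasoning Order" rule above can be made mechanical when an example is represented as named fields. The following hypothetical helper, `reasoning_first`, assumes examples are plain dicts mapping field name to text (Python dicts keep insertion order). It moves the named conclusion fields to the end and leaves everything else in its original order. The bracketed values follow the placeholder convention the guidelines recommend.

```python
from typing import Dict, List


def reasoning_first(example: Dict[str, str], conclusion_fields: List[str]) -> Dict[str, str]:
    """Return a copy of `example` with the conclusion fields moved last.

    Non-conclusion (reasoning) fields keep their original relative order;
    conclusion fields follow in the order given. The input is not modified.
    """
    reasoning = {k: v for k, v in example.items() if k not in conclusion_fields}
    conclusions = {k: example[k] for k in conclusion_fields if k in example}
    return {**reasoning, **conclusions}


def render(example: Dict[str, str]) -> str:
    """One "field: value" line per field, e.g. for a prompt's output-format section."""
    return "\n".join(f"{k}: {v}" for k, v in example.items())


if __name__ == "__main__":
    # An example that wrongly starts with its conclusion.
    example = {"classification": "[label]", "reasoning": "[step-by-step justification]"}
    print(render(reasoning_first(example, ["classification"])))
```

Naming the conclusion fields explicitly, rather than guessing which ones hold reasoning, mirrors the guideline's instruction to call out each field by name before deciding whether its order must be reversed.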

	#!/usr/bin/env python3

	import asyncio
	import time

	import aiohttp

	START = time.monotonic()

	/*
	* This is free and unencumbered software released into the public domain.
	*
	* For more information, please refer to <https://unlicense.org>
	*/

	//Regular text
	#define BLK "\e[0;30m"
	#define RED "\e[0;31m"
	#define GRN "\e[0;32m"
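Each macro in the header is an ANSI SGR escape sequence: ESC, `[`, attribute `0` (regular text), `;`, then a foreground color code starting at 30 for black. Below is a small Python sketch of the same scheme, assuming a terminal that honors ANSI codes. `colorize` is a hypothetical helper, and the trailing `ESC[0m` is the standard SGR reset rather than something shown in the preview. Python has no `\e` escape (in C it is a GCC/Clang extension), so the sketch spells ESC as `\033`.

```python
ESC = "\033"          # ASCII escape character, 0x1B
RESET = f"{ESC}[0m"   # SGR 0: reset all attributes

# Foreground codes for the three macros visible in the header; 33 to 37 continue
# the standard sequence (yellow, blue, magenta, cyan, white).
FOREGROUND = {"BLK": 30, "RED": 31, "GRN": 32}


def colorize(text: str, name: str) -> str:
    """Wrap text in a regular-weight (attribute 0) color code, then reset."""
    return f"{ESC}[0;{FOREGROUND[name]}m{text}{RESET}"


if __name__ == "__main__":
    print(colorize("error", "RED"), colorize("ok", "GRN"))
```

Appending the reset after every colored span keeps the color from leaking into whatever the terminal prints next.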