Skip to content

Instantly share code, notes, and snippets.

View bananemure's full-sized avatar

Stef Enigmatik bananemure

View GitHub Profile
@bananemure
bananemure / llm-wiki.md
Created April 6, 2026 16:44 — forked from karpathy/llm-wiki.md
llm-wiki

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, it is designed to be copy pasted to your own LLM Agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, or etc.). Its goal is to communicate the high level idea, but your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.

@bananemure
bananemure / microgpt.py
Created February 12, 2026 17:41 — forked from karpathy/microgpt.py
microgpt
"""
The most atomic way to train and inference a GPT in pure, dependency-free Python.
This file is the complete algorithm.
Everything else is just efficiency.
@karpathy
"""
import os # os.path.exists
import math # math.log, math.exp
@bananemure
bananemure / building-sync-systems.md
Created April 26, 2025 17:04 — forked from pesterhazy/building-sync-systems.md
Building an offline realtime sync engine

So you want to write a sync system for a web app with offline and realtime support? Good luck. You might find the following resources useful.

Overview articles

@bananemure
bananemure / get_memory_size.py
Created January 16, 2025 16:56 — forked from philschmid/get_memory_size.py
Get needed GPU per precision for a Hugging Face Model Id
from typing import Dict, Union
from huggingface_hub import get_safetensors_metadata
import argparse
import sys
# Example:
# python get_gpu_memory.py Qwen/Qwen2.5-7B-Instruct
# Dictionary mapping dtype strings to their byte sizes
bytes_per_dtype: Dict[str, float] = {
import asyncio
import base64
import json
import os
import pyaudio
from websockets.asyncio.client import connect
class SimpleGeminiVoice:
def __init__(self):
@bananemure
bananemure / flux-dev-under-8gbs.py
Created October 20, 2024 18:00 — forked from ariG23498/flux-dev-under-8gbs.py
Run FLUX Dev under 8gbs of VRAM.
# Taken from: https://gist.github.com/sayakpaul/23862a2e7f5ab73dfdcc513751289bea
from diffusers import FluxPipeline, FluxTransformer2DModel
from transformers import T5EncoderModel
import torch
import gc
def flush():
gc.collect()
@bananemure
bananemure / export_locally.py
Created October 15, 2024 14:07 — forked from tomaarsen/export_locally.py
Export Sentence Transformer models to ONNX (+ optimization, quantization) & OpenVINO
# requires sentence_transformers>=3.2.0
from sentence_transformers import SentenceTransformer, export_optimized_onnx_model, export_dynamic_quantized_onnx_model
# The model to export to ONNX (+ optimize, quantize), OpenVINO
model_id = "mixedbread-ai/mxbai-embed-large-v1"
# Where to save the exported models locally
output_dir = model_id.replace("/", "-")
onnx_model = SentenceTransformer(model_id, backend="onnx", model_kwargs={"export": True})
onnx_model.save_pretrained(output_dir)
Understand the Task: Grasp the main objective, goals, requirements, constraints, and expected output.
- Minimal Changes: If an existing prompt is provided, improve it only if it's simple. For complex prompts, enhance clarity and add missing elements without altering the original structure.
- Reasoning Before Conclusions: Encourage reasoning steps before any conclusions are reached. ATTENTION! If the user provides examples where the reasoning happens afterward, REVERSE the order! NEVER START EXAMPLES WITH CONCLUSIONS!
- Reasoning Order: Call out reasoning portions of the prompt and conclusion parts (specific fields by name). For each, determine the ORDER in which this is done, and whether it needs to be reversed.
- Conclusion, classifications, or results should ALWAYS appear last.
- Examples: Include high-quality examples if helpful, using placeholders [in brackets] for complex elements.
- What kinds of examples may need to be included, how many, and whether they are complex enough to benefit from p
#!/usr/bin/env python3
import asyncio
import time
import aiohttp
START = time.monotonic()
@bananemure
bananemure / ANSI-color-codes.h
Created July 21, 2023 20:02 — forked from RabaDabaDoba/ANSI-color-codes.h
The entire table of ANSI color codes working in C!
/*
* This is free and unencumbered software released into the public domain.
*
* For more information, please refer to <https://unlicense.org>
*/
//Regular text
#define BLK "\e[0;30m"
#define RED "\e[0;31m"
#define GRN "\e[0;32m"