Qiang willzhqiang

LLM Wiki

A pattern for building personal knowledge bases using LLMs.

This is an idea file, designed to be copied and pasted into your own LLM agent (e.g. OpenAI Codex, Claude Code, OpenCode / Pi, etc.). Its goal is to communicate the high-level idea; your agent will build out the specifics in collaboration with you.

The core idea

Most people's experience with LLMs and documents looks like RAG: you upload a collection of files, the LLM retrieves relevant chunks at query time, and generates an answer. This works, but the LLM is rediscovering knowledge from scratch on every question. There's no accumulation. Ask a subtle question that requires synthesizing five documents, and the LLM has to find and piece together the relevant fragments every time. Nothing is built up. NotebookLM, ChatGPT file uploads, and most RAG systems work this way.
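To make the query-time pattern concrete, here is a minimal sketch, assuming a toy word-overlap scorer in place of a real embedding index. The names `tokenize` and `retrieve` and the sample chunks are hypothetical, chosen only for illustration. The point to notice is that every call re-scores the raw chunks from scratch and nothing computed for one question is kept for the next.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks: list[str], question: str, k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the question and return the top k.

    Nothing is cached: every question re-scores every chunk from scratch.
    """
    q = tokenize(question)
    scored = [(sum((tokenize(c) & q).values()), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Drop chunks that share no words with the question.
    return [c for score, c in scored[:k] if score > 0]


chunks = [
    "The wiki page on Spark summarizes three papers.",
    "Notes on asyncio: limit concurrency with a semaphore.",
    "Reading list for topic models.",
]
top = retrieve(chunks, "how to limit asyncio concurrency")
```

A real system would swap the overlap score for vector similarity, but the shape stays the same: the answer is assembled from fragments at question time, and the next question starts over.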

@kevinjalbert
kevinjalbert / import-raindrop-highlights-into-readwise.rb
Last active February 28, 2025 22:51
Import Raindrop.io Highlights into Readwise
#!/usr/bin/env ruby
require "httparty"
require "nokogiri"
require "open-uri"
require "uri"
RAINDROP_AUTH_TOKEN="xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
READWISE_AUTH_TOKEN="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
LAST_SAVED_HIGHLIGHT=000000000 # <- Keeps track of import position (Updates automatically)
Dependencies:
sudo apt-get update
sudo apt-get install build-essential
sudo apt-get install python-dev
sudo pip install -U setuptools
Steps:
download from https://mrjbq7.github.io/ta-lib/install.html
@njam
njam / asyncio_pool.py
Created October 13, 2017 18:24
Limit number of concurrently running asyncio tasks
import asyncio
from collections import deque
class AsyncioPool:
    def __init__(self, concurrency, loop=None):
        """
        @param loop: asyncio loop
        @param concurrency: Maximum number of concurrently running tasks
        """
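The preview above is cut off before the pool's logic appears, so the following is not the gist's implementation. It is a minimal sketch of the same goal (capping the number of concurrently running tasks) using `asyncio.Semaphore` instead of a `deque`-based class. The names `run_limited`, `guarded`, `demo`, and `job` are hypothetical.

```python
import asyncio


async def run_limited(coros, concurrency):
    """Run coroutines with at most `concurrency` of them active at once."""
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(coro):
        # Each coroutine waits here until a slot is free.
        async with semaphore:
            return await coro

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(guarded(c) for c in coros))


async def demo():
    active = 0
    peak = 0

    async def job(n):
        nonlocal active, peak
        active += 1
        peak = max(peak, active)  # record the highest concurrency observed
        await asyncio.sleep(0.01)
        active -= 1
        return n * n

    results = await run_limited([job(n) for n in range(6)], concurrency=2)
    return results, peak


results, peak = asyncio.run(demo())
```

The semaphore approach keeps the code short, at the cost of creating all wrapper tasks up front; a class such as `AsyncioPool` can instead start tasks lazily as slots free up.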
from gensim.models import KeyedVectors
# Load gensim word2vec
w2v_path = '<Gensim File Path>'
w2v = KeyedVectors.load_word2vec_format(w2v_path)
import io
# Vector file: `\t` separates the vector values and `\n` separates the words
"""
import tensorflow as tf
import numpy as np
corpus_raw = 'He is the king . The king is royal . She is the royal queen '
# convert to lower case
corpus_raw = corpus_raw.lower()
words = []
for word in corpus_raw.split():
@mdespriee
mdespriee / LDAIncrementalExample.scala
Created June 29, 2017 19:13
Example of how to build LDA incrementally in Spark, with comparison to one-shot learning.
// This code is related to PR https://github.com/apache/spark/pull/17461
// I show how to use the setInitialModel() param of LDA to build a model incrementally,
// and I compare the performance (perplexity) with a model built in one-shot
import scala.collection.mutable
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.clustering.{LDA, LDAModel}
@codspire
codspire / making-zeppelin-work-on-windows.md
Last active December 2, 2021 03:54
Making Zeppelin, Spark, pyspark work on Windows

Zeppelin, Spark, PySpark Setup on Windows (10)

I wish running Zeppelin on Windows wasn't as hard as it is. Things go haywire if you already have Spark installed on your computer. Zeppelin's embedded Spark interpreter does not work well with an existing Spark installation, and you may need to perform the steps below (hacks!) to make it work. I am hoping that these will be fixed in newer Zeppelin versions.

If you try to run Zeppelin after extracting the package, you might encounter "The filename, directory name, or volume label syntax is incorrect."

A Google search led me to https://issues.apache.org/jira/browse/ZEPPELIN-1584. This link was helpful but wasn't enough to get Zeppelin working.

Below is what I had to do to make it work on my Windows 10 computer.

source ~/.vimrc
set visualbell
set noerrorbells
set surround
set relativenumber
" disable mappings from .vimrc
inoremap <C-U> <C-U>
inoremap <CR> <CR>
@thehesiod
thehesiod / async_worker_pool.py
Last active September 26, 2024 03:14
Asynchronous Worker Pool, allows for limiting number of concurrent tasks
import asyncio
from datetime import datetime, timezone
import os
def utc_now():
    # datetime.now(timezone.utc) returns a timezone-aware datetime directly;
    # datetime.utcnow() returns a naive one and is deprecated since Python 3.12
    return datetime.now(timezone.utc)
class Terminator:
    pass
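The preview ends before the pool itself is shown, so how `Terminator` is used there is not visible. One common pattern, sketched below purely as an assumption, is a fixed set of workers pulling from an `asyncio.Queue`, with a sentinel class telling each worker to exit. The names `Stop`, `worker`, `run_pool`, and `square` are hypothetical and not taken from the gist.

```python
import asyncio


class Stop:
    """Sentinel placed on the queue to tell a worker to exit (hypothetical)."""


async def worker(queue, results):
    while True:
        item = await queue.get()
        if item is Stop:
            break
        # Each worker awaits one coroutine at a time, so the number of
        # workers is the concurrency limit.
        results.append(await item)


async def run_pool(coros, num_workers):
    queue = asyncio.Queue()
    results = []
    workers = [
        asyncio.create_task(worker(queue, results)) for _ in range(num_workers)
    ]
    for coro in coros:
        queue.put_nowait(coro)
    # One sentinel per worker, queued after all the real work.
    for _ in workers:
        queue.put_nowait(Stop)
    await asyncio.gather(*workers)
    return results


async def square(n):
    await asyncio.sleep(0.01)
    return n * n


pool_results = asyncio.run(run_pool([square(n) for n in range(5)], num_workers=2))
```

Results arrive in completion order rather than submission order, so callers that need ordering should carry an index alongside each item.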