Skip to content

Instantly share code, notes, and snippets.

@veekaybee
veekaybee / normcore-llm.md
Last active March 16, 2026 11:27
Normcore LLM Reads

Anti-hype LLM reading list

Goals: Add links that are reasonable and good explanations of how stuff works. No hype and no vendor content if possible. Practical first-hand accounts of models in prod eagerly sought.

Foundational Concepts

Screenshot 2023-12-18 at 10 40 27 PM

Pre-Transformer Models

@j-adamczyk
j-adamczyk / knn_with_faiss.py
Created September 12, 2020 12:14
k nearest neighbors classifier with faiss library
import numpy as np
import faiss
class FaissKNeighbors:
def __init__(self, k=5):
self.index = None
self.y = None
self.k = k
@koreyou
koreyou / bm25.py
Created November 1, 2019 05:26
Implementation of OKapi BM25 with sklearn's TfidfVectorizer
""" Implementation of OKapi BM25 with sklearn's TfidfVectorizer
Distributed as CC-0 (https://creativecommons.org/publicdomain/zero/1.0/)
"""
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy import sparse
class BM25(object):
@ines
ines / streamlit_prodigy.py
Created October 3, 2019 20:37
Streamlit + Prodigy
"""
Example of a Streamlit app for an interactive Prodigy dataset viewer that also lets you
run simple training experiments for NER and text classification.
Requires the Prodigy annotation tool to be installed: https://prodi.gy
See here for details on Streamlit: https://streamlit.io.
"""
import streamlit as st
from prodigy.components.db import connect
from prodigy.models.ner import EntityRecognizer, merge_spans, guess_batch_size
@ruxi
ruxi / jointplot_w_hue.py
Last active March 13, 2021 23:39
jointplot_w_hue
__author__ = "lewis.r.liu@gmail.com"
__copyright__ = "Copyright 2020, 2018, https://gist.github.com/ruxi/ff0e9255d74a3c187667627214e1f5fa"
__license__ = "MIT"
__version__ = "0.0.2"
# update: June 13, 2020
# created: Feb 19, 2018
# desc: seaborn jointplot with 'hue'
# prepared for issue: https://github.com/mwaskom/seaborn/issues/365
# resolved (22 Aug 2020): https://github.com/mwaskom/seaborn/pull/2210
# Taken from https://forums.aws.amazon.com/thread.jspa?messageID=332091
sudo su -
cd /usr/local/bin
mkdir ffmpeg
cd ffmpeg
wget http://ffmpeg.gusari.org/static/64bit/ffmpeg.static.64bit.latest.tar.gz
tar -xzf ffmpeg.static.64bit.latest.tar.gz
@josephmisiti
josephmisiti / helloevolve.py
Created November 11, 2016 18:19
helloevolve.py - a simple genetic algorithm in Python
"""
helloevolve.py implements a genetic algorithm that starts with a base
population of randomly generated strings, iterates over a certain number of
generations while implementing 'natural selection', and prints out the most fit
string.
The parameters of the simulation can be changed by modifying one of the many
global variables. To change the "most fit" string, modify OPTIMAL. POP_SIZE
controls the size of each generation, and GENERATIONS is the amount of
generations that the simulation will loop through before returning the fittest
@joshlk
joshlk / faster_toPandas.py
Last active September 19, 2025 16:11
PySpark faster toPandas using mapPartitions
import pandas as pd
def _map_to_pandas(rdds):
""" Needs to be here due to pickling issues """
return [pd.DataFrame(list(rdds))]
def toPandas(df, n_partitions=None):
"""
Returns the contents of `df` as a local `pandas.DataFrame` in a speedy fashion. The DataFrame is
repartitioned if `n_partitions` is passed.
@baraldilorenzo
baraldilorenzo / readme.md
Last active September 13, 2025 12:17
VGG-16 pre-trained model for Keras

##VGG16 model for Keras

This is the Keras model of the 16-layer network used by the VGG team in the ILSVRC-2014 competition.

It has been obtained by directly converting the Caffe model provived by the authors.

Details about the network architecture can be found in the following arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan, A. Zisserman

@brentp
brentp / betareg.py
Last active September 17, 2024 14:48
beta regression in statsmodels
# -*- coding: utf-8 -*-
u"""
Beta regression for modeling rates and proportions.
References
----------
Grün, Bettina, Ioannis Kosmidis, and Achim Zeileis. Extended beta regression
in R: Shaken, stirred, mixed, and partitioned. No. 2011-22. Working Papers in
Economics and Statistics, 2011.