@jimsrc
jimsrc / gpt4_text_compression.md
Last active July 23, 2025 20:37
Minimizing token usage when interacting with GPT-4.

Overview

I just read about this text-compression trick for saving tokens in subsequent interactions during a long conversation, or when passing in a long text to summarize.

SHORT VERSION:

Before passing a long text to GPT-4, it's useful to give it a mapping that assigns short codes to common words (or phrases) in that text, and then pass the text encoded with that mapping. The idea is that the encoded version contains fewer tokens than the original text. There are several techniques for identifying frequent or salient words and phrases in a text, such as named-entity recognition (NER), TF-IDF, and part-of-speech (POS) tagging.
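A minimal Python sketch of the mapping-and-encoding step. It assumes plain word frequency as a stand-in for TF-IDF, NER or POS-based selection, and a hypothetical `§N` code format; `build_mapping`, `encode` and `decode` are illustrative names, not taken from the original gist. The sketch only shortens the text in characters. Whether it also shortens it in tokens has to be checked with the target model's tokenizer.

```python
import re
from collections import Counter

# Hypothetical code format: §0, §1, ... (assumes "§" does not occur in the input).
CODE_PATTERN = re.compile(r"§\d+")


def build_mapping(text, top_n=5, min_len=6):
    """Map the top_n most frequent words (at least min_len letters) to short codes.

    Raw word frequency stands in here for TF-IDF, NER or POS-based selection.
    """
    words = re.findall(r"[A-Za-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_len)
    return {word: f"§{i}" for i, (word, _) in enumerate(counts.most_common(top_n))}


def encode(text, mapping):
    # Replace whole-word occurrences (case-insensitively) with their codes.
    for word, code in mapping.items():
        text = re.sub(rf"\b{re.escape(word)}\b", code, text, flags=re.IGNORECASE)
    return text


def decode(text, mapping):
    # Invert the mapping; matching the full "§<digits>" run keeps §1 from
    # clobbering the start of §10.
    reverse = {code: word for word, code in mapping.items()}
    return CODE_PATTERN.sub(lambda m: reverse.get(m.group(0), m.group(0)), text)
```

In use, the mapping would be sent once (for example as lines like `§0=compression`), followed by `encode(text, mapping)`. Because matching is case-insensitive, `decode` restores the mapped words in lowercase.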

@VictorTaelin
VictorTaelin / gpt4_abbreviations.md
Last active December 13, 2025 10:50
Notes on the GPT-4 abbreviations tweet

Notes on this tweet.

  • The screenshots were taken in different sessions.

  • The entire sessions are shown in the screenshots.

  • I lost the original prompts, so I had to reconstruct them, and still managed to reproduce.

  • The "compressed" version is actually longer! Emojis and abbreviations use more tokens than common words.
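One way to check a compression scheme like this is to compare the token count of the original text against the tokens of the mapping (which also has to be sent) plus the encoded text. Below is a minimal sketch with a hypothetical `net_token_savings` helper that takes the token counter as a parameter. The `rough_count` placeholder only counts whitespace-separated chunks and is not how GPT-4 tokenizes text. For a real measurement, pass in the model's actual tokenizer instead (for OpenAI models, for example, the `tiktoken` package).

```python
from typing import Callable, Dict


def net_token_savings(original: str, encoded: str, mapping: Dict[str, str],
                      count_tokens: Callable[[str], int]) -> int:
    """Tokens saved by sending (legend + encoded text) instead of the original.

    A negative result means the "compressed" prompt is actually more expensive.
    """
    legend = "\n".join(f"{code}={phrase}" for phrase, code in mapping.items())
    return count_tokens(original) - (count_tokens(legend) + count_tokens(encoded))


def rough_count(text: str) -> int:
    # Placeholder counter: whitespace-separated chunks. Real BPE tokenizers
    # differ, and emoji or unusual symbols often cost several tokens each.
    return len(text.split())
```

Under `rough_count`, single-word substitutions can never come out ahead, because each code is still one chunk and the legend adds more. Multi-word phrases can help. A real tokenizer may shift the result in either direction.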

How to set up a practically free CDN using Backblaze B2 and Cloudflare

⚠️ Note 2023-01-21
Some things have changed since I originally wrote this in 2016. I have updated a few minor details, and the advice is still broadly the same, but there are some new Cloudflare features you can (and should) take advantage of. In particular, pay attention to Trevor Stevens' comment here from 22 January 2022, and Matt Stenson's useful caching advice. In addition, Backblaze, with whom Cloudflare are a Bandwidth Alliance partner, have published their own guide detailing how to use Cloudflare Workers to cache content from B2 private buckets. That is worth reading,

@anshoomehra
anshoomehra / parsing10k.ipynb
Last active November 25, 2025 00:38
How to Parse 10-K Report from EDGAR (SEC)
@W4ngatang
W4ngatang / download_glue_data.py
Last active October 21, 2025 02:22
Script for downloading data of the GLUE benchmark (gluebenchmark.com)
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external tool such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
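The MRPC extraction steps described in the docstring can also be driven from Python. Below is a minimal sketch, assuming `cabextract` is installed and on `PATH`. `cabextract_command` and `extract_mrpc` are hypothetical helpers, not part of `download_glue_data.py`. The later rename-and-place step is not covered, because its details are not shown here.

```python
import shutil
import subprocess
from pathlib import Path


def cabextract_command(msi_path, out_dir):
    """Build the same command as `cabextract MSRParaphraseCorpus.msi -d MRPC`."""
    return ["cabextract", str(msi_path), "-d", str(out_dir)]


def extract_mrpc(msi_path="MSRParaphraseCorpus.msi", out_dir="MRPC"):
    # Equivalent of `mkdir MRPC`; exist_ok avoids failing on reruns.
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    if shutil.which("cabextract") is None:
        raise RuntimeError("cabextract not found on PATH; install it first")
    # check=True raises CalledProcessError if extraction fails.
    subprocess.run(cabextract_command(msi_path, out_dir), check=True)
```

Building the command as a list (rather than a shell string) avoids quoting problems if the paths contain spaces.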
@GuilloOme
GuilloOme / background.js
Last active March 24, 2023 20:05
Puppeteer (v.0.12.0) navigation blocking workaround
(function() {
  'use strict';
  // keep track of all the opened tabs
  let tabs = {};
  // Get all existing tabs
  chrome.tabs.query({}, function(results) {
    results.forEach(function(tab) {
      tabs[tab.id] = tab;
@aparrish
aparrish / spacy_intro.ipynb
Last active April 30, 2026 08:59
NLP Concepts with spaCy. Code examples released under CC0 https://creativecommons.org/choose/zero/, other text released under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/