ezgz jibay

Overview

I just read this trick for text compression, in order to save tokens in subbsequent interactions during a long conversation, or in a subsequent long text to summarize.

SHORT VERSION:

It's useful to give a mapping between common words (or phrases) in a given long text that one intends to pass later. Then pass that long text to gpt-4 but encoded with such mapping. The idea is that the encoded version contains less tokens than the original text. There are several algorithms to identify frequent words or phrases inside a given text, such as NER, TF-IDF, part-of-speech (POS) tagging, etc.

Notes on this tweet.

The screenshots were taken on different sessions.
The entire sessions are included on the screenshots.
I lost the original prompts, so I had to reconstruct them, and still managed to reproduce.
The "compressed" version is actually longer! Emojis and abbreviations use more tokens than common words.

How to setup a practically free CDN using Backblaze B2 and Cloudflare

⚠️ Note 2023-01-21
Some things have changed since I originally wrote this in 2016. I have updated a few minor details, and the advice is still broadly the same, but there are some new Cloudflare features you can (and should) take advantage of. In particular, pay attention to Trevor Stevens' comment here from 22 January 2022, and Matt Stenson's useful caching advice. In addition, Backblaze, with whom Cloudflare are a Bandwidth Alliance partner, have published their own guide detailing how to use Cloudflare's Web Workers to cache content from B2 private buckets. That is worth reading,

	''' Script for downloading all GLUE data.

	Note: for legal reasons, we are unable to host MRPC.
	You can either use the version hosted by the SentEval team, which is already tokenized,
	or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
	For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
	You should then rename and place specific files in a folder (see below for an example).

	mkdir MRPC
	cabextract MSRParaphraseCorpus.msi -d MRPC

	(function() {
	'use strict';

	// keep track of all the opened tab
	let tabs = {};

	// Get all existing tabs
	chrome.tabs.query({}, function(results) {
	results.forEach(function(tab) {
	tabs[tab.id] = tab;