@jcrist
jcrist / bench.py
Last active January 28, 2024 11:59
Vaex String benchmarks, updated with dask fixes
import vaex
import numpy as np
import dask.dataframe as dd
import dask
import dask.distributed
import json
import os
import time
import argparse
import multiprocessing
@chrisdpa-tvx
chrisdpa-tvx / athena.rst
Last active March 20, 2022 06:28
Create an Athena database, table, and query

All Your Data Does Not Belong In a Database

Businesses are machines producing mountains of data about sales, usage, customers, costs, etc. Traditionally, data processing is highly centralised, with teams of staff and computers running hot and whirring, ready to process. We can do better than moving the mountain of data into the corporate data machine, so long as that machinery is light enough to be moved to the data.

Don't move the mountain - Bring the processing to the data

We've had this problem: a huge directory of files in CSV format, containing vital information for our business. But it's in CSV, it requires analysis, and you don't feel like learning sed/grep/awk today. Besides, it's 2017 and no one thinks those tools are easy to use.

@balupton
balupton / README.md
Created March 27, 2014 05:29
Remove script for Gmail that deletes all email threads/messages that match a search, for when Gmail can't do it itself

Remove script for Gmail

function Intialize() {
  return;
}

function Install() {
  ScriptApp.newTrigger("purgeGmail")
    .timeBased().everyMinutes(10).create();