vishwambhar / latency.txt
Created September 25, 2016 06:15 — forked from jboner/latency.txt
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers
--------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                 14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us     ~1GB/sec SSD
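
The ratio annotations in the table (for example "14x L1 cache") are plain division of one latency by another. A minimal Python sketch of that arithmetic follows; the dictionary name `LATENCY_NS` and the helper `relative_to` are illustrative names, not part of the original gist, and the values are copied from the table above.

```python
# Latencies in nanoseconds, copied from the table above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 1K bytes over 1 Gbps network": 10_000,
    "Read 4K randomly from SSD": 150_000,
}


def relative_to(baseline):
    """Return every latency as a multiple of the named baseline entry."""
    base = LATENCY_NS[baseline]
    return {name: ns / base for name, ns in LATENCY_NS.items()}


if __name__ == "__main__":
    # Print each operation as a multiple of an L1 cache reference.
    for name, factor in relative_to("L1 cache reference").items():
        print(f"{name:<36} {factor:>10,.0f}x L1")
```

Passing a different key, such as `relative_to("L2 cache reference")`, gives the same comparison against another baseline.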
import pandas as pd
from ggplot import *
from sklearn.datasets import fetch_20newsgroups
from sklearn.metrics import roc_curve
# vectorizer
from sklearn.feature_extraction.text import HashingVectorizer
# our classifiers
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
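#!/usr/bin/env ruby
# Print HDFS entries under /tmp that were last modified more than five days ago.
require "date"

five_days_ago = Date.today - 5
IO.popen("hadoop fs -lsr /tmp").each_line do |line|
  permissions, replication, user, group, size, mod_date, mod_time, path = line.split(/\s+/)
  next unless mod_date # skip lines that have no modification-date column
  if Date.parse(mod_date.to_s) < five_days_ago
    puts line
    if permissions[0] == 'd'
      # directory entry (handling not shown in the source)
    end
  end
end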
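
The date filter in the Ruby script can be tried without a Hadoop cluster by running the same column split on sample text. The Python sketch below assumes the eight-column layout the script destructures (permissions, replication, user, group, size, date, time, path); the function names and the two sample lines are hypothetical and made up for illustration.

```python
from datetime import date, datetime, timedelta


def older_than(line, cutoff):
    """True if the listing line's modification date is before the cutoff."""
    fields = line.split()
    if len(fields) < 8:
        return False  # header or malformed line
    try:
        modified = datetime.strptime(fields[5], "%Y-%m-%d").date()
    except ValueError:
        return False  # sixth column is not a date
    return modified < cutoff


def is_directory(line):
    """Directory entries start with 'd' in the permissions column."""
    return line.startswith("d")


# Hypothetical listing lines, not real cluster output.
sample = [
    "drwxr-xr-x   - hdfs supergroup          0 2016-09-01 10:00 /tmp/old_dir",
    "-rw-r--r--   3 hdfs supergroup       1024 2016-09-24 18:30 /tmp/new_file",
]

# A fixed reference date keeps the example reproducible.
cutoff = date(2016, 9, 25) - timedelta(days=5)
stale = [entry for entry in sample if older_than(entry, cutoff)]

if __name__ == "__main__":
    for entry in stale:
        kind = "dir " if is_directory(entry) else "file"
        print(kind, entry.split()[-1])
```

In real use the cutoff would come from `date.today()`, mirroring the Ruby script.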
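# Gzip the first 100 lines of an HDFS file into a local file.
hadoop fs -cat /Work/lon_text/lon_order_data_t/cdw320_lon_order_data_t.1.txt | head -n 100 | gzip > test.csv.gz
# Gzip the first 100 lines of a local copy into ../../tsnyder/.
head -n 100 cdw320_lon_order_data_t.1.txt | gzip > ../../tsnyder/cdw320_lon_order_data_t.1.txt.gz
# Decompress a gzipped file stored on HDFS to standard output.
hadoop fs -cat /Work/tsnyder/cdw320_lon_order_data_t.1.txt.gz | gunzip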