@bhtucker
bhtucker / hogwild_logistic_reg.py
Last active April 23, 2019 22:51
Use a Hogwild-inspired algorithm to learn logistic regression over a sample dataset in parallel
"""
Demo of using the Hogwild algorithm for parallel learning with shared memory
Uses sklearn's LogisticRegression for accuracy comparison
Output
('initial accuracy:', 0.45333333333333331)
worker 25974 score 0.93
worker 25975 score 0.92
worker 25976 score 0.88
worker 25974 score 0.94
@karpathy
karpathy / pg-pong.py
Created May 30, 2016 22:50
Training a Neural Network ATARI Pong agent with Policy Gradients from raw pixels
""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
import numpy as np
import cPickle as pickle
import gym
# hyperparameters
H = 200 # number of hidden layer neurons
batch_size = 10 # every how many episodes to do a param update?
learning_rate = 1e-4
gamma = 0.99 # discount factor for reward
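The `gamma` hyperparameter above is described as the discount factor for reward. As a minimal sketch of what such a factor does in a policy-gradient setting, the helper below computes the discounted return for every step of an episode. The function name `discount_rewards` and the example reward list are illustrative assumptions; this is a generic formulation, not code taken from the truncated listing.

```python
import numpy as np

gamma = 0.99  # discount factor for reward, matching the hyperparameter above


def discount_rewards(rewards, gamma=gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step t (hypothetical helper)."""
    rewards = np.asarray(rewards, dtype=np.float64)
    discounted = np.zeros_like(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted


# A reward of 1 at the final step is worth gamma**2 two steps earlier.
print(discount_rewards([0.0, 0.0, 1.0]))
```

Steps closer to the reward receive more credit, which is what lets a sparse end-of-game signal shape earlier actions.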
@vasanthk
vasanthk / System Design.md
Last active March 20, 2026 17:25
System Design Cheatsheet

Picking the right architecture = Picking the right battles + Managing trade-offs

Basic Steps

  1. Clarify and agree on the scope of the system
  • Use cases (descriptions of sequences of events that, taken together, lead to a system doing something useful)
    • Who is going to use it?
    • How are they going to use it?
@mbastian
mbastian / writepaldbstore.scala
Created October 26, 2015 16:33
writepaldbstore.scala
val writer: StoreWriter = PalDB.createWriter(new File("store.paldb"));
writer.put("foo", "bar");
writer.put(1213, Array(1, 2, 3));
writer.close();
# GUIDE
# https://spark.apache.org/docs/latest/sql-programming-guide.html#overview
###
## HOW TO START PYSPARK CONSOLE (copy and paste into terminal)
###
/opt/spark-1.3.1-bin-hadoop2.4/bin/pyspark --master yarn-client --num-executors 34 --driver-memory 4g --executor-memory 4g --conf spark.yarn.executor.memoryOverhead=2000 --conf spark.shuffle.spill=true --conf spark.shuffle.memoryFraction=.6 --conf spark.storage.memoryFraction=.6
###
###
Q: what book should i use to learn ML?
A: use several, and find the one that speaks to you.
the list below assumes you know a bit of math but
are not very mathematical, and are interested in learning
enough to be practical. that is, it is not at the
mathematical level of MIJ's alleged list
(cf. https://news.ycombinator.com/item?id=1055389 )
@azymnis
azymnis / KMeansJob.scala
Created October 23, 2014 23:07
K-Means in scalding
import com.twitter.algebird.{Aggregator, Semigroup}
import com.twitter.scalding._
import scala.util.Random
/**
* This job is a tutorial of sorts for scalding's Execution[T] abstraction.
* It is a simple implementation of Lloyd's algorithm for k-means on 2D data.
*
* http://en.wikipedia.org/wiki/K-means_clustering
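For readers who want to see Lloyd's algorithm itself before the Scalding plumbing, here is a minimal single-machine sketch in Python with NumPy rather than Scalding. The function name `lloyd_kmeans`, the random-point initialisation, and the fixed iteration cap are assumptions made for illustration; they are not taken from the job above.

```python
import numpy as np


def lloyd_kmeans(points, k, iterations=20, seed=0):
    """Cluster an (n, 2) array with Lloyd's algorithm; returns (centroids, labels)."""
    points = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Assumed initialisation: pick k distinct input points as starting centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return centroids, labels


pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids, labels = lloyd_kmeans(pts, k=2)
print(centroids, labels)
```

The two steps inside the loop are the part a distributed version has to express as a repeated map (assign) and aggregate (average) over the data.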
@MishaelRosenthal
MishaelRosenthal / FCBF.scala
Last active September 25, 2018 11:24
Implements the Fast Correlation Based Filter algorithm for feature selection. Conference version: http://machinelearning.wustl.edu/mlpapers/paper_files/icml2003_YuL03.pdf Journal version: http://machinelearning.wustl.edu/mlpapers/paper_files/YuL04.pdf
package com.liveperson.lpbt.research.hadoop.examples
import scala.annotation.tailrec
/**
* User: mishaelr
* Date: 7/11/13
* Time: 10:33 AM
*/
object FCBF extends App{
@4np
4np / install-commit-hook.sh
Last active January 2, 2016 09:39
This gist installs a Git pre-commit hook that plays a 'push it' sample from the Salt-N-Pepa song… :P
#!/bin/sh
# get global git template dir
if [ -n "$(git config --global init.templatedir)" ]; then
TEMPLATE_DIR="$(git config --global init.templatedir)"
else
# there is no git template dir yet, set it
git config --global init.templatedir ~/.git
TEMPLATE_DIR=~/.git
fi
@azymnis
azymnis / ItemSimilarity.scala
Created December 13, 2013 05:17
Approximate item similarity using LSH in Scalding.
import com.twitter.scalding._
import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }
/**
* Computes similar items (with a string itemId), based on approximate
* Jaccard similarity, using LSH.
*
* Assumes an input data TSV file of the following format:
*
* itemId userId
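The idea of approximating Jaccard similarity with MinHash and LSH can be sketched on a single machine in plain Python, independent of Scalding and Algebird. Everything below is an illustrative assumption: the function names, the use of salted `blake2b` as the hash family, and the banding parameters are not taken from the job above. Each item is represented by the non-empty set of user IDs that interacted with it.

```python
import hashlib


def minhash_signature(users, num_hashes=128):
    """Return the minimum hash of a non-empty set of strings under each seeded hash."""
    signature = []
    for seed in range(num_hashes):
        # Salting blake2b with the seed gives a different hash function per position.
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(u.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for u in users
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_buckets(signature, bands):
    """Split a signature into bands; items sharing any bucket are candidate pairs."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


item_a = {"user%d" % i for i in range(100)}
item_b = {"user%d" % i for i in range(50, 150)}  # true Jaccard is 50/150
sig_a, sig_b = minhash_signature(item_a), minhash_signature(item_b)
print(estimated_jaccard(sig_a, sig_b))
print(len(set(lsh_buckets(sig_a, 32)) & set(lsh_buckets(sig_b, 32))))
```

Banding is what avoids comparing every pair of items: only items that collide in at least one bucket need their signatures compared.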