Evan Casey evancasey

System Design Cheatsheet

Picking the right architecture = Picking the right battles + Managing trade-offs

Use cases (descriptions of sequences of events that, taken together, lead to a system doing something useful)
- Who is going to use it?
- How are they going to use it?

	"""
	Demo of using the Hogwild algorithm for parallel learning with shared memory
	Uses sklearn's LogisticRegression for accuracy comparison

	Output
	('initial accuracy:', 0.45333333333333331)
	worker 25974 score 0.93
	worker 25975 score 0.92
	worker 25976 score 0.88
	worker 25974 score 0.94
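The idea behind the demo above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the gist's code: it swaps in a least-squares objective for logistic regression and uses Python threads instead of separate processes with shared memory. The names `hogwild_sgd` and `mse` and the synthetic data are hypothetical.

```python
import random
import threading


def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    return sum(
        (sum(wj * xj for wj, xj in zip(w, xi)) - yi) ** 2 for xi, yi in zip(X, y)
    ) / len(X)


def hogwild_sgd(X, y, n_workers=4, epochs=5, lr=0.01, seed=0):
    """Lock-free SGD: every worker updates the same weight list in place."""
    w = [0.0] * len(X[0])  # shared weights, deliberately not protected by a lock

    def worker(worker_id):
        rng = random.Random(seed + worker_id)
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                # Residual computed from possibly stale weights
                err = sum(wj * xj for wj, xj in zip(w, X[i])) - y[i]
                for j, xj in enumerate(X[i]):
                    w[j] -= lr * err * xj  # unsynchronized per-coordinate update

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w


# Synthetic, noise-free regression data (illustrative only)
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
w_true = [1.0, -2.0, 0.5]
y = [sum(wj * xj for wj, xj in zip(w_true, xi)) for xi in X]
w_hat = hogwild_sgd(X, y)
```

Because of the interpreter lock, threads here show the unsynchronized update pattern rather than a real speedup. A process-based version with a shared array would be closer to what the demo's output suggests.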

	""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """
	import numpy as np
	import cPickle as pickle
	import gym

	# hyperparameters
	H = 200 # number of hidden layer neurons
	batch_size = 10 # every how many episodes to do a param update?
	learning_rate = 1e-4
	gamma = 0.99 # discount factor for reward
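As a rough illustration of what the discount factor `gamma` does, the sketch below computes discounted returns for one episode. It assumes the standard definition (each step's return is its reward plus `gamma` times the next step's return). The function name `discounted_returns` is hypothetical, and the script itself may handle episode boundaries differently.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

With `gamma` close to 1, a reward at the end of a long episode still credits early actions. Smaller values concentrate credit on the most recent actions.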

	val writer: StoreWriter = PalDB.createWriter(new File("store.paldb"));
	writer.put("foo", "bar");
	writer.put(1213, Array(1, 2, 3));
	writer.close();

	# GUIDE
	# https://spark.apache.org/docs/latest/sql-programming-guide.html#overview

	###
	## HOW TO START PYSPARK CONSOLE (copy and paste into terminal)
	###
	/opt/spark-1.3.1-bin-hadoop2.4/bin/pyspark --master yarn-client --num-executors 34 --executor-memory 4g --driver-memory 4g --conf spark.yarn.executor.memoryOverhead=2000 --conf spark.shuffle.spill=true --conf spark.shuffle.memoryFraction=.6 --conf spark.storage.memoryFraction=.6
	###

	###

	Q: what book should i use to learn ML?

	A: use several, and find the one that speaks to you.

	the list below assumes you know a bit of math but
	are not very mathematical, and are interested in learning
	enough to be practical. that is, it is not at the
	mathematical level of MIJ's alleged list
	(cf. https://news.ycombinator.com/item?id=1055389 )

	import com.twitter.algebird.{Aggregator, Semigroup}
	import com.twitter.scalding._

	import scala.util.Random

	/**
	* This job is a tutorial of sorts for scalding's Execution[T] abstraction.
	* It is a simple implementation of Lloyd's algorithm for k-means on 2D data.
	*
	* http://en.wikipedia.org/wiki/K-means_clustering
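For orientation, Lloyd's algorithm on 2D points can be sketched without any framework. This is a plain-Python illustration under stated assumptions, not the Scalding `Execution[T]` job: it takes the initial centroids as an argument and runs a fixed number of iterations. The name `lloyd_kmeans` is hypothetical.

```python
def lloyd_kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and recomputing means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        # Assignment step: attach each point to its nearest centroid
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda k: (p[0] - centroids[k][0]) ** 2
                + (p[1] - centroids[k][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
    return centroids
```

A distributed version has the same two steps: the assignment is a map over points and the update is a per-cluster aggregation.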

	package com.liveperson.lpbt.research.hadoop.examples

	import scala.annotation.tailrec

	/**
	* User: mishaelr
	* Date: 7/11/13
	* Time: 10:33 AM
	*/
	object FCBF extends App{

	#!/bin/sh

	# get global git template dir
	# quote the command substitution so an empty or space-containing path is handled
	if [ -n "$(git config --global init.templatedir)" ]; then
		TEMPLATE_DIR="$(git config --global init.templatedir)"
	else
		# there is no git template dir yet, set it
		git config --global init.templatedir ~/.git
		TEMPLATE_DIR=~/.git
	fi

	import com.twitter.scalding._
	import com.twitter.algebird.{ MinHasher, MinHasher32, MinHashSignature }

	/**
	* Computes similar items (with a string itemId), based on approximate
	* Jaccard similarity, using LSH.
	*
	* Assumes an input data TSV file of the following format:
	*
	* itemId userId
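The approximation this job relies on can be illustrated with a small sketch: the fraction of positions where two MinHash signatures agree estimates the Jaccard similarity of the underlying user sets. This is a plain-Python illustration, not Algebird's `MinHasher`. It uses seeded MD5 as a stand-in hash family and leaves out the LSH banding step that groups candidate pairs. All function names are hypothetical.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash over all items."""
    return [
        min(
            int.from_bytes(hashlib.md5(f"{seed}:{item}".encode()).digest()[:8], "big")
            for item in items
        )
        for seed in range(num_hashes)
    ]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two signatures agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


# Users who interacted with two hypothetical items
users_a = {"u1", "u2", "u3"}
users_b = {"u2", "u3", "u4"}
estimate = estimated_jaccard(minhash_signature(users_a), minhash_signature(users_b))
```

More hash functions tighten the estimate at the cost of longer signatures. LSH then splits each signature into bands so that only items sharing a band are compared.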