Skip to content

Instantly share code, notes, and snippets.

REGISTER '../udfs/jython/actions_taken.py' USING jython AS actions_taken;
REGISTER '../udfs/python/actions_taken.py' USING streaming_python AS actions_taken1;
raw = load '$OUTPUT_PATH/extract-actions-taken'
using PigStorage()
as (
user_id:chararray,
visitor_id:chararray,
client_id:chararray,
last_modified:chararray,
set mongo.input.query {"date":{"\$gt":{"\$date":$MAX_DATE}}}
set mongo.input.split.create_input_splits false
actions_taken =
LOAD '$BUFFER_METRICS_MONGO_URI.event.seamless.actions_taken'
USING com.mongodb.hadoop.pig.MongoLoader(
'user_id:chararray,
visitor_id:chararray,
client_id:chararray,
last_modified:chararray,
# You need to install scikit-learn:
# sudo pip install scikit-learn
#
# Dataset: Polarity dataset v2.0
# http://www.cs.cornell.edu/people/pabo/movie-review-data/
#
# Full discussion:
# https://marcobonzanini.wordpress.com/2015/01/19/sentiment-analysis-with-python-and-scikit-learn
  1. General Background and Overview
import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.JobStatus.State;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
#data from http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time
#Ask for these fields
#"DAY_OF_WEEK" (IN UI DayOfWeek)
#"FL_DATE" (FlightDate)
#"CARRIER" (Carrier)
#"ORIGIN_CITY_MARKET_ID" (OriginCityMarketID)
#"ORIGIN" (Origin)
#"CRS_DEP_TIME" (CRSDepTime)
#"DEP_DELAY" (DepDelay)
#"ARR_DELAY" (ArrDelay)
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.mllib.linalg._
import org.apache.spark.{SparkConf, SparkContext}
// To use the latest sparse SVD implementation, please build your spark-assembly after this
// change: https://github.com/apache/spark/pull/1378
// Input tsv with 3 fields: rowIndex(Long), columnIndex(Long), weight(Double), indices start with 0
// Assume the number of rows is larger than the number of columns, and the number of columns is
// smaller than Int.MaxValue
$ ./bin/spark-shell
14/04/18 15:23:49 INFO spark.HttpServer: Starting HTTP Server
14/04/18 15:23:49 INFO server.Server: jetty-7.x.y-SNAPSHOT
14/04/18 15:23:49 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:49861
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 0.9.1
/_/
# data can be found at https://data.sfgov.org/api/views/tmnf-yvry/rows.csv?accessType=DOWNLOAD
# or https://data.sfgov.org/Public-Safety/SFPD-Incidents-Previous-Three-Months/tmnf-yvry
import time
import matplotlib.colors as colors
import matplotlib.cm as cmx
from matplotlib import pyplot as plt
from matplotlib.patches import Patch
import numpy as np
import pandas
-- For every order, attach the customer's cohort date: the customer's
-- earliest transaction date, computed once per customer in the derived
-- table `c`. Each order row therefore carries the date its customer
-- first appeared, which downstream analysis can bucket cohorts by.
SELECT o.customerid,
       o.transactiondate,
       o.transactionamount,
       c.cohortdate
FROM   orders AS o
       INNER JOIN (SELECT customerid,
                          Min(transactiondate) AS cohortDate
                   FROM   orders
                   GROUP  BY customerid) AS c
               ON o.customerid = c.customerid;