@FlatMapIO
Forked from debasishg/gist:8172796
Last active January 8, 2016 17:18
1. HyperLogLog and MinHash : An implementation of a form of HyperLogLog extended with the MinHash algorithm, which makes it possible to estimate set intersections. "While it does require extra processing power to deal with collecting all the minima, it’s possible to get satisfactory performance out of the structure for a relatively low storage or memory footprint" (http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html)
2. Streaming/Sketching Conference from AK Tech : Contains links to videos and slides from speakers such as Muthukrishnan, who spoke about Count-Min Sketch (http://blog.aggregateknowledge.com/2013/05/23/foundation-capital-and-aggregate-knowledge-sponsor-streamingsketching-conference/)
3. Medians and Beyond: New Aggregation Techniques for Sensor Networks : The paper that introduced q-digest for range queries and quantile approximation (http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf)
4. Blog posts on Q-Digest, plus an alternative approach
a) (http://papercruncher.com/2011/07/31/q-digest/)
b) (http://www.prelert.com/blog/q-digest-an-algorithm-for-computing-approximate-quantiles-on-a-collection-of-integers/)
c) The Art of Approximating Distributions: Histograms and Quantiles at Scale - an alternative approach to q-digest (http://metamarkets.com/2013/histograms/#)
5. t-digest : A new data structure for accurate on-line accumulation of rank-based statistics such as quantiles and trimmed means. Ted Dunning's variant of Q-digest, with several improvements (https://github.com/tdunning/t-digest)
6. stream-lib : A collection of stream summarization and cardinality estimation algorithms such as Count-Min Sketch, HyperLogLog, and Bloom filters (https://github.com/addthis/stream-lib)
7. Count-Min Sketch
a) An Improved Data Stream Summary: The Count-Min Sketch and its Applications - Cormode & Muthukrishnan : The paper that introduced the Count-Min Sketch (http://dimacs.rutgers.edu/~graham/pubs/papers/cm-full.pdf)
b) FAQ on Count-Min Sketch that also highlights its differences from Bloom filters (https://sites.google.com/site/countminsketch/home/faq)
c) Count Min Sketch by Cormode : Introductory paper (http://dimacs.rutgers.edu/~graham/pubs/papers/cmencyc.pdf)
d) Streaming Algorithms and Sketches - Count Min Sketch on AK Tech Blog (http://blog.aggregateknowledge.com/2011/09/13/streaming-algorithms-and-sketches/)
e) Muthukrishnan talking on Count Min Sketch at AK Tech conference (http://www.youtube.com/watch?v=OOZC4KCErN0)
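
For readers who want a feel for the Count-Min Sketch before following the links in item 7, here is a minimal Python sketch. It is not taken from any of the linked sources: the class name CountMinSketch and its methods are made up for illustration, and salted BLAKE2b stands in for the pairwise-independent hash family the paper assumes. The sizing (width = ceil(e / epsilon), depth = ceil(ln(1 / delta))) follows the standard analysis by Cormode & Muthukrishnan.

```python
import hashlib
import math


class CountMinSketch:
    """Illustrative approximate frequency counter (hypothetical helper class)."""

    def __init__(self, epsilon=0.01, delta=0.01):
        # Standard sizing: the overestimate is at most epsilon * total
        # with probability at least 1 - delta.
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1.0 / delta))
        self.table = [[0] * self.width for _ in range(self.depth)]
        self.total = 0

    def _index(self, item, row):
        # Assumption: one hash per row, obtained by salting BLAKE2b with the
        # row number. The paper uses a pairwise-independent hash family.
        digest = hashlib.blake2b(
            str(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        # Increment one counter per row.
        self.total += count
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions only ever inflate a counter, so the minimum across rows
        # is the tightest available estimate and never undercounts.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


if __name__ == "__main__":
    sketch = CountMinSketch()
    for word in ["a", "b", "a", "c", "a"]:
        sketch.add(word)
    print(sketch.estimate("a"))  # at least 3, the true count
```

Unlike a Bloom filter, which only answers membership queries, each cell here holds a counter, so the structure answers "how many times?" at the cost of possible overestimation.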