@huojga
huojga / spark-intro.md
Created August 30, 2016 16:46 — forked from jaceklaskowski/spark-intro.md
Introduction to Apache Spark

Introducing Apache Spark

  • What use cases are a good fit for Apache Spark? How to work with Spark?
    • create RDDs, transform them, and execute actions to get the result of a computation
    • All computations happen in memory = "memory is cheap" (we do need enough memory to fit all the data in)
      • the fewer disk operations, the faster (you do know it, don't you?)
    • You develop such computation flows or pipelines using a programming language - Scala, Python or Java <-- that's where the ability to write code is paramount
    • Data is usually on a distributed file system like Hadoop HDFS or in NoSQL databases like Cassandra
    • Data mining = analysis / insights / analytics
  • log mining
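
The create / transform / act flow in the list above can be sketched without a cluster. What follows is a toy, single-machine stand-in written in plain Python, not Spark's API: the `ToyRDD` class and the sample log lines are hypothetical and exist only for illustration. The method names `map`, `filter`, `count` and `collect` were chosen to mirror the RDD operations of the same names. The point it tries to show is that transformations only describe work, and nothing runs until an action asks for a result.

```python
from typing import Callable, Iterable, Iterator, List


class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not Spark's API)."""

    def __init__(self, source: Callable[[], Iterator]):
        # Keep a recipe for producing the data, not the data itself.
        self._source = source

    @staticmethod
    def parallelize(items: Iterable) -> "ToyRDD":
        # "Create": wrap an in-memory collection.
        data = list(items)
        return ToyRDD(lambda: iter(data))

    # Transformations are lazy: each returns a new ToyRDD and runs nothing.
    def map(self, fn: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions run the whole pipeline and return a plain Python value.
    def collect(self) -> List:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


# Made-up log lines for a tiny log-mining pass.
log_lines = [
    "2016-08-30 INFO  started job",
    "2016-08-30 ERROR disk full on node-3",
    "2016-08-30 WARN  slow task",
    "2016-08-30 ERROR timeout talking to node-7",
]

# Create, then transform (still lazy at this point).
errors = ToyRDD.parallelize(log_lines).filter(lambda line: " ERROR " in line)
messages = errors.map(lambda line: line.split(" ERROR ", 1)[1])

# Actions trigger the computation.
error_count = errors.count()
error_messages = messages.collect()
```

In real Spark the same shape would run across a cluster and keep intermediate data in memory; this sketch only imitates the programming model.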
huojga / bash-cheatsheet.sh
Created August 3, 2016 20:11 — forked from LeCoupa/bash-cheatsheet.sh
Bash CheatSheet for UNIX Systems
#!/bin/bash
#####################################################
# Name: Bash CheatSheet for Mac OSX
#
# A little overview of the Bash basics
#
# Usage:
#
# Author: J. Le Coupanec
# Date: 2014/11/04
huojga / interview-questions.md
Created July 13, 2016 16:45 — forked from jvns/interview-questions.md
A list of questions you could ask while interviewing

A lot of these are outright stolen from Edward O'Campo-Gooding's list of questions. I really like his list.

I'm having some trouble paring this down to a manageable list of questions -- I realistically want to know all of these things before starting to work at a company, but it's a lot to ask all at once. My current game plan is to pick 6 before an interview and ask those.

I'd love comments and suggestions about any of these.

I've found questions like "do you have smart people? Can I learn a lot at your company?" to be basically totally useless -- everybody will say "yeah, definitely!" and it's hard to learn anything from them. So I'm trying to make all of these questions pretty concrete -- if a team doesn't have an issue tracker, they don't have an issue tracker.

I'm also mostly not asking about principles, but the way things are -- not "do you think code review is important?", but "Does all code get reviewed?".