- What use cases are a good fit for Apache Spark? How do you work with Spark?
- Create RDDs, transform them, and execute actions to get the result of a computation
- Computations run in memory wherever possible = "memory is cheap" (we do need enough memory to fit the data in)
- The fewer disk operations, the faster (you do know it, don't you?)
- You develop such computation flows or pipelines using a programming language - Scala, Python or Java <-- that's where the ability to write code is paramount
- Data usually lives on a distributed file system like Hadoop HDFS or in a NoSQL database like Cassandra
- Data mining = analysis / insights / analytics
- log mining
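
The create / transform / act flow above can be sketched without a cluster. What follows is a toy model in plain Python, not Spark itself: `ToyRDD`, `from_list`, and the sample log lines are made up for illustration. Only the method names `map`, `filter`, `count`, and `collect` mirror real RDD operations. The point it tries to show is that transformations only describe the work, and nothing runs until an action is called.

```python
from typing import Any, Callable, Iterable, List


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, source: Callable[[], Iterable[Any]]):
        # Zero-argument function that produces the records on demand.
        self._source = source

    @staticmethod
    def from_list(items: Iterable[Any]) -> "ToyRDD":
        data = list(items)
        return ToyRDD(lambda: iter(data))

    # Transformations: return a new ToyRDD and do no work yet.
    def map(self, fn: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    # Actions: walk the whole chain and produce a result.
    def collect(self) -> List[Any]:
        return list(self._source())

    def count(self) -> int:
        return sum(1 for _ in self._source())


# Made-up log lines for a small "log mining" run.
log_lines = [
    "2024-01-01 INFO service started",
    "2024-01-01 ERROR disk full",
    "2024-01-02 WARN slow response",
    "2024-01-02 ERROR connection lost",
]

lines = ToyRDD.from_list(log_lines)                               # create
errors = lines.filter(lambda line: " ERROR " in line)             # transform (lazy)
messages = errors.map(lambda line: line.split(" ERROR ", 1)[1])   # transform (lazy)

error_count = errors.count()          # action: runs the filter
error_messages = messages.collect()   # action: runs filter + map
print(error_count, error_messages)
```

In real Spark the same shape would start from a `SparkContext` and a file on HDFS rather than an in-memory list, and the work would be split across executors. The toy version runs everything in a single process.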
#!/bin/bash
#####################################################
# Name: Bash CheatSheet for Mac OSX
#
# A little overview of the Bash basics
#
# Usage:
#
# Author: J. Le Coupanec
# Date: 2014/11/04
A lot of these are outright stolen from Edward O'Campo-Gooding's list of questions. I really like his list.
I'm having some trouble paring this down to a manageable list of questions -- I realistically want to know all of these things before starting to work at a company, but it's a lot to ask all at once. My current game plan is to pick 6 before an interview and ask those.
I'd love comments and suggestions about any of these.
I've found questions like "do you have smart people? Can I learn a lot at your company?" to be basically totally useless -- everybody will say "yeah, definitely!" and it's hard to learn anything from them. So I'm trying to make all of these questions pretty concrete -- if a team doesn't have an issue tracker, they don't have an issue tracker.
I'm also mostly not asking about principles, but about the way things are -- not "do you think code review is important?", but "does all code get reviewed?".