@pbhalesain
Created June 11, 2015 14:27
Certified Spark Developer - Databricks Certification
https://databricks.com/spark/certification/certified-spark-developer
http://go.databricks.com/spark-certified-developer
Books:
Learning Spark (the main book) - O'Reilly
Introduction to Apache Spark (video) - O'Reilly - Paco Nathan
Spark Reference Applications https://www.gitbook.com/book/databricks/databricks-spark-reference-applications/details
Spark Knowledge Base (Troubleshoot, Best Practices) https://www.gitbook.com/book/databricks/databricks-spark-knowledge-base/details
Advanced Analytics with Spark - O'Reilly - Sean Owen et al.
Fast Data Processing with Spark - Packt - Holden Karau
Machine Learning with Spark - Packt - Nick Pentreath
Online Tutorials:
https://spark-summit.org/2014/training
http://ampcamp.berkeley.edu/5/exercises/index.html
Blogs:
http://www.dattamsha.com/tag/spark/
Focus Areas:
Spark Execution Model
Hands-on programming of Spark applications in Scala, Python, and Java
Troubleshooting and best practices
The exam is more focused on code than theory.
According to Paco Nathan in this thread, the exam covers the following: https://mail-archives.apache.org/mod_mbox/spark-user/201505.mbox/%3CCAN4BO_DCAF1mQC-uoo4oAWkH=kQp945=vGKn--bzJi0KgYuYCw@mail.gmail.com%3E
Understanding breadth of Spark API usage across Scala, Java, Python
Applying best practices to avoid runtime issues and performance bottlenecks
Distinguishing Spark features and practices from MapReduce usage
Integrating SQL, Streaming, ML, Graph atop the Spark unified engine
Solving typical use cases with Spark in Scala, Java, Python
Berkeley MOOC certification is available on edX.org as an XSeries Verified track:
Introduction to Big Data with Apache Spark https://www.edx.org/course/introduction-big-data-apache-spark-uc-berkeleyx-cs100-1x
Scalable Machine Learning https://www.edx.org/course/scalable-machine-learning-uc-berkeleyx-cs190-1x
Both courses are available for $50 on the verified certificate track, or can be taken for free without a certificate.