Getting Started with Storm

“Getting Started with Storm”, by Jonathan Leibiusky, Gabriel Eisbruch & Dario Simonassi.
Continuous Streaming Computation with Twitter’s Cluster Technology.

Even as big data is turning the world upside down, the next phase of the revolution is already taking shape: real-time data analysis. This hands-on guide introduces you to Storm, a distributed, JVM-based system for processing streaming data. Through simple tutorials, sample Java code, and a complete real-world scenario, you’ll learn how to build fast, fault-tolerant solutions that process results as soon as the data arrives.

Discover how easy it is to set up Storm clusters for solving various problems, including continuous data computation, distributed remote procedure calls, and data stream processing.

Apache Storm project site

Note that this book is based on Storm 0.7.1, while the latest version at the time of writing is 0.9.5, so the book is quite outdated and needs to be supplemented with the online documentation.

SF Big Analytics – Update on Speech Recognition and HPC at Baidu Research

Baidu hosted the SF Big Analytics meetup at its Sunnyvale office on August 19th, 2015, with updates on speech recognition, deep learning and HPC.

SF Big Analytics Part 1. Deep Learning by Chief Scientist Andrew Ng

SF Big Analytics Part 2. Bryan Catanzaro, Senior Researcher: “Why is HPC So Important to AI?”

SF Big Analytics Part 3. Awni Hannun, Senior Researcher: “Update on Deep Speech”

The Spark Certified Developer program


Databricks and O’Reilly have partnered to create a certified developer exam.

According to Databricks, the Spark Developer Certification is the way for a developer to:

  • Demonstrate recognized validation for your expertise.
  • Meet the global standards to ensure compatibility between Spark applications and distributions.
  • Stay up to date with the latest advances and training in Spark.
  • Be a part of the Spark developers community.

Developers can take the exam online here.

You will take the test on your own computer, monitored by a proctoring team. The test lasts about 90 minutes and consists of a series of randomly generated questions covering all aspects of Spark.

The test will include questions in Scala, Python, Java, and SQL. However, deep proficiency in any of those languages is not required, since the questions focus on Spark and its model of computation.

To prepare for the Spark certification exam, Databricks recommends that you:

  • Are comfortable coding the advanced exercises in Spark Camp or related training (example exercises can be found here).
  • Have mastered the material released so far in the O’Reilly book, Learning Spark.
  • Have some hands-on experience developing Spark apps in production already.

See also this article from the Spark Summit 2014 Training Archive.