As Artificial Intelligence (intelligence displayed by machines) spreads and begins to autonomously make important decisions that impact our daily lives, the issue of its accountability under the law will arise.
AI systems are expected to justify their decisions without revealing all of their internal workings, in order to protect the commercial advantage of AI providers. Moreover, mapping inputs and intermediate representations in AI systems to human-interpretable concepts is a challenging problem, because these systems tend to work as black boxes. As such, explanation systems should be considered distinct from the AI systems themselves.
This paper, written by researchers at Harvard University, highlights some interesting aspects of this debate and shows that the problem is by no means straightforward.
Another significant step forward in deep neural networks from DeepMind.
They took inspiration from neuroscience-based theories about the consolidation of previously acquired skills and memories in mammalian and human brains: connections between neurons are less likely to be overwritten if they were important in previously learnt tasks. This mechanism is known as “synaptic consolidation”.
The result is a neural network model that can learn several tasks without overwriting what was previously learnt (a known limitation of the current neural network approach, known as “catastrophic forgetting”).
The new algorithm has been called “Elastic Weight Consolidation” (EWC).
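The core idea of EWC can be illustrated with a few lines of code. The sketch below is my own illustration (not DeepMind’s implementation): when learning a new task B, a quadratic penalty anchors each parameter to its value after task A, weighted by a per-parameter importance estimate (in the paper, the diagonal of the Fisher information matrix). The names and toy numbers here are hypothetical.

```python
import numpy as np

def ewc_penalty(theta, theta_star, fisher, lam=1.0):
    """Quadratic EWC-style penalty: parameters that were important
    for an earlier task (high Fisher value) are expensive to move."""
    return 0.5 * lam * np.sum(fisher * (theta - theta_star) ** 2)

# Toy example with two parameters: the first was important for
# task A (high importance weight), so changing it costs more.
theta_star = np.array([1.0, 1.0])   # parameters after learning task A
fisher     = np.array([10.0, 0.1])  # per-parameter importance estimate
theta      = np.array([1.5, 1.5])   # candidate parameters for task B

print(ewc_penalty(theta, theta_star, fisher))  # → 1.2625
```

In training, this penalty is simply added to task B’s loss, so gradient descent trades off new-task accuracy against preserving the weights that mattered for task A.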
All the details can be found in their latest PNAS paper.
Gradient boosting ensemble technique for regression
Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. (source: Wikipedia)
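The stage-wise idea in the definition above can be shown in a short from-scratch sketch (my own toy illustration, not code from the tutorial): for squared-error loss the negative gradient is just the residual, so each round fits a weak learner (here a depth-1 regression stump) to the current residuals and adds a shrunken copy of it to the ensemble.

```python
import numpy as np

def fit_stump(x, residual):
    """Fit a depth-1 regression tree: pick the threshold on x whose
    left/right means best fit the residual in squared error."""
    best = None
    for t in np.unique(x)[:-1]:
        left, right = residual[x <= t], residual[x > t]
        lmean, rmean = left.mean(), right.mean()
        err = ((left - lmean) ** 2).sum() + ((right - rmean) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda z: np.where(z <= t, lmean, rmean)

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Stage-wise boosting for squared loss: each stump is fit to the
    negative gradient, which here is simply the residual y - pred.
    (A real implementation would keep the stumps to predict on new data.)"""
    pred = np.full_like(y, y.mean())
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + lr * stump(x)
    return pred

x = np.linspace(0, 10, 50)
y = np.sin(x)
pred = gradient_boost(x, y)
print(np.mean((y - pred) ** 2))  # training MSE, well below the variance of y
```

The learning rate `lr` is the shrinkage factor typical of gradient boosting: smaller values need more rounds but generalize better. Libraries such as scikit-learn implement the same scheme for arbitrary differentiable losses.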
This is a great video tutorial by Alexander Ihler, Associate Professor of Information & Computer Science at UC Irvine.
You can find other interesting data science tutorials by Alexander Ihler on his YouTube channel:
Introduction to Deep Learning with Python
Alec Radford, Head of Research at indico Data Solutions, speaking on deep learning with Python and the Theano library.
An amazing data science YouTube tutorial with emphasis on high performance computing, natural language processing using recurrent neural nets, and large scale learning with GPUs.
This tutorial provides an excellent example of how deep learning can be practically applied to real-world problems.
SlideShare presentation is available here: http://slidesha.re/1zs9M11
“Getting Started with Storm”, by Jonathan Leibiusky, Gabriel Eisbruch & Dario Simonassi.
Continuous Streaming Computation with Twitter’s Cluster Technology.
Even as big data is turning the world upside down, the next phase of the revolution is already taking shape: real-time data analysis. This hands-on guide introduces you to Storm, a distributed, JVM-based system for processing streaming data. Through simple tutorials, sample Java code, and a complete real-world scenario, you’ll learn how to build fast, fault-tolerant solutions that process results as soon as the data arrives.
Discover how easy it is to set up Storm clusters for solving various problems, including continuous data computation, distributed remote procedure calls, and data stream processing.
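Storm topologies themselves are written in Java or Clojure, but the dataflow model the book teaches (spouts emitting tuples into chains of bolts, with results updated as each tuple arrives) can be sketched language-agnostically. Below is a toy plain-Python analogue of the classic streaming word count; the generator names are hypothetical stand-ins, not Storm APIs.

```python
from collections import Counter

def sentence_stream():
    """Stands in for a Storm spout: a source of tuples. In a real
    topology this would be unbounded; here it is finite for demo."""
    for line in ["the cow jumped over the moon",
                 "the man went to the store"]:
        yield line

def split_bolt(lines):
    """Stands in for a bolt that splits each sentence into words."""
    for line in lines:
        for word in line.split():
            yield word

def count_bolt(words):
    """Stands in for a bolt that keeps running counts, emitting the
    updated count as each word arrives (no batching, no end-of-data)."""
    counts = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]

# Wire the "topology" together: spout -> split bolt -> count bolt.
for word, count in count_bolt(split_bolt(sentence_stream())):
    print(word, count)
```

The point of Storm is that it runs this kind of pipeline distributed across a cluster, with parallelism, fault tolerance, and message-delivery guarantees that the sketch above obviously lacks.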
Apache Storm project site: https://storm.apache.org/
Note that this book is based on Storm 0.7.1, while the latest version is currently 0.9.5, so the book is somewhat outdated and should be supplemented with the online documentation.
Baidu hosted SF Analytics Meetup at their Sunnyvale office on August 19th, 2015 – Updates on Speech Recognition, Deep Learning and HPC.
SF Big Analytics Part 1. Deep Learning by Chief Scientist Andrew Ng
SF Big Analytics Part 2. Bryan Catanzaro, Senior Researcher: “Why is HPC So Important to AI?”
SF Big Analytics Part 3. Awni Hannun, Senior Researcher: “Update on Deep Speech”
Databricks and O’Reilly have partnered to create a certified developer exam.
According to Databricks, the Spark Developer Certification is a way for a developer to:
- Demonstrate recognized validation for your expertise.
- Meet the global standards to ensure compatibility between Spark applications and distributions.
- Stay up to date with the latest advances and training in Spark.
- Be a part of the Spark developers community.
Developers can take the exam online here.
You will take the test on your own computer, under the supervision of a proctoring team. The test lasts about 90 minutes and consists of a series of randomly generated questions covering all aspects of Spark.
The test will include questions in Scala, Python, Java, and SQL. However, deep proficiency in any of those languages is not required, since the questions focus on Spark and its model of computation.
To prepare for the Spark certification exam, they recommend that you:
- Are comfortable coding the advanced exercises in Spark Camp or related training (example exercises can be found here).
- Have mastered the material released so far in the O’Reilly book, Learning Spark.
- Have some hands-on experience developing Spark apps in production already.
See also this article from the Spark Summit 2014 Training Archive.