PyConFR 2014

Lyon, October 25-28


Sunday 14:30–15:00

Bootstrapping Machine Learning

Louis Dorard

Audience level:
Intermediate

Description

Prediction APIs are democratizing Machine Learning. They make it easier for developers to build smart features in their apps by abstracting away some of the complexities of building and deploying predictive models. In this talk we’ll look at the possibilities and limitations of ML, how to use Prediction APIs, how to prepare data to send to them, and how to assess performance.

Abstract

Last year, Forrester introduced the term “predictive apps” and described them as “the next big thing in app development”. They are apps that can provide the right functionality and content at the right time, for the right person, by continuously learning about them. Now, in the same way that we talked of the importance of mobile-first development, we are talking of the importance of predictive-first development.

Fortunately for developers, machine learning is being democratized through Prediction APIs that abstract away some of the complexities of building predictive models from data and deploying them. They make it easier for developers to build smart features in their apps (for instance, detecting spam, predicting missing values, or predicting a user's intent or interests). As demand for predictive apps grows, more and more of these APIs are coming out. Microsoft and Wolfram have just entered the space of Machine Learning as a Service and Prediction APIs, with Azure ML and Wolfram Programming Cloud respectively. They join Google Prediction API and BigML, a promising and rapidly expanding startup.

We'll start this talk by looking at the possibilities and limitations of Machine Learning. Then, we'll study how Prediction APIs work. We'll go through an IPython notebook to illustrate this with the BigML API: we'll see how to create a (white-box) predictive model from data and how to use that model. The crucial point here is to have "good" data to send to the API. We'll discuss what this means and how to prepare data, from "sanity checks" you can run in Excel to scripting data processing tasks with the pandas library. Finally, we'll go over the criteria and methodology for assessing the performance of a Machine Learning system before deploying it in production.
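To make the last point concrete, here is a minimal sketch of one common way to assess a predictive model before deployment: hold out part of the data, measure accuracy on it, and compare against a trivial baseline. It is not taken from the talk. The toy spam data, the `toy_model` function, and all helper names are hypothetical, and `toy_model` only stands in for a call to a real Prediction API.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, rows):
    """Fraction of rows whose label the predict function gets right."""
    hits = sum(1 for features, label in rows if predict(features) == label)
    return hits / len(rows)


def majority_baseline(train_rows):
    """Build a baseline that always predicts the most common training label."""
    most_common = Counter(label for _, label in train_rows).most_common(1)[0][0]
    return lambda features: most_common


# Hypothetical toy data: messages described by a count of exclamation marks.
data = [
    ({"exclamations": n}, "spam" if n >= 3 else "ham")
    for n in range(10)
] * 5

train, test = train_test_split(data)


def toy_model(features):
    """Stand-in for a trained model; in practice this would query an API."""
    return "spam" if features["exclamations"] >= 3 else "ham"


baseline = majority_baseline(train)

print("model accuracy:   ", accuracy(toy_model, test))
print("baseline accuracy:", accuracy(baseline, test))
```

A model is only worth deploying if it clearly beats the baseline on data it has not seen. Accuracy is just one possible criterion, and other metrics may suit a given application better.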

View the presentation slides
