Scalable Machine Learning in Python

Tom Augspurger, who works at Anaconda maintaining libraries like pandas, Dask, and Dask-ML, joins Matt Rocklin and Hugo Bowne-Anderson to discuss scalable machine learning in Python.

Dask-ML provides tools for scalable machine learning. It works with libraries like scikit-learn and XGBoost to scale out to larger datasets or larger problems.

We’re fortunate to have great, high-performance libraries like NumPy, SciPy, and scikit-learn for machine learning. They work well for problems that fit on a single machine. For larger problems, however, you’ll run into compute or memory constraints that slow down the iterative process of developing a machine learning model. Dask-ML lets you restore that rapid cycle by scaling your familiar machine learning workflow with Dask.

After attending, you’ll know:

  • When (and when not!) to reach for distributed machine learning tools
  • The different types of scaling challenges you might run into
  • How to distribute a hyperparameter search on a Dask Cluster
  • How to scale out to larger-than-memory datasets
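To give a flavor of the hyperparameter-search item above, here is a minimal sketch of one common pattern: running a scikit-learn grid search through joblib's Dask backend. It is not taken from the webinar itself. The synthetic dataset, the SVC estimator, the parameter grid, and the in-process local cluster are all assumptions chosen for illustration; on a real deployment you would point Client at your own scheduler.

```python
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Assumption: a small in-process local cluster stands in for a real Dask cluster.
client = Client(processes=False)

# Assumption: a tiny synthetic dataset, used only to keep the example fast.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Illustrative grid: 3 values of C x 2 kernels = 6 candidates, each fit 3 times (cv=3).
param_grid = {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=-1)

# Inside this context, joblib hands scikit-learn's parallel fits to the Dask workers.
with joblib.parallel_backend("dask"):
    search.fit(X, y)

print(search.best_params_)
client.close()
```

This pattern suits compute-bound searches where the data still fits in memory on each worker; larger-than-memory datasets call for different tools.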

Join us this Thursday, August 13th, at 5pm US Eastern time by signing up here, and dive into the wonderful world of scalable machine learning in Python!
