Tom Augspurger, who works at Anaconda maintaining libraries like pandas, Dask, and Dask-ML, joins Matt Rocklin and Hugo Bowne-Anderson to discuss scalable machine learning in Python.
Dask-ML provides tools for scalable machine learning. It works with libraries like scikit-learn and XGBoost to scale out to larger datasets or larger problems.
We’re fortunate to have great, high-performance libraries like NumPy, SciPy, and scikit-learn for machine learning. They work great for problems that fit on a single machine. For larger problems, however, you’ll run into compute or memory constraints that slow down the iterative process of developing a machine learning model. Dask-ML lets you restore that rapid cycle by scaling your familiar machine learning workflow with Dask.
After attending, you’ll know:
- When (and when not!) to reach for distributed machine learning tools
- The different types of scaling challenges you might run into
- How to distribute a hyperparameter search on a Dask Cluster
- How to scale out to larger-than-memory datasets
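As a taste of the hyperparameter-search topic, here is a minimal sketch of one common pattern: running an ordinary scikit-learn `GridSearchCV` on a Dask cluster via the Dask joblib backend. The tiny synthetic dataset, the `LogisticRegression` model, and the parameter grid are illustrative choices, not from the webinar itself, and the in-process `Client` stands in for a real distributed cluster.

```python
# A minimal sketch: distributing a scikit-learn hyperparameter search
# with Dask's joblib backend. Assumes dask, distributed, and
# scikit-learn are installed; the dataset and grid are illustrative.
import joblib
from dask.distributed import Client
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# In-process "cluster" for illustration; point Client at a scheduler
# address to use a real Dask cluster instead.
client = Client(processes=False)

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
search = GridSearchCV(
    LogisticRegression(max_iter=200),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=3,
)

# Each cross-validation fit is shipped to the Dask cluster as a task.
with joblib.parallel_backend("dask"):
    search.fit(X, y)

print(search.best_params_)
client.close()
```

This approach scales out the *compute* (many model fits in parallel) while the data still fits in memory; scaling to larger-than-memory datasets uses Dask-ML's own estimators and Dask arrays/dataframes instead.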
Join us this Thursday, August 13th at 5pm US Eastern time by signing up here and dive into the wonderful world of scalable machine learning in Python!