Effective Machine Learning with Dask

This class covers the ways Dask supports machine learning: it implements a number of distributed algorithms, interoperates with popular Python libraries, and integrates with several external projects (e.g., PyTorch). The class looks at each of these options, as well as the full ML lifecycle, from ingesting data to performing inference.

Learn Ideas and Gain Skills

  • What Dask offers, and does not offer, for machine learning workflows
  • Leveraging Dask for proper out-of-core and/or parallel training
  • Implementing an end-to-end workflow with Dask and other tools

Prerequisites

  • Python, basic level
  • Understanding of ML concepts and workflow, basic level
  • Dask programming, basic level

Topics

Introduction

  • What makes scale-out machine learning different and challenging
  • How Dask flexibly approaches distributed ML challenges
  • Using Dask with — not instead of — other tools
  • Optional: Dask’s model for enabling custom algorithms

Dask and scikit-learn

  • Hyperparameter Search
  • Out-of-core non-parallel training (incremental)
  • In-memory parallel training
  • Combining incremental (out-of-core) and parallel training
  • Review of scikit-learn + Dask helper APIs
  • How to match scikit-learn algorithms to Dask options

Data Preparation, Dask’s Algorithms, and Integration with XGBoost

  • Ingesting data
  • Feature engineering transformations
  • Model training (GLM, clustering)
  • Pipelines
  • Dask + XGBoost
  • Optional: Dask + GPU (overview)

Performing Inference at Scale

  • General patterns for inference
  • Predicting with Dask Futures
  • About ParallelPostFit
  • Optional: Low-latency vs. batch vs. resource-intensive inference patterns

Custom Algorithms (Optional Intro, full-day only)

  • Dask’s task scheduler
  • Mechanisms for distributing, sharing, aggregating, and collecting data
  • Example: implementing a simulation model

Review and Q&A

  • Gotchas and best practices
  • Architecture options for integrating Dask