Parallelizing Computation with Dask Delayed and Futures

This class module focuses on using the Dask scheduler to power custom parallel computation. Dask Delayed and Futures are lightweight mechanisms for building and running custom task graphs while staying within familiar Python coding patterns. This combination of regular Python code with a powerful distributed scheduler lets all kinds of industry- or discipline-specific workloads be parallelized for fast, large-scale computation.

Learn Ideas and Gain Skills

  • How to parallelize algorithms with minimal changes to your code
  • Patterns for scheduling compute against data that may not yet be available
  • Maximizing parallelism and minimizing bottlenecks

Prerequisites

  • Python, basic to intermediate level

Topics

Introduction

  • Local parallel programming with ThreadPoolExecutor.map
  • Async with submit() and concurrent.futures (both patterns are sketched after this list)
  • Generalizing async code with task graphs
  • Scaling task graphs: basic scheduler idea
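
The standard-library model below is the conceptual starting point for everything that follows, since Dask's interface deliberately mirrors concurrent.futures. A minimal sketch; the function name slow_square and the sleep duration are placeholders for real work:

    import time
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def slow_square(x):
        time.sleep(0.5)              # stand-in for real I/O or computation
        return x * x

    with ThreadPoolExecutor(max_workers=4) as pool:
        # map: submit many tasks, get results back in input order
        print(list(pool.map(slow_square, range(8))))

        # submit: returns a Future immediately; collect results as they finish
        futures = [pool.submit(slow_square, x) for x in range(8)]
        for fut in as_completed(futures):
            print(fut.result())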

Building and Running Graphs with Delayed

  • Creating and running Delayed objects (sketched after this list)
  • Decorator and explicit Delayed
  • Compute and Persist
  • Inspecting dependencies and task graphs programmatically and visually
  • Allowed operations with Delayed
  • What to do about prohibited operations with Delayed, and why
  • Aggregating data with Delayed
  • Best practices for Delayed
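
A minimal sketch of both ways to create Delayed objects, then running and inspecting the resulting graph. The names inc and add are illustrative; compute and visualize are the real API, and visualize requires graphviz:

    from dask import delayed

    @delayed                          # decorator form
    def inc(x):
        return x + 1

    def add(x, y):
        return x + y

    a = inc(1)                        # a Delayed object; nothing runs yet
    b = inc(2)
    total = delayed(add)(a, b)        # explicit form wrapping a plain function

    print(total.compute())            # runs the graph: prints 5
    # total.visualize()               # draws the task graph (needs graphviz)
    # dict(total.__dask_graph__())    # inspect the tasks programmatically

    # Aggregating many Delayed values into a single result
    parts = [inc(i) for i in range(10)]
    print(delayed(sum)(parts).compute())   # 55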

Running and Managing Work with Futures

  • Review of Future/Promise model
  • Launching tasks with submit, map (sketched after this list)
  • Getting results
  • Inspecting Future status
  • Graphs: Composing Futures
  • Managing Future lifetimes and task scheduling: errors, cancel, wait, fire_and_forget, del
  • Dynamic graphs: as_completed
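
A minimal sketch of the Futures workflow against a local cluster. slow_square is a placeholder; Client, submit, map, result, wait, and as_completed are the real dask.distributed API:

    import time
    from dask.distributed import Client, as_completed, wait

    def slow_square(x):
        time.sleep(0.5)
        return x * x

    if __name__ == "__main__":
        client = Client()                         # local workers

        fut = client.submit(slow_square, 3)       # returns immediately
        print(fut.status)                         # 'pending', later 'finished'
        print(fut.result())                       # blocks for the value: 9

        futs = client.map(slow_square, range(8))  # launch many tasks
        wait(futs)                                # block until all have finished

        # Composing: Futures passed to submit become graph dependencies
        total = client.submit(sum, futs)
        print(total.result())                     # 140

        # Dynamic graphs: handle results in completion order
        for f in as_completed(client.map(slow_square, range(8, 16))):
            print(f.result())

        client.close()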

Review and Q&A

  • Comparison: Futures vs Delayed
  • Using scatter (sketched after this list)
  • Debugging
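
A minimal sketch of scatter, which ships a large local object to the workers once so that many tasks can reference it cheaply; the lookup table and process function here are illustrative:

    from dask.distributed import Client

    def process(chunk, lookup):
        return sum(lookup[k] for k in chunk)

    if __name__ == "__main__":
        client = Client()

        lookup = {i: i * i for i in range(1000)}       # sizeable shared data
        lookup_future = client.scatter(lookup, broadcast=True)

        chunks = [range(i, i + 100) for i in range(0, 1000, 100)]
        futs = [client.submit(process, c, lookup_future) for c in chunks]
        print(client.gather(futs))                     # ten partial sums

        client.close()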