Data Science at Scale with Dask


An introduction to distributed computing:

  • When, why and how should you leverage distributed computing?
  • Introduction to Dask, an OSS Python library for distributed computing

How to parallelise your Python code with Dask:

  • Why parallelise your code?
  • Using dask.delayed() to parallelise custom code
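
As a rough illustration of the dask.delayed() item above, the following minimal sketch wraps two ordinary Python functions so that calls build a lazy task graph instead of running immediately. The function names `inc` and `add` are hypothetical and chosen only for this example; they are not part of the tutorial material.

```python
from dask import delayed


@delayed
def inc(x):
    # Hypothetical helper: each call becomes one task in the graph.
    return x + 1


@delayed
def add(a, b):
    # Hypothetical helper: depends on the results of two other tasks.
    return a + b


# Nothing is computed yet; these calls only describe the work.
# The two inc() tasks are independent, so Dask may run them in parallel.
total = add(inc(1), inc(2))

# compute() executes the graph and returns a plain Python value.
result = total.compute()
print(result)
```

The work only runs when `compute()` is called, which gives the scheduler the chance to run independent tasks concurrently.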

Scaling your NumPy and pandas workflows:

  • How do you scale your NumPy and pandas workflows to larger-than-memory datasets?
  • Dask Collections: Bags, Arrays and DataFrames
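
To make the idea of a Dask collection concrete, here is a small sketch using a Dask Array, which splits one logical array into many NumPy-sized chunks. The array shape and chunk size are arbitrary assumptions for illustration; a real larger-than-memory workload would typically load chunks from disk rather than create them in memory.

```python
import dask.array as da

# A 1000 x 1000 array of ones, stored as a 4 x 4 grid of 250 x 250 chunks.
# Shape and chunk size are illustrative assumptions only.
x = da.ones((1000, 1000), chunks=(250, 250))

# Operations look like NumPy but stay lazy until compute() is called.
total = x.sum()

# Each chunk is summed separately, then the partial sums are combined.
result = total.compute()
print(x.numblocks, result)
```

Because only a few chunks need to be in memory at any time, the same pattern extends to arrays that are larger than the available RAM.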

Distributed Machine Learning with Dask:

  • How to build distributed ML models
  • Bursting to the cloud to transcend local compute resources

Richard Pelgrim

Affiliation: Coiled

Richard Pelgrim is a data scientist with a passion for communicating technical content in creative and compelling ways that increase engagement. He currently does so as Data Science Evangelist at Coiled, the leading company around the open-source Dask library for distributed computing in Python. Richard is regularly invited to give Dask tutorials at meet-ups and conferences and has a treasure chest of expert tips to support anyone onboarding with Dask.

Visit the speaker at: GitHub • Homepage