Deploying and Scaling Data Science Tools

Jacob Tomlinson, who works at NVIDIA maintaining libraries like RAPIDS, Dask, Dask-Kubernetes and Dask-Cloudprovider, joins Matt Rocklin and Hugo Bowne-Anderson to discuss deployment and scaling of data science tools on distributed systems.

Dask has many cluster managers that help users set up distributed Dask clusters on a wide range of infrastructure.

Dask’s distributed tooling means that users can start a scheduler with one command and any number of workers with another. However, figuring out where to run them, how to requisition lots of infrastructure, how to get all the pieces talking to each other and how to access the resulting cluster can be a challenge.
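As a rough illustration of what a cluster manager automates, the sketch below uses `LocalCluster` from `dask.distributed`. It starts a scheduler and a set of workers on a single machine and connects them, which is roughly what running `dask-scheduler` and then `dask-worker tcp://<scheduler-address>:8786` by hand would do. It assumes `dask[distributed]` is installed. `run_demo` is a hypothetical helper name, and `processes=False` (workers as threads rather than separate processes) is a choice made only to keep the example portable.

```python
from dask.distributed import Client, LocalCluster


def run_demo(n_workers: int = 2) -> int:
    # LocalCluster is one of Dask's cluster managers: it starts a scheduler
    # plus n_workers workers on this machine and wires them together.
    # processes=False keeps the workers as threads in this process so the
    # sketch runs anywhere; real deployments run workers in separate
    # processes or on separate machines.
    with LocalCluster(n_workers=n_workers, threads_per_worker=1,
                      processes=False) as cluster:
        # The client talks to the scheduler, which hands tasks to workers.
        with Client(cluster) as client:
            future = client.submit(sum, [1, 2, 3])
            return future.result()


if __name__ == "__main__":
    print(run_demo())
```

Other cluster managers (for example those in Dask-Kubernetes and Dask-Cloudprovider) follow the same idea of creating a cluster object and connecting a `Client` to it, but they provision the scheduler and workers on remote infrastructure instead of the local machine.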

Jacob also occasionally live streams open source development work on his Constrained Coding channel. In these streams he often picks a small GitHub issue and opens a pull request to resolve it while racing a 30-minute clock. We thought it might be fun to get several brains on one of these streams and tackle an issue together.

After attending, you’ll know:

  • How distributed Dask clusters communicate
  • Different cluster types (static and ephemeral)
  • The variety of platforms you can spin up a Dask cluster on
  • How open source contributions work!

Join us this Thursday, August 20th at 9am US Eastern time by signing up here and dive into the wonderful world of scalable data science in Python!

A graphic for Coiled's Science Thursday with Jacob Tomlinson ("Deploying and Scaling Data Science Tools on Distributed Systems").
