Open Source

Coiled is built on Dask. As creators and contributors, we’re committed to supporting the platform.

What is Dask?

Dask is a free and open source library that helps scale your data science workflows and provides a complete framework for distributed computing in Python.

It is used by researchers, data scientists, large corporations, and government agencies. Notably, Dask users include NASA, Harvard Medical School, Walmart, and Capital One.

Distributed-computing framework

ML workflows

Cloud deployment

Scaling Data Science Workflows

Dask integrates seamlessly with the PyData ecosystem, making it easy to scale your NumPy, pandas, and scikit-learn code. Dask is written in pure Python and has a familiar API that makes onboarding quick and adoption simple.
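
To give a feel for that familiar API, here is a minimal sketch that runs the same calculation with NumPy and with Dask's array collection. The array size and chunk size are arbitrary values chosen for illustration, not recommendations.

```python
import numpy as np
import dask.array as da

# Plain NumPy: the whole array is held in memory as a single block.
x_np = np.arange(1_000_000, dtype="float64")
np_result = (x_np ** 2).mean()

# Dask: the same expression, but the array is split into 10 chunks
# of 100,000 elements that can be processed in parallel.
x_da = da.arange(1_000_000, dtype="float64", chunks=100_000)
lazy_result = (x_da ** 2).mean()  # builds a task graph; nothing runs yet

dask_result = lazy_result.compute()  # executes the graph and returns a number
print(np.isclose(np_result, dask_result))
```

The only visible differences are the `chunks` argument and the explicit `.compute()` call, which is the point at which Dask actually does the work.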

Distributed Computing Toolbox

Dask can scale up your computation to use all the cores of your workstation or scale out to cloud services on AWS, Azure, and GCP. As a distributed computing toolbox, Dask has been used with many other Python libraries, including XGBoost, Prefect, and RAPIDS.

Logos: XGBoost, scikit-learn, Apache Airflow, Prefect, Pangeo, Featuretools, xarray, Pytroll, Prophet, Iris, PyTorch

Collections, Schedulers, and Workers

Dask collections provide the API used to write Dask code. Collections create task graphs that define how to perform the computation in parallel. The actual computation is performed on a cluster, which consists of:

  • A scheduler that manages the flow of work and sends tasks to the workers
  • Workers that compute the tasks given to them by the scheduler

The process starts with a client that lives where you write your Python code. It is the user-facing entry point that passes tasks on to the scheduler.
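
The pieces described above can be sketched in a few lines. This is an illustrative setup, not a production configuration: the `square` and `total` functions are made up for the example, and the cluster runs as two worker threads inside the current process (`processes=False`) so that it works without any extra infrastructure.

```python
from dask import delayed
from dask.distributed import Client, LocalCluster


@delayed
def square(n):
    # Hypothetical task: one unit of work for a worker.
    return n * n


@delayed
def total(values):
    # Hypothetical task: combines the results of the square tasks.
    return sum(values)


# Collection layer: calling delayed functions only records tasks in a graph.
graph = total([square(i) for i in range(5)])

# Cluster layer: one scheduler plus two workers, all inside this process.
# dashboard_address=":0" lets the dashboard pick any free port.
cluster = LocalCluster(
    n_workers=2,
    threads_per_worker=1,
    processes=False,
    dashboard_address=":0",
)
client = Client(cluster)  # the client connects your session to the scheduler

# The client hands the graph to the scheduler, which sends tasks to the workers.
result = client.compute(graph).result()
print(result)  # 0 + 1 + 4 + 9 + 16 = 30

client.close()
cluster.close()
```

Nothing is computed until the client submits the graph. Up to that point `graph` is only a description of the work to be done.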

Dask Dashboards

Dask provides a live interactive dashboard containing multiple plots and tables to help diagnose the state of your cluster. It includes:

  • A cluster map that visualizes interactions between the scheduler and the workers
  • A task stream that shows the activity of each worker in real time
  • A progress bar that displays the progress being made on each task
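
As a rough sketch of how to find the dashboard, the client exposes its address as an attribute. The example assumes a small in-process cluster created only for illustration. The actual host and port depend on your environment, and the plots themselves need the optional `bokeh` package to be installed.

```python
from dask.distributed import Client, LocalCluster

# Small illustrative cluster; dashboard_address=":0" picks any free port.
cluster = LocalCluster(
    n_workers=2,
    threads_per_worker=1,
    processes=False,
    dashboard_address=":0",
)
client = Client(cluster)

# Address of the live dashboard for this cluster; open it in a browser.
link = client.dashboard_link
print(link)

client.close()
cluster.close()
```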

[Image: Dask Graph]
[Image: Dask Task Stream]

Coiled Cloud: Dask in the Cloud

Coiled Cloud makes it easy to scale your data science workflows to the cloud using Dask. Coiled Cloud handles DevOps so that you can focus on data science. It takes care of deploying containers, hooking up networking securely, managing Docker images, and more!

Dask Resources

Learn Dask and get involved with the Dask Community

“With these improvements, we have seen roughly 100x improvement [in] model training times and costs have gone down 97%, assuming you are only paying for computation during training.”

Michael McCarty
Director, Center for Machine Learning; Distinguished Machine Learning Engineer, Capital One

Ready to get started?

Create your first cluster in minutes.