What is Dask?

Dask natively scales Python. Dask provides advanced parallelism for analytics, enabling performance at scale for the tools you love. ~ Dask.org


Dask is a free and open source library that helps scale your data science workflows and provides a complete framework for distributed computing in Python. It is used by researchers, data scientists, large corporations, and even government agencies. Notable Dask users include NASA, Harvard Medical School, Walmart, and Capital One.

Scaling Data Science Workflows

Dask integrates seamlessly with the PyData ecosystem making it easy to scale your NumPy, pandas, and scikit-learn code. Dask is purely Python-based and has a Familiar API that makes onboarding quick and adoption simple.

scikit-learn ML algorithms being scaled
Logos of Apache Airflow, Pangeo, scikit-learn, Featuretools, Prophet, Iris, Prefect, PyTROLL, xarray, RAPIDS, dmlc XGBoost

Distributed Computing Toolbox

Dask can help scale-up your computation to use all the core of your workstations, or scale-out to leverage cloud services like AWS, Azure, and GCP. Dask is a distributed computing toolbox and has been used with multiple other softwares including X