Virtual

The Open Developer Series – Scaling Python with Dask and Coiled


The Open Developer Series closes out 2021 with a special session featuring Matthew Rocklin.

Python is the de facto standard for data analysis, visualization, and machine learning. It has a rich library ecosystem with easy-to-use APIs and decent performance. However, for a long time Python did not scale well beyond a single core or beyond data that fit in memory.

Dask is a Python package for general-purpose parallel computing that has been used to parallelize many existing Python libraries, such as NumPy, pandas, scikit-learn, Xarray, XGBoost, and PyTorch, allowing them to operate on larger-than-memory datasets and across clusters of thousands of machines. Dask has effectively lifted the scientific Python software stack to parallel and distributed computing.
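As a rough illustration of what parallelizing NumPy looks like in practice, here is a minimal sketch using the `dask.array` module. The array shape and chunk size are arbitrary values chosen for this example, not figures from the talk, and the sketch assumes Dask and NumPy are installed.

```python
import dask.array as da

# A 4000 x 4000 array split into sixteen 1000 x 1000 NumPy blocks.
# Shape and chunk size are illustrative assumptions.
x = da.ones((4_000, 4_000), chunks=(1_000, 1_000))

# Operations build a lazy task graph; nothing is computed yet.
total = (x + x.T).sum()

# compute() hands the graph to a scheduler, which runs the
# per-block tasks in parallel and combines the partial sums.
result = total.compute()
```

The code reads like ordinary NumPy, but each operation works block by block, so the full array never has to sit in memory at once.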

This talk describes the architecture of Dask, a dynamic distributed task scheduling system, and common use cases in scientific data processing, machine learning, and industry. Alongside common data science and machine learning use cases, we’ll pay special attention to the Pangeo effort, which grew up around Dask for scalable processing of Earth system data.
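To give a feel for what task scheduling means here, the toy sketch below runs a small dependency graph with a thread pool from the Python standard library. It is not Dask's scheduler; the `run` function, the `graph` dictionary, and the `inc` and `add` helpers are hypothetical names invented for this illustration, and the key-to-task layout is only loosely modeled on the way task graphs are commonly written as dictionaries.

```python
from concurrent.futures import ThreadPoolExecutor


def inc(x):
    return x + 1


def add(x, y):
    return x + y


# Hypothetical task graph: each key maps to a literal value
# or to a tuple of (function, *keys of its inputs).
graph = {
    "a": 1,
    "b": 2,
    "c": (inc, "a"),
    "d": (inc, "b"),
    "e": (add, "c", "d"),
}


def run(graph, target, max_workers=4):
    """Toy scheduler: repeatedly run every task whose inputs are ready."""
    results = {}
    pending = dict(graph)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while target not in results:
            progressed = False
            futures = {}
            for key, task in list(pending.items()):
                if not isinstance(task, tuple):
                    # Literal value: available immediately.
                    results[key] = task
                elif all(dep in results for dep in task[1:]):
                    # All inputs are known: submit the task to the pool.
                    func, *deps = task
                    futures[key] = pool.submit(func, *(results[d] for d in deps))
                else:
                    continue
                del pending[key]
                progressed = True
            # Collect this round's results before looking for new ready tasks.
            for key, future in futures.items():
                results[key] = future.result()
            if not progressed:
                raise ValueError("graph has a cycle or a missing dependency")
    return results[target]


answer = run(graph, "e")
```

Here `c` and `d` do not depend on each other, so they are submitted in the same round and can run concurrently, while `e` waits for both. A production scheduler also has to handle data locality, worker failure, and memory pressure, which this sketch ignores.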

Use this WebEx link and the meeting information below to log in on the day of the event.
Meeting Number: 2764 831 2806
Password: ODS2021