Dask Heartbeat by Coiled: October 2021

The Coiled Team October 27, 2021


Introduction

The Dask community is highly distributed with different teams working independently. This is powerful but sometimes makes it hard for people within the community to see everything that is going on. The Dask Heartbeat by Coiled is a monthly publication intended to centralize and broadcast Dask news over the previous month.  

If you want something added to this list, either send an email to info@coiled.io, or tweet and tag @dask_dev, and we’ll try to include it. Keep reading for the latest updates.

Dask Heartbeat October 2021

Documentation Refresh

There is an ongoing effort to refactor Dask’s documentation, led by Jacob Tomlinson and Julia Signell. As a first step, the documentation’s Sphinx theme has been migrated to the Executable Book Theme, and the overall structure and navigation have been improved.

Check out the new documentation here: https://docs.dask.org/en/latest/ and keep an eye out for the official announcement on blog.dask.org!

Experimental Shuffling Algorithm

Gabe Joseph has been working on a promising new approach to Dask DataFrame shuffling. A shuffle redistributes rows across partitions so that related rows end up together, which operations such as sorting, setting an index, and merging require. Because each output partition may need data from every input partition, shuffling gets more expensive as the dataset gets larger. This experimental approach is performing well on larger-than-memory datasets at the terabyte scale. To learn more and try the feature yourself, check out Better shuffling in Dask: a proof-of-concept.
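To make the idea of a shuffle concrete, here is a minimal pure-Python sketch of hash partitioning. It is a conceptual illustration only, not Dask's implementation and not the experimental algorithm described above. The function name `hash_shuffle` and the sample data are hypothetical.

```python
def hash_shuffle(partitions, key, npartitions):
    """Regroup rows so all rows sharing a key value land in one output partition."""
    outputs = [[] for _ in range(npartitions)]
    for partition in partitions:
        for row in partition:
            # Any input partition can send rows to any output partition,
            # which is why a shuffle touches the entire dataset.
            target = hash(row[key]) % npartitions
            outputs[target].append(row)
    return outputs


# Hypothetical input: three partitions with user ids scattered across them.
inputs = [
    [{"user": 1, "amount": 10}, {"user": 2, "amount": 5}],
    [{"user": 2, "amount": 7}, {"user": 3, "amount": 1}],
    [{"user": 1, "amount": 3}, {"user": 3, "amount": 8}],
]

outputs = hash_shuffle(inputs, key="user", npartitions=2)
for i, part in enumerate(outputs):
    print(i, sorted({row["user"] for row in part}))
```

After the shuffle, each user id appears in exactly one output partition, so a per-user aggregation can run on each partition independently.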

More eager ordering in Dask

Thanks to Erik Welch, dask.order now eagerly computes dependent tasks so that their parent tasks can be released from memory sooner. This update reduces Dask’s memory usage. Learn more about the work and see the graph optimizations in this PR.
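Why execution order affects memory can be shown with a toy simulation. The sketch below is not dask.order itself. It assumes a hypothetical graph of three large "load" tasks, three small "reduce" tasks that each depend on one load, and a final "total" task, with made-up sizes. A result is released once every task that depends on it has run.

```python
def peak_memory(inputs_of, sizes, order):
    """Return the peak total size held in memory when running tasks in `order`."""
    # Count how many tasks still need each result.
    remaining = {key: 0 for key in inputs_of}
    for key, inputs in inputs_of.items():
        for parent in inputs:
            remaining[parent] += 1

    held = {}
    peak = 0
    for key in order:
        # A task can only run once all of its inputs are in memory.
        assert all(parent in held for parent in inputs_of[key])
        held[key] = sizes[key]
        peak = max(peak, sum(held.values()))
        # Release every input that no remaining task needs.
        for parent in inputs_of[key]:
            remaining[parent] -= 1
            if remaining[parent] == 0:
                del held[parent]
    return peak


# Hypothetical graph: each reduce-i depends on load-i; total depends on all reduces.
inputs_of = {
    "load-0": [], "load-1": [], "load-2": [],
    "reduce-0": ["load-0"], "reduce-1": ["load-1"], "reduce-2": ["load-2"],
    "total": ["reduce-0", "reduce-1", "reduce-2"],
}
sizes = {key: 10 if key.startswith("load") else 1 for key in inputs_of}

load_everything_first = ["load-0", "load-1", "load-2",
                         "reduce-0", "reduce-1", "reduce-2", "total"]
eager_dependents = ["load-0", "reduce-0", "load-1", "reduce-1",
                    "load-2", "reduce-2", "total"]

print(peak_memory(inputs_of, sizes, load_everything_first))  # 31
print(peak_memory(inputs_of, sizes, eager_dependents))       # 13
```

Running each dependent right after its parent lets the large load results be dropped early, which is the effect the ordering change aims for.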

Update on Active Memory Management

Guido Imperiale has continued to improve the Dask distributed scheduler’s memory management. It can now run multiple active memory managers in parallel and automatically purge redundant replicas of task data.

Dask Cluster State Preservation

Jacob Tomlinson is working to improve the lifecycle of Dask clusters, which involves creating, listing, scaling, and deleting clusters. A key part of this effort is improving how Dask manages and stores the state of a cluster. Jacob added this functionality in a recent PR.

Shoutout to Dask Maintainers

Dask is a very active project with a lot of work happening simultaneously, so maintaining the project and keeping it stable is essential. We’d like to thank all the Dask maintainers who help review PRs, stay on top of CI, release Dask on a regular cadence, and build the awesome Dask community. 🙂

Releases

In September, versions 2021.09.0 and 2021.09.1 of both Dask and Distributed were released.

Dask Monthly Community Meeting 

Some highlights from the October Dask community meeting:

  • The Coiled team held a Stack Overflow sprint where we helped answer ~50 questions and engaged with over a hundred more.
  • The RAPIDS and Coiled teams are hiring Dask developers! 

Full meeting notes are available here.

You’re All Caught Up On Dask

That’s it. Thanks for reading.

If you’re interested in taking Coiled Cloud for a spin, which provides hosted Dask clusters, docker-less managed software, and one-click deployments, you can do so for free today when you click below.

Try Coiled Cloud

