Dask Heartbeat by Coiled: October 2021
• October 27, 2021
The Dask community is highly distributed, with different teams working independently. This is powerful, but it sometimes makes it hard for people within the community to see everything that is going on. The Dask Heartbeat by Coiled is a monthly publication intended to centralize and broadcast Dask news from the previous month.
There is an ongoing effort to refactor Dask’s documentation, led by Jacob Tomlinson and Julia Signell. As a first step, the documentation’s Sphinx theme has been migrated to the Executable Book Theme, and the overall structure and navigation have been improved.
Experimental Shuffling Algorithm
Gabe Joseph has been working on a promising new approach to Dask DataFrame shuffling. Shuffling refers to operations, such as sorting, joining, or setting an index, that must move rows between partitions and therefore touch the entire dataset, which gets more expensive as the dataset gets larger. This experimental approach is performing well on larger-than-memory datasets at the terabyte scale. To learn more and try the feature yourself, check out Better shuffling in Dask: a proof-of-concept.
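To make the idea of a shuffle concrete, here is a minimal pure-Python sketch of a hash-based shuffle. It is not Dask's implementation or the experimental algorithm described above; the `hash_shuffle` function and the sample rows are hypothetical and only illustrate why every input partition has to be read before any output partition is complete.

```python
def hash_shuffle(partitions, key, npartitions):
    """Toy shuffle: send every row to the output partition chosen by its key.

    Hypothetical helper for illustration only, not part of Dask.
    """
    out = [[] for _ in range(npartitions)]
    for part in partitions:  # every input partition must be read
        for row in part:
            # Rows with equal keys hash to the same output partition.
            out[hash(row[key]) % npartitions].append(row)
    return out


# Three input partitions with users scattered across them.
partitions = [
    [{"user": 1, "amount": 10}, {"user": 2, "amount": 5}],
    [{"user": 2, "amount": 7}, {"user": 3, "amount": 1}],
    [{"user": 1, "amount": 3}, {"user": 3, "amount": 8}],
]

shuffled = hash_shuffle(partitions, key="user", npartitions=2)
for i, part in enumerate(shuffled):
    print(i, part)
```

After the shuffle, all rows for a given user sit in a single partition, so a per-user aggregation can run on each partition independently. The cost is that every row may have to move, which is why shuffles become expensive on large datasets.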
More eager ordering in Dask
Thanks to Erik Welch, dask.order now eagerly computes dependent tasks to allow the parent tasks to be released from memory. This update improves Dask’s memory usage. Learn more about the work and see the graph optimizations in this PR.
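The effect of running dependents eagerly can be seen in a toy model. The sketch below is not the dask.order algorithm; `peak_memory`, the task names, and the sizes are all hypothetical. It only assumes that a result is released as soon as every task depending on it has run, and compares the peak memory of two execution orders for the same graph.

```python
def peak_memory(graph, sizes, order):
    """Return the peak total size of results held while running `order`.

    `graph` maps each task to the list of tasks it depends on.
    A result is released once all of its dependents have run.
    Hypothetical helper for illustration only, not part of Dask.
    """
    # Count how many dependents still need each result.
    remaining = {task: 0 for task in graph}
    for deps in graph.values():
        for dep in deps:
            remaining[dep] += 1

    held = {}
    peak = 0
    for task in order:
        held[task] = sizes[task]  # the new result is now in memory
        peak = max(peak, sum(held.values()))
        for dep in graph[task]:
            remaining[dep] -= 1
            if remaining[dep] == 0:
                del held[dep]  # no one else needs it: release
    return peak


# Four large "load" tasks, each reduced to a small "sum", then combined.
graph = {f"load-{i}": [] for i in range(4)}
graph.update({f"sum-{i}": [f"load-{i}"] for i in range(4)})
graph["total"] = [f"sum-{i}" for i in range(4)]

sizes = {task: 100 if task.startswith("load") else 1 for task in graph}

# Run all loads first, then all sums.
breadth_first = [f"load-{i}" for i in range(4)] + [f"sum-{i}" for i in range(4)] + ["total"]
# Run each sum right after its load so the load can be released.
eager = [name for i in range(4) for name in (f"load-{i}", f"sum-{i}")] + ["total"]

print(peak_memory(graph, sizes, breadth_first))
print(peak_memory(graph, sizes, eager))
```

In this model the eager order never holds more than one large result at a time, while the breadth-first order holds all four at once.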
Update on Active Memory Management
Guido Imperiale has continued to improve the Dask distributed scheduler’s memory management. It can now run multiple active memory managers in parallel and automatically purge any replicated tasks.
Dask Cluster State Preservation
Jacob Tomlinson is working to improve the lifecycle of Dask clusters, which involves creating, listing, scaling, and deleting clusters. A key part of this effort is improving how Dask manages and stores the state of a cluster, and Jacob added this functionality in a recent PR.
Shoutout to Dask Maintainers
Dask is a very active project with a lot of work happening simultaneously, so maintaining the project and keeping it stable is essential. We’d like to thank all the Dask maintainers who help review PRs, stay on top of CI, release Dask on a timely cadence, and build the awesome Dask community. 🙂
Dask Monthly Community Meeting
Some highlights from the October Dask community meeting:
- The Coiled team held a StackOverflow Sprint where we helped answer ~50 questions and engaged with over a hundred more.
- The RAPIDS and Coiled teams are hiring Dask developers!
Full meeting notes are available here.
You’re All Caught Up On Dask
That’s it. Thanks for reading.
If you’re interested in taking Coiled Cloud for a spin, it provides hosted Dask clusters, Docker-less managed software, and one-click deployments, and you can try it for free today.