Dask Heartbeat by Coiled: November 2021
November 15, 2021
The Dask community is highly distributed, with different teams working independently. This is powerful but sometimes makes it hard for people within the community to see everything that is going on. The Dask Heartbeat by Coiled is a monthly publication intended to centralize and broadcast Dask news over the previous month.
Dask Discourse Community Forum
Dask has a new community forum at discourse.dask.group!
Ian Rose set this up with help and input from many Dask contributors. It is a space for the entire Dask community of users, contributors, and enthusiasts to participate in discussions, ask and answer questions, share interesting resources, and showcase their work. Be sure to check it out and introduce yourself!
Worker State Machine Refactor
Florian Jetter worked on refactoring the Worker State Machine: the pipeline that dictates how Dask workers handle a task and its states (waiting, ready, executing, and so on). This refactor has been crucial in helping to solve stability problems around deadlocked or stuck clusters.
Such problems can be difficult to debug and hard to reproduce, so the team also worked on a way for users to create a snapshot of the cluster state if the cluster freezes. Instructions on how to use this will soon be provided in an updated issue template.
This is a part of a broader effort to investigate and improve the stability of the Dask Distributed scheduler.
Dask for Life Sciences
Genevieve Buckley has been working as the Dask Life Science Fellow since early 2021. As part of this role, she has helped improve and maintain Dask, with a special focus on life-science applications. Genevieve has also led various outreach activities, including organizing Dask workshops, mentoring a GSoC student, and writing community blog posts. You can learn more in the CZI OSS Update.
Genevieve’s work draws to a close in December, and we’d like to thank her for all her contributions to the Dask community. 🙂
Update on AMM
Guido Imperiale has been working on an Active Memory Manager for the past few months:
“The Active Memory Manager, or AMM, is an experimental daemon that optimizes memory usage of workers across the Dask cluster.” ~ Dask Distributed documentation
With dask/distributed version 2021.10.0 and above, you can enable the active memory manager in your Dask configuration file. Learn more about AMM, its policies, and how to enable it in the high-level documentation. Guido will also be publishing a blog post about AMM soon!
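As a rough illustration, the same setting can also be applied from Python rather than a YAML file. This is a minimal sketch, not the official recipe: it assumes the configuration key is `distributed.scheduler.active-memory-manager.start`, which is how we recall it from the Dask Distributed documentation, so please confirm the key name against the docs for your installed version.

```python
import dask

# Assumed config key for switching the AMM on; verify it against the
# Dask Distributed documentation for the version you are running.
AMM_START_KEY = "distributed.scheduler.active-memory-manager.start"

# dask.config.set works as a context manager, so the override only
# applies inside this block. A cluster meant to use the AMM would need
# to be created while the setting is active.
with dask.config.set({AMM_START_KEY: True}):
    enabled = dask.config.get(AMM_START_KEY)
    print("AMM requested:", enabled)
```

The equivalent entry in a Dask YAML configuration file would nest the same names (`distributed`, `scheduler`, `active-memory-manager`, `start`). Because the AMM is described as experimental, it is worth trying on a test cluster first.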
The Dask documentation is continuously updated. Here are some highlights from October:
- Jacob Tomlinson and Julia Signell are continuing the documentation refresh efforts — the new Sphinx Book Theme is now the default across all Dask documentation pages.
- Ray Bell helped improve documentation and examples around Dask SSH Cluster.
- Florian Jetter has started discussions on expanding the Dask developer documentation.
Stale Issues and PRs Sprint
Dask contributors held a sprint in early October to devote attention to some long-standing issues and pull requests across multiple Dask repositories on GitHub. They worked to triage, manage, and close over a hundred issues and PRs where discussions seemed to have stalled.
Dask typically strives for a two-week release window, but a few releases needed to be postponed in September and October. Stability regressions had been reported in earlier versions, and the team wanted to ensure the stability of new releases. For more information, see 2021.9.1 and 2021.10.0.
The team is still observing issues, but they hope these will affect only a small number of Dask users. If you are facing any problems, please reach out on the Dask issue tracker.
Dask Monthly Community Meeting
Some highlights from the November Dask community meeting:
- The RAPIDS team is experimenting with a new technique that allows on-demand memory spilling.
- Richard Zamora is working to improve the speed of reading parquet files from remote storage locations.
Full meeting notes are available here.
You’re All Caught Up On Dask
That’s it. Thanks for reading.
If you’re interested in taking Coiled Cloud for a spin, which provides hosted Dask clusters, docker-less managed software, and one-click deployments, you can do so for free today.