Dask Heartbeat by Coiled: December 2021

The Coiled Team December 23, 2021

, , ,


Introduction

The Dask community is highly distributed, with different teams working independently. This is powerful but sometimes makes it hard for people within the community to see everything that is going on. The Dask Heartbeat by Coiled is a monthly publication intended to centralize and broadcast Dask news over the previous month.  

If you want something added to this list, either send an email at info@coiled.io, or tweet and tag @dask_dev, and we’ll try to include it. Keep reading for the latest updates.

Dask Heartbeat Nov 2021

Improved Ordering Diagnostics

Erik Welch added some Dask diagnostic visualizations to better understand the optimization in dask.order. This is related to Erik’s to make dask.order compute dependent tasks eagerly, which allows the parent tasks to be released from memory giving performance benefits. Learn more about the visualizations in this elaborate example and look out for a blog post by Erik!

Cluster Memory Visualization

More diagnostics! Guido Imperiale has been working on the Active Memory Manager for Dask Distributed for the past few months. In line with this work, Guido added a new visualization called MemorySampler to compare the cluster-wide memory usage over time. Learn how to use it in the documentation.

MemorySampler Visualization. Source: docs.dask.org

High Level Graph Optimizations

There’s an ongoing long-term effort to add High Level Graph optimizations to Dask APIs. As a part of this,

Reduced Memory Use while Merging Subframes

Adding to memory management improvements, Gabe Joseph updated how Dask DataFrame “subframes” are merged (before deserialization internally). The subframes are now merged without copying whenever possible, i.e., whenever they share a buffer, reducing the overall memory consumption.

Accessibility Sprint

Earlier this month, the Dask community held an Accessibility Sprint to improve the alt-text for images in the Dask documentation. This was organized by Sarah Johnson and Ian Rose in collaboration with Quantsight’s Isabela Presedo-Floyd and Tony Fast, and alt-texts for nearly 30 images were improved during the sprint!

Improved Support for Parquet

Richard Zamora and Matthew Powers improved how Dask DataFrame reads and writes parquet files:

Releases

Over the month of November, the following Dask packages were released:

  • Dask and Distributed versions 2021.11.0, 2021.11.1, and 2021.11.2 (Note: Version 2021.11.0 included a critical regression for UXC comms which was reverted for the 2021.11.1 release);
  • Dask-ML versions 2021.11.16 and 2021.11.30;
  • Dask Gateway version 0.9.0.

Dask Monthly Community Meeting 

Some highlights from the December Dask community meeting:

  • The RAPIDS team will help improve the dask-sql project and add integrations with cuML and XGBoost.
  • Coiled has a new “Community team” that will focus on supporting the Dask community. Look out for an official announcement soon!

Full meeting notes are available here.

You’re All Caught Up On Dask

That’s it. Thanks for reading.

If you’re interested in taking Coiled Cloud for a spin, which provides hosted Dask clusters, docker-less managed software, and one-click deployments, you can do so for free today when you click below.

Try Coiled Cloud


Ready to get started?

Create your first cluster in minutes.