Dask Heartbeat by Coiled
• November 17, 2020
The Dask community is highly distributed with different teams working independently. This is powerful but sometimes makes it hard for people within the community to see everything that is going on. The Dask Heartbeat by Coiled is intended to centralize and broadcast Dask news over the previous week.
With Travis CI ending unlimited free accounts for OSS projects, many of the Dask projects are switching to GitHub Actions. Thank you to Jacob Tomlinson (NVIDIA) for leading this, and to others like Thomas Fan and James Bourbeau for helping out. See this issue for progress on migrating Dask projects to GitHub Actions.
Dask-Cloudprovider gets GCP, Digital Ocean support
This work was recently done by Jacob Tomlinson (NVIDIA) and Ben Zaitlen (NVIDIA) and provides an easy way to deploy Dask on those platforms without setting up Kubernetes (or anything really).
PyData Global 2020 Talks
- Skinny Pandas Riding on a Rocket, comparing Dask, Vaex, SQLite by Ian Ozsvald
- Deploying Dask with Coiled by Matthew Rocklin
- Scaling up your Data work with Dask by Hugo Bowne-Anderson and James Bourbeau
GPU Accelerated Deconvolution Blogpost
New dask.annotate Function
Thanks to Simon Perkins (South African Radio Astronomy) there is now a new dask.annotate context manager:
with dask.annotate(priority=1):
    df = dd.read_parquet(...)
This isn’t yet plugged into the distributed scheduler, but this is a great first step to making annotations like priorities, worker restrictions, resources restrictions, retries and other attributes much easier to specify on Dask collections.
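As a minimal sketch of what the context manager does, the example below builds a small Dask array inside dask.annotate and then prints the annotations recorded on each layer of its task graph. It uses dask.array instead of dd.read_parquet only so that it runs without a Parquet file on disk. The array shape, the chunk sizes, and the retries value are illustrative choices that are not from the original post, and the sketch assumes a Dask release that exposes annotations on graph layers.

```python
import dask
import dask.array as da

# Collections created inside the block pick up the given annotations.
# priority=1 comes from the snippet above; retries=2 is an illustrative extra.
with dask.annotate(priority=1, retries=2):
    x = da.ones((4, 4), chunks=(2, 2))

# The annotations are stored on the layers of the high-level graph,
# not on the array object itself.
for name, layer in x.__dask_graph__().layers.items():
    print(name, layer.annotations)

# The annotations are metadata only; computing the array gives the usual result.
total = x.sum().compute()
print(total)
```

Nothing here changes how the work is scheduled. It only shows that the annotations travel with the graph, which is the part that is in place today.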
High Level Graph Rewrite
There is a long-running effort from engineers at NVIDIA, Coiled, and Capital One to move High Level Graphs directly to the scheduler. This partially works today for DataFrames in the development version of Dask, and results in relatively fast graph submission and reduced graph communication time. We’ve moved on to benchmarking and profiling.
SVD Performance and Precision Improvements
Roger Moens at Delft University has been going over the SVD and approximate SVD algorithms, has identified several performance and correctness improvements, and has started work here:
Thanks also to Eric Czech (Related) for providing careful review.
Behind the Code of Dask and pandas: Q&A with Tom Augspurger
Anaconda ran an interview with Dask core maintainer Tom Augspurger. You can read more here.
clEsperanto Adopts Dask
The image processing library clEsperanto has added introductory support for Dask arrays:
Dask on ARM on K8s Blogpost
Holden Karau recently published a blog post on building a Dask cluster on a cluster of Raspberry Pis running Kubernetes: https://scalingpythonml.com/2020/11/03/a-first-look-at-dask-on-arm-on-k8s.html
Dask-Gateway 0.9.0 Release
Dask-Gateway version 0.9.0 was released by Jim Crist-Harif (Prefect). This release unifies the use of the normal dask-worker/dask-scheduler executables, allowing for greater composability, especially with projects like Dask-CUDA. It also expands the set of Helm configuration options, along with the standard set of bugfixes.
Dask in HPC Workshop Announcement (EU timezones)
There is a proposed “Dask in HPC” workshop announcement here: https://github.com/dask/community/issues/110
This follows from a similar event organized last year in Turin. This is organized by David Swenson (ENS Lyon).
RAPIDS + Prefect + Dask Blogpost
Ayush Dattagupta from RAPIDS published a blog post showing how to use these three tools together: https://medium.com/rapids-ai/scheduling-optimizing-rapids-workflows-with-dask-and-prefect-6fc26d011bf
Finally, a Homage to the US election
On a light-hearted note, Oriana Chegwidden (CarbonPlan) made this lovely image for those who were anxiously watching the election results:
That’s it. Thanks for reading, all.