While running workloads to test Dask reliability, we noticed that some workers were freezing or dying when the OS stepped in and started killing processes when the system ran out of memory.
Having Dask workers die from memory overuse is common, so we thought that we’d investigate this further.
Dask manages memory, so it needs to know how much memory it has available.
Setting the memory limit correctly is important to the performance of your Dask cluster. This memory limit affects when Dask spills data in memory to disk, and when it restarts workers that are out of memory.
When the memory limit is set too low, Dask won’t use as much memory as it could. That’s not the worst thing, but it means your work will potentially take longer than it needs to.
When the memory limit is set too high, Dask can fail to spill to disk when needed and fail to restart workers when they’re out of memory. This is bad because if a worker isn’t killed when it’s out of memory, then it can freeze and potentially your cluster will be stuck waiting for the work from this worker. If the worker is restarted, then Dask will retry the work.
Dask attempts to determine the correct memory limit to use. It does this by looking at the total system memory (including virtual memory) and the cgroup memory limit if set. (Dask also looks at RSS rlimit if set). You can also explicitly set the memory limit. Dask then uses the minimum of these values.
The upshot is that in many deployments—including how we had been deploying Dask with Coiled—the memory limit will be set to the total system memory. That’s not ideal because that limit is too high: we’re also running an OS and logging agents.
To address this, Coiled now sets the cgroup memory limit on the containers running dask. We set this to the available memory on the system before Dask starts running.
Here’s how we’ve implemented this.
First, our script that starts Dask on each VM now sets an environment variable with the available memory:
Then, we’ve modified the Docker compose file we use to start Dask (we run Dask in containers, and configure the containers using Docker compose). We’ve added a line to explicitly set mem_limit for the container to ${AVAILABLE_MEMORY_FOR_DASK:-0}.
We could have just used $AVAILABLE_MEMORY_FOR_DASK, but instead we’re setting a default value of 0 (i.e., no limit) just in case the environment variable isn’t set. Mostly that’s unnecessary, but it makes deployment safer because the script that sets the environment variable is part of our custom AMI while the Docker compose file is created for each cluster by our control-plane.
With a Dask client connected to your Coiled (or other Dask) cluster, you can check the value of the memory limit Dask is using by running:
The upshot is that since we’ve made this change, Dask running on Coiled now does a better job spilling to disk and restarting workers when appropriate, and we’re seeing increased reliability from our testing.