Should Dask focus on Deep Learning?
July 20, 2021
Deep learning is a fascinating and powerful computational capability that has captured our collective imagination. It benefits from tremendous amounts of data and complex workflows, just the kind of situation where Dask excels.
As a result, we Dask developers frequently ask ourselves whether we should invest more energy into integrating with deep learning technologies. Every time this comes up, we decide to pass, mainly for two reasons:
- What we see on the ground is that most people today are struggling with earlier stages of the data processing lifecycle, and with workflows that are different from deep learning.
- It is unclear what patterns in deep learning workflows we should focus on, beyond what we already support, like hyper-parameter optimization, pre-processing, bulk hand-offs to other frameworks, and custom development.
This is a frequently asked question, so I thought I would address these two points below and then ask for feedback at the end.
Dask users don’t ask for deep learning today
Dask has always been user and community-driven. We’re not very creative, and we mostly do what people ask us to do. Dask users ask us to focus on other things. Here are some partial results from the ongoing Dask Survey where respondents answer the question: “What common feature requests do you care about most?”
If we zoom in a bit, we can compare the results for “Better Numpy/Pandas support”, where the most common answer is “Critical to me” (orange), against the results for “Integrate with Deep Learning Frameworks”, where the most common answer is “Not relevant to me” (blue).
However, this survey is biased towards existing Dask users and data professionals. It would be great to get a broader base of people to respond. If you have networks where you can distribute the survey link, https://dask.org/survey, that would be welcome.
Common patterns in Deep Learning
To be clear, people use Dask to scale out deep learning all the time. Often they use Dask’s lower-level APIs like Dask Futures, or they use the capabilities of Dask-ML like hyper-parameter optimization (HPO) or other Dask-powered libraries like Optuna.
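As a concrete sketch of what the "lower-level APIs" approach looks like: Dask Futures deliberately mirror the standard-library `concurrent.futures` interface (`submit`, `map`, gathering results), so a custom hyper-parameter sweep is just ordinary Python. The example below uses the stdlib executor so it runs anywhere; with Dask you would swap in a `dask.distributed.Client` in its place. The `train_model` function and the learning rates are hypothetical stand-ins for a real training routine.

```python
from concurrent.futures import ThreadPoolExecutor

def train_model(lr):
    # Hypothetical stand-in for a real training run: returns the
    # hyper-parameter tried and a made-up validation score.
    return {"lr": lr, "score": 1.0 - abs(lr - 0.01)}

learning_rates = [0.001, 0.005, 0.01, 0.05, 0.1]

# With Dask this would be `client = dask.distributed.Client(...)`;
# Client.map / Client.gather follow the same shape as the code below.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(train_model, learning_rates))

best = max(results, key=lambda r: r["score"])
```

Because the interface is this generic, practitioners can embed whatever training logic they like, which is exactly why "how do I do deep learning with Dask?" has many answers.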
These all enable machine learning practitioners to scale out different parts of the training process. We’ve seen a widespread need for these sorts of features in the past, and so we’ve prioritized them. However, each of these is only part of a solution.
If you ask me today, “How do I do deep learning with Dask?” I’ll say, “There are many approaches. It really depends on the problem that you’re trying to solve”. Deep learning today is still a deeply creative field, and building canned all-in-one solutions in a creative field is like giving a chef an MRE or fast food meal. Yes, it solves a problem, but not the problem they were looking to solve.
Dask + Horovod?
The one thing that we could do, but haven’t invested time in, is to integrate with large-scale training systems like Horovod that train a single very large model on many GPUs. Our experience is that this is relatively rare and that the groups that do this kind of work don’t need or want all-in-one tooling.
We would be happy to be wrong though, and the current XGBoost integration (probably the closest technical comparison we have to this) has been tremendously successful.
Feedback here is welcome.
We’d love to play with deep learning problems, but every time we investigate this, we come back with one of two answers: “there are more important things to focus on” or “this field isn’t sufficiently predictable to automate”.
We are often wrong though, and welcome feedback. We’ve been reading through the Dask survey a lot recently. If you want to influence our development, submitting a response there would be welcome, even if you don’t use Dask today.
Thanks for reading. If you’re interested in trying out Coiled Cloud, which provides hosted Dask clusters, docker-less managed software, and one-click deployments, you can do so for free today.