Last week I finished my position at NVIDIA. Next week I form a company. This company will work to scale Python’s open source data science ecosystem, primarily using Dask.
This post mostly talks about the history of funding Dask, and then goes into the motivation for creating a company. I will follow up with plans for the new company in a future post.
Dask is an open source Python library that helps to parallelize other Python libraries to work on large datasets and on large clusters of distributed hardware.
As community-driven open source projects go, Dask has been remarkably well funded. Over the past five years we’ve consistently had a few people (2-10) paid to work on Dask about half-time.
While at Anaconda we pulled the money to support this from a diverse set of sources:
US Research grants from NSF and NASA, with collaborations like Pangeo
(tip: this was the highest bang-for-buck out of all these options)
This money went to fund developers. These developers focused full time on adding features, fixing bugs, engaging in community ecosystem design, and answering user questions. This full-time funded development was critical to the success of Dask.
Then, about a year ago, we found a new source of funding:
Rather than try to change the culture of a large institution, we went along with it, and I personally changed my employer to NVIDIA to build out this team.
“Same job, new employer,” I often said, and in many cases this was true. My job was to maintain Dask, and also to make sure that the GPU work that the 50-person RAPIDS team was doing would interact well with Dask and with the surrounding Python data science community. I’m really proud of this work, and I’m honored to have been a part of such a large effort.
We also hired six additional people to work on Dask specifically, along with a mix of GPU-Python ecosystem things. Their work included the obvious Dask + GPU work:
As well as many community ecosystem improvements:
Today NVIDIA generally operates as a good partner to the ecosystem. Yes, they’re motivated to sell you GPUs, but along the way they’re cleaning up a bunch of stuff, which is great. It’s like when someone new moves into the neighborhood and starts picking up trash and planting flowers.
The Dask team within RAPIDS within NVIDIA is going to continue working on these same problems and more. The first person we hired onto this team was my good friend and long-time colleague Ben Zaitlen, who has been managing this team for the last couple of months (honestly he does a better job at this than I ever did). Ben has been in the Python community for a long time and I am thrilled knowing that he is leading this team into the future. More on my experience managing within NVIDIA in a future post.
Dask’s development is distinct in that it has both …
Most OSS projects that are built and maintained by for-profit entities feel different from those that are built and maintained by a volunteer workforce (like NumPy, SciPy, Matplotlib, pandas, scikit-learn, and Jupyter, for example).
In general, I’ve gotten the impression that the community trusts Dask in the same way that it trusts other PyData projects, despite Dask’s history of funded development. I am personally grateful for this trust. I believe that this is for many reasons:
As an organizer for the Dask project I’ve been on the lookout for how to increase our maintainer pool and how to secure the kind of long-term baseline funding a project needs in order to employ maintainers long term. When I find those sources, I go after them. This is what motivated my move to NVIDIA a year ago and this is what motivates my move now.
Today I see large and well-funded companies and institutions asking for help deploying Dask and making it work well within their institution. They’re used to paying for these things (indeed, they often ask me for someone that they can pay for these things), so I figure it’s best to make an organization that they can pay.
There is a lot of demand for scalable Python with Dask today. I think that if we channel this demand carefully, it can have a strong positive impact on Dask, on Python, on open source software, and, more importantly, on the social and environmental problems that we all want to solve with that software.
(Looking for more? I’ll describe my plans in more depth in a follow-up blog post.)