Dask Contributor Spotlight: Ian Rose
February 16, 2022
Dask is built and maintained by hundreds of people collaborating from around the world. In this series, we talk to some of these Dask contributors, discuss their journey into open source development, and hear their thoughts on the PyData ecosystem of tools. In this edition, we’re excited to chat with our very own Ian Rose, Software Engineer at Coiled, contributor to the Dask project, and creator of the Dask-JupyterLab Extension.
Ian has been a part of the PyData community for a long time. He has made significant contributions to various libraries, including the Jupyter and Pangeo projects, which have become foundational for data science and geospatial applications in Python.
Reflecting on his career so far, Ian says:
“My academic background is in geophysics, though I don’t do much of that these days. After graduate school, I spent some time working with Fernando Pérez on JupyterLab and other Jupyter-related things. I also did a stint as a data scientist in local government, where I worked on projects around transportation, land-use, and public health.”
Currently, at Coiled, Ian works to improve the Dask project and leads initiatives to support the Dask community. As his colleague, I can also say he is an awesome person to work with. 🙂
Keep reading to learn more about Ian’s open source journey!
How did you get started with programming?
I got started in programming in middle school, messing around with Macs. At the time, I mostly used C/C++ and tried to make clones of Space Invaders.
Why is open source important to you?
Open source is important to me for a few reasons. I think it’s important for people without access to institutional bank accounts to be able to install and run high-quality software without having to ask for permission or fork over large amounts of money. Open source allows students, hobbyists, people without discretionary income, and rabble-rousers all to participate in tech and in the way tech influences society.
Just as importantly, open source allows people to inspect and verify important technical products: open science, open government, and data journalism are all built on open source software. It has become a critical tool for transparency and accountability in lots of domains.
What open source projects do you contribute to?
How did you first get introduced to Dask?
I first got introduced to Dask by seeing some of the early work coming out of the Pangeo project.
How did you start contributing to Dask?
Also Pangeo! At the time, I was spending most of my days working on JupyterLab. I attended a Pangeo workshop in Boulder, Colorado with some other Jupyter folks, got to know Matthew Rocklin, and started working on what would be the Dask JupyterLab extension.
What part of Dask do you mainly contribute to?
Why does being a contributor excite you?
It’s fun building things and seeing them be used in the world. I especially like it when they are used to make the world a better place!
Besides developing Dask, do you use it too? If so, how does one affect the other?
Between all the things I’m working on, I don’t get to experience it as a user as much as I’d like! So it’s useful to see lots of questions from users, because it allows me to view things from a fresher perspective.
What is your favorite part of Dask?
My favorite part of Dask is fsspec. It was so useful that it was extracted into its own standalone project, and I reach for it all the time, even when not doing distributed computing. I love having a uniform file-like interface to all sorts of different storage backends!
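That uniform interface is easy to see in a small sketch. The example below uses fsspec’s built-in in-memory filesystem; swapping the protocol string (e.g., `"file"`, `"s3"`, `"gcs"`) is, in principle, all that changes for other backends (the exact paths and backends here are illustrative, not from the interview):

```python
import fsspec

# fsspec exposes the same file-like interface across storage backends.
# "memory" is a built-in in-memory filesystem, handy for demos and tests.
fs = fsspec.filesystem("memory")

# Write through the familiar open()/write() file API.
with fs.open("/demo/hello.txt", "w") as f:
    f.write("Hello from fsspec!")

# Read it back the same way, regardless of where the bytes actually live.
with fs.open("/demo/hello.txt", "r") as f:
    content = f.read()

print(content)
```

The same `fs.open(...)` pattern works whether the target is local disk, cloud object storage, or memory, which is exactly the uniformity Ian describes.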
What are some things that you want to see improved in Dask?
Dask is a very flexible project, and it integrates with some extremely flexible data APIs (namely, NumPy and pandas). The product of those two things makes for a huge API surface area, much of which is difficult for new users to grasp, and they often run into problems that are difficult for them to solve. I want the project to be easier for people to learn, and when things go wrong, easier to troubleshoot.
What do you see in the future of Dask? And, scalable computing in general?
I think the future of Dask is going to be focused on two things:
- easier deployment, and
- less need for users to understand the specifics of chunking and parallel algorithms.
If we can crack both of those, it will really make scalable computing accessible to everyone, and unlock things for a ton of new people. I’m most excited to see what smaller operations without a lot of IT support can do with medium-sized data.
Thank you, Ian, for all your contributions to the Python open source ecosystem, and especially, to Dask. We’re so grateful to have you be a part of our community!