How to Run a Stack Overflow Sprint

Pavithra Eswaramoorthy November 1, 2021



The Coiled team held an internal Stack Overflow Sprint earlier this month. It was a week-long sprint during which we answered Dask-related questions on Stack Overflow and observed patterns in these questions to guide the production of community resources. After answering over fifty questions and engaging with over a hundred more, we’d like to share some lessons we learned. These lessons are applicable to any open source project aiming to understand its users’ pain points and explore themes to guide future documentation and content creation.

Key Takeaways:

  • We were able to quantify the usage of different Dask APIs: DataFrames are far more popular than we thought.
  • We realized that many users struggle with the basics: distributed computing is hard!
  • We noticed a lot of questions where Dask was being used sub-optimally: there is a need for more best-practice guidance to steer users in a better direction.

We’ll also discuss how we organized this sprint in a remote-first company and our ideas for future sprints. Would you be interested in participating in the next sprint? Let us know on Twitter or at info@coiled.io!

Background and Sprint Goals

The Stack Overflow community has built a robust system for technologists to get accurate and quick answers to their questions. Many Dask users also share their problems there. In fact, there are currently over 3.5k questions with the ‘Dask’ tag.

Coiled has a lot of Dask expertise, and our team already engages with user questions on other platforms like the Dask issue tracker, the Slack workspace, and the Gitter chat. So, we figured we could help users on Stack Overflow too. 🙂

Stack Overflow Trends for the ‘Dask’ tag

The sprint had two primary goals:

  1. Help the community by addressing Dask-specific questions, and
  2. Understand the common issues that the community has with Dask, to guide production of resources such as blogs, tutorials, and how-to guides.

Since this was Coiled’s first-ever Stack Overflow sprint, a secondary goal was to test this format and gather lessons to help plan future events.

Lessons from Planning and Executing a Stack Overflow Sprint

Our team spans more than five time zones, from India to the Pacific Coast. Organizing a sprint in a fully remote company posed a challenge…but also a fun opportunity! Here are some lessons we learned from weeks of planning and the impromptu adjustments during the sprint.

Start planning early

We found that defining clear goals and day-to-day tasks was instrumental in getting everyone across the globe aligned on the expectations. The Data Science Evangelists at Coiled led this initiative and worked closely with the Open Source Dask Engineers.

Our pre-sprint plan included rough estimates for the number of questions to answer and triage (e.g., identify themes, gauge complexity, etc.) each day. We decided to start with the “most upvoted questions”, thinking that these would be the questions with the most ‘watchers’ and, therefore, the most helpful to our users (but we quickly changed direction; keep reading!). Our planned workflow was for the Evangelists to triage and answer questions first and then work through any tricky questions with the Dask Engineers.

Adapt quickly

Planning is valuable. However, there’s nothing like actual ‘on-the-ground’ experience to inform what you should be paying attention to, and what is (or isn’t!) working.

As an example, we quickly realized that many of the “most upvoted questions” lacked a minimal, reproducible example or were nuanced questions requiring deep expertise. Moreover, many of these were actually ‘stale’ questions that had been unanswered for years. Hence, after Day 1, we changed course to the newest questions. Even though many questions still lacked a verifiable example, this gave us an opportunity to engage with the community on a current issue and work towards getting such an example.

Minimize context switching

Continuous context switching can not only reduce overall productivity but also have an impact on motivation. The context switching between different topics started to weigh on us, so from Day 3 onwards, we decided to answer questions:

  • that were similar to the questions we just answered, i.e., search for questions on similar topics and help solve them;
  • that revolved around the topics that we’re already fairly comfortable with.

Leverage time zone differences

Coordinating across time zones can be tricky, but we can also use it to our advantage by splitting the effort across groups from the beginning.

To make the most of the different time zones, our team divided the sprint into four “sessions”:

  • Session 1: Asia/Europe group of Evangelists answer and triage questions
  • Session 2: All Evangelists sync, discuss progress, and update the process/plan if required
  • Session 3: All Evangelists sync with the Dask Engineers and work on the selected list of questions
  • Session 4: US group of Evangelists continue answering and triaging questions

Retrospectives are important

On the final day, we reflected on the sprint and created an action plan for how we can do better next time. This was incredibly helpful, not only for understanding our own workflow better but also for communicating the results and lessons to the broader team and the community. This very blog post is an outcome of the retrospective!

Prominent Dask Themes on Stack Overflow

A major takeaway for us was the themes and patterns that we identified. More than half of all Dask questions on Stack Overflow involved Dask DataFrame. This was not a surprise because Dask DataFrame provides a gateway into Dask for many pandas-using data scientists. It was still nice to see this trend backed up by concrete data. Users had questions about DataFrame operations like groupby and set_index, and wanted more clarity around reading and writing different types of data.
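As a rough illustration of how themes like these can be quantified, the sketch below tallies theme labels assigned to questions during triage. The records and labels are hypothetical placeholders, not the actual sprint data, and this is only one simple way to do the counting.

```python
from collections import Counter

# Hypothetical triage records: (question id, theme labels assigned by a triager).
# These are illustrative placeholders, not results from the sprint.
triaged = [
    (1, ["dataframe", "groupby"]),
    (2, ["array"]),
    (3, ["dataframe", "set_index"]),
    (4, ["distributed", "memory"]),
    (5, ["dataframe", "io"]),
]

# Count how many questions carry each theme label
# (labels are unique within a question in this sample).
theme_counts = Counter(theme for _, themes in triaged for theme in themes)
total = len(triaged)

# Report each theme as a share of all triaged questions, most common first.
for theme, count in theme_counts.most_common():
    print(f"{theme}: {count}/{total} questions ({count / total:.0%})")
```

Because one question can carry several labels, the shares can add up to more than 100%.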

Dask Array was the next most popular Dask collection, and users were interested in how it differs from the NumPy API. This was followed by questions around Dask’s distributed scheduler, memory usage, and diagnostic tools. Interestingly, xarray stood out among the projects that use Dask internally.

We also noticed that many users could benefit from the pointers in the Dask best practices guide, and we recommend checking it out!

What’s next?

For the next iteration, we’re considering an open event that involves the entire Dask community. We had a lot of fun during the sprint, and we’d love to share it with more people. We’re also very excited to create more resources based on the themes mentioned earlier!

Overall, this was an enjoyable and rewarding experience for us, and we can’t wait for the next one!

Thanks for reading! If the type of work described above interests you, or you want to learn more about opportunities at Coiled generally, check out our Careers page.
