Pavithra Eswaramoorthy December 29, 2020

Python, Data Science, and Machine Learning in 2020


Python is the language of choice for science and research today, in both industry and academia. A major reason for its popularity is the wealth of supporting tools available for specific use cases, including NumPy, SciPy, pandas, Matplotlib, scikit-learn, Project Jupyter, and of course, Dask. The Scientific Python community consists of people around the world who develop, maintain, and use these open source libraries for scientific research.

This year has been challenging for everyone, and the Scientific Python community is no exception. Even amidst all the chaos in 2020, we managed to bag some wins, find moments of joy, and help the global community. Here are some highlights: 

  • PyData projects helped fight against COVID-19
  • PyCon 2020, JupyterCon 2020, and PyData Global were hosted online successfully
  • Some new initiatives were started – 2i2c, Jupyter Book, and Coiled!
  • Open Source projects took a step in the direction of inclusivity 
  • PyData libraries were used to predict voter turnout in the US Presidential Election
  • Nobel Prize in Physics awarded for black hole research!

These highlights are by no means exhaustive, so if you have more things to share, send us a tweet @coiledHQ!

Sunsetting Python 2

On 1st January 2020, we officially bid farewell to Python 2. Projects using Python 2 had started transitioning to Python 3 a few years back, but we saw more projects drop support for Python 2 this year. 

PyData Projects Help Combat COVID-19

Around March 2020, the COVID-19 pandemic turned our lives topsy-turvy. The world witnessed country-wide lockdowns, strict safety protocols, and an economic crisis. Biomedical researchers were at the forefront, helping us understand the novel virus and develop vaccines. Government institutions and economists were helping us navigate the financial emergency. Multiple PyData projects helped support them in this fight against COVID-19, including conda-forge, Matplotlib, Galaxy, Julia, ObsPy, and Econ-Ark.

As mentioned in the NumFOCUS blog:

“conda-forge offers 2500 scientific tools to the research community and provides access to huge computing resources around the globe, via a web-interface and accessibility for researchers worldwide.”

“Galaxy is just one of the projects that heavily relies on conda-forge. Some of Galaxy and conda-forge’s applications included SARS-CoV-2 Dedicated Training Material and Webinars.”

Virtual Conferences

Conferences and meetups bring the tech community together and foster collaboration. This year, some conferences were cancelled or postponed, but many, like PyCon 2020, JupyterCon 2020, and PyData Global, moved to virtual spaces.

Moving online made conferences accessible to more people this year, and we saw a diverse set of speakers and participants. We also learnt about the challenges around time zones and different languages, and we saw the community adapt to these changes rapidly. Even though we couldn’t meet in person, we did our best to continue collaborating and sharing!

New Initiatives and Project Improvements

Many exciting initiatives and improvements were announced this year, including 2i2c, Jupyter Book, and Coiled!

Movement for Racial Justice

In May 2020, a movement for racial justice and equality started in the United States and spread across the globe. We saw large-scale protests in the streets and important conversations on social media. The tech community also joined this movement.

Projects using Git started renaming their default branches, and GitHub joined this effort by changing the default branch name for new repositories from ‘master’ to ‘main’. We saw discussions around racial biases in Artificial Intelligence and Machine Learning algorithms. We learnt from each other and tried to make the Scientific Python community a more inclusive and welcoming space.

US Presidential Election

The US presidential election was a big part of October and November 2020. Joe Biden was running against Donald Trump, and the general consensus was that the stakes were high this time. Journalists and Data Scientists from around the world were trying to predict voter turnout and project the election results. They used many open source Python tools to analyze the voter data, including pandas, Jupyter Notebooks, and Dask.

The analysis was especially challenging this year due to the COVID-19 pandemic. Andrew Therriault, a Data Scientist, worked on this challenge for Bloomberg News. You can read about the process in Predicting Voter Turnout in the 2020 Presidential Election with Coiled and Dask.

Nobel Prize in Physics 2020

In October 2020, the Nobel Prize in Physics was awarded to Roger Penrose, Reinhard Genzel, and Andrea Ghez for their discoveries about black holes, one of the most fascinating phenomena in the universe.

Andrea Ghez and Reinhard Genzel provided the most convincing evidence yet of a supermassive black hole at the centre of our Milky Way Galaxy. They did this by mapping the orbits of the brightest stars closest to the centre of the Milky Way. Their research is related to the first image of a black hole, released last year by the Event Horizon Telescope collaboration. These research initiatives involved large-scale data analysis and image processing, done using PyData tools and libraries.

Happy New Year!

As this year draws to a close, we’d like to extend a huge thanks to all the Python and Open Source communities. Their work is infrastructural, and oftentimes we don’t realize how much we depend on these tools for our daily work. To show your support, you can donate to NumFOCUS and help them reach their year-end fundraising goal.

The Coiled Team wishes everyone a fantastic new year. See you in 2021!