PyData Global 2021: Top 5 Highlights for the Python, Data Science and Dask Lover
• November 1, 2021
The Coiled Team had a blast at last week’s PyData Global 2021. We put together a list of our favourite sessions and our main takeaways from the conference…for those of you without time machines who couldn’t attend multiple sessions in parallel 😉
666 Lines of Beas(t)ley Assembler Code
With his characteristic mix of genius and humour, David Beazley’s keynote wove a fascinating narrative about the history of Python and the values and strategies baked into its DNA. He shared some priceless anecdotes about accidentally programming a supercomputer nicknamed ‘The Beast’ with 666 lines of Assembler code and the magical things that can happen when your office-trailer gets struck by lightning. Watch the whole keynote here.
- Pythonistas ask for forgiveness, not permission.
- Pythonistas tend to operate on the periphery…only to blast onto the stage and steal the show when the audience least expects it.
- A community of tool-makers is always going to be more powerful than any single tool.
If you’re a PyData history buff like us, we recommend checking out Matthew Rocklin’s video on the history of Dask. For more on the current and future state of all things high-performance and distributed computing, check out this blog to access a candid conversation between Matt and Peter Wang from Anaconda on The State of Distributed Computing.
Illegal Pineapple Pizzas
Turning a highly technical topic into an engaging presentation is no easy feat. We think Francesco Tisiot did a phenomenal job with his talk “Get to know Apache Kafka with Jupyter Notebooks”. Besides a clear explanation of how Apache Kafka works and how to apply it, we also learned some fascinating facts about pizza etiquette in Italy.
- Never (and we mean never, not even at 3am after a night out) order a Pizza Hawaii in Italy. They’re illegal. Sort of…
- Using Apache Kafka from Jupyter Notebooks feels smooth and familiar.
- It really pays off to invest time in crafting a presentation that has a clear, relatable storyline (preferably peppered with some humour) and helpful, visually appealing diagrams to communicate your message in a memorable way. We won’t be forgetting this one!
Boring is the new flashy
With all the flash and hype around Big Data, AI, and ML, it’s sometimes easy to forget that some of the more ‘boring’ down-to-earth stuff like data formats and compression algorithms is actually just as crucial to your data science success. While tuning your ML model’s hyperparameters is important, spending time optimizing the quality and format of your data input can also lead to massive performance gains.
- The age-old mantra “Garbage In, Garbage Out” still rings very true.
- Converting your CSV files into Parquet, a compressed columnar format, can dramatically speed up reads, especially when you only need a subset of the columns.
- The new Blosc2 compressor gives you a lot more flexibility in how your data is compressed.
It’s all about the visuals
There were a lot of sessions about (interactive) visualisation this year. We especially enjoyed the high-level overview by Nicolas Kruchten (Plotly) of why visualisation matters and the four different levels of interactivity. For a brutally honest, on-the-spot comparison of competing visualisation libraries in Python, we recommend taking an hour to watch the Python Dashboarding Shootout and Showdown.
The Dask side is strong with this one!
And of course, this list wouldn’t be complete without us mentioning the strong Dask presence at this year’s PyData Global, both from Dask maintainers and from other presenters across the business and academic worlds.
- April Rathe from Arrowstreet Capital presented an impressive walkthrough of her company’s journey with “Dask: from POC to Production”. We loved how she got her whole team to switch to using Dask 🙂
- Coiled Lead OSS Engineer James Bourbeau presented the new dask-snowflake connector in his talk “Snowflake & Dask: How to Scale Workloads using Distributed Fetch Capabilities”.
- Brendan Collins’ talk on “Spatial Analytics using Dask and Numba” dove into using the new Xarray-Spatial library for raster-based geospatial analysis.
- Coiled Account Executive Gus Cavanaugh gave an introductory-level tutorial on “High Performance Computing with Numba, Dask and Rapids”.
- Check out the “Data Processing at Scale” workshop, given by no fewer than four core Dask maintainers, for a thorough introduction to Dask and a lively Q&A about the future of databases.
What were your top highlights from PyData Global 2021?
Thanks for reading! You can learn more about our product, Coiled Cloud, which provides hosted Dask clusters, docker-less managed software, and one-click deployments, and give it a spin for free today.