Scaling Pandas with Dask, with Matthew Powers
Pandas is a great tool for working with tabular data, but users frequently run into out-of-memory errors as datasets grow. This talk explains how to process larger datasets with Dask, when it is appropriate to scale up a computation (using more cores on a local machine), and when it is better to scale out to a cluster. It also covers how other types of computations can be scaled with Dask and offers some comparisons with Spark.
About the speaker: Matthew Powers is an Evangelist at Coiled, a managed Dask company. He worked with Spark for 6 years before transitioning to Dask and runs a popular programming blog, mungingdata.com. He has written two Spark books and maintains several popular open-source libraries.
Join us Wednesday, November 3rd at 6:35 pm US CDT to dive into the wonderful world of scaling Pandas with Dask.