Blog

Scalable Python Deployments as a Service

James Bourbeau, a Dask core contributor and maintainer who builds tools for scalable computing at Coiled, joins Hugo Bowne-Anderson to discuss and write code for scalable data science deployments as a service, and to share how he thinks about them at Coiled. Coiled Cloud is an opinionated deployment-as-a-service product/library for scaling Python data science and machine learning …

Scalable Computing in Oceanography with Dask and xarray

Deepak Cherian is a physical oceanographer and project scientist at the National Center for Atmospheric Research. He recently joined us to discuss scalable computing in oceanography and how he leverages Dask, xarray (he’s a lead maintainer!), and terabyte-scale datasets to study the physics of oceans. In this post, we’ll summarize the key takeaways from the …

Interactive Image Processing at Scale with Napari

Nicholas Sofroniew, Imaging Tech Lead at the Chan Zuckerberg Initiative, and Talley Lambert, Microscopist and Lecturer at Harvard Medical School, recently joined us to chat about viewing and processing large datasets, with examples from the bio-imaging world. As developers of the napari package, they’re experts in this area, and they showed us how to get the most out of it.

Large-Scale Machine Learning for Urban Planning

Brett Naul, founding engineer at Replica, joins Matt Rocklin and Hugo Bowne-Anderson to discuss large-scale machine learning and travel simulations for urban planning. Replica uses Dask to easily scale travel simulations to hundreds of millions of agents on Google Container Engine. The rich Python data science and statistical ecosystems make it easy to build new …

Coiled: Dask for Everyone, Everywhere

Data scientists increasingly solve large machine learning and data problems with Python. Historically, though, Python struggled with parallel computing. This led many of us in the community to create Dask, a library for parallel computing and data science in Python. Dask has been a go-to solution for scalability in the Python data science stack for …

Zero Click Cloud Deployments

On this week’s Science Thursday, regulars Matt Rocklin and Hugo Bowne-Anderson are joined by guests Hamel Husain (GitHub), Chelle Gentemann (Farallon Institute), and Jeremiah Lowin (Prefect). Usually our guests show us their distributed data science work, but this time we’re turning the tables: Matt and Hugo are going to show Dask and Coiled in action …

A JupyterLab setup with a Jupyter Notebook, Dask task stream, Dask Progress, and Dask Cluster Map.

Dask in the Cloud

When doing data science and/or machine learning, it is becoming increasingly common to need to scale up your analyses to larger datasets. When working in Python and the PyData ecosystem, Dask is a popular tool for doing so. There are many reasons for this, one being that Dask composes well with all of the PyData …

Imaging Earth’s subsurface with Python and Jupyter

Lindsey Heagy, a postdoctoral researcher in the Department of Statistics at the University of California, Berkeley, joins Matt Rocklin and Hugo Bowne-Anderson to discuss scientific computing in the geosciences with Python and Jupyter. Her research uses geophysical data to develop models of the subsurface for locating groundwater, characterizing mineral deposits, and environmental applications. Research often …

A diagram of a multi-scheduler architecture in Kubernetes.

Dask in production: Multi-Scheduler architectures

I ran across an interesting problem yesterday: A company wanted to serve many Dask computations behind a web API endpoint. This is pretty common whenever people offer computation as a service or data as a service. Today the company uses the single-machine Dask scheduler inside of a web request, but they were curious about moving …

Scalable Computing in Oceanography

Deepak Cherian, a physical oceanographer and project scientist at the National Center for Atmospheric Research, joins Matt Rocklin and Hugo Bowne-Anderson to discuss scalable computing in oceanography and how he leverages Dask, Xarray, and terabyte-scale datasets to study the physics of oceans. At the National Center for Atmospheric Research, Deepak Cherian studies the physics of …

A screenshot from Coiled's YouTube live stream with Rodrigo and Felipe Aramburu from BlazingSQL.

Accelerating Data Science with BlazingSQL and Dask

Rodrigo and Felipe Aramburu, the brothers behind BlazingSQL, recently joined us to discuss how they are empowering folks around the world to do GPU-accelerated data science in Python…with SQL! There’s an entire community of data analysts out there who need to become programmatically proficient for their long-term career trajectory. And for us, SQL is a great …

A four-quadrant graph with model size on the y-axis and data size on the x-axis.

Big Data vs. Big Model: Scaling Your ML Workflow

Tom Augspurger, Data Scientist at Anaconda and lead maintainer of Dask-ML, recently joined us to discuss how he likes to think about scalable machine learning in Python. As Tom shared with us on the live stream, “You have your machine learning workflow that works well for small problems. Then there are different types of scaling …
