How to turn Coiled into a Docker Image Pipeline

Richard Pelgrim August 23, 2021



tl;dr

You can hook Coiled up to your own Docker registry (like Docker Hub) to create a pipeline that converts your conda or pip environments into Docker images you can use elsewhere. This post shows you how to do that by calling coiled.create_software_environment() and setting the container registry backend in the “Account” settings of your Coiled Cloud dashboard.

You can also check out the video below for a step-by-step tutorial on this process.

* * *

Why you might need Docker images

Many data scientists need to convert conda and/or pip environments into Docker images, whether to collaborate across teams or to move local working environments to the cloud. Most people do this manually, using workflows that can get increasingly elaborate (for example, have a look at this Medium post). While this technically works, it’s not accessible to everyone. As a novice data scientist, I found the experience complex and difficult to master at first.

How Coiled can make this easier for you 

Coiled allows you to spin up on-demand Dask clusters in the cloud without having to worry about DevOps tasks like setting up nodes, security, scaling or even shutting the cluster down. As a cloud-based service designed for data scientists, Coiled needed to be able to convert conda and pip environments into Docker images to run correctly. The coiled.create_software_environment() command converts an environment.yml (conda) or a requirements.txt (pip) file into a Docker image, which is then distributed to all Dask workers to ensure consistent dependencies across your Coiled cluster. Below are two code snippets, one for conda and one for pip:

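import coiled

# Build a software environment from a conda environment file
coiled.create_software_environment(
    name='my-env',
    conda='<path/to/environment.yml>'
)

import coiled

# Build a software environment from a pip requirements file
coiled.create_software_environment(
    name='my-env',
    pip='<path/to/requirements.txt>'
)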

In the spirit of community and open-source development, we haven’t tucked this functionality away but have made it a general-purpose tool. This means you can hook Coiled up to your own Docker registry (like Docker Hub) to create a conda/pip-to-Docker build service. You can then use the Docker images Coiled creates anywhere you like.
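As a rough sketch of what “anywhere you like” can look like, the snippet below pulls an image from a registry and runs a command in it through the standard Docker CLI. The image reference is a placeholder, not a naming scheme documented by Coiled: look up the actual repository and tag in your own registry once the build has finished. The helper names docker_commands and pull_and_run are hypothetical, chosen only for illustration.

```python
import subprocess

# Placeholder image reference (an assumption): replace it with the repository
# and tag that appear in your own registry once Coiled has pushed the image.
IMAGE = "<your-registry-user>/<your-image>:<tag>"


def docker_commands(image, command):
    """Return the `docker pull` and `docker run` invocations as argument lists."""
    pull = ["docker", "pull", image]
    run = ["docker", "run", "--rm", image, *command]
    return pull, run


def pull_and_run(image, command):
    """Pull the image, then run one command in a throwaway container."""
    for args in docker_commands(image, command):
        subprocess.run(args, check=True)  # raises CalledProcessError on failure
```

With Docker installed and logged in to your registry, something like pull_and_run(IMAGE, ["python", "--version"]) would run inside the image, assuming the environment you built includes Python.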

How to Connect Coiled to your Docker Container Registry

By default, Coiled stores the software environments you create in the container registry of the cloud service you are running on: AWS, GCP, or Azure.

You can change this setting within the “Account” tab of your Coiled Cloud dashboard to have your software environments saved as Docker images to your Docker Hub account.

Any registry that supports the Docker Registry API V2 should work. Read more details in the “Backends” page of our docs.

Please note: using registries other than Docker Hub is an experimental feature under active development. Please reach out if you would like to discuss your use case at support@coiled.io or via our Coiled Community Slack channel.
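If you are unsure whether a registry speaks the Docker Registry API V2, one quick check is to request its /v2/ endpoint: under the V2 specification, a 200 response means the API is available, a 401 means it is available but wants credentials, and a 404 means it is not implemented. The sketch below does this with the Python standard library. The function names are hypothetical, and this is not a check that Coiled itself is known to perform.

```python
import urllib.error
import urllib.request


def speaks_registry_v2(status):
    """Interpret the status code of GET <registry>/v2/.

    200 means the V2 API is available; 401 means it is available but wants
    credentials. Anything else (notably 404) is treated as "not V2".
    """
    return status in (200, 401)


def check_registry(base_url, timeout=10):
    """Send GET <base_url>/v2/ and report whether the registry answers as V2."""
    url = base_url.rstrip("/") + "/v2/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # urllib raises on 4xx/5xx; the code is still useful
    return speaks_registry_v2(status)
```

You would call it as check_registry("https://<your-registry-host>"). Connection problems, such as a wrong host name, surface as urllib.error.URLError rather than a False result.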

And I’ll sneak in a bonus hint here: creating software environments does not actually require you to spin up a cluster. This means you can use your conda/pip-to-Docker pipeline without burning any of your Coiled credits.

Don’t tell Sales I told you!
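Here is a minimal sketch of that cluster-free pipeline, assuming you have a Coiled account and are logged in. It writes a requirements.txt from a list of packages and passes it to coiled.create_software_environment(), and nothing in it creates a cluster. The helpers write_requirements and build_image are hypothetical names introduced for this example.

```python
from pathlib import Path


def write_requirements(packages, path="requirements.txt"):
    """Write one package specifier per line and return the file path."""
    path = Path(path)
    path.write_text("\n".join(packages) + "\n")
    return path


def build_image(name, packages):
    """Create a Coiled software environment; no cluster is started here."""
    import coiled  # imported lazily so the file helper works without Coiled installed

    requirements = write_requirements(packages)
    coiled.create_software_environment(name=name, pip=str(requirements))
```

Calling build_image("my-env", ["dask", "pandas"]) triggers the build. With your own registry configured in the “Account” settings, the resulting image should end up there.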

A Personal Note: The Value of Inventing Nothing

I work at Coiled so I am, of course, a little biased, but I think this functionality is a great example of general-purpose software being opened up to be applicable outside of its original intent. It follows the counterintuitive Principle of Minimum Creativity that Matt Rocklin has often proclaimed about Dask: “Invent nothing.” 

The lesson of this mantra is simple: instead of trying to reinvent the wheel, aim to create products that integrate with the tools that people are already using, thereby adding an extra brick to the rich ecosystem that already exists. This TalkPythonToMe episode with Matt Rocklin and Hugo Bowne-Anderson from Coiled has a lot more on this.

* * *

Thanks for reading. If you’re interested in trying out Coiled Cloud, which provides hosted Dask clusters, docker-less managed software, and one-click deployments, you can do so for free today when you click below.

Try Coiled Cloud