Coiled Runtime: Getting the Most out of Dask
• June 24, 2022
If you’ve been working within the Python ecosystem for a while, you may already be familiar with the power of Dask. Dask is a Python library that makes it easy to scale native Python scripts to run on multiple CPU cores in just a few seconds and with only a few, additional lines of code. This is great if you’re running code locally on your laptop as it allows you to take advantage of all the parallel processing your machine has to offer without having to get into the weeds of multicore programming.
However, things get tricky when you want to scale your code to run across multiple distributed machines. Doing so introduces a whole host of new challenges, from provisioning cloud resources, to handling instance failures, to coordinating data synchronization across machines, and to securing your cloud environment.
This is where Coiled and Coiled Runtime come in. Coiled brings the easy-to-use, intuitive features of Dask to large-scale clusters and cloud runtimes. It does so by providing a simple, native Python API via which you can easily scale to the cloud while letting Coiled’s backend handle the provisioning and managing of cloud resources.
Accessing Coiled Runtime
The first step towards integrating Coiled into your Python projects is accessing the coiled-runtime. Coiled runtime is a conda metapackage which is easily installable and configurable on top of Dask. It consists of a number of common data science frameworks and tools that are often used by engineers and data scientists working with Dask. What differentiates coiled-runtime is that the distribution of packages it provides has been meticulously tested to work seamlessly together and scale stably in the cloud.
conda install -c conda-forge coiled-runtime
If you prefer to do this while simultaneously creating a dedicated conda environment (which we recommend), run instead
conda create -n <env_name> python=<python_version> coiled-runtime
Of course, you’ll need to replace <env_name> and <python_version> with your desired environment name and python version, respectively.
The full recipe of packages that are included in the coiled-runtime can be found here. As you can see in the file, it includes common data science tools and libraries such as NumPy, scikit-learn, JupyterLab, pandas, and, of course, Dask.
Spinning up a Simple Cluster
You’ve likely installed coiled-runtime because you have an application that you want to scale to the cloud. Initializing a cluster is super easy. You can simply enter ipython and run the following two commands
>>> from coiled import Cluster
>>> cluster = Cluster(n_workers=10)
Where n_workers specifies the number of worker machines you want to recruit to run the program.
By default, your cluster will use the coiled/default-<python-version> software environment where <python-version> is your local python version. This environment includes coiled-runtime as a dependency, and hence, ensures your local machine and cluster environments are always compatible.
You should see a readout launch that looks much like the one below
From this dashboard, you’ll be able to monitor the progress of Coiled as it launches a cluster and begins recruiting AWS machines. The progress monitoring is pretty granular, displaying as discrete steps the processes of provisioning, booting instances, and launching software environments. You also get a readout specifying how many instances are in any of the “Ready”, “Stopping”, or “Stopped” states at any given time. The dashboard also displays details about worker instance configurations, such as the AWS region, scheduler instance type, and worker instance types.
You can also check our cluster state anytime on your dashboard at cloud.coiled.io.
Getting Dask up and running in the cloud with coiled-runtime is as simple as that! For information on some of the more complex things that you can accomplish with Coiled, check out the docs. Remember that the most important part of working with Coiled is actually to spend the bulk of your time building your Python application. Coiled is designed to get out of your way as much as possible so that deploying your Python applications to the cloud is painless.