Coiled deploys Dask in the cloud. Coiled also charges money. This raises the following question and our (admittedly biased) answer:
> Q: Could I just build this thing myself?
> A: Yes, you could. But that’ll cost more than buying it.
Let’s explore why. We highlight dimensions of cost to consider, and then walk through two example situations.
There are three ways we can think about build vs buy in this space:
Any one of these objectives can justify Coiled on its own. Different situations highlight different objectives. We’ll make this concrete in the two scenarios below, comparing the cost of building vs the cost of buying:
You want a solution quickly, you can’t pay for much, and you want to avoid fuss.
You use Dask on your laptop and want to scale to the cloud. You consider two options: building on Dask-Cloudprovider yourself, or buying Coiled.
Computationally, you use around 10,000 CPU hours a month.
You want a tool that “just works” quickly, letting you focus on your primary task without fussing with new tools. You don’t want to buy anything complicated.
Dask-Cloudprovider runs in your account, so you'll need to set this up yourself.
Fortunately, AWS and GCP make this easy.
Dask-Cloudprovider is easy to install.
It leverages local credentials on your laptop, like a .aws/config file.
One CPU-core-hour on AWS costs roughly $0.05.
This varies if you use high memory nodes, GPUs, or Spot, but those require enough engineering that most folks avoid them to start.
You're going to waste cloud resources. Here are some common issues to be aware of:
Random shit just comes up. Some issues are recurring, like constantly modifying your Docker images. Others are one-off, like needing to update some cloud API or spelunk through billing views. Either way, you'll need to manage your cloud directly.
Two days a month is probably conservative.
Coiled runs in your account, so you'll need to set this up yourself.
Fortunately, AWS and GCP make this easy.
Coiled is easy to install.
It leverages the local credentials on your laptop, or you can deploy it from a cloud console.
Using Coiled you're able to reduce your costs in two ways:
You improve your Python/Dask code so that you do less work. This is facilitated by Coiled's observability features, idle shutdown, and more.
You use the cloud more efficiently, leveraging features like Spot, ARM, efficient networking, and more.
At or below 10,000 CPU-hours per month, Coiled doesn't cost anything. If you use more than this you'll start getting charged, but most individual users don't use enough to interest us commercially.
We'll start asking for money when we're managing $1,000 or more of cloud spend for you.
You'll still want to track cloud infrastructure, but mostly just to inspect detailed costs and update information. Two hours a month is plenty.
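To put rough numbers on this first scenario, here is a back-of-the-envelope sketch in Python. The usage, CPU-hour price, and time estimates come from the text above. The $100/hr cost of your time is an assumption borrowed from the team scenario later in this post, the eight-hour day is an assumption too, and the function and variable names are purely illustrative. It leaves out wasted cloud resources and any savings from Spot or ARM, since no figures are given for those here.

```python
# Rough monthly cost for an individual user (first scenario).
# Names are illustrative; figures come from the text unless marked as assumptions.

CPU_HOURS_PER_MONTH = 10_000
PRICE_PER_CPU_HOUR = 0.05      # rough AWS price quoted in the text
ENGINEER_HOURLY_COST = 100     # assumption: borrowed from the team scenario
HOURS_PER_DAY = 8              # assumption: one working day


def monthly_cost(cpu_hours, price_per_cpu_hour, human_hours, platform_fee=0.0):
    """Return cloud spend + cost of human time + any platform fee, in dollars."""
    cloud = cpu_hours * price_per_cpu_hour
    human = human_hours * ENGINEER_HOURLY_COST
    return cloud + human + platform_fee


# Build: Dask-Cloudprovider, about two days a month of your own time.
build = monthly_cost(CPU_HOURS_PER_MONTH, PRICE_PER_CPU_HOUR,
                     human_hours=2 * HOURS_PER_DAY)

# Buy: Coiled, no fee at this usage level, about two hours a month.
buy = monthly_cost(CPU_HOURS_PER_MONTH, PRICE_PER_CPU_HOUR, human_hours=2)

print(f"build: ${build:,.0f}/month")
print(f"buy:   ${buy:,.0f}/month")
```

Under these assumptions the cloud bill is the same either way. The difference comes entirely from how much of your own time goes into managing infrastructure.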
You want to empower your team to operate at scale so that everyone can operate like your top performers. You’re mildly cost sensitive, but mostly care about team velocity and getting shit done.
Your top performers use Dask effectively today with good success and you’d like to roll this out to the group. You consider two options: building on Dask Gateway and Kubernetes yourself, or buying Coiled.
You have a team of five data professionals. They’re solid performers and one lead engineer is capable of setting up cloud infrastructure. Computationally, you use around 100,000 CPU hours a month.
You want a tool to empower your team so they can migrate off of single large instances and look at the entire dataset, rather than subsets. You also want something you understand and can control.
We break down costs into setup and monthly costs, assuming $100/hr for total cost of an engineer (including overhead).
You will want to set up Kubernetes (GKE, EKS) on your cloud, then helm install dask, and do basic configuration. This is easy.
Additionally, you'll set up something to manage software environments, like conda-store, and probably tweak Kubernetes settings a bit. In particular, we recommend having at least a few different node sizes (S/M/L) with node pools to match, and verifying that your autoscalers can rapidly scale up and down (most cloud defaults are bad for bursty computational workloads).
You'll run into other issues as well, but this is a good starting point. Getting something workable takes one individual at least a couple of sprints in our experience, often longer.
Maintaining a Dask Gateway + Kubernetes deployment for a team takes at least a quarter FTE from one of your better engineers.
You'll be updating packages, cleaning up errant pods, updating Dask Gateway itself, adding users, and responding to mandatory cloud updates.
If you already have a Kubernetes team on-staff then some of this will be amortized with that group.
One CPU-core-hour on AWS costs roughly $0.05.
This varies if you use high memory nodes, GPUs, or Spot, but those require enough engineering that most folks avoid them to start.
You're probably not yet at the point where it makes sense to invest in Spot, ARM, and other cost-saving measures.
You're going to waste cloud resources. Here are some common issues to be aware of:
Your team of five spends at least half a day a week tracking down and debugging dumb things, probably much more. That's ok, it's part of their job, but it could be avoided with tools and experts to help.
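The build-side figures above can be added up in a few lines of Python. This is a rough sketch, not a quote: the hourly rate, CPU usage, CPU price, quarter FTE, and half a day a week come from the text, while the calendar conversions (a two-week sprint, a 160-hour month, four weeks a month, a four-hour half day) are assumptions made here for illustration. It takes the "at least" estimates at face value and ignores wasted cloud resources, so treat the result as a floor.

```python
# Rough cost of the build-it-yourself option for a five-person team.

# From the text.
ENGINEER_HOURLY_COST = 100       # total cost of an engineer, including overhead
CPU_HOURS_PER_MONTH = 100_000
PRICE_PER_CPU_HOUR = 0.05        # rough AWS price
TEAM_SIZE = 5

# Assumptions (not from the text): calendar conversions.
HOURS_PER_SPRINT = 80            # two-week sprint at 40 hours a week
FTE_HOURS_PER_MONTH = 160        # four 40-hour weeks
WEEKS_PER_MONTH = 4
HOURS_PER_HALF_DAY = 4

# One-time setup: one engineer for a couple of sprints.
setup = 2 * HOURS_PER_SPRINT * ENGINEER_HOURLY_COST

# Monthly: a quarter FTE of maintenance, the cloud bill, and time the
# team loses to debugging (half a day per person per week).
maintenance = 0.25 * FTE_HOURS_PER_MONTH * ENGINEER_HOURLY_COST
cloud = CPU_HOURS_PER_MONTH * PRICE_PER_CPU_HOUR
lost_time = TEAM_SIZE * HOURS_PER_HALF_DAY * WEEKS_PER_MONTH * ENGINEER_HOURLY_COST
monthly = maintenance + cloud + lost_time

print(f"setup:   ${setup:,.0f} one-time")
print(f"monthly: ${monthly:,.0f} "
      f"(maintenance ${maintenance:,.0f}, cloud ${cloud:,.0f}, lost time ${lost_time:,.0f})")
```

With these inputs, people's time outweighs the cloud bill, which is why the team scenario is mostly about velocity rather than hardware.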
Setup and maintenance costs are lower. You find efficiencies in your workflows and in how you use the cloud. Coiled adds roughly a 100% upcharge at this low usage.
In principle setting up Coiled is trivial: just run coiled setup.
You may want to do some additional work though, like connecting Coiled to your container registry, setting up custom tagging for billing purposes, and clearing the security model with IT.
Maintenance is cheap. Coiled doesn't require any cloud resources when idling. You will likely cross-reference our billing with your cloud's, though, and probably assess your team's performance to identify future improvements.
You'll reduce your cloud spend in two ways:
Your team wasn't maximally efficient before (surprise!), and by working with Coiled observability tools and Dask engineers you're able to find at least a 50% reduction.
Additionally, Coiled uses the cloud more efficiently, leveraging technologies like Spot, ARM, tuned instances for your workloads, and more. You're able to drop your effective CPU-hour cost to around $0.03.
Coiled charges you money. This is small though relative to the performance gains for your team, lack of distraction, and cloud efficiencies.
As you use more and can predict your usage, you'll negotiate for discounted bulk rates.
Your team still does dumb things, which take up time and burn money, but they do far less of this now that they have a tool that identifies errant behaviors, and a team of Dask/Python experts to help guide them.
Roadblocks become speedbumps.
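Here is a rough Python sketch of the cloud-side arithmetic for the Coiled option. The 100,000 CPU-hours, the $0.05 baseline price, the 50% reduction, and the $0.03 effective price come from the text. Two things are assumptions made here for illustration: that the 50% reduction applies to CPU-hours, and that a "roughly 100% upcharge" means a Coiled fee about equal to the cloud spend it manages. Savings in engineering time are left out because the text gives no figures for them.

```python
# Rough monthly cloud-side cost for the team when buying Coiled.

# From the text.
BASELINE_CPU_HOURS = 100_000
BASELINE_PRICE = 0.05            # rough AWS price per CPU-hour
EFFICIENCY_REDUCTION = 0.50      # "at least a 50% reduction"
EFFECTIVE_PRICE = 0.03           # with Spot, ARM, and tuned instances

# Assumption (not from the text): the upcharge is a fee equal to
# 100% of the cloud spend that Coiled manages.
UPCHARGE_RATE = 1.00

baseline_cloud = BASELINE_CPU_HOURS * BASELINE_PRICE

# Assumption: the 50% reduction is applied to CPU-hours used.
cpu_hours = BASELINE_CPU_HOURS * (1 - EFFICIENCY_REDUCTION)
cloud = cpu_hours * EFFECTIVE_PRICE
coiled_fee = cloud * UPCHARGE_RATE
total = cloud + coiled_fee

print(f"build-it-yourself cloud spend: ${baseline_cloud:,.0f}/month")
print(f"with Coiled: ${total:,.0f}/month "
      f"(cloud ${cloud:,.0f} + fee ${coiled_fee:,.0f})")
```

If your reduction is smaller or your upcharge is different, change the constants at the top and rerun. The structure of the estimate stays the same.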
The scenarios above focus on operational costs, but velocity may be more valuable. Often delivering a key result or accelerating a timeline can be worth more than just the costs of the hardware and the team.
Our experience is that establishing a high-quality foundation takes longer than expected when you build it yourself, which delays key results and timelines. Conversely, teams that use products like Coiled are able to focus on what they’re trained to do best, and spend their time thinking about what’s important, delivering impact to the rest of their company.