Surprising Hidden Costs With DIY Dask
October 7, 2021
Dask is a Python-native framework for parallel and distributed computing. It builds on existing Python ecosystem tools such as pandas and NumPy, and it can run on clusters, allowing IT teams to perform data engineering and advanced analytics on big datasets. You know your team is ready to scale beyond their desktops to work with more data, but that requires more processing power. So how do you estimate the cost of building your own in-house Dask service (DIY Dask) versus using Coiled's managed Dask service? Keep reading to find out!
A minimal DIY Dask service does not come close to providing the capabilities most organizations require, such as security checks, a streamlined developer experience, and fast time-to-deployment. Many IT departments are perfectly capable of building out all these capabilities but lack the time because of other high-priority DevOps projects.
Coiled takes care of scaling up and of handling multiple users and clusters, and it abstracts away low-level cloud resources and networking configuration. Coiled automatically creates and manages Dask clusters and their underlying infrastructure. Out of the box, you get industry-standard security, authentication, SSO, access control, and credential management.
Coiled is focused solely on building a Dask service, and the cost of its dedicated engineering team is shared across thousands of customers. This is why Coiled can offer organizations a better Dask service at a lower cost than a DIY Dask service.
DIY Dask Service
Using dask-cloudprovider and dask-kubernetes makes it relatively easy to create an in-house Dask service. Keep in mind, however, that DIY Dask services are bare-bones out of the box. They require significant optimization and additional features before they are production-ready.
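To illustrate how little the open-source cluster managers hand you, here is a minimal sketch of a DIY cluster factory. The helper name `make_cluster`, the backend labels, and the cluster name `"diy-dask"` are hypothetical. `FargateCluster` (dask-cloudprovider), `KubeCluster` (dask-kubernetes), and `LocalCluster` (distributed) are real classes. The `dask_kubernetes.operator` import path assumes a recent dask-kubernetes release that ships the Dask operator. Imports are deferred so that only the backend you actually call needs to be installed.

```python
from typing import Any


def make_cluster(backend: str, n_workers: int = 4) -> Any:
    """Hypothetical factory for a bare-bones DIY Dask cluster.

    Only the cluster is created; authentication, quotas, idle shutdown,
    and per-user cost tracking are NOT handled here.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    if backend == "aws-fargate":
        # dask-cloudprovider; needs AWS credentials and networking set up.
        from dask_cloudprovider.aws import FargateCluster
        return FargateCluster(n_workers=n_workers)
    if backend == "kubernetes":
        # Assumes a recent dask-kubernetes with the Dask operator
        # already installed in the target Kubernetes cluster.
        from dask_kubernetes.operator import KubeCluster
        return KubeCluster(name="diy-dask", n_workers=n_workers)
    if backend == "local":
        # Single-machine cluster, handy for testing the rest of the stack.
        from dask.distributed import LocalCluster
        return LocalCluster(n_workers=n_workers)
    raise ValueError(f"unknown backend: {backend!r}")
```

Connecting is then a matter of `Client(make_cluster("aws-fargate"))` with `from dask.distributed import Client`. Everything the rest of this post describes would still have to be built around that call.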
DIY Dask services have three types of costs:
- Development costs: the cost of closing the gap between your DIY Dask service's current features and the features you'd like to have
- Maintenance costs: the future costs of maintaining your DIY Dask service and continuing to add new features
- Sunk costs: the costs you've already incurred building your DIY Dask service
DIY Dask services are missing features that are built into Coiled
Built-in user-level cost tracking and authentication are examples of Coiled features that many DIY implementations lack. Without user-level usage data, it's hard to determine who is causing cost spikes and drops. It's also hard to allocate costs by department or project team. This can hurt productivity and eat into the time your IT department has available for analyzing data.
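As a rough illustration of what user-level cost tracking involves, here is a minimal sketch of cost attribution. It assumes a hypothetical usage log with one record per cluster a user ran, giving node-hours and a blended hourly rate. `UsageRecord` and `cost_by` are illustrative names, not part of Dask or Coiled. A DIY service would also need to collect these records reliably in the first place.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical schema: one record per cluster a user ran.
    user: str
    department: str
    node_hours: float
    hourly_rate_usd: float  # assumed blended price per node-hour

    @property
    def cost_usd(self) -> float:
        return self.node_hours * self.hourly_rate_usd


def cost_by(records: Iterable[UsageRecord], key: str) -> Dict[str, float]:
    """Sum cost per value of `key` ("user" or "department")."""
    if key not in ("user", "department"):
        raise ValueError(f"unsupported key: {key!r}")
    totals: Dict[str, float] = defaultdict(float)
    for record in records:
        totals[getattr(record, key)] += record.cost_usd
    return dict(totals)
```

Grouping the same records by `"user"` shows who drove a spike, and grouping by `"department"` supports chargeback. Both views depend on every cluster being tagged with its owner.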
DIY Dask services are usually slower than Coiled because they are limited by disk I/O, memory, CPU, or network bottlenecks, and slower jobs mean larger cloud service bills. Dask needs to be tuned for the environment it runs in; if you don't perform the necessary optimizations, your Dask workloads will run slowly. We encourage all clients to run the profiling code on their DIY setup and on Coiled to quantify the performance drag of their DIY implementation. Share your results with us on Twitter at @CoiledHQ.
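One simple way to quantify that drag is to time the identical workload on both setups. Below is a minimal timing harness; `benchmark` is a hypothetical helper. When `report_file` is given, it also wraps the runs in `dask.distributed.performance_report`, a real context manager that writes an HTML diagnostics report (task stream, worker profiles, and more). That option requires a connected `Client`.

```python
import statistics
import time
from contextlib import nullcontext
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def benchmark(workload: Callable[[], T], repeats: int = 3,
              report_file: Optional[str] = None) -> Tuple[T, float]:
    """Run `workload` `repeats` times; return (last result, median seconds)."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    if report_file is not None:
        # Real dask.distributed API; needs an active Client to attach to.
        from dask.distributed import performance_report
        ctx = performance_report(filename=report_file)
    else:
        ctx = nullcontext()
    timings = []
    result = None
    with ctx:
        for _ in range(repeats):
            start = time.perf_counter()
            result = workload()
            timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one slow warm-up run.
    return result, statistics.median(timings)
```

For example, with a connected client and a hypothetical Dask DataFrame `df`, `benchmark(lambda: df.groupby("key").value.mean().compute(), report_file="diy-report.html")` returns the result and the median runtime. Running the same call on each setup gives a like-for-like comparison.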
Many IT departments are sophisticated enough to add these features to their DIY Dask service, but doing so takes time and must be prioritized against a myriad of other competing DevOps projects. Coiled, by contrast, can focus all of its engineering resources on building Dask service features.
Coiled Managed Dask Service
Coiled uses spot instances by default and is continuing to explore ways to help clients find the cheapest clusters for their workflows based on market prices. Using Coiled Cloud as your standardized cloud environment for Dask clusters saves money in several ways:
- Costs are controlled by spending limits and quotas set per team or per user. Coiled automatically terminates clusters after 20 minutes of inactivity. Accidentally leaving clusters running overnight or over the weekend is a common, and costly, mistake.
- IT management is centralized for the full stack – infrastructure and analytic workloads – with consistent, secure environments for compute resources. The infrastructure is fully documented and meets IT standards, eliminating shadow IT.
- On-demand scalability meets workload requirements. Coiled also autoscales clusters, automatically spinning nodes up or down depending on the computational requirements of the workload. Autoscaling clusters are a great way to save money.
- Developer experience is streamlined, and productivity increases. Running secure data workloads at scale is extremely easy. Data scientists and data engineers focus on their core skills because the infrastructure has already been built for them, and IT can easily control quotas and costs.
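In a DIY deployment, you would have to reproduce these behaviors yourself. Dask does ship some building blocks. The `distributed.scheduler.idle-timeout` config option shuts a scheduler down after a period of inactivity, and `cluster.adapt(minimum=..., maximum=...)` enables adaptive scaling. The pure-Python sketch below only illustrates the policy logic. The function names and the tasks-per-worker heuristic are hypothetical simplifications, not Dask's actual adaptive-scaling algorithm.

```python
import math

# Mirrors the 20-minute idle shutdown described above.
IDLE_TIMEOUT_S = 20 * 60


def should_shut_down(last_activity_s: float, now_s: float,
                     idle_timeout_s: float = IDLE_TIMEOUT_S) -> bool:
    """Return True once the cluster has been idle for at least idle_timeout_s."""
    return now_s - last_activity_s >= idle_timeout_s


def target_workers(pending_tasks: int, tasks_per_worker: int,
                   minimum: int, maximum: int) -> int:
    """Hypothetical heuristic: enough workers to cover the backlog,
    clamped to [minimum, maximum] so idle clusters shrink and busy ones grow."""
    if tasks_per_worker <= 0 or minimum < 0 or maximum < minimum:
        raise ValueError("invalid scaling parameters")
    wanted = math.ceil(pending_tasks / tasks_per_worker)
    return max(minimum, min(maximum, wanted))
```

For example, `target_workers(250, tasks_per_worker=100, minimum=1, maximum=10)` asks for 3 workers. A backlog of 5,000 tasks is capped at 10, and an empty queue falls back to the minimum of 1.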
Do-It-Yourself Sunk Costs
Sunk costs are costs that have already been incurred and shouldn’t be factored into future decision-making. In behavioral economics terminology, “Individuals commit the sunk cost fallacy when they continue a behavior or endeavor as a result of previously invested resources (time, money or effort)” (source).
We understand why some organizations are hesitant to abandon a DIY solution that’s functioning, especially when they’ve spent a lot of time building the internal service. It’s human nature.
We recommend evaluating costs based on the current feature gap and the additional features you'll have to add in the future to maintain feature parity with Coiled. We have a team of engineers working around the clock to make the best Dask service on the market. With a DIY service, you'll need to keep devoting significant team resources to new features such as GPU support, long-term Dask version support, and operational dashboards.
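To make this recommendation concrete, here is a minimal forward-looking cost model. It compares future DIY spend (development plus maintenance) with a managed-service fee and deliberately ignores sunk costs. `DiyCosts`, `forward_cost_diy`, and `forward_cost_managed` are hypothetical names. Any figures you plug in are your own estimates; nothing here reflects actual Coiled pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiyCosts:
    sunk: float                  # already spent; deliberately ignored below
    development: float           # one-time cost to close today's feature gap
    maintenance_per_year: float  # upkeep plus new features to keep parity


def forward_cost_diy(costs: DiyCosts, years: int) -> float:
    """Future DIY spend over the horizon. Sunk costs are excluded because
    they are identical whichever option you choose."""
    return costs.development + costs.maintenance_per_year * years


def forward_cost_managed(fee_per_year: float, years: int,
                         migration: float = 0.0) -> float:
    """Future spend on a managed service, including a one-time migration."""
    return migration + fee_per_year * years
```

Because `sunk` never enters either function, two teams with very different past investments but the same gap and upkeep get the same answer. That is the point of avoiding the sunk cost fallacy.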
Advantages of Coiled Cloud for IT
SLAs: If a DIY service has a surprise weekend outage, your IT managers need to debug the problem immediately. With Coiled, they can outsource emergency management.
Dedicated Dask Support: Coiled also comes with support services that provide access to Dask experts, including the initial author of Dask. We’ve helped some clients with deeply technical performance issues make their jobs run faster, saving them a lot of money on their cloud bills.
Security: Coiled provides security controls that are standardized to protect sensitive information. Insecure services are a big vulnerability for organizations because they have read access to data lakes and permissions to provision compute power. A DIY service that gets breached could allow a malicious user to steal your data assets. IT managers can rest assured with the security responsibility delegated to Coiled.
Scale Your Clusters
Contact us to get a detailed total cost of ownership (TCO) analysis comparing your DIY Dask service and Coiled. Each TCO analysis is customized for the organization. When creating the TCO for your company, we establish a baseline from your current state, then work with you to define your ideal state, requirements, and business goals. Finally, we outline the costs to develop and maintain a DIY solution versus purchasing Coiled based on the criteria above.
Once we develop a TCO, we can help you estimate how long it would take you to build each missing feature yourself. You can rest assured that Coiled is taking a different approach to cost management and will keep building features and visualization tools to help IT professionals manage costs and budgets.
Thanks for reading! Let us know what you think of this post or reach out with any questions, we’d love to hear from you.