DataFrames
Cloud data processing with Pandas or Polars. Even DuckDB. Anything but Spark.
Polars or DuckDB
Done before you can ask AI how to configure Spark.
Run Polars or DuckDB queries in the cloud.
- Run on a big cloud VM
- Run independent queries in parallel on many VMs
- Run on a schedule with Prefect or Dagster (see the scheduling sketch below)
import polars as pl
import coiled

# Specify size of machines and cloud region
@coiled.function(
    cpu=128,
    memory="512 GiB",
    region="eu-central-1",
)
def run_query(filename):
    q = (
        pl.scan_parquet(filename)
        .filter(pl.col("balance") > 0)
        .group_by("account")
        .agg(pl.all().sum())
    )
    return q.collect()

# Run queries in parallel on many machines
run_query.map(filenames)
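DuckDB follows the same pattern. Here's a rough sketch with DuckDB in place of Polars; the machine size, column names, and the run_duckdb_query function are illustrative, not a fixed recipe.

import coiled
import duckdb

# Same idea, but DuckDB runs the SQL on each machine
@coiled.function(
    cpu=32,
    memory="128 GiB",
    region="eu-central-1",
)
def run_duckdb_query(filename):
    # DuckDB reads the Parquet file directly and aggregates in SQL
    return duckdb.sql(
        f"""
        SELECT account, SUM(balance) AS balance
        FROM read_parquet('{filename}')
        WHERE balance > 0
        GROUP BY account
        """
    ).df()

# One file per machine, all in parallel
results = list(run_duckdb_query.map(filenames))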
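To run this on a schedule, wrap the call in a Prefect flow (Dagster works much the same way). A minimal sketch, assuming the run_query function above; the flow name, cron string, and bucket path are illustrative.

from prefect import flow

@flow
def nightly_report(filenames: list[str]):
    # Each file is processed on its own cloud VM via run_query above
    return list(run_query.map(filenames))

if __name__ == "__main__":
    # Serve the flow and run it every night at 2am UTC
    nightly_report.serve(
        name="nightly-report",
        cron="0 2 * * *",
        parameters={"filenames": ["s3://my-bucket/accounts/*.parquet"]},
    )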
Parallel Python with Dask
Your favorite Python libraries, at scale.
Churn through tabular data
- Load data from anywhere Pandas can
- Scale out to 100s of TiB
- Easily write custom logic on Pandas partitions (sketched below)
import coiled
import dask.dataframe as dd

cluster = coiled.Cluster(
    region="us-east-2",
    worker_memory="64 GiB",
)
client = cluster.get_client()

# Load Data
df = dd.read_parquet("s3://coiled-data/uber/")

# Query Data
df.base_passenger_fare.sum().compute()
df.driver_pay.sum().compute()
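Custom per-partition logic usually goes through map_partitions, where each partition is an ordinary pandas DataFrame. A rough sketch; the tips column and the derived tip_fraction are illustrative.

import dask.dataframe as dd

def add_tip_fraction(part):
    # `part` is a plain pandas DataFrame, so any pandas code works here
    part = part.copy()
    part["tip_fraction"] = part.tips / part.base_passenger_fare
    return part

df = dd.read_parquet("s3://coiled-data/uber/")
df = df.map_partitions(add_tip_fraction)
df.tip_fraction.mean().compute()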
Easy and familiar API
You already know pandas - this leverages your expertise.
Pandas
import pandas as pd
df = df[df.value >= 0]
joined = df.merge(other, on="id")
joined.groupby("id").value.mean()
Dask DataFrame
import dask.dataframe as dd
df = df[df.value >= 0]
joined = df.merge(other, on="id")
joined.groupby("id").value.mean().compute()
Faster than Spark
... and less painful too!
Dask DataFrame easily beats Apache Spark on standard benchmarks like TPC-H. And your sanity remains intact.
- Twice as fast, on average
- Doesn't require intense configuration
- Easier to debug (unless you love the JVM)

Delightful to use
These people said nice things about us, and we didn't even have to pay them.
"My team has started using Coiled this week. Got us up and running with clusters for ad hoc distributed workloads in no time."
Mike Bell
Data Scientist, Titan
"On my computer this takes days. Now it takes an hour. I had no experience with distributed systems."
Mohamed Akbarally
Data Scientist, With Marmalade
"I've been incredibly impressed with Coiled; it's quite literally the only piece of our entire ETL architecture that I never have to worry about."
Bobby George
Co-founder, Kestrel
"Coiled is natural and fun to use. It's Pythonic."
Lucas Gabriel Balista
Data Science Lead, Online Applications
FAQ
Is Dask DataFrame really just like pandas?
That's the promise, but it's mostly a lie.
Dask DataFrame is built from many pandas DataFrames and uses the same API, so it's very similar. In reality, though, distributed cloud computing gets complicated, and for full performance you'll run into differences.
Fortunately, Dask's dashboard is there to help you through this. And you're smart enough to debug it when needed - we give you the tools to see what's happening.
How much data can I process?
Several terabytes are easy. Hundreds are doable.
On the low end, if your data fits in memory, we recommend using Pandas or Polars. Don't scale if you don't need to. You're too smart to add complexity when it's not needed.
On the high end, Dask scales to 100s of thousands of cores and 100s of TiB of data. Above 10 TiB things get interesting and require tuning, but we're here to help. The hardest data problems need your expertise plus our infrastructure.
Can I run things other than Dask?
Coiled just turns on cloud machines. You can run whatever you like.
We started with Dask code, but the platform ended up being useful for lots of things.
You know which tool is right for your problem - we just help you scale it.
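For example, the same decorator works for any Python function, not just dataframe code. A toy sketch; the simulate function and its body are made up for illustration.

import coiled

@coiled.function(memory="64 GiB", region="us-east-2")
def simulate(seed):
    import random
    random.seed(seed)
    # ...any Python you like: scikit-learn, PyTorch, plain loops...
    return sum(random.random() for _ in range(1_000_000))

results = list(simulate.map(range(100)))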
Do you support Spark?
Actually, we do (albeit a little shamefully).
Again, Coiled can run anything. Here are the docs. We respect that you might have legacy Spark code that's still valuable.
Do you support SQL?
Not officially, but effectively yes.
It's not an official offering (there are many excellent solutions for cloud SQL today), but we do have customers who combine Coiled Batch with Trino with good results.
Get in contact if you'd like to learn more.
Get started
Know Python? Come use the cloud. Your first 10,000 CPU-hours per month are on us.
$ pip install coiled
$ coiled quickstart
Grant cloud access? (Y/n): Y
... Configuring ...
You're ready to go. 🎉