How Space Intelligence Maps Tropical Forests at Scale with Coiled
Processing hundreds of terabytes of satellite imagery to protect the world's forests

Mapping Forests at Planetary Scale#
Space Intelligence tackles one of the most computationally demanding geospatial challenges today: creating detailed maps of the world's tropical forests to support their protection and restoration. These aren't just any maps—they cover entire countries at 10-meter resolution, processing data across multiple spectral bands and time periods.
We're a 60-person company on a mission to map the world's tropical forests to support their mass protection and restoration. We do this using huge amounts of space-borne geospatial data, remote sensing data, machine learning, and a lot of data processing.
Ben Ritchie
Head of Engineering, Space Intelligence
The scale of this work is staggering. To map a country like Brazil, the team processes hundreds of terabytes of data from multiple satellite platforms, capturing everything from optical and infrared to radar imagery. This multi-spectral, multi-temporal approach allows them to distinguish subtle differences between forest types that would be impossible to detect with simpler methods.
To map a country like Brazil, we'll be pulling together hundreds of terabytes of data covering all the scenes over four or five satellite platforms for a whole year.
Their workflow combines multiple computationally intensive stages:
- Data acquisition and preprocessing: Gathering and normalizing multi-spectral satellite imagery
- Dimension reduction: Summarizing and aggregating raw data into consistent data cubes
- Machine learning: Running classification algorithms to identify forest types and conditions
- Post-processing: Applying sieving and other techniques to produce the final maps
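The flow from raw scenes to a finished map can be sketched in miniature with synthetic data. The four stages below mirror the list above, but the array shapes, the index threshold, and the toy majority-filter "sieve" are illustrative assumptions, not Space Intelligence's actual algorithms:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Data acquisition / preprocessing: a year of scenes as a
#    (time, band, y, x) stack -- here just synthetic reflectances.
scenes = rng.random((12, 4, 64, 64)).astype("float32")

# 2. Dimension reduction: collapse the time axis into a single
#    cloud-robust composite (a per-pixel median "data cube" layer).
composite = np.median(scenes, axis=0)            # shape (band, y, x)

# 3. Machine learning stand-in: threshold a simple vegetation-style
#    index built from two of the bands.
red, nir = composite[0], composite[3]
ndvi = (nir - red) / (nir + red + 1e-6)
forest = ndvi > 0.2                              # boolean forest mask

# 4. Post-processing: a toy "sieve" -- drop isolated pixels by
#    requiring a majority of forest in each 3x3 neighbourhood.
padded = np.pad(forest, 1)
neighbours = sum(
    padded[dy:dy + 64, dx:dx + 64].astype(int)
    for dy in range(3) for dx in range(3)
)
sieved = neighbours >= 5
```

At country scale the same shape of pipeline runs per tile across hundreds of terabytes, which is where the distributed infrastructure described below comes in.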
The results power conservation efforts and carbon credit programs that fund forest protection in the global south—work that's critical in the fight against climate change.
The Python Geospatial Stack at Scale#
Space Intelligence built their technical approach on the Python geospatial ecosystem, using a combination of powerful open-source tools:
- Xarray provides their primary abstraction layer for multi-dimensional data
- Zarr serves as their intermediate format for efficient parallel access
- Cloud-Optimized GeoTIFFs (COGs) and SpatioTemporal Asset Catalogs (STAC) handle their long-term data storage and organization
- Scikit-learn powers their machine learning pipeline
- Dagster orchestrates their complex workflows
For our tech stack the primary abstraction layer we'll be writing our code in is Xarray, which just seems a nice, simple, understandable tool. It's powerful enough that our data engineers like it, and it's simple enough that our scientists like it.
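A hedged sketch of why Xarray works as that shared abstraction: labelled dimensions let the code read like the science. The dimension names, band labels, and the Zarr path in the comment are illustrative, not Space Intelligence's actual schema:

```python
import numpy as np
import pandas as pd
import xarray as xr

# A labelled (time, band, y, x) cube -- the kind of structure raw
# satellite scenes are normalised into. Values here are synthetic.
times = pd.date_range("2024-01-01", periods=12, freq="MS")
cube = xr.DataArray(
    np.random.default_rng(0).random((12, 2, 32, 32)),
    dims=("time", "band", "y", "x"),
    coords={"time": times, "band": ["red", "nir"]},
    name="reflectance",
)

# Labelled operations: a quarterly median composite, with bands
# selected by name rather than by array index.
quarterly = cube.resample(time="QS").median()
nir = quarterly.sel(band="nir")

# In a real pipeline the composite would be persisted as an
# intermediate Zarr store for efficient parallel access, e.g.:
#   quarterly.to_dataset().to_zarr("s3://bucket/composites.zarr")
```

Because both the data engineers and the scientists work against this one labelled-array interface, the same code scales from a laptop sample to a distributed cluster.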
But running this stack at the scale required for mapping entire countries pushed the limits of what's possible on individual machines. They needed a way to distribute their computation efficiently across many machines without adding complexity for their scientists and engineers.
The Journey to Efficient Cloud Computing#
Like many organizations, Space Intelligence's path to efficient cloud computing evolved through several stages:
Starting with Laptops#
They began with local processing on team members' laptops:
Very, very early on, we started off running stuff on laptops, downloading all the remote sensing data onto people's laptops, trying to run all the compute locally. As normal, that works great when you're looking at tiny areas and hits a brick wall as soon as you start to scale.
Moving to Serverless Functions#
Their first cloud approach used AWS Lambda functions, which quickly revealed limitations:
An early thing we did was to try and build our own tech from scratch in a cloud provider, in AWS. Trying to run relatively straightforward geospatial code straight on AWS functions or lambdas. And that kind of worked. It's a lot of management, and it was quite expensive because most of these serverless offerings in the cloud are up to an order of magnitude more expensive than raw VMs.
Building Their Own Dask Infrastructure#
Recognizing the need for distributed computing, they discovered Dask and built their own infrastructure:
We did build our own infrastructure from scratch. Actually, it served us really well for about a year and a half. But the plain maths of it is we spent at least an engineer-year building that infrastructure.
While functional, their DIY Kubernetes-based approach had significant inefficiencies:
When we actually dug into our old cluster, which used to run on the Pangeo stack on Kubernetes, something like half of our usage was completely wasted. It was down to cluster fragmentation.
In one particularly costly incident:
The cluster spun up to a huge size for whatever reason. And then it ended up with one container sitting on each of 100 nodes. All the other containers went away. But then all of the nodes were still held around by this one long running job. One memorable time, that one long running job managed to keep running for three days before I spotted it. And we'd kept a cluster of about 1,000 CPUs running for multiple days in the cloud, which got quite expensive.
As maintenance challenges mounted, they began looking for a better solution.
Coiled: Making Everything Easy#
After evaluating their options, Space Intelligence made the transition to Coiled and found that it simplified every aspect of their cloud computing workflow.
Easy Setup#
Despite their expectations of a complex migration, the team found Coiled surprisingly easy to implement:
The setup process was phenomenally easy. I was a little bit disarmed. I expected it to be hard. But the reality was there wasn't very much of that, not because it hadn't been thought through properly, just because it was a simple architecture.
Easy Scaling#
With Coiled, they could now process data at a truly continental scale:
The largest job we've ever run would be up to about 300 workers and we've processed data for areas up to the whole of Brazil at 10 meter resolution on Coiled, which is an awful lot of pixels.
Easy Monitoring#
The platform's metrics and observability features proved particularly valuable:
I really like its metrics views. It has actually really simple metrics—there's not very many of them, but they are clearly designed by Python users who know what actually matters. I would take four graphs of really high-quality, straight-to-the-details metrics over 100 graphs of gumpf any day of the week.
Easy Infrastructure#
Coiled's raw VM approach eliminated the inefficiencies they had experienced with Kubernetes:
The real advantage I see of the Coiled model is it's much simpler and it's much more resilient. It's just using raw VMs. It spins up effectively a new cluster of VMs for every cluster you need. So there's no real interaction between different users, different clusters.
Easy Integration#
They seamlessly integrated Coiled with their Dagster orchestration:
We plot out the whole of the high level data flow in Dagster, but do all of the real computation in Coiled. So our Dagster workers will always be very lightweight workers whose job is basically just to kick Coiled at the right time, build the task graph that Coiled needs and say, please go away and execute this. And then Coiled will do all of the heavy lifting.
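The division of labour Ben describes, where Dagster plans and Coiled computes, can be mimicked in miniature with Dask alone. The "orchestrator" step below only builds a task graph; the scheduler then does the work. The tile function and worker counts are invented for illustration, and in production the `compute()` call would target a Coiled cluster rather than the local threaded scheduler:

```python
import dask


@dask.delayed
def process_tile(tile_id):
    # Stand-in for heavy per-tile geospatial computation.
    return tile_id * tile_id


@dask.delayed
def merge(results):
    return sum(results)


# The lightweight "orchestrator" step: build the task graph only.
# No real computation happens here.
graph = merge([process_tile(i) for i in range(10)])

# Handing the graph off for execution. With Coiled this would be
# roughly:
#   import coiled
#   cluster = coiled.Cluster(n_workers=100)
#   client = cluster.get_client()
#   total = graph.compute()   # runs on the remote workers
total = graph.compute(scheduler="threads")
```

This split keeps the Dagster workers cheap and stateless while all the expensive work lands on machines that exist only for the duration of the job.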
Expert Support Across the Entire Stack#
While Space Intelligence initially focused on the platform capabilities, they discovered that Coiled's support model became one of the most valuable aspects of the partnership:
The bit that's impressed us most hasn't been that. It's been the support offering. It is much closer to a "member of your team who you can just phone up and get advice" type model than a typical "raise a ticket and then get a vague answer three days later" type model.
This support extends beyond just troubleshooting Coiled itself:
One thing that's really interesting about Coiled is their willingness to step outside the sort of narrow confines of their tech stack. Some of the most interesting interactions we've had with Coiled has been when they've helped us to debug, to optimize areas of code which are going far slower than they should be. In most cases, the root cause of those bugs hasn't been in Dask or in Coiled. In most cases, it's been in the wider ecosystem of open source Python data science libraries that we use.
From Infrastructure to Innovation#
The most significant impact of adopting Coiled has been the shift in focus from maintaining infrastructure to delivering value:
The biggest change Coiled's made is allowing us to spend less of our time worrying about data infrastructure and more of our time getting on with building great maps on top of it.
This shift from infrastructure concerns to core business innovation has accelerated their ability to deliver detailed forest maps that drive conservation efforts and support carbon credit programs.
Looking Forward#
With their data infrastructure needs addressed by Coiled, Space Intelligence can focus on expanding their mission to protect and restore the world's tropical forests.
The combination of Coiled's scalable infrastructure and Space Intelligence's expertise in geospatial analysis and machine learning positions them to deliver increasingly detailed and accurate forest maps. These maps will continue to power conservation efforts and carbon credit programs that provide economic incentives for forest protection.
Coiled feels like a safe pair of hands. The biggest change Coiled's made is allowing us to spend less of our time worrying about data infrastructure and more of our time getting on with building great maps on top of it.
Ben Ritchie
Head of Engineering, Space Intelligence
As global attention on climate change and biodiversity loss intensifies, Space Intelligence's work becomes increasingly critical. By providing the tools to monitor and protect tropical forests at scale, they're helping to ensure these vital ecosystems continue to serve as carbon sinks and biodiversity hotspots for generations to come.