
How the National Snow and Ice Data Center processed 200 TB of data for $150

Distributed batch processing with Coiled for NASA's cryospheric data

Introduction: NSIDC and Earth's Cryospheric Data

The National Snow and Ice Data Center (NSIDC) serves as one of NASA's 11 Distributed Active Archive Centers (DAACs), responsible for hosting and distributing critical satellite data related to Earth's cryosphere: the frozen parts of our planet including glaciers, ice sheets, and sea ice.

NSIDC is the DAAC in charge of cryospheric data for NASA. That relates to all data from satellites that observe the cryosphere, the poles and ice.

Luis Lopez

Software Engineer, National Snow and Ice Data Center

This data is essential for tracking climate change, predicting sea-level rise, and understanding Earth's complex systems. As NASA's Earth-observation fleet has grown more sophisticated, so has the volume of this data. NSIDC now manages over 6 petabytes of information from various missions, from satellites launched in the 1980s to today's advanced instruments.

The center's most data-intensive mission is ICESat-2, a satellite launched in 2018 equipped with a laser altimeter that measures the height of Earth's surface with unprecedented accuracy.

ICESat-2 is a laser altimeter in space. It's one of the most accurate lidar systems in orbit, measuring displacements by shooting individual photons to Earth. Each file is easily 10 gigabytes, and we receive many, many of those each day.

While the mission's primary focus is measuring millimeter-scale changes in ice thickness, the satellite collects data across the entire globe. Scientists now use this data for applications ranging from measuring sea roughness to detecting rogue waves in the Pacific Ocean that were previously impossible to observe.


The Challenge: Bridging Scientific Expertise and Cloud Technology

Like many scientific institutions, NSIDC is transitioning from a traditional archive facility into what NASA calls a "data science enabling center" that not only stores data but helps scientists access and process it efficiently.

Historically these DAACs were in charge of hosting and distributing information. Now with the cloud transition, we're becoming what NASA calls data science enabling centers for scientists.

This transition creates a gap between scientific expertise and cloud technology. Scientists who have developed expertise in fields like glaciology or remote sensing now need to navigate complex cloud services to perform their analyses.

Scientists aren't used to learning cloud APIs. What used to be simple, like listing a file on your hard drive, becomes a complex lesson in libraries and concepts and acronyms that have nothing to do with their science.

As a software engineer at NSIDC, Luis Lopez works to bridge these worlds. Half of his role involves creating tools and training that make cloud computing accessible to researchers who need to focus on their scientific work rather than becoming cloud experts.

When searching for solutions, Luis discovered Coiled, which offered a refreshingly simple approach to cloud computing:

Setting up Coiled was really simple. It's simpler than logging into your Google account. Just copy-paste a URL, and you have a token saved to your local drive.

For scientists working with specialized data formats and libraries, Coiled's environment synchronization eliminates a major pain point:

I mostly use Coiled for cloud orchestration. It's a one-liner to provision a cluster and specify the machine type I want. The automatic scanning and replication of my Python environment in the cloud is a killer feature.
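
In practice, that one-liner looks something like the sketch below (run after `pip install coiled` and `coiled login` in a terminal). The worker count and VM type here are illustrative placeholders, not the configuration NSIDC used:

```python
import coiled

# One call provisions a cluster of cloud VMs; Coiled scans the local
# Python environment and replicates it on the workers automatically.
cluster = coiled.Cluster(n_workers=20, worker_vm_types=["m6i.xlarge"])

# A standard Dask client connected to that cluster.
client = cluster.get_client()
```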

Success Story: The ITS_LIVE Project

Luis has used Coiled for numerous projects at NSIDC. Most recently, he worked with NASA's Jet Propulsion Laboratory (JPL) on ITS_LIVE, an initiative that maps the velocity of glaciers worldwide to understand their flow into the ocean, a critical factor in sea level rise predictions.

The project maps the velocity of glaciers—how fast they flow to the ocean—because that's what's contributing the most to sea level rise.

This project required processing data from multiple satellite missions spanning decades, from Landsat 5, launched in the 1980s, to today's Landsat 8, 9, and Sentinel satellites. The heterogeneity of this data, roughly 200 terabytes in total, complicated processing.

Using Coiled's batch functionality, Luis created an efficient processing pipeline that transformed this computational challenge into a manageable task:

I wanted to extract some data from each of the millions of files in parallel.
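
The post doesn't show the pipeline itself, so the following is only a rough sketch of that fan-out pattern using Coiled's serverless functions API (`coiled.function`), which maps a per-file task across many cloud VMs. The file list, VM type, and extraction logic are hypothetical stand-ins:

```python
import coiled
import fsspec
import xarray as xr

# Hypothetical granule URLs; the real job spanned millions of files.
file_urls = [
    "s3://example-bucket/granule-0001.nc",
    "s3://example-bucket/granule-0002.nc",
]

@coiled.function(vm_type="m6i.large")
def extract(url: str) -> dict:
    # Stand-in for the real per-file extraction logic.
    with fsspec.open(url, "rb") as f, xr.open_dataset(f, engine="h5netcdf") as ds:
        return {"file": url, "created": ds.attrs.get("date_created")}

# .map() fans the calls out across cloud VMs, one task per file,
# streaming results back as they complete.
results = list(extract.map(file_urls))
```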

The results were remarkable. With Coiled, the team processed 200 terabytes of data in less than a day:

A little less than 200 terabytes of data that I processed in less than a day. And in hindsight, I could have done it way faster.

The cost efficiency was equally impressive:

The whole computation probably would have cost me like $150. I did it multiple times because the first time you don't get it right. The two or three times that I ran it for the whole mission, I think it was like $350.

It's very cost effective—per file, it's like $0.000... cents.

Business Impact: Cost Efficiency and Scientific Acceleration

For scientific institutions like NSIDC, computational efficiency translates directly to research acceleration and more effective use of limited research funds. The most significant impact has been removing barriers that previously limited scientific exploration.

Without Coiled, processing datasets at this scale would require custom infrastructure development or complex cloud configurations that most scientists simply wouldn't attempt:

Without Coiled, it would be hard. You'd have to manually provision clusters using different schedulers or technologies. It's more complicated because you have to learn new APIs and systems.

This difference in effort changes the nature of scientific inquiry. With a simplified path to computation, scientists can ask bigger questions and process more comprehensive datasets:

The advantage of Coiled is that as long as you know basic Python, horizontal scaling becomes really simple.

For budget-conscious scientific institutions, the cost transparency Coiled provides is equally valuable:

Coiled keeps a running estimate of computation cost, which is crucial for scientists. Their first question is always 'how much will this cost?'

This visibility removes the financial uncertainty that often prevents scientists from utilizing cloud resources, allowing them to make informed decisions about the scale of their analyses.

Building for the Future: On-Demand Pangeo Forge

With the success of projects like ITS_LIVE, Luis now envisions creating more accessible data transformation pipelines for the scientific community. He's particularly excited about developing an on-demand Pangeo Forge workflow that would democratize access to transformed satellite data.

I'm excited about creating an on-demand Pangeo Forge workflow. Pangeo Forge is a project that runs code in the cloud to transform raw satellite data into data cubes that are more useful for scientists, but it requires significant computation.

These data cubes make complex datasets more accessible to researchers who lack the computational expertise to process raw satellite data. By using Coiled as the computational engine for this transformation, Luis hopes to create a system where more scientists can generate derived data products:

If we can use Coiled to implement Pangeo Forge workflows where we take satellite data and integrate it into data cubes, then other scientists could generate their own derived data products more easily.
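
As a rough illustration of what "integrating satellite data into data cubes" can look like in Python, here is a minimal sketch that stacks many per-scene files into one analysis-ready dataset. The file pattern, chunk sizes, and output path are made-up placeholders, and a real Pangeo Forge recipe does considerably more:

```python
import xarray as xr

# Lazily combine many per-scene files into a single "cube",
# chunked with Dask so it scales past memory.
cube = xr.open_mfdataset(
    "velocity-granules/*.nc",
    combine="by_coords",
    chunks={"x": 1024, "y": 1024},
)

# Persist the cube as Zarr, a cloud-friendly format that scientists
# can open directly instead of touching the raw granules.
cube.to_zarr("glacier-velocity-cube.zarr", mode="w")
```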

This approach could transform how scientists interact with NASA's data archives, allowing them to focus on analysis and discovery rather than data engineering. For a field where technological barriers often restrict scientific progress, simplifying access to computation could accelerate research across multiple disciplines.

Conclusion: Accelerating Science Through Accessible Tools

The work Luis is doing at NSIDC represents a successful bridge between scientific expertise and cloud technology, two domains that often struggle to connect effectively. By providing simple, straightforward tools for complex cloud operations, Coiled allows scientists to leverage modern computational capabilities without a steep learning curve.

Coiled has simplified how I scale Python code in the cloud. We have one-liners in the terminal for our laptops. With Coiled, we now have one-liners in the cloud.

Luis Lopez

Software Engineer, National Snow and Ice Data Center

As climate research becomes increasingly critical for understanding our changing planet, tools that accelerate scientific discovery and make data more accessible play an essential role. For scientists studying the cryosphere, the ability to process petabytes of data efficiently means better models, more accurate predictions, and ultimately more informed climate policies.

The future of scientific computing lies not just in more powerful machines but in more accessible tools that bring computational abundance to experts across disciplines. Through the bridge Luis is building between scientific expertise and cloud computing, he's helping fulfill NASA's vision of not just archiving data, but enabling the science that makes that data valuable.