A Changing Energy Landscape – Analyzing 8 Million Homes with Distributed Python

With electric vehicles, breakthroughs in energy storage, and a growing focus on the environment, the way we use and manage energy is changing dramatically. And yet it is surprisingly difficult to answer even seemingly simple questions:

  • Is it possible to better align our energy consumption times to when renewable production is highest? Can storage help balance out this time difference? 
  • How much solar capacity is required for a particular home? 
  • How do electric vehicles fit into the picture? 

Without data, it’s nearly impossible to know which areas and actions can most effectively enable smart energy decisions. Keith Pasko, an independent researcher currently collaborating with https://www.rewiringamerica.org and Otherlab, joins us to share his analysis and modeling of home energy use. What may start as a few spreadsheets quickly scales to:

  • 8760 hourly data points, 
  • for >12 appliances/data sources, 
  • parameterized by 100s of inputs, 
  • for 80 million homes across the US. 

Keith relies on Dask and Coiled to turn these questions into data-backed answers at this enormous scale, working to make renewable energy more predictable and secure.

This post comes with an accompanying Jupyter notebook and dataset, which cover:

  • Launching a secure, custom cluster in 2 minutes
  • Loading 80+ GB of residential energy usage data for thousands of homes
  • Reconciling CSV tables from nested directories into a single dataframe
  • Storing the dataset in an optimized, compressed, columnar format – Parquet
  • Installing dependencies on the cluster in a live session using Dask and pip

View the notebook in GitHub, or launch a hosted instance on Coiled!

Understanding the energy landscape with distributed computing

Working with the US Department of Energy’s ARPA-E, Otherlab created a tool that aggregates multiple government datasets (such as RECS and CBECS) into a unified, high-fidelity map of US energy usage broken down by category. This view of the data revealed, at a broad level, where the greatest opportunities for disruption and intervention exist within the energy economy. In particular, the team identified the residential sector as a prime opportunity for making an impact: the energy we use in our homes and cars accounts for roughly 35% of the total US energy budget and ~42% of all greenhouse gas emissions. Residential heating and cooling alone consume more energy than the entirety of US chemical manufacturing (and can in fact be disrupted leveraging just one existing technology).

A detailed look: energy use in the USA (www.energyliteracy.com)

Power to the people

This focus on the residential sector carries with it other benefits. Renewable energy generation for the home has become more affordable and capable over the past decade, and distributed power generation is widely advocated as a solution to an aging power grid.

Using renewables can also significantly increase efficiency. Even with effective plant designs, roughly 30% of the energy in coal and oil is lost immediately when heat is converted to electricity, and transmission over the grid consumes still more. Locally generated renewables offer a nominally unlimited supply, in addition to being a cleaner technology.

Engaging individual citizens directly in the process of changing their energy ecosystem not only empowers them, but may also be the fastest route to change. Along with this opportunity, however, comes the challenge of understanding home energy use in far more depth than ever before.

Analysis at scale with Python and Dask

This in-depth analysis involves computationally expensive tasks: rolling aggregations, optimization calculations, clustering, pattern recognition, and more. Without proper infrastructure, the project becomes unwieldy almost immediately, even at coarse hourly resolution and for a fractional subset of the actual homes. Real-time, by-the-minute, appliance-level energy load reporting is already technically within reach of individuals and will only become more prevalent among households, making these scaling issues even more crucial to tackle.

In the accompanying notebook, we use a distributed cluster to load, format and visualize this massive dataset. We cover best practices for loading a custom, nested dataset, and saving it in a compressed, columnar format for faster access. 

Launch the notebook here and try it yourself!

Visualizing solar radiation at the US county level.

Overcoming unpredictability

A key example of what these analyses can enable comes in the form of a duck – the California ISO “duck curve” representing the mismatch between solar production and home energy usage. Leveraging renewable energy is naturally better for the environment, but how practical is it when there are large time spans of over- or under-production, either on the utility scale or via rooftop solar? 

Tricky timing: the growing mismatch between home energy usage and renewable energy production.

In order to guarantee safety and comfort we need to devise smart controls and storage for managing these offsets. 

Dask and Coiled are being leveraged to find optimal solutions — using distributed dynamic time warping to find which loads should be used at which times, or large scale linear programs for optimizing when and how much battery/storage should be used.
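Keith’s actual formulation isn’t shown in this post, but a toy version of such a battery-scheduling linear program, shrunk to a single home over a single day and with every parameter invented for illustration, can be written with scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

T = 24  # one-day planning horizon, hourly steps
rng = np.random.default_rng(1)
load = 1.0 + rng.uniform(0, 0.5, T)                       # hourly demand, kWh
solar = np.maximum(0, 2 * np.sin((np.arange(T) - 6) / 12 * np.pi))  # midday peak
net = load - solar                                        # demand left to cover

cap, rate, s0 = 5.0, 2.0, 2.5  # battery capacity, max hourly flow, initial charge

# Decision variables x = [grid imports g, battery discharges b, spilled solar e],
# each of length T. Minimize total grid import, subject to the hourly energy
# balance and the battery's state of charge staying within [0, cap].
c = np.concatenate([np.ones(T), np.zeros(T), np.zeros(T)])
A_eq = np.hstack([np.eye(T), np.eye(T), -np.eye(T)])      # g + b - e = load - solar
L = np.tril(np.ones((T, T)))                              # cumulative discharge
Z = np.zeros((T, T))
A_ub = np.block([[Z, -L, Z],                              # s0 - cumsum(b) <= cap
                 [Z, L, Z]])                              # cumsum(b) <= s0
b_ub = np.concatenate([np.full(T, cap - s0), np.full(T, s0)])
bounds = [(0, None)] * T + [(-rate, rate)] * T + [(0, None)] * T

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=net, bounds=bounds)
print(res.status, round(res.fun, 2))  # status 0 means an optimum was found
```

On a cluster, one such program per home could be submitted in parallel (for example via client.map); the toy above only shows the general structure of the objective and constraints, not the production model.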

Comparing solar generation with dinnertime cooking (for a population of 109 TMY3 zones)

One of the important products of Keith’s work in progress is a visual dashboard displaying energy production and residential consumption across the entire country. Among other purposes, it allows homeowners to make better informed decisions about the suitability of renewables for their situation.

Everyone interested will be able to look at the problem in detail, potentially leading to exciting new solutions.

Check out the accompanying notebook, and reach out to us! We’re excited to see what answers you find in the data!

Catch our conversation with Keith on March 23rd!

Tune in to our Livestream at 5 pm EST on Tuesday, 3/23 to hear more about harnessing distributed computing to make meaningful progress on important challenges. Grab your ticket by clicking below!


Sign up for updates