Multidimensional Data Processing with Dask Array and Xarray

This class focuses on Dask Array and Xarray, two libraries for analyzing large volumes of multidimensional data.

Recommended for engineers and data scientists who work with large volumes of imaging, geophysical, or other multidimensional data stored in image, HDF5, or NetCDF files.

Learn Ideas and Gain Skills

  • How Dask Array extends NumPy to larger datasets
  • Select, filter, transform, and apply custom computations to data
  • Leverage Xarray indexes and labels to simplify your code
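As a minimal sketch of the first two points, assuming `numpy` and `dask` are installed, the snippet below wraps an in-memory NumPy array as a chunked Dask array and reuses ordinary NumPy-style slicing, arithmetic, and reductions. The array size and chunk shape are arbitrary illustrative choices.

```python
import numpy as np
import dask.array as da

# An ordinary NumPy array (small here, purely for illustration)
x_np = np.arange(1_000_000, dtype="float64").reshape(1000, 1000)

# Wrap it as a Dask array split into 250 x 250 chunks (a 4 x 4 grid of blocks)
x = da.from_array(x_np, chunks=(250, 250))

# NumPy-style expressions build a lazy task graph instead of computing immediately:
# select every other row, transform, then aggregate down the rows
y = (x[::2, :] + 1).mean(axis=0)

# .compute() executes the graph and returns a plain NumPy array
result = y.compute()
expected = (x_np[::2, :] + 1).mean(axis=0)
print(np.allclose(result, expected))
```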



Prerequisites

  • Python, basic level
  • NumPy, basic level

Topics

Introduction

  • Python and NumPy for multidimensional data
  • Limitations of NumPy
  • Dask Array model, Xarray model
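The Dask Array model treats a large array as a grid of smaller NumPy arrays (chunks) and builds results from per-chunk partial results. The hand-written loop below sketches that blocked idea with plain NumPy for a mean. `chunked_mean` is a hypothetical helper, not part of Dask, and in a genuinely out-of-core setting each block would be read from disk rather than sliced from an in-memory array.

```python
import numpy as np

def chunked_mean(arr, chunk_rows):
    """Mean of a 2-D array computed one row-block at a time (hypothetical helper)."""
    total = 0.0
    count = 0
    for start in range(0, arr.shape[0], chunk_rows):
        block = arr[start:start + chunk_rows]  # only this block is processed at once
        total += block.sum()                   # partial result for this chunk
        count += block.size
    return total / count                       # combine the partial results

data = np.random.default_rng(0).random((1000, 300))
print(np.isclose(chunked_mean(data, 128), data.mean()))
```

Dask applies the same pattern automatically, and can run the per-chunk steps in parallel.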

Core Array APIs and Operations

  • Loading Data
  • Slicing, processing, and aggregating data with the NumPy API
  • Chunking data for performance
  • Applying NumPy functions in parallel
  • Stacking data from custom sources
  • Integrating with Numba for performance
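A sketch covering chunking, applying a NumPy function per chunk, and stacking data from a custom source, assuming `numpy` and `dask` are installed. `load_frame` is a hypothetical stand-in for whatever reads one image or record from your own source; `dask.delayed`, `da.from_delayed`, `da.stack`, `rechunk`, and `map_blocks` are Dask APIs.

```python
import numpy as np
import dask
import dask.array as da

# Hypothetical loader standing in for reading one frame from a custom source
def load_frame(i):
    return np.full((100, 200), float(i))

# Wrap each lazy load as a Dask array with a declared shape and dtype
frames = [
    da.from_delayed(dask.delayed(load_frame)(i), shape=(100, 200), dtype="float64")
    for i in range(8)
]
stack = da.stack(frames, axis=0)       # shape (8, 100, 200), one chunk per frame

# Rechunk so that each chunk holds four frames
stack = stack.rechunk((4, 100, 200))

# A plain NumPy-style function, applied independently to every chunk
def normalize(block):
    return block / 10.0

scaled = stack.map_blocks(normalize, dtype="float64")
frame_means = scaled.mean(axis=(1, 2)).compute()
print(frame_means)
```

Because `normalize` is applied chunk by chunk, the same pattern works with any function that maps a NumPy block to a block of the same shape.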

Intro to Xarray

  • Loading data
  • DataArray and Dataset
  • Dimensions and Coordinates
  • Indexes, selecting data
  • Filtering and aggregating
  • Combining data
  • Plotting
  • Writing output
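A minimal Xarray sketch, assuming `numpy`, `pandas`, `xarray`, and `dask` are installed, with synthetic values. It shows a `DataArray` with named dimensions and coordinates, label-based selection with `sel`, filtering with `where`, aggregation over a named dimension, combining variables into a `Dataset`, and backing the data with Dask chunks.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic 3-D data: time x lat x lon, with labelled coordinates
times = pd.date_range("2020-01-01", periods=6, freq="MS")  # month starts
temp = xr.DataArray(
    np.arange(6 * 3 * 4, dtype="float64").reshape(6, 3, 4),
    dims=("time", "lat", "lon"),
    coords={"time": times, "lat": [10.0, 20.0, 30.0], "lon": [0.0, 90.0, 180.0, 270.0]},
    name="temp",
)

# Select by coordinate label rather than by integer position
spring = temp.sel(time=slice("2020-03-01", "2020-05-31"))
point = temp.sel(lat=20.0, lon=90.0)

# Filter (values <= 30 become NaN) and aggregate along a named dimension
warm = temp.where(temp > 30)
zonal_mean = temp.mean(dim="lon")      # result has dims ("time", "lat")

# Combine variables into a Dataset
ds = xr.Dataset({"temp": temp, "zonal_mean": zonal_mean})

# Back every variable with Dask chunks of two time steps each
chunked = ds.chunk({"time": 2})
```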

Best Practices

  • Optimal layouts on disk for performance
  • Managing memory and preserving results with “persist”
  • File formats
  • Chunking data
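A sketch of `persist` and chunk inspection, assuming `dask` is installed and the default local scheduler is used. The array shape and chunk sizes are illustrative, not recommendations.

```python
import dask.array as da

# Lazily generated random data: 8,000 x 500 float64, split into 8 row-blocks
x = da.random.random((8_000, 500), chunks=(1_000, 500))
print(x.numblocks, x.chunksize, x.nbytes / 1e6, "MB")

# An intermediate result that several later computations reuse
y = (x - x.mean(axis=0)) / x.std(axis=0)

# persist() computes y once and keeps its chunks in memory, so the two
# reductions below start from the stored chunks instead of regenerating x
y = y.persist()
col_max = y.max(axis=0).compute()
col_min = y.min(axis=0).compute()

# Rechunking changes the block layout, e.g. column bands spanning all rows
y_cols = y.rechunk((8_000, 100))
```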