2024 Spring Lectures in Climate Data Science

Biweekly on Thursdays || Jan. 18, 2024 - May 23, 2024

THURSDAY || JAN. 18, 2024

MOHAMED AZIZ BHOURI (Columbia University)
KAITLYN LOFTUS (Columbia University)

LEAP Research Updates
Watch on YouTube:
Bhouri: “Multi-fidelity climate model parameterization for extrapolation beyond training
Watch on YouTube:
Loftus: “Parameterizing cloud microphysics with machine learning-enabled Bayesian parameter inference”

Carl Vondrick
THURSDAY || FEB. 1, 2024

Columbia University

Multimodal Learning from Pixels to People
People experience the world through modalities of sight, sound, words, touch, and more. By leveraging their natural relationships and developing multimodal learning methods, my research creates artificial perception systems with diverse skills, including spatial, physical, logical, and cognitive abilities, for flexibly analyzing visual data. This multimodal approach provides versatile representations for tasks like 3D reconstruction, visual question answering, and object recognition, while offering inherent explainability and excellent zero-shot generalization across tasks. By closely integrating diverse modalities, we can overcome key challenges in machine learning and enable new capabilities for computer vision, especially for the many upcoming applications where trust is required.

THURSDAY || FEB. 15, 2024

QINGYUAN YANG (Columbia University)

LEAP Research Updates
Watch on YouTube:
Yu: “ClimSim2: Coupling Global Climate Models with Neural Network Cloud Emulators”
Watch on YouTube:
Yang: “Flexible use of additive Gaussian Processes as a powerful tool for more interpretable analysis and emulation of climate model PPEs”

Ryan Abernathey
THURSDAY || FEB. 29, 2024

Columbia University / Earthmover

The Future of Earth-System Data Infrastructure
Responding to the climate crisis requires a coordinated mobilization of academic research, government agencies, and a rapidly growing suite of private companies (broadly described as “climate tech”) aimed at addressing climate change adaptation and mitigation through commercial products and services. At the heart of this work is data: exabytes of data about the earth system, originating from satellites, sensors, and simulations and passing through many stages of processing and refinement as they reach end-user applications.

The growth of AI is fueling an insatiable demand for data across the board while also producing new sources of data, as AI-driven forecasts begin to emerge. Earth system data has many more users and use cases than it did a decade ago.

These trends require us to rethink our approach to data infrastructure, which has traditionally emphasized a one-way exchange of data files from a few large data providers to data consumers. My talk will review exciting recent progress in moving towards a cloud-native earth-system-data ecosystem, incorporating lessons from my work on open-source software such as Xarray, Zarr, and Pangeo. I’ll conclude with a vision for how a truly frictionless global data infrastructure can enable a radically more effective response to the climate crisis while also empowering those most impacted by climate change to play a greater role in solutions.

THURSDAY || MAR. 14, 2024

FLORENCIO PORTOCARRERO (Columbia Business School)
JAEYOUNG JUNG (Columbia University)

LEAP Research Updates
Watch on YouTube:
Portocarrero: “Communication Frames and Actors’ Responses to Sustainability- and Climate-related Information”
Watch on YouTube:
Jung: “A Multiscale Framework for Airflow-Canopy Interaction”

TUESDAY || MAR. 26, 2024 || 2:00pm (EDT)
** Please note day + time change for this Lecture. **

Pacific Northwest National Laboratory

‘Machine’ Learning Cloud Physics: Where Do We Stand?
Cloud physics is a collection of processes that comprise the sources, evolution and sinks of water in clouds (microphysics) and their links to the dynamical environment in the atmosphere. Cloud physics is critical for both weather and climate. Important cloud processes cross many orders of magnitude in scale, and this makes them very difficult to simulate reliably from the cloud to the climate scale. Naturally, machine learning methods are providing new opportunities for advancing our understanding of clouds and how we can predict them for both weather and climate. This presentation will provide an overview and examples of how machine learning methods are being used to simulate / emulate clouds for prediction and analyze our simulations to better represent observations. The presentation will conclude with some speculation on future directions for where we might usefully apply new machine learning methods for predicting clouds and constraining uncertainties in weather and climate prediction.

THURSDAY || APR. 11, 2024

YouTube links forthcoming

AYA LAHLOU (Columbia University)
DION HO (Columbia University)

LEAP Research Updates
Lahlou: “Modeling phenology under climate change using deep learning”
Ho: “Data science and physical modeling of energy flows in the atmosphere”

THURSDAY || APR. 25, 2024

Univ. of Washington

Madden-Julian Oscillation and Atmospheric Rivers: Toward a New Paradigm for S2S Forecast of High-Impact Weather Extremes
Part 1
Atmospheric rivers transport vast quantities of water vapor globally, some of which translates into tremendous rainfall and potential flooding. For atmospheric rivers that make landfall on the West Coast of the U.S., much of that water vapor is sourced from tropical reservoirs in the Pacific Ocean like the Intertropical Convergence Zone. One such source, the Madden-Julian Oscillation, has been studied in connection with atmospheric rivers either statistically (its presence increasing the likelihood of intense rainfall) or through its modulation of jet streams in the Pacific Basin, which alters atmospheric rivers’ trajectories. The Madden-Julian Oscillation’s impact as a direct water vapor source on atmospheric river intensity and potential rainfall has been relatively unexplored, however. This portion of the talk will present research findings showing that atmospheric rivers with the Madden-Julian Oscillation as a direct water vapor source are characteristically different from, and notably more intense than, atmospheric rivers not using the Madden-Julian Oscillation as a direct water vapor reservoir. In particular, these atmospheric rivers are larger, carry more water, and move that water vapor more intensely. These results highlight the Madden-Julian Oscillation’s importance in atmospheric river intensification and as an essential part of our subseasonal-to-seasonal (S2S) forecasting repertoire.
Part 2
The Madden-Julian Oscillation is the largest source of intraseasonal variability in the tropics. This massive system can affect everything from atmospheric rivers to wildfires in the U.S. and abroad. Despite these well-studied impacts, mention of the Madden-Julian Oscillation is noticeably absent from news coverage on weather extremes. This portion of the talk will demonstrate how scientists can communicate their research in a way that is salient, digestible, and more likely to be picked up by popular media. In particular, a popular science article on the Madden-Julian Oscillation will be used as a case study for how a publicly niche, yet academically well-discussed, topic can be transformed into a reader-friendly format.

THURSDAY || APR. 25, 2024


Generative emulation of weather forecast ensembles with diffusion models
Probabilistic forecasting is crucial to decision-making under uncertainty about future weather. The dominant approach to quantify uncertainty in numerical weather prediction is to use an ensemble of forecasts. However, as the resolution of numerical models increases to meet the demand for highly accurate Digital Twins, generating ensembles will become even more computationally expensive than it is today. To overcome this issue, we propose to generate ensemble forecasts at scale by leveraging recent advances in generative artificial intelligence.

Our approach learns a data-driven probabilistic diffusion model from the 5-member ensemble GEFS reforecast dataset. The model can then be sampled efficiently to produce very large and realistic weather forecast ensembles, conditioned on a few members of the operational GEFS forecasting system. The generated ensembles have predictive skill similar to that of the full GEFS 31-member ensemble, evaluated against reanalysis, and closely emulate the statistics of large physics-based ensembles. We also apply the same methodology to developing a diffusion model for generative post-processing: the model directly learns to correct biases present in the emulated forecasting system by leveraging reanalysis data as labels during training. Ensembles from this generative post-processing model show greater reliability and accuracy, particularly in extreme event classification. In general, they are more reliable and forecast the probability of extreme weather more accurately than the GEFS operational ensemble, even after the latter is post-processed. Our models achieve these results at a small fraction of the cost incurred by the operational GEFS system.
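
For readers new to ensemble forecasting, the sketch below illustrates one common way an ensemble is turned into a probability of an extreme event, and how such probabilities can be scored against observed outcomes with a Brier score. It is only a toy illustration and not the diffusion model described in the abstract: the function names, the threshold, and the synthetic Gaussian "members" are all assumptions made for this example, and no GEFS data is used.

```python
import random


def exceedance_probability(members, threshold):
    """Fraction of ensemble members at or above the threshold."""
    return sum(m >= threshold for m in members) / len(members)


def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


# Hypothetical toy data: synthetic draws standing in for ensemble members.
rng = random.Random(0)
small_ensemble = [rng.gauss(20.0, 5.0) for _ in range(5)]
large_ensemble = [rng.gauss(20.0, 5.0) for _ in range(500)]

threshold = 28.0  # assumed "extreme" threshold, arbitrary units
p_small = exceedance_probability(small_ensemble, threshold)
p_large = exceedance_probability(large_ensemble, threshold)

print(f"5-member estimate:   {p_small:.3f}")
print(f"500-member estimate: {p_large:.3f}")
```

With only five members, the estimated probability can take just six distinct values (0, 0.2, ..., 1), which is one reason larger ensembles are attractive for rare events.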

THURSDAY || MAY 9, 2024

YouTube link forthcoming

YONGQUAN QU (Columbia University)

LEAP Research Updates
Qu: “Joint Parameter and Parameterization Inference with Uncertainty Quantification through Differentiable Programming”
Wu: “Data-driven probabilistic air-sea flux model using in-situ direct measurements”