2023 Spring Lectures in Climate Data Science

Biweekly on Thursdays || Feb. 2, 2023 - May 25, 2023

Pierre Gentine
THURSDAY || FEB. 2, 2023

Columbia University

Physics to Machine Learning and Machine Learning Back to Physics
Over the last couple of years, we have witnessed an explosion in the use of machine learning for Earth system science applications ranging from Earth monitoring to modeling. Machine learning has shown tremendous success in emulating complex physics such as atmospheric convection or terrestrial carbon and water fluxes using satellite data or high-fidelity simulations in a supervised framework. However, machine learning, especially deep learning, is opaque (the so-called black box issue) and thus a question remains: what (new) understanding have we really developed?
Here, I will illustrate the value of machine learning for understanding or discovering new processes in climate, with an application to rainfall organization. I will also present new tools merging causal discovery and machine learning to improve the trustworthiness and interpretability of machine learning for climate and physics applications.

THURSDAY || FEB. 16, 2023

Concordia University (Montréal)

Climate Knowledge Transfer in a Time of Crises, Transitions, and New Collaborations
Drawing from his recent experience at Columbia University’s Climate School, as well as with environmental justice alliances and municipal public-private climate partnerships, Jean-Noé will identify knowledge transfer challenges in the context of shifts and transitions in inter-sectoral collaboration dynamics. From that basis, Jean-Noé will then discuss a series of considerations that strengthen trust, equity, and integrated action, including the new mediation competencies of data stewards, ecosystemic evaluation and monitoring processes, humility and community wisdom-centred approaches, and the strategic place of social infrastructure in knowledge exchanges. To conclude, Jean-Noé will situate these techniques and tactics in the emerging field and practice of intersystems mediation and describe the mutually reinforcing advantages of data mapping frameworks and community data agreements for future climate knowledge networks.

THURSDAY || MAR. 2, 2023

Columbia University

Automating Discovery: From Cognitive Robotics to Particle Physics
Can robots discover scientific laws automatically? Despite the prevalence of big data and machine learning, the process of distilling data into scientific laws has resisted automation. This talk will outline a series of recent research projects, starting with self-reflecting robotic systems, and ending with machines that can formulate hypotheses, design experiments, and interpret the results, to discover new physical variables and scientific laws. We will see examples from biology to cosmology, from classical physics to modern physics, from big science to small science.

THURSDAY || MAR. 30, 2023

UC San Diego

Physics-Guided Deep Learning for Climate Dynamics
Mathematical models and computer simulations are widely used tools for understanding complex dynamics in climate, materials and infectious disease. However, existing approaches are computationally expensive at high resolution, fundamentally hindering real-time decision making in science and engineering. While deep learning has shown tremendous promise in accelerating simulations, it remains a grand challenge to incorporate physical principles in a systematic manner into the design and training of such models. In this talk, I will demonstrate how to incorporate physics into deep learning models in a principled way to accelerate simulations and assist decision-making. I will showcase the applications of these models to challenging problems in climate science and infectious disease modeling.

THURSDAY || APR. 13, 2023

Allen Institute for AI (AI2)

Corrective Machine Learning for Interpretable Improvement of Climate Models
AI2, with GFDL, has developed a corrective hybrid machine learning (ML) methodology to improve weather forecast skill and reduce climate biases in a computationally efficient coarse-grid climate model. The corrective ML is trained to emulate a time-dependent global reference by learning state-dependent ‘nudging tendencies’ of temperature, moisture and winds. The reference can be a reanalysis (for present-climate simulation) or a finer-grid version of the same model that may be more trustworthy across a range of climates. The ML is interpreted as a correction to the combined physics parameterizations of the coarse-grid model. Unlike in other emulation approaches, heat and moisture conservation are built in. We train the ML on global 25 km reference simulations in multiple climates, and separately on a year-long 3 km simulation, and apply it in 200 km coarse-grid simulations. The ML reduces annual-mean land temperature and precipitation pattern biases by up to 50% and enhances weather forecast skill.
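As a rough illustration of the nudging idea mentioned in this abstract, the sketch below computes a standard Newtonian-relaxation tendency, (reference - state) / tau, and shows how a learned correction could be added to a coarse model's physics tendency. This is not AI2's implementation: the function names, the three-hour relaxation timescale, the forward-Euler step, and the toy temperatures are all assumptions made for illustration.

```python
import numpy as np

def nudging_tendency(state, reference, tau):
    """Newtonian relaxation tendency pulling `state` toward `reference`.

    `tau` is the relaxation timescale in seconds (an assumed value below).
    """
    state = np.asarray(state, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return (reference - state) / tau

def corrected_step(state, physics_tendency, ml_correction, dt):
    """One forward-Euler step: coarse-model physics plus a learned correction.

    `physics_tendency` and `ml_correction` are hypothetical callables that map
    a state array to a tendency array of the same shape.
    """
    state = np.asarray(state, dtype=float)
    return state + dt * (physics_tendency(state) + ml_correction(state))

# Toy values (hypothetical): coarse-grid temperatures and a reference, in kelvin.
coarse_temperature = np.array([280.0, 285.0, 290.0])
reference_temperature = np.array([282.0, 285.0, 287.0])
tau = 3 * 3600.0  # assumed three-hour nudging timescale, in seconds

# These tendencies would serve as training targets for the corrective model.
targets = nudging_tendency(coarse_temperature, reference_temperature, tau)
```

In a training setup along these lines, `targets` would be regressed on the model state, and the fitted function would play the role of `ml_correction` in `corrected_step`.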

THURSDAY || APR. 27, 2023


Parameter Estimation for Improved Capacity of the Community Land Model for Actionable Science
The Community Land Model (CLM) is the land component of the Community Earth System Model (CESM). As the science of climate change evolves from questions about how much global climate change there will be to questions of what we are going to do to mitigate and adapt to this climate, the requirements for Earth System models (ESMs) are changing. Here, I will review how land models in ESMs are evolving to be applicable to an ever-growing range of questions related to food and water security, effectiveness of nature-based carbon dioxide removal methods, ecosystem vulnerability, and changes in extremes. I will focus on how parameter estimation, coupled with other model advances, can improve the utility of CLM and CESM for this broadening range of climate science objectives.

THURSDAY || MAY 4, 2023

Princeton University

Measuring And Enforcing Diversity In Machine Learning
Diversity is important for many areas of machine learning, including generative modeling, reinforcement learning, active learning, and dataset curation. Yet, little effort has gone into formalizing and understanding how to effectively measure or enforce diversity. This talk will describe the Vendi Score, a new metric for measuring diversity that connects and extends ideas from ecology and quantum mechanics. The Vendi Score is defined as the exponential of the Shannon entropy of the eigenvalues of a user-defined similarity matrix. It is general in that (1) it can be applied to any domain where similarity can be defined and (2) it doesn’t require defining a probability distribution over the collection to be evaluated for diversity. The Vendi Score can therefore be used to measure the diversity of datasets, samples from a generative model, outputs from decoding algorithms, or any collection for which we want to assess diversity. We will showcase the Vendi Score as a diversity evaluation metric in several domains and as a means to improve the exploration of molecular conformation spaces.
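For readers who want to see the metric concretely, here is a minimal sketch of the Vendi Score following its published definition: the exponential of the Shannon entropy of the eigenvalues of K / n, where K is an n-by-n similarity matrix. The sketch assumes K is symmetric positive semidefinite with ones on the diagonal; the function name, the eigenvalue cutoff, and the toy matrices are illustrative choices rather than the speaker's code.

```python
import numpy as np

def vendi_score(similarity):
    """Exponential of the Shannon entropy of the eigenvalues of K / n.

    Assumes `similarity` is a symmetric positive semidefinite matrix with
    ones on the diagonal, so the eigenvalues of K / n sum to one.
    """
    K = np.asarray(similarity, dtype=float)
    n = K.shape[0]
    eigenvalues = np.linalg.eigvalsh(K / n)
    # Drop numerically zero eigenvalues; 0 * log(0) is taken to be 0.
    eigenvalues = eigenvalues[eigenvalues > 1e-12]
    entropy = -np.sum(eigenvalues * np.log(eigenvalues))
    return float(np.exp(entropy))

# Two extreme toy collections of four items each.
all_identical = np.ones((4, 4))  # every pair of items is fully similar
all_distinct = np.eye(4)         # no two different items are similar

score_identical = vendi_score(all_identical)  # approximately 1
score_distinct = vendi_score(all_distinct)    # approximately 4
```

The two extremes show why the score reads as an effective number of distinct items: four identical items score about 1, and four mutually dissimilar items score about 4.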

THURSDAY || MAY 18, 2023

NVIDIA / UC Irvine

Adventures in hybrid physics-AI climate modeling within academia and full AI weather prediction with NVIDIA
Low-cloud-forming turbulence is a key source of climate model prediction uncertainty that, despite seeming unapproachable to simulate on planetary scales, could soon come into computational range with hybrid machine learning methods. In Part 1, I will discuss a chain of recent work spanning UCI and Columbia driving in this direction that has tried to outsource explicit computations within “multi-scale” climate models to simple neural networks. The focus will be on the unsolved challenge of controlling stubborn prognostic error growth in such hybrid AI climate models and the emerging potential of physical renormalizations to achieve “climate invariance” and prognostic reliability. Some results emerging within LEAP that try to quantify the importance of such design decisions amidst the substantial noise of hyperparameter selection and prognostic pathologies will be included.

In Part 2, I will discuss how sophisticated developments in Artificial Intelligence in industry have enabled professional data scientists to develop powerful scientific tools that greatly surpass the capabilities of traditional hand-written weather prediction codes. I will discuss NVIDIA’s AI-powered medium-range weather forecast model FourCastNet, and its imminent successors, which exhibit skill on par with the top global NWP models, such as the Integrated Forecasting System (IFS) model from the European Centre for Medium-Range Weather Forecasts (ECMWF), but with nearly instantaneous results, using a tiny fraction of the computer hardware and power. Scientific implications for massive-ensemble predictions of tail risk hazard in a changing climate will be discussed. I will conclude with an outlook on how NVIDIA is leveraging AI and GPU-accelerated computing to prototype several new technologies relevant to the world’s efforts to develop digital twins that reflect the current and projected future states of our planet.

THURSDAY || MAY 25, 2023

UC Irvine

From Compression to Convection: a Latent Variable Perspective
Latent variable models have been an integral part of probabilistic machine learning, ranging from simple mixture models to variational autoencoders to powerful diffusion probabilistic models at the center of recent media attention. Perhaps less well-appreciated is the intimate connection between latent variable models and compression, and the potential of these models for advancing natural science. I will begin by showcasing connections between variational methods and the theory and practice of neural data compression, ranging from constructing learnable codecs to assessing the fundamental compressibility of real-world data, such as images and particle physics data. I will then connect this lossy compression perspective to climate science problems, which often involve distribution shifts between unlabeled datasets, such as simulation data from different models or data simulated under different assumptions (e.g., global average temperatures). I will show that a combination of non-linear dimensionality reduction and vector quantization can assess the magnitude of these shifts and enable intercomparisons of different climate simulations. Additionally, when combined with physical model assumptions, this approach can provide insights into the implications of global warming on extreme precipitation.