Towards Representation Learning for Atmospheric Dynamics (AtmoDist)

Related tags

Deep LearningAtmoDist
Overview

Towards Representation Learning for Atmospheric Dynamics (AtmoDist)

The prediction of future climate scenarios under anthropogenic forcing is critical to understand climate change and to assess the impact of potentially counter-acting technologies. Machine learning and hybrid techniques for this prediction rely on informative metrics that are sensitive to pertinent but often subtle influences. For atmospheric dynamics, a critical part of the climate system, no well established metric exists and visual inspection is currently still often used in practice. However, this "eyeball metric" cannot be used for machine learning where an algorithmic description is required. Motivated by the success of intermediate neural network activations as basis for learned metrics, e.g. in computer vision, we present a novel, self-supervised representation learning approach specifically designed for atmospheric dynamics. Our approach, called AtmoDist, trains a neural network on a simple, auxiliary task: predicting the temporal distance between elements of a randomly shuffled sequence of atmospheric fields (e.g. the components of the wind field from reanalysis or simulation). The task forces the network to learn important intrinsic aspects of the data as activations in its layers and from these hence a discriminative metric can be obtained. We demonstrate this by using AtmoDist to define a metric for GAN-based super resolution of vorticity and divergence. Our upscaled data matches both visually and in terms of its statistics a high resolution reference closely and it significantly outperform the state-of-the-art based on mean squared error. Since AtmoDist is unsupervised, only requires a temporal sequence of fields, and uses a simple auxiliary task, it has the potential to be of utility in a wide range of applications.

Original implementation of

Hoffmann, Sebastian, and Christian Lessig. "Towards Representation Learning for Atmospheric Dynamics." arXiv preprint arXiv:2109.09076 (2021). https://arxiv.org/abs/2109.09076

presented as part of the NeurIPS 2021 Workshop on Tackling Climate Change with Machine Learning

We would like to thank Stengel et al. for openly making available their implementation (https://github.com/NREL/PhIRE) of Adversarial super-resolution of climatological wind and solar data on which we directly based the super-resolution part of this work.


Requirements

  • tensorflow 1.15.5
  • pyshtools (for SR evaluation)
  • pyspharm (for SR evaluation)
  • h5py
  • hdf5plugin
  • dask.array

Installation

pip install -e ./

This also makes available multiple command line tools that provide easy access to preprocessing, training, and evaluation routines. It's recommended to install the project in a virtual environment as to not polutte the global PATH.


CLI Tools

The provided CLI tools don't accept parameters but rather act as a shortcut to execute the corresponding script files. All parameters controlling the behaviour of the training etc. should thus be adjusted in the script files directly. We list both the command-line command, as well as the script file the command executes.

  • rplearn-data (python/phire/data_tool.py)
    • Samples patches and generates .tfrecords files from HDF5 data for the self-supervised representation-learning task.
  • rplearn-train (python/phire/rplearn/train.py)
    • Trains the representation network. By toggling comments, the same script is also used for evaluation of the trained network.
  • phire-data (python/phire/data_tool.py)
    • Samples patches and generates .tfrecords files from HDF5 data for the super-resolution task.
  • phire-train (python/phire/main.py)
    • Trains the SRGAN model using either MSE or a content-loss based on AtmoDist.
  • phire-eval (python/phire/evaluation/cli.py)
    • Evaluates trained SRGAN models using various metrics (e.g. energy spectrum, semivariogram, etc.). Generation of images is also part of this.

Project Structure

  • python/phire
    • Mostly preserved from the Stengel et al. implementation, this directory contains the code for the SR training. sr_network.py contains the actual GAN model, whereas PhIREGANs.py contains the main training loop, data pipeline, as well as interference procedure.
  • python/phire/rplearn
    • Contains everything related to representation learning task, i.e. AtmoDist. The actual ResNet models are defined in resnet.py, while the training procedure can be found in train.py.
  • python/phire/evaluation
    • Dedicated to the evaluation of the super-resolved fields. The main configuration of the evaluation is done in cli.py, while the other files mostly correspond to specific evaluation metrics.
  • python/phire/data
    • Static data shipped with the python package.
  • python/phire/jetstream
    • WiP: Prediction of jetstream latitude as downstream task.
  • scripts/
    • Various utility scripts, e.g. used to generate some of the figures seen in the paper.

Preparing the data

AtmoDist is trained on vorticity and divergence fields from ERA5 reanalysis data. The data was directly obtained as spherical harmonic coefficients from model level 120, before being converted to regular lat-lon grids (1280 x 2560) using pyshtools (right now not included in this repository).

We assume this gridded data to be stored in a hdf5 file for training and evaluation respectively containing a single dataset /data with dimensions C x T x H x W. These dimensions correspond to channel (/variable), time, height, and width respectively. Patches are then sampled from this hdf5 data and stored in .tfrecords files for training.

In practice, these "master" files actually contained virtual datasets, while the actual data was stored as one hdf5 file per year. This is however not a hard requirement. The script to create these virtual datasets is currently not included in the repository but might be at a later point of time.

To sample patches for training or evaluation run rplearn-data and phire-data.

Normalization

Normalization is done by the phire/data_tool.py script. This procedure is opaque to the models and data is only de-normalized during evaluation. The mean and standard deviations used for normalization can be specified using DataSampler.mean, DataSampler.std, DataSampler.mean_log1p, DataSampler.std_log1p. If specified as None, then these statistics will be calculated from the dataset using dask (this will take some time).


Training the AtmoDist model

  1. Specify dataset location, model name, output location, and number of classes (i.e. max delta T) in phire/rplearn/train.py
  2. Run training using rplearn-train
  3. Switch to evaluation by calling evaluate_all() and compute metrics on eval set
  4. Find optimal epoch and calculate normalization factors (for specific layer) using calculate_loss()

Training the SRGAN model

  1. Specify dataset location, model name, AtmoDist model to use, and training regimen in phire/main.py
  2. Run training using phire-train

Evaluating the SRGAN models

  1. Specify dataset location, models to evaluate, output location, and metrics to calculate in phire/evaluation/cli.py
  2. Evaluate using phire-eval
  3. Toggle the if-statement to generate comparing plots and data between different models and rerun phire-eval
Owner
Sebastian Hoffmann
Sebastian Hoffmann
MIM: MIM Installs OpenMMLab Packages

MIM provides a unified API for launching and installing OpenMMLab projects and their extensions, and managing the OpenMMLab model zoo.

OpenMMLab 254 Jan 04, 2023
Real-time VIBE: Frame by Frame Inference of VIBE (Video Inference for Human Body Pose and Shape Estimation)

Real-time VIBE Inference VIBE frame-by-frame. Overview This is a frame-by-frame inference fork of VIBE at [https://github.com/mkocabas/VIBE]. Usage: i

23 Jul 02, 2022
Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing

Cerberus Transformer: Joint Semantic, Affordance and Attribute Parsing Paper Introduction Multi-task indoor scene understanding is widely considered a

62 Dec 05, 2022
Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation, NeurIPS 2021 Spotlight

PCAN for Multiple Object Tracking and Segmentation This is the offical implementation of paper PCAN for MOTS. We also present a trailer that consists

ETH VIS Group 328 Dec 29, 2022
LightningFSL: Pytorch-Lightning implementations of Few-Shot Learning models.

LightningFSL: Few-Shot Learning with Pytorch-Lightning In this repo, a number of pytorch-lightning implementations of FSL algorithms are provided, inc

Xu Luo 76 Dec 11, 2022
Privacy-Preserving Machine Learning (PPML) Tutorial Presented at PyConDE 2022

PPML: Machine Learning on Data you cannot see Repository for the tutorial on Privacy-Preserving Machine Learning (PPML) presented at PyConDE 2022 Abst

Valerio Maggio 10 Aug 16, 2022
Config files for my GitHub profile.

Canalyst Candas Data Science Library Name Canalyst Candas Description Built by a former PM / analyst to give anyone with a little bit of Python knowle

Canalyst Candas 13 Jun 24, 2022
Framework for abstracting Amiga debuggers and access to AmigaOS libraries and devices.

Framework for abstracting Amiga debuggers. This project provides abstration to control an Amiga remotely using a debugger. The APIs are not yet stable

Roc Vallès 39 Nov 22, 2022
Extreme Dynamic Classifier Chains - XGBoost for Multi-label Classification

Extreme Dynamic Classifier Chains Classifier chains is a key technique in multi-label classification, sinceit allows to consider label dependencies ef

6 Oct 08, 2022
基于PaddleOCR搭建的OCR server... 离线部署用

开头说明 DangoOCR 是基于大家的 CPU处理器 来运行的,CPU处理器 的好坏会直接影响其速度, 但不会影响识别的精度 ,目前此版本识别速度可能在 0.5-3秒之间,具体取决于大家机器的配置,可以的话尽量不要在运行时开其他太多东西。需要配合团子翻译器 Ver3.6 及其以上的版本才可以使用!

胖次团子 131 Dec 25, 2022
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding (CVPR2022)

Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding by Qiaole Dong*, Chenjie Cao*, Yanwei Fu Paper and Supple

Qiaole Dong 190 Dec 27, 2022
Exporter for Storage Area Network (SAN)

SAN Exporter Prometheus exporter for Storage Area Network (SAN). We all know that each SAN Storage vendor has their own glossary of terms, health/perf

vCloud 32 Dec 16, 2022
Gym for multi-agent reinforcement learning

PettingZoo is a Python library for conducting research in multi-agent reinforcement learning, akin to a multi-agent version of Gym. Our website, with

Farama Foundation 1.6k Jan 09, 2023
Arquitetura e Desenho de Software.

S203 Este é um repositório dedicado às aulas de Arquitetura e Desenho de Software, cuja sigla é "S203". E agora, José? Como não tenho muito a falar aq

Fabio 7 Oct 23, 2021
A small library for creating and manipulating custom JAX Pytree classes

Treeo A small library for creating and manipulating custom JAX Pytree classes Light-weight: has no dependencies other than jax. Compatible: Treeo Tree

Cristian Garcia 58 Nov 23, 2022
Text and code for the forthcoming second edition of Think Bayes, by Allen Downey.

Think Bayes 2 by Allen B. Downey The HTML version of this book is here. Think Bayes is an introduction to Bayesian statistics using computational meth

Allen Downey 1.5k Jan 08, 2023
PaRT: Parallel Learning for Robust and Transparent AI

PaRT: Parallel Learning for Robust and Transparent AI This repository contains the code for PaRT, an algorithm for training a base network on multiple

Mahsa 0 May 02, 2022
Real-Time-Student-Attendence-System - Real Time Student Attendence System

Real-Time-Student-Attendence-System The Student Attendance Management System Pro

Rounak Das 1 Feb 15, 2022
Code for "ATISS: Autoregressive Transformers for Indoor Scene Synthesis", NeurIPS 2021

ATISS: Autoregressive Transformers for Indoor Scene Synthesis This repository contains the code that accompanies our paper ATISS: Autoregressive Trans

138 Dec 22, 2022
PyTorch implementation of PP-LCNet

PP-LCNet-Pytorch Pre-Trained Models Google Drive p018 Accuracy Models Top1 Top5 PPLCNet_x0_25 0.5186 0.7565 PPLCNet_x0_35 0.5809 0.8083 PPLCNet_x0_5 0

24 Dec 12, 2022