Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax

Last update: Oct 05, 2022

Overview

Clockwork VAEs in JAX/Flax

Implementation of experiments in the paper Clockwork Variational Autoencoders (project website) using JAX and Flax, ported from the official TensorFlow implementation.

Running on a single TPU v3, training is 10x faster than reported in the paper (60h -> 6h on minerl).

Method

Clockwork VAEs are deep generative models that learn long-term dependencies in video by leveraging hierarchies of representations that progress at different clock speeds. In contrast to prior video prediction methods, which typically focus on predicting sharp but short sequences in the future, Clockwork VAEs can accurately predict high-level content, such as object positions and identities, for 1000 frames.
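The idea of levels progressing at different clock speeds can be illustrated with a small sketch. It assumes that level l updates its state once every factor**l steps; the function name, the factor of 4 and the three levels are hypothetical values chosen for illustration, not settings taken from this repository.

```python
def active_levels(step, num_levels=3, factor=4):
    """Return the hierarchy levels that update at the given time step.

    Assumption for illustration: level l ticks once every factor**l steps,
    so level 0 updates on every frame and higher levels update less often.
    """
    return [level for level in range(num_levels) if step % (factor ** level) == 0]


if __name__ == "__main__":
    # Show which levels tick at a few selected steps.
    for step in (0, 1, 4, 16):
        print(step, active_levels(step))
```

Under these assumptions the top level changes only once every 16 frames, which is why it can carry slowly varying content while the bottom level tracks frame-by-frame detail.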

Clockwork VAEs build upon the Recurrent State Space Model (RSSM), so each state contains a deterministic component for long-term memory and a stochastic component for sampling diverse plausible futures. Clockwork VAEs are trained end-to-end to optimize the evidence lower bound (ELBO) that consists of a reconstruction term for each image and a KL regularizer for each stochastic variable in the model.
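The two kinds of terms in the objective can be sketched in a few lines. The example below assumes diagonal Gaussian posteriors and priors for the stochastic variables and uses plain Python lists in place of arrays; gaussian_kl and negative_elbo are hypothetical helper names, not functions from this repository.

```python
import math


def gaussian_kl(mean_q, std_q, mean_p, std_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mean_q, std_q, mean_p, std_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def negative_elbo(recon_log_probs, kl_terms):
    """Loss to minimize: minus the summed image log-likelihoods plus the summed KLs."""
    return -sum(recon_log_probs) + sum(kl_terms)


if __name__ == "__main__":
    # One KL term for a 2-dimensional stochastic variable (posterior vs. prior).
    kl = gaussian_kl([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0])
    # Two made-up per-image log-likelihoods stand in for the reconstruction terms.
    print(negative_elbo([-1.0, -2.0], [kl]))
```

In a full model there would be one reconstruction term per image and one KL term per stochastic variable, as described above; the sketch only shows how they combine into a single scalar.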

Instructions

This repository contains the code for training the Clockwork VAE model on the datasets minerl, mazes, and mmnist.

The datasets will automatically be downloaded into the --datadir directory.

python3 train.py --logdir /path/to/logdir --datadir /path/to/datasets --config configs/<dataset>.yml

The evaluation script writes open-loop video predictions in both PNG and NPZ format and plots of PSNR and SSIM to the data directory.

python3 eval.py --logdir /path/to/logdir
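For readers unfamiliar with PSNR, the standard definition can be sketched as follows. This is not the code used by eval.py; it is a minimal stand-alone version that assumes pixel values in [0, 1] and flattens each frame to a list of floats.

```python
import math


def psnr(pred, target, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # A uniform error of 0.1 on every pixel gives roughly 20 dB.
    print(psnr([0.0, 0.0, 0.0], [0.1, 0.1, 0.1]))
```

Higher values mean the predicted frame is closer to the ground truth. SSIM is more involved because it compares local luminance, contrast and structure, so it is not sketched here.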

Known differences from the original

Flax's default kernel initializer, layer precision and GRU implementation (avoiding redundant biases) are used.
For some configuration parameters, only the defaults are implemented.
Training metrics and videos are logged with wandb.
The base configuration is in config.py.

Added features:

This implementation runs on TPU out-of-the-box.
Apart from the config file, configuration can be done via the command line and wandb.
Matching the seed of a previous run will exactly repeat it.

Things to watch out for

Replication of paper results for the mazes dataset has not been confirmed yet.

Computing evaluation metrics is a memory bottleneck during training because of the large eval_seq_len. If you run out of device memory, consider lowering it during training, for example to 100. Remember to pass the original value to eval.py to get unchanged results.

Acknowledgements

Thanks to Vaibhav Saxena and Danijar Hafner for helpful discussions and to Jamie Townsend for reviewing code.


Owner

Julius Kunze
