SLAMP: Stochastic Latent Appearance and Motion Prediction

Last update: Dec 08, 2022

Overview

Official implementation of the paper SLAMP: Stochastic Latent Appearance and Motion Prediction (Adil Kaan Akan, Erkut Erdem, Aykut Erdem, Fatma Guney), accepted and presented at ICCV 2021.

Requirements

All models were trained with Python 3.7.6 and PyTorch 1.4.0 using CUDA 10.1.

A list of required Python packages is available in the requirements.txt file.

Datasets

To prepare the datasets, we followed SRVP's code. Please follow the links below if you want to construct the datasets yourself.

Stochastic Moving MNIST

KTH

BAIR

KITTI

For KITTI, you need to download the Raw KITTI dataset and extract the zip files. You can follow the official KITTI page.

It may help to preprocess every image in the dataset to a size of (w=310, h=92). You can then disable the resizing operation in the data loaders, which speeds up training.

Cityscapes

For Cityscapes, you need to download leftImg8bit_sequence from the official Cityscapes page.

leftImg8bit_sequence contains 30-frame snippets (17Hz) surrounding each left 8-bit image (-19 | +10) from the train, val, and test sets (150000 images).

It may help to preprocess every image in the dataset to a size of (w=256, h=128). You can then disable the resizing operation in the data loaders, which speeds up training.
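The one-time resizing suggested for KITTI and Cityscapes could be sketched as below. This is a minimal sketch and not part of the repository: it assumes Pillow is installed, and the names `plan_resize_jobs`, `resize_dataset`, `TARGET_SIZES`, and `IMAGE_SUFFIXES` are hypothetical. Bilinear interpolation is also an assumption; use whichever interpolation the data loaders would otherwise apply.

```python
from pathlib import Path

# Target sizes as (width, height), taken from the text above.
TARGET_SIZES = {"kitti": (310, 92), "cityscapes": (256, 128)}
# Assumption: which file extensions count as images; adjust to your copy.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def plan_resize_jobs(src_root, dst_root):
    """Return sorted (source, destination) pairs that mirror the directory tree."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    jobs = []
    for src in sorted(src_root.rglob("*")):
        if src.is_file() and src.suffix.lower() in IMAGE_SUFFIXES:
            jobs.append((src, dst_root / src.relative_to(src_root)))
    return jobs


def resize_dataset(src_root, dst_root, dataset):
    """Write a resized copy of every image under src_root into dst_root."""
    # Pillow is third-party; it is imported here so that the planning helper
    # above stays usable without it.
    from PIL import Image

    size = TARGET_SIZES[dataset]
    jobs = plan_resize_jobs(src_root, dst_root)
    for src, dst in jobs:
        dst.parent.mkdir(parents=True, exist_ok=True)
        with Image.open(src) as img:
            # Bilinear interpolation is an assumption of this sketch.
            img.resize(size, Image.BILINEAR).save(dst)
    return len(jobs)
```

For example, `resize_dataset("raw/cityscapes", "resized/cityscapes", "cityscapes")` would write 256x128 copies into a separate tree and leave the originals untouched; both paths are placeholders.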

Training

To train a new model, run the script train.py with the options listed below for each dataset.

The data directory ($DATA_DIR) and the save directory ($SAVE_DIR) must be given with the options --data_root $DATA_DIR --log_dir $SAVE_DIR. To train on a GPU, add the --device flag.

for Stochastic Moving MNIST:

--n_past 5 --n_future 10 --n_eval 25 --z_dim_app 20 --g_dim_app 128 --z_dim_motion 20
--g_dim_motion 128 --last_frame_skip --running_avg --batch_size 32

for KTH:

--dataset kth --n_past 10 --n_future 10 --n_eval 40 --z_dim_app 50 --g_dim_app 128 --z_dim_motion 50 --model vgg
--g_dim_motion 128 --last_frame_skip --running_avg --sch_sampling 25 --batch_size 20

for BAIR:

--dataset bair --n_past 2 --n_future 10 --n_eval 30 --z_dim_app 64 --g_dim_app 128 --z_dim_motion 64 --model vgg
--g_dim_motion 128 --last_frame_skip --running_avg --sch_sampling 25 --batch_size 20 --channels 3

for KITTI:

--dataset bair --n_past 10 --n_future 10 --n_eval 30 --z_dim_app 32 --g_dim_app 64 --z_dim_motion 32 --batch_size 8
--g_dim_motion 64 --last_frame_skip --running_avg --model vgg --niter 151 --channels 3

for Cityscapes:

--dataset bair --n_past 10 --n_future 10 --n_eval 30 --z_dim_app 32 --g_dim_app 64 --z_dim_motion 32 --batch_size 7
--g_dim_motion 64 --last_frame_skip --running_avg --model vgg --niter 151 --channels 3 --epoch_size 1300
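The option lists above can be combined with the required --data_root and --log_dir options into a full command line. The following is a minimal sketch, not part of the repository: the helper `build_train_command` and the dictionary keys are hypothetical, and only three of the five datasets are shown (the others can be added the same way). Since the text does not say whether --device takes a value, the sketch leaves it to the caller through `extra_args`.

```python
import shlex

# Option strings copied from the lists above; the dictionary keys are
# hypothetical labels chosen for this sketch.
DATASET_OPTIONS = {
    "smmnist": (
        "--n_past 5 --n_future 10 --n_eval 25 --z_dim_app 20 --g_dim_app 128 "
        "--z_dim_motion 20 --g_dim_motion 128 --last_frame_skip --running_avg "
        "--batch_size 32"
    ),
    "kth": (
        "--dataset kth --n_past 10 --n_future 10 --n_eval 40 --z_dim_app 50 "
        "--g_dim_app 128 --z_dim_motion 50 --model vgg --g_dim_motion 128 "
        "--last_frame_skip --running_avg --sch_sampling 25 --batch_size 20"
    ),
    "bair": (
        "--dataset bair --n_past 2 --n_future 10 --n_eval 30 --z_dim_app 64 "
        "--g_dim_app 128 --z_dim_motion 64 --model vgg --g_dim_motion 128 "
        "--last_frame_skip --running_avg --sch_sampling 25 --batch_size 20 "
        "--channels 3"
    ),
}


def build_train_command(name, data_dir, save_dir, extra_args=()):
    """Return the argument list for train.py with one dataset's options."""
    cmd = ["python", "train.py", "--data_root", data_dir, "--log_dir", save_dir]
    cmd += shlex.split(DATASET_OPTIONS[name])
    cmd += list(extra_args)  # e.g. ["--device"] to train on a GPU
    return cmd


def format_command(cmd):
    """Render an argument list as a shell-safe string for copy and paste."""
    return " ".join(shlex.quote(arg) for arg in cmd)
```

Calling `print(format_command(build_train_command("kth", "/path/to/data", "/path/to/logs")))` prints a single line that can be pasted into a shell; the returned list can also be passed directly to `subprocess.run`.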

Testing

To evaluate a trained model, the script evaluate.py should be used as follows:

python evaluate.py --data_root $DATADIR --log_dir $LOG_DIR --model_path $MODEL_PATH

where $LOG_DIR is the directory in which the results will be saved, $DATADIR is the directory containing the test set, and $MODEL_PATH is the path to the trained model.

Important note: The directory containing the script should include a directory called lpips_weights which contains v0.1 LPIPS weights (from the official repository of The Unreasonable Effectiveness of Deep Features as a Perceptual Metric).
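Because a missing lpips_weights directory would only surface once evaluation starts, a quick check beforehand can save time. The following is a minimal sketch, not part of the repository: the helper name `lpips_weights_ready` is hypothetical, and it only checks that the directory exists and is non-empty, since the names of the weight files are not given here.

```python
from pathlib import Path


def lpips_weights_ready(script_dir):
    """Return True if script_dir/lpips_weights exists and holds at least one file."""
    weights_dir = Path(script_dir) / "lpips_weights"
    if not weights_dir.is_dir():
        return False
    # The weight file names are not assumed; any file counts.
    return any(path.is_file() for path in weights_dir.rglob("*"))
```

Running `lpips_weights_ready(".")` from the directory that contains evaluate.py tells you whether the weights are in place before you launch a long evaluation.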

To run the evaluation on GPU, use the option --device.

Pretrained weights (Dropbox links)

For MNIST:

wget https://www.dropbox.com/s/eseisehe2u0epiy/slamp_mnist.pth

For KTH:

wget https://www.dropbox.com/s/7m0806nt7xt9bz8/slamp_kth.pth

For BAIR:

wget https://www.dropbox.com/s/cl1pzs5trw3ltr0/slamp_bair.pth

For KITTI:

wget https://www.dropbox.com/s/p7wdboswakyj7yi/slamp_kitti.pth

For Cityscapes:

wget https://www.dropbox.com/s/lzwiivr1irffhsj/slamp_cityscapes.pth

PSNR, SSIM, and LPIPS results reported in the paper were obtained with the following options:

for Stochastic Moving MNIST:

python evaluate.py --data_root $DATADIR --log_dir $LOG_DIR --model_path $MODEL_PATH --n_past 5 --n_future 20

for KTH:

python evaluate.py --data_root $DATADIR --log_dir $LOG_DIR --model_path $MODEL_PATH --n_past 10 --n_future 30

for BAIR:

python evaluate.py --data_root $DATADIR --log_dir $LOG_DIR --model_path $MODEL_PATH --n_past 2 --n_future 28

for KITTI:

python evaluate.py --data_root $DATADIR --log_dir $LOG_DIR --model_path $MODEL_PATH --n_past 10 --n_future 20

for Cityscapes:

python evaluate.py --data_root $DATADIR --log_dir $LOG_DIR --model_path $MODEL_PATH --n_past 10 --n_future 20

To calculate FVD results, you can use the calculate_fvd.py script as follows:

python calculate_fvd.py $LOG_DIR $SAMPLE_NAME

where $LOG_DIR is the directory containing the results generated by the evaluate script and $SAMPLE_NAME is the file that contains the samples, such as psnr.npz, ssim.npz, or lpips.npz. The script prints the FVD value at the end.

How to Cite

Please cite the paper if you benefit from it or from this repository:

@InProceedings{Akan2021ICCV,
    author    = {Akan, Adil Kaan and Erdem, Erkut and Erdem, Aykut and Guney, Fatma},
    title     = {SLAMP: Stochastic Latent Appearance and Motion Prediction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2021},
    pages     = {14728-14737}
}

Acknowledgments

We would like to thank SRVP and SVG authors for making their repositories public. This repository contains several code segments from SRVP's repository and SVG's repository. We appreciate the efforts by Berkay Ugur Senocak for cleaning the code before release.

Releases(v1.0)

v1.0(Dec 10, 2021)

Owner

Kaan Akan
