Unsupervised Video Interpolation using Cycle Consistency

Overview

Unsupervised Video Interpolation using Cycle Consistency

Project | Paper | YouTube

Unsupervised Video Interpolation using Cycle Consistency
Fitsum A. Reda, Deqing Sun*, Aysegul Dundar, Mohammad Shoeybi, Guilin Liu, Kevin J. Shih, Andrew Tao, Jan Kautz, Bryan Catanzaro
NVIDIA Corporation
In International Conferene on Computer Vision (ICCV) 2019.
( * Currently affiliated with Google. )

Installation
# Get unsupervised video interpolation source codes
git clone https://github.com/NVIDIA/unsupervised-video-interpolation.git
cd unsupervised-video-interpolation
mkdir pretrained_models

# Build Docker Image
docker build -t unsupervised-video-interpolation -f Dockerfile .

If you prefer not to use docker, you can manually install the following requirements:

  • An NVIDIA GPU and CUDA 9.0 or higher. Some operations only have gpu implementation.
  • PyTorch (>= 1.0)
  • Python 3
  • numpy
  • scikit-image
  • imageio
  • pillow
  • tqdm
  • tensorboardX
  • natsort
  • ffmpeg
  • torchvision

To propose a model or change for inclusion, please submit a pull request.

Multiple GPU training and mixed precision training are supported, and the code provides examples for training and inference. For more help, type

python3 train.py --help

Network Architectures

Our repo now supports Super SloMo. Other video interpolation architectures can be integrated with our repo with minimal changes, for instance DVF or SepConv.

Pre-trained Models

We've included pre-trained models trained with cycle consistency (CC) alone, or with cycle consistency with Psuedo-supervised (CC + PS) losses.
Download checkpoints to a folder pretrained_models.

Supervised Baseline Weights

Unsupervised Finetuned Weights

Fully Unsupervised Weights for UCF101 evaluation

Data Loaders

We use VideoInterp and CycleVideoInterp (in datasets) dataloaders for all frame sequences, i.e. Adobe, YouTube, SlowFlow, Sintel, and UCF101.

We split Slowflow dataset into disjoint sets: A low FPS training (3.4K frames) and a high FPS test (414 frames) subset. We form the test set by selecting the first nine frames in each of the 46 clips, and train set by temporally sub-sampling the remaining frames from 240-fps to 30-fps. During evaluation, our models take as input the first and ninth frame in each test clip and interpolate seven intermediate frames. We follow a similar procedure for Sintel-1008fps, but interpolate 41 intermediate frames, i.e., conversion of frame rate from 24- to 1008-fps. Note, since SlowFlow and Sintel are of high resolution, we downsample all frames by a factor of 2 isotropically.
All training and evaluations presented in the paper are done on the spatially downsampled sequences.

For UCF101, we simply use the the test provided here.

Generating Interpolated Frames or Videos

  • --write_video and --write_images, if enabled will create an interpolated video and interpolated frame sequences, respectively.
#Example creation of interpolated videos, where we interleave low FPS input frames with one or more interpolated intermediate frames.
python3 eval.py --model CycleHJSuperSloMo --num_interp 7 --flow_scale 2 --val_file ${/path/to/input/sequences} \
    --name ${video_name} --save ${/path/to/output/folder} --post_fix ${output_image_tag} \
    --resume ${/path/to/pre-trained/model} --write_video
  • If input sequences for interpolation do not contain ground-truth intermediate frames, add --val_sample_rate 0 and --val_step_size 1 to the example script above.
  • For a simple test on two input frames, set --val_file to the folder containing both frames, and set --val_sample_rate 0, --val_step_size 1.

Images : Results and Comparisons

.
.
.

Inference for Unsupervised Models

  • UCF101: A total of 379 folders, each with three frames, with the middle frame being the ground-truth for a single frame interpolation.
# Evaluation of model trained with CC alone on Adobe-30fps dataset
# PSNR: 34.47, SSIM: 0.946, IE: 5.50
python3 eval.py --model CycleHJSuperSloMo --num_interp 1 --flow_scale 1 --val_file /path/to/ucf/root \
    --resume ./pretrained_models/fully_unsupervised_adobe30fps.pth
# Evaluation of model trained with CC alone on Battlefield-30fps dataset
# PSNR: 34.55, SSIM: 0.947, IE: 5.38
python3 eval.py --model CycleHJSuperSloMo --num_interp 1 --flow_scale 1 --val_file /path/to/ucf/root \
    --resume ./pretrained_models/fully_unsupervised_battlefield30fps.pth
  • SlowFlow: A total of 46 folders, each with nine frames, with the intermediate nine frames being ground-truths for a 30->240FPS multi-frame interpolation.
# Evaluation of model trained with CC alone on SlowFlow-30fps train split
# PSNR: 32.35, SSIM: 0.886, IE: 6.78
python3 eval.py --model CycleHJSuperSloMo --num_interp 7 --flow_scale 2 --val_file /path/to/SlowFlow/val \
    --resume ./pretrained_models/unsupervised_random2slowflow.pth
# Evaluation of model finetuned with CC+PS losses on SlowFlow-30fps train split.
# Model pre-trained with supervision on Adobe-240fps.
# PSNR: 33.05, SSIM: 0.890, IE: 6.62
python3 eval.py --model CycleHJSuperSloMo --num_interp 7 --flow_scale 2 --val_file /path/to/SlowFlow/val \
    --resume ./pretrained_models/unsupervised_adobe2slowflow.pth
# Evaluation of model finetuned with CC+PS losses on SlowFlow-30fps train split.
# Model pre-trained with supervision on Adobe+YouTube-240fps.
# PSNR: 33.20, SSIM: 0.891, IE: 6.56
python3 eval.py --model CycleHJSuperSloMo --num_interp 7 --flow_scale 2 --val_file /path/to/SlowFlow/val \
    --resume ./pretrained_models/unsupervised_adobe+youtube2slowflow.pth
  • Sintel: A total of 13 folders, each with 43 frames, with the intermediate 41 frames being ground-truths for a 30->1008FPS multi-frame interpolation.
We simply use the same commands used for SlowFlow, but setting `--num_interp 41`
and the corresponding `--resume *2sintel.pth` pre-trained models should lead to the number we presented in our papers.

Inference for Supervised Baseline Models

  • UCF101: A total of 379 folders, each with three frames, with the middle frame being the ground-truth for a single frame interpolation.
# Evaluation of model trained with Paird-GT on Adobe-240fps dataset
# PSNR: 34.63, SSIM: 0.946, IE: 5.48
python3 eval.py --model HJSuperSloMo --num_interp 1 --flow_scale 1 --val_file /path/to/ucf/root \
    --resume ./pretrained_models/baseline_superslomo_adobe.pth
  • SlowFlow: A total of 46 folders, each with nine frames, with the intermediate nine frames being ground-truths for a 30->240FPS multi-frame interpolation.
# Evaluation of model trained with paird-GT on Adobe-240fps dataset
# PSNR: 32.84, SSIM: 0.887, IE: 6.67
python3 eval.py --model HJSuperSloMo --num_interp 7 --flow_scale 2 --val_file /path/to/SlowFlow/val \
    --resume ./pretrained_models/baseline_superslomo_adobe.pth
# Evaluation of model trained with paird-GT on Adobe+YouTube-240fps dataset
# PSNR: 33.13, SSIM: 0.889, IE: 6.63
python3 eval.py --model HJSuperSloMo --num_interp 7 --flow_scale 2 --val_file /path/to/SlowFlow/val \
    --resume ./pretrained_models/baseline_superslomo_adobe+youtube.pth
  • Sintel: We use commands similar to SlowFlow, but setting --num_interp 41.

Training and Reproducing Our Results

# CC alone: Fully unsupervised training on SlowFlow and evaluation on SlowFlow
# SlowFlow/val target PSNR: 32.35, SSIM: 0.886, IE: 6.78
python3 -m torch.distributed.launch --nproc_per_node=16 train.py --model CycleHJSuperSloMo \
    --flow_scale 2.0 --batch_size 2 --crop_size 384 384 --print_freq 1 --dataset CycleVideoInterp \
    --step_size 1 --sample_rate 0 --num_interp 7 --val_num_interp 7 --skip_aug --save_freq 20 --start_epoch 0 \
    --train_file /path/to/SlowFlow/train --val_file SlowFlow/val --name unsupervised_slowflow --save /path/to/output 

# --nproc_per_node=16, we use a total of 16 V100 GPUs over two nodes.
# CC + PS: Unsupervised fine-tuning on SlowFlow with a baseline model pre-trained on Adobe+YouTube-240fps.
# SlowFlow/val target PSNR: 33.20, SSIM: 0.891, IE: 6.56
python3 -m torch.distributed.launch --nproc_per_node=16 train.py --model CycleHJSuperSloMo \
    --flow_scale 2.0 --batch_size 2 --crop_size 384 384 --print_freq 1 --dataset CycleVideoInterp \
    --step_size 1 --sample_rate 0 --num_interp 7 --val_num_interp 7 --skip_aug --save_freq 20 --start_epoch 0 \
    --train_file /path/to/SlowFlow/train --val_file /path/to/SlowFlow/val --name finetune_slowflow \
    --save /path/to/output --resume ./pretrained_models/baseline_superslomo_adobe+youtube.pth
# Supervised baseline training on Adobe240-fps and evaluation on SlowFlow
# SlowFlow/val target PSNR: 32.84, SSIM: 0.887, IE: 6.67
python3 -m torch.distributed.launch --nproc_per_node=16 train.py --model HJSuperSloMo \
    --flow_scale 2.0 --batch_size 2 --crop_size 352 352 --print_freq 1 --dataset VideoInterp \
    --num_interp 7 --val_num_interp 7 --skip_aug --save_freq 20 --start_epoch 0 --stride 32 \
    --train_file /path/to/Adobe-240fps/train --val_file /path/to/SlowFlow/val --name supervised_adobe \
    --save /path/to/output

Reference

If you find this implementation useful in your work, please acknowledge it appropriately and cite the paper or code accordingly:

@InProceedings{Reda_2019_ICCV,
author = {Fitsum A Reda and Deqing Sun and Aysegul Dundar and Mohammad Shoeybi and Guilin Liu and Kevin J Shih and Andrew Tao and Jan Kautz and Bryan Catanzaro},
title = {Unsupervised Video Interpolation Using Cycle Consistency},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019},
url={https://nv-adlr.github.io/publication/2019-UnsupervisedVideoInterpolation}
}

We encourage people to contribute to our code base and provide suggestions, point any issues, or solution using merge request, and we hope this repo is useful.

Acknowledgments

Parts of the code were inspired by NVIDIA/flownet2-pytorch, ClementPinard/FlowNetPytorch, and avinashpaliwal/Super-SloMo.

We would also like to thank Huaizu Jiang.

Coding style

  • 4 spaces for indentation rather than tabs
  • 80 character line length
  • PEP8 formatting
Owner
NVIDIA Corporation
NVIDIA Corporation
Kaggle Lyft Motion Prediction for Autonomous Vehicles 4th place solution

Lyft Motion Prediction for Autonomous Vehicles Code for the 4th place solution of Lyft Motion Prediction for Autonomous Vehicles on Kaggle. Discussion

44 Jun 27, 2022
SphereFace: Deep Hypersphere Embedding for Face Recognition

SphereFace: Deep Hypersphere Embedding for Face Recognition By Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj and Le Song License SphereFa

Weiyang Liu 1.5k Dec 29, 2022
eXPeditious Data Transfer

xpdt: eXPeditious Data Transfer About xpdt is (yet another) language for defining data-types and generating code for serializing and deserializing the

Gianni Tedesco 3 Jan 06, 2022
Perform Linear Classification with Multi-way Data

MultiwayClassification This is an R package to perform linear classification for data with multi-way structure. The distance-weighted discrimination (

Eric F. Lock 2 Dec 15, 2020
FastReID is a research platform that implements state-of-the-art re-identification algorithms.

FastReID is a research platform that implements state-of-the-art re-identification algorithms.

JDAI-CV 2.8k Jan 07, 2023
Defending graph neural networks against adversarial attacks (NeurIPS 2020)

GNNGuard: Defending Graph Neural Networks against Adversarial Attacks Authors: Xiang Zhang ( Zitnik Lab @ Harvard 44 Dec 07, 2022

Multi Task RL Baselines

MTRL Multi Task RL Algorithms Contents Introduction Setup Usage Documentation Contributing to MTRL Community Acknowledgements Introduction M

Facebook Research 171 Jan 09, 2023
An example project demonstrating how the Autonomous Learning Library can be used to build new reinforcement learning agents.

About This repository shows how Autonomous Learning Library can be used to build new reinforcement learning agents. In particular, it contains a model

Chris Nota 5 Aug 30, 2022
Improving Query Representations for DenseRetrieval with Pseudo Relevance Feedback:A Reproducibility Study.

APR The repo for the paper Improving Query Representations for DenseRetrieval with Pseudo Relevance Feedback:A Reproducibility Study. Environment setu

ielab 8 Nov 26, 2022
Point-NeRF: Point-based Neural Radiance Fields

Point-NeRF: Point-based Neural Radiance Fields Project Sites | Paper | Primary c

Qiangeng Xu 662 Jan 01, 2023
Yolo algorithm for detection + centroid tracker to track vehicles

Vehicle Tracking using Centroid tracker Algorithm used : Yolo algorithm for detection + centroid tracker to track vehicles Backend : opencv and python

6 Dec 21, 2022
HackBMU-5.0-Team-Ctrl-Alt-Elite - HackBMU 5.0 Team Ctrl Alt Elite

HackBMU-5.0-Team-Ctrl-Alt-Elite The search is over. We present to you ‘Health-A-

3 Feb 19, 2022
Megaverse is a new 3D simulation platform for reinforcement learning and embodied AI research

Megaverse Megaverse is a new 3D simulation platform for reinforcement learning and embodied AI research. The efficient design of the engine enables ph

Aleksei Petrenko 191 Dec 23, 2022
Evaluation toolkit of the informative tracking benchmark comprising 9 scenarios, 180 diverse videos, and new challenges.

Informative-tracking-benchmark Informative tracking benchmark (ITB) higher diversity. It contains 9 representative scenarios and 180 diverse videos. m

Xin Li 15 Nov 26, 2022
Chess reinforcement learning by AlphaGo Zero methods.

About Chess reinforcement learning by AlphaGo Zero methods. This project is based on these main resources: DeepMind's Oct 19th publication: Mastering

Samuel 2k Dec 29, 2022
Code for "The Box Size Confidence Bias Harms Your Object Detector"

The Box Size Confidence Bias Harms Your Object Detector - Code Disclaimer: This repository is for research purposes only. It is designed to maintain r

Johannes G. 24 Dec 07, 2022
SuRE Evaluation: A Supplementary Material

SuRE Evaluation: A Supplementary Material This repository contains supplementary material regarding the evaluations presented in the paper Visual Expl

NYU Visualization Lab 0 Dec 14, 2021
Learning Multiresolution Matrix Factorization and its Wavelet Networks on Graphs

Project Learning Multiresolution Matrix Factorization and its Wavelet Networks on Graphs, https://arxiv.org/pdf/2111.01940.pdf. Authors Truong Son Hy

5 Jun 28, 2022
Source code for models described in the paper "AudioCLIP: Extending CLIP to Image, Text and Audio" (https://arxiv.org/abs/2106.13043)

AudioCLIP Extending CLIP to Image, Text and Audio This repository contains implementation of the models described in the paper arXiv:2106.13043. This

458 Jan 02, 2023
Starter code for the ICCV 2021 paper, 'Detecting Invisible People'

Detecting Invisible People [ICCV 2021 Paper] [Website] Tarasha Khurana, Achal Dave, Deva Ramanan Introduction This repository contains code for Detect

Tarasha Khurana 28 Sep 16, 2022