This is the official repository of XVFI (eXtreme Video Frame Interpolation)

Overview

XVFI PWC PWC

This is the official repository of XVFI (eXtreme Video Frame Interpolation), https://arxiv.org/abs/2103.16206

Last Update: 20210607

We provide the training and test code along with the trained weights and the dataset (train+test) used for XVFI. If you find this repository useful, please consider citing our paper.

Examples of the VFI (x8 Multi-Frame Interpolation) results on X-TEST

results_045_resized results_079_resized results_158_resized
The [email protected] input frames are interpolated to be [email protected] frames. All results are encoded at 30fps to be played as x8 slow motion and spatially down-scaled due to the limit of file sizes. All methods are trained on X-TRAIN.

Table of Contents

  1. X4K1000FPS
  2. Requirements
  3. Test
  4. Test_Custom
  5. Training
  6. Reference
  7. Contact

X4K1000FPS

Dataset of high-resolution (4096×2160), high-fps (1000fps) video frames with extreme motion.

003 004 045 078 081 146
Some examples of X4K1000FPS dataset, which are frames of 1000-fps and 4K-resolution. Our dataset contains the various scenes with extreme motions. (Displayed in spatiotemporally subsampled .gif files)

We provide our X4K1000FPS dataset which consists of X-TEST and X-TRAIN. Please refer to our main/suppl. paper for the details of the dataset. You can download the dataset from this dropbox link.

X-TEST consists of 15 video clips with 33-length of 4K-1000fps frames. It follows the below directory format:

├──── YOUR_DIR/
    ├──── test/
       ├──── Type1/
          ├──── TEST01/
             ├──── 0000.png
             ├──── ...
             └──── 0032.png
          ├──── TEST02/
             ├──── 0000.png
             ├──── ...
             └──── 0032.png
          ├──── ...
       ├──── ...

X-TRAIN consists of 4,408 clips from various types of 110 scenes. The clips are 65-length of 1000fps frames. Each frame is the size of 768x768 cropped from 4K frame. It follows the below directory format:

├──── YOUR_DIR/
    ├──── train/
       ├──── 002/
          ├──── occ008.320/
             ├──── 0000.png
             ├──── ...
             └──── 0064.png
          ├──── occ008.322/
             ├──── 0000.png
             ├──── ...
             └──── 0064.png
          ├──── ...
       ├──── ...

After downloading the files from the link, decompress the encoded_test.tar.gz and encoded_train.tar.gz. The resulting .mp4 files can be decoded into .png files via running mp4_decoding.py. Please follow the instruction written in mp4_decoding.py.

Requirements

Our code is implemented using PyTorch1.7, and was tested under the following setting:

  • Python 3.7
  • PyTorch 1.7.1
  • CUDA 10.2
  • cuDNN 7.6.5
  • NVIDIA TITAN RTX GPU
  • Ubuntu 16.04 LTS

Caution: since there is "align_corners" option in "nn.functional.interpolate" and "nn.functional.grid_sample" in PyTorch1.7, we recommend you to follow our settings. Especially, if you use the other PyTorch versions, it may lead to yield a different performance.

Test

Quick Start for X-TEST (x8 Multi-Frame Interpolation as in Table 2)

  1. Download the source codes in a directory of your choice .
  2. First download our X-TEST test dataset by following the above section 'X4K1000FPS'.
  3. Download the pre-trained weights, which was trained by X-TRAIN, from this link to place in /checkpoint_dir/XVFInet_X4K1000FPS_exp1.
XVFI
└── checkpoint_dir
   └── XVFInet_X4K1000FPS_exp1
       ├── XVFInet_X4K1000FPS_exp1_latest.pt           
  1. Run main.py with the following options in parse_args:
python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8 

==> It would yield (PSNR/SSIM/tOF) = (30.12/0.870/2.15).

python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 3 --multiple 8 

==> It would yield (PSNR/SSIM/tOF) = (28.86/0.858/2.67).

Description

  • After running with the above test option, you can get the result images in /test_img_dir/XVFInet_X4K1000FPS_exp1, then obtain the PSNR/SSIM/tOF results per each test clip as "total_metrics.csv" in the same folder.
  • Our proposed XVFI-Net can start from any downscaled input upward by regulating '--S_tst', which is adjustable in terms of the number of scales for inference according to the input resolutions or the motion magnitudes.
  • You can get any Multi-Frame Interpolation (x M) result by regulating '--multiple'.

Quick Start for Vimeo90K (as in Fig. 8)

  1. Download the source codes in a directory of your choice .
  2. First download Vimeo90K dataset from this link (including 'tri_trainlist.txt') to place in /vimeo_triplet.
XVFI
└── vimeo_triplet
       ├──  sequences
       readme.txt
       tri_testlist.txt
       tri_trainlist.txt
  1. Download the pre-trained weights (XVFI-Net_v), which was trained by Vimeo90K, from this link to place in /checkpoint_dir/XVFInet_Vimeo_exp1.
XVFI
└── checkpoint_dir
   └── XVFInet_Vimeo_exp1
       ├── XVFInet_Vimeo_exp1_latest.pt           
  1. Run main.py with the following options in parse_args:
python main.py --gpu 0 --phase 'test' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 2

==> It would yield PSNR = 35.07 on Vimeo90K.

Description

  • After running with the above test option, you can get the result images in /test_img_dir/XVFInet_Vimeo_exp1.
  • There are certain code lines in front of the 'def main()' for a convenience when running with the Vimeo option.
  • The SSIM result of 0.9760 as in Fig. 8 was measured by matlab ssim function for a fair comparison after running the above guide because other SOTA methods did so. We also upload "compare_psnr_ssim.m" matlab file to obtain it.
  • It should be noted that there is a typo "S_trn and S_tst are set to 2" in the current version of XVFI paper, which should be modified to 1 (not 2), sorry for inconvenience.

Test_Custom

Quick Start for your own video data ('--custom_path') for any Multi-Frame Interpolation (x M)

  1. Download the source codes in a directory of your choice .
  2. First prepare your own video datasets in /custom_path by following a hierarchy as belows:
XVFI
└── custom_path
   ├── scene1
       ├── 'xxx.png'
       ├── ...
       └── 'xxx.png'
   ...
   
   ├── sceneN
       ├── 'xxxxx.png'
       ├── ...
       └── 'xxxxx.png'

  1. Download the pre-trained weights trained on X-TRAIN or Vimeo90K as decribed above.

  2. Run main.py with the following options in parse_args (ex) x8 Multi-Frame Interpolation):

# For the model trained on X-TRAIN
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_tst 5 --multiple 8 --custom_path './custom_path'
# For the model trained on Vimeo90K
python main.py --gpu 0 --phase 'test_custom' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_tst 1 --multiple 8 --custom_path './custom_path'

Description

  • Our proposed XVFI-Net can start from any downscaled input upward by regulating '--S_tst', which is adjustable in terms of the number of scales for inference according to the input resolutions or the motion magnitudes.
  • You can get any Multi-Frame Interpolation (x M) result by regulating '--multiple'.
  • It only supports for '.png' format.
  • Since we can not cover diverse possibilites of naming rule for custom frames, please sort your own frames properly.

Training

Quick Start for X-TRAIN

  1. Download the source codes in a directory of your choice .
  2. First download our X-TRAIN train/val/test datasets by following the above section 'X4K1000FPS' and place them as belows:
XVFI
└── X4K1000FPS
      ├──  train
          ├── 002
          ├── ...
          └── 172
      ├──  val
          ├── Type1
          ├── Type2
          ├── Type3
      ├──  test
          ├── Type1
          ├── Type2
          ├── Type3

  1. Run main.py with the following options in parse_args:
python main.py --phase 'train' --exp_num 1 --dataset 'X4K1000FPS' --module_scale_factor 4 --S_trn 3 --S_tst 5

Quick Start for Vimeo90K

  1. Download the source codes in a directory of your choice .
  2. First download Vimeo90K dataset from this link (including 'tri_trainlist.txt') to place in /vimeo_triplet.
XVFI
└── vimeo_triplet
       ├──  sequences
       readme.txt
       tri_testlist.txt
       tri_trainlist.txt
  1. Run main.py with the following options in parse_args:
python main.py --phase 'train' --exp_num 1 --dataset 'Vimeo' --module_scale_factor 2 --S_trn 1 --S_tst 1

Description

  • You can freely regulate other arguments in the parser of main.py, here

Reference

Hyeonjun Sim*, Jihyong Oh*, and Munchurl Kim "XVFI: eXtreme Video Frame Interpolation", https://arxiv.org/abs/2103.16206, 2021. (* equal contribution)

BibTeX

@article{sim2021xvfi,
  title={XVFI: eXtreme Video Frame Interpolation},
  author={Sim, Hyeonjun and Oh, Jihyong and Kim, Munchurl},
  journal={arXiv preprint arXiv:2103.16206},
  year={2021}
}

Contact

If you have any question, please send an email to either [email protected] or [email protected].

License

The source codes and datasets can be freely used for research and education only. Any commercial use should get formal permission first.

Owner
Jihyong Oh
KAIST Ph.D. Candidate 3rd yr. Please refer to my personal homepage as below (URL).
Jihyong Oh
Open AI's Python library

OpenAI Python Library The OpenAI Python library provides convenient access to the OpenAI API from applications written in the Python language. It incl

Pavan Ananth Sharma 3 Jul 10, 2022
Keras implementation of AdaBound

AdaBound for Keras Keras port of AdaBound Optimizer for PyTorch, from the paper Adaptive Gradient Methods with Dynamic Bound of Learning Rate. Usage A

Somshubra Majumdar 132 Sep 23, 2022
Tensorflow Implementation of the paper "Spectral Normalization for Generative Adversarial Networks" (ICML 2017 workshop)

tf-SNDCGAN Tensorflow implementation of the paper "Spectral Normalization for Generative Adversarial Networks" (https://www.researchgate.net/publicati

Nhat M. Nguyen 248 Nov 25, 2022
Bagua is a flexible and performant distributed training algorithm development framework.

Bagua is a flexible and performant distributed training algorithm development framework.

786 Dec 17, 2022
TorchMD-Net provides state-of-the-art graph neural networks and equivariant transformer neural networks potentials for learning molecular potentials

TorchMD-net TorchMD-Net provides state-of-the-art graph neural networks and equivariant transformer neural networks potentials for learning molecular

TorchMD 104 Jan 03, 2023
This repository contains code to run experiments in the paper "Signal Strength and Noise Drive Feature Preference in CNN Image Classifiers."

Signal Strength and Noise Drive Feature Preference in CNN Image Classifiers This repository contains code to run experiments in the paper "Signal Stre

0 Jan 19, 2022
Implementation for Homogeneous Unbalanced Regularized Optimal Transport

HUROT: An Homogeneous formulation of Unbalanced Regularized Optimal Transport. This repository provides code related to this preprint. This is an alph

Théo Lacombe 1 Feb 17, 2022
minimizer-space de Bruijn graphs (mdBG) for whole genome assembly

rust-mdbg: Minimizer-space de Bruijn graphs (mdBG) for whole-genome assembly rust-mdbg is an ultra-fast minimizer-space de Bruijn graph (mdBG) impleme

Barış Ekim 148 Dec 01, 2022
An offline deep reinforcement learning library

d3rlpy: An offline deep reinforcement learning library d3rlpy is an offline deep reinforcement learning library for practitioners and researchers. imp

Takuma Seno 817 Jan 02, 2023
This is a Tensorflow implementation of Learning to See in the Dark in CVPR 2018

Learning-to-See-in-the-Dark This is a Tensorflow implementation of Learning to See in the Dark in CVPR 2018, by Chen Chen, Qifeng Chen, Jia Xu, and Vl

5.3k Jan 01, 2023
Deep learning toolbox based on PyTorch for hyperspectral data classification.

Deep learning toolbox based on PyTorch for hyperspectral data classification.

Nicolas 304 Dec 28, 2022
Code for CVPR 2021 paper: Anchor-Free Person Search

Introduction This is the implementationn for Anchor-Free Person Search in CVPR2021 License This project is released under the Apache 2.0 license. Inst

158 Jan 04, 2023
Docker containers of baseline agents for the Crafter environment

Crafter Baselines This repository contains Docker containers for running various baselines on the Crafter environment. Reward Agents DreamerV2 based o

Danijar Hafner 17 Sep 25, 2022
4th place solution to datafactory challenge by Intermarché.

Solution to Datafactory challenge by Intermarché. 4th place solution to datafactory challenge by Intermarché. The objective of the challenge is to pre

Raphael Sourty 11 Mar 19, 2022
A modern pure-Python library for reading PDF files

pdf A modern pure-Python library for reading PDF files. The goal is to have a modern interface to handle PDF files which is consistent with itself and

6 Apr 06, 2022
Toolbox of models, callbacks, and datasets for AI/ML researchers.

Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch Website • Installation • Main

Pytorch Lightning 1.4k Dec 30, 2022
VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations

VolumeGAN - 3D-aware Image Synthesis via Learning Structural and Textural Representations 3D-aware Image Synthesis via Learning Structural and Textura

GenForce: May Generative Force Be with You 116 Dec 26, 2022
Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision. ICCV 2021.

Towers of Babel: Combining Images, Language, and 3D Geometry for Learning Multimodal Vision Download links and PyTorch implementation of "Towers of Ba

Blakey Wu 40 Dec 14, 2022
3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans.

3DMV 3DMV jointly combines RGB color and geometric information to perform 3D semantic segmentation of RGB-D scans. This work is based on our ECCV'18 p

Владислав Молодцов 0 Feb 06, 2022
Source code for paper "Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling", AAAI 2021

ATLOP Code for AAAI 2021 paper Document-Level Relation Extraction with Adaptive Thresholding and Localized Context Pooling. If you make use of this co

Wenxuan Zhou 146 Nov 29, 2022