Implicit Internal Video Inpainting

Implementation for our ICCV 2021 paper: Internal Video Inpainting by Implicit Long-range Propagation

paper | project website | 4K data | demo video

Introduction

Want to remove objects from a video without days of training and thousands of training videos? Try our simple but effective internal video inpainting method. The inpainting process is zero-shot and implicit: it requires neither pretraining on large datasets nor optical-flow estimation. We further extend the proposed method to more challenging tasks: video object removal with limited annotated masks, and inpainting of ultra high-resolution videos (e.g., 4K videos).

TO DO

  • Release code for 4K video inpainting

Setup

Installation

git clone https://github.com/Tengfei-Wang/Implicit-Internal-Video-Inpainting.git
cd Implicit-Internal-Video-Inpainting

Environment

This code is based on TensorFlow 2.x (tested with TensorFlow 2.2 and 2.4).

The environment can be set up easily with Anaconda:

conda create -n IIVI python=3.7
conda activate IIVI
conda install tensorflow-gpu tensorboard
pip install pyaml 
pip install opencv-python
pip install tensorflow-addons

Alternatively, you can set up the environment from the provided environment.yml:

conda env create -f environment.yml
conda activate IIVI
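
Optionally, you can check that TensorFlow detects your GPU before starting the internal learning:

import tensorflow as tf

print(tf.__version__)                          # expect a 2.x version
print(tf.config.list_physical_devices('GPU'))  # expect at least one GPU entry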

Usage

Quick Start

We provide an example sequence 'bmx-trees' in ./inputs/. To try our method:

python train.py

The default number of iterations is set to 50,000 in config/train.yml, and the internal learning takes ~4 hours on a single GPU. During the learning process, you can monitor the inpainting results with TensorBoard:

tensorboard --logdir ./exp/logs

After training, the final results can be saved to ./exp/results/ by running:

python test.py

You can also modify 'model_restore' in config/test.yml to save results from different checkpoints.
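
To preview the saved frames as a single clip, you can stitch them together with OpenCV. This is only a sketch: it assumes test.py writes per-frame images to ./exp/results/, so adjust the path and filename pattern to the actual output.

import glob

import cv2

frames = sorted(glob.glob('exp/results/*.png'))  # adjust the pattern to the actual output files
first = cv2.imread(frames[0])
height, width = first.shape[:2]

writer = cv2.VideoWriter('result.mp4', cv2.VideoWriter_fourcc(*'mp4v'), 24, (width, height))
for path in frames:
    writer.write(cv2.imread(path))
writer.release()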

Try Your Own Data

Data preprocess

Before training, we recommend dilating the object masks to exclude boundary pixels; otherwise, imperfectly annotated masks can cause artifacts in the object removal results.

You can generate and preprocess the masks by this script:

python scripts/preprocess_mask.py --annotation_path inputs/annotations/bmx-trees
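
If you prefer to prepare masks for your own video manually, a rough equivalent of the dilation step looks like the following. This is a sketch only: the kernel size, iteration count, and output folder are illustrative choices, not the exact settings used by scripts/preprocess_mask.py.

import glob
import os

import cv2
import numpy as np

mask_dir = 'inputs/annotations/bmx-trees'    # per-frame annotation masks
out_dir = 'inputs/masks/bmx-trees'           # hypothetical output folder; match the path in your config
os.makedirs(out_dir, exist_ok=True)

kernel = np.ones((15, 15), np.uint8)         # illustrative dilation kernel
for path in sorted(glob.glob(os.path.join(mask_dir, '*.png'))):
    mask = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    mask = (mask > 127).astype(np.uint8) * 255        # binarize the annotation
    dilated = cv2.dilate(mask, kernel, iterations=1)  # grow the mask over edge pixels
    cv2.imwrite(os.path.join(out_dir, os.path.basename(path)), dilated)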

Basic training

Modify config/train.yml, which specifies the video path, log path, training iterations, etc. The number of training iterations depends on the video length; convergence typically takes 30,000 ~ 80,000 iterations for 100-frame videos. By default, only the reconstruction loss is used for training, which works well in most cases.

python train.py

Improve the sharpness and consistency

For some hard videos, basic training may not produce a pleasing result. You can fine-tune the trained model with additional losses. To do so, set 'model_restore' in config/train.yml to the checkpoint path from basic training, set ambiguity_loss or stabilization_loss to True, and then fine-tune the basic checkpoint for 20,000 ~ 40,000 iterations.

python train.py
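
The options above ('model_restore', ambiguity_loss, stabilization_loss) can be edited directly in config/train.yml; if you prefer to switch them from a script, a small PyYAML sketch works as well. The checkpoint path below is a placeholder, and you should verify the exact key names in the released config file.

import yaml  # PyYAML, pulled in by the pyaml package installed above

with open('config/train.yml') as f:
    cfg = yaml.safe_load(f)

cfg['model_restore'] = 'exp/checkpoints/bmx-trees'  # placeholder: checkpoint path from basic training
cfg['ambiguity_loss'] = True                        # enable the extra losses for fine-tuning
cfg['stabilization_loss'] = True

with open('config/train.yml', 'w') as f:
    yaml.safe_dump(cfg, f)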

Inference

Modify ./config/test.yml, which specifies the video path, log path, and save path.

python test.py

Mask Propagation from a Single Frame

When you annotate the object mask in only one frame (or a few frames), our method can propagate it to the other frames automatically.

Modify ./config/train_mask.yml. We typically set the training iterations to 4,000 ~ 20,000, and the learning rate to 1e-5 ~ 1e-4.

python train_mask.py

After training, modify ./config/test_mask.yml, and then:

python test_mask.py
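
Since the propagated masks are predictions rather than manual annotations, it usually helps to binarize them (and then dilate them, as in the data preprocessing step above) before using them for object removal. A rough sketch, assuming the propagated masks are saved as grayscale images; adjust the path to the save path in your test_mask.yml:

import glob

import cv2

for path in sorted(glob.glob('exp/mask_results/*.png')):  # placeholder save path
    soft = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, hard = cv2.threshold(soft, 127, 255, cv2.THRESH_BINARY)  # binarize the predicted mask
    cv2.imwrite(path, hard)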

High-resolution Video Inpainting

Our 4K videos and mask annotations can be downloaded from 4K data.

More Results

Our results on 70 DAVIS videos (including failure cases) can be found here for your reference :)
If you need the PNG version of our uncompressed results, please contact the authors.

Citation

If you find this work useful for your research, please cite:

@inproceedings{ouyang2021video,
  title={Internal Video Inpainting by Implicit Long-range Propagation},
  author={Ouyang, Hao and Wang, Tengfei and Chen, Qifeng},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2021}
} 

If you are also interested in image inpainting or internal learning, this paper may also be helpful :)

@inproceedings{wang2021image,
  title={Image Inpainting with External-internal Learning and Monochromic Bottleneck},
  author={Wang, Tengfei and Ouyang, Hao and Chen, Qifeng},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5120--5129},
  year={2021}
}

Contact

Please email Hao Ouyang or Tengfei Wang if you have any questions.
