[CVPR2021] The source code for our paper 《Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning》.

Related tags

Deep LearningBE
Overview

TBE

The source code for our paper "Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning" [arxiv] [code][Project Website]

image

Citation

@inproceedings{wang2021removing,
  title={Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning},
  author={Wang, Jinpeng and Gao, Yuting and Li, Ke and Lin, Yiqi and Ma, Andy J and Cheng, Hao and Peng, Pai and Ji, Rongrong and Sun, Xing},
  booktitle={CVPR},
  year={2021}
}

News

[2020.3.7] The first version of TBE are released!

0. Motivation

  • In camera-fixed situation, the static background in most frames remain similar in pixel-distribution.

  • We ask the model to be temporal sensitive rather than static sensitive.

  • We ask model to filter the additive Background Noise, which means to erasing background in each frame of the video.

Activation Map Visualization of BE

GIF

More hard example

2. Plug BE into any self-supervised learning method in two steps

The impementaion of BE is very simple, you can implement it in two lines by python:

rand_index = random.randint(t)
mixed_x[j] = (1-prob) * x + prob * x[rand_index]

Then, just need define a loss function like MSE:

loss = MSE(F(mixed_x),F(x))

2. Installation

Dataset Prepare

Please refer to [dataset.md] for details.

Requirements

  • Python3
  • pytorch1.1+
  • PIL
  • Intel (on the fly decode)
  • Skvideo.io
  • Matplotlib (gradient_check)

As Kinetics dataset is time-consuming for IO, we decode the avi/mpeg on the fly. Please refer to data/video_dataset.py for details.

3. Structure

  • datasets
    • list
      • hmdb51: the train/val lists of HMDB51/Actor-HMDB51
      • hmdb51_sta: the train/val lists of HMDB51_STA
      • ucf101: the train/val lists of UCF101
      • kinetics-400: the train/val lists of kinetics-400
      • diving48: the train/val lists of diving48
  • experiments
    • logs: experiments record in detials, include logs and trained models
    • gradientes:
    • visualization:
    • pretrained_model:
  • src
    • Contrastive
      • data: load data
      • loss: the loss evaluate in this paper
      • model: network architectures
      • scripts: train/eval scripts
      • augmentation: detail implementation of BE augmentation
      • utils
      • feature_extract.py: feature extractor given pretrained model
      • main.py: the main function of pretrain / finetune
      • trainer.py
      • option.py
      • pt.py: BE pretrain
      • ft.py: BE finetune
    • Pretext
      • main.py the main function of pretrain / finetune
      • loss: the loss include classification loss

4. Run

(1). Download dataset lists and pretrained model

A copy of both dataset lists is provided in anonymous. The Kinetics-pretrained models are provided in anonymous.

cd .. && mkdir datasets
mv [path_to_lists] to datasets
mkdir experiments && cd experiments
mkdir pretrained_models && logs
mv [path_to_pretrained_model] to ../experiments/pretrained_model

Download and extract frames of Actor-HMDB51.

wget -c  anonymous
unzip
python utils/data_process/gen_hmdb51_dir.py
python utils/data_process/gen_hmdb51_frames.py

(2). Network Architecture

The network is in the folder src/model/[].py

Method #logits_channel
C3D 512
R2P1D 2048
I3D 1024
R3D 2048

All the logits_channel are feed into a fc layer with 128-D output.

For simply, we divide the source into Contrastive and Pretext, "--method pt_and_ft" means pretrain and finetune in once.

Action Recognition

Random Initialization

For random initialization baseline. Just comment --weights in line 11 of ft.sh. Like below:

#!/usr/bin/env bash
python main.py \
--method ft --arch i3d \
--ft_train_list ../datasets/lists/diving48/diving48_v2_train_no_front.txt \
--ft_val_list ../datasets/lists/diving48/diving48_v2_test_no_front.txt \
--ft_root /data1/DataSet/Diving48/rgb_frames/ \
--ft_dataset diving48 --ft_mode rgb \
--ft_lr 0.001 --ft_lr_steps 10 20 25 30 35 40 --ft_epochs 45 --ft_batch_size 4 \
--ft_data_length 64 --ft_spatial_size 224 --ft_workers 4 --ft_stride 1 --ft_dropout 0.5 \
--ft_print-freq 100 --ft_fixed 0 # \
# --ft_weights ../experiments/kinetics_contrastive.pth

BE(Contrastive)

Kinetics
bash scripts/kinetics/pt_and_ft.sh
UCF101
bash scripts/ucf101/ucf101.sh
Diving48
bash scripts/Diving48/diving48.sh

For Triplet loss optimization and moco baseline, just modify --pt_method

BE (Triplet)

--pt_method be_triplet

BE(Pretext)

bash scripts/hmdb51/i3d_pt_and_ft_flip_cls.sh

or

bash scripts/hmdb51/c3d_pt_and_ft_flip.sh

Notice: More Training Options and ablation study can be find in scripts

Video Retrieve and other visualization

(1). Feature Extractor

As STCR can be easily extend to other video representation task, we offer the scripts to perform feature extract.

python feature_extractor.py

The feature will be saved as a single numpy file in the format [video_nums,features_dim] for further visualization.

(2). Reterival Evaluation

modify line60-line62 in reterival.py.

python reterival.py

Results

Action Recognition

Kinetics Pretrained (I3D)

Method UCF101 HMDB51 Diving48
Random Initialization 57.9 29.6 17.4
MoCo Baseline 70.4 36.3 47.9
BE 86.5 56.2 62.6

Video Retrieve (HMDB51-C3D)

Method @1 @5 @10 @20 @50
BE 10.2 27.6 40.5 56.2 76.6

More Visualization

T-SNE

please refer to utils/visualization/t_SNE_Visualization.py for details.

Confusion_Matrix

please refer to utils/visualization/confusion_matrix.py for details.

Acknowledgement

This work is partly based on UEL and MoCo.

License

The code are released under the CC-BY-NC 4.0 LICENSE.

Owner
Jinpeng Wang
Focus on Biometrics and Video Understanding, Self/Semi Supervised Learning.
Jinpeng Wang
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained mo

Hugging Face 77.2k Jan 02, 2023
Toolbox to analyze temporal context invariance of deep neural networks

PyTCI A toolbox that estimates the integration window of a sensory response using the "Temporal Context Invariance" paradigm (TCI). The TCI method Int

4 Oct 23, 2022
PyTorch reimplementation of REALM and ORQA

PyTorch reimplementation of REALM and ORQA

Li-Huai (Allan) Lin 17 Aug 20, 2022
An API-first distributed deployment system of deep learning models using timeseries data to analyze and predict systems behaviour

Gordo Building thousands of models with timeseries data to monitor systems. Table of content About Examples Install Uninstall Developer manual How to

Equinor 26 Dec 27, 2022
CHERRY is a python library for predicting the interactions between viral and prokaryotic genomes

CHERRY is a python library for predicting the interactions between viral and prokaryotic genomes. CHERRY is based on a deep learning model, which consists of a graph convolutional encoder and a link

Kenneth Shang 12 Dec 15, 2022
ICON: Implicit Clothed humans Obtained from Normals (CVPR 2022)

ICON: Implicit Clothed humans Obtained from Normals Yuliang Xiu · Jinlong Yang · Dimitrios Tzionas · Michael J. Black CVPR 2022 News 🚩 [2022/04/26] H

Yuliang Xiu 1.1k Jan 04, 2023
Code for the paper "Location-aware Single Image Reflection Removal"

Location-aware Single Image Reflection Removal The shown images are provided by the datasets from IBCLN, ERRNet, SIR2 and the Internet images. The cod

72 Dec 08, 2022
Deep learning model, heat map, data prepo

deep learning model, heat map, data prepo

Pamela Dekas 1 Jan 14, 2022
Deep Q Learning with OpenAI Gym and Pokemon Showdown

pokemon-deep-learning An openAI gym project for pokemon involving deep q learning. Made by myself, Sam Little, and Layton Webber. This code captures g

2 Dec 22, 2021
Convolutional Neural Network to detect deforestation in the Amazon Rainforest

Convolutional Neural Network to detect deforestation in the Amazon Rainforest This project is part of my final work as an Aerospace Engineering studen

5 Feb 17, 2022
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"

Hierarchical Token Semantic Audio Transformer Introduction The Code Repository for "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound

Knut(Ke) Chen 134 Jan 01, 2023
High-Resolution Image Synthesis with Latent Diffusion Models

Latent Diffusion Models arXiv | BibTeX High-Resolution Image Synthesis with Latent Diffusion Models Robin Rombach*, Andreas Blattmann*, Dominik Lorenz

CompVis Heidelberg 5.6k Dec 30, 2022
This is an official implementation for "DeciWatch: A Simple Baseline for 10x Efficient 2D and 3D Pose Estimation"

DeciWatch: A Simple Baseline for 10× Efficient 2D and 3D Pose Estimation This repo is the official implementation of "DeciWatch: A Simple Baseline for

117 Dec 24, 2022
Implementation of the paper "Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning"

Self-Promoted Prototype Refinement for Few-Shot Class-Incremental Learning This is the implementation of the paper "Self-Promoted Prototype Refinement

Kai Zhu 78 Dec 02, 2022
Wandb-predictions - WANDB Predictions With Python

WANDB API CI/CD Below we capture the CI/CD scenarios that we would expect with o

Anish Shah 6 Oct 07, 2022
Maximum Spatial Perturbation for Image-to-Image Translation (Official Implementation)

MSPC for I2I This repository is by Yanwu Xu and contains the PyTorch source code to reproduce the experiments in our CVPR2022 paper Maximum Spatial Pe

51 Dec 14, 2022
Avalanche RL: an End-to-End Library for Continual Reinforcement Learning

Avalanche RL: an End-to-End Library for Continual Reinforcement Learning Avalanche Website | Getting Started | Examples | Tutorial | API Doc | Paper |

ContinualAI 43 Dec 24, 2022
Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021.

SphereRPN Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021. Authors: Th

Thang Vu 15 Dec 02, 2022
Check out the StyleGAN repo and place it in the same directory hierarchy as the present repo

Variational Model Inversion Attacks Kuan-Chieh Wang, Yan Fu, Ke Li, Ashish Khisti, Richard Zemel, Alireza Makhzani Most commands are in run_scripts. W

Jackson Wang 15 Dec 26, 2022
An LSTM based GAN for Human motion synthesis

GAN-motion-Prediction An LSTM based GAN for motion synthesis has a few issues reading H3.6M data from A.Jain et al , will fix soon. Prediction of the

Amogh Adishesha 9 Jun 17, 2022