[CVPR2021] The source code for our paper "Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning".

Last update: Dec 28, 2022

Overview

TBE

The source code for our paper "Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning" [arxiv] [code][Project Website]

Citation

@inproceedings{wang2021removing,
  title={Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning},
  author={Wang, Jinpeng and Gao, Yuting and Li, Ke and Lin, Yiqi and Ma, Andy J and Cheng, Hao and Peng, Pai and Ji, Rongrong and Sun, Xing},
  booktitle={CVPR},
  year={2021}
}

News

[2020.3.7] The first version of TBE is released!

0. Motivation

In fixed-camera scenes, the static background remains similar in pixel distribution across most frames.

We ask the model to be sensitive to temporal changes rather than to static content.
We ask the model to filter out the additive background noise, which amounts to erasing the background in each frame of the video.

Activation Map Visualization of BE

GIF

More hard examples

1. Plug BE into any self-supervised learning method in two steps

The implementation of BE is very simple. You can implement it in a few lines of Python:

import random
rand_index = random.randint(0, t - 1)  # index of one random frame among the t frames
mixed_x[j] = (1 - prob) * x[j] + prob * x[rand_index]  # blend frame j with that frame

Then you only need to define a loss function such as MSE:

loss = MSE(F(mixed_x), F(x))
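
The two steps above can be sketched end to end with NumPy. This is a minimal illustration under stated assumptions, not the repository's implementation: the (T, H, W, C) clip layout, the function names, the value of prob, and the toy_encoder that stands in for the network F are all hypothetical.

```python
import numpy as np


def background_erasing(x, prob, rng=None):
    """Blend one randomly chosen frame of a clip into every frame.

    x: float array of shape (T, H, W, C); this layout is an assumption.
    prob: mixing weight of the static frame, between 0 and 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    rand_index = int(rng.integers(x.shape[0]))  # pick one frame index in [0, T)
    static_frame = x[rand_index]                # shape (H, W, C)
    # Broadcasting applies the same static frame to all T frames.
    return (1.0 - prob) * x + prob * static_frame


def toy_encoder(clip):
    """Hypothetical stand-in for F: per-frame spatial mean, shape (T, C)."""
    return clip.mean(axis=(1, 2))


def consistency_loss(x, mixed_x):
    """MSE between the features of the original and the mixed clip."""
    return float(np.mean((toy_encoder(mixed_x) - toy_encoder(x)) ** 2))


rng = np.random.default_rng(0)
clip = rng.random((8, 16, 16, 3))               # 8 random "frames"
mixed = background_erasing(clip, prob=0.3, rng=rng)
print(mixed.shape, consistency_loss(clip, mixed))
```

In a real training loop, F would be the video backbone and the loss would be minimized by gradient descent. The sketch only shows how the mixed clip and the consistency term are formed.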

2. Installation

Dataset Preparation

Please refer to [dataset.md] for details.

Requirements

Python3
PyTorch 1.1+
PIL
Intel (on the fly decode)
Skvideo.io
Matplotlib (gradient_check)

Because reading the Kinetics dataset is IO-intensive, we decode the avi/mpeg videos on the fly. Please refer to data/video_dataset.py for details.

3. Structure

datasets
- list
  - hmdb51: the train/val lists of HMDB51/Actor-HMDB51
  - hmdb51_sta: the train/val lists of HMDB51_STA
  - ucf101: the train/val lists of UCF101
  - kinetics-400: the train/val lists of kinetics-400
  - diving48: the train/val lists of diving48
experiments
- logs: detailed experiment records, including logs and trained models
- gradientes:
- visualization:
- pretrained_model:
src
- Contrastive
  - data: load data
  - loss: the losses evaluated in this paper
  - model: network architectures
  - scripts: train/eval scripts
  - augmentation: detail implementation of BE augmentation
  - utils
  - feature_extract.py: feature extractor given pretrained model
  - main.py: the main function of pretrain / finetune
  - trainer.py
  - option.py
  - pt.py: BE pretrain
  - ft.py: BE finetune
- Pretext
  - main.py: the main function of pretrain / finetune
  - loss: the losses, including the classification loss

4. Run

(1). Download dataset lists and pretrained model

A copy of both dataset lists is provided in anonymous. The Kinetics-pretrained models are provided in anonymous.

cd .. && mkdir datasets
mv [path_to_lists] datasets
mkdir experiments && cd experiments
mkdir pretrained_model logs
mv [path_to_pretrained_model] ../experiments/pretrained_model

Download and extract frames of Actor-HMDB51.

wget -c  anonymous
unzip
python utils/data_process/gen_hmdb51_dir.py
python utils/data_process/gen_hmdb51_frames.py

(2). Network Architecture

The networks are defined in src/model/[].py

Method	#logits_channel
C3D	512
R2P1D	2048
I3D	1024
R3D	2048

For every architecture, the logits_channel features are fed into an FC layer with a 128-D output.

For simplicity, we divide the source code into Contrastive and Pretext. The option "--method pt_and_ft" runs pretraining and finetuning in a single run.
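
The table above and the 128-D FC layer can be illustrated with a small NumPy sketch. The lowercase architecture keys (other than i3d, which appears in the scripts), the random weight initialization, and the function names are assumptions made for illustration. In PyTorch the same layer would typically be written as nn.Linear(in_dim, 128).

```python
import numpy as np

# Backbone output widths, copied from the table above.
LOGITS_CHANNEL = {"c3d": 512, "r2p1d": 2048, "i3d": 1024, "r3d": 2048}
EMBED_DIM = 128  # output size of the FC layer


def make_projection(arch, rng):
    """Return (weight, bias) of a linear layer mapping logits to 128-D."""
    in_dim = LOGITS_CHANNEL[arch]
    weight = rng.standard_normal((in_dim, EMBED_DIM)) / np.sqrt(in_dim)
    bias = np.zeros(EMBED_DIM)
    return weight, bias


def project(features, weight, bias):
    """Apply the linear layer to a batch of shape (N, logits_channel)."""
    return features @ weight + bias


rng = np.random.default_rng(0)
for arch, in_dim in LOGITS_CHANNEL.items():
    weight, bias = make_projection(arch, rng)
    batch = rng.standard_normal((4, in_dim))    # 4 fake backbone outputs
    print(arch, batch.shape, "->", project(batch, weight, bias).shape)
```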

Action Recognition

Random Initialization

For the random initialization baseline, just comment out the --ft_weights option in line 11 of ft.sh, as shown below:

#!/usr/bin/env bash
python main.py \
--method ft --arch i3d \
--ft_train_list ../datasets/lists/diving48/diving48_v2_train_no_front.txt \
--ft_val_list ../datasets/lists/diving48/diving48_v2_test_no_front.txt \
--ft_root /data1/DataSet/Diving48/rgb_frames/ \
--ft_dataset diving48 --ft_mode rgb \
--ft_lr 0.001 --ft_lr_steps 10 20 25 30 35 40 --ft_epochs 45 --ft_batch_size 4 \
--ft_data_length 64 --ft_spatial_size 224 --ft_workers 4 --ft_stride 1 --ft_dropout 0.5 \
--ft_print-freq 100 --ft_fixed 0 # \
# --ft_weights ../experiments/kinetics_contrastive.pth

BE(Contrastive)

Kinetics

bash scripts/kinetics/pt_and_ft.sh

UCF101

bash scripts/ucf101/ucf101.sh

Diving48

bash scripts/Diving48/diving48.sh

For triplet loss optimization or the MoCo baseline, just change --pt_method.

BE (Triplet)

--pt_method be_triplet
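
How be_triplet is implemented is not described here. As a generic illustration only, a margin-based triplet loss can be sketched as follows, assuming the anchor is the embedding of the original clip, the positive is the embedding of its BE-mixed version, and the negative comes from a different video. The function name, the Euclidean distance, and the margin value are assumptions.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss on batches of embeddings of shape (N, D)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distance
    # Penalize triplets whose positive is not closer than the negative by the margin.
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


anchor = np.array([[0.0, 0.0]])
positive = np.array([[0.0, 0.0]])
negative = np.array([[3.0, 4.0]])
print(triplet_loss(anchor, positive, negative))   # well-separated triplet
print(triplet_loss(anchor, negative, positive))   # roles swapped
```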

BE(Pretext)

bash scripts/hmdb51/i3d_pt_and_ft_flip_cls.sh

bash scripts/hmdb51/c3d_pt_and_ft_flip.sh

Note: More training options and ablation studies can be found in scripts.

Video Retrieval and other visualization

(1). Feature Extractor

As STCR can be easily extended to other video representation tasks, we provide a script for feature extraction.

python feature_extractor.py

The features are saved as a single NumPy file of shape [video_nums, features_dim] for further visualization.

(2). Retrieval Evaluation

Modify lines 60-62 in reterival.py.

python reterival.py
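
For orientation, a top-k retrieval accuracy over extracted features can be computed roughly as below. This is a hedged sketch, not the contents of reterival.py: the function name, the use of cosine similarity, and the query/gallery split are assumptions. The inputs are feature matrices of shape [video_nums, features_dim], as produced by the feature extractor.

```python
import numpy as np


def topk_retrieval_accuracy(query_feats, query_labels,
                            gallery_feats, gallery_labels,
                            ks=(1, 5, 10, 20, 50)):
    """Fraction of queries with a same-class video among the k nearest."""
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sim = q @ g.T                               # cosine similarity, (Q, G)
    order = np.argsort(-sim, axis=1)            # most similar gallery item first
    ranked_labels = gallery_labels[order]       # neighbour labels, (Q, G)
    hits = ranked_labels == query_labels[:, None]
    return {k: float(hits[:, :k].any(axis=1).mean()) for k in ks}


gallery_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
gallery_labels = np.array([0, 0, 1, 1])
query_feats = np.array([[1.0, 0.05], [0.6, 0.8]])
query_labels = np.array([0, 0])
print(topk_retrieval_accuracy(query_feats, query_labels,
                              gallery_feats, gallery_labels, ks=(1, 3)))
```

In this toy example the first query retrieves a same-class video at rank 1, while the second only does so at rank 3, so the accuracy rises from 0.5 at k=1 to 1.0 at k=3.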

Results

Action Recognition

Kinetics Pretrained (I3D)

Method	UCF101	HMDB51	Diving48
Random Initialization	57.9	29.6	17.4
MoCo Baseline	70.4	36.3	47.9
BE	86.5	56.2	62.6

Video Retrieval (HMDB51-C3D)

Method	@1	@5	@10	@20	@50
BE	10.2	27.6	40.5	56.2	76.6

More Visualization

T-SNE

Please refer to utils/visualization/t_SNE_Visualization.py for details.

Confusion_Matrix

Please refer to utils/visualization/confusion_matrix.py for details.

Acknowledgement

This work is partly based on UEL and MoCo.

License

The code is released under the CC-BY-NC 4.0 license.

Owner

Jinpeng Wang
