[NeurIPS'20] Self-supervised Co-Training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman.

Related tags

Deep LearningCoCLR
Overview

CoCLR: Self-supervised Co-Training for Video Representation Learning

arch

This repository contains the implementation of:

  • InfoNCE (MoCo on videos)
  • UberNCE (supervised contrastive learning on videos)
  • CoCLR

Link:

[Project Page] [PDF] [Arxiv]

News

  • [2021.01.29] Upload both RGB and optical flow dataset for UCF101 (links).
  • [2021.01.11] Update our paper for NeurIPS2020 final version: corrected InfoNCE-RGB-linearProbe baseline result in Table1 from 52.3% (pretrained for 800 epochs, unnessary and unfair) to 46.8% (pretrained for 500 epochs, fair comparison). Thanks @liuhualin333 for pointing out.
  • [2020.12.08] Update instructions.
  • [2020.11.17] Upload pretrained weights for UCF101 experiments.
  • [2020.10.30] Update "draft" dataloader files, CoCLR code, evaluation code as requested by some researchers. Will check and add detailed instructions later.

Pretrain Instruction

  • InfoNCE pretrain on UCF101-RGB
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_nce.py --net s3d --model infonce --moco-k 2048 \
--dataset ucf101-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16
  • InfoNCE pretrain on UCF101-Flow
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_nce.py --net s3d --model infonce --moco-k 2048 \
--dataset ucf101-f-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16
  • CoCLR pretrain on UCF101 for one cycle
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 \
--dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 100 --schedule 80 --name_prefix Cycle1-FlowMining_ -j 8 \
--pretrain {rgb_infoNCE_checkpoint.pth.tar} {flow_infoNCE_checkpoint.pth.tar}
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 2048 --reverse \
--dataset ucf101-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 100 --schedule 80 --name_prefix Cycle1-RGBMining_ -j 8 \
--pretrain {flow_infoNCE_checkpoint.pth.tar} {rgb_cycle1_checkpoint.pth.tar} 
  • InfoNCE pretrain on K400-RGB
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 main_infonce.py --net s3d --model infonce --moco-k 16384 \
--dataset k400-2clip --lr 1e-3 --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16
  • InfoNCE pretrain on K400-Flow
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--nproc_per_node=4 teco_fb_main.py --net s3d --model infonce --moco-k 16384 \
--dataset k400-f-2clip --lr 1e-3 --seq_len 32 --ds 1 --batch_size 32 \
--epochs 300 --schedule 250 280 -j 16
  • CoCLR pretrain on K400 for one cycle
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 16384 \
--dataset k400-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 50 --schedule 40 --name_prefix Cycle1-FlowMining_ -j 8 \
--pretrain {rgb_infoNCE_checkpoint.pth.tar} {flow_infoNCE_checkpoint.pth.tar}
CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
--nproc_per_node=2 main_coclr.py --net s3d --topk 5 --moco-k 16384 --reverse \
--dataset k400-2stream-2clip --seq_len 32 --ds 1 --batch_size 32 \
--epochs 50 --schedule 40 --name_prefix Cycle1-RGBMining_ -j 8 \
--pretrain {flow_infoNCE_checkpoint.pth.tar} {rgb_cycle1_checkpoint.pth.tar} 

Finetune Instruction

cd eval/ e.g. finetune UCF101-rgb:

CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --batch_size 32 --train_what ft --epochs 500 --schedule 400 450 \
--pretrain {selected_rgb_pretrained_checkpoint.pth.tar}

then run the test with 10-crop (test-time augmentation is helpful, 10-crop gives better result than center-crop):

CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --batch_size 32 --train_what ft --epochs 500 --schedule 400 450 \
--test {selected_rgb_finetuned_checkpoint.pth.tar} --ten_crop

Nearest-neighbour Retrieval Instruction

cd eval/ e.g. nn-retrieval for UCF101-rgb

CUDA_VISIBLE_DEVICES=0 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --test {selected_rgb_pretrained_checkpoint.pth.tar} --retrieval

Linear-probe Instruction

cd eval/

from extracted feature

The code support two methods on linear-probe, either feed the data end-to-end and freeze the backbone, or train linear layer on extracted features. Both methods give similar best results in our experiments.

e.g. on extracted features (after run NN-retrieval command above, features will be saved in os.path.dirname(checkpoint))

CUDA_VISIBLE_DEVICES=0 python feature_linear_probe.py --dataset ucf101 \
--test {feature_dirname} --final_bn --lr 1.0 --wd 1e-3

Note that the default setting should give an alright performance, maybe 1-2% lower than our paper's figure. For different datasets, lr and wd need to be tuned from lr: 0.1 to 1.0; wd: 1e-4 to 1e-1.

load data and freeze backbone

alternatively, feed data end-to-end and freeze the backbone.

CUDA_VISIBLE_DEVICES=0,1 python main_classifier.py --net s3d --dataset ucf101 \
--seq_len 32 --ds 1 --batch_size 32 --train_what last --epochs 100 --schedule 60 80 \
--optim sgd --lr 1e-1 --wd 1e-3 --final_bn --pretrain {selected_rgb_pretrained_checkpoint.pth.tar}

Similarly, lr and wd need to be tuned for different datasets for best performance.

Dataset

Result

Finetune entire network for action classification on UCF101: arch

Pretrained Weights

Our models:

Baseline models:

Kinetics400-pretrained models:

Owner
Tengda Han
Tengda Han
Learning Compatible Embeddings, ICCV 2021

LCE Learning Compatible Embeddings, ICCV 2021 by Qiang Meng, Chixiang Zhang, Xiaoqiang Xu and Feng Zhou Paper: Arxiv We cannot release source codes pu

Qiang Meng 25 Dec 17, 2022
GAN-generated image detection based on CNNs

GAN-image-detection This repository contains a GAN-generated image detector developed to distinguish real images from synthetic ones. The detector is

Image and Sound Processing Lab 17 Dec 15, 2022
Official repo for SemanticGAN https://nv-tlabs.github.io/semanticGAN/

SemanticGAN This is the official code for: Semantic Segmentation with Generative Models: Semi-Supervised Learning and Strong Out-of-Domain Generalizat

151 Dec 28, 2022
This is the official code release for the paper Shape and Material Capture at Home

This is the official code release for the paper Shape and Material Capture at Home. The code enables you to reconstruct a 3D mesh and Cook-Torrance BRDF from one or more images captured with a flashl

89 Dec 10, 2022
Facial Expression Detection In The Realtime

The human's facial expressions is very important to detect thier emotions and sentiment. It can be very efficient to use to make our computers make interviews. Furthermore, we have robots now can det

Adel El-Nabarawy 4 Mar 01, 2022
A decent AI that solves daily Wordle puzzles. Works with different websites with similar wordlists,.

Wordle-AI A decent AI that solves daily "Wordle" puzzles. Works with different websites with similar wordlists. When prompted with "Word:" enter the w

Ethan 1 Feb 10, 2022
yolov5 deepsort 行人 车辆 跟踪 检测 计数

yolov5 deepsort 行人 车辆 跟踪 检测 计数 实现了 出/入 分别计数。 默认是 南/北 方向检测,若要检测不同位置和方向,可在 main.py 文件第13行和21行,修改2个polygon的点。 默认检测类别:行人、自行车、小汽车、摩托车、公交车、卡车。 检测类别可在 detect

554 Dec 30, 2022
Implementation of "Deep Implicit Templates for 3D Shape Representation"

Deep Implicit Templates for 3D Shape Representation Zerong Zheng, Tao Yu, Qionghai Dai, Yebin Liu. arXiv 2020. This repository is an implementation fo

Zerong Zheng 144 Dec 07, 2022
Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021) Hang Zhou, Yasheng Sun, Wayne Wu, Chen Cha

Hang_Zhou 628 Dec 28, 2022
B2EA: An Evolutionary Algorithm Assisted by Two Bayesian Optimization Modules for Neural Architecture Search

B2EA: An Evolutionary Algorithm Assisted by Two Bayesian Optimization Modules for Neural Architecture Search This is the offical implementation of the

SNU ADSL 0 Feb 07, 2022
repro_eval is a collection of measures to evaluate the reproducibility/replicability of system-oriented IR experiments

repro_eval repro_eval is a collection of measures to evaluate the reproducibility/replicability of system-oriented IR experiments. The measures were d

IR Group at Technische Hochschule Köln 9 May 25, 2022
Code for the paper "Benchmarking and Analyzing Point Cloud Classification under Corruptions"

ModelNet-C Code for the paper "Benchmarking and Analyzing Point Cloud Classification under Corruptions". For the latest updates, see: sites.google.com

Jiawei Ren 45 Dec 28, 2022
A Closer Look at Reference Learning for Fourier Phase Retrieval

A Closer Look at Reference Learning for Fourier Phase Retrieval This repository contains code for our NeurIPS 2021 Workshop on Deep Learning and Inver

Tobias Uelwer 1 Oct 28, 2021
BMVC 2021 Oral: code for BI-GCN: Boundary-Aware Input-Dependent Graph Convolution for Biomedical Image Segmentation

BMVC 2021 BI-GConv: Boundary-Aware Input-Dependent Graph Convolution for Biomedical Image Segmentation Necassary Dependencies: PyTorch 1.2.0 Python 3.

Yanda Meng 15 Nov 08, 2022
UNION: An Unreferenced Metric for Evaluating Open-ended Story Generation

UNION Automatic Evaluation Metric described in the paper UNION: An UNreferenced MetrIc for Evaluating Open-eNded Story Generation (EMNLP 2020). Please

50 Dec 30, 2022
Distributed Asynchronous Hyperparameter Optimization better than HyperOpt.

UltraOpt : Distributed Asynchronous Hyperparameter Optimization better than HyperOpt. UltraOpt is a simple and efficient library to minimize expensive

98 Aug 16, 2022
ElasticFace: Elastic Margin Loss for Deep Face Recognition

This is the official repository of the paper: ElasticFace: Elastic Margin Loss for Deep Face Recognition Paper on arxiv: arxiv Model Log file Pretrain

Fadi Boutros 113 Dec 14, 2022
Deep learning library for solving differential equations and more

DeepXDE Voting on whether we should have a Slack channel for discussion. DeepXDE is a library for scientific machine learning. Use DeepXDE if you need

Lu Lu 1.4k Dec 29, 2022
Code for ECCV 2020 paper "Contacts and Human Dynamics from Monocular Video".

Contact and Human Dynamics from Monocular Video This is the official implementation for the ECCV 2020 spotlight paper by Davis Rempe, Leonidas J. Guib

Davis Rempe 207 Jan 05, 2023
This repository contains the code used to quantitatively evaluate counterfactual examples in the associated paper.

On Quantitative Evaluations of Counterfactuals Install To install required packages with conda, run the following command: conda env create -f requi

Frederik Hvilshøj 1 Jan 16, 2022