Implementation of Momentum^2 Teacher

Overview

Momentum^2 Teacher: Momentum Teacher with Momentum Statistics for Self-Supervised Learning

Requirements

  1. All experiments were run with Python 3.6, torch==1.5.0, and torchvision==0.6.0.

Usage

Data Preparation

Prepare the ImageNet data in ${root_of_clone}/data/imagenet_train and ${root_of_clone}/data/imagenet_val. We read ImageNet from an internal storage platform, so local mode has not been tested. You may need to modify momentum_teacher/data/dataset.py to support local mode.
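
Before adapting the dataset code for local mode, it can help to confirm that the two directories named above exist. The following is a minimal sketch under stated assumptions: the helper name check_imagenet_layout is hypothetical and not part of the repository, and it only checks the directory names given above, not their contents or the format the dataset code expects.

```python
from pathlib import Path


def check_imagenet_layout(root):
    """Return the expected ImageNet directories under `root` that are missing.

    `root` is the root of the repository clone. The two directory names come
    from the data preparation instructions; this helper itself is
    illustrative and not part of momentum_teacher.
    """
    expected = [
        Path(root) / "data" / "imagenet_train",
        Path(root) / "data" / "imagenet_val",
    ]
    # Keep only the paths that are not existing directories.
    return [str(path) for path in expected if not path.is_dir()]


if __name__ == "__main__":
    missing = check_imagenet_layout(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("ImageNet directories found.")
```

Run it from the root of the clone; an empty result means both directories are in place.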

Training

Before training, make sure the repository root (${root_of_clone}) is in your PYTHONPATH, e.g.,

export PYTHONPATH=$PYTHONPATH:${root_of_clone}
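
To confirm that the export took effect, one option is to ask Python whether the package can be located. This is a small sketch using only the standard library; the helper name is_importable is hypothetical, and momentum_teacher is assumed to be the top-level package name because it is the directory used in the training command.

```python
import importlib.util


def is_importable(package_name):
    """Return True if the top-level `package_name` is on the import path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Assumption: "momentum_teacher" is the top-level package of this repository.
    print("momentum_teacher importable:", is_importable("momentum_teacher"))
```

If this prints False, the repository root is likely missing from PYTHONPATH in the current shell.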

To run unsupervised pre-training of a ResNet-50 model on ImageNet on an 8-GPU machine, use the following options:

  1. -d specifies the GPU ids used for training, e.g., -d 0-7
  2. -b specifies the batch size, e.g., -b 256
  3. --experiment-name specifies the output folder; the training log and models are written to './outputs/${experiment-name}'
  4. -f specifies the description file of your experiment.

e.g.,

python3 momentum_teacher/tools/train.py -b 256 -d 0-7 --experiment-name your_exp -f momentum_teacher/exps/arxiv/exp_8_v100/momentum2_teacher_100e_exp.py
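
The device argument in the command above is written as a range. As an illustration only, the sketch below shows one way such a string could be expanded into individual GPU ids. It assumes the range is inclusive on both ends, so that 0-7 covers eight GPUs; the function name parse_gpu_ids is hypothetical, and the repository's own argument parsing may differ.

```python
def parse_gpu_ids(spec):
    """Expand a device spec such as "0-7" into a list of GPU ids.

    Assumption: a range "a-b" is inclusive on both ends. A single id such
    as "3" yields a one-element list. This is not the repository's parser.
    """
    if "-" in spec:
        start, end = (int(part) for part in spec.split("-", 1))
        if end < start:
            raise ValueError("range end must not be smaller than range start")
        return list(range(start, end + 1))
    return [int(spec)]


if __name__ == "__main__":
    print(parse_gpu_ids("0-7"))  # [0, 1, 2, 3, 4, 5, 6, 7]
```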

Linear Evaluation

With a pre-trained model, to train a supervised linear classifier on frozen features/weights on an 8-GPU machine, use the following options:

  1. -d specifies the GPU ids used for training, e.g., -d 0-7
  2. -b specifies the batch size, e.g., -b 256
  3. --experiment-name specifies the folder where the pre-trained models were saved.

e.g.,

python3 momentum_teacher/tools/eval.py -b 256 --experiment-name your_exp -f momentum_teacher/exps/arxiv/linear_eval_exp_byol.py

Results

Results of Pretraining on a Single Machine

After pre-training on 8 NVIDIA V100 GPUs with a batch size of 1024, the linear evaluation results are:

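| pre-train code | pre-train epochs | pre-train time | accuracy | weights |
|----------------|------------------|----------------|----------|---------|
| path           | 100              | ~1.8 days      | 70.7     | -       |
| path           | 200              | ~3.6 days      | 72.7     | -       |
| path           | 300              | ~5.5 days      | 73.8     | -       |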

After pre-training on 8 NVIDIA 2080 GPUs with a batch size of 256, the linear evaluation results are:

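| pre-train code | pre-train epochs | pre-train time | accuracy | weights |
|----------------|------------------|----------------|----------|---------|
| path           | 100              | ~2.5 days      | 70.4     | -       |
| path           | 200              | ~5 days        | 72.3     | -       |
| path           | 300              | ~7.5 days      | 72.9     | -       |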

Results of Pretraining on Multiple Machines

For example, to run unsupervised pre-training with a batch size of 4096 on 32 V100 GPUs, assuming 4 machines with 8 V100 GPUs each, run:

# machine 1:
export MACHINE=0; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 2:
export MACHINE=1; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 3:
export MACHINE=2; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
# machine 4:
export MACHINE=3; export MACHINE_TOTAL=4; python3 momentum_teacher/tools/train.py -b 4096 -f xxx
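
The arithmetic behind this setup can be sketched as follows. The sketch assumes that -b is the total batch size across all machines, which matches the description of 4096 samples on 32 GPUs, and that each GPU receives an equal share. The function names and the rank layout are hypothetical conventions for illustration, not taken from the repository.

```python
def per_gpu_batch_size(total_batch_size, machine_total, gpus_per_machine):
    """Split a total batch size evenly across every GPU on every machine."""
    world_size = machine_total * gpus_per_machine
    if total_batch_size % world_size != 0:
        raise ValueError("total batch size must be divisible by the GPU count")
    return total_batch_size // world_size


def global_rank(machine, local_gpu, gpus_per_machine):
    """Map a (machine index, local GPU index) pair to a global rank.

    Assumption: machines are numbered from 0, as in the MACHINE variable
    above, and ranks are assigned machine by machine.
    """
    return machine * gpus_per_machine + local_gpu


if __name__ == "__main__":
    # 4 machines with 8 GPUs each and a total batch size of 4096.
    print(per_gpu_batch_size(4096, machine_total=4, gpus_per_machine=8))  # 128
    # The last GPU on the last machine (MACHINE=3, local GPU 7).
    print(global_rank(machine=3, local_gpu=7, gpus_per_machine=8))  # 31
```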

Results of linear evaluation:

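| pre-train code | pre-train epochs | pre-train time | accuracy | weights |
|----------------|------------------|----------------|----------|---------|
| path           | 100              | ~11 hours      | 70.3     | -       |
| path           | 200              | ~22 hours      | 72.5     | -       |
| path           | 300              | ~33 hours      | 73.7     | -       |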

To run unsupervised pre-training with a batch size of 4096 on 128 2080 GPUs, follow the same procedure as above. Results of linear evaluation:

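| pre-train code | pre-train epochs | pre-train time | accuracy | weights |
|----------------|------------------|----------------|----------|---------|
| path           | 100              | ~5 hours       | 69.0     | -       |
| path           | 200              | ~10 hours      | 71.5     | -       |
| path           | 300              | ~15 hours      | 72.3     | -       |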

Disclaimer

This is an implementation of Momentum^2 Teacher. Note that:

  • The original implementation is based on our internal platform.
  • This released version performs slightly better than the results in the tech report.

Owner
jemmy li