CVPR 2022 "Online Convolutional Re-parameterization"

Last update: Dec 21, 2022

Overview

OREPA: Online Convolutional Re-parameterization

This repo is the PyTorch implementation of our CVPR 2022 paper "Online Convolutional Re-parameterization", authored by Mu Hu, Junyi Feng, Jiashen Hua, Baisheng Lai, Jianqiang Huang, Xiaojin Gong and Xiansheng Hua from Zhejiang University and Alibaba Cloud.

What is Structural Re-parameterization?

Re-parameterization (Re-param) means that different architectures can be converted into one another through an equivalent transformation of their parameters. For example, a 1x1 convolution branch and a 3x3 convolution branch can be merged into a single 3x3 convolution for faster inference.
When the deployment model is fixed, the task of re-param can be regarded as finding a more complex training-time structure that can be transferred back to the original one, yielding free performance improvements.
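The 1x1 + 3x3 example above can be checked numerically. The following is a minimal, dependency-free sketch in pure Python for a single channel with zero padding; the helper names `conv2d` and `merge_1x1_into_3x3` are illustrative and are not part of this repo's code.

```python
def conv2d(image, kernel):
    """Single-channel 'same' cross-correlation with zero padding (odd kernel size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def merge_1x1_into_3x3(k3, k1):
    """Add the 1x1 weight onto the centre tap of a copy of the 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0]
    return merged


image = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k3 = [[0.5, -1.0, 0.25], [1.0, 2.0, -0.5], [0.0, 0.75, 1.5]]
k1 = [[3.0]]

# Training-time structure: two branches, outputs summed.
out3, out1 = conv2d(image, k3), conv2d(image, k1)
two_branch = [[a + b for a, b in zip(r3, r1)] for r3, r1 in zip(out3, out1)]

# Inference-time structure: one 3x3 convolution with the merged kernel.
single_branch = conv2d(image, merge_1x1_into_3x3(k3, k1))

max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(two_branch, single_branch)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

The merge works because convolution is linear in its kernel, and a 1x1 kernel is equivalent to a 3x3 kernel that is zero everywhere except at the centre.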

Why do we propose Online RE-PAram? (OREPA)

Current re-param blocks (ACNet, ExpandNet, ACNetv2, etc.) are still feasible for small models, but more complicated designs aimed at further performance gains on larger models could lead to unaffordable training budgets.
We observed that batch normalization (norm) layers are significant in re-param blocks, while their training-time non-linearity prevents us from optimizing computational costs during training.

What is OREPA?

OREPA is a two-step pipeline.

Linearization: Replace the branch-wise norm layers with scaling layers to enable the linear squeezing of a multi-branch/multi-layer topology.
Squeezing: Squeeze the linearized block into a single layer, so that the convolution over feature maps is performed once instead of multiple times.
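The two steps can be illustrated on a toy case. The sketch below assumes a simplified setting that is not taken from this repo: each branch is a 1x1 convolution (a matrix applied to the channel vector at one pixel) followed by a per-output-channel scaling layer. All function names are hypothetical.

```python
def matvec(weight, x):
    """Apply a 1x1 convolution at one pixel: a matrix times the channel vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def scale_rows(weight, scale):
    """Fold a per-output-channel scaling layer into the weight that precedes it."""
    return [[s * w for w in row] for row, s in zip(weight, scale)]


def squeeze(branches):
    """Sum the scaled weights of all (weight, scale) branches into one matrix."""
    scaled = [scale_rows(w, s) for w, s in branches]
    rows, cols = len(scaled[0]), len(scaled[0][0])
    return [[sum(m[r][c] for m in scaled) for c in range(cols)] for r in range(rows)]


w_a, s_a = [[1.0, 2.0], [0.5, -1.0]], [2.0, 0.5]
w_b, s_b = [[-0.5, 1.0], [4.0, 0.25]], [1.0, -2.0]
x = [3.0, -1.0]

# Linearized multi-branch block: convolve in each branch, scale, then sum.
multi = [
    sa * a + sb * b
    for sa, a, sb, b in zip(s_a, matvec(w_a, x), s_b, matvec(w_b, x))
]

# Squeezed block: build one weight first, then convolve the features once.
single = matvec(squeeze([(w_a, s_a), (w_b, s_b)]), x)

print(multi, single)
```

Because scaling layers are linear, the squeezed weight can be rebuilt from the branch parameters at every training step, so the branches stay trainable while the feature map is convolved only once. A batch norm layer in a branch would prevent this during training, since its statistics depend on the branch output.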

How does OREPA work?

Through OREPA we can reduce the training budget while keeping comparable performance. We then improve accuracy with additional components, which bring only minor extra training cost since they are merged in an online scheme.
We show theoretically that removing branch-wise norm layers risks a multi-branch structure degrading into a single-branch one, indicating that replacing norm layers with scaling layers is critical for protecting branch diversity.
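One way to build intuition for the degradation claim is a toy scalar model. This is an illustrative sketch, not the derivation from the paper: the output is `y = sum_i scales[i] * weights[i] * x` and the loss is `0.5 * (y - target) ** 2`. The function name and all values are made up for the example.

```python
def branch_gradients(weights, scales, x, target):
    """Gradient of 0.5 * (y - target)**2 w.r.t. each branch weight,
    where y = sum_i scales[i] * weights[i] * x."""
    y = sum(s * w for s, w in zip(scales, weights)) * x
    err = y - target
    return [err * s * x for s in scales]


weights = [0.3, -0.7]
x, target = 2.0, 1.0

# No norm and no scaling (all scales equal to 1): every branch receives the
# same gradient, so the branches update in lockstep, like a single weight.
plain = branch_gradients(weights, [1.0, 1.0], x, target)

# Branch-wise scaling layers: each branch receives a different gradient.
scaled = branch_gradients(weights, [0.5, 2.0], x, target)

print(plain, scaled)
```

In the plain case the sum of the two weights evolves exactly as one weight trained with a larger step size would, which is the sense in which the multi-branch structure collapses. Distinct scaling factors break that symmetry.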

ImageNet Results

Please create a new issue for any code-related questions. For paper-related questions, feel free to email me at [email protected].

Dependency
Checkpoints
Training
Evaluation
Transfer Learning on COCO and Cityscapes
About Quantization and Gradient Tweaking
Citation

Dependency

The models released in this work were trained and tested on:

CentOS Linux
Python 3.8.8 (Anaconda 4.9.1)
PyTorch 1.9.0 / torchvision 0.10.0
NVIDIA CUDA 10.2
4x NVIDIA V100 GPUs

pip install torch torchvision
pip install numpy matplotlib Pillow
pip install scikit-image

Checkpoints

Download our pre-trained models with OREPA:

Note that the pre-trained models do not need to be decompressed. Just load the .pth.tar files directly.

Training

A complete list of options for each script is available with

python train.py -h
python test.py -h
python convert.py -h

Train ResNets (ResNeXt and WideResNet included)

CUDA_VISIBLE_DEVICES="0,1,2,3" python train.py -a ResNet-18 -t OREPA --data [imagenet-path]
# -a for architecture (ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-18-2x, ResNeXt-50)
# -t for re-param method (base, DBB, OREPA)

Train RepVGGs

CUDA_VISIBLE_DEVICES="0,1,2,3" python train.py -a RepVGG-A0 -t OREPA_VGG --data [imagenet-path]
# -a for architecture (RepVGG-A0, RepVGG-A1, RepVGG-A2)
# -t for re-param method (base, RepVGG, OREPA_VGG)

Evaluation

Use your self-trained model or our pretrained model

CUDA_VISIBLE_DEVICES="0" python test.py train [trained-model-path] -a ResNet-18 -t OREPA

Convert the training-time models into inference-time models

CUDA_VISIBLE_DEVICES="0" python convert.py [trained-model-path] [deploy-model-path-to-save] -a ResNet-18 -t OREPA

Evaluate with the converted model

CUDA_VISIBLE_DEVICES="0" python test.py deploy [deploy-model-path] -a ResNet-18 -t OREPA

Transfer Learning on COCO and Cityscapes

We use the mmdetection and mmsegmentation toolboxes on COCO and Cityscapes, respectively. If you use our pretrained model for downstream tasks, we strongly suggest carefully tuning the learning rate of the first stem layer, since the deep linear stem layer has a very different weight distribution from the vanilla one after ImageNet training. Contact @Sixkplus (Junyi Feng) for more details on configurations and checkpoints of the reported ResNet-50-backbone models.
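A stem-specific learning rate is usually expressed as a separate optimizer parameter group. The sketch below is framework-agnostic and makes several assumptions: the `stem.` name prefix, the `0.1` multiplier and the helper `build_param_groups` are hypothetical placeholders, not values or code from this repo. The returned list follows the per-parameter-group format accepted by PyTorch optimizers.

```python
def build_param_groups(named_params, base_lr, stem_prefix="stem.", stem_lr_mult=0.1):
    """Split (name, parameter) pairs into a default group and a stem group
    whose learning rate is base_lr * stem_lr_mult.

    stem_prefix and stem_lr_mult are placeholders: check the actual parameter
    names of your backbone and tune the multiplier for your task.
    """
    stem, rest = [], []
    for name, param in named_params:
        (stem if name.startswith(stem_prefix) else rest).append(param)
    groups = [{"params": rest, "lr": base_lr}]
    if stem:
        groups.append({"params": stem, "lr": base_lr * stem_lr_mult})
    return groups


# Stand-in for model.named_parameters(); strings replace real tensors here.
fake_named_params = [
    ("stem.conv.weight", "p0"),
    ("layer1.0.conv1.weight", "p1"),
    ("fc.weight", "p2"),
]
groups = build_param_groups(fake_named_params, base_lr=0.02)
print(groups)
```

With a real model, the result could be passed as the first argument of an optimizer such as `torch.optim.SGD`. In mmdetection or mmsegmentation the same effect is normally configured through the optimizer settings of the config file rather than in code.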

About Quantization and Gradient Tweaking

For re-param models, special weight regularization strategies are required for further quantization. Meanwhile, dynamic gradient tweaking or differential searching methods might greatly boost performance. We have not yet applied such techniques to OREPA, but they may be applied in our industrial usage in the future. To exchange and share experience on these topics, please contact @Sixkplus (Junyi Feng).

Citation

If you use our code or method in your work, please cite the following:

@inproceedings{hu22OREPA,
	title={Online Convolutional Re-parameterization},
	author={Mu Hu and Junyi Feng and Jiashen Hua and Baisheng Lai and Jianqiang Huang and Xiansheng Hua and Xiaojin Gong},
	booktitle={CVPR},
	year={2022}
}

Related Repositories

The code for this work is built upon Xiaohan Ding's re-param repositories "Diverse Branch Block: Building a Convolution as an Inception-like Unit" and "RepVGG: Making VGG-style ConvNets Great Again", following similar protocols. Xiaohan Ding holds a Ph.D. from Tsinghua University and is an expert in structural re-parameterization.
