Code for our CVPR2021 paper coordinate attention

Last update: Jan 05, 2023

Overview

Coordinate Attention for Efficient Mobile Network Design (preprint)

This repository is a PyTorch implementation of our coordinate attention (will appear in CVPR2021).

Our coordinate attention can be easily plugged into any classic building block as a feature representation augmentation tool. pytorch-image-models is a code base you may want to use to train a classification model on ImageNet.
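To make the "plug-in" idea concrete, here is a minimal sketch of a coordinate attention block in PyTorch. It assumes the usual formulation: average pooling along each spatial direction, a shared 1x1 bottleneck convolution, and two sigmoid gates. The class names `HSwish` and `CoordAttSketch`, the `reduction=32` default, and the lower bound of 8 bottleneck channels are illustrative assumptions; please check the repository code for the exact version used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HSwish(nn.Module):
    """Hard swish: x * ReLU6(x + 3) / 6."""

    def forward(self, x):
        return x * F.relu6(x + 3.0) / 6.0


class CoordAttSketch(nn.Module):
    """Illustrative coordinate attention block (not the official code)."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)  # assumed bottleneck width
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn1 = nn.BatchNorm2d(mid)
        self.act = HSwish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Pool along the width and along the height separately.
        x_h = x.mean(dim=3, keepdim=True)                      # (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (n, c, w, 1)
        # Encode both directions with one shared 1x1 convolution.
        y = torch.cat([x_h, x_w], dim=2)                       # (n, c, h + w, 1)
        y = self.act(self.bn1(self.conv1(y)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        y_w = y_w.permute(0, 1, 3, 2)                          # (n, mid, 1, w)
        # One gate per direction; broadcasting combines them over (h, w).
        a_h = torch.sigmoid(self.conv_h(y_h))                  # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w))                  # (n, c, 1, w)
        return x * a_h * a_w
```

Because the output has the same shape as the input, a block like this can be dropped between two existing layers, for example `CoordAttSketch(64)(torch.randn(2, 64, 7, 5))`.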

Note that the results reported in the paper are based on a regular training setting (200 training epochs, random crop, and a cosine learning rate schedule) without extra label smoothing, random augmentation, random erasing, or mixup. For specific numbers on ImageNet classification, COCO object detection, and semantic segmentation, please refer to our paper.
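For readers unfamiliar with the cosine schedule mentioned above, the following sketch shows the standard cosine annealing formula over 200 epochs. The function name `cosine_lr` and the values `base_lr=0.05` and `min_lr=0.0` are hypothetical placeholders, not the settings used in the paper.

```python
import math


def cosine_lr(epoch, total_epochs=200, base_lr=0.05, min_lr=0.0):
    """Cosine-annealed learning rate for a 0-indexed epoch.

    base_lr and min_lr are placeholder values chosen for illustration.
    """
    progress = epoch / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the rate at a few points of the 200-epoch run.
    for epoch in (0, 50, 100, 150, 199):
        print(epoch, cosine_lr(epoch))
```

The rate starts at `base_lr`, reaches half of it at the midpoint, and decays smoothly toward `min_lr` at the end of training.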

Comparison to Squeeze-and-Excitation block and CBAM

(a) Squeeze-and-Excitation block (b) CBAM (c) Coordinate attention block

How to plug the proposed CA block in the inverted residual block and the sandglass block

(a) MobileNetV2 (b) MobileNeXt
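Since the figure is not reproduced here, the sketch below illustrates one way an attention module can be slotted into a MobileNetV2-style inverted residual block. It assumes the attention sits after the depthwise 3x3 convolution and before the linear 1x1 projection; that placement, the class name `InvertedResidualWithAttention`, and the `attention` factory argument are assumptions made for illustration, so please check the repository for the exact layout.

```python
import torch
import torch.nn as nn


class InvertedResidualWithAttention(nn.Module):
    """Illustrative inverted residual block with an optional attention slot.

    `attention` is a hypothetical factory: it receives the number of hidden
    channels and returns a module that preserves the tensor shape.
    """

    def __init__(self, inp, oup, stride, expand_ratio, attention=None):
        super().__init__()
        hidden = inp * expand_ratio
        self.use_res = stride == 1 and inp == oup
        layers = []
        if expand_ratio != 1:
            # 1x1 pointwise expansion
            layers += [
                nn.Conv2d(inp, hidden, 1, bias=False),
                nn.BatchNorm2d(hidden),
                nn.ReLU6(inplace=True),
            ]
        layers += [
            # 3x3 depthwise convolution
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            # attention slot (assumed position); identity when not provided
            attention(hidden) if attention is not None else nn.Identity(),
            # 1x1 linear projection
            nn.Conv2d(hidden, oup, 1, bias=False),
            nn.BatchNorm2d(oup),
        ]
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        out = self.conv(x)
        return x + out if self.use_res else out
```

Any shape-preserving module can be passed through the factory, for example `InvertedResidualWithAttention(16, 16, 1, 6, attention=lambda c: nn.Identity())`; a coordinate attention module taking the channel count as its argument would be used the same way.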

Some tips for designing lightweight attention blocks

SiLU activation (h_swish in the code) works better than ReLU6
Attention along either the horizontal or the vertical direction alone performs the same as SE attention
When applied to MobileNeXt, adding the attention block after the first depthwise 3x3 convolution works better
Not sure whether the results would be better if a softmax were applied between the horizontal and vertical features
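The first tip refers to `h_swish`, which is the hard swish activation, a piecewise-linear approximation of SiLU. The sketch below writes both out on plain floats so the relationship is easy to see; the scalar helper functions are written for illustration and are not taken from the repository.

```python
import math


def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def h_swish(x):
    """Hard swish: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0


def silu(x):
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


if __name__ == "__main__":
    # Compare the two activations at a few sample points.
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(x, h_swish(x), silu(x))
```

Hard swish is exactly zero for inputs at or below -3 and equals the identity for inputs at or above 3, while tracking SiLU in between.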

Object detection

We use this repo (ssdlite-pytorch-mobilenext).

Semantic segmentation

We use this repo. Alternatively, you can refer to mmsegmentation.

Citation

You may want to cite:

@inproceedings{hou2021coordinate,
  title={Coordinate Attention for Efficient Mobile Network Design},
  author={Hou, Qibin and Zhou, Daquan and Feng, Jiashi},
  booktitle={CVPR},
  year={2021}
}

@inproceedings{sandler2018mobilenetv2,
  title={Mobilenetv2: Inverted residuals and linear bottlenecks},
  author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={4510--4520},
  year={2018}
}

@inproceedings{zhou2020rethinking,
  title={Rethinking bottleneck structure for efficient mobile network design},
  author={Zhou, Daquan and Hou, Qibin and Chen, Yunpeng and Feng, Jiashi and Yan, Shuicheng},
  booktitle={ECCV},
  year={2020}
}

@inproceedings{hu2018squeeze,
  title={Squeeze-and-excitation networks},
  author={Hu, Jie and Shen, Li and Sun, Gang},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={7132--7141},
  year={2018}
}

@inproceedings{woo2018cbam,
  title={Cbam: Convolutional block attention module},
  author={Woo, Sanghyun and Park, Jongchan and Lee, Joon-Young and Kweon, In So},
  booktitle={Proceedings of the European conference on computer vision (ECCV)},
  pages={3--19},
  year={2018}
}

Owner

Qibin (Andrew) Hou
