计算机视觉中用到的注意力模块和其他即插即用模块PyTorch Implementation Collection of Attention Module and Plug&Play Module

Overview

Awesome-Attention-Mechanism-in-cv

Table of Contents

Introduction

PyTorch实现多种计算机视觉中网络设计中用到的Attention机制,还收集了一些即插即用模块。由于能力有限精力有限,可能很多模块并没有包括进来,有任何的建议或者改进,可以提交issue或者进行PR。

Attention Mechanism

Paper Publish Link Main Idea Blog
Global Second-order Pooling Convolutional Networks CVPR19 GSoPNet 将高阶和注意力机制在网络中部地方结合起来
Neural Architecture Search for Lightweight Non-Local Networks CVPR20 AutoNL NAS+LightNL
Squeeze and Excitation Network CVPR18 SENet 最经典的通道注意力 zhihu
Selective Kernel Network CVPR19 SKNet SE+动态选择 zhihu
Convolutional Block Attention Module ECCV18 CBAM 串联空间+通道注意力 zhihu
BottleNeck Attention Module BMVC18 BAM 并联空间+通道注意力 zhihu
Concurrent Spatial and Channel ‘Squeeze & Excitation’ in Fully Convolutional Networks MICCAI18 scSE 并联空间+通道注意力 zhihu
Non-local Neural Networks CVPR19 Non-Local(NL) self-attention zhihu
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond ICCVW19 GCNet 对NL进行改进 zhihu
CCNet: Criss-Cross Attention for Semantic Segmentation ICCV19 CCNet 对NL改进
SA-Net:shuffle attention for deep convolutional neural networks ICASSP 21 SANet SGE+channel shuffle zhihu
ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks CVPR20 ECANet SE的改进
Spatial Group-wise Enhance: Improving Semantic Feature Learning in Convolutional Networks CoRR19 SGENet Group+spatial+channel
FcaNet: Frequency Channel Attention Networks CoRR20 FcaNet 频域上的SE操作
$A^2\text{-}Nets$: Double Attention Networks NeurIPS18 DANet NL的思想应用到空间和通道
Asymmetric Non-local Neural Networks for Semantic Segmentation ICCV19 APNB spp+NL
Efficient Attention: Attention with Linear Complexities CoRR18 EfficientAttention NL降低计算量
Image Restoration via Residual Non-local Attention Networks ICLR19 RNAN
Exploring Self-attention for Image Recognition CVPR20 SAN 理论性很强,实现起来很简单
An Empirical Study of Spatial Attention Mechanisms in Deep Networks ICCV19 None MSRA综述self-attention
Object-Contextual Representations for Semantic Segmentation ECCV20 OCRNet 复杂的交互机制,效果确实好
IAUnet: Global Context-Aware Feature Learning for Person Re-Identification TTNNLS20 IAUNet 引入时序信息
ResNeSt: Split-Attention Networks CoRR20 ResNeSt SK+ResNeXt
Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks NeurIPS18 GENet SE续作
Improving Convolutional Networks with Self-calibrated Convolutions CVPR20 SCNet 自校正卷积
Rotate to Attend: Convolutional Triplet Attention Module WACV21 TripletAttention CHW两两互相融合
Dual Attention Network for Scene Segmentation CVPR19 DANet self-attention
Relation-Aware Global Attention for Person Re-identification CVPR20 RGA 用于reid
Attentional Feature Fusion WACV21 AFF 特征融合的attention方法
An Attentive Survey of Attention Models CoRR19 None 包括NLP/CV/推荐系统等方面的注意力机制
Stand-Alone Self-Attention in Vision Models NeurIPS19 FullAttention 全部的卷积都替换为self-attention
BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation ECCV18 BiSeNet 类似FPN的特征融合方法 zhihu
DCANet: Learning Connected Attentions for Convolutional Neural Networks CoRR20 DCANet 增强attention之间信息流动
An Empirical Study of Spatial Attention Mechanisms in Deep Networks ICCV19 None 对空间注意力进行针对性分析
Look closer to see better: Recurrent attention convolutional neural network for fine-grained image recognition CVPR17 Oral RA-CNN 细粒度识别
Guided Attention Network for Object Detection and Counting on Drones ACM MM20 GANet 处理目标检测问题
Attention Augmented Convolutional Networks ICCV19 AANet 多头+引入额外特征映射
GLOBAL SELF-ATTENTION NETWORKS FOR IMAGE RECOGNITION ICLR21 GSA 新的全局注意力模块
Attention-Guided Hierarchical Structure Aggregation for Image Matting CVPR20 HAttMatting 抠图方面的应用,高层使用通道注意力机制,然后再使用空间注意力机制指导低层。
Weight Excitation: Built-in Attention Mechanisms in Convolutional Neural Networks ECCV20 None 与SE互补的权值激活机制
Expectation-Maximization Attention Networks for Semantic Segmentation ICCV19 Oral EMANet EM+Attention

Plug and Play Module

  • ACBlock
  • Swish、wish Activation
  • ASPP Block
  • DepthWise Convolution
  • Fused Conv & BN
  • MixedDepthwise Convolution
  • PSP Module
  • RFBModule
  • SematicEmbbedBlock
  • SSH Context Module
  • Some other usefull tools such as concate feature map、flatten feature map
  • WeightedFeatureFusion:EfficientDet中的FPN用到的fuse方式
  • StripPooling:CVPR2020中核心代码StripPooling
  • GhostModule: CVPR2020GhostNet的核心模块
  • SlimConv: SlimConv3x3
  • Context Gating: video classification
  • EffNetBlock: EffNet
  • ECCV2020 BorderDet: Border aligment module
  • CVPR2019 DANet: Dual Attention
  • Object Contextual Representation for sematic segmentation: OCRModule
  • FPT: 包含Self Transform、Grounding Transform、Rendering Transform
  • DOConv: 阿里提出的Depthwise Over-parameterized Convolution
  • PyConv: 起源人工智能研究院提出的金字塔卷积
  • ULSAM:用于紧凑型CNN的超轻量级子空间注意力模块
  • DGC: ECCV 2020用于加速卷积神经网络的动态分组卷积
  • DCANet: ECCV 2020 学习卷积神经网络的连接注意力
  • PSConv: ECCV 2020 将特征金字塔压缩到紧凑的多尺度卷积层中
  • Dynamic Convolution: CVPR2020 动态滤波器卷积(非官方)
  • CondConv: Conditionally Parameterized Convolutions for Efficient Inference

Evaluation

基于CIFAR10+ResNet+待测评模块,对模块进行初步测评。测评代码来自于另外一个库:https://github.com/kuangliu/pytorch-cifar/ 实验过程中,不使用预训练权重,进行随机初始化。

模型 top1 acc time params(MB)
SENet18 95.28% 1:27:50 11,260,354
ResNet18 95.16% 1:13:03 11,173,962
ResNet50 95.50% 4:24:38 23,520,842
ShuffleNetV2 91.90% 1:02:50 1,263,854
GoogLeNet 91.90% 1:02:50 6,166,250
MobileNetV2 92.66% 2:04:57 2,296,922
SA-ResNet50 89.83% 2:10:07 23,528,758
SA-ResNet18 95.07% 1:39:38 11,171,394

Paper List

SENet 论文: https://arxiv.org/abs/1709.01507 解读:https://zhuanlan.zhihu.com/p/102035721

Contribute

欢迎在issue中提出补充的文章paper和对应code链接。

Owner
PJDong
Computer vision learner, deep learner
PJDong
Official implementation of CVPR2020 paper "Deep Generative Model for Robust Imbalance Classification"

Deep Generative Model for Robust Imbalance Classification Deep Generative Model for Robust Imbalance Classification Xinyue Wang, Yilin Lyu, Liping Jin

9 Nov 01, 2022
Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks

Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks - Official Project Page This repository contains the code develope

Amirsina Torfi 1.7k Dec 18, 2022
Modified fork of Xuebin Qin's U-2-Net Repository. Used for demonstration purposes.

U^2-Net (U square net) Modified version of U2Net used for demonstation purposes. Paper: U^2-Net: Going Deeper with Nested U-Structure for Salient Obje

Shreyas Bhat Kera 13 Aug 28, 2022
Official implementation of the network presented in the paper "M4Depth: A motion-based approach for monocular depth estimation on video sequences"

M4Depth This is the reference TensorFlow implementation for training and testing depth estimation models using the method described in M4Depth: A moti

Michaël Fonder 76 Jan 03, 2023
Smart edu-autobooking - Johnson @ DMI-UNICT study room self-booking system

smart_edu-autobooking Sistema di autoprenotazione per l'aula studio [email protected]

Davide Carnemolla 17 Jun 20, 2022
Code release for "COTR: Correspondence Transformer for Matching Across Images"

COTR: Correspondence Transformer for Matching Across Images This repository contains the inference code for COTR. We plan to release the training code

UBC Computer Vision Group 360 Jan 06, 2023
MobileNetV1-V2,MobileNeXt,GhostNet,AdderNet,ShuffleNetV1-V2,Mobile+ViT etc.

MobileNetV1-V2,MobileNeXt,GhostNet,AdderNet,ShuffleNetV1-V2,Mobile+ViT etc. ⭐⭐⭐⭐⭐

568 Jan 04, 2023
Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch .

PyTorch-High-Res-Stereo-Depth-Estimation Python scripts form performing stereo depth estimation using the high res stereo model in PyTorch. Stereo dep

Ibai Gorordo 26 Nov 24, 2022
Code base of object detection

rmdet code base of object detection. 环境安装: 1. 安装conda python环境 - `conda create -n xxx python=3.7/3.8` - `conda activate xxx` 2. 运行脚本,自动安装pytorch1

3 Mar 08, 2022
[BMVC2021] The official implementation of "DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations"

DomainMix [BMVC2021] The official implementation of "DomainMix: Learning Generalizable Person Re-Identification Without Human Annotations" [paper] [de

Wenhao Wang 17 Dec 20, 2022
Tooling for converting STAC metadata to ODC data model

手语识别 0、使用到的模型 (1). openpose,作者:CMU-Perceptual-Computing-Lab https://github.com/CMU-Perceptual-Computing-Lab/openpose (2). 图像分类classification,作者:Bubbl

Open Data Cube 65 Dec 20, 2022
An adaptive hierarchical energy management strategy for hybrid electric vehicles

An adaptive hierarchical energy management strategy This project contains the source code of an adaptive hierarchical EMS combining heuristic equivale

19 Dec 13, 2022
这是一个利用facenet和retinaface实现人脸识别的库,可以进行在线的人脸识别。

Facenet+Retinaface:人脸识别模型在Pytorch当中的实现 目录 注意事项 Attention 所需环境 Environment 文件下载 Download 预测步骤 How2predict 参考资料 Reference 注意事项 该库中包含了两个网络,分别是retinaface和

Bubbliiiing 102 Dec 30, 2022
FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction

FLAVR is a fast, flow-free frame interpolation method capable of single shot multi-frame prediction. It uses a customized encoder decoder architecture with spatio-temporal convolutions and channel ga

Tarun K 280 Dec 23, 2022
A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.

A curated list of awesome resources related to Semantic Search🔎 and Semantic Similarity tasks.

224 Jan 04, 2023
Implementation of Google Brain's WaveGrad high-fidelity vocoder

WaveGrad Implementation (PyTorch) of Google Brain's high-fidelity WaveGrad vocoder (paper). First implementation on GitHub with high-quality generatio

Ivan Vovk 363 Dec 27, 2022
VID-Fusion: Robust Visual-Inertial-Dynamics Odometry for Accurate External Force Estimation

VID-Fusion VID-Fusion: Robust Visual-Inertial-Dynamics Odometry for Accurate External Force Estimation Authors: Ziming Ding , Tiankai Yang, Kunyi Zhan

ZJU FAST Lab 86 Nov 18, 2022
An University Project of Quera Web Crawling.

WebCrawlerProject An University Project of Quera Web Crawling. خزشگر اینستاگرام در این پروژه شما باید با استفاده از کتابخانه های زیر یک خزشگر اینستاگر

Mahdi 3 Aug 12, 2022
Exploring Versatile Prior for Human Motion via Motion Frequency Guidance (3DV2021)

Exploring Versatile Prior for Human Motion via Motion Frequency Guidance [Video Demo] [Paper] Installation Requirements Python 3.6 PyTorch 1.1.0 Pleas

Jiachen Xu 19 Oct 28, 2022
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021

Learning the Best Pooling Strategy for Visual Semantic Embedding Official PyTorch implementation of the paper Learning the Best Pooling Strategy for V

Jiacheng Chen 106 Jan 06, 2023