Official code for paper "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight"

Overview

Demysitifing Local Vision Transformer, arxiv

This is the official PyTorch implementation of our paper. We simply replace local self attention by (dynamic) depth-wise convolution with lower computational cost. The performance is on par with the Swin Transformer.

Besides, the main contribution of our paper is the theorical and detailed comparison between depth-wise convolution and local self attention from three aspects: sparse connectivity, weight sharing and dynamic weight. By this paper, we want community to rethinking the local self attention and depth-wise convolution, and the basic model architeture designing rules.

Codes and models for object detection and semantic segmentation are avaliable in Detection and Segmentation.

We also give MLP based Swin Transformer models and Inhomogenous dynamic convolution in the ablation studies. These codes and models will coming soon.

Reference

@article{han2021demystifying,
  title={Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight},
  author={Han, Qi and Fan, Zejia and Dai, Qi and Sun, Lei and Cheng, Ming-Ming and Liu, Jiaying and Wang, Jingdong},
  journal={arXiv preprint arXiv:2106.04263},
  year={2021}
}

1. Requirements

torch>=1.5.0, torchvision, timm, pyyaml; apex-amp

data perpare: ImageNet dataset with the following structure:

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

2. Trainning

For tiny model, we train with batch-size 128 on 8 GPUs. When trainning base model, we use batch-size 64 on 16 GPUs with OpenMPI to keep the total batch-size unchanged. (With the same trainning setting, the base model couldn't train with AMP due to the anomalous gradient values.)

Please change the data path in sh scripts first.

For tiny model:

bash scripts/run_dwnet_tiny_patch4_window7_224.sh 
bash scripts/run_dynamic_dwnet_tiny_patch4_window7_224.sh

For base model, use multi node with OpenMPI:

bash scripts/run_dwnet_base_patch4_window7_224.sh 
bash scripts/run_dynamic_dwnet_base_patch4_window7_224.sh

3. Evaluation

python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --cfg configs/change_to_config_file --resume /path/to/model --data-path /path/to/imagenet --eval

4. Models

Models are provided by training on ImageNet with resolution 224.

Model #params FLOPs Top1 Acc Download
dwnet_tiny 24M 3.8G 81.2 github
dynamic_dwnet_tiny 51M 3.8G 81.8 github
dwnet_base 74M 12.9G 83.2 github
dynamic_dwnet_base 162M 13.0G 83.2 github

Detection (see Detection for details):

Backbone Pretrain Lr Schd box mAP mask mAP #params FLOPs config model
DWNet-T ImageNet-1K 3x 49.9 43.4 82M 730G config github
DWNet-B ImageNet-1K 3x 51.0 44.1 132M 924G config github
Dynamic-DWNet-T ImageNet-1K 3x 50.5 43.7 108M 730G config github
Dynamic-DWNet-B ImageNet-1K 3x 51.2 44.4 219M 924G config github

Segmentation (see Segmentation for details):

Backbone Pretrain Lr Schd mIoU #params FLOPs config model
DWNet-T ImageNet-1K 160K 45.5 56M 928G config github
DWNet-B ImageNet-1K 160K 48.3 108M 1129G config github
Dynamic-DWNet-T ImageNet-1K 160K 45.7 83M 928G config github
Dynamic-DWNet-B ImageNet-1K 160K 48.0 195M 1129G config github

LICENSE

This repo is under the MIT license. Some codes are borrow from Swin Transformer.

You might also like...
Official code repository of the paper Learning Associative Inference Using Fast Weight Memory by Schlag et al.

Learning Associative Inference Using Fast Weight Memory This repository contains the offical code for the paper Learning Associative Inference Using F

Official PyTorch code for CVPR 2020 paper
Official PyTorch code for CVPR 2020 paper "Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision"

Deep Active Learning for Biased Datasets via Fisher Kernel Self-Supervision https://arxiv.org/abs/2003.00393 Abstract Active learning (AL) aims to min

Official Code for ICML 2021 paper
Official Code for ICML 2021 paper "Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline"

Revisiting Point Cloud Shape Classification with a Simple and Effective Baseline Ankit Goyal, Hei Law, Bowei Liu, Alejandro Newell, Jia Deng Internati

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.
CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

selfcontact This repo is part of our project: On Self-Contact and Human Pose. [Project Page] [Paper] [MPI Project Page] It includes the main function

CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.
CVPR 2021 - Official code repository for the paper: On Self-Contact and Human Pose.

SMPLify-XMC This repo is part of our project: On Self-Contact and Human Pose. [Project Page] [Paper] [MPI Project Page] License Software Copyright Lic

Official code of paper "PGT: A Progressive Method for Training Models on Long Videos" on CVPR2021

PGT Code for paper PGT: A Progressive Method for Training Models on Long Videos. Install Run pip install -r requirements.txt. Run python setup.py buil

This is the official code of our paper
This is the official code of our paper "Diversity-based Trajectory and Goal Selection with Hindsight Experience Relay" (PRICAI 2021)

Diversity-based Trajectory and Goal Selection with Hindsight Experience Replay This is the official implementation of our paper "Diversity-based Traje

The official code for paper "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling".

R2D2 This is the official code for paper titled "R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Mode

Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).

Hurdles to Progress in Long-form Question Answering This repository contains the official scripts and datasets accompanying our NAACL 2021 paper, "Hur

Comments
  • CVE-2007-4559 Patch

    CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have began a campaign to patch a widespread bug named CVE-2007-4559. CVE-2007-4559 is a 15 year old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsantized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks to see if all tarfile members will be extracted safely and throws an exception otherwise. We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions you may contact us through this projects lead researcher Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • Possible bug?

    Possible bug?

    class DynamicDWConv(nn.Module):
        def __init__(self, dim, kernel_size, bias=True, stride=1, padding=1, groups=1, reduction=4):
            super().__init__()
            self.dim = dim
            self.kernel_size = kernel_size
            self.stride = stride 
            self.padding = padding 
            self.groups = groups 
    
            self.pool = nn.AdaptiveAvgPool2d((1, 1))
            self.conv1 = nn.Conv2d(dim, dim // reduction, 1, bias=False)
            self.bn = nn.BatchNorm2d(dim // reduction)
            self.relu = nn.ReLU(inplace=True)
            self.conv2 = nn.Conv2d(dim // reduction, dim * kernel_size * kernel_size, 1)
            if bias:
                self.bias = nn.Parameter(torch.zeros(dim))
            else:
                self.bias = None
    
        def forward(self, x):
            b, c, h, w = x.shape
            weight = self.conv2(self.relu(self.bn(self.conv1(self.pool(x)))))
            weight = weight.view(b * self.dim, 1, self.kernel_size, self.kernel_size)
            x = F.conv2d(x.reshape(1, -1, h, w), weight, self.bias.repeat(b), stride=self.stride, padding=self.padding, groups=b * self.groups)
            x = x.view(b, c, x.shape[-2], x.shape[-1])
            return x
    

    This function seems to give error when groups is not equal to dim.

    opened by yxchng 0
Owner
Attention for Vision and Visualization
Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

EdiTTS: Score-based Editing for Controllable Text-to-Speech Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech. Au

Neosapience 98 Dec 25, 2022
DGCNN - Dynamic Graph CNN for Learning on Point Clouds

DGCNN is the author's re-implementation of Dynamic Graph CNN, which achieves state-of-the-art performance on point-cloud-related high-level tasks including category classification, semantic segmentat

Wang, Yue 1.3k Dec 26, 2022
Deep Learning GPU Training System

DIGITS DIGITS (the Deep Learning GPU Training System) is a webapp for training deep learning models. The currently supported frameworks are: Caffe, To

NVIDIA Corporation 4.1k Jan 03, 2023
ML course - EPFL Machine Learning Course, Fall 2021

EPFL Machine Learning Course CS-433 Machine Learning Course, Fall 2021 Repository for all lecture notes, labs and projects - resources, code templates

EPFL Machine Learning and Optimization Laboratory 1k Jan 04, 2023
A collection of implementations of deep domain adaptation algorithms

Deep Transfer Learning on PyTorch This is a PyTorch library for deep transfer learning. We divide the code into two aspects: Single-source Unsupervise

Yongchun Zhu 647 Jan 03, 2023
A platform for intelligent agent learning based on a 3D open-world FPS game developed by Inspir.AI.

Wilderness Scavenger: 3D Open-World FPS Game AI Challenge This is a platform for intelligent agent learning based on a 3D open-world FPS game develope

46 Nov 24, 2022
Multimodal Descriptions of Social Concepts: Automatic Modeling and Detection of (Highly Abstract) Social Concepts evoked by Art Images

MUSCO - Multimodal Descriptions of Social Concepts Automatic Modeling of (Highly Abstract) Social Concepts evoked by Art Images This project aims to i

0 Aug 22, 2021
Code for Temporally Abstract Partial Models

Code for Temporally Abstract Partial Models Accompanies the code for the experimental section of the paper: Temporally Abstract Partial Models, Khetar

DeepMind 19 Jul 13, 2022
The Easy-to-use Dialogue Response Selection Toolkit for Researchers

Easy-to-use toolkit for retrieval-based Chatbot Recent Activity Our released RRS corpus can be found here. Our released BERT-FP post-training checkpoi

GMFTBY 32 Nov 13, 2022
ColBERT: Contextualized Late Interaction over BERT (SIGIR'20)

Update: if you're looking for ColBERTv2 code, you can find it alongside a new simpler API, in the branch new_api. ColBERT ColBERT is a fast and accura

Stanford Future Data Systems 637 Jan 08, 2023
Implementation of SiameseXML (ICML 2021)

SiameseXML Code for SiameseXML: Siamese networks meet extreme classifiers with 100M labels Best Practices for features creation Adding sub-words on to

Extreme Classification 35 Nov 06, 2022
Demo code for ICCV 2021 paper "Sensor-Guided Optical Flow"

Sensor-Guided Optical Flow Demo code for "Sensor-Guided Optical Flow", ICCV 2021 This code is provided to replicate results with flow hints obtained f

10 Mar 16, 2022
TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured Scenarios

TPH-YOLOv5 This repo is the implementation of "TPH-YOLOv5: Improved YOLOv5 Based on Transformer Prediction Head for Object Detection on Drone-Captured

cv516Buaa 439 Dec 22, 2022
Baleen: Robust Multi-Hop Reasoning at Scale via Condensed Retrieval (NeurIPS'21)

Baleen Baleen is a state-of-the-art model for multi-hop reasoning, enabling scalable multi-hop search over massive collections for knowledge-intensive

Stanford Future Data Systems 22 Dec 05, 2022
Implementation of "Selection via Proxy: Efficient Data Selection for Deep Learning" from ICLR 2020.

Selection via Proxy: Efficient Data Selection for Deep Learning This repository contains a refactored implementation of "Selection via Proxy: Efficien

Stanford Future Data Systems 70 Nov 16, 2022
lightweight python wrapper for vowpal wabbit

vowpal_porpoise Lightweight python wrapper for vowpal_wabbit. Why: Scalable, blazingly fast machine learning. Install Install vowpal_wabbit. Clone and

Joseph Reisinger 163 Nov 24, 2022
Built a deep neural network (DNN) that functions as an end-to-end machine translation pipeline

Built a deep neural network (DNN) that functions as an end-to-end machine translation pipeline. The pipeline accepts english text as input and returns the French translation.

Afropunk Technologist 1 Jan 24, 2022
Continuous Conditional Random Field Convolution for Point Cloud Segmentation

CRFConv This repository is the implementation of "Continuous Conditional Random Field Convolution for Point Cloud Segmentation" 1. Setup 1) Building c

Fei Yang 8 Dec 08, 2022
A deep learning framework for historical document image analysis

DIVA-DAF Description A deep learning framework for historical document image analysis. How to run Install dependencies # clone project git clone https

9 Aug 04, 2022
Pytorch implementation for "Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets" (ECCV 2020 Spotlight)

Distribution-Balanced Loss [Paper] The implementation of our paper Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets (

Tong WU 304 Dec 22, 2022