Code for "Searching for Efficient Multi-Stage Vision Transformers"

Overview

Searching for Efficient Multi-Stage Vision Transformers

This repository contains the official PyTorch implementation of "Searching for Efficient Multi-Stage Vision Transformers" and is based on DeiT and timm.

[Figure: Illustration of the proposed multi-stage ViT-Res network.]


[Figure: Illustration of weight-sharing neural architecture search with multi-architectural sampling.]


[Figure: Accuracy-MACs trade-offs of the proposed ViT-ResNAS. Our networks achieve results comparable to previous work.]

Content

  1. Requirements
  2. Data Preparation
  3. Pre-Trained Models
  4. Training ViT-Res
  5. Performing Neural Architecture Search
  6. Evaluation

Requirements

The codebase is tested with 8 V100 (16GB) GPUs.

To install requirements:

    pip install -r requirements.txt

Docker files are provided to set up the environment. Please run:

    cd docker

    sh 1_env_setup.sh
    
    sh 2_build_docker_image.sh
    
    sh 3_run_docker_image.sh

Make sure that the configuration specified in 3_run_docker_image.sh is correct before running the command.

Data Preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory should follow the standard layout expected by torchvision's datasets.ImageFolder, with the training data under the train/ folder and the validation data under the val/ folder:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
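
To verify the layout, here is a minimal sanity check using torchvision's datasets.ImageFolder (a sketch; it only assumes torchvision is installed and the paths above):

    # Minimal layout sanity check with torchvision's ImageFolder.
    from torchvision import datasets

    train_set = datasets.ImageFolder('/path/to/imagenet/train')
    val_set = datasets.ImageFolder('/path/to/imagenet/val')

    # Full ImageNet-1k has 1,281,167 train and 50,000 val images over 1,000 classes.
    print(len(train_set), len(train_set.classes))
    print(len(val_set), len(val_set.classes))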

Pre-Trained Models

Pre-trained weights of super-networks and searched networks can be found here.
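
As a quick loading check, a minimal sketch (the filename is hypothetical, and the 'model' key is an assumption based on DeiT-style checkpoints):

    import torch

    # Hypothetical filename; substitute the checkpoint you downloaded.
    ckpt = torch.load('vit-resnas-tiny.pth', map_location='cpu')
    # Assumption: DeiT-style checkpoints store the weights under a 'model' key.
    state_dict = ckpt.get('model', ckpt)
    print(f'{len(state_dict)} tensors in checkpoint')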

Training ViT-Res

To train ViT-Res-Tiny, modify IMAGENET_PATH in scripts/vit-sr-nas/reference_net/tiny.sh and run:

    sh scripts/vit-sr-nas/reference_net/tiny.sh 

We use 8 GPUs for training. If a different number of GPUs is used, modify --nproc_per_node accordingly and adjust the per-GPU batch size (--batch-size) so that the effective batch size (--nproc_per_node × --batch-size) stays the same, as in the helper below.
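
For example, a small helper for this arithmetic (the effective batch size of 1024 is illustrative only, not the scripts' actual setting):

    # effective batch size = nproc_per_node * per-GPU --batch-size
    def per_gpu_batch_size(effective_batch_size, num_gpus):
        assert effective_batch_size % num_gpus == 0
        return effective_batch_size // num_gpus

    print(per_gpu_batch_size(1024, 8))  # 128 per GPU on 8 GPUs
    print(per_gpu_batch_size(1024, 4))  # 256 per GPU on 4 GPUs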

Performing Neural Architecture Search

0. Building the Sub-Train and Sub-Val Sets

Modify _SOURCE_DIR, _SUB_TRAIN_DIR, and _SUB_VAL_DIR in search_utils/build_subset.py, and run:

    cd search_utils
    
    python build_subset.py
    
    cd ..
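
For reference, here is a hedged sketch of what such a subset builder could look like (symlinking a per-class validation split); search_utils/build_subset.py is authoritative, and _NUM_VAL_PER_CLASS is a made-up knob:

    # Hypothetical sketch: split each class's images into sub-val/sub-train
    # via symlinks. See search_utils/build_subset.py for the real logic.
    import os
    import random

    _SOURCE_DIR = '/path/to/imagenet/train'
    _SUB_TRAIN_DIR = '/path/to/imagenet/sub_train'
    _SUB_VAL_DIR = '/path/to/imagenet/sub_val'
    _NUM_VAL_PER_CLASS = 50  # assumed value, not the repository's setting

    random.seed(0)
    for cls in sorted(os.listdir(_SOURCE_DIR)):
        imgs = sorted(os.listdir(os.path.join(_SOURCE_DIR, cls)))
        random.shuffle(imgs)
        for subset_dir, names in ((_SUB_VAL_DIR, imgs[:_NUM_VAL_PER_CLASS]),
                                  (_SUB_TRAIN_DIR, imgs[_NUM_VAL_PER_CLASS:])):
            os.makedirs(os.path.join(subset_dir, cls), exist_ok=True)
            for name in names:
                os.symlink(os.path.join(_SOURCE_DIR, cls, name),
                           os.path.join(subset_dir, cls, name))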

1. Super-Network Training

Before running each script, modify IMAGENET_PATH so that it points to the directory containing the sub-train and sub-val sets.

For ViT-ResNAS-Tiny, run:

    sh scripts/vit-sr-nas/super_net/tiny.sh

For ViT-ResNAS-Small and Medium, run:

    sh scripts/vit-sr-nas/super_net/small.sh

2. Evolutionary Search

Before running each script, modify IMAGENET_PATH (pointing to the directory containing the sub-train and sub-val sets) and MODEL_PATH.

For ViT-ResNAS-Tiny, run:

    sh scripts/vit-sr-nas/evolutionary_search/tiny.sh

For ViT-ResNAS-Small, run:

    sh scripts/vit-sr-nas/evolutionary_search/[email protected]

For ViT-ResNAS-Medium, run:

    sh scripts/vit-sr-nas/evolutionary_search/[email protected]

After running evolutionary search for each network, see summary.txt in the output directory and modify the network_def as described below.

For example, the network_def in summary.txt is:

    ((4, 220), (1, (220, 5, 32), (220, 880), 1), (1, (220, 5, 32), (220, 880), 1), (1, (220, 7, 32), (220, 800), 1), (1, (220, 7, 32), (220, 800), 0), (1, (220, 5, 32), (220, 720), 1), (1, (220, 5, 32), (220, 720), 1), (1, (220, 5, 32), (220, 720), 1), (3, 220, 440), (1, (440, 10, 48), (440, 1760), 1), (1, (440, 10, 48), (440, 1440), 1), (1, (440, 10, 48), (440, 1920), 1), (1, (440, 10, 48), (440, 1600), 1), (1, (440, 12, 48), (440, 1600), 1), (1, (440, 12, 48), (440, 1120), 0), (1, (440, 12, 48), (440, 1440), 1), (3, 440, 880), (1, (880, 16, 64), (880, 3200), 1), (1, (880, 12, 64), (880, 3200), 1), (1, (880, 16, 64), (880, 2880), 1), (1, (880, 12, 64), (880, 3200), 0), (1, (880, 12, 64), (880, 2240), 1), (1, (880, 12, 64), (880, 3520), 0), (1, (880, 14, 64), (880, 2560), 1), (2, 880, 1000))

Remove every tuple whose first element is 1 and whose last element is 0 (e.g., (1, (220, 7, 32), (220, 800), 0)).

A trailing 0 indicates that the corresponding transformer block is removed in the searched network.

After this modification, the network_def becomes:

    ((4, 220), (1, (220, 5, 32), (220, 880), 1), (1, (220, 5, 32), (220, 880), 1), (1, (220, 7, 32), (220, 800), 1), (1, (220, 5, 32), (220, 720), 1), (1, (220, 5, 32), (220, 720), 1), (1, (220, 5, 32), (220, 720), 1), (3, 220, 440), (1, (440, 10, 48), (440, 1760), 1), (1, (440, 10, 48), (440, 1440), 1), (1, (440, 10, 48), (440, 1920), 1), (1, (440, 10, 48), (440, 1600), 1), (1, (440, 12, 48), (440, 1600), 1), (1, (440, 12, 48), (440, 1440), 1), (3, 440, 880), (1, (880, 16, 64), (880, 3200), 1), (1, (880, 12, 64), (880, 3200), 1), (1, (880, 16, 64), (880, 2880), 1), (1, (880, 12, 64), (880, 2240), 1), (1, (880, 14, 64), (880, 2560), 1), (2, 880, 1000))

Then, use the searched network_def for searched network training.
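
This pruning step can be automated with a small helper (a sketch implementing the rule above; prune_network_def is not a function from this repository):

    # Drop every transformer-block tuple (first element 1) that the search
    # marked as removed (last element 0); keep all other entries as-is.
    def prune_network_def(network_def):
        return tuple(block for block in network_def
                     if not (block[0] == 1 and block[-1] == 0))

Applied to the summary.txt example above, this yields the pruned network_def shown.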

3. Searched Network Training

Before running each script, modify IMAGENET_PATH.

For ViT-ResNAS-Tiny, run:

    sh scripts/vit-sr-nas/searched_net/tiny.sh

For ViT-ResNAS-Small, run:

    sh scripts/vit-sr-nas/searched_net/[email protected]

For ViT-ResNAS-Medium, run:

    sh scripts/vit-sr-nas/searched_net/[email protected]

4. Fine-tuning Trained Networks at Higher Resolution

Before running, modify IMAGENET_PATH and FINETUNE_PATH (pointing to the trained ViT-ResNAS-Medium checkpoint). Then, run:

    sh scripts/vit-sr-nas/finetune/[email protected]

To fine-tune at different resolutions, modify --model, --input-size, and --mix-patch-len. We provide models at resolutions 280, 336, and 392, as shown here. Note that --input-size must equal 56 × --mix-patch-len (e.g., 280 = 56 × 5, 336 = 56 × 6, 392 = 56 × 7) since ViT-ResNAS reduces the spatial size by 56×.
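
A quick sanity check of that constraint, using the provided resolutions:

    # ViT-ResNAS reduces the spatial size by 56x, so --input-size must be
    # exactly 56 * --mix-patch-len.
    for input_size, mix_patch_len in ((280, 5), (336, 6), (392, 7)):
        assert input_size == 56 * mix_patch_len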

Evaluation

Before running, modify IMAGENET_PATH and MODEL_PATH. Then, run:

    sh scripts/vit-sr-nas/eval/[email protected]

Questions

Please direct questions to Yi-Lun Liao ([email protected]).

License

This repository is released under the CC-BY-NC 4.0 license as found in the LICENSE file.
