A collection of SOTA Image Classification Models in PyTorch

Overview

SOTA Image Classification Models in PyTorch

Intended for easy to use and integrate SOTA image classification models into object detection, semantic segmentation, pose estimation, etc.

Open In Colab

visiontransformer

Model Zoo

Model ImageNet-1k Top-1 Acc
(%)
Params
(M)
GFLOPs Variants & Weights
MicroNet 51.4|59.4|62.5 2|2|3 6M|12M|21M M1|M2|M3
MobileFormer 76.7|77.9|79.3 9|11|14 214M|294M|508M 214|294|508
GFNet 80.1|81.5|82.9 15|32|54 2|5|8 T|S|B
PVTv2 78.7|82.0|83.6 14|25|63 2|4|10 B1|B2|B4
ResT 79.6|81.6|83.6 14|30|52 2|4|8 S|B|L
Conformer 81.3|83.4|84.1 24|38|83 5|11|23 T|S|B
Shuffle 82.4|83.6|84.0 28|50|88 5|9|16 T|S|B
CSWin 82.7|83.6|84.2 23|35|78 4|7|15 T|S|B
CycleMLP 81.6|83.0|83.2 27|52|76 4|10|12 B2|B4|B5
HireMLP 81.8|83.1|83.4 33|58|96 4|8|14 S|B|L
sMLP 81.9|83.1|83.4 24|49|66 5|10|14 T|S|B
XCiT 80.4|83.9|84.3 12|48|84 2|9|16 T|S|M
VOLO 84.2|85.2|85.4 27|59|86 7|14|21 D1|D2|D3
Table Notes
  • Image size is 224x224. EfficientNetv2 uses progressive learning (image size from 128 to 380).
  • All models' weights are from official repositories.
  • Only models trained on ImageNet1k are compared.
  • (Parameters > 200M) Models are not included.
  • PVTv2, ResT, Conformer, XCiT and CycleMLP models work with any image size.

Usage

Requirements (click to expand)
  • python >= 3.6
  • torch >= 1.8.1
  • torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.


Show Available Models
$ python tools/show.py

A table with model names and variants will be shown:

Model Names    Model Variants
-------------  --------------------------------
ResNet         ['18', '34', '50', '101', '152']
MicroNet       ['M1', 'M2', 'M3']
GFNet          ['T', 'S', 'B']
PVTv2          ['B1', 'B2', 'B3', 'B4', 'B5']
ResT           ['S', 'B', 'L']
Conformer      ['T', 'S', 'B']
Shuffle        ['T', 'S', 'B']
CSWin          ['T', 'S', 'B', 'L']
CycleMLP       ['B1', 'B2', 'B3', 'B4', 'B5']
XciT           ['T', 'S', 'M', 'L']
VOLO           ['D1', 'D2', 'D3', 'D4']
Inference
  • Download your desired model's weights from Model Zoo table.
  • Change MODEL parameters and TEST parameters in config file here. And run the the following command.
$ python tools/infer.py --cfg configs/test.yaml

You will see an output similar to this:

File: assests\dog.jpg >>>>> Golden retriever

Training (click to expand)
$ python tools/train.py --cfg configs/train.yaml

Evaluate (click to expand)
$ python tools/val.py --cfg configs/train.yaml

Fine-tune (click to expand)

Fine-tune on CIFAR-10:

$ python tools/finetune.py --cfg configs/finetune.yaml

References (click to expand)

Citations (click to expand)
@article{zhql2021ResT,
  title={ResT: An Efficient Transformer for Visual Recognition},
  author={Zhang, Qinglong and Yang, Yubin},
  journal={arXiv preprint arXiv:2105.13677v3},
  year={2021}
}

@article{peng2021conformer,
  title={Conformer: Local Features Coupling Global Representations for Visual Recognition}, 
  author={Zhiliang Peng and Wei Huang and Shanzhi Gu and Lingxi Xie and Yaowei Wang and Jianbin Jiao and Qixiang Ye},
  journal={arXiv preprint arXiv:2105.03889},
  year={2021},
}

@misc{dong2021cswin,
  title={CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows}, 
  author={Xiaoyi Dong and Jianmin Bao and Dongdong Chen and Weiming Zhang and Nenghai Yu and Lu Yuan and Dong Chen and Baining Guo},
  year={2021},
  eprint={2107.00652},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{chen2021cyclemlp,
  title={CycleMLP: A MLP-like Architecture for Dense Prediction}, 
  author={Shoufa Chen and Enze Xie and Chongjian Ge and Ding Liang and Ping Luo},
  year={2021},
  eprint={2107.10224},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{elnouby2021xcit,
  title={XCiT: Cross-Covariance Image Transformers}, 
  author={Alaaeldin El-Nouby and Hugo Touvron and Mathilde Caron and Piotr Bojanowski and Matthijs Douze and Armand Joulin and Ivan Laptev and Natalia Neverova and Gabriel Synnaeve and Jakob Verbeek and Hervé Jegou},
  year={2021},
  eprint={2106.09681},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{yuan2021volo,
  title={VOLO: Vision Outlooker for Visual Recognition}, 
  author={Li Yuan and Qibin Hou and Zihang Jiang and Jiashi Feng and Shuicheng Yan},
  year={2021},
  eprint={2106.13112},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{yan2020micronet,
  title={MicroNet for Efficient Language Modeling}, 
  author={Zhongxia Yan and Hanrui Wang and Demi Guo and Song Han},
  year={2020},
  eprint={2005.07877},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

@misc{chen2021mobileformer,
  title={Mobile-Former: Bridging MobileNet and Transformer}, 
  author={Yinpeng Chen and Xiyang Dai and Dongdong Chen and Mengchen Liu and Xiaoyi Dong and Lu Yuan and Zicheng Liu},
  year={2021},
  eprint={2108.05895},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{rao2021global,
  title={Global Filter Networks for Image Classification},
  author={Rao, Yongming and Zhao, Wenliang and Zhu, Zheng and Lu, Jiwen and Zhou, Jie},
  journal={arXiv preprint arXiv:2107.00645},
  year={2021}
}

@article{huang2021shuffle,
  title={Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer},
  author={Huang, Zilong and Ben, Youcheng and Luo, Guozhong and Cheng, Pei and Yu, Gang and Fu, Bin},
  journal={arXiv preprint arXiv:2106.03650},
  year={2021}
}

You might also like...
Quickly comparing your image classification models with the state-of-the-art models (such as DenseNet, ResNet, ...)
Quickly comparing your image classification models with the state-of-the-art models (such as DenseNet, ResNet, ...)

Image Classification Project Killer in PyTorch This repo is designed for those who want to start their experiments two days before the deadline and ki

Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).
Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Optimizing Dense Retrieval Model Training with Hard Negatives Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma This repo provi

SOTA model in CIFAR10
SOTA model in CIFAR10

A PyTorch Implementation of CIFAR Tricks 调研了CIFAR10数据集上各种trick,数据增强,正则化方法,并进行了实现。目前项目告一段落,如果有更好的想法,或者希望一起维护这个项目可以提issue或者在我的主页找到我的联系方式。 0. Requirement

A toolkit for document-level event extraction, containing some SOTA model implementations
A toolkit for document-level event extraction, containing some SOTA model implementations

❤️ A Toolkit for Document-level Event Extraction with & without Triggers Hi, there 👋 . Thanks for your stay in this repo. This project aims at buildi

Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow.

Generative Models Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow. Also present here are RBM and Helmholtz Machine. Note: Gen

Collection of generative models in Pytorch version.
Collection of generative models in Pytorch version.

pytorch-generative-model-collections Original : [Tensorflow version] Pytorch implementation of various GANs. This repository was re-implemented with r

PyTorch implementation and pretrained models for XCiT models. See XCiT: Cross-Covariance Image Transformer Implement face detection, and age and gender classification, and emotion classification.
Implement face detection, and age and gender classification, and emotion classification.

YOLO Keras Face Detection Implement Face detection, and Age and Gender Classification, and Emotion Classification. (image from wider face dataset) Ove

Hl classification bc - A Network-Based High-Level Data Classification Algorithm Using Betweenness Centrality
Hl classification bc - A Network-Based High-Level Data Classification Algorithm Using Betweenness Centrality

A Network-Based High-Level Data Classification Algorithm Using Betweenness Centr

Releases(v0.2.0)
Owner
sithu3
AI Developer
sithu3
A library that can print Python objects in human readable format

objprint A library that can print Python objects in human readable format Install pip install objprint Usage op Use op() (or objprint()) to print obj

319 Dec 25, 2022
The final project of "Applying AI to EHR Data" of "AI for Healthcare" nanodegree - Udacity.

Patient Selection for Diabetes Drug Testing Project Overview EHR data is becoming a key source of real-world evidence (RWE) for the pharmaceutical ind

Omar Laham 1 Jan 14, 2022
Pytorch implementation of CVPR2020 paper “VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation”

VectorNet Re-implementation This is the unofficial pytorch implementation of CVPR2020 paper "VectorNet: Encoding HD Maps and Agent Dynamics from Vecto

120 Jan 06, 2023
null

DeformingThings4D dataset Video | Paper DeformingThings4D is an synthetic dataset containing 1,972 animation sequences spanning 31 categories of human

208 Jan 03, 2023
Real-world Anomaly Detection in Surveillance Videos- pytorch Re-implementation

Real world Anomaly Detection in Surveillance Videos : Pytorch RE-Implementation This repository is a re-implementation of "Real-world Anomaly Detectio

seominseok 62 Dec 08, 2022
Implementation of the federated dual coordinate descent (FedDCD) method.

FedDCD.jl Implementation of the federated dual coordinate descent (FedDCD) method. Installation To install, just call Pkg.add("https://github.com/Zhen

Zhenan Fan 6 Sep 21, 2022
Decision Transformer: A brand new Offline RL Pattern

DecisionTransformer_StepbyStep Intro Decision Transformer: A brand new Offline RL Pattern. 这是关于NeurIPS 2021 热门论文Decision Transformer的复现。 👍 原文地址: Deci

Irving 14 Nov 22, 2022
Repositório para arquivos sobre o Módulo 1 do curso Top Coders da Let's Code + Safra

850-Safra-DS-ModuloI Repositório para arquivos sobre o Módulo 1 do curso Top Coders da Let's Code + Safra Para aprender mais Git https://learngitbranc

Brian Nunes 7 Dec 10, 2022
An ML & Correlation platform for transforming disparate data points of interest into usable intelligence.

SSIDprobeCollector An ML & Correlation platform for transforming disparate data points of interest into usable intelligence. At a High level the platf

Bill Reyor 1 Jan 30, 2022
Interactive dimensionality reduction for large datasets

BlosSOM 🌼 BlosSOM is a graphical environment for running semi-supervised dimensionality reduction with EmbedSOM. You can use it to explore multidimen

19 Dec 14, 2022
A very tiny, very simple, and very secure file encryption tool.

Picocrypt is a very tiny (hence "Pico"), very simple, yet very secure file encryption tool. It uses the modern ChaCha20-Poly1305 cipher suite as well

Evan Su 1k Dec 30, 2022
PyTorch code for our ECCV 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks"

PyTorch code for our ECCV 2018 paper "Image Super-Resolution Using Very Deep Residual Channel Attention Networks"

Yulun Zhang 1.2k Dec 26, 2022
Accurate identification of bacteriophages from metagenomic data using Transformer

PhaMer is a python library for identifying bacteriophages from metagenomic data. PhaMer is based on a Transorfer model and rely on protein-based vocab

Kenneth Shang 9 Nov 30, 2022
Official PyTorch implementation of MX-Font (Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts)

Introduction Pytorch implementation of Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Expert. | paper Song Park1

Clova AI Research 97 Dec 23, 2022
PyTorch GPU implementation of the ES-RNN model for time series forecasting

Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm A GPU-enabled version of the hybrid ES-RNN model by Slawek et al that won the M4 time-series

Kaung 305 Jan 03, 2023
Code repo for "Cross-Scale Internal Graph Neural Network for Image Super-Resolution" (NeurIPS'20)

IGNN Code repo for "Cross-Scale Internal Graph Neural Network for Image Super-Resolution" [paper] [supp] Prepare datasets 1 Download training dataset

Shangchen Zhou 278 Jan 03, 2023
Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models

Evidential Softmax for Sparse Multimodal Distributions in Deep Generative Models Abstract Many applications of generative models rely on the marginali

Stanford Intelligent Systems Laboratory 9 Jun 06, 2022
In the AI for TSP competition we try to solve optimization problems using machine learning.

AI for TSP Competition Goal In the AI for TSP competition we try to solve optimization problems using machine learning. The competition will be hosted

Paulo da Costa 11 Nov 27, 2022
Generative vs Discriminative: Rethinking The Meta-Continual Learning (NeurIPS 2021)

Generative vs Discriminative: Rethinking The Meta-Continual Learning (NeurIPS 2021) In this repository we provide PyTorch implementations for GeMCL; a

4 Apr 15, 2022
Implementation of ETSformer, state of the art time-series Transformer, in Pytorch

ETSformer - Pytorch Implementation of ETSformer, state of the art time-series Transformer, in Pytorch Install $ pip install etsformer-pytorch Usage im

Phil Wang 121 Dec 30, 2022