This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?".

Overview

Patches Are All You Need? 🤷

This repository contains an implementation of ConvMixer for the ICLR 2022 submission "Patches Are All You Need?".

Code overview

The most important code is in convmixer.py. We trained ConvMixers using the timm framework, which we copied from here.

Update: ConvMixer is now integrated into the timm framework itself. You can see the PR here.

Inside pytorch-image-models, we have made the following modifications. (Though one could look at the diff, we think it is convenient to summarize them here.)

  • Added ConvMixers
    • added timm/models/convmixer.py
    • modified timm/models/__init__.py
  • Added "OneCycle" LR Schedule
    • added timm/scheduler/onecycle_lr.py
    • modified timm/scheduler/scheduler.py
    • modified timm/scheduler/scheduler_factory.py
    • modified timm/scheduler/__init__.py
    • modified train.py (added two lines to support this LR schedule)

We are confident that the use of the OneCycle schedule here is not critical, and one could likely just as well train ConvMixers with the built-in cosine schedule.

Evaluation

We provide some model weights below:

Model Name Kernel Size Patch Size File Size
ConvMixer-1536/20 9 7 207MB
ConvMixer-768/32* 7 7 85MB
ConvMixer-1024/20 9 14 98MB

* Important: ConvMixer-768/32 here uses ReLU instead of GELU, so you would have to change convmixer.py accordingly (we will fix this later).

You can evaluate ConvMixer-1536/20 as follows:

python validate.py --model convmixer_1536_20 --b 64 --num-classes 1000 --checkpoint [/path/to/convmixer_1536_20_ks9_p7.pth.tar] [/path/to/ImageNet1k-val]

You should get a 81.37% accuracy.

Training

If you had a node with 10 GPUs, you could train a ConvMixer-1536/20 as follows (these are exactly the settings we used):

sh distributed_train.sh 10 [/path/to/ImageNet1k] 
    --train-split [your_train_dir] 
    --val-split [your_val_dir] 
    --model convmixer_1536_20 
    -b 64 
    -j 10 
    --opt adamw 
    --epochs 150 
    --sched onecycle 
    --amp 
    --input-size 3 224 224
    --lr 0.01 
    --aa rand-m9-mstd0.5-inc1 
    --cutmix 0.5 
    --mixup 0.5 
    --reprob 0.25 
    --remode pixel 
    --num-classes 1000 
    --warmup-epochs 0 
    --opt-eps=1e-3 
    --clip-grad 1.0

We also included a ConvMixer-768/32 in timm/models/convmixer.py (though it is simple to add more ConvMixers). We trained that one with the above settings but with 300 epochs instead of 150 epochs.

In the near future, we will upload weights.

The tweetable version of ConvMixer, which requires from torch.nn import *:

def ConvMixr(h,d,k,p,n):
 S,C,A=Sequential,Conv2d,lambda x:S(x,GELU(),BatchNorm2d(h))
 R=type('',(S,),{'forward':lambda s,x:s[0](x)+x})
 return S(A(C(3,h,p,p)),*[S(R(A(C(h,h,k,groups=h,padding=k//2))),A(C(h,h,1))) for i in range(d)],AdaptiveAvgPool2d((1,1)),Flatten(),Linear(h,n))
Comments
  • Cifar10 baseline doesn't reach 95%

    Cifar10 baseline doesn't reach 95%

    Hello, I tried convmixer256 on Cifar-10 with the same timm options specified for ImageNet (except the num_classes) and it doesn't go beyond 90% accuracy. Could you please specify the options used for Cifar-10 experiment ?

    opened by K-H-Ismail 13
  • What's new about this model?

    What's new about this model?

    Why “patches” are all you need? Patch embedding is Conv7x7 stem, The body is simply repeated Conv9x9 + Conv1x1, (Not challenging your work, it's indeed very interesting), but just kindly wondering what's new about this model?

    opened by vztu 5
  • Training scheme modifications for small GPUs

    Training scheme modifications for small GPUs

    Hi authors. Your paper has demonstrated a quite intriguing observation. I wish you luck with your submission. Thanks for sharing the code of the submission. When running the code, I got an issue regarding OOM when using the default batch size of 64. In the end I can only run with 8 samples per batch per GPU as my GPUs have only 11GB. I would like to know if you have tried smaller GPUs and achieved the same results. So far, besides learning rate modified according to the linear rule, I haven't made any change yet. If you tried training using smaller GPUs before, could you please share your experience? Thank you very much!

    opened by justanhduc 4
  • Experiments with full convolutional layers instead of patch embedding?

    Experiments with full convolutional layers instead of patch embedding?

    Have the author tried to replace the patch embedding with the just convolution?That is, using 1 stride instead of p?

    With this setting, this is a standard convolution network like MobileNet. I wonder what would be the performance?Is the performance gain of Convmix due to the patch embedding or the depthwise conv layers?

    Very interested in this work, thanks.

    opened by forjiuzhou 2
  • Training time

    Training time

    Hi, first of all thanks for a very interesting paper.

    I would like to know how long did it take you to train the models? I'm trying to train ConvMixer-768/32 using 2xV100 and one epoch is ~3 hours, so I would estimate that full training would take ~= 2 * 3 * 300 ~= 1800 GPU hours, which is insane. Even if you trained with 10 GPUs it would take ~1 week for one experiment to finish. Are my calculations correct?

    opened by bonlime 1
  • padding=same?

    padding=same?

    https://github.com/tmp-iclr/convmixer/blob/1cefd860a1a6a85369887d1a633425cedc2afd0a/convmixer.py#L18 There is an error:TypeError: conv2d(): argument 'padding' (position 5) must be tuple of ints, not str.

    opened by linhaoqi027 1
  • Add Docker environment & web demo

    Add Docker environment & web demo

    Hey @ashertrockman, @tmp-iclr ! wave

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model! View it here: https://replicate.com/locuslab/convmixer and have a look at some Image classification examples we already uploaded.

    By clicking "Claim this model" You'll be able to edit the everything, and we'll feature it on our website and tweet about it too.

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. blush

    opened by ariel415el 0
  • Add Docker environment & web demo

    Add Docker environment & web demo

    Hey @ashertrockman, @tmp-iclr ! 👋

    This pull request makes it possible to run your model inside a Docker environment, which makes it easier for other people to run it. We're using an open source tool called Cog to make this process easier.

    This also means we can make a web page where other people can try out your model! View it here: https://replicate.com/locuslab/convmixer and have a look at some Image classification examples we already uploaded.

    By clicking "Claim this model" You'll be able to edit the everything, and we'll feature it on our website and tweet about it too.

    In case you're wondering who I am, I'm from Replicate, where we're trying to make machine learning reproducible. We got frustrated that we couldn't run all the really interesting ML work being done. So, we're going round implementing models we like. 😊

    opened by ariel415el 0
  • Fix notebooks

    Fix notebooks

    Hi.

    Fixed errors in pytorch-image-models/notebooks/{EffResNetComparison,GeneralizationToImageNetV2}.ipynb notebooks:

    • added missed pynvml installation;
    • resolved missed imports;
    • resolved errors due to outdated calls of timm library.

    Tested in colab env: "Run all" without any errors.

    opened by amrzv 0
  • CIFAR-10 training settings

    CIFAR-10 training settings

    First of all, thank you for the interesting work. I was experimenting the one with patch size 1 and kernel size 9 with CIFAR-10 with the following training settings:

    --model tiny_convmixer
     -b 64 -j 8 
    --opt adamw 
    --epochs 200 
    --sched onecycle 
    --amp 
    --input-size 3 32 32 
    --lr 0.01 
    --aa rand-m9-mstd0.5-inc1 
    --cutmix 0.5 
    --mixup 0.5 
    --reprob 0.25 
    --remode pixel 
    --num-classes 10
    --warmup-epochs 0
    --opt-eps 1e-3
    --clip-grad 1.0
    --scale 0.75 1.0
    --weight-decay 0.01
    --mean 0.4914 0.4822 0.4465
    --std 0.2471 0.2435 0.2616
    

    I could get only 95.89%. I am supposed to get 96.03% according to Table 4 in the paper. Can you please let me know any setting I missed? Thank you again.

    opened by fugokidi 0
  • Segmentation ConvMixer architecture ?

    Segmentation ConvMixer architecture ?

    I was trying to figure what a Segmentation ConvMixer would look like, and came up with that (residual connection inspired by MultiResUNet). Does it make sense to you ?

    image

    opened by divideconcept 0
  • Request more experiment results to compare to other architecture.

    Request more experiment results to compare to other architecture.

    Hi! This work is pretty interesting, but I think there should are more results like in "Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight" as they replace local self-attention with depth-wise convolution in Swin Transformer. Since you conduct an advanced one with a more simple architecture compared to SwinTransformer, so I wonder if ConvMixer can get similar performance on object detection and semantic segmentation.

    opened by LuoXin-s 1
Releases(timm-v1.0)
Owner
ICLR 2022 Author
Patches Are All You Need? 🤷
ICLR 2022 Author
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 28 Nov 25, 2022
Automatically download the cwru data set, and then divide it into training data set and test data set

Automatically download the cwru data set, and then divide it into training data set and test data set.自动下载cwru数据集,然后分训练数据集和测试数据集

6 Jun 27, 2022
This project is based on RIFE and aims to make RIFE more practical for users by adding various features and design new models

CPM 项目描述 CPM(Chinese Pretrained Models)模型是北京智源人工智能研究院和清华大学发布的中文大规模预训练模型。官方发布了三种规模的模型,参数量分别为109M、334M、2.6B,用户需申请与通过审核,方可下载。 由于原项目需要考虑大模型的训练和使用,需要安装较为复杂

hzwer 190 Jan 08, 2023
PyTorch implementation of the paper Deep Networks from the Principle of Rate Reduction

Deep Networks from the Principle of Rate Reduction This repository is the official PyTorch implementation of the paper Deep Networks from the Principl

459 Dec 27, 2022
Deploy recommendation engines with Edge Computing

RecoEdge: Bringing Recommendations to the Edge A one stop solution to build your recommendation models, train them and, deploy them in a privacy prese

NimbleEdge 131 Jan 02, 2023
[ECCV'20] Convolutional Occupancy Networks

Convolutional Occupancy Networks Paper | Supplementary | Video | Teaser Video | Project Page | Blog Post This repository contains the implementation o

622 Dec 30, 2022
MRQy is a quality assurance and checking tool for quantitative assessment of magnetic resonance imaging (MRI) data.

Front-end View Backend View Table of Contents Description Prerequisites Running Basic Information Measurements User Interface Feedback and usage Descr

Center for Computational Imaging and Personalized Diagnostics 58 Dec 02, 2022
PyTorch implementation for "Sharpness-aware Quantization for Deep Neural Networks".

Sharpness-aware Quantization for Deep Neural Networks This is the official repository for our paper: Sharpness-aware Quantization for Deep Neural Netw

Zhuang AI Group 30 Dec 19, 2022
Code & Data for Enhancing Photorealism Enhancement

Enhancing Photorealism Enhancement Stephan R. Richter, Hassan Abu AlHaija, Vladlen Koltun Paper | Website (with side-by-side comparisons) | Video (Pap

Intelligent Systems Lab Org 1.1k Dec 31, 2022
Black-Box-Tuning - Black-Box Tuning for Language-Model-as-a-Service

Black-Box-Tuning Source code for paper "Black-Box Tuning for Language-Model-as-a-Service". Being busy recently, the code in this repo and this tutoria

Tianxiang Sun 149 Jan 04, 2023
LSTM Neural Networks for Spectroscopic Studies of Type Ia Supernovae

Package Description The difficulties in acquiring spectroscopic data have been a major challenge for supernova surveys. snlstm is developed to provide

7 Oct 11, 2022
Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC)

ppg-vc Phonetic PosteriorGram (PPG)-Based Voice Conversion (VC) This repo implements different kinds of PPG-based VC models. Pretrained models. More m

Liu Songxiang 227 Dec 28, 2022
CLNTM - Contrastive Learning for Neural Topic Model

Contrastive Learning for Neural Topic Model This repository contains the impleme

Thong Thanh Nguyen 25 Nov 24, 2022
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.

Learning Opinion Summarizers by Selecting Informative Reviews This repository contains the codebase and the dataset for the corresponding EMNLP 2021

Arthur Bražinskas 39 Jan 01, 2023
Code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”

GATER This repository contains the code for our EMNLP 2021 paper “Heterogeneous Graph Neural Networks for Keyphrase Generation”. Our implementation is

Jiacheng Ye 12 Nov 24, 2022
An open source machine learning library for performing regression tasks using RVM technique.

Introduction neonrvm is an open source machine learning library for performing regression tasks using RVM technique. It is written in C programming la

Siavash Eliasi 33 May 31, 2022
Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper] DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context G

Cheng Zhang 66 Nov 16, 2022
This repository contains a PyTorch implementation of the paper Learning to Assimilate in Chaotic Dynamical Systems.

Amortized Assimilation This repository contains a PyTorch implementation of the paper Learning to Assimilate in Chaotic Dynamical Systems. Abstract: T

4 Aug 16, 2022
Keras implementation of Real-Time Semantic Segmentation on High-Resolution Images

Keras-ICNet [paper] Keras implementation of Real-Time Semantic Segmentation on High-Resolution Images. Training in progress! Requisites Python 3.6.3 K

Aitor Ruano 87 Dec 16, 2022
Code release for ConvNeXt model

A ConvNet for the 2020s Official PyTorch implementation of ConvNeXt, from the following paper: A ConvNet for the 2020s. arXiv 2022. Zhuang Liu, Hanzi

Meta Research 4.6k Jan 08, 2023