[NeurIPS 2021] Are Transformers More Robust Than CNNs? (PyTorch implementation & checkpoints)

Overview

PyTorch implementation for the NeurIPS 2021 paper: Are Transformers More Robust Than CNNs?

Our implementation is based on DeiT.

Introduction

Transformers have emerged as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutional Neural Networks (CNNs). Surprisingly, however, we find these conclusions are drawn from unfair experimental settings, where Transformers and CNNs are compared at different scales and with distinct training frameworks. In this paper, we aim to provide the first fair and in-depth comparison between Transformers and CNNs, focusing on robustness evaluations.

With our unified training setup, we first challenge the previous belief that Transformers outshine CNNs in adversarial robustness. More surprisingly, we find that CNNs can easily be as robust as Transformers in defending against adversarial attacks if they properly adopt Transformers' training recipes. Regarding generalization on out-of-distribution samples, we show that pre-training on (external) large-scale datasets is not a fundamental requirement for Transformers to achieve better performance than CNNs. Moreover, our ablations suggest that this stronger generalization stems largely from the Transformer's self-attention-like architecture itself, rather than from other training setups. We hope this work can help the community better understand and benchmark the robustness of Transformers and CNNs.

Pretrained models

We provide both pretrained vanilla models and adversarially trained models.

Vanilla Training

Main Results

| Model | Pretrained Model | ImageNet | ImageNet-A | ImageNet-C | Stylized-ImageNet |
|-------|------------------|----------|------------|------------|-------------------|
| Res50-Ori | download link | 76.9 | 3.2 | 57.9 | 8.3 |
| Res50-Align | download link | 76.3 | 4.5 | 55.6 | 8.2 |
| Res50-Best | download link | 75.7 | 6.3 | 52.3 | 10.8 |
| DeiT-Small | download link | 76.8 | 12.2 | 48.0 | 13.0 |

(ImageNet-C reports error rate, lower is better; the other columns report top-1 accuracy, higher is better.)

Model Size

ResNets:

  • ResNets fully aligned with DeiT's training recipe, denoted as res*:

| Model | Size | Pretrained Model | ImageNet | ImageNet-A | ImageNet-C | Stylized-ImageNet |
|-------|------|------------------|----------|------------|------------|-------------------|
| Res18* | 11.69M | download link | 67.83 | 1.92 | 64.14 | 7.92 |
| Res50* | 25.56M | download link | 76.28 | 4.53 | 55.62 | 8.17 |
| Res101* | 44.55M | download link | 77.97 | 8.84 | 49.19 | 11.60 |
  • ResNets with the best out-of-distribution (OOD) generalization, denoted as res-best:

| Model | Size | Pretrained Model | ImageNet | ImageNet-A | ImageNet-C | Stylized-ImageNet |
|-------|------|------------------|----------|------------|------------|-------------------|
| Res18-best | 11.69M | download link | 66.81 | 2.03 | 62.65 | 9.45 |
| Res50-best | 25.56M | download link | 75.74 | 6.32 | 52.25 | 10.77 |
| Res101-best | 44.55M | download link | 77.83 | 11.49 | 47.35 | 13.28 |

DeiTs:

| Model | Size | Pretrained Model | ImageNet | ImageNet-A | ImageNet-C | Stylized-ImageNet |
|-------|------|------------------|----------|------------|------------|-------------------|
| DeiT-Mini | 9.98M | download link | 72.89 | 8.19 | 54.68 | 9.88 |
| DeiT-Small | 22.05M | download link | 76.82 | 12.21 | 47.99 | 12.98 |

Model Distillation

| Role | Architecture | Pretrained Model | ImageNet | ImageNet-A | ImageNet-C | Stylized-ImageNet |
|------|--------------|------------------|----------|------------|------------|-------------------|
| Teacher | DeiT-Small | download link | 76.8 | 12.2 | 48.0 | 13.0 |
| Student | Res50*-Distill | download link | 76.7 | 5.2 | 54.2 | 9.8 |
| Teacher | Res50* | download link | 76.3 | 4.5 | 55.6 | 8.2 |
| Student | DeiT-S-Distill | download link | 76.2 | 10.9 | 49.3 | 11.9 |
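
For reference, a minimal sketch of a soft-label distillation loss of the kind used for such teacher-student pairs; `alpha` and `tau` are illustrative hyper-parameters, not the repository's exact settings:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, tau=1.0):
    """Combine cross-entropy on ground-truth labels with a KL term that
    pulls the student's distribution toward the (frozen) teacher's."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=1),
        F.softmax(teacher_logits / tau, dim=1),
        reduction="batchmean",
    ) * tau * tau
    return (1 - alpha) * ce + alpha * kl
```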

Adversarial Training

| Model | Pretrained Model | Clean Acc | PGD-100 | AutoAttack |
|-------|------------------|-----------|---------|------------|
| Res50-ReLU | download link | 66.77 | 32.26 | 26.41 |
| Res50-GELU | download link | 67.38 | 40.27 | 35.51 |
| DeiT-Small | download link | 66.50 | 40.32 | 35.50 |

Vanilla Training

Data preparation

Download and extract the ImageNet train and val images from http://image-net.org/. The directory structure follows the standard torchvision ImageFolder layout; the training and validation data are expected in the train and val folders respectively:

/path/to/imagenet/
  train/
    class1/
      img1.jpeg
    class2/
      img2.jpeg
  val/
    class1/
      img3.jpeg
    class2/
      img4.jpeg
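
With this layout, the data can be loaded via torchvision's ImageFolder. A minimal sketch for the validation split (the transforms shown are standard ImageNet evaluation preprocessing, not necessarily the exact pipeline used by our scripts):

```python
import torch
from torchvision import datasets, transforms

# Standard ImageNet evaluation preprocessing (illustrative values).
eval_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=eval_tf)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=256,
                                         shuffle=False, num_workers=8)
```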

Environment

Install dependencies:

pip3 install -r requirements.txt

Training Scripts

To train a ResNet model on ImageNet, run:

bash script/res.sh

To train a DeiT model on ImageNet, run:

bash script/deit.sh
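
As context for the aligned recipe (the res* models above), DeiT trains with AdamW and label smoothing rather than the classic SGD setup. A minimal sketch of applying that optimizer/loss to a ResNet — the learning rate and weight decay here are illustrative, and script/res.sh remains the source of truth:

```python
import torch
import torchvision

model = torchvision.models.resnet50(num_classes=1000)

# DeiT-style optimizer and loss; these hyper-parameter values are
# illustrative, not necessarily those used in script/res.sh.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.05)
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
```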

Generalization to Out-of-Distribution Samples

Data Preparation

Download and extract the ImageNet-A, ImageNet-C, and Stylized-ImageNet val images:

/path/to/datasets/
  val/
    class1/
      img1.jpeg
    class2/
      img2.jpeg

Evaluation Scripts

To evaluate pre-trained models, run:

bash script/generation_to_ood.sh

Note that for ImageNet-C evaluation, the error rate is calculated over the Noise, Blur, Weather, and Digital corruption categories.
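
Concretely, that amounts to averaging the top-1 error over the 15 standard corruption types (5 severities each) in those four categories. The sketch below shows a plain average; the official mCE protocol additionally normalizes each corruption's error by AlexNet's, so check the evaluation script for the exact aggregation:

```python
# The 15 standard ImageNet-C corruptions, grouped by category.
CORRUPTIONS = {
    "Noise":   ["gaussian_noise", "shot_noise", "impulse_noise"],
    "Blur":    ["defocus_blur", "glass_blur", "motion_blur", "zoom_blur"],
    "Weather": ["snow", "frost", "fog", "brightness"],
    "Digital": ["contrast", "elastic_transform", "pixelate", "jpeg_compression"],
}

def imagenet_c_error(err):
    """Hypothetical helper: err[(corruption, severity)] -> top-1 error (%)."""
    errs = [err[(c, s)]
            for names in CORRUPTIONS.values()
            for c in names
            for s in range(1, 6)]        # 5 severity levels per corruption
    return sum(errs) / len(errs)
```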

Adversarial Training

To perform adversarial training on ResNet, run:

bash script/advres.sh

To perform adversarial training on DeiT, run:

bash script/advdeit.sh
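
Conceptually, adversarial training replaces each clean batch with PGD-perturbed inputs before the usual update. A minimal sketch, assuming inputs in [0, 1]; the eps, step size, and step count are illustrative, and the scripts define the actual attack budget:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=4/255, alpha=1/255, steps=3):
    """Craft L_inf-bounded adversarial examples (inputs assumed in [0, 1])."""
    delta = torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach()
        delta = (x + delta).clamp(0, 1) - x   # keep perturbed image in range
    return (x + delta).detach()

# One adversarial training step (sketch):
#   x_adv = pgd_attack(model, x, y)
#   loss = F.cross_entropy(model(x_adv), y)
#   loss.backward(); optimizer.step()
```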

Robustness to Adversarial Examples

PGD Attack Evaluation

To evaluate the pre-trained models, run:

bash script/eval_advtraining.sh
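
The PGD-100 column above denotes a 100-step PGD attack. A robust-accuracy loop might look like the following sketch, reusing an attack such as the pgd_attack sketch above with steps=100 (the attack budget is an assumption; check the script for the exact values):

```python
import torch

def robust_accuracy(model, loader, attack, device="cuda"):
    """Top-1 accuracy on adversarial examples produced by `attack`."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = attack(model, x, y)              # e.g. 100-step PGD
        with torch.no_grad():
            correct += (model(x_adv).argmax(1) == y).sum().item()
        total += y.size(0)
    return correct / total

# Usage sketch:
#   robust_accuracy(model, val_loader,
#                   lambda m, x, y: pgd_attack(m, x, y, steps=100))
```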

AutoAttack Evaluation

./autoattack contains the public AutoAttack package, with slight modifications to better support ImageNet evaluation.

cd autoattack/
bash autoattack.sh
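
For reference, the AutoAttack package can also be called directly from Python. A minimal sketch, where eps=4/255 is an assumed budget that should be checked against autoattack.sh:

```python
import torch
from autoattack import AutoAttack

def eval_autoattack(model, images, labels, eps=4/255):
    """Run the standard AutoAttack suite (APGD-CE, APGD-T, FAB-T, Square).
    `images` are expected in [0, 1]; `model` returns logits."""
    model.eval()
    adversary = AutoAttack(model, norm="Linf", eps=eps, version="standard")
    return adversary.run_standard_evaluation(images, labels, bs=128)
```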

Patch Attack Evaluation

Please refer to PatchAttack.

Citation

If you use our code, models, or wish to refer to our results, please use the following BibTeX entry:

@inproceedings{bai2021transformers,
  title     = {Are Transformers More Robust Than CNNs?},
  author    = {Bai, Yutong and Mei, Jieru and Yuille, Alan and Xie, Cihang},
  booktitle = {Thirty-Fifth Conference on Neural Information Processing Systems},
  year      = {2021},
}