Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs

Last update: Oct 24, 2022

Abstract

Convolutional Neural Networks (CNNs) have become the de facto gold standard in computer vision applications in recent years. Recently, however, new model architectures have been proposed that challenge the status quo. The Vision Transformer (ViT) relies solely on attention modules, while the MLP-Mixer architecture substitutes the self-attention modules with Multi-Layer Perceptrons (MLPs). Despite their great success, CNNs are widely known to be vulnerable to adversarial attacks, which raises serious concerns for security-sensitive applications. Thus, it is critical for the community to know whether the newly proposed ViT and MLP-Mixer are also vulnerable to adversarial attacks. To this end, we empirically evaluate their adversarial robustness under several adversarial attack setups and benchmark them against widely used CNNs. Overall, we find that the two architectures, especially ViT, are more robust than the CNN models. Using a toy example, we also provide empirical evidence that the lower adversarial robustness of CNNs can be partially attributed to their shift-invariant property. Our frequency analysis suggests that the most robust ViT architectures tend to rely more on low-frequency features than CNNs do. Additionally, we make the intriguing finding that MLP-Mixer is extremely vulnerable to universal adversarial perturbations.

Setup

Set Paths

Set the paths in ./config.py according to your system and environment.

Download ViT Checkpoints

Run bash ./download_checkpoints.sh

NeurIPS dataset

We provide the NeurIPS adversarial challenge dataset with this repository. The images are stored in ./images, and the accompanying data sheet is ./images.csv.
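A minimal sketch of reading the data sheet, assuming ./images.csv has a header row with an image identifier column and a label column. The column names ImageId and TrueLabel follow the original NeurIPS 2017 challenge sheet and are assumptions here; check the header of ./images.csv and adjust them. load_data_sheet is a hypothetical helper, not part of this repository.

```python
import csv
import io


def load_data_sheet(handle, id_column="ImageId", label_column="TrueLabel"):
    """Return a list of (image_id, label) pairs from an open CSV file.

    The default column names are assumptions; adjust them to the real header.
    """
    reader = csv.DictReader(handle)
    return [(row[id_column], int(row[label_column])) for row in reader]


# Tiny in-memory stand-in for ./images.csv (made-up rows, for illustration only).
example_sheet = io.StringIO("ImageId,TrueLabel\nabc123,5\ndef456,17\n")
pairs = load_data_sheet(example_sheet)
print(pairs)
```

For the real file, pass a handle opened with open("./images.csv", newline="") instead of the in-memory example.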

Evaluate Models

As a sanity check, you can evaluate the models on the NeurIPS dataset and check whether the numbers match Table 1 of the paper: bash ./experiments/eval_models.sh
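To make the comparison concrete, here is a small sketch of a top-1 accuracy computation, assuming that is the metric being checked. It does not reproduce eval_models.sh; top1_accuracy and the toy scores below are made up for illustration.

```python
def top1_accuracy(scores, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = 0
    for row, label in zip(scores, labels):
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Made-up class scores for four samples and three classes.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.2, 0.6],
    [0.9, 0.05, 0.05],
]
toy_labels = [1, 0, 1, 0]
accuracy = top1_accuracy(toy_scores, toy_labels)
print(accuracy)
```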

White-box attack

For the white-box attacks, run the corresponding script.

PGD attack

bash ./experiments/attack_pgd.sh
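For readers new to PGD, the following is a minimal sketch of an L-infinity PGD loop (random start, signed gradient step, projection) on a toy logistic classifier. It is written in plain Python and is not the code run by attack_pgd.sh; the model weights, the sample, and the values of eps, alpha, and steps are illustrative assumptions.

```python
import math
import random


def logit(x, w, b):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def loss_gradient(x, y, w, b):
    """Gradient of the binary cross-entropy loss with respect to the input."""
    p = 1.0 / (1.0 + math.exp(-logit(x, w, b)))
    return [(p - y) * wi for wi in w]


def pgd_linf(x0, y, w, b, eps, alpha, steps, rng):
    """L-infinity PGD: random start, signed gradient ascent, projection."""
    x = [min(1.0, max(0.0, xi + rng.uniform(-eps, eps))) for xi in x0]
    for _ in range(steps):
        grad = loss_gradient(x, y, w, b)
        stepped = [xi + alpha * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
        # Project into the eps-ball around x0, then into the valid range [0, 1].
        x = [min(1.0, max(0.0, min(x0i + eps, max(x0i - eps, si))))
             for si, x0i in zip(stepped, x0)]
    return x


w, b = [2.0, -1.0], -0.5  # made-up toy classifier
x_clean, label = [0.6, 0.2], 1
x_adv = pgd_linf(x_clean, label, w, b, eps=0.2, alpha=0.05, steps=10,
                 rng=random.Random(0))
print(logit(x_clean, w, b) > 0, logit(x_adv, w, b) > 0)
```

The projection step is what keeps the perturbation within the eps budget no matter how many iterations are run.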

FGSM attack

bash ./experiments/attack_fgsm.sh
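As a reminder of what FGSM does, the sketch below takes a single signed-gradient step of size eps on a toy logistic classifier. It is an illustration in plain Python, not the implementation behind attack_fgsm.sh; the weights, the input, and eps are assumed values.

```python
import math


def model_logit(x, weights, bias):
    return sum(wi * xi for wi, xi in zip(weights, x)) + bias


def fgsm(x, y, weights, bias, eps):
    """One-step FGSM: move each input coordinate by eps along the gradient sign."""
    p = 1.0 / (1.0 + math.exp(-model_logit(x, weights, bias)))
    # For binary cross-entropy, d(loss)/dx = (p - y) * weights.
    grad = [(p - y) * wi for wi in weights]
    signs = [(g > 0) - (g < 0) for g in grad]
    # Clip to the valid input range [0, 1].
    return [min(1.0, max(0.0, xi + eps * s)) for xi, s in zip(x, signs)]


weights, bias = [2.0, -1.0], -0.5  # made-up toy classifier
x_orig, y_true = [0.6, 0.2], 1
x_fgsm = fgsm(x_orig, y_true, weights, bias, eps=0.2)
print(model_logit(x_orig, weights, bias), model_logit(x_fgsm, weights, bias))
```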

C&W

bash ./experiments/attack_cw.sh
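As background, the sketch below computes the untargeted margin term that the Carlini and Wagner formulation minimizes alongside the perturbation norm. It is not taken from attack_cw.sh; cw_margin_loss, the toy logits, and the kappa values are illustrative assumptions.

```python
def cw_margin_loss(logits, true_label, kappa=0.0):
    """Untargeted C&W margin: true-class logit minus the best other logit.

    The value is floored at -kappa, so it stops decreasing once the input
    is misclassified with a logit margin of at least kappa.
    """
    true_logit = logits[true_label]
    best_other = max(z for i, z in enumerate(logits) if i != true_label)
    return max(true_logit - best_other, -kappa)


# Made-up logits: first still classified correctly, second already misclassified.
still_correct = cw_margin_loss([2.0, 0.5, -1.0], true_label=0)
already_fooled = cw_margin_loss([0.5, 2.0, -1.0], true_label=0)
with_confidence = cw_margin_loss([0.5, 2.0, -1.0], true_label=0, kappa=1.0)
print(still_correct, already_fooled, with_confidence)
```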

DeepFool

bash ./experiments/attack_deepfool.sh
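To illustrate the idea behind DeepFool, the sketch below handles only the binary affine case, where the minimal L2 perturbation is the orthogonal projection onto the decision boundary. The full algorithm repeats a linearized version of this step for nonlinear, multi-class models; this block is not the code in attack_deepfool.sh, and the toy weights and the overshoot value are assumptions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def deepfool_affine(x, w, b, overshoot=0.02):
    """Closed-form DeepFool step for f(x) = w.x + b with a small overshoot."""
    f_x = dot(w, x) + b
    norm_sq = dot(w, w)
    # Project x onto the hyperplane w.x + b = 0 and push slightly past it.
    scale = (1.0 + overshoot) * f_x / norm_sq
    return [xi - scale * wi for xi, wi in zip(x, w)]


w_toy, b_toy = [2.0, -1.0], -0.5  # made-up affine classifier
x_start = [0.6, 0.2]
x_fooled = deepfool_affine(x_start, w_toy, b_toy)
f_before = dot(w_toy, x_start) + b_toy
f_after = dot(w_toy, x_fooled) + b_toy
print(f_before, f_after)
```

The overshoot is what moves the point just across the boundary, so the sign of f flips instead of landing exactly on zero.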

Black-box attack

Query-based
Transfer-based

For the black-box attacks, run the corresponding script.

Transferability with I-FGSM

bash ./experiments/transferability.sh
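The idea of a transfer-based evaluation can be sketched as follows: craft adversarial examples with I-FGSM on a source model, then count how often they also fool a different target model. The two toy logistic models, the samples, and the hyperparameters are made up for illustration; this is not the code in transferability.sh.

```python
import math


def logit(x, model):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(x, model):
    return 1 if logit(x, model) > 0 else 0


def ifgsm(x0, y, model, eps, alpha, steps):
    """Iterative FGSM on a logistic model, kept inside the eps-ball and [0, 1]."""
    w, _ = model
    x = list(x0)
    for _ in range(steps):
        p = 1.0 / (1.0 + math.exp(-logit(x, model)))
        grad = [(p - y) * wi for wi in w]
        x = [xi + alpha * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
        x = [min(x0i + eps, max(x0i - eps, xi)) for xi, x0i in zip(x, x0)]
        x = [min(1.0, max(0.0, xi)) for xi in x]
    return x


source = ([2.0, -1.0], -0.5)   # model used to craft the perturbations
target = ([1.5, -0.5], -0.45)  # model the perturbations are transferred to
samples = [([0.6, 0.2], 1), ([0.9, 0.1], 1), ([0.3, 0.5], 0)]

fooled = 0
for x, y in samples:
    x_adv = ifgsm(x, y, source, eps=0.2, alpha=0.05, steps=5)
    fooled += int(predict(x_adv, target) != y)
transfer_rate = fooled / len(samples)
print(transfer_rate)
```

The target model's gradients are never used, which is what makes this a black-box setting from the target's point of view.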

Universal Adversarial Attack

Run bash ./experiments/attack_uap.sh
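A universal adversarial perturbation is a single perturbation added to every input. The sketch below shows only the evaluation side: applying one fixed perturbation to a batch and measuring the fooling ratio, the fraction of inputs whose prediction changes. It does not show how the perturbation is crafted and is not the code in attack_uap.sh; the toy model, inputs, and perturbation are assumptions.

```python
def predict(x, w, b):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def apply_universal(x, delta):
    """Add the shared perturbation and clip to the valid range [0, 1]."""
    return [min(1.0, max(0.0, xi + di)) for xi, di in zip(x, delta)]


def fooling_ratio(inputs, delta, w, b):
    """Fraction of inputs whose predicted class changes under delta."""
    changed = sum(
        int(predict(x, w, b) != predict(apply_universal(x, delta), w, b))
        for x in inputs
    )
    return changed / len(inputs)


w_uap, b_uap = [2.0, -1.0], -0.5  # made-up toy classifier
inputs = [[0.6, 0.2], [0.9, 0.1], [0.3, 0.5], [0.5, 0.3]]
delta = [-0.2, 0.2]               # one perturbation shared by all inputs
ratio = fooling_ratio(inputs, delta, w_uap, b_uap)
print(ratio)
```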

Docker

We provide a Dockerfile to improve the reproducibility of the results presented in the paper; see the docker folder.

Credits

We would like to credit the following resources, which helped tremendously in our development process.

Citation

@article{benz2021adversarial,
  title={Adversarial Robustness Comparison of Vision Transformer and MLP-Mixer to CNNs},
  author={Benz, Philipp and Ham, Soomin and Zhang, Chaoning and Karjauv, Adil and Kweon, In So},
  journal={arXiv preprint arXiv:2110.02797},
  year={2021}
}
