RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?

RaftMLP

By Yuki Tatsunami and Masato Taki (Rikkyo University)

[arxiv]

Abstract

For the past ten years, CNNs have reigned supreme in the world of computer vision, but recently, Transformers have been on the rise. However, the quadratic computational cost of self-attention has become a serious problem in practical applications. There has been much research on architectures without CNNs or self-attention in this context. In particular, MLP-Mixer is a simple architecture built from MLPs that achieves accuracy comparable to the Vision Transformer. However, the only inductive bias in this architecture is the embedding of tokens. This leaves open the possibility of incorporating a non-convolutional (or non-local) inductive bias into the architecture, so we used two simple ideas to incorporate inductive bias into the MLP-Mixer while taking advantage of its ability to capture global correlations. One is to divide the token-mixing block vertically and horizontally. The other is to make spatial correlations denser among some channels of token-mixing. With this approach, we were able to improve the accuracy of the MLP-Mixer while reducing its parameters and computational complexity. The small model, RaftMLP-S, is comparable to the state-of-the-art global MLP-based models in terms of parameters and efficiency per calculation. In addition, we tackled the problem of fixed input image resolution for global MLP-based models by utilizing bicubic interpolation. We demonstrated that these models could be applied as the backbone of architectures for downstream tasks such as object detection. However, they did not achieve significant performance, which indicates the need for MLP-specific architectures for downstream tasks for global MLP-based models.
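To give a rough feel for why dividing token mixing vertically and horizontally reduces parameters, the sketch below counts the weights of a single linear mixing layer in both cases. It is a simplified illustration, not the layer definition used in this repository: it ignores biases, the hidden expansion of a real mixing MLP, and the channel grouping (rafts), and the 14 x 14 token grid is an assumed example size.

```python
# Simplified weight counts for one linear token-mixing layer (biases ignored).
# Assumption: a 14 x 14 grid of tokens, chosen only as an example.


def global_mixing_weights(h: int, w: int) -> int:
    """Weights of a single linear map that mixes all h*w tokens at once."""
    n_tokens = h * w
    return n_tokens * n_tokens


def axial_mixing_weights(h: int, w: int) -> int:
    """Weights of one vertical (h x h) map plus one horizontal (w x w) map."""
    return h * h + w * w


h, w = 14, 14
print(global_mixing_weights(h, w))  # 38416
print(axial_mixing_weights(h, w))   # 392
```

In this simplified setting the global layer grows with the square of the number of tokens, while the vertical and horizontal layers grow only with the square of each side length.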

About Environment

Our code is based on PyTorch, Torchvision, and Ignite. We use mmdetection and mmsegmentation for object detection and semantic segmentation. We also use ClearML, AWS, etc., for experiment management.

We use Docker for our environment. With Docker and NVIDIA Container Toolkit installed, the runtime environment can be built readily.

Requirements

  • NVIDIA Driver
  • Docker (19.03+)
  • Docker Compose (1.28.0+)
  • NVIDIA Container Toolkit

Prepare

clearml.conf

Copy clearml.conf.sample to create clearml.conf. If you don't have a ClearML account, you should create one. Next, obtain the access key and secret key of the service and write them in clearml.conf. If you don't have an AWS account, you will need one. Then, create an IAM user and an S3 bucket, and grant the IAM user a policy that allows reading and writing objects in the bucket you created. Include the access key and secret key of that IAM user and the region of the bucket in your clearml.conf.

docker-compose.yml

Copy docker-compose.yml.sample to docker-compose.yml. Change path/to/datasets in the volumes section to the directory where the datasets are stored. Set device_ids to match your environment. If you train semantic segmentation or object detection models, you should also set WANDB_API_KEY.
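The two copy steps can also be scripted. The following is a minimal sketch, assuming both sample files sit in the repository root; the helper name copy_samples is hypothetical and not part of this repository. It never overwrites an existing file, so local edits are preserved.

```python
import shutil
from pathlib import Path
from typing import List


def copy_samples(repo_root: Path) -> List[Path]:
    """Copy each *.sample file to its target name unless the target exists."""
    created = []
    for name in ("clearml.conf", "docker-compose.yml"):
        sample = repo_root / f"{name}.sample"
        target = repo_root / name
        # Skip missing samples and never overwrite a file that was already edited.
        if sample.is_file() and not target.exists():
            shutil.copyfile(sample, target)
            created.append(target)
    return created


if __name__ == "__main__":
    for path in copy_samples(Path(".")):
        print(f"created {path}; fill in the keys and paths before use")
```

The copied files still need to be edited by hand: the keys and region in clearml.conf, and the dataset path, device_ids, and WANDB_API_KEY in docker-compose.yml.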

Datasets

Except for ImageNet, our code downloads datasets automatically, but we recommend downloading them beforehand. Datasets must be placed in the datasets directory configured in docker-compose.yml.

ImageNet1k

Please go to URL and register on the site. Then you can download the ImageNet1k dataset. You should place it under path/to/datasets with the following structure.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
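Before starting a long training run, it may help to confirm that the ImageNet folder matches the structure above. The sketch below is an assumption-laden check, not part of this repository: the function name check_imagenet_layout is hypothetical, and it only looks for train/ and val/ directories, class subfolders, and at least one .JPEG file per class.

```python
from pathlib import Path
from typing import List


def check_imagenet_layout(datasets_root: Path) -> List[str]:
    """Return problems found under <datasets_root>/imagenet (empty if none)."""
    problems = []
    for split in ("train", "val"):
        split_dir = datasets_root / "imagenet" / split
        if not split_dir.is_dir():
            problems.append(f"missing directory: {split_dir}")
            continue
        class_dirs = [p for p in split_dir.iterdir() if p.is_dir()]
        if not class_dirs:
            problems.append(f"no class folders in {split_dir}")
        for class_dir in class_dirs:
            # Each class folder (e.g. n01440764) should hold .JPEG images.
            if not any(class_dir.glob("*.JPEG")):
                problems.append(f"no .JPEG files in {class_dir}")
    return problems
```

Calling check_imagenet_layout(Path("path/to/datasets")) with the directory configured in docker-compose.yml returns an empty list when the layout looks as expected.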

CIFAR10

No problem, just let the code download automatically. URL

CIFAR100

No problem, just let the code download automatically. URL

Oxford 102 Flowers

No problem, just let the code download automatically. URL

Stanford Cars

You should place it under path/to/datasets with the following structure.

│stanford_cars/
├──cars_train/
│  ├── 00001.jpg
│  ├── 00002.jpg
│  ├── ......
├──cars_test/
│  ├── 00001.jpg
│  ├── 00002.jpg
│  ├── ......
├──devkit/
│  ├── cars_meta.mat
│  ├── cars_test_annos.mat
│  ├── cars_train_annos.mat
│  ├── eval_train.m
│  ├── README.txt
│  ├── train_perfect_preds.txt
├──cars_test_annos_withlabels.mat

URL

iNaturalist18

You should place it under path/to/datasets with the following structure.

│i_naturalist_18/
├──train_val2018/
│  ├──Actinopterygii/
│  │  ├──2229/
│  │  │  ├── 014a31153ac74bf87f1f730480e4a27a.jpg
│  │  │  ├── 037d062cc1b8a85821449d2cdeca7749.jpg
│  │  │  ├── ......
│  │  ├── ......
│  ├── ......
├──train2018.json
├──val2018.json

URL

iNaturalist19

You should place it under path/to/datasets with the following structure.

│i_naturalist_19/
├──train_val2019/
│  ├──Amphibians/
│  │  ├──153/
│  │  │  ├── 0042d05b4ffbd5a1ce2fc56513a7777e.jpg
│  │  │  ├── 006f69e838b87cfff3d12120795c4ada.jpg
│  │  │  ├── ......
│  │  ├── ......
│  ├── ......
├──train2019.json
├──val2019.json

URL

MS COCO

You should place it under path/to/datasets with the following structure.

│coco/
├──train2017/
│  ├── 000000000009.jpg
│  ├── 000000000025.jpg
│  ├── ......
├──val2017/
│  ├── 000000000139.jpg
│  ├── 000000000285.jpg
│  ├── ......
├──annotations/
│  ├── captions_train2017.json
│  ├── captions_val2017.json
│  ├── instances_train2017.json
│  ├── instances_val2017.json
│  ├── person_keypoints_train2017.json
│  ├── person_keypoints_val2017.json

URL

ADE20K

To download the ADE20K dataset, you have to register at this site and get approved. Once you have downloaded the dataset, place it so that it has the following structure.

│ade/
├──ADEChallengeData2016/
│  ├──annotations/
│  │  ├──training/
│  │  │  ├── ADE_train_00000001.png
│  │  │  ├── ADE_train_00000002.png
│  │  │  ├── ......
│  │  ├──validation/
│  │  │  ├── ADE_val_00000001.png
│  │  │  ├── ADE_val_00000002.png
│  │  │  ├── ......
│  ├──images/
│  │  ├──training/
│  │  │  ├── ADE_train_00000001.jpg
│  │  │  ├── ADE_train_00000002.jpg
│  │  │  ├── ......
│  │  ├──validation/
│  │  │  ├── ADE_val_00000001.jpg
│  │  │  ├── ADE_val_00000002.jpg
│  │  │  ├── ......
│  ├──objectInfo150.txt
│  ├──sceneCategories.txt
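In the structure above, every image under images/ has an annotation with the same file name stem under annotations/. A quick way to spot an incomplete download is to list images that lack such a partner. This is a minimal sketch under that assumption; the function name unpaired_ade20k_images is hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List


def unpaired_ade20k_images(datasets_root: Path, split: str) -> List[str]:
    """Return stems of images in `split` that lack a same-named annotation PNG."""
    base = datasets_root / "ade" / "ADEChallengeData2016"
    image_stems = {p.stem for p in (base / "images" / split).glob("*.jpg")}
    mask_stems = {p.stem for p in (base / "annotations" / split).glob("*.png")}
    # Images without a matching mask are reported in sorted order.
    return sorted(image_stems - mask_stems)
```

For example, unpaired_ade20k_images(Path("path/to/datasets"), "training") should return an empty list for a complete training split; pass "validation" for the other split.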

ImageNet1k

Settings are available in configs/settings. Each training run conducted in Subsection 4.1 can be reproduced with the following commands.

docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-s
docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-m
docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-l

Ablation Study

The ablation study for channel rafts in Subsection 4.2 used the following commands.

docker run trainer python run.py settings=imagenet-org-mixer
docker run trainer python run.py settings=imagenet-raft-mlp-r-1
docker run trainer python run.py settings=imagenet-raft-mlp-r-2
docker run trainer python run.py settings=imagenet-raft-mlp

The ablation study for multi-scale patch embedding in Subsection 4.2 used the following commands.

docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-m
docker run trainer python run.py settings=imagenet-raft-mlp-hierarchy-m

Transfer Learning

docker run trainer python run.py settings=finetune/cars-org-mixer.yaml
docker run trainer python run.py settings=finetune/cars-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/cars-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/cars-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/cifar10-org-mixer.yaml
docker run trainer python run.py settings=finetune/cifar10-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/cifar10-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/cifar10-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/cifar100-org-mixer.yaml
docker run trainer python run.py settings=finetune/cifar100-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/cifar100-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/cifar100-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/flowers102-org-mixer.yaml
docker run trainer python run.py settings=finetune/flowers102-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/flowers102-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/flowers102-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/inat18-org-mixer.yaml
docker run trainer python run.py settings=finetune/inat18-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/inat18-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/inat18-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/inat19-org-mixer.yaml
docker run trainer python run.py settings=finetune/inat19-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/inat19-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/inat19-raft-mlp-cross-mlp-emb-l.yaml
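The commands above follow a regular pattern: one settings file per dataset and model combination. As a convenience, the list can be generated rather than typed. The sketch below only builds the command strings shown above; the names DATASETS, MODELS, and finetune_commands are illustrative and not part of this repository.

```python
from itertools import product
from typing import List

# Dataset and model name fragments as they appear in the settings file names.
DATASETS = ["cars", "cifar10", "cifar100", "flowers102", "inat18", "inat19"]
MODELS = [
    "org-mixer",
    "raft-mlp-cross-mlp-emb-s",
    "raft-mlp-cross-mlp-emb-m",
    "raft-mlp-cross-mlp-emb-l",
]


def finetune_commands() -> List[str]:
    """Build one fine-tuning command per (dataset, model) pair."""
    return [
        f"docker run trainer python run.py settings=finetune/{dataset}-{model}.yaml"
        for dataset, model in product(DATASETS, MODELS)
    ]


for command in finetune_commands():
    print(command)
```

The printed lines can be redirected into a shell script, or filtered to a single dataset, before being executed.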

Object Detection

The weights pretrained on ImageNet should be placed at the following paths.

path/to/datasets/weights/imagenet-raft-mlp-cross-mlp-emb-s/last_model_0.pt
path/to/datasets/weights/imagenet-raft-mlp-cross-mlp-emb-l/last_model_0.pt
path/to/datasets/weights/imagenet-raft-mlp-cross-mlp-emb-m/last_model_0.pt
path/to/datasets/weights/imagenet-org-mixer/last_model_0.pt
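A missing weight file is easier to catch before launching an 8-GPU job than after. The following sketch lists which of the four files above are absent. It assumes the same path/to/datasets root as in docker-compose.yml; the names SETTINGS and missing_weights are hypothetical and not part of this repository.

```python
from pathlib import Path
from typing import List

# Settings names taken from the weight paths listed above.
SETTINGS = [
    "imagenet-raft-mlp-cross-mlp-emb-s",
    "imagenet-raft-mlp-cross-mlp-emb-l",
    "imagenet-raft-mlp-cross-mlp-emb-m",
    "imagenet-org-mixer",
]


def missing_weights(datasets_root: Path) -> List[Path]:
    """Return the expected weight files that do not exist yet."""
    expected = [
        datasets_root / "weights" / name / "last_model_0.pt" for name in SETTINGS
    ]
    return [path for path in expected if not path.is_file()]
```

An empty result from missing_weights(Path("path/to/datasets")) means all four pretrained weight files are in place.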

Please execute the following commands.

docker run trainer bash ./detection.sh configs/detection/maskrcnn_org_mixer_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/maskrcnn_raftmlp_l_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/maskrcnn_raftmlp_m_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/maskrcnn_raftmlp_s_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_org_mixer_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_raftmlp_l_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_raftmlp_m_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_raftmlp_s_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8

Semantic Segmentation

As with object detection, place the weight files in advance, then execute the following commands.

docker run trainer bash ./segmentation.sh configs/segmentation/fpn_org_mixer_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./segmentation.sh configs/segmentation/fpn_raftmlp_s_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./segmentation.sh configs/segmentation/fpn_raftmlp_m_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./segmentation.sh configs/segmentation/fpn_raftmlp_l_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8

Reference

@misc{tatsunami2021raftmlp,
  title={RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?},
  author={Yuki Tatsunami and Masato Taki},
  year={2021},
  eprint={2108.04384},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.
