RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?

Related tags

Deep Learningraft-mlp
Overview

RaftMLP

RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?

By Yuki Tatsunami and Masato Taki (Rikkyo University)

[arxiv]

Abstract

For the past ten years, CNN has reigned supreme in the world of computer vision, but recently, Transformer has been on the rise. However, the quadratic computational cost of self-attention has become a serious problem in practice applications. There has been much research on architectures without CNN and self-attention in this context. In particular, MLP-Mixer is a simple architecture designed using MLPs and hit an accuracy comparable to the Vision Transformer. However, the only inductive bias in this architecture is the embedding of tokens. This leaves open the possibility of incorporating a non-convolutional (or non-local) inductive bias into the architecture, so we used two simple ideas to incorporate inductive bias into the MLP-Mixer while taking advantage of its ability to capture global correlations. A way is to divide the token-mixing block vertically and horizontally. Another way is to make spatial correlations denser among some channels of token-mixing. With this approach, we were able to improve the accuracy of the MLP-Mixer while reducing its parameters and computational complexity. The small model that is RaftMLP-S is comparable to the state-of-the-art global MLP-based model in terms of parameters and efficiency per calculation. In addition, we tackled the problem of fixed input image resolution for global MLP-based models by utilizing bicubic interpolation. We demonstrated that these models could be applied as the backbone of architectures for downstream tasks such as object detection. However, it did not have significant performance and mentioned the need for MLP-specific architectures for downstream tasks for global MLP-based models.

About Environment

Our base is PyTorch, Torchvision, and Ignite. We use mmdetection and mmsegmentation for object detection and semantic segmentation. We also use ClearML, AWS, etc., for experiment management.

We also use Docker for our environment, and with Docker and NVIDIA Container Toolkit installed, we can build a runtime environment at the ready.

Require

  • NVIDIA Driver
  • Docker(19.03+)
  • Docker Compose(1.28.0+)
  • NVIDIA Container Toolkit

Prepare

clearml.conf

Please copy clearml.conf.sample, you can easily create clearml.conf. Unless you have a Clear ML account, you should use the account. Next, you obtain the access key and secret key of the service. Let's write them on clearml.conf. If you don't have an AWS account, you will need one. Then, create an IAM user and an S3 bucket, and grant the IAM user a policy that allows you to read and write objects to the bucket you created. Include the access key and secret key of the IAM user you created and the region of the bucket you made in your clearml.conf.

docker-compose.yml

Please copy docker-compose.yml.sample to docker-compose.yml. Change the path/to/datasets in the volumes section to an appropriate directory where the datasets are stored. You can set device_ids on your environment. If you train semantic segmentation models or object detection models, you should set WANDB_API_KEY.

Datasets

Except for ImageNet, our codes automatically download datasets, but we recommend downloading them beforehand. Datasets need to be placed in the location set in the datasets directory in docker-compose.yml.

ImageNet1k

Please go to URL and register on the site. Then you can download ImageNet1k dataset. You should place it under path/to/datasets with the following structure.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......

CIFAR10

No problem, just let the code download automatically. URL

CIFAR100

No problem, just let the code download automatically. URL

Oxford 102 Flowers

No problem, just let the code download automatically. URL

Stanford Cars

You should place it under path/to/datasets with the following structure.

│stanford_cars/
├──cars_train/
│  ├── 00001.jpg
│  ├── 00002.jpg
│  ├── ......
├──cars_test/
│  ├── 00001.jpg
│  ├── 00002.jpg
│  ├── ......
├──devkit/
│  ├── cars_meta.mat
│  ├── cars_test_annos.mat
│  ├── cars_train_annos.mat
│  ├── eval_train.m
│  ├── README.txt
│  ├── train_perfect_preds.txt
├──cars_test_annos_withlabels.matcars_test_annos_withlabels.mat

URL

iNaturalist18

You should place it under path/to/datasets with the following structure.

│i_naturalist_18/
├──train_val2018/
│  ├──Actinopterygii/
│  │  ├──2229/
│  │  │  ├── 014a31153ac74bf87f1f730480e4a27a.jpg
│  │  │  ├── 037d062cc1b8a85821449d2cdeca7749.jpg
│  │  │  ├── ......
│  │  ├── ......
│  ├── ......
├──train2018.json
├──val2018.json

URL

iNaturalist19

You should place it under path/to/datasets with the following structure.

│i_naturalist_19/
├──train_val2019/
│  ├──Amphibians/
│  │  ├──153/
│  │  │  ├── 0042d05b4ffbd5a1ce2fc56513a7777e.jpg
│  │  │  ├── 006f69e838b87cfff3d12120795c4ada.jpg
│  │  │  ├── ......
│  │  ├── ......
│  ├── ......
├──train2019.json
├──val2019.json

URL

MS COCO

You should place it under path/to/datasets with the following structure.

│coco/
├──train2017/
│  ├── 000000000009.jpg
│  ├── 000000000025.jpg
│  ├── ......
├──val2017/
│  ├── 000000000139.jpg
│  ├── 000000000285.jpg
│  ├── ......
├──annotations/
│  ├── captions_train2017.json
│  ├── captions_val2017.json
│  ├── instances_train2017.json
│  ├── instances_val2017.json
│  ├── person_keypoints_train2017.json
│  ├── person_keypoints_val2017.json

URL

ADE20K

In order for you to download the ADE20k dataset, you have to register at this site and get approved. Once downloaded the dataset, place it so that it has the following structure.

│ade/
├──ADEChallengeData2016/
│  ├──annotations/
│  │  ├──training/
│  │  │  ├── ADE_train_00000001.png
│  │  │  ├── ADE_train_00000002.png
│  │  │  ├── ......
│  │  ├──validation/
│  │  │  ├── ADE_val_00000001.png
│  │  │  ├── ADE_val_00000002.png
│  │  │  ├── ......
│  ├──images/
│  │  ├──training/
│  │  │  ├── ADE_train_00000001.jpg
│  │  │  ├── ADE_train_00000002.jpg
│  │  │  ├── ......
│  │  ├──validation/
│  │  │  ├── ADE_val_00000001.jpg
│  │  │  ├── ADE_val_00000002.jpg
│  │  │  ├── ......
│  │  ├──
│  ├──objectInfo150.txt
│  ├──sceneCategories.txt

ImageNet1k

configs/settings are available. Each of the training conducted in Subsection 4.1 can be performed in the following commands.

docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-s
docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-m
docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-l

The ablation study for channel rafts in subsection 4.2 ran the following commands.

Ablation Study

docker run trainer python run.py settings=imagenet-org-mixer
docker run trainer python run.py settings=imagenet-raft-mlp-r-1
docker run trainer python run.py settings=imagenet-raft-mlp-r-2
docker run trainer python run.py settings=imagenet-raft-mlp

The ablation study for multi-scale patch embedding in subsection 4.2 ran the following commands.

docker run trainer python run.py settings=imagenet-raft-mlp-cross-mlp-emb-m
docker run trainer python run.py settings=imagenet-raft-mlp-hierarchy-m

Transfer Learning

docker run trainer python run.py settings=finetune/cars-org-mixer.yaml
docker run trainer python run.py settings=finetune/cars-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/cars-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/cars-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/cifar10-org-mixer.yaml
docker run trainer python run.py settings=finetune/cifar10-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/cifar10-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/cifar10-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/cifar100-org-mixer.yaml
docker run trainer python run.py settings=finetune/cifar100-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/cifar100-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/cifar100-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/flowers102-org-mixer.yaml
docker run trainer python run.py settings=finetune/flowers102-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/flowers102-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/flowers102-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/inat18-org-mixer.yaml
docker run trainer python run.py settings=finetune/inat18-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/inat18-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/inat18-raft-mlp-cross-mlp-emb-l.yaml
docker run trainer python run.py settings=finetune/inat19-org-mixer.yaml
docker run trainer python run.py settings=finetune/inat19-raft-mlp-cross-mlp-emb-s.yaml
docker run trainer python run.py settings=finetune/inat19-raft-mlp-cross-mlp-emb-m.yaml
docker run trainer python run.py settings=finetune/inat19-raft-mlp-cross-mlp-emb-l.yaml

Object Detection

The weights already trained by ImageNet should be placed in the following path.

path/to/datasets/weights/imagenet-raft-mlp-cross-mlp-emb-s/last_model_0.pt
path/to/datasets/weights/imagenet-raft-mlp-cross-mlp-emb-l/last_model_0.pt
path/to/datasets/weights/imagenet-raft-mlp-cross-mlp-emb-m/last_model_0.pt
path/to/datasets/weights/imagenet-org-mixer/last_model_0.pt

Please execute the following commands.

docker run trainer bash ./detection.sh configs/detection/maskrcnn_org_mixer_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/maskrcnn_raftmlp_l_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/maskrcnn_raftmlp_m_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/maskrcnn_raftmlp_s_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_org_mixer_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_raftmlp_l_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_raftmlp_m_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./detection.sh configs/detection/retinanet_raftmlp_s_fpn_1x_coco.py 8 --seed=42 --deterministic --gpus=8

Semantic Segmentation

As with object detection, the following should be executed after placing the weight files in advance.

docker run trainer bash ./segmentation.sh configs/segmentation/fpn_org_mixer_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./segmentation.sh configs/segmentation/fpn_raftmlp_s_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./segmentation.sh configs/segmentation/fpn_raftmlp_m_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8
docker run trainer bash ./segmentation.sh configs/segmentation/fpn_raftmlp_l_512x512_40k_ade20k.py 8 --seed=42 --deterministic --gpus=8

Reference

@misc{tatsunami2021raftmlp,
  title={RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?},
  author={Yuki Tatsunami and Masato Taki},
  year={2021}
  eprint={2108.04384},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

License

This repository is relased under the Apache 2.0 license as douns in the LICENSE file.

Owner
Okojo
Okojo
Semantic Image Synthesis with SPADE

Semantic Image Synthesis with SPADE New implementation available at imaginaire repository We have a reimplementation of the SPADE method that is more

NVIDIA Research Projects 7.3k Jan 07, 2023
[NAACL & ACL 2021] SapBERT: Self-alignment pretraining for BERT.

SapBERT: Self-alignment pretraining for BERT This repo holds code for the SapBERT model presented in our NAACL 2021 paper: Self-Alignment Pretraining

Cambridge Language Technology Lab 104 Dec 07, 2022
nfelo: a power ranking, prediction, and betting model for the NFL

nfelo nfelo is a power ranking, prediction, and betting model for the NFL. Nfelo take's 538's Elo framework and further adapts it for the NFL, hence t

6 Nov 22, 2022
Reimplementation of the paper "Attention, Learn to Solve Routing Problems!" in jax/flax.

JAX + Attention Learn To Solve Routing Problems Reinplementation of the paper Attention, Learn to Solve Routing Problems! using Jax and Flax. Fully su

Gabriela Surita 7 Dec 01, 2022
A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions

A Tensorflow implementation of the Text Conditioned Auxiliary Classifier Generative Adversarial Network for Generating Images from text descriptions

Ayushman Dash 93 Aug 04, 2022
A tool to visualise the results of AlphaFold2 and inspect the quality of structural predictions

AlphaFold Analyser This program produces high quality visualisations of predicted structures produced by AlphaFold. These visualisations allow the use

Oliver Powell 3 Nov 13, 2022
September-Assistant - Open-source Windows Voice Assistant

September - Windows Assistant September is an open-source Windows personal assis

The Nithin Balaji 9 Nov 22, 2022
This project is for a Twitter bot that monitors a bird feeder in my backyard. Any detected birds are identified and posted to Twitter.

Backyard Birdbot Introduction This is a silly hobby project to use existing ML models to: Detect any birds sighted by a webcam Identify whic

Chi Young Moon 71 Dec 25, 2022
CUDA Python Low-level Bindings

CUDA Python Low-level Bindings

NVIDIA Corporation 529 Jan 03, 2023
PyTorch reimplementation of the Smooth ReLU activation function proposed in the paper "Real World Large Scale Recommendation Systems Reproducibility and Smooth Activations" [arXiv 2022].

Smooth ReLU in PyTorch Unofficial PyTorch reimplementation of the Smooth ReLU (SmeLU) activation function proposed in the paper Real World Large Scale

Christoph Reich 10 Jan 02, 2023
Rasterize with the least efforts for researchers.

utils3d Rasterize and do image-based 3D transforms with the least efforts for researchers. Based on numpy and OpenGL. It could be helpful when you wan

Ruicheng Wang 8 Dec 15, 2022
Text to image synthesis using thought vectors

Text To Image Synthesis Using Thought Vectors This is an experimental tensorflow implementation of synthesizing images from captions using Skip Though

Paarth Neekhara 2.1k Jan 05, 2023
Joint deep network for feature line detection and description

SOLD² - Self-supervised Occlusion-aware Line Description and Detection This repository contains the implementation of the paper: SOLD² : Self-supervis

Computer Vision and Geometry Lab 427 Dec 27, 2022
ComputerVision - This repository aims at realized easy network architecture

ComputerVision This repository aims at realized easy network architecture Colori

DongDong 4 Dec 14, 2022
Tree LSTM implementation in PyTorch

Tree-Structured Long Short-Term Memory Networks This is a PyTorch implementation of Tree-LSTM as described in the paper Improved Semantic Representati

Riddhiman Dasgupta 529 Dec 10, 2022
CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection

CLOCs is a novel Camera-LiDAR Object Candidates fusion network. It provides a low-complexity multi-modal fusion framework that improves the performance of single-modality detectors. CLOCs operates on

Su Pang 254 Dec 16, 2022
A general and strong 3D object detection codebase that supports more methods, datasets and tools (debugging, recording and analysis).

ALLINONE-Det ALLINONE-Det is a general and strong 3D object detection codebase built on OpenPCDet, which supports more methods, datasets and tools (de

Michael.CV 5 Nov 03, 2022
Sequence-to-Sequence learning using PyTorch

Seq2Seq in PyTorch This is a complete suite for training sequence-to-sequence models in PyTorch. It consists of several models and code to both train

Elad Hoffer 514 Nov 17, 2022
A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

Segnet is deep fully convolutional neural network architecture for semantic pixel-wise segmentation. This is implementation of http://arxiv.org/pdf/15

Pradyumna Reddy Chinthala 190 Dec 15, 2022
Official implementation of CATs: Cost Aggregation Transformers for Visual Correspondence NeurIPS'21

CATs: Cost Aggregation Transformers for Visual Correspondence NeurIPS'21 For more information, check out the paper on [arXiv]. Training with different

Sunghwan Hong 120 Jan 04, 2023