Implements MLP-Mixer: An all-MLP Architecture for Vision.

Overview

MLP-Mixer-CIFAR10

This repository implements MLP-Mixer as proposed in MLP-Mixer: An all-MLP Architecture for Vision. The paper introduces an all-MLP (multi-layer perceptron) architecture for computer vision tasks. Yannic Kilcher walks through the architecture in this video.

Experiments reported in this repository are on CIFAR-10.

What's included?

  • Distributed training with mixed-precision.
  • Visualization of the token-mixing MLP weights.
  • A TensorBoard callback to keep track of the learned linear projections of the image patches.

Notebooks

Note: These notebooks are runnable on Colab. If you don't have access to a tensor-core GPU, please disable the mixed-precision block while running the code.

Results

MLP-Mixer achieves competitive results. The figure below summarizes top-1 accuracies on the CIFAR-10 test set as the number of Mixer (MLP) blocks varies.


Notable hyperparameters are:

  • Image size: 72x72
  • Patch size: 9x9
  • Hidden dimension for patches: 64
  • Hidden dimension for channels: 128
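
As a quick sanity check on the image and patch sizes listed above, the sketch below works out how many non-overlapping patches a 72x72 image yields with 9x9 patches. The helper name `patch_grid` and the 3-channel (RGB) input are assumptions made for illustration; they are not taken from the repository's code.

```python
from typing import Tuple

# Hyperparameters quoted in the list above.
IMAGE_SIZE = 72   # images are resized to 72x72
PATCH_SIZE = 9    # each patch covers 9x9 pixels
NUM_CHANNELS = 3  # assumption: RGB input, as in CIFAR-10


def patch_grid(image_size: int, patch_size: int) -> Tuple[int, int]:
    """Return (patches per side, total patches) for non-overlapping square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    per_side = image_size // patch_size
    return per_side, per_side * per_side


per_side, num_patches = patch_grid(IMAGE_SIZE, PATCH_SIZE)
values_per_patch = PATCH_SIZE * PATCH_SIZE * NUM_CHANNELS

print(per_side, num_patches, values_per_patch)  # 8 64 243
```

With these settings the sequence length is 64 patches, which happens to match the 64-unit hidden dimension in the list.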

The table below reports the parameter counts for the different MLP-Mixer variants:


ResNet20 (0.571969 million parameters) achieves 78.14% accuracy under the exact same training configuration. Refer to this notebook for more details.

Models

You can reproduce the results reported above. The model files are available here.

Acknowledgements

Thanks to the ML-GDE Program for providing GCP credits.

Comments
  • Could the number of patches != the token-mixing MLP dimension?

    I tried to change the model to the B/16 MLP-Mixer. In this setting, the number of patches (sequence length) != the token-mixing MLP dimension. But the code reports an error when it executes "x = layers.Add()([x, token_mixing])" because the two operands have different shapes.

    For example, with the B/16 settings: image 32*32, 2D hidden layer 768, P*P = 16*16, token-mixing MLP dimension = 384, channel MLP dimension = 3072. Thus the number of patches (sequence length) = 4 and the table value shape = (4, 768). When the code runs x = layers.Add()([x, token_mixing]) in the token-mixing layer, x shape = [4, 768] and token_mixing shape = [384, 768].

    It is strange that the MLP-Mixer paper can use settings where "number of patches (sequence length) != token-mixing MLP dimension".

    opened by LouiValley 2
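
For readers following the shape question above: in the MLP-Mixer paper, the token-mixing MLP consists of two dense layers. The first expands the patch axis to the token-mixing dimension and the second projects it back to the number of patches, so the two sizes do not need to match. The NumPy sketch below only illustrates those shapes with the numbers from the question. It is not this repository's Keras code, the function and variable names are hypothetical, and layer normalization and biases are left out for brevity.

```python
import numpy as np


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def token_mixing(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Token-mixing MLP with a skip connection (hypothetical helper).

    x:  (num_patches, channels) table of patch embeddings
    w1: (num_patches, tokens_mlp_dim), expands the patch axis
    w2: (tokens_mlp_dim, num_patches), projects back to the patch axis
    """
    y = x.T           # (channels, num_patches)
    y = gelu(y @ w1)  # (channels, tokens_mlp_dim)
    y = y @ w2        # (channels, num_patches)
    return x + y.T    # (num_patches, channels), same shape as x


# Shapes from the question: 4 patches, 768 channels, token-mixing dimension 384.
num_patches, channels, tokens_mlp_dim = 4, 768, 384
rng = np.random.default_rng(0)
x = rng.normal(size=(num_patches, channels))
w1 = rng.normal(size=(num_patches, tokens_mlp_dim)) * 0.02
w2 = rng.normal(size=(tokens_mlp_dim, num_patches)) * 0.02

out = token_mixing(x, w1, w2)
print(out.shape)  # (4, 768)
```

If the second projection is missing, the branch keeps the shape (384, 768) after transposing back, and adding it to the (4, 768) input fails, which matches the shapes reported in the question.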
  • Why does the accuracy drop after epoch 100/100 (from 91% to 71%)?

    I trained the network (NUM_MIXER_LAYERS = 4).

    At epoch 100:

    Epoch 100/100
    1/44 [..............................] - ETA: 1s - loss: 0.2472 - accuracy: 0.9160
    3/44 [=>............................] - ETA: 1s - loss: 0.2424 - accuracy: 0.9162
    5/44 [==>...........................] - ETA: 1s - loss: 0.2431 - accuracy: 0.9155
    7/44 [===>..........................] - ETA: 1s - loss: 0.2424 - accuracy: 0.9154
    9/44 [=====>........................] - ETA: 1s - loss: 0.2419 - accuracy: 0.9155
    11/44 [======>.......................] - ETA: 1s - loss: 0.2423 - accuracy: 0.9150
    13/44 [=======>......................] - ETA: 1s - loss: 0.2426 - accuracy: 0.9145
    15/44 [=========>....................] - ETA: 1s - loss: 0.2430 - accuracy: 0.9142
    17/44 [==========>...................] - ETA: 1s - loss: 0.2433 - accuracy: 0.9140
    19/44 [===========>..................] - ETA: 1s - loss: 0.2435 - accuracy: 0.9138
    21/44 [=============>................] - ETA: 0s - loss: 0.2438 - accuracy: 0.9136
    23/44 [==============>...............] - ETA: 0s - loss: 0.2439 - accuracy: 0.9135
    25/44 [================>.............] - ETA: 0s - loss: 0.2440 - accuracy: 0.9134
    27/44 [=================>............] - ETA: 0s - loss: 0.2440 - accuracy: 0.9133
    29/44 [==================>...........] - ETA: 0s - loss: 0.2442 - accuracy: 0.9132
    31/44 [====================>.........] - ETA: 0s - loss: 0.2445 - accuracy: 0.9130
    33/44 [=====================>........] - ETA: 0s - loss: 0.2447 - accuracy: 0.9129
    35/44 [======================>.......] - ETA: 0s - loss: 0.2450 - accuracy: 0.9127
    37/44 [========================>.....] - ETA: 0s - loss: 0.2454 - accuracy: 0.9125
    39/44 [=========================>....] - ETA: 0s - loss: 0.2459 - accuracy: 0.9123
    41/44 [==========================>...] - ETA: 0s - loss: 0.2463 - accuracy: 0.9121
    43/44 [============================>.] - ETA: 0s - loss: 0.2469 - accuracy: 0.9119
    44/44 [==============================] - 2s 46ms/step - loss: 0.2474 - accuracy: 0.9117 - val_loss: 1.1145 - val_accuracy: 0.7226

    Then it still runs an extra training pass:

    1/313 [..............................] - ETA: 24:32 - loss: 0.5860 - accuracy: 0.8125
    8/313 [..............................] - ETA: 2s - loss: 1.2071 - accuracy: 0.6953
    .....
    313/313 [==============================] - ETA: 0s - loss: 1.0934 - accuracy: 0.7161
    313/313 [==============================] - 12s 22ms/step - loss: 1.0934 - accuracy: 0.7161
    Test accuracy: 71.61

    opened by LouiValley 1
  • Consider either turning off auto-sharding or switching the auto_shard_policy to DATA

    Excuse me, when I try to run it on the server, it prints:

    Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new tf.data.Options() object then setting options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA before applying the options object to the dataset via dataset.with_options(options).
    2021-11-21 11:59:20.861052: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.

    BTW, my TensorFlow version is 2.4.0. How can I fix this problem?

    opened by LouiValley 1
Releases(Models)
Owner
Sayak Paul
Trying to learn how machines learn.