PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners

Related tags

Deep Learningmae
Overview

Masked Autoencoders: A PyTorch Implementation

This is a PyTorch/GPU re-implementation of the paper Masked Autoencoders Are Scalable Vision Learners:

@Article{MaskedAutoencoders2021,
  author  = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
  journal = {arXiv:2111.06377},
  title   = {Masked Autoencoders Are Scalable Vision Learners},
  year    = {2021},
}
  • The original implementation was in TensorFlow+TPU. This re-implementation is in PyTorch+GPU.

  • This repo is a modification on the DeiT repo. Installation and preparation follow that repo.

  • This repo is based on timm==0.3.2, for which a fix is needed to work with PyTorch 1.8.1+.

Catalog

  • Visualization demo
  • Pre-trained checkpoints + fine-tuning code
  • Pre-training code

Visualization demo

Run our interactive visualization demo using Colab notebook (no GPU needed):

Fine-tuning with pre-trained checkpoints

The following table provides the pre-trained checkpoints used in the paper, converted from TF/TPU to PT/GPU:

ViT-Base ViT-Large ViT-Huge
pre-trained checkpoint download download download
md5 8cad7c b8b06e 9bdbb0

The fine-tuning instruction is in FINETUNE.md.

By fine-tuning these pre-trained models, we rank #1 in these classification tasks (detailed in the paper):

ViT-B ViT-L ViT-H ViT-H448 prev best
ImageNet-1K (no external data) 83.6 85.9 86.9 87.8 87.1
following are evaluation of the same model weights (fine-tuned in original ImageNet-1K):
ImageNet-Corruption (error rate) 51.7 41.8 33.8 36.8 42.5
ImageNet-Adversarial 35.9 57.1 68.2 76.7 35.8
ImageNet-Rendition 48.3 59.9 64.4 66.5 48.7
ImageNet-Sketch 34.5 45.3 49.6 50.9 36.0
following are transfer learning by fine-tuning the pre-trained MAE on the target dataset:
iNaturalists 2017 70.5 75.7 79.3 83.4 75.4
iNaturalists 2018 75.4 80.1 83.0 86.8 81.2
iNaturalists 2019 80.5 83.4 85.7 88.3 84.1
Places205 63.9 65.8 65.9 66.8 66.0
Places365 57.9 59.4 59.8 60.3 58.0

Pre-training

The pre-training instruction is in PRETRAIN.md.

License

This project is under the CC-BY-NC 4.0 license. See LICENSE for details.

Owner
Meta Research
Meta Research
Neural implicit reconstruction experiments for the Vector Neuron paper

Neural Implicit Reconstruction with Vector Neurons This repository contains code for the neural implicit reconstruction experiments in the paper Vecto

Congyue Deng 35 Jan 02, 2023
Code for Contrastive-Geometry Networks for Generalized 3D Pose Transfer

CGTransformer Code for our AAAI 2022 paper "Contrastive-Geometry Transformer network for Generalized 3D Pose Transfer" Contrastive-Geometry Transforme

18 Jun 28, 2022
FFCV: Fast Forward Computer Vision (and other ML workloads!)

Fast Forward Computer Vision: train models at a fraction of the cost with accele

FFCV 2.3k Jan 03, 2023
Fake News Detection Using Machine Learning Methods

Fake-News-Detection-Using-Machine-Learning-Methods Fake news is always a real and dangerous issue. However, with the presence and abundance of various

Achraf Safsafi 1 Jan 11, 2022
VOS: Learning What You Don’t Know by Virtual Outlier Synthesis

VOS This is the source code accompanying the paper VOS: Learning What You Don’t

248 Dec 25, 2022
Keeping it safe - AI Based COVID-19 Tracker using Deep Learning and facial recognition

Keeping it safe - AI Based COVID-19 Tracker using Deep Learning and facial recognition

Vansh Wassan 15 Jun 17, 2021
Implementation of the GVP-Transformer, which was used in the paper "Learning inverse folding from millions of predicted structures" for de novo protein design alongside Alphafold2

GVP Transformer (wip) Implementation of the GVP-Transformer, which was used in the paper Learning inverse folding from millions of predicted structure

Phil Wang 19 May 06, 2022
Distilled coarse part of LoFTR adapted for compatibility with TensorRT and embedded divices

Coarse LoFTR TRT Google Colab demo notebook This project provides a deep learning model for the Local Feature Matching for two images that can be used

Kirill 46 Dec 24, 2022
This repository contains all data used for writing a research paper Multiple Object Trackers in OpenCV: A Benchmark, presented in ISIE 2021 conference in Kyoto, Japan.

OpenCV-Multiple-Object-Tracking Python is version 3.6.7 to install opencv: pip uninstall opecv-python pip uninstall opencv-contrib-python pip install

6 Dec 19, 2021
Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation

Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation (AAAI 2021) Official pytorch implementation of our paper: Discriminative

Beom 74 Dec 27, 2022
Pytorch Implementation of paper "Noisy Natural Gradient as Variational Inference"

Noisy Natural Gradient as Variational Inference PyTorch implementation of Noisy Natural Gradient as Variational Inference. Requirements Python 3 Pytor

Tony JiHyun Kim 119 Dec 02, 2022
Garbage Detection system which will detect objects based on whether it is plastic waste or plastics or just garbage.

Garbage Detection using Yolov5 on Jetson Nano 2gb Developer Kit. Garbage detection system which will detect objects based on whether it is plastic was

Rishikesh A. Bondade 2 May 13, 2022
A public available dataset for road boundary detection in aerial images

Topo-boundary This is the official github repo of paper Topo-boundary: A Benchmark Dataset on Topological Road-boundary Detection Using Aerial Images

Zhenhua Xu 79 Jan 04, 2023
Bayesian regularization for functional graphical models.

BayesFGM Paper: Jiajing Niu, Andrew Brown. Bayesian regularization for functional graphical models. Requirements R version 3.6.3 and up Python 3.6 and

0 Oct 07, 2021
an implementation of softmax splatting for differentiable forward warping using PyTorch

softmax-splatting This is a reference implementation of the softmax splatting operator, which has been proposed in Softmax Splatting for Video Frame I

Simon Niklaus 338 Dec 28, 2022
Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation (ICCV2021)

Perturbed Self-Distillation: Weakly Supervised Large-Scale Point Cloud Semantic Segmentation (ICCV2021) This is the implementation of PSD (ICCV 2021),

12 Dec 12, 2022
Experiments for Neural Flows paper

Neural Flows: Efficient Alternative to Neural ODEs [arxiv] TL;DR: We directly model the neural ODE solutions with neural flows, which is much faster a

54 Dec 07, 2022
Augmented Traffic Control: A tool to simulate network conditions

Augmented Traffic Control Full documentation for the project is available at http://facebook.github.io/augmented-traffic-control/. Overview Augmented

Meta Archive 4.3k Jan 08, 2023
Implementation of FitVid video prediction model in JAX/Flax.

FitVid Video Prediction Model Implementation of FitVid video prediction model in JAX/Flax. If you find this code useful, please cite it in your paper:

Google Research 62 Nov 25, 2022
University of Rochester 2021 Summer REU focusing on music sentiment transfer using CycleGAN

Music-Sentiment-Transfer University of Rochester 2021 Summer REU focusing on music sentiment transfer using CycleGAN Poster: Music Sentiment Transfer

Miles Sigel 2 Jan 24, 2022