Re-implementation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

Last update: Dec 14, 2021

Overview

mae-repo

PyTorch re-implementation of "Masked Autoencoders Are Scalable Vision Learners". This repo borrows heavily from https://github.com/lucidrains/vit-pytorch (for the MAE architecture) and https://github.com/pengzhiliang/MAE-pytorch (for the training loop).
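The core MAE idea is to drop a large random subset of patch tokens (75% in the paper) before the encoder sees them. Below is a minimal PyTorch sketch of per-sample random masking, assuming patch embeddings of shape (N, L, D). It follows the noise-and-argsort approach of the official MAE code; the function name `random_masking` is illustrative, and this repo's own masking code (derived from vit-pytorch) may be organized differently.

```python
import torch


def random_masking(x, mask_ratio=0.75):
    """Keep a random subset of tokens per sample (illustrative sketch).

    x: (N, L, D) patch embeddings. Returns the kept tokens (N, L_keep, D),
    a binary mask (N, L) with 1 = masked, and ids_restore to undo the shuffle.
    """
    n, length, dim = x.shape
    len_keep = int(length * (1 - mask_ratio))

    noise = torch.rand(n, length, device=x.device)   # one random score per token
    ids_shuffle = torch.argsort(noise, dim=1)         # ascending: lowest scores are kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)   # inverse permutation

    ids_keep = ids_shuffle[:, :len_keep]
    x_kept = torch.gather(x, 1, ids_keep.unsqueeze(-1).repeat(1, 1, dim))

    # Build the mask in shuffled order, then unshuffle it back to patch order.
    mask = torch.ones(n, length, device=x.device)
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return x_kept, mask, ids_restore
```

For a 224×224 image with 16×16 patches (196 tokens), a ratio of 0.75 keeps 49 tokens per image. Only those go through the encoder; `ids_restore` lets the decoder put mask tokens back in the original patch order before reconstructing pixels.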

prepare ImageNet1K datasets

To train MAE, prepare ImageNet ILSVRC2012 and place the ILSVRC2012_*.tar files in ${datasets_path}. To reduce the overhead of the first run, one can manually untar the archives into train and val directories as follows (adapted from https://gist.github.com/BIGBALLON/8a71d225eff18d88e469e6ea9b39cef4):

mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
cd ..

mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
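After these steps, train/ and val/ should each hold 1000 class folders (one per WordNet synset, e.g. n01440764), with 1,281,167 training and 50,000 validation images in total. The stdlib sketch below counts folders and images so the layout can be sanity-checked before launching a long run. The function names and the accepted image extensions are assumptions, not part of this repo.

```python
from pathlib import Path

# Assumption: images are JPEG/PNG files; ILSVRC2012 ships ".JPEG", so compare case-insensitively.
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def summarize_split(split_dir):
    """Count class sub-folders and image files in a class-per-folder split."""
    class_dirs = [d for d in Path(split_dir).iterdir() if d.is_dir()]
    num_images = sum(
        1
        for d in class_dirs
        for f in d.iterdir()
        if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
    )
    return len(class_dirs), num_images


def check_imagenet_layout(datasets_path, expected_classes=1000):
    """Return {split: (num_classes, num_images)}; raise if a split has the wrong class count."""
    report = {}
    for split in ("train", "val"):
        num_classes, num_images = summarize_split(Path(datasets_path) / split)
        if num_classes != expected_classes:
            raise ValueError(
                f"{split}: expected {expected_classes} class folders, found {num_classes}"
            )
        report[split] = (num_classes, num_images)
    return report
```

Files left at the top of a split (such as ILSVRC2012_img_val.tar, which the val step above does not delete) are ignored, since only sub-folders are treated as classes.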

modify configuration file

To separate code from configuration, settings are stored in YAML files in the configs directory, such as imagenet1k-vit-base.yml. Modify the 'model' section, following the MAE and ViT papers, to set the architecture parameters for ViT-Base, ViT-Large, or ViT-Huge.
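For reference, here are the standard encoder sizes from the ViT paper together with the MAE paper's default decoder (8 blocks of width 512, which appears to match the `mae-vit-base16-dec8-512` run name in the train command). The key names mirror the constructor arguments of vit-pytorch's `ViT` and `MAE` classes; whether imagenet1k-vit-base.yml uses the same keys is an assumption, and `model_settings` is a hypothetical helper.

```python
# Encoder sizes from the ViT paper. MAE uses 16x16 patches for ViT-B/L and 14x14 for ViT-H.
# Key names follow vit-pytorch's ViT/MAE arguments; the repo's YAML keys may differ (assumption).
VIT_ENCODERS = {
    "base": dict(dim=768, depth=12, heads=12, mlp_dim=3072, patch_size=16),
    "large": dict(dim=1024, depth=24, heads=16, mlp_dim=4096, patch_size=16),
    "huge": dict(dim=1280, depth=32, heads=16, mlp_dim=5120, patch_size=14),
}

# MAE paper defaults: 8-block decoder of width 512, 75% of patches masked.
MAE_DECODER = dict(decoder_dim=512, decoder_depth=8, masking_ratio=0.75)


def model_settings(size, image_size=224, **overrides):
    """Build a flat 'model' settings dict for one ViT size (hypothetical helper)."""
    if size not in VIT_ENCODERS:
        raise KeyError(f"unknown ViT size {size!r}; choose from {sorted(VIT_ENCODERS)}")
    settings = {"image_size": image_size, **VIT_ENCODERS[size], **MAE_DECODER}
    settings.update(overrides)
    if image_size % settings["patch_size"]:
        raise ValueError("image_size must be divisible by patch_size")
    settings["num_patches"] = (image_size // settings["patch_size"]) ** 2
    return settings
```

For example, `model_settings("large", decoder_depth=4)` keeps the ViT-L encoder but uses a lighter decoder; the MAE paper reports that the decoder can be made much smaller than the encoder.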

Modify 'optim' for optimizer settings, and 'training' and 'data' for training settings. Note that 'training:batch_size' is the batch size per GPU, so set it to fit the memory of a single GPU card; the total batch size equals batch_size multiplied by the number of GPU cards.
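The batch-size rule above is simple arithmetic. The MAE paper additionally scales the learning rate linearly with the total batch size, lr = base_lr × total_batch_size / 256, with base_lr = 1.5e-4 for pre-training. Whether this repo's 'optim' section applies the same rule is an assumption; the helper names below are hypothetical.

```python
def total_batch_size(per_gpu_batch_size, num_gpus):
    """Total batch size = 'training:batch_size' (per GPU) x number of GPU cards."""
    return per_gpu_batch_size * num_gpus


def scaled_lr(base_lr, batch_size, reference_batch_size=256):
    """Linear scaling rule from the MAE paper: lr = base_lr * batch_size / 256."""
    return base_lr * batch_size / reference_batch_size


total = total_batch_size(64, 8)   # 512 images per optimizer step
lr = scaled_lr(1.5e-4, total)     # about 3e-4
print(f"total batch size {total}, learning rate {lr:.1e}")
```

Under this rule, changing the number of GPUs changes both the total batch size and the effective learning rate, so results from runs on different GPU counts are not directly comparable unless one of them is adjusted.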

train

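Launch one process per GPU; --nproc_per_node must match the number of devices listed in CUDA_VISIBLE_DEVICES:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 mae_test.py \
--datasets_path ${datasets_path} \
--config imagenet1k-vit-base.yml \
--doc mae-vit-base16-dec8-512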
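torch.distributed.launch starts one process per GPU and hands each its local rank. Below is a hypothetical sketch of how a training script could accept the flags used above and fail early when the process count and visible GPUs disagree. This repo's actual argument parser is not shown here, so treat the names and defaults as assumptions. Depending on the PyTorch version, the launcher passes the rank as --local_rank or --local-rank (or only through the LOCAL_RANK environment variable when --use_env is set), so both spellings and the variable are accepted.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the flags used in the train command (hypothetical parser)."""
    parser = argparse.ArgumentParser(description="MAE pre-training (sketch)")
    parser.add_argument("--datasets_path", required=True, help="root containing train/ and val/")
    parser.add_argument("--config", required=True, help="YAML file in the configs directory")
    parser.add_argument("--doc", required=True, help="run name, e.g. for logs and checkpoints")
    # Injected by torch.distributed.launch; the spelling depends on the PyTorch version.
    parser.add_argument("--local_rank", "--local-rank", type=int,
                        default=int(os.environ.get("LOCAL_RANK", 0)))
    return parser.parse_args(argv)


def visible_gpu_count(env=None):
    """Number of devices listed in CUDA_VISIBLE_DEVICES (0 if unset)."""
    env = os.environ if env is None else env
    return len([d for d in env.get("CUDA_VISIBLE_DEVICES", "").split(",") if d.strip()])


def check_launch(nproc_per_node, env=None):
    """Raise if the process count does not match the number of visible GPUs."""
    n = visible_gpu_count(env)
    if n != nproc_per_node:
        raise RuntimeError(f"{nproc_per_node} processes but {n} visible GPUs")
```

With CUDA_VISIBLE_DEVICES=0,1,2,3,5,6,7 only seven devices are visible, so an eighth process would have no GPU to bind to; a check like `check_launch(8)` surfaces that before any data is loaded.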

ToDo list

add pretrain mode
add fine-tuning mode
support mixed precision training
support distributed training
verify the correctness of this re-implementation

Owner

Peng Qiao