Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Last update: Jan 04, 2023

Overview

This repository is built upon BEiT, thanks very much!

Currently, only the pretraining process described in the paper is implemented, and we cannot guarantee that the performance reported in the paper can be reproduced.

Difference

The shuffle and unshuffle operations do not seem to be directly available in PyTorch, so we realize this process in another way:

For shuffle, we use BEiT's method of randomly generating a mask map (14x14), where mask=0 means the token is kept and mask=1 means the token is dropped (it does not take part in the encoder computation). All visible tokens (mask=0) are then fed into the encoder network.
For unshuffle, we take the position embeddings of all masked tokens according to the mask map, add the shared mask token to them, concatenate the result with the visible tokens (from the encoder), and feed everything into the decoder network for reconstruction.
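
The two steps above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names (`random_mask`, `select_visible`, `build_decoder_input`) are hypothetical, the mask is stored as a boolean tensor (True = dropped, i.e. mask=1), and adding position embeddings to the visible tokens before the decoder is an assumption that the text above does not state.

```python
import torch


def random_mask(batch_size: int, num_patches: int, mask_ratio: float) -> torch.Tensor:
    """Return a boolean mask of shape (B, N): True = dropped, False = kept."""
    num_mask = int(mask_ratio * num_patches)
    # Rank random noise per sample; positions with the lowest ranks are masked,
    # so every sample gets exactly num_mask masked positions.
    noise = torch.rand(batch_size, num_patches)
    ranks = noise.argsort(dim=1).argsort(dim=1)
    return ranks < num_mask


def select_visible(tokens: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Keep only the tokens whose mask is False (the 'shuffle' step)."""
    B, _, C = tokens.shape
    # Boolean indexing flattens to (B * num_visible, C); every row keeps the
    # same number of tokens, so the result can be reshaped back per sample.
    return tokens[~mask].reshape(B, -1, C)


def build_decoder_input(encoded_visible: torch.Tensor, pos_embed: torch.Tensor,
                        mask_token: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Concatenate visible tokens with mask tokens (the 'unshuffle' step)."""
    B, _, C = encoded_visible.shape
    pos = pos_embed.expand(B, -1, -1)           # (B, N, C)
    pos_visible = pos[~mask].reshape(B, -1, C)  # positions of kept tokens
    pos_masked = pos[mask].reshape(B, -1, C)    # positions of dropped tokens
    # Assumption: visible tokens also receive their position embeddings here.
    visible = encoded_visible + pos_visible
    masked = mask_token + pos_masked            # shared mask token, broadcast
    return torch.cat([visible, masked], dim=1)  # visible first, masked last


if __name__ == "__main__":
    B, N, C = 2, 14 * 14, 8
    tokens = torch.randn(B, N, C)
    mask = random_mask(B, N, mask_ratio=0.75)
    visible = select_visible(tokens, mask)      # stand-in for the encoder output
    full = build_decoder_input(visible, torch.zeros(1, N, C),
                               torch.zeros(1, 1, C), mask)
    print(visible.shape, full.shape)
```

Note that in this sketch the decoder input is ordered as visible tokens followed by masked tokens, not in the original spatial order. A reconstruction target would therefore have to be gathered with the same mask (for example `target[mask]`) and compared against the last positions of the decoder output.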

TODO

implement the finetune process
reuse the model in modeling_pretrain.py
calculate the normalized pixel targets
add the cls token in the encoder
...

Setup

pip install -r requirements.txt

Run

# Set the path to save checkpoints
OUTPUT_DIR='output/'
# path to imagenet-1k train set
DATA_PATH='../ImageNet_ILSVRC2012/train'


OMP_NUM_THREADS=1 python -m torch.distributed.launch --nproc_per_node=8 run_mae_pretraining.py \
        --data_path ${DATA_PATH} \
        --mask_ratio 0.75 \
        --model pretrain_mae_base_patch16_224 \
        --batch_size 128 \
        --opt_betas 0.9 0.95 \
        --warmup_epochs 40 \
        --epochs 1600 \
        --output_dir ${OUTPUT_DIR}
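
As a quick sanity check on the command above, the sizes it implies can be worked out by hand. The sketch below assumes that `--batch_size` is a per-process (per-GPU) value, that a single node is used without gradient accumulation, and that the model name `pretrain_mae_base_patch16_224` means 16x16 patches on 224x224 images (consistent with the 14x14 mask map mentioned earlier). The helper `pretrain_sizes` is hypothetical and not part of the repository.

```python
def pretrain_sizes(nproc_per_node: int = 8, batch_size: int = 128,
                   mask_ratio: float = 0.75, img_size: int = 224,
                   patch_size: int = 16) -> dict:
    """Derive global batch size and token counts from the launch arguments."""
    grid = img_size // patch_size               # patches per side
    num_patches = grid * grid                   # tokens per image
    num_masked = int(mask_ratio * num_patches)  # tokens dropped before the encoder
    return {
        "global_batch_size": nproc_per_node * batch_size,
        "grid": grid,
        "num_patches": num_patches,
        "num_masked": num_masked,
        "num_visible": num_patches - num_masked,
    }


if __name__ == "__main__":
    print(pretrain_sizes())
    # {'global_batch_size': 1024, 'grid': 14, 'num_patches': 196,
    #  'num_masked': 147, 'num_visible': 49}
```

Under these assumptions the encoder sees 49 of the 196 tokens per image, which is where most of the compute saving of a 0.75 mask ratio comes from.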

Note: pretraining results are not available yet and will be added later.

Owner

Zhiliang Peng
