A PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

This is a rough version of MAE: only the pretraining model is implemented so far; fine-tuning and linear probing are coming soon.

1. Introduction

This repo implements the MAE ViT model in PyTorch. It was written without referring to any reference code, so it is an unofficial version. Because of limited time and compute, I only trained a ViT-Tiny encoder.

2. Environment

  • python 3.7+
  • pytorch 1.7.1
  • pillow
  • timm
  • opencv-python

3. Model Config

Pretrain Config

  • BaseConfig
    img_size = 224,
    patch_size = 16,
  • Encoder: the encoder follows the ViT-Tiny model config.
    encoder_dim = 192,
    encoder_depth = 12,
    encoder_heads = 3,
  • Decoder: the decoder follows the config from Kaiming He et al.'s paper.
    decoder_dim = 512,
    decoder_depth = 8,
    decoder_heads = 16, 
  • Mask
    1. Add the sin-cos position embedding to the patches, then shuffle the patches for the encoder.
    2. Mask the shuffled patches and keep the mask indices.
    3. Unshuffle the mask tokens, combine them with the encoder embeddings, and then add the position embedding for the decoder.
    4. Reconstruct the image from the decoder embeddings with a transposed convolution (ConvTranspose).
    5. Build the mask map from the mask indices to compute the loss (only the masked patches are considered).
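The five mask steps can be sketched as plain index bookkeeping. This is a framework-free illustration, not the repo's code: the helper names (`random_masking`, `unshuffle`, `masked_mse`) are hypothetical, and the 0.75 mask ratio is an assumption taken from the MAE paper, since this README does not state the ratio it uses. In PyTorch the same steps would typically be done with `torch.argsort` on random noise and `torch.gather` on batched tensors.

```python
import random

IMG_SIZE = 224
PATCH_SIZE = 16
NUM_PATCHES = (IMG_SIZE // PATCH_SIZE) ** 2  # 14 x 14 = 196 patches
MASK_RATIO = 0.75  # assumption: MAE paper default, not stated in this README


def random_masking(num_patches, mask_ratio, rng):
    """Shuffle patch indices, keep the first part, and record the mask.

    Returns (ids_keep, ids_restore, mask):
      ids_keep:    patch indices passed to the encoder (shuffled order)
      ids_restore: position of each original patch in the shuffled order
      mask:        per-patch map in original order, 1 = masked, 0 = visible
    """
    ids_shuffle = list(range(num_patches))
    rng.shuffle(ids_shuffle)
    # Invert the permutation so the decoder input can be unshuffled later.
    ids_restore = [0] * num_patches
    for shuffled_pos, patch_id in enumerate(ids_shuffle):
        ids_restore[patch_id] = shuffled_pos
    len_keep = int(num_patches * (1 - mask_ratio))
    ids_keep = ids_shuffle[:len_keep]
    # Build the mask map from the kept indices (everything else is masked).
    mask = [1] * num_patches
    for patch_id in ids_keep:
        mask[patch_id] = 0
    return ids_keep, ids_restore, mask


def unshuffle(tokens_shuffled, ids_restore):
    """Reorder [visible encodings + mask tokens] back to original patch order."""
    return [tokens_shuffled[pos] for pos in ids_restore]


def masked_mse(pred, target, mask):
    """Per-patch mean squared error, averaged over masked patches only."""
    per_patch = [
        sum((p - t) ** 2 for p, t in zip(pred_patch, target_patch)) / len(pred_patch)
        for pred_patch, target_patch in zip(pred, target)
    ]
    return sum(loss * m for loss, m in zip(per_patch, mask)) / sum(mask)


rng = random.Random(0)
ids_keep, ids_restore, mask = random_masking(NUM_PATCHES, MASK_RATIO, rng)
# Decoder input in shuffled order: encoder outputs first, then mask tokens.
tokens = [f"enc{p}" for p in ids_keep] + ["MASK"] * (NUM_PATCHES - len(ids_keep))
restored = unshuffle(tokens, ids_restore)
print(len(ids_keep), sum(mask))  # 49 147
```

Because `ids_restore` is the inverse of the shuffle permutation, indexing the shuffled decoder input with it puts every visible encoding and every mask token back at its original patch position, and the same `mask` list then restricts the loss to the masked patches.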
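Step 1 relies on a fixed sin-cos position embedding. The README does not say whether it is 1D or 2D; the sketch below assumes the 2D form (half of the channels encode the patch row, half the column) with the common frequency base of 10000, sized with the config values above. `sincos_1d` and `sincos_2d` are hypothetical names, and the repo's exact channel layout may differ.

```python
import math


def sincos_1d(embed_dim, positions):
    """1D sin-cos embedding: embed_dim/2 sines followed by embed_dim/2 cosines."""
    assert embed_dim % 2 == 0
    half = embed_dim // 2
    # Geometric frequency ladder with the usual base of 10000.
    omegas = [1.0 / 10000 ** (i / half) for i in range(half)]
    return [
        [math.sin(p * w) for w in omegas] + [math.cos(p * w) for w in omegas]
        for p in positions
    ]


def sincos_2d(embed_dim, grid_size):
    """Fixed 2D sin-cos embedding for a grid_size x grid_size patch grid.

    The first half of the channels encodes the row index, the second half the
    column index. Rows are returned in row-major patch order.
    """
    assert embed_dim % 4 == 0
    rows = [r for r in range(grid_size) for _ in range(grid_size)]
    cols = [c for _ in range(grid_size) for c in range(grid_size)]
    emb_rows = sincos_1d(embed_dim // 2, rows)
    emb_cols = sincos_1d(embed_dim // 2, cols)
    return [er + ec for er, ec in zip(emb_rows, emb_cols)]


grid = 224 // 16                # img_size / patch_size = 14 patches per side
enc_pos = sincos_2d(192, grid)  # encoder_dim = 192
dec_pos = sincos_2d(512, grid)  # decoder_dim = 512
print(len(enc_pos), len(enc_pos[0]), len(dec_pos[0]))  # 196 192 512
```

The embedding is computed once and not learned, so the encoder and decoder can each build one table for their own width and add it to the patch tokens.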

Finetune Config

Waiting for the results.

TODO:

  • Finetune Training
  • Linear Training

4. Results

Decoder reconstructions of ImageNet validation images from the pretrained model. Compared with Kaiming He's results, the reconstruction quality is lower, perhaps because the encoder model is too small.

The MAE-ViT-Tiny pretrained model is here; you can download it to test the reconstruction results. Put the checkpoint in the weights folder.

5. Training & Inference

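  • Dataset preparation

    Each line of the train/val list file contains an image path and its class label (0 to 999 for ImageNet), separated by a comma:
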
    /data/home/imagenet/xxx.jpeg, 0
    /data/home/imagenet/xxx.jpeg, 1
    ...
    /data/home/imagenet/xxx.jpeg, 999
    
  • Training

    1. Pretrain

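      #!/bin/bash
      # Use one OpenMP/MKL thread per process to avoid CPU oversubscription
      export OMP_NUM_THREADS=1
      export MKL_NUM_THREADS=1
      # train_file, val_file, ckpt_folder and log_folder must be set before this point
      cd MAE-Pytorch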
      CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -W ignore -m torch.distributed.launch --nproc_per_node 8 train_mae.py \
      --batch_size 256 \
      --num_workers 32 \
      --lr 1.5e-4 \
      --optimizer_name "adamw" \
      --cosine 1 \
      --max_epochs 300 \
      --warmup_epochs 40 \
      --num-classes 1000 \
      --crop_size 224 \
      --patch_size 16 \
      --color_prob 0.0 \
      --calculate_val 0 \
      --weight_decay 5e-2 \
      --lars 0 \
      --mixup 0.0 \
      --smoothing 0.0 \
      --train_file $train_file \
      --val_file $val_file \
      --checkpoints-path $ckpt_folder \
      --log-dir $log_folder
    2. Finetune TODO:

      • training
    3. Linear TODO:

      • training
  • Inference

    1. Pretrain
    python mae_test.py --test_image xxx.jpg --ckpt weights.pth
    2. Classification TODO:
      • training
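A minimal sketch of reading the list files passed as `--train_file` / `--val_file`, assuming exactly the `path, label` format shown in the dataset preparation step. `parse_list_file` is a hypothetical helper, not the repo's loader; the file names in the example are placeholders.

```python
def parse_list_file(lines, num_classes=1000):
    """Parse 'image_path, label' lines into (path, label) tuples.

    Blank lines are skipped. The split is on the last comma, so paths that
    themselves contain commas still parse.
    """
    samples = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        path, sep, label_text = line.rpartition(",")
        if not sep:
            raise ValueError(f"line {line_no}: expected 'path, label', got {line!r}")
        label = int(label_text)  # int() tolerates the space after the comma
        if not 0 <= label < num_classes:
            raise ValueError(f"line {line_no}: label {label} outside [0, {num_classes})")
        samples.append((path.strip(), label))
    return samples


example = [
    "/data/home/imagenet/a.jpeg, 0\n",
    "/data/home/imagenet/b.jpeg, 999\n",
]
print(parse_list_file(example))
```

An open text file can be passed directly (`with open(path) as f: parse_list_file(f)`), since iterating a file yields its lines.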
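The `--lr`, `--cosine`, `--warmup_epochs` and `--max_epochs` flags in the pretrain script suggest a warmup-then-cosine schedule. The function below is a hypothetical sketch of the usual linear-warmup plus half-cosine form using those values; the README does not show how train_mae.py actually computes it (per epoch or per iteration, and whether there is a minimum learning rate).

```python
import math


def lr_at_epoch(epoch, base_lr=1.5e-4, warmup_epochs=40, max_epochs=300, min_lr=0.0):
    """Learning rate at a (possibly fractional) epoch.

    Linear warmup from 0 to base_lr over warmup_epochs, then half-cosine
    decay from base_lr down to min_lr at max_epochs.
    """
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (max_epochs - warmup_epochs)
    return min_lr + (base_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


for e in (0, 20, 40, 170, 300):
    print(e, f"{lr_at_epoch(e):.3e}")
```

Passing a fractional epoch (`iteration / iters_per_epoch`) turns this into a per-iteration schedule. If `--batch_size` is per process, the 8-GPU launch gives a total batch of 2048; the MAE paper scales its base learning rate linearly (lr = base_lr * batch_size / 256), which would give 1.2e-3 here, but it is not stated whether train_mae.py applies that rule.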

6. TODO

  • ViT-Base model training.
  • Swin Transformer for MAE.
  • Finetune & Linear training.

Fine-tuning is in progress; the weights may be coming soon.

Owner
FlyEgle
JOYY AI GROUP - Machine Learning Engineer (Computer Vision)