ConvMAE: Masked Convolution Meets Masked Autoencoders


Peng Gao¹, Teli Ma¹, Hongsheng Li², Jifeng Dai³, Yu Qiao¹

¹ Shanghai AI Laboratory, ² MMLab, CUHK, ³ SenseTime Research

This repo is the official implementation of ConvMAE: Masked Convolution Meets Masked Autoencoders. It currently includes code and models for the following tasks:

ImageNet Pretrain: See PRETRAIN.md.
ImageNet Finetune: See FINETUNE.md.
Object Detection: See DETECTION.md.
Semantic Segmentation: See SEGMENTATION.md.

Updates

16/May/2022

Code and models for COCO object detection and instance segmentation are now available.

11/May/2022

  1. Pretrained ConvMAE models on ImageNet-1K are released.
  2. Code and models for ImageNet-1K finetuning and linear probing are provided.

08/May/2022

The preprint is publicly available on arXiv.

Introduction

The ConvMAE framework demonstrates that a multi-scale hybrid convolution-transformer can learn more discriminative representations via the masked auto-encoding scheme.

  • We present ConvMAE, a strong and efficient self-supervised framework that is easy to implement yet shows outstanding performance on downstream tasks.
  • ConvMAE naturally generates hierarchical representations and exhibits promising performance on object detection and segmentation.
  • ConvMAE-Base improves ImageNet finetuning accuracy by 1.4% over MAE-Base (85.0% vs. 83.6%). On object detection with Mask R-CNN, ConvMAE-Base achieves 53.2 box AP and 47.1 mask AP with a 25-epoch training schedule, while MAE-Base attains 50.3 box AP and 44.9 mask AP with 100 training epochs. On ADE20K with UperNet, ConvMAE-Base surpasses MAE-Base by 3.6 mIoU (51.7 vs. 48.1).
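The masking idea can be illustrated with a small pure-Python sketch. It is not the repository's implementation; it assumes a 224x224 input, a transformer stage working on a 14x14 token grid (1/16 resolution) that keeps 25% of tokens visible, and two earlier convolutional stages at 1/8 (28x28) and 1/4 (56x56) resolution. The mask is sampled once on the coarse grid and nearest-upsampled, so every stage hides the same image regions, and masked positions are zeroed before a convolution so hidden content cannot mix into visible positions. All function names are hypothetical.

```python
import random
from typing import List

Mask = List[List[int]]  # 1 = masked (hidden), 0 = visible


def random_token_mask(grid: int, keep_ratio: float, seed: int = 0) -> Mask:
    """Sample a grid x grid mask that keeps `keep_ratio` of the tokens visible."""
    rng = random.Random(seed)
    num_tokens = grid * grid
    num_keep = int(num_tokens * keep_ratio)
    visible = set(rng.sample(range(num_tokens), num_keep))
    return [[0 if r * grid + c in visible else 1 for c in range(grid)] for r in range(grid)]


def upsample_mask(mask: Mask, factor: int) -> Mask:
    """Nearest-neighbour upsampling: each coarse cell becomes a factor x factor block."""
    out: Mask = []
    for row in mask:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def masked_conv_input(features: List[List[float]], mask: Mask) -> List[List[float]]:
    """Zero masked positions so a following convolution cannot mix hidden content
    into visible positions (a simplified stand-in for masked convolution)."""
    return [[0.0 if m else f for f, m in zip(frow, mrow)] for frow, mrow in zip(features, mask)]


if __name__ == "__main__":
    stage3 = random_token_mask(grid=14, keep_ratio=0.25)  # 1/16 resolution
    stage2 = upsample_mask(stage3, factor=2)              # 1/8 resolution, 28x28
    stage1 = upsample_mask(stage3, factor=4)              # 1/4 resolution, 56x56
    visible = sum(row.count(0) for row in stage3)
    print(len(stage1), len(stage2), len(stage3), visible)  # 56 28 14 49
```

Sampling on the coarsest grid and upsampling (rather than sampling each stage independently) keeps the masked regions aligned across resolutions, which is what lets the transformer stage work on visible tokens only.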


Pretrain on ImageNet-1K

The following table provides pretrained checkpoints and logs used in the paper.

Model | Pretrained checkpoints | Logs
ConvMAE-Base | download | download
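Since the code builds on DeiT and MAE (both PyTorch), the checkpoints are presumably loadable with `torch.load`. The sketch below shows a typical MAE-style preparation step before fine-tuning. It assumes, without confirmation from this README, that weights are stored under a `"model"` key and that classifier weights use the `head.` prefix; the function name and file name are hypothetical. It operates on plain dicts, so it can be tried without PyTorch.

```python
from typing import Any, Dict, Iterable


def prepare_for_finetune(
    checkpoint: Dict[str, Any],
    drop_prefixes: Iterable[str] = ("head.",),
) -> Dict[str, Any]:
    """Return a state dict ready for loading into a fine-tuning model.

    Assumption (MAE-style layout, not confirmed here): weights are stored under
    checkpoint["model"]; if that key is missing, the dict itself is used.
    """
    state = checkpoint.get("model", checkpoint)
    prefixes = tuple(drop_prefixes)
    cleaned: Dict[str, Any] = {}
    for name, tensor in state.items():
        if name.startswith("module."):  # strip a DistributedDataParallel wrapper prefix
            name = name[len("module."):]
        if name.startswith(prefixes):   # classifier is re-initialised for the new task
            continue
        cleaned[name] = tensor
    return cleaned


# Hypothetical usage with PyTorch (file name is a placeholder):
# import torch
# ckpt = torch.load("convmae_base_pretrain.pth", map_location="cpu")
# missing, unexpected = model.load_state_dict(prepare_for_finetune(ckpt), strict=False)
```

Loading with `strict=False` tolerates keys that exist only in the pretraining model (for example a decoder); inspecting the returned missing and unexpected keys is a quick sanity check.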

Main Results on ImageNet-1K

Models | #Params(M) | Supervision | Encoder Ratio | Pretrain Epochs | FT acc@1(%) | LIN acc@1(%) | FT logs/weights | LIN logs/weights
BEiT | 88 | DALLE | 100% | 300 | 83.0 | 37.6 | - | -
MAE | 88 | RGB | 25% | 1600 | 83.6 | 67.8 | - | -
SimMIM | 88 | RGB | 100% | 800 | 84.0 | 56.7 | - | -
MaskFeat | 88 | HOG | 100% | 300 | 83.6 | N/A | - | -
data2vec | 88 | RGB | 100% | 800 | 84.2 | N/A | - | -
ConvMAE-B | 88 | RGB | 25% | 1600 | 85.0 | 70.9 | log/weight |
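The Encoder Ratio column is the fraction of patch tokens the encoder actually processes. As a rough worked example, assuming a 224x224 input and 16x16 patches (the ViT-B convention; for ConvMAE this corresponds to its 1/16-resolution transformer stage), the sketch counts encoder tokens and the relative cost of the quadratic self-attention term. It ignores the linear MLP and projection terms, so it overstates the end-to-end speed-up. The helper name is hypothetical.

```python
def encoder_tokens(image_size: int = 224, patch: int = 16, ratio: float = 1.0) -> int:
    """Number of tokens fed to the encoder when only `ratio` of the patches are kept."""
    total = (image_size // patch) ** 2
    return int(total * ratio)


full = encoder_tokens(ratio=1.0)     # 100% encoder ratio: all 196 tokens
sparse = encoder_tokens(ratio=0.25)  # 25% encoder ratio: 49 visible tokens
attention_saving = (full / sparse) ** 2  # self-attention cost grows with N^2
print(full, sparse, attention_saving)  # 196 49 16.0
```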

Main Results on COCO

Mask R-CNN

Models | Pretrain | Pretrain Epochs | Finetune Epochs | #Params(M) | FLOPs(T) | box AP | mask AP | logs/weights
Swin-B | IN21K w/ labels | 300 | 36 | 109 | 0.7 | 51.4 | 45.4 | -
Swin-L | IN21K w/ labels | 300 | 36 | 218 | 1.1 | 52.4 | 46.2 | -
MViTv2-B | IN21K w/ labels | 300 | 36 | 73 | 0.6 | 53.1 | 47.4 | -
MViTv2-L | IN21K w/ labels | 300 | 36 | 239 | 1.3 | 53.6 | 47.5 | -
Benchmarking-ViT-B | IN1K w/o labels | 1600 | 100 | 118 | 0.9 | 50.4 | 44.9 | -
Benchmarking-ViT-L | IN1K w/o labels | 1600 | 100 | 340 | 1.9 | 53.3 | 47.2 | -
ViTDet | IN1K w/o labels | 1600 | 100 | 111 | 0.8 | 51.2 | 45.5 | -
MIMDet-ViT-B | IN1K w/o labels | 1600 | 36 | 127 | 1.1 | 51.5 | 46.0 | -
MIMDet-ViT-L | IN1K w/o labels | 1600 | 36 | 345 | 2.6 | 53.3 | 47.5 | -
ConvMAE-B | IN1K w/o labels | 1600 | 25 | 104 | 0.9 | 53.2 | 47.1 | log/weight
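The box AP and mask AP columns follow the COCO protocol, which averages precision over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05 (mask AP uses mask IoU instead of box IoU). The sketch below only shows the matching criterion for a single prediction/ground-truth pair; ranking detections and interpolating precision-recall curves is left to tools such as pycocotools. The (x1, y1, x2, y2) box format with positive area and the helper names are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


# COCO-style AP averages over these ten IoU thresholds: 0.50, 0.55, ..., 0.95
COCO_IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred: Box, gt: Box) -> int:
    """How many of the ten COCO IoU thresholds this prediction would match at."""
    iou = box_iou(pred, gt)
    return sum(iou >= t for t in COCO_IOU_THRESHOLDS)


print(thresholds_passed((0, 0, 10, 10), (0, 0, 10, 8)))  # IoU 0.8 -> 7
```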

Main Results on ADE20K

UperNet

Models | Pretrain | Pretrain Epochs | Finetune Iters | #Params(M) | FLOPs(T) | mIoU | logs/weights
DeiT-B | IN1K w/ labels | 300 | 16K | 163 | 0.6 | 45.6 | -
Swin-B | IN1K w/ labels | 300 | 16K | 121 | 0.3 | 48.1 | -
MoCo V3 | IN1K | 300 | 16K | 163 | 0.6 | 47.3 | -
DINO | IN1K | 400 | 16K | 163 | 0.6 | 47.2 | -
BEiT | IN1K+DALLE | 1600 | 16K | 163 | 0.6 | 47.1 | -
PeCo | IN1K | 300 | 16K | 163 | 0.6 | 46.7 | -
CAE | IN1K+DALLE | 800 | 16K | 163 | 0.6 | 48.8 | -
MAE | IN1K | 1600 | 16K | 163 | 0.6 | 48.1 | -
ConvMAE-B | IN1K | 1600 | 16K | 153 | 0.6 | 51.7 | Soon
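mIoU is the mean, over classes, of the intersection over union between predicted and ground-truth pixels. A minimal sketch computing it from a confusion matrix (rows = ground truth, columns = prediction) follows; it is the textbook definition rather than MMSegmentation's implementation, and classes absent from both prediction and ground truth are skipped.

```python
from typing import List, Sequence


def mean_iou(confusion: Sequence[Sequence[int]]) -> float:
    """Mean IoU from a square confusion matrix (rows: ground truth, cols: prediction)."""
    n = len(confusion)
    ious: List[float] = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                       # class-k pixels predicted as another class
        fp = sum(confusion[r][k] for r in range(n)) - tp  # other pixels predicted as class k
        union = tp + fp + fn
        if union > 0:  # skip classes that never appear
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


print(round(mean_iou([[3, 1], [1, 5]]), 4))  # (3/5 + 5/7) / 2 -> 0.6571
```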

Main Results on Kinetics-400

Models | Pretrain Epochs | Finetune Epochs | #Params(M) | Top1 | Top5 | logs/weights
VideoMAE-B | 200 | 100 | 87 | 77.8 | - | -
VideoMAE-B | 800 | 100 | 87 | 79.4 | - | -
VideoMAE-B | 1600 | 100 | 87 | 79.8 | - | -
VideoMAE-B | 1600 | 100 (w/ Repeated Aug) | 87 | 80.7 | 94.7 | -
SpatioTemporalLearner-B | 800 | 150 (w/ Repeated Aug) | 87 | 81.3 | 94.9 | -
VideoConvMAE-B | 200 | 100 | 86 | 80.1 | 94.3 | Soon
VideoConvMAE-B | 800 | 100 | 86 | 81.7 | 95.1 | Soon
VideoConvMAE-B-MSD | 800 | 100 | 86 | 82.7 | 95.5 | Soon
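The Top1/Top5 columns (like acc@1 in the ImageNet table) count a sample as correct when its true label is among the k highest-scoring classes. A short pure-Python sketch of that metric follows; the function name is hypothetical, and ties are broken in favour of the lower class index, which may differ from other implementations.

```python
from typing import Sequence


def topk_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Stable sort, so equal scores keep their original (index) order.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(topk_accuracy(scores, labels, 1), topk_accuracy(scores, labels, 3))  # 33.33... 100.0
```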

Main Results on Something-Something V2

Models | Pretrain Epochs | Finetune Epochs | #Params(M) | Top1 | Top5 | logs/weights
VideoMAE-B | 200 | 40 | 87 | 66.1 | - | -
VideoMAE-B | 800 | 40 | 87 | 69.3 | - | -
VideoMAE-B | 2400 | 40 | 87 | 70.3 | - | -
VideoConvMAE-B | 200 | 40 | 86 | 67.7 | 91.2 | Soon
VideoConvMAE-B | 800 | 40 | 86 | 69.9 | 92.4 | Soon
VideoConvMAE-B-MSD | 800 | 40 | 86 | 70.7 | 93.0 | Soon

Getting Started

Prerequisites

  • Linux
  • Python 3.7+
  • CUDA 10.2+
  • GCC 5+
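A quick way to check the prerequisites above is a small standard-library script. This is a hedged sketch, not part of the repository: it checks the OS, the Python version and GCC (via `gcc -dumpversion`), and leaves CUDA out because it is easiest to read from an installed PyTorch (`torch.version.cuda`). The helper names are hypothetical.

```python
import shutil
import subprocess
import sys
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '10.2' or '5.4.0' into a comparable tuple of ints (non-numeric parts skipped)."""
    return tuple(int(part) for part in text.strip().split(".") if part.isdigit())


def meets_minimum(found: str, minimum: str) -> bool:
    """True if version `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


def gcc_version() -> Optional[str]:
    """Return GCC's version string, or None if gcc is not on PATH."""
    if shutil.which("gcc") is None:
        return None
    out = subprocess.run(["gcc", "-dumpversion"], capture_output=True, text=True)
    return out.stdout.strip() or None


if __name__ == "__main__":
    print("Linux:", sys.platform.startswith("linux"))
    print("Python 3.7+:", sys.version_info >= (3, 7))
    gcc = gcc_version()
    print("GCC 5+:", gcc is not None and meets_minimum(gcc, "5"))
```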

Training and evaluation

Acknowledgement

The pretraining and finetuning of our project are based on DeiT and MAE. The object detection and semantic segmentation parts are based on MIMDet and MMSegmentation respectively. Thanks for their wonderful work.

License

ConvMAE is released under the MIT License.

Citation

@article{gao2022convmae,
  title={ConvMAE: Masked Convolution Meets Masked Autoencoders},
  author={Gao, Peng and Ma, Teli and Li, Hongsheng and Dai, Jifeng and Qiao, Yu},
  journal={arXiv preprint arXiv:2205.03892},
  year={2022}
}
Owner
Alpha VL Team of Shanghai AI Lab