MAE segmentation - Reproduction of semantic segmentation using a masked autoencoder (MAE)

Last update: Dec 17, 2022

Overview

This repository reproduces ADE20k semantic segmentation results with MAE.

Getting started

Install the mmsegmentation library and some required packages.

pip install mmcv-full==1.3.0 mmsegmentation==0.11.0
pip install scipy timm==0.3.2

Install apex for mixed-precision training:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Follow the dataset preparation guide in mmsegmentation (mmseg) to prepare the ADE20k dataset.
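
Before training, it can help to confirm that the dataset landed where the configs expect it. The sketch below assumes the directory layout described in the mmsegmentation dataset guide (data/ade/ADEChallengeData2016 with images/ and annotations/ subfolders, each split into training/ and validation/). The helper name check_ade20k_layout is hypothetical and is not part of mmsegmentation.

```shell
# Hypothetical helper: verify the ADE20k folder layout before training.
# Assumption: the default root follows the mmsegmentation dataset guide.
check_ade20k_layout() {
    ade_root="${1:-data/ade/ADEChallengeData2016}"
    ade_missing=0
    for ade_sub in images/training images/validation \
                   annotations/training annotations/validation; do
        if [ ! -d "$ade_root/$ade_sub" ]; then
            # Report each missing folder on stderr and keep checking.
            echo "missing: $ade_root/$ade_sub" >&2
            ade_missing=1
        fi
    done
    # Returns 0 when all four folders exist, 1 otherwise.
    return "$ade_missing"
}

# Usage (from the repository root):
#   check_ade20k_layout && echo "ADE20k layout looks complete"
```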

Fine-tuning for Reproducing Results of MAE ViT-Base

Command:

tools/dist_train.sh configs/mae/upernet_mae_base_12_512_slide_160k_ade20k.py 8 --seed 0 --options model.pretrained=https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth

Expected results log (paper result: 48.1 mIoU):

+--------+-------+-------+-------+
| Scope  | mIoU  | mAcc  | aAcc  |
+--------+-------+-------+-------+
| global | 48.15 | 58.99 | 83.05 |
+--------+-------+-------+-------+
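
To compare a run against the paper number without reading the whole log, the mIoU can be pulled out of a summary table like the one above. This is a minimal sketch that assumes the log contains the table in exactly that pipe-separated form. The function name extract_miou and the log path in the usage comment are hypothetical.

```shell
# Hypothetical helper: print the mIoU from each "global" summary row in a log.
# Assumption: rows look like "| global | 48.15 | 58.99 | 83.05 |".
extract_miou() {
    # Splitting on "|" makes field 2 the Scope column and field 3 the mIoU column.
    awk -F'|' '$2 ~ /global/ { gsub(/ /, "", $3); print $3 }' "$1"
}

# Usage: a log with several evaluations prints one value per table,
# so keep the last one.
#   extract_miou path/to/train.log | tail -n 1
```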

Evaluation

Command format:

tools/dist_test.sh <CONFIG_PATH> <CHECKPOINT_PATH> <NUM_GPUS> --eval mIoU
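
As an illustration of filling in the placeholders, the sketch below wraps the command format above in a small function. The function run_eval, the DRY_RUN switch, and the checkpoint path in the usage comment are hypothetical. Only the tools/dist_test.sh invocation itself comes from this README.

```shell
# Hypothetical wrapper around the evaluation command format above.
run_eval() {
    if [ "$#" -ne 3 ]; then
        echo "usage: run_eval <CONFIG_PATH> <CHECKPOINT_PATH> <NUM_GPUS>" >&2
        return 2
    fi
    eval_config="$1"
    eval_checkpoint="$2"
    eval_gpus="$3"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of launching it.
        echo "tools/dist_test.sh $eval_config $eval_checkpoint $eval_gpus --eval mIoU"
    else
        tools/dist_test.sh "$eval_config" "$eval_checkpoint" "$eval_gpus" --eval mIoU
    fi
}

# Usage (the checkpoint path is a placeholder; substitute your own):
#   DRY_RUN=1 run_eval configs/mae/upernet_mae_base_12_512_slide_160k_ade20k.py path/to/checkpoint.pth 8
```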

Acknowledgment

This code is built using the mmsegmentation library, the timm library, the Swin repository, XCiT, SETR, BEiT, and the MAE repository.

