SeMask: Semantically Masked Transformers for Semantic Segmentation.

Last update: Dec 30, 2022

Overview

SeMask: Semantically Masked Transformers

Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi

This repo contains the code for our paper SeMask: Semantically Masked Transformers for Semantic Segmentation.

Results
Setup Instructions
Citing SeMask

1. Results

Note: † denotes backbones pretrained on ImageNet-22k at 384x384 resolution.

ADE20K

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#params	Config	Checkpoint
SeMask-T FPN	SeMask Swin-T	512x512	42.11	43.16	35M	config	TBD
SeMask-S FPN	SeMask Swin-S	512x512	45.92	47.63	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	512x512	49.35	50.98	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	640x640	51.89	53.52	211M	config	TBD
SeMask-L MaskFormer	SeMask Swin-L†	640x640	54.75	56.15	219M	config	TBD
SeMask-L Mask2Former	SeMask Swin-L†	640x640	56.41	57.52	222M	config	TBD
SeMask-L Mask2Former FAPN	SeMask Swin-L†	640x640	56.68	58.00	227M	config	TBD
SeMask-L Mask2Former MSFAPN	SeMask Swin-L†	640x640	56.54	58.22	224M	config	TBD

Cityscapes

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#params	Config	Checkpoint
SeMask-T FPN	SeMask Swin-T	768x768	74.92	76.56	34M	config	TBD
SeMask-S FPN	SeMask Swin-S	768x768	77.13	79.14	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	768x768	77.70	79.73	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	768x768	78.53	80.39	211M	config	TBD
SeMask-L Mask2Former	SeMask Swin-L†	512x1024	83.97	84.98	222M	config	TBD

COCO-Stuff 10k

Method	Backbone	Crop Size	mIoU	mIoU (ms+flip)	#params	Config	Checkpoint
SeMask-T FPN	SeMask Swin-T	512x512	37.53	38.88	35M	config	TBD
SeMask-S FPN	SeMask Swin-S	512x512	40.72	42.27	56M	config	TBD
SeMask-B FPN	SeMask Swin-B†	512x512	44.63	46.30	96M	config	TBD
SeMask-L FPN	SeMask Swin-L†	640x640	47.47	48.54	211M	config	TBD
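
For readers unfamiliar with the metric, the mIoU columns in the tables above are mean intersection-over-union scores, reported as percentages. The sketch below shows one common way to compute mIoU from a confusion matrix accumulated over pixels. It is an illustration only and not the evaluation code used in this repo: the function names `confusion_matrix` and `mean_iou`, the `ignore_index` value of 255, and the toy labels are all assumptions made for this example.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count (target, pred) label pairs over flattened per-pixel labels.

    ignore_index marks unlabeled pixels; 255 is an assumed convention here.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # skip unlabeled pixels
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither the prediction nor the target are skipped.
    """
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                        # target is c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp   # predicted c, target is otherwise
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes, one unlabeled pixel.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
miou = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {100 * miou:.2f}")
```

In practice the confusion matrix is usually accumulated over the whole validation set before the per-class IoUs are averaged. The "ms+flip" column refers to multi-scale and flip test-time augmentation, which this sketch does not cover.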

2. Setup Instructions

We provide the codebase with SeMask incorporated into various models. Please check the setup instructions inside the corresponding folders:

SeMask-FPN: Setup Instructions
SeMask-MaskFormer: Setup Instructions
SeMask-Mask2Former: Setup Instructions
SeMask-FAPN: Setup Instructions

3. Citing SeMask

@article{jain2022semask,
  title={SeMask: Semantically Masked Transformers for Semantic Segmentation},
  author={Jitesh Jain and Anukriti Singh and Nikita Orlov and Zilong Huang and Jiachen Li and Steven Walton and Humphrey Shi},
  journal={arXiv preprint arXiv:...},
  year={2022}
}

Acknowledgements

The code is based heavily on the following repositories: Swin-Transformer-Semantic-Segmentation, Mask2Former, MaskFormer, and FaPN-full.

Owner

Picsart AI Research (PAIR)
