[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Last update: Jan 05, 2023

Related tags

Overview

SEgmentation TRansformers -- SETR

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers,
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip HS Torr, Li Zhang,
CVPR 2021

Installation

Our project is developed based on mmsegmentation. Please follow the official mmsegmentation INSTALL.md and getting_started.md for installation and dataset preparation.

Main results

Cityscapes

Method	Crop Size	Batch size	iteration	set	mIoU
SETR-Naive	768x768	8	40k	val	77.37	model config
SETR-Naive	768x768	8	80k	val	77.90	model config
SETR-MLA	768x768	8	40k	val	76.65	model config
SETR-MLA	768x768	8	80k	val	77.24	model config
SETR-PUP	768x768	8	40k	val	78.39	model config
SETR-PUP	768x768	8	80k	val	79.34	model config
SETR-Naive-DeiT	768x768	8	40k	val	77.85	model config
SETR-Naive-DeiT	768x768	8	80k	val	78.66	model config
SETR-MLA-DeiT	768x768	8	40k	val	78.04	model config
SETR-MLA-DeiT	768x768	8	80k	val	78.98	model config
SETR-PUP-DeiT	768x768	8	40k	val	78.79	model config
SETR-PUP-DeiT	768x768	8	80k	val	79.45	model config

ADE20K

Method	Crop Size	Batch size	iteration	set	mIoU	mIoU(ms+flip)
SETR-Naive	512x512	16	160k	Val	48.06	48.80	model config
SETR-MLA	512x512	8	160k	val	48.27	50.03	model config
SETR-MLA	512x512	16	160k	val	48.64	50.28	model config
SETR-PUP	512x512	16	160k	val	48.58	50.09	model config

Pascal Context

Method	Crop Size	Batch size	iteration	set	mIoU	mIoU(ms+flip)
SETR-Naive	480x480	16	80k	val	52.89	53.61	model config
SETR-MLA	480x480	8	80k	val	54.39	55.39	model config
SETR-MLA	480x480	16	80k	val	54.87	55.83	model config
SETR-PUP	480x480	16	80k	val	54.40	55.27	model config

Get Started

Train

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} 
# For example, train a SETR-PUP on Cityscapes dataset with 8 GPUs
./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 8

Single-scale testing

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}  [--eval ${EVAL_METRICS}]
# For example, test a SETR-PUP on Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU

Multi-scale testing

Use the config file ending in _MS.py in configs/SETR.

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}  [--eval ${EVAL_METRICS}]
# For example, test a SETR-PUP on Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8_MS.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU

Please see getting_started.md for the more basic usage of training and testing.

Reference

@inproceedings{SETR,
    title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers}, 
    author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip H.S. and Zhang, Li},
    booktitle={CVPR},
    year={2021}
}

License

MIT

Acknowledgement

Thanks to previous open-sourced repo:
mmsegmentation
pytorch-image-models

[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Related tags

Overview

SEgmentation TRansformers -- SETR

Installation

Main results

Cityscapes

ADE20K

Pascal Context

Get Started

Train

Single-scale testing

Multi-scale testing

Reference

License

Acknowledgement

Owner

Fudan Zhang Vision Group

Semantically Contrastive Learning for Low-light Image Enhancement

VID-Fusion: Robust Visual-Inertial-Dynamics Odometry for Accurate External Force Estimation

A collection of semantic image segmentation models implemented in TensorFlow

EMNLP'2021: Simple Entity-centric Questions Challenge Dense Retrievers

Unsupervised Image to Image Translation with Generative Adversarial Networks

Homepage of paper: Paint Transformer: Feed Forward Neural Painting with Stroke Prediction, ICCV 2021.

Code of paper "Compositionally Generalizable 3D Structure Prediction"

Neighborhood Contrastive Learning for Novel Class Discovery

Keepsake is a Python library that uploads files and metadata (like hyperparameters) to Amazon S3 or Google Cloud Storage

Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters.

Official implementation of the article "Unsupervised JPEG Domain Adaptation For Practical Digital Forensics"

CS50x-AI - Artificial Intelligence with Python from Harvard University

Code for the CVPR 2021 paper "Triple-cooperative Video Shadow Detection"

[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation

Deep learning-based approach to discovering Granger causality networks in multivariate time series

Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.

Offical implementation of Shunted Self-Attention via Multi-Scale Token Aggregation

MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Code release for General Greedy De-bias Learning