Official PyTorch implementation of Segmenter: Transformer for Semantic Segmentation

Overview

Segmenter: Transformer for Semantic Segmentation

Figure 1 from paper

Segmenter: Transformer for Semantic Segmentation by Robin Strudel*, Ricardo Garcia*, Ivan Laptev and Cordelia Schmid.

*Equal Contribution

Installation

Define os environment variables pointing to your checkpoint and dataset directory, put in your .bashrc:

export DATASET=/path/to/dataset/dir

Install PyTorch 1.9 then pip install . at the root of this repository.

To download ADE20K, use the following command:

python -m segm.scripts.prepare_ade20k $DATASET

Model Zoo

We release models with a Vision Transformer backbone initialized from the improved ViT models.

ADE20K

Segmenter models with ViT backbone:

Name mIoU (SS/MS) # params Resolution FPS Download
Seg-T-Mask/16 38.1 / 38.8 7M 512x512 52.4 model config log
Seg-S-Mask/16 45.3 / 46.9 27M 512x512 34.8 model config log
Seg-B-Mask/16 48.5 / 50.0 106M 512x512 24.1 model config log
Seg-L-Mask/16 51.3 / 53.2 334M 512x512 10.6 model config log
Seg-L-Mask/16 51.8 / 53.6 334M 640x640 - model config log

Segmenter models with DeiT backbone:

Name mIoU (SS/MS) # params Resolution FPS Download
Seg-B/16 47.1 / 48.1 87M 512x512 27.3 model config log
Seg-B-Mask/16 48.7 / 50.1 106M 512x512 24.1 model config log

Pascal Context

Name mIoU (SS/MS) # params Resolution FPS Download
Seg-L-Mask/16 58.1 / 59.0 334M 480x480 - model config log

Inference

Download one checkpoint with its configuration in a common folder, for example seg_tiny_mask.

You can generate segmentation maps from your own data with:

python -m segm.inference --model-path seg_tiny_mask/checkpoint.pth -i images/ -o segmaps/ 

To evaluate on ADE20K, run the command:

# single-scale evaluation:
python -m segm.eval.miou seg_tiny_mask/checkpoint.pth ade20k --singlescale
# multi-scale evaluation:
python -m segm.eval.miou seg_tiny_mask/checkpoint.pth ade20k --multiscale

Train

Train Seg-T-Mask/16 on ADE20K on a single GPU:

python -m segm.train --log-dir seg_tiny_mask --dataset ade20k \
  --backbone vit_tiny_patch16_384 --decoder mask_transformer

To train Seg-B-Mask/16, simply set vit_base_patch16_384 as backbone and launch the above command using a minimum of 4 V100 GPUs (~12 minutes per epoch) and up to 8 V100 GPUs (~7 minutes per epoch). The code uses SLURM environment variables.

Logs

To plot the logs of your experiments, you can use

python -m segm.utils.logs logs.yml

with logs.yml located in utils/ with the path to your experiments logs:

root: /path/to/checkpoints/
logs:
  seg-t: seg_tiny_mask/log.txt
  seg-b: seg_base_mask/log.txt

Video Segmentation

Zero shot video segmentation on DAVIS video dataset with Seg-B-Mask/16 model trained on ADE20K.

BibTex

@article{strudel2021,
  title={Segmenter: Transformer for Semantic Segmentation},
  author={Strudel, Robin and Garcia, Ricardo and Laptev, Ivan and Schmid, Cordelia},
  journal={arXiv preprint arXiv:2105.05633},
  year={2021}
}

Acknowledgements

The Vision Transformer code is based on timm library and the semantic segmentation training and evaluation pipeline is using mmsegmentation.

Owner
PhD student at Ecole Normale Supérieure and INRIA Paris
Stochastic gradient descent with model building

Stochastic Model Building (SMB) This repository includes a new fast and robust stochastic optimization algorithm for training deep learning models. Th

S. Ilker Birbil 22 Jan 19, 2022
Aerial Single-View Depth Completion with Image-Guided Uncertainty Estimation (RA-L/ICRA 2020)

Aerial Depth Completion This work is described in the letter "Aerial Single-View Depth Completion with Image-Guided Uncertainty Estimation", by Lucas

ETHZ V4RL 70 Dec 22, 2022
Film review classification

Film review classification Решение задачи классификации отзывов на фильмы на положительные и отрицательные с помощью рекуррентных нейронных сетей 1. З

Nikita Dukin 3 Jan 21, 2022
This is a repository for a semantic segmentation inference API using the OpenVINO toolkit

BMW-IntelOpenVINO-Segmentation-Inference-API This is a repository for a semantic segmentation inference API using the OpenVINO toolkit. It's supported

BMW TechOffice MUNICH 34 Nov 24, 2022
Kaggle competition: Springleaf Marketing Response

PruebaEnel Prueba Kaggle-Springleaf-master Prueba Kaggle-Springleaf Kaggle competition: Springleaf Marketing Response Competencia de Kaggle: Marketing

1 Feb 09, 2022
Object Depth via Motion and Detection Dataset

ODMD Dataset ODMD is the first dataset for learning Object Depth via Motion and Detection. ODMD training data are configurable and extensible, with ea

Brent Griffin 172 Dec 21, 2022
Hands-On Machine Learning for Algorithmic Trading, published by Packt

Hands-On Machine Learning for Algorithmic Trading Hands-On Machine Learning for Algorithmic Trading, published by Packt This is the code repository fo

Packt 981 Dec 29, 2022
A tensorflow implementation of GCN-LPA

GCN-LPA This repository is the implementation of GCN-LPA (arXiv): Unifying Graph Convolutional Neural Networks and Label Propagation Hongwei Wang, Jur

Hongwei Wang 83 Nov 28, 2022
TransNet V2: Shot Boundary Detection Neural Network

TransNet V2: Shot Boundary Detection Neural Network This repository contains code for TransNet V2: An effective deep network architecture for fast sho

Tomáš Souček 212 Dec 27, 2022
Data Engineering ZoomCamp

Data Engineering ZoomCamp I'm partaking in a Data Engineering Bootcamp / Zoomcamp and will be tracking my progress here. I can't promise these notes w

Aaron 61 Jan 06, 2023
AquaTimer - Programmable Timer for Aquariums based on ATtiny414/814/1614

AquaTimer - Programmable Timer for Aquariums based on ATtiny414/814/1614 AquaTimer is a programmable timer for 12V devices such as lighting, solenoid

Stefan Wagner 4 Jun 13, 2022
Cereal box identification in store shelves using computer vision and a single train image per model.

Product Recognition on Store Shelves Description You can read the task description here. Report You can read and download our report here. Step A - Mu

Nicholas Baraghini 1 Jan 21, 2022
LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ⚖️ 🏆 🧑‍🎓 👩‍⚖️ Dataset Summary Inspired by the recent widespread use of th

95 Dec 08, 2022
통일된 DataScience 폴더 구조 제공 및 가상환경 작업의 부담감 해소

Lucas coded by linux shell 목차 Mac버전 CookieCutter (autoenv) 1.How to Install autoenv 2.폴더 진입 시, activate 구현하기 3.폴더 탈출 시, deactivate 구현하기 4.Alias 설정하기 5

ello 3 Feb 21, 2022
Geometric Sensitivity Decomposition

Geometric Sensitivity Decomposition This repo is the official implementation of A Geometric Perspective towards Neural Calibration via Sensitivity Dec

16 Dec 26, 2022
Perform zero-order Hankel Transform for an 1D array (float or real valued).

perform zero-order Hankel Transform for an 1D array (float or real valued). An discrete form of Parseval theorem is guaranteed. Suit for iterative problems.

1 Jan 17, 2022
Official code of the paper "ReDet: A Rotation-equivariant Detector for Aerial Object Detection" (CVPR 2021)

ReDet: A Rotation-equivariant Detector for Aerial Object Detection ReDet: A Rotation-equivariant Detector for Aerial Object Detection (CVPR2021), Jiam

csuhan 334 Dec 23, 2022
Real-time analysis of intracranial neurophysiology recordings.

py_neuromodulation Click this button to run the "Tutorial ML with py_neuro" notebooks: The py_neuromodulation toolbox allows for real time capable pro

Interventional Cognitive Neuromodulation - Neumann Lab Berlin 15 Nov 03, 2022
Official Python implementation of the 'Sparse deconvolution'-v0.3.0

Sparse deconvolution Python v0.3.0 Official Python implementation of the 'Sparse deconvolution', and the CPU (NumPy) and GPU (CuPy) calculation backen

Weisong Zhao 23 Dec 28, 2022
A simple editor for captions in .SRT file extension

WaySRT A simple editor for captions in .SRT file extension The program doesn't use any external dependecies, just run: python way_srt.py {file_name.sr

Gustavo Lopes 3 Nov 16, 2022