Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation.

Last update: Aug 23, 2022

Overview

Unified-EPT

Installation

Linux, CUDA>=10.0, GCC>=5.4
Python>=3.7
Create a conda environment:

    conda create -n unept python=3.7 pip

Then, activate the environment:

    conda activate unept

PyTorch>=1.5.1, torchvision>=0.6.1 (follow the official PyTorch installation instructions)

For example:

    conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=10.2 -c pytorch

Install MMCV, MMSegmentation, and timm:

    pip install -r requirements.txt

Install Deformable DETR and compile the CUDA operators (the instructions can be found here).
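
As a quick sanity check after installation, the version requirements listed above can be verified from Python. This is a minimal sketch, not part of the repository: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the minimums are simply copied from the list above.

```python
import re
import sys

# Minimum versions copied from the requirements listed above.
MINIMUMS = {"python": "3.7", "torch": "1.5.1", "torchvision": "0.6.1"}


def parse_version(text):
    """Turn a string such as '1.5.1+cu102' into (1, 5, 1).

    Local build tags after '+' and non-numeric suffixes are ignored.
    """
    numbers = []
    for part in text.split("+")[0].split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '3.7' and '3.7.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment():
    """Return {name: (installed_version_or_None, ok)} for each requirement."""
    py = ".".join(str(n) for n in sys.version_info[:3])
    report = {"python": (py, meets_minimum(py, MINIMUMS["python"]))}
    for name in ("torch", "torchvision"):
        try:
            version = __import__(name).__version__
            report[name] = (version, meets_minimum(version, MINIMUMS[name]))
        except ImportError:
            report[name] = (None, False)
    return report
```

Calling `check_environment()` inside the activated `unept` environment returns one entry per requirement; an entry with `ok` set to `False` means the package is missing or older than the minimum. The check does not cover CUDA, GCC, or the compiled Deformable DETR operators.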

Data Preparation

Please follow the code from openseg to generate the ground truth for boundary refinement.

The data should be organized as follows.

ADE20K

You can download the processed dt_offset file here.

path/to/ADEChallengeData2016/
  images/
    training/
    validation/
  annotations/ 
    training/
    validation/
  dt_offset/
    training/
    validation/

PASCAL-Context

You can download the processed dataset here.

path/to/PASCAL-Context/
  train/
    image/
    label/
    dt_offset/
  val/
    image/
    label/
    dt_offset/
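
Before training, it can help to confirm that a dataset root matches the layouts shown above. The following is a minimal sketch under the assumption that only the directory names matter; the `LAYOUTS` table and the `missing_dirs` helper are hypothetical and not part of the repository.

```python
from pathlib import Path

# Expected sub-directories relative to each dataset root, as listed above.
LAYOUTS = {
    "ade20k": [
        f"{kind}/{split}"
        for kind in ("images", "annotations", "dt_offset")
        for split in ("training", "validation")
    ],
    "pascal_context": [
        f"{split}/{kind}"
        for split in ("train", "val")
        for kind in ("image", "label", "dt_offset")
    ],
}


def missing_dirs(data_root, dataset):
    """Return the expected sub-directories that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in LAYOUTS[dataset] if not (root / rel).is_dir()]
```

For example, `missing_dirs("path/to/ADEChallengeData2016", "ade20k")` returns an empty list when all six directories are present. The check does not inspect file names or file counts inside the directories.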

Usage

Training

The default setup is multi-GPU training with DistributedDataParallel.

    # set --nproc_per_node to the number of GPUs
    python -m torch.distributed.launch --nproc_per_node=8 \
    --master_port=29500 \
    train.py --launcher pytorch \
    --config /path/to/config_file

Specify the data_root in the config file.
The log directory will be created in ./work_dirs.
Download the DeiT pretrained model and specify the pretrained path in the config file.
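
When launching several runs with different GPU counts or ports, the training command above can be assembled programmatically. This is a minimal sketch that only reuses the flags shown in the command above; `build_train_command` and `as_shell` are hypothetical helper names, not part of the repository.

```python
import shlex


def build_train_command(config, gpus=8, master_port=29500):
    """Assemble the distributed training command as an argument list."""
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={gpus}",      # one process per GPU
        f"--master_port={master_port}",  # change if the port is already in use
        "train.py", "--launcher", "pytorch",
        "--config", str(config),
    ]


def as_shell(command):
    """Render the argument list as a single shell-safe string for logging."""
    return " ".join(shlex.quote(arg) for arg in command)
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root. Using a distinct `master_port` per run avoids port clashes when two jobs share a machine.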

Evaluation

    # single-GPU testing; --aug-test enables multi-scale flip augmentation
    python test.py --checkpoint /path/to/checkpoint \
    --config /path/to/config_file \
    --eval mIoU \
    [--out ${RESULT_FILE}] [--show] \
    --aug-test

    # multi-GPU testing (4 GPUs, 1 sample per GPU);
    # --aug-test enables multi-scale flip augmentation
    python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 \
    test.py --launcher pytorch --eval mIoU \
    --config_file /path/to/config_file \
    --checkpoint /path/to/checkpoint \
    --aug-test
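
For readers unfamiliar with the mIoU metric requested by `--eval mIoU`, the sketch below shows how mean intersection-over-union is commonly computed from a confusion matrix. It is an illustration in plain Python, not the evaluation code used by this repository; the function names are hypothetical, and the `ignore_index=255` default and the choice to skip classes that never occur are assumptions.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (target, prediction) pairs over flat sequences of class labels."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:  # assumed convention: skip unlabeled pixels
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU over classes present in prediction or target."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                          # target c, predicted other
        fp = sum(matrix[r][c] for r in range(n)) - tp     # predicted c, target other
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else float("nan")
```

For a whole validation set, the confusion matrices of all images are summed before calling `mean_iou`, so that every pixel contributes equally regardless of which image it comes from.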

Results

We report results on validation sets.

Backbone  Crop Size  Batch Size  Dataset         LR schedule  Mem (GB)  mIoU (ms+flip)  Config
Res-50    480x480    16          ADE20K          160K         7.0       46.1            config
DeiT      480x480    16          ADE20K          160K         8.5       50.5            config
DeiT      480x480    16          PASCAL-Context  160K         8.5       55.2            config

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Citation

If you use this code and models for your research, please consider citing:

@article{zhu2021unified,
  title={A Unified Efficient Pyramid Transformer for Semantic Segmentation},
  author={Zhu, Fangrui and Zhu, Yi and Zhang, Li and Wu, Chongruo and Fu, Yanwei and Li, Mu},
  journal={arXiv preprint arXiv:2107.14209},
  year={2021}
}

Acknowledgment

We thank the authors and contributors of MMCV, MMSegmentation, timm and Deformable DETR.
