Code for the ICCV 2021 Workshop paper: A Unified Efficient Pyramid Transformer for Semantic Segmentation.

Overview

Unified-EPT

Installation

  • Linux, CUDA>=10.0, GCC>=5.4
  • Python>=3.7
  • Create a conda environment:

    conda create -n unept python=3.7 pip

    Then activate the environment:

    conda activate unept
  • PyTorch>=1.5.1, torchvision>=0.6.1 (following instructions here). For example:

    conda install pytorch==1.5.1 torchvision==0.6.1 cudatoolkit=10.2 -c pytorch
  • Install the remaining dependencies:

    pip install -r requirements.txt
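
After installation, it can help to confirm that the installed packages satisfy the minimum versions listed above. The following is a minimal sketch, not part of the repository: `parse_version`, `meets_minimum`, and `report` are hypothetical helper names, and the comparison only looks at the numeric release part of a version string (a suffix such as `+cu102` is ignored).

```python
import importlib
import re

# Minimum versions taken from the requirement list above.
MINIMUMS = (("torch", "1.5.1"), ("torchvision", "0.6.1"))


def parse_version(version):
    """Turn a string such as '1.5.1+cu102' into the tuple (1, 5, 1)."""
    core = version.split("+")[0]  # drop a local build suffix such as '+cu102'
    parts = []
    for piece in core.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum):
    """Return True if `version` is at least `minimum`."""
    return parse_version(version) >= parse_version(minimum)


def report(minimums=MINIMUMS):
    """Print one status line per package."""
    for name, minimum in minimums:
        try:
            module = importlib.import_module(name)
        except ImportError:
            print(f"{name}: not installed (need >= {minimum})")
            continue
        status = "OK" if meets_minimum(module.__version__, minimum) else "too old"
        print(f"{name} {module.__version__}: {status} (need >= {minimum})")


if __name__ == "__main__":
    report()
```

Running the script inside the `unept` environment prints one line per package; a package that is missing is reported rather than raising an error.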

Data Preparation

Please follow the code from openseg to generate the ground truth for boundary refinement.

The data should be organized as follows.

ADE20K

You can download the processed dt_offset file here.

path/to/ADEChallengeData2016/
  images/
    training/
    validation/
  annotations/ 
    training/
    validation/
  dt_offset/
    training/
    validation/

PASCAL-Context

You can download the processed dataset here.

path/to/PASCAL-Context/
  train/
    image/
    label/
    dt_offset/
  val/
    image/
    label/
    dt_offset/
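
Before training, the two directory layouts above can be checked with a few lines of Python. This is a hedged sketch rather than code from the repository: `missing_dirs` is a hypothetical helper, and the expected sub-directories are copied from the trees shown above.

```python
from pathlib import Path

# Sub-directories expected under path/to/ADEChallengeData2016/
ADE20K_DIRS = [
    "images/training", "images/validation",
    "annotations/training", "annotations/validation",
    "dt_offset/training", "dt_offset/validation",
]

# Sub-directories expected under path/to/PASCAL-Context/
PASCAL_CONTEXT_DIRS = [
    "train/image", "train/label", "train/dt_offset",
    "val/image", "val/label", "val/dt_offset",
]


def missing_dirs(root, expected):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Placeholder path; replace with the actual dataset location.
    for rel in missing_dirs("path/to/ADEChallengeData2016", ADE20K_DIRS):
        print(f"missing: {rel}")
```

An empty result means every expected folder is present; it does not verify the contents of the folders.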

Usage

Training

The default setup is multi-GPU training with DistributedDataParallel.

# set --nproc_per_node to the number of GPUs
python -m torch.distributed.launch --nproc_per_node=8 \
--master_port=29500 \
train.py --launcher pytorch \
--config /path/to/config_file
  • specify the data_root in the config file;
  • log dir will be created in ./work_dirs;
  • download the DeiT pretrained model and specify the pretrained path in the config file.
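
When the number of GPUs or the port changes between machines, the launch command can be assembled programmatically. The sketch below is an illustration only: `build_train_command` is a hypothetical helper, and it uses just the flags that appear in the training command above.

```python
import shlex


def build_train_command(config, num_gpus=8, master_port=29500):
    """Assemble the distributed training command as an argument list."""
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",   # one process per GPU
        f"--master_port={master_port}",
        "train.py", "--launcher", "pytorch",
        "--config", str(config),
    ]


if __name__ == "__main__":
    command = build_train_command("/path/to/config_file", num_gpus=4)
    # Print a shell-safe version of the command for copy and paste.
    print(" ".join(shlex.quote(arg) for arg in command))
```

The returned list can also be passed directly to `subprocess.run`, which avoids shell quoting issues with paths that contain spaces.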

Evaluation

# single-gpu testing
# --aug-test enables multi-scale flip augmentation; [--out ${RESULT_FILE}] and [--show] are optional
python test.py --checkpoint /path/to/checkpoint \
--config /path/to/config_file \
--eval mIoU \
[--out ${RESULT_FILE}] [--show] \
--aug-test

# multi-gpu testing (4 gpus, 1 sample per gpu)
# --aug-test enables multi-scale flip augmentation
python -m torch.distributed.launch --nproc_per_node=4 --master_port=29500 \
test.py --launcher pytorch --eval mIoU \
--config_file /path/to/config_file \
--checkpoint /path/to/checkpoint \
--aug-test
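
For readers unfamiliar with the metric selected by `--eval mIoU`, the sketch below shows how mean intersection-over-union is commonly computed from flattened label sequences. It is a simplified pure-Python illustration, not the evaluation code used by this repository; the function name `mean_iou` and the ignore label 255 are assumptions.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU over classes that appear in the prediction or the ground truth.

    `pred` and `target` are equal-length sequences of integer class labels.
    Pixels whose ground-truth label equals `ignore_index` are skipped
    (255 is an assumed convention, not taken from the repository).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        if p == t:
            intersection[t] += 1
            union[t] += 1
        else:
            # A wrong pixel enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


if __name__ == "__main__":
    # Class 0: IoU 1/2, class 1: IoU 2/3.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

In practice the per-class intersection and union counts are accumulated over the whole validation set before the ratio is taken, rather than averaged per image.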

Results

We report results on validation sets.

Backbone  Crop Size  Batch Size  Dataset         LR schedule  Mem (GB)  mIoU (ms+flip)  Config
Res-50    480x480    16          ADE20K          160K         7.0       46.1            config
DeiT      480x480    16          ADE20K          160K         8.5       50.5            config
DeiT      480x480    16          PASCAL-Context  160K         8.5       55.2            config

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Citation

If you use this code and models for your research, please consider citing:

@article{zhu2021unified,
  title={A Unified Efficient Pyramid Transformer for Semantic Segmentation},
  author={Zhu, Fangrui and Zhu, Yi and Zhang, Li and Wu, Chongruo and Fu, Yanwei and Li, Mu},
  journal={arXiv preprint arXiv:2107.14209},
  year={2021}
}

Acknowledgment

We thank the authors and contributors of MMCV, MMSegmentation, timm and Deformable DETR.
