[CVPR 2022] Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

Overview

Using Unreliable Pseudo Labels

Official PyTorch implementation of Semi-Supervised Semantic Segmentation Using Unreliable Pseudo Labels, CVPR 2022.

Please refer to our project page for qualitative results.

Abstract. The crux of semi-supervised semantic segmentation is to assign adequate pseudo-labels to the pixels of unlabeled images. A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability. We argue that every pixel matters to the model training, even its prediction is ambiguous. Intuitively, an unreliable prediction may get confused among the top classes (i.e., those with the highest probabilities), however, it should be confident about the pixel not belonging to the remaining classes. Hence, such a pixel can be convincingly treated as a negative sample to those most unlikely categories. Based on this insight, we develop an effective pipeline to make sufficient use of unlabeled data. Concretely, we separate reliable and unreliable pixels via the entropy of predictions, push each unreliable pixel to a category-wise queue that consists of negative samples, and manage to train the model with all candidate pixels. Considering the training evolution, where the prediction becomes more and more accurate, we adaptively adjust the threshold for the reliable-unreliable partition. Experimental results on various benchmarks and training settings demonstrate the superiority of our approach over the state-of-the-art alternatives.

Results

PASCAL VOC 2012

Labeled images are selected from the train set of original VOC, 1,464 images in total. And the remaining 9,118 images are all considered as unlabeled ones.

For instance, 1/2 (732) represents 732 labeled images and remaining 9,850 (9,118 + 732) are unlabeled.

Method 1/16 (92) 1/8 (183) 1/4 (366) 1/2 (732) Full (1464)
SupOnly 45.77 54.92 65.88 71.69 72.50
U2PL (w/ CutMix) 67.98 69.15 73.66 76.16 79.49

Labeled images are selected from the train set of augmented VOC, 10,582 images in total.

Method 1/16 (662) 1/8 (1323) 1/4 (2646) 1/2 (5291)
SupOnly 67.87 71.55 75.80 77.13
U2PL (w/ CutMix) 77.21 79.01 79.30 80.50

Cityscapes

Labeled images are selected from the train set, 2,975 images in total.

Method 1/16 (186) 1/8 (372) 1/4 (744) 1/2 (1488)
SupOnly 65.74 72.53 74.43 77.83
U2PL (w/ CutMix) 70.30 74.37 76.47 79.05
U2PL (w/ AEL) 74.90 76.48 78.51 79.12

Checkpoints

  • Models on Cityscapes with AEL (ResNet101-DeepLabv3+)
1/16 (186) 1/8 (372) 1/4 (744) 1/2 (1488)
Google Drive Google Drive Google Drive Google Drive
Baidu Drive
Fetch Code: rrpd
Baidu Drive
Fetch Code: welw
Baidu Drive
Fetch Code: qwcd
Baidu Drive
Fetch Code: 4p8r

Installation

git clone https://github.com/Haochen-Wang409/U2PL.git && cd U2PL
conda create -n u2pl python=3.6.9
conda activate u2pl
pip install -r requirements.txt
pip install pip install torch==1.8.1+cu102 torchvision==0.9.1+cu102 -f https://download.pytorch.org/whl/torch_stable.html

Usage

U2PL is evaluated on both Cityscapes and PASCAL VOC 2012 dataset.

Prepare Data

For Cityscapes

Download "leftImg8bit_trainvaltest.zip" and "gtFine_trainvaltest.zip" from: https://www.cityscapes-dataset.com/downloads/.

Next, unzip the files to folder data and make the dictionary structures as follows:

data/cityscapes
├── gtFine
│   ├── test
│   ├── train
│   └── val
└── leftImg8bit
    ├── test
    ├── train
    └── val
For PASCAL VOC 2012

Download "VOCtrainval_11-May-2012.tar" from: http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar.

And unzip the files to folder data and make the dictionary structures as follows:

data/VOC2012
├── Annotations
├── ImageSets
├── JPEGImages
├── SegmentationClass
├── SegmentationClassAug
└── SegmentationObject

Finally, the structure of dictionary data should be as follows:

data
├── cityscapes
│   ├── gtFine
│   └── leftImg8bit
├── splits
│   ├── cityscapes
│   └── pascal
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject

Prepare Pretrained Backbone

Before training, please download ResNet101 pretrained on ImageNet-1K from one of the following:

After that, modify model_urls in semseg/models/resnet.py to </path/to/resnet101.pth>

Train a Fully-Supervised Model

For instance, we can train a model on PASCAL VOC 2012 with only 1464 labeled data for supervision by:

cd experiments/pascal/1464/suponly
# use torch.distributed.launch
sh train.sh <num_gpu> <port>

# or use slurm
# sh slurm_train.sh <num_gpu> <port> <partition>

Or for Cityscapes, a model supervised by only 744 labeled data can be trained by:

cd experiments/cityscapes/744/suponly
# use torch.distributed.launch
sh train.sh <num_gpu> <port>

# or use slurm
# sh slurm_train.sh <num_gpu> <port> <partition>

After training, the model should be evaluated by

sh eval.sh

Train a Semi-Supervised Model

We can train a model on PASCAL VOC 2012 with 1464 labeled data and 9118 unlabeled data for supervision by:

cd experiments/pascal/1464/ours
# use torch.distributed.launch
sh train.sh <num_gpu> <port>

# or use slurm
# sh slurm_train.sh <num_gpu> <port> <partition>

Or for Cityscapes, a model supervised by 744 labeled data and 2231 unlabeled data can be trained by:

cd experiments/cityscapes/744/ours
# use torch.distributed.launch
sh train.sh <num_gpu> <port>

# or use slurm
# sh slurm_train.sh <num_gpu> <port> <partition>

After training, the model should be evaluated by

sh eval.sh

Train a Semi-Supervised Model on Cityscapes with AEL

First, you should switch the branch:

git checkout with_AEL

Then, we can train a model supervised by 744 labeled data and 2231 unlabeled data by:

cd experiments/city_744
# use torch.distributed.launch
sh train.sh <num_gpu> <port>

# or use slurm
# sh slurm_train.sh <num_gpu> <port> <partition>

After training, the model should be evaluated by

sh eval.sh

Note

<num_gpu> means the number of GPUs for training.

To reproduce our results, we recommend you follow the settings:

  • Cityscapes: 4 for SupOnly and 8 for Semi-Supervised
  • PASCAL VOC 2012: 2 for SupOnly and 4 for Semi-Supervised

Or, change the lr in config.yaml in a linear manner (e.g., if you want to train a SupOnly model on Cityscapes with 8 GPUs, you are recommended to change the lr to 0.02).

If you want to train a model on other split, you need to modify data_list and n_sup in config.yaml.

Due to the randomness of function torch.nn.functional.interpolate when mode="bilinear", the results of semantic segmentation will not be the same EVEN IF a fixed random seed is set.

Therefore, we recommend you run 3 times and get the average performance.

License

This project is released under the Apache 2.0 license.

Acknowledgement

The contrastive learning loss and strong data augmentation (CutMix, CutOut, and ClassMix) are borrowed from ReCo. We reproduce our U2PL based on AEL on branch with_AEL.

Thanks a lot for their great work!

Citation

@inproceedings{wang2022semi,
    title={Semi-Supervised Semantic Segmentation Using Unreliable Pseudo Labels},
    author={Wang, Yuchao and Wang, Haochen and Shen, Yujun and Fei, Jingjing and Li, Wei and Jin, Guoqiang and Wu, Liwei and Zhao, Rui and Le, Xinyi},
    booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision and Patern Recognition (CVPR)},
    year={2022}
}

Contact

Owner
Haochen Wang
Haochen Wang
Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer"

TSOD Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer" Usage For training, open train_test, run p

Jinming Su 2 Dec 23, 2021
Keras implementation of the GNM model in paper ’Graph-Based Semi-Supervised Learning with Nonignorable Nonresponses‘

Graph-based joint model with Nonignorable Missingness (GNM) This is a Keras implementation of the GNM model in paper ’Graph-Based Semi-Supervised Lear

Fan Zhou 2 Apr 17, 2022
Reproduce results and replicate training fo T0 (Multitask Prompted Training Enables Zero-Shot Task Generalization)

T-Zero This repository serves primarily as codebase and instructions for training, evaluation and inference of T0. T0 is the model developed in Multit

BigScience Workshop 253 Dec 27, 2022
TransPrompt - Towards an Automatic Transferable Prompting Framework for Few-shot Text Classification

TransPrompt This code is implement for our EMNLP 2021's paper 《TransPrompt:Towards an Automatic Transferable Prompting Framework for Few-shot Text Cla

WangJianing 23 Dec 21, 2022
Python package for multiple object tracking research with focus on laboratory animals tracking.

motutils is a Python package for multiple object tracking research with focus on laboratory animals tracking. Features loads: MOTChallenge CSV, sleap

Matěj Šmíd 2 Sep 05, 2022
A modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (prediction model)

ParallelFold Author: Bozitao Zhong This is a modified version of DeepMind's Alphafold2 to divide CPU part (MSA and template searching) and GPU part (p

Bozitao Zhong 77 Dec 22, 2022
Pytorch implementation of YOLOX、PPYOLO、PPYOLOv2、FCOS an so on.

简体中文 | English miemiedetection 概述 miemiedetection是女装大佬咩酱基于YOLOX进行二次开发的个人检测库(使用的深度学习框架为pytorch),支持Windows、Linux系统,以女装大佬咩酱的名字命名。miemiedetection是一个不需要安装的

248 Jan 02, 2023
Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners

Unofficial PyTorch implementation of Masked Autoencoders Are Scalable Vision Learners This repository is built upon BEiT, thanks very much! Now, we on

Zhiliang Peng 2.3k Jan 04, 2023
Pyramid Scene Parsing Network, CVPR2017.

Pyramid Scene Parsing Network by Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia, details are in project page. Introduction This

Hengshuang Zhao 1.5k Jan 05, 2023
PyTorch code for training MM-DistillNet for multimodal knowledge distillation

There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge MM-DistillNet is a

51 Dec 20, 2022
Dynamical Wasserstein Barycenters for Time Series Modeling

Dynamical Wasserstein Barycenters for Time Series Modeling This is the code related for the Dynamical Wasserstein Barycenter model published in Neurip

8 Sep 09, 2022
Official code for paper "ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection"

Official code for paper "ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection". LRPDenseNet.py

Pedro Ricardo Ariel Salvador Bassi 2 Sep 21, 2022
Get the partition that a file belongs and the percentage of space that consumes

tinos_eisai_sy Get the partition that a file belongs and the percentage of space that consumes (works only with OSes that use the df command) tinos_ei

Konstantinos Patronas 6 Jan 24, 2022
Official code for MPG2: Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN

This is the official code for Multi-attribute Pizza Generator (MPG2): Cross-domain Attribute Control with Conditional StyleGAN. Paper Demo Setup Envir

Fangda Han 5 Sep 01, 2022
Data pipelines for both TensorFlow and PyTorch!

rapidnlp-datasets Data pipelines for both TensorFlow and PyTorch ! If you want to load public datasets, try: tensorflow/datasets huggingface/datasets

1 Dec 08, 2021
Distributionally robust neural networks for group shifts

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization This code implements the g

151 Dec 25, 2022
Fuzzer for Linux Kernel Drivers

difuze: Fuzzer for Linux Kernel Drivers This repo contains all the sources (including setup scripts), you need to get difuze up and running. Tested on

seclab 344 Dec 27, 2022
Learning to Estimate Hidden Motions with Global Motion Aggregation

Learning to Estimate Hidden Motions with Global Motion Aggregation (GMA) This repository contains the source code for our paper: Learning to Estimate

Shihao Jiang (Zac) 221 Dec 18, 2022
PyTorch implementation of our paper: Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition

Decoupling and Recoupling Spatiotemporal Representation for RGB-D-based Motion Recognition, arxiv This is a PyTorch implementation of our paper. 1. Re

DamoCV 11 Nov 19, 2022
This is the implementation of the paper LiST: Lite Self-training Makes Efficient Few-shot Learners.

LiST (Lite Self-Training) This is the implementation of the paper LiST: Lite Self-training Makes Efficient Few-shot Learners. LiST is short for Lite S

Microsoft 28 Dec 07, 2022