Official Implementation of DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

Related tags

Deep LearningDAFormer
Overview

DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation

[Arxiv] [Paper]

As acquiring pixel-wise annotations of real-world images for semantic segmentation is a costly process, a model can instead be trained with more accessible synthetic data and adapted to real images without requiring their annotations. This process is studied in Unsupervised Domain Adaptation (UDA).

Even though a large number of methods propose new UDA strategies, they are mostly based on outdated network architectures. In this work, we particularly study the influence of the network architecture on UDA performance and propose DAFormer, a network architecture tailored for UDA. It consists of a Transformer encoder and a multi-level context-aware feature fusion decoder.

DAFormer is enabled by three simple but crucial training strategies to stabilize the training and to avoid overfitting the source domain: While the Rare Class Sampling on the source domain improves the quality of pseudo-labels by mitigating the confirmation bias of self-training towards common classes, the Thing-Class ImageNet Feature Distance and a Learning Rate Warmup promote feature transfer from ImageNet pretraining.

DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA→Cityscapes and by 5.4 mIoU for Synthia→Cityscapes and enables learning even difficult classes such as train, bus, and truck well.

UDA over time

The strengths of DAFormer, compared to the previous state-of-the-art UDA method ProDA, can also be observed in qualitative examples from the Cityscapes validation set.

Demo Color Palette

For more information on DAFormer, please check our [Paper].

If you find this project useful in your research, please consider citing:

@article{hoyer2021daformer,
  title={DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation},
  author={Hoyer, Lukas and Dai, Dengxin and Van Gool, Luc},
  journal={arXiv preprint arXiv:2111.14887},
  year={2021}
}

Setup Environment

For this project, we used python 3.8.5. We recommend setting up a new virtual environment:

python -m venv ~/venv/daformer
source ~/venv/daformer/bin/activate

In that environment, the requirements can be installed with:

pip install -r requirements.txt -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.3.7  # requires the other packages to be installed first

Further, please download the MiT weights and a pretrained DAFormer using the following script. If problems occur with the automatic download, please follow the instructions for a manual download within the script.

sh tools/download_checkpoints.sh

All experiments were executed on a NVIDIA RTX 2080 Ti.

Inference Demo

Already as this point, the provided DAFormer model (downloaded by tools/download_checkpoints.sh) can be applied to a demo image:

python -m demo.image_demo demo/demo.png work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/211108_1622_gta2cs_daformer_s0_7f24c.json work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/latest.pth

When judging the predictions, please keep in mind that DAFormer had no access to real-world labels during the training.

Setup Datasets

Cityscapes: Please, download leftImg8bit_trainvaltest.zip and gt_trainvaltest.zip from here and extract them to data/cityscapes.

GTA: Please, download all image and label packages from here and extract them to data/gta.

Synthia: Please, download SYNTHIA-RAND-CITYSCAPES from here and extract it to data/synthia.

The final folder structure should look like this:

DAFormer
├── ...
├── data
│   ├── cityscapes
│   │   ├── leftImg8bit
│   │   │   ├── train
│   │   │   ├── val
│   │   ├── gtFine
│   │   │   ├── train
│   │   │   ├── val
│   ├── gta
│   │   ├── images
│   │   ├── labels
│   ├── synthia
│   │   ├── RGB
│   │   ├── GT
│   │   │   ├── LABELS
├── ...

Data Preprocessing: Finally, please run the following scripts to convert the label IDs to the train IDs and to generate the class index for RCS:

python tools/convert_datasets/gta.py data/gta --nproc 8
python tools/convert_datasets/cityscapes.py data/cityscapes --nproc 8
python tools/convert_datasets/synthia.py data/synthia/ --nproc 8

Training

For convenience, we provide an annotated config file of the final DAFormer. A training job can be launched using:

python run_experiments.py --config configs/daformer/gta2cs_uda_warm_fdthings_rcs_croppl_a999_daformer_mitb5_s0.py

For the experiments in our paper (e.g. network architecture comparison, component ablations, ...), we use a system to automatically generate and train the configs:

python run_experimenty.py --exp <ID>

More information about the available experiments and their assigned IDs, can be found in experiments.py. The generated configs will be stored in configs/generated/.

Testing & Predictions

The provided DAFormer checkpoint trained on GTA->Cityscapes (already downloaded by tools/download_checkpoints.sh) can be tested on the Cityscapes validation set using:

sh test.sh work_dirs/211108_1622_gta2cs_daformer_s0_7f24c

The predictions are saved for inspection to work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/preds and the mIoU of the model is printed to the console. The provided checkpoint should achieve 68.85 mIoU. Refer to the end of work_dirs/211108_1622_gta2cs_daformer_s0_7f24c/20211108_164105.log for more information such as the class-wise IoU.

Similarly, also other models can be tested after the training has finished:

sh test.sh path/to/checkpoint_directory

Framework Structure

This project is based on mmsegmentation version 0.16.0. For more information about the framework structure and the config system, please refer to the mmsegmentation documentation and the mmcv documentation.

The most relevant files for DAFormer are:

Acknowledgements

This project is based on the following open-source projects. We thank their authors for making the source code publically available.

Owner
Lukas Hoyer
Doctoral student at ETH Zurich
Lukas Hoyer
Learning Spatio-Temporal Transformer for Visual Tracking

STARK The official implementation of the paper Learning Spatio-Temporal Transformer for Visual Tracking Hiring research interns for visual transformer

Multimedia Research 484 Dec 29, 2022
Source code for the plant extraction workflow introduced in the paper “Agricultural Plant Cataloging and Establishment of a Data Framework from UAV-based Crop Images by Computer Vision”

Plant extraction workflow Source code for the plant extraction workflow introduced in the paper "Agricultural Plant Cataloging and Establishment of a

Maurice Günder 0 Apr 22, 2022
Codes for NAACL 2021 Paper "Unsupervised Multi-hop Question Answering by Question Generation"

Unsupervised-Multi-hop-QA This repository contains code and models for the paper: Unsupervised Multi-hop Question Answering by Question Generation (NA

Liangming Pan 70 Nov 27, 2022
Multivariate Boosted TRee

Multivariate Boosted TRee What is MBTR MBTR is a python package for multivariate boosted tree regressors trained in parameter space. The package can h

SUPSI-DACD-ISAAC 61 Dec 19, 2022
PyQt6 configuration in yaml format providing the most simple script.

PyamlQt(ぴゃむるきゅーと) PyQt6 configuration in yaml format providing the most simple script. Requirements yaml PyQt6, ( PyQt5 ) Installation pip install Pya

Ar-Ray 7 Aug 15, 2022
This is the implementation of GGHL (A General Gaussian Heatmap Labeling for Arbitrary-Oriented Object Detection)

GGHL: A General Gaussian Heatmap Labeling for Arbitrary-Oriented Object Detection This is the implementation of GGHL 👋 👋 👋 [Arxiv] [Google Drive][B

551 Dec 31, 2022
Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal Action Localization' (ICCV-21 Oral)

Learning-Action-Completeness-from-Points Official Pytorch Implementation of 'Learning Action Completeness from Points for Weakly-supervised Temporal A

Pilhyeon Lee 67 Jan 03, 2023
Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis

Modeling Temporal Concept Receptive Field Dynamically for Untrimmed Video Analysis This is a PyTorch implementation of the model described in our pape

qzhb 6 Jul 08, 2021
People movement type classifier with YOLOv4 detection and SORT tracking.

Movement classification The goal of this project would be movement classification of people, in other words, walking (normal and fast) and running. Yo

4 Sep 21, 2021
Misc YOLOL scripts for use in the Starbase space sandbox videogame

starbase-misc Misc YOLOL scripts for use in the Starbase space sandbox videogame. Each directory contains standalone YOLOL scripts. They don't really

4 Oct 17, 2021
Hierarchical User Intent Graph Network for Multimedia Recommendation

Hierarchical User Intent Graph Network for Multimedia Recommendation This is our Pytorch implementation for the paper: Hierarchical User Intent Graph

6 Jan 05, 2023
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Segmentation Transformer Implementation of Segmentation Transformer in PyTorch, a new model to achieve SOTA in semantic segmentation while using trans

Abhay Gupta 161 Dec 08, 2022
Code To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment.

COLIEE 2021 - task 2: Legal Case Entailment This repository contains the code to reproduce NeuralMind's submissions to COLIEE 2021 presented in the pa

NeuralMind 13 Dec 16, 2022
Camera-caps - Examine the camera capabilities for V4l2 cameras

camera-caps This is a graphical user interface over the v4l2-ctl command line to

Jetsonhacks 25 Dec 26, 2022
git《Tangent Space Backpropogation for 3D Transformation Groups》(CVPR 2021) GitHub:1]

LieTorch: Tangent Space Backpropagation Introduction The LieTorch library generalizes PyTorch to 3D transformation groups. Just as torch.Tensor is a m

Princeton Vision & Learning Lab 482 Jan 06, 2023
Nsdf: A mesh SDF with just some code we can directly paste into our raymarcher

nsdf Representing SDFs of arbitrary meshes has been a bit tricky so far. Express

Jan Ivanecky 5 Feb 18, 2022
Session-based Recommendation, CoHHN, price preferences, interest preferences, Heterogeneous Hypergraph, Co-guided Learning, SIGIR2022

This is our implementation for the paper: Price DOES Matter! Modeling Price and Interest Preferences in Session-based Recommendation Xiaokun Zhang, Bo

Xiaokun Zhang 27 Dec 02, 2022
System Combination for Grammatical Error Correction Based on Integer Programming

System Combination for Grammatical Error Correction Based on Integer Programming This repository contains the code and scripts that implement the syst

NUS NLP Group 0 Mar 29, 2022
Deep Learning for Time Series Classification

Deep Learning for Time Series Classification This is the companion repository for our paper titled "Deep learning for time series classification: a re

Hassan ISMAIL FAWAZ 1.2k Jan 02, 2023
Workshop Materials Delivered on 28/02/2022

intro-to-cnn-p1 Repo for hosting workshop materials delivered on 28/02/2022 Questions you will answer in this workshop Learning Objectives What are co

Beginners Machine Learning 5 Feb 28, 2022