Training PSPNet in TensorFlow, reproducing the performance reported in the paper.

Overview

A training reproduction of PSPNet.

(Updated 2021/04/09. The authors of PSPNet have released a PyTorch implementation of PSPNet and their follow-up work, with sync batch norm support; see https://github.com/hszhao/semseg.)

(Updated 2019/02/26. A major restructuring of the code. For the previous version, check out v0.9: https://github.com/holyseven/PSPNet-TF-Reproduce/tree/v0.9.)

This is an implementation of PSPNet (from training to testing) in pure TensorFlow (tested with TF1.12, Python 3).

  • Supported backbones: ResNet-V1-50 and ResNet-V1-101; other ResNet-V1s can be added easily.
  • Supported databases: ADE20K, SBD (Augmented Pascal VOC) and Cityscapes.
  • Supported modes: training, validation and inference with multi-scale inputs.
  • Extras: L2-SP regularization and a sync batch normalization implementation.

L2-SP Regularization

L2-SP regularization is a variant of L2 regularization. Instead of penalizing the distance to the origin as plain L2 does, L2-SP uses the pre-trained model as the reference point, i.e. the penalty has the form (w - w0)^2, where w0 denotes the pre-trained weights. Simple but effective. More details about L2-SP can be found in the paper and the code.
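
To make this concrete, below is a minimal TF1.x sketch of such a penalty (my illustration, not the repo's exact code; the two rates presumably correspond to the --weight_decay_rate and --weight_decay_rate2 flags, and --weight_decay_mode presumably switches between plain L2 and L2-SP):

import tensorflow as tf

def l2_sp_penalty(sp_rate, l2_rate, pretrained_ckpt):
    """L2-SP: pull backbone weights towards their pre-trained values w0,
    and apply plain L2 to newly added layers (e.g. the PSP head)."""
    reader = tf.train.NewCheckpointReader(pretrained_ckpt)
    sp_terms, l2_terms = [], []
    for v in tf.trainable_variables():
        name = v.op.name
        if reader.has_tensor(name):
            # Backbone weight: 0.5 * ||w - w0||^2 towards the pre-trained value.
            w0 = tf.constant(reader.get_tensor(name), dtype=v.dtype)
            sp_terms.append(tf.nn.l2_loss(v - w0))
        else:
            # New layer without a pre-trained counterpart: standard L2.
            l2_terms.append(tf.nn.l2_loss(v))
    sp = tf.add_n(sp_terms) if sp_terms else tf.constant(0.0)
    l2 = tf.add_n(l2_terms) if l2_terms else tf.constant(0.0)
    return sp_rate * sp + l2_rate * l2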

If you find L2-SP useful for your research (not limited to image segmentation), please consider citing our work:

@inproceedings{li2018explicit,
  author    = {Li, Xuhong and Grandvalet, Yves and Davoine, Franck},
  title     = {Explicit Inductive Bias for Transfer Learning with Convolutional Networks},
  booktitle = {International Conference on Machine Learning (ICML)},
  pages     = {2830--2839},
  year      = {2018}
}

Sync Batch Norm

In image segmentation, the batch size is usually limited. A small batch size makes the gradients unstable and harms the performance, especially for batch normalization layers. A multi-GPU setting does not help by default, because the batch normalization statistics are computed independently within each GPU. More discussion can be found here and here.

This repo solves this problem in pure Python and pure TensorFlow by simply taking a list of per-GPU tensors as input. The main idea is in model/utils_mg.py.
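
A simplified sketch of that idea (my illustration, assuming equal per-GPU batch sizes; the actual code in model/utils_mg.py also handles moving averages, training/inference modes and variable creation):

import tensorflow as tf

def sync_batch_norm(tower_inputs, beta, gamma, epsilon=1e-5):
    """tower_inputs: list of NHWC tensors, one per GPU, for the same layer."""
    num_towers = len(tower_inputs)
    # Per-tower first and second moments over N, H, W ...
    means = [tf.reduce_mean(x, axis=[0, 1, 2]) for x in tower_inputs]
    mean_sqs = [tf.reduce_mean(tf.square(x), axis=[0, 1, 2]) for x in tower_inputs]
    # ... averaged across towers to obtain the "global batch" statistics
    # (exact when every tower holds the same number of samples).
    mean = tf.add_n(means) / num_towers
    variance = tf.add_n(mean_sqs) / num_towers - tf.square(mean)
    # Every tower is normalized with the same shared statistics.
    return [tf.nn.batch_normalization(x, mean, variance, beta, gamma, epsilon)
            for x in tower_inputs]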

I do not know whether this is the first implementation of sync batch norm in TensorFlow, but there is already an implementation in PyTorch, along with some applications.

Update: there is another implementation that uses NCCL to gather statistics across GPUs; see tensorpack. However, TF1.1 does not support passing gradients through nccl_all_reduce. In addition, ppc64le with tf1.10, cuda9.0 and nccl1.3.5 was not able to run that code; I have no idea why and do not want to spend much time on it. Maybe nccl2 can solve this.

Results

Numerical Results

  • Random scaling for all
  • Random rotation for SBD
  • Results reported as SS/MS mIoU (single-scale / multi-scale test) on the validation set
  • Corrections and additions to the table are welcome
Dataset                                    Backbone     L2             L2-SP
Cityscapes (train set: 3K)                 ResNet-50    76.9 / ?       77.9 / ?
Cityscapes (train set: 3K)                 ResNet-101   77.9 / ?       78.6 / ?
Cityscapes (coarse + train set: 20K + 3K)  ResNet-50    -              -
Cityscapes (coarse + train set: 20K + 3K)  ResNet-101   80.0 / 80.9    80.1 / 81.2*
SBD                                        ResNet-50    76.5 / ?       76.6 / ?
SBD                                        ResNet-101   77.5 / 79.2    78.5 / 79.9
ADE20K                                     ResNet-50    41.92 / 43.09  -
ADE20K                                     ResNet-101   42.80 / ?      -

*This model gets 80.3 on the Cityscapes test set (1,525 images) without post-processing methods.

Qualitative Results on Cityscapes

The Devil Is in the Details

Training and Evaluation

Download the databases with the links: ADE20K, SBD (Augmented Pascal VOC) and Cityscapes.

Prepare the Cityscapes database by generating the *labelTrainIds.png images with createTrainIdLabelImgs, then either change the code in database/reader.py or move the undesired images to another directory.

Download pretrained models.

cd z_pretrained_weights
sh download_resnet_v1_101.sh

A command for training ResNet-50 on ADE20K, reaching around 41.92 mIoU (single-scale test):

python ./run.py --network 'resnet_v1_50' --visible_gpus '0,1' --reader_method 'queue' --lrn_rate 0.01 --weight_decay_mode 0 --weight_decay_rate 0.0001 --weight_decay_rate2 0.001 --database 'ADE' --subsets_for_training 'train' --batch_size 8 --train_image_size 480 --snapshot 30000 --train_max_iter 90000 --test_image_size 480 --random_rotate 0 --fine_tune_filename './z_pretrained_weights/resnet_v1_50.ckpt'

Test and Infer

Test with multi-scale inputs (set batch_size as large as possible to speed things up).

python predict.py --visible_gpus '0' --network 'resnet_v1_101' --database 'ADE' --weights_ckpt './log/ADE/PSP-resnet_v1_101-gpu_num2-batch_size8-lrn_rate0.01-random_scale1-random_rotate1-480-60000-train-1-0.0001-0.001-0-0-1-1/snapshot/model.ckpt-60000' --test_subset 'val' --test_image_size 480 --batch_size 8 --ms 1 --mirror 1

Run inference on a single image (with multi-scale inputs).

python demo_infer.py --database 'Cityscapes' --network 'resnet_v1_101' --weights_ckpt './log/Cityscapes/old/model.ckpt-50000' --test_image_size 864 --batch_size 4 --ms 1

Uncertainties in Training Details:

  1. (Cityscapes only) Should the finely labeled data be included in the first training stage?
  2. (Cityscapes only) Should the (base) learning rate be reduced in the second training stage?
  3. Should the logits be resized to the original image size before computing the loss? (See the sketch after this list.)
  4. Should the newly added layers receive a larger learning rate?
  5. About the weird padding behavior of tf.image.resize_images(): should align_corners=True be set?
  6. What is the optimal decay hyperparameter for the batch normalization statistics (0.9, 0.95, 0.9997)?
  7. There may be more, but it is unclear how much these small changes affect the results ...
  8. Discussion is welcome!
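
For items 3 and 5, one possible choice (a hedged sketch of my own, not necessarily what this repo does) is to upsample the logits to the label resolution with align_corners=True before computing the cross-entropy loss:

import tensorflow as tf

def segmentation_loss(logits, labels, num_classes, ignore_label=255):
    """logits: [N, h, w, num_classes]; labels: [N, H, W] at the original size."""
    # Item 3: resize the logits up to the label resolution before the loss.
    # Item 5: align_corners=True for tf.image.resize_images.
    label_size = tf.shape(labels)[1:3]
    logits = tf.image.resize_images(logits, label_size, align_corners=True)
    labels = tf.reshape(tf.cast(labels, tf.int32), [-1])
    logits = tf.reshape(logits, [-1, num_classes])
    valid = tf.not_equal(labels, ignore_label)  # drop void / ignored pixels
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=tf.boolean_mask(labels, valid),
            logits=tf.boolean_mask(logits, valid)))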

Change Log

26 February, 2019

  • Code structure: on-the-fly evaluation during training.
  • Code structure: wrapping of the model.
  • Added tf.data support, but the queue-based reader is faster.
  • Print the results using python utils.py in the experiment_manager directory.
  • The default environment is Python 3 and TF1.12. OpenCV is needed for prediction and demo_infer.
  • The previous version becomes a branch of this repo, named v0.9.

External links

Pyramid Scene Parsing Network: paper and official GitHub repository.

Owner
Li Xuhong
Researcher at Baidu Research, focusing on interpretable deep learning and transfer learning.