Train DeepLab for Semantic Image Segmentation

Martin Kersner

This repository contains scripts for training DeepLab for semantic image segmentation using strongly and weakly annotated data. The papers Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs and Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation describe the training procedures for strongly and weakly annotated data, respectively.

git clone --recursive https://github.com/martinkersner/train-DeepLab.git 

In the following tutorial we use a couple of shell variables so that the same results can be reproduced without any obstacles.

  • $DEEPLAB denotes the main directory where the repository is checked out
  • $DATASETS denotes the path to the directory where all necessary datasets are stored
  • $LOGNAME denotes the name of a log file stored in the $DEEPLAB/exper/voc12/log directory
  • $DOWNLOADS denotes the directory where downloaded files are stored
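
Because every command in this tutorial depends on these variables, it can help to check that they are set before starting. The following is a minimal sketch in Python; the helper name missing_variables is hypothetical and not part of the repository. Note that LOGNAME is normally already exported by the login shell as the user name, so this check cannot tell whether it was set for the tutorial.

```python
import os

# Shell variables used throughout this tutorial.
REQUIRED = ("DEEPLAB", "DATASETS", "LOGNAME", "DOWNLOADS")


def missing_variables(env, required=REQUIRED):
    """Return the names in `required` that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Please export: " + ", ".join(missing))
    else:
        print("All tutorial variables are set.")
```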

Prerequisites

Install DeepLab caffe

You should follow the installation instructions. However, if you have already satisfied all necessary dependencies, running the following commands from the code/ directory should do the job.

cd $DEEPLAB/code
cp Makefile.config.example Makefile.config
# Adjust Makefile.config (for example, if using Anaconda Python, or if cuDNN is desired)
make all
make pycaffe
make test # NOT mandatory
make runtest # NOT mandatory

Compile DenseCRF

Go to the $DEEPLAB/code/densecrf directory, modify the Makefile if necessary and run the make command. Alternatively, run the following commands in order.

cd $DEEPLAB/code/densecrf
# Adjust Makefile if necessary
make

Strong annotations

In this part of the tutorial we train a DCNN for semantic image segmentation using the PASCAL VOC dataset, both with all 21 classes and with a limited number of them. As training data we use only strong annotations (pixel-level labels).

Dataset

All data necessary for training are listed in $DEEPLAB/exper/voc12/list/original. The training scripts can use either the PASCAL VOC 2012 dataset or the augmented PASCAL VOC dataset, which contains more images.

# augmented PASCAL VOC
cd $DATASETS
wget http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz # 1.3 GB
tar -zxvf benchmark.tgz
mv benchmark_RELEASE VOC_aug

# original PASCAL VOC 2012
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar # 2 GB
tar -xvf VOCtrainval_11-May-2012.tar
mv VOCdevkit/VOC2012 VOC2012_orig && rm -r VOCdevkit

Data conversions

Unfortunately, the ground truth labels in the augmented PASCAL VOC dataset are distributed as Matlab data files, so we have to convert them before we can start training.

cd $DATASETS/VOC_aug/dataset
mkdir cls_png
cd $DEEPLAB
./mat2png.py $DATASETS/VOC_aug/dataset/cls $DATASETS/VOC_aug/dataset/cls_png

The Caffe softmax loss function accepts only one-channel ground truth labels. However, the labels in the original PASCAL VOC 2012 dataset are defined as RGB images, so we have to reduce their dimensionality.

cd $DATASETS/VOC2012_orig
mkdir SegmentationClass_1D

cd $DEEPLAB
./convert_labels.py $DATASETS/VOC2012_orig/SegmentationClass/ \
  $DATASETS/VOC2012_orig/ImageSets/Segmentation/trainval.txt \
  $DATASETS/VOC2012_orig/SegmentationClass_1D/
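
To illustrate what reducing an RGB label image to one channel involves, here is a minimal sketch in Python. It rebuilds the standard PASCAL VOC color palette and looks up each pixel's class index. It is not the code of convert_labels.py, which may work differently; the function names and the nested-list image representation are assumptions made for illustration.

```python
def voc_colormap(n=256):
    """Build the PASCAL VOC palette: entry i is the RGB color of label i."""
    palette = []
    for label in range(n):
        r = g = b = 0
        c = label
        for j in range(8):
            # Spread the three lowest bits of c over the r, g, b channels,
            # filling each channel from its most significant bit downwards.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        palette.append((r, g, b))
    return palette


def rgb_to_index(rgb_rows, palette=None):
    """Map rows of (r, g, b) pixels to rows of integer class labels.

    Raises KeyError for a color that is not in the palette.
    """
    palette = palette or voc_colormap()
    lookup = {color: index for index, color in enumerate(palette)}
    return [[lookup[pixel] for pixel in row] for row in rgb_rows]


if __name__ == "__main__":
    tiny_label = [[(0, 0, 0), (128, 128, 0)],       # background, bird
                  [(192, 0, 0), (224, 224, 192)]]   # chair, void border
    print(rgb_to_index(tiny_label))
```

With 256 palette entries, the cream-colored object borders of VOC map to index 255, the value commonly treated as an ignore label.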

Finally, the part of the code which computes DenseCRF works only with PPM image files, hence we have to perform another conversion. This step is necessary only if we want to use DenseCRF separately and as one of the Caffe layers.

cd $DEEPLAB

# augmented PASCAL VOC
mkdir $DATASETS/VOC_aug/dataset/img_ppm
./jpg2ppm.sh $DATASETS/VOC_aug/dataset/img $DATASETS/VOC_aug/dataset/img_ppm

# original PASCAL VOC 2012
mkdir $DATASETS/VOC2012_orig/PPMImages
./jpg2ppm.sh $DATASETS/VOC2012_orig/JPEGImages $DATASETS/VOC2012_orig/PPMImages

Connect $DATASETS into $DEEPLAB

Then we create symbolic links to training images and ground truth labels.

mkdir -p $DEEPLAB/exper/voc12/data
cd $DEEPLAB/exper/voc12/data

# augmented PASCAL VOC
ln -s $DATASETS/VOC_aug/dataset/img images_aug
ln -s $DATASETS/VOC_aug/dataset/cls_png labels_aug
ln -s $DATASETS/VOC_aug/dataset/img_ppm images_aug_ppm

# original PASCAL VOC 2012
ln -s $DATASETS/VOC2012_orig/JPEGImages images_orig
ln -s $DATASETS/VOC2012_orig/SegmentationClass_1D labels_orig
ln -s $DATASETS/VOC2012_orig/PPMImages images_orig_ppm

Download necessary files for training

Before the first training run we have to download several files. The command below downloads the initialization model, its network definition and the solver. It also sets up symbolic links in the directories where those files are later expected during training.

./get_DeepLab_LargeFOV_voc12_data.sh

In order to easily switch between datasets we will modify image lists appropriately.

./prepare_voc12_data_lists.sh

Training with all classes

run_pascal_strong.sh can go through 4 different phases (two training, two testing), but I would not recommend running the testing phases with this script; they are currently disabled. At lines 27 through 30, any of the phases can be enabled (value 1) or disabled (value 0).

Finally, we can start training.

./run_pascal_strong.sh

Plotting training information

The training script prints information to the terminal and also stores it in the $DEEPLAB/exper/voc12/log/DeepLab-LargeFOV/ directory. For every printed iteration, the loss and three model evaluation metrics for the current batch are displayed: pixel accuracy, average recall and average Jaccard index, in that order. Even though these values are computed on training data, they carry important information about how training is progressing, and the script below plots them. The script generates two graphs, evaluation.png and loss.png.

cd $DEEPLAB
./loss_from_log.py exper/voc12/log/DeepLab-LargeFOV/`ls -t exper/voc12/log/DeepLab-LargeFOV/ | head -n 1` # for the newest log
#./loss_from_log.py exper/voc12/log/DeepLab-LargeFOV/$LOGNAME # specified log 
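
The backtick expression above picks the most recently modified file in the log directory. For readers who prefer to do this from Python, a minimal sketch follows; the helper name newest_file is hypothetical and not part of the repository.

```python
import os


def newest_file(directory):
    """Return the path of the most recently modified regular file, or None."""
    paths = [os.path.join(directory, name) for name in os.listdir(directory)]
    files = [path for path in paths if os.path.isfile(path)]
    return max(files, key=os.path.getmtime) if files else None


if __name__ == "__main__":
    # Assumes $DEEPLAB is set as described at the top of this tutorial.
    log_dir = os.path.join(os.environ.get("DEEPLAB", "."),
                           "exper/voc12/log/DeepLab-LargeFOV")
    if os.path.isdir(log_dir):
        print(newest_file(log_dir))
```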

Training with only 3 classes

If we want to train with a limited number of classes, we have to modify the ground truth labels and also the list of images that can be used for training. The classes we are interested in are specified at line 17 of filter_images.py (by default bird, bottle and chair).

# augmented PASCAL VOC
mkdir -p $DATASETS/VOC_aug/dataset/cls_sub_png
cd $DEEPLAB/exper/voc12/data/
ln -s $DATASETS/VOC_aug/dataset/cls_sub_png labels_sub_aug
cd $DEEPLAB # the relative paths below start from $DEEPLAB
find exper/voc12/data/labels_aug/ -printf '%f\n' | sed 's/\.png//'  | tail -n +2 > all_aug_data.txt
python filter_images.py $DATASETS/VOC_aug/dataset/cls_png/ $DATASETS/VOC_aug/dataset/cls_sub_png/ all_aug_data.txt sub_aug_data.txt

# original PASCAL VOC 2012
mkdir -p $DATASETS/VOC2012_orig/SegmentationClass_sub_1D
cd $DEEPLAB/exper/voc12/data/
ln -s $DATASETS/VOC2012_orig/SegmentationClass_sub_1D labels_sub_orig
cd $DEEPLAB # the relative paths below start from $DEEPLAB
find exper/voc12/data/labels_orig/ -printf '%f\n' | sed 's/\.png//'  | tail -n +2 > all_orig_data.txt
python filter_images.py $DATASETS/VOC2012_orig/SegmentationClass_1D/ $DATASETS/VOC2012_orig/SegmentationClass_sub_1D/ all_orig_data.txt sub_orig_data.txt

./filter_lists.sh

The number of classes that we plan to use is set at lines 13 and 14 of run_pascal_strong.sh. This number should always be one higher than the number of classes specified in filter_images.py, because the background also counts as a class.
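
The relation between the selected classes and the class count can be sketched as follows. This is a hypothetical illustration in Python, not the code of filter_images.py: it assumes that the kept classes are renumbered to 1..k, that every other class becomes background, and that the VOC ignore value 255 is passed through unchanged. The constants are the PASCAL VOC label indices of the default classes.

```python
# PASCAL VOC label indices of the default classes.
BIRD, BOTTLE, CHAIR = 3, 5, 9
VOID = 255  # VOC "ignore" label; kept as-is in this sketch (an assumption)


def remap_labels(label_rows, classes=(BIRD, BOTTLE, CHAIR)):
    """Renumber the kept classes to 1..k; all other classes become 0."""
    new_index = {old: new for new, old in enumerate(classes, start=1)}
    new_index[VOID] = VOID
    return [[new_index.get(value, 0) for value in row] for row in label_rows]


def contains_any(label_rows, classes=(BIRD, BOTTLE, CHAIR)):
    """True if the image has at least one pixel of a kept class."""
    return any(value in classes for row in label_rows for value in row)


def num_outputs(classes=(BIRD, BOTTLE, CHAIR)):
    """Number of network outputs: the kept classes plus background."""
    return len(classes) + 1


if __name__ == "__main__":
    labels = [[0, 3, 5], [9, 15, 255]]  # 15 is "person", which is dropped
    print(remap_labels(labels), contains_any(labels), num_outputs())
```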

Afterwards, we can proceed to training.

./run_pascal_strong.sh

We can also use the same script for plotting training information.

Evaluation

metric                  phase 1 (24,000 iter., no CRF)   phase 2 (12,000 iter., no CRF)
pixel accuracy          0.8315                           0.8523
mean accuracy           0.6807                           0.6987
mean IU                 0.6725                           0.6937
frequency weighted IU   0.8182                           0.8439
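
For reference, the four metrics can be computed from a confusion matrix as in the sketch below. It uses the definitions commonly applied in semantic segmentation evaluation; that the numbers above were produced with exactly these definitions is an assumption, and the function name is hypothetical.

```python
def segmentation_metrics(confusion):
    """Compute segmentation metrics from a square confusion matrix.

    confusion[i][j] is the number of pixels of true class i predicted as j.
    Classes that never occur in the ground truth are left out of the means.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    true_count = [sum(confusion[i]) for i in range(n)]
    pred_count = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    correct = [confusion[i][i] for i in range(n)]
    present = [i for i in range(n) if true_count[i] > 0]

    recall = {i: correct[i] / true_count[i] for i in present}
    # Intersection over union: correct / (true + predicted - correct).
    iu = {i: correct[i] / (true_count[i] + pred_count[i] - correct[i])
          for i in present}
    return {
        "pixel accuracy": sum(correct) / total,
        "mean accuracy": sum(recall.values()) / len(present),
        "mean IU": sum(iu.values()) / len(present),
        "frequency weighted IU":
            sum(true_count[i] * iu[i] for i in present) / total,
    }


if __name__ == "__main__":
    # Two classes: 4 background pixels (3 correct), 6 object pixels (5 correct).
    for name, value in segmentation_metrics([[3, 1], [1, 5]]).items():
        print(name, round(value, 4))
```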

Visual results

The model used here was trained without CRF in phase 1 (24,000 iterations) and then in phase 2 (12,000 iterations), but the results shown here were produced with the DENSE_CRF layer. The displayed images (bird: 2010_004994, bottle: 2007_000346, chair: 2008_000673) are part of the validation dataset listed in $DEEPLAB/exper/voc12/list_subset/val.txt. The colors of the segments differ from the original ground truth labels because the model was trained for only 3 classes + background.

Weak annotations

In case we don't possess enough training data, weakly annotated ground truth labels can be exploited with DeepLab.

Dataset

First, download SegmentationClassBboxRect_Visualization.zip and SegmentationClassBboxSeg_Visualization.zip from https://ucla.app.box.com/s/laif889j7pk6dj04b0ou1apm2sgub9ga and run the commands below to prepare the data for use.

cd $DOWNLOADS
mv SegmentationClassBboxRect_Visualization.zip $DATASETS/VOC_aug/dataset/
mv SegmentationClassBboxSeg_Visualization.zip $DATASETS/VOC_aug/dataset/

cd $DATASETS/VOC_aug/dataset
unzip SegmentationClassBboxRect_Visualization.zip
unzip SegmentationClassBboxSeg_Visualization.zip

mv SegmentationClassBboxAug_Visualization/ SegClassBboxAug_RGB
mv SegmentationClassBboxErode20CRFAug_Visualization/ SegClassBboxErode20CRFAug_RGB

The downloaded weak annotations were created using Matlab, and because of that the labels are sometimes stored with one channel and other times with three channels. As with the strong annotations, we have to convert all labels to the same one-channel format. To do so, I recommend using the Matlab script convert_weak_labels.m, which is stored in the $DEEPLAB directory (if anybody knows how to perform the same conversion using Python, I would be really interested). Before running the script you have to specify the path to the datasets on line 3.

After the script has finished successfully, we have to create symbolic links so that the data can be reached during training.

cd $DEEPLAB/exper/voc12/data
ln -s $DATASETS/VOC_aug/dataset/SegClassBboxAug_1D/ labels_bbox
ln -s $DATASETS/VOC_aug/dataset/SegClassBboxErode20CRFAug_1D/ labels_bboxcrf

Create subsets

Training DeepLab with weak labels makes it possible to use strong datasets of different sizes. The following snippet creates those subsets of the strong dataset and also the necessary training lists with weak labels.

cd $DEEPLAB
./create_weak_lists.sh

cd $DEEPLAB/exper/voc12/list
head -n 200  train.txt > train200.txt
head -n 500  train.txt > train500.txt
head -n 750  train.txt > train750.txt
head -n 1000 train.txt > train1000.txt

cp train_bboxcrf.txt trainval_bboxcrf.txt
cp train_bbox.txt trainval_bbox.txt
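
The head commands above keep the first n entries of train.txt, so the subsets are nested prefixes of the list rather than random samples. The same step can be sketched in Python as follows; the helper name write_subset is hypothetical and not part of the repository.

```python
from itertools import islice


def write_subset(src_path, dst_path, n):
    """Copy the first n lines of src_path to dst_path, like `head -n`."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(islice(src, n))
```

For example, write_subset("train.txt", "train200.txt", 200) corresponds to the first head command.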

Training

Training with weak annotations is similar to training with strong annotations. The only difference is the name of the script that should be run.

./run_pascal_weak.sh

Plotting is also the same as for strong annotations.

Evaluation

5000 weak annotations and 200 strong annotations

metric                  phase 1 (6,000 iter., no CRF)    phase 2 (8,000 iter., no CRF)
pixel accuracy          0.8688                           0.8671
mean accuracy           0.7415                           0.750
mean IU                 0.6324                           0.6343
frequency weighted IU   0.7962                           0.7951

Note

The init models are modified VGG-16 networks in which the kernel size of the first fully connected layer is changed from 7x7 to 4x4 or 3x3. Two models can be used for initialization: vgg16_128 and vgg16_20M.

In vgg16_128, the first fully connected layer has kernel size 4x4 and 4096 filters. It can be used for the basic DeepLab model. In vgg16_20M, the first fully connected layer has kernel size 3x3 and 1024 filters. It can be used for DeepLab-LargeFOV.
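
To give a feeling for how much these changes shrink the first fully connected layer, the sketch below counts its weights when it is viewed as a convolution. It assumes the layer reads the 512 feature channels that VGG-16 produces after its last pooling stage, and it ignores biases; the function name is hypothetical.

```python
POOL5_CHANNELS = 512  # feature channels entering the first FC layer of VGG-16


def fc_as_conv_weights(kernel, filters, in_channels=POOL5_CHANNELS):
    """Weight count of a kernel x kernel convolution, biases excluded."""
    return kernel * kernel * in_channels * filters


if __name__ == "__main__":
    print("original VGG-16 (7x7, 4096):", fc_as_conv_weights(7, 4096))
    print("vgg16_128       (4x4, 4096):", fc_as_conv_weights(4, 4096))
    print("vgg16_20M       (3x3, 1024):", fc_as_conv_weights(3, 1024))
```

Under these assumptions the layer goes from about 102.8 million weights to about 33.6 million (4x4, 4096 filters) or about 4.7 million (3x3, 1024 filters).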

Currently training is focused on DeepLab-LargeFOV.

FAQ

At http://ccvl.stat.ucla.edu/deeplab_faq/ you can find frequently asked questions about DeepLab for semantic image segmentation.
