DeepLabv3+：Encoder-Decoder with Atrous Separable Convolution语义分割模型在tensorflow2当中的实现

Last update: Nov 25, 2022

Related tags

Overview

DeepLabv3+：Encoder-Decoder with Atrous Separable Convolution语义分割模型在tensorflow2当中的实现

性能情况

训练数据集	权值文件名称	测试数据集	输入图片大小	mIOU
VOC12+SBD	deeplabv3_mobilenetv2.h5	VOC-Val12	512x512	72.50
VOC12+SBD	deeplabv3_xception.h5	VOC-Val12	512x512	87.10

所需环境

tensorflow==2.2.0

注意事项

代码中的deeplabv3_mobilenetv2.h5和deeplabv3_xception.h5是基于VOC拓展数据集训练的。训练和预测时注意修改backbone。

文件下载

训练所需的deeplabv3_mobilenetv2.h5和deeplabv3_xception.h5可在百度网盘中下载。
链接: https://pan.baidu.com/s/1zVRshWRkb5C3kmDMwEf89A 提取码: ccq5

VOC拓展数据集的百度网盘如下：
链接: https://pan.baidu.com/s/1BrR7AUM1XJvPWjKMIy2uEw 提取码: vszf

训练步骤

a、训练voc数据集

1、将我提供的voc数据集放入VOCdevkit中（无需运行voc_annotation.py）。
2、在train.py中设置对应参数，默认参数已经对应voc数据集所需要的参数了，所以只要修改backbone和model_path即可。
3、运行train.py进行训练。

b、训练自己的数据集

1、本文使用VOC格式进行训练。
2、训练前将标签文件放在VOCdevkit文件夹下的VOC2007文件夹下的SegmentationClass中。
3、训练前将图片文件放在VOCdevkit文件夹下的VOC2007文件夹下的JPEGImages中。
4、在训练前利用voc_annotation.py文件生成对应的txt。
5、在train.py文件夹下面，选择自己要使用的主干模型和下采样因子。本文提供的主干模型有mobilenet和xception。下采样因子可以在8和16中选择。需要注意的是，预训练模型需要和主干模型相对应。
6、注意修改train.py的num_classes为分类个数+1。
7、运行train.py即可开始训练。

预测步骤

a、使用预训练权重

1、下载完库后解压，如果想用backbone为mobilenet的进行预测，直接运行predict.py就可以了；如果想要利用backbone为xception的进行预测，在百度网盘下载deeplab_xception.h5，放入model_data，修改deeplab.py的backbone和model_path之后再运行predict.py，输入。

img/street.jpg

可完成预测。
2、在predict.py里面进行设置可以进行fps测试、整个文件夹的测试和video视频检测。

b、使用自己训练的权重

1、按照训练步骤训练。
2、在deeplab.py文件里面，在如下部分修改model_path、num_classes、backbone使其对应训练好的文件；model_path对应logs文件夹下面的权值文件，num_classes代表要预测的类的数量加1，backbone是所使用的主干特征提取网络。

_defaults = {
    #----------------------------------------#
    #   model_path指向logs文件夹下的权值文件
    #----------------------------------------#
    "model_path"        : 'model_data/deeplabv3_mobilenetv2.h5',
    #----------------------------------------#
    #   所需要区分的类的个数+1
    #----------------------------------------#
    "num_classes"       : 21,
    #----------------------------------------#
    #   所使用的的主干网络：mobilenet、xception    
    #----------------------------------------#
    "backbone"          : "mobilenet",
    #----------------------------------------#
    #   输入图片的大小
    #----------------------------------------#
    "input_shape"       : [512, 512],
    #----------------------------------------#
    #   下采样的倍数，一般可选的为8和16
    #   与训练时设置的一样即可
    #----------------------------------------#
    "downsample_factor" : 16,
    #--------------------------------#
    #   blend参数用于控制是否
    #   让识别结果和原图混合
    #--------------------------------#
    "blend"             : True,
}

3、运行predict.py，输入

img/street.jpg

可完成预测。
4、在predict.py里面进行设置可以进行fps测试、整个文件夹的测试和video视频检测。

评估步骤

1、设置get_miou.py里面的num_classes为预测的类的数量加1。
2、设置get_miou.py里面的name_classes为需要去区分的类别。
3、运行get_miou.py即可获得miou大小。

Reference

https://github.com/ggyyzm/pytorch_segmentation
https://github.com/bonlime/keras-deeplab-v3-plus

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021) Citation Please cite as: @inproceedings{liu2020understan

22 Nov 25, 2022

《LXMERT: Learning Cross-Modality Encoder Representations from Transformers》(EMNLP 2020)

The Most Important Thing. Our code is developed based on: LXMERT: Learning Cross-Modality Encoder Representations from Transformers

53 Dec 16, 2022

Official implementation for Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder at NeurIPS 2020

Likelihood-Regret Official implementation of Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder at NeurIPS 2020. T

33 Oct 12, 2022

Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699

ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement Recently, the power of unconditional image synthesis has significantly advanced th

967 Jan 4, 2023

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

12.6k Jan 9, 2023

A Joint Video and Image Encoder for End-to-End Retrieval

Comments

How to reproduce the model deeplabv3_xception.h5?

Hi, I'm trying to train deeplabv3 with xception backbone on voc + SBD dataset. You provided the voc pretrained model deeplabv3_xception.h5. But If I want to reproduce your training result, I should not use it as pretrained model, right? So I comment out the line in train.py to not loading pretrained weights. But after 100 epochs, my model accuracy is poor, compared with your model. Did I miss something, do I need something like an ImageNet pretrained model or COCO pretrained model? Thanks!

opened by zhimengf 6
How to export a .pb file

Hi, I have problem exporting a .pb file with the current produced .h5 files. I do not think you have provided the export method in the project file. Could you please give me some advice on that? @bubbliiiing

opened by 77knight 1

Releases(v3.0)

v3.0(Apr 22, 2022)
重要更新

支持step、cos学习率下降法。

支持adam、sgd优化器选择。

支持学习率根据batch_size自适应调整。

支持不同预测模式的选择，单张图片预测、文件夹预测、视频预测、图片裁剪。

更新summary.py文件，用于观看网络结构。

增加了多GPU训练。

Source code(tar.gz)
Source code(zip)
v2.0(Mar 4, 2022)
重要更新

更新train.py文件，增加了大量的注释，增加多个可调整参数。

更新predict.py文件，增加了大量的注释，增加fps、视频预测、批量预测等功能。

更新deeplab.py文件，增加了大量的注释，增加先验框选择、置信度、非极大抑制等参数。

合并get_dr_txt.py、get_gt_txt.py和get_map.py文件，通过一个文件来实现数据集的评估。

更新voc_annotation.py文件，增加多个可调整参数。

更新summary.py文件，用于观看网络结构。

Source code(tar.gz)
Source code(zip)
v1.0(Sep 9, 2021)

Source code(tar.gz)
Source code(zip)
deeplabv3_mobilenetv2.h5(10.98 MB)
deeplabv3_xception.h5(158.40 MB)

DeepLabv3+：Encoder-Decoder with Atrous Separable Convolution语义分割模型在tensorflow2当中的实现

Related tags

Overview

DeepLabv3+：Encoder-Decoder with Atrous Separable Convolution语义分割模型在tensorflow2当中的实现

目录

性能情况

所需环境

注意事项

文件下载

训练步骤

a、训练voc数据集

b、训练自己的数据集

预测步骤

a、使用预训练权重

b、使用自己训练的权重

评估步骤

Reference

You might also like...

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

《LXMERT: Learning Cross-Modality Encoder Representations from Transformers》(EMNLP 2020)

Official implementation for Likelihood Regret: An Out-of-Distribution Detection Score For Variational Auto-encoder at NeurIPS 2020

Official Implementation for "ReStyle: A Residual-Based StyleGAN Encoder via Iterative Refinement" https://arxiv.org/abs/2104.02699

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

A Joint Video and Image Encoder for End-to-End Retrieval

PyTorch Implementation of VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis.

Code for the paper "Adversarial Generator-Encoder Networks"

PyTorch implementation of SQN based on CloserLook3D's encoder

Comments

How to reproduce the model deeplabv3_xception.h5?

How to export a .pb file

Releases(v3.0)

v3.0(Apr 22, 2022)

重要更新

v2.0(Mar 4, 2022)

重要更新

v1.0(Sep 9, 2021)

Owner

Bubbliiiing

PAMI stands for PAttern MIning. It constitutes several pattern mining algorithms to discover interesting patterns in transactional/temporal/spatiotemporal databases

Code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection"

Official page of Struct-MDC (RA-L'22 with IROS'22 option); Depth completion from Visual-SLAM using point & line features

Computationally efficient algorithm that identifies boundary points of a point cloud.

Official implementation of Influence-balanced Loss for Imbalanced Visual Classification in PyTorch.

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning. In ICCV, 2021.

Pre-trained model, code, and materials from the paper "Impact of Adversarial Examples on Deep Learning Models for Biomedical Image Segmentation" (MICCAI 2019).

AFL binary instrumentation

An implementation of RetinaNet in PyTorch.

Attention mechanism with MNIST dataset

State-to-Distribution (STD) Model

This is Unofficial Repo. Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection (CVPR 2021)

Deep Learning to Create StepMania SM FIles

An example project demonstrating how the Autonomous Learning Library can be used to build new reinforcement learning agents.

PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Code for the paper "TadGAN: Time Series Anomaly Detection Using Generative Adversarial Networks"

BLEURT is a metric for Natural Language Generation based on transfer learning.

Testing and Estimation of structural breaks in Stata

[NeurIPS'21] Projected GANs Converge Faster