基于Paddle框架的PSENet复现

Overview

PSENet-Paddle

基于Paddle框架的PSENet复现

本项目基于paddlepaddle框架复现PSENet,并参加百度第三届论文复现赛,将在2021年5月15日比赛完后提供AIStudio链接~敬请期待

AIStudio链接

参考项目:

whai362-PSENet

环境配置

本项目利用AIstudio平台,采用paddlepaddle: 2.0.2-gpu Version,除此之外你需要通过pip install mmcv editdistance Polygon3 pyclipper或者pip install -r requirement.txt来安装依赖包

数据集

本项目已搭载PSENet比赛指定数据集,你可以在此找到搭载的数据集,包含ICDAR2015 Task4以及Total-Text

工程目录

注意到你需要将submitPSENet重命名为PSENet

/home/aistudio/PSENet
|───data(解压的data.zip)
└───config
└───models
└───dataset
└───eval
└───utils
└───compile.sh
└───__init__.py
└───test.py
└───train.py
└───requirement.txt
└───logo.gif

项目配置**

注意:由于aistudio的docker环境并不适配本项目的编译,所以你需要在本地计算机编译完成后上传编译文件,在本地计算机我才用如下配置,你可以使用gcc --versiong++ --version查看配置

AIStudio Local PC
gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

可以发现AIStudio的g++版本不适配,注意:你需要相同的架构,系统以及python版本,(Ubuntu)linux-x86_64&python3.7

`./compile.sh` or `bash compile.sh` if come out bash: ./compile.sh: Permission denied

或者直接进入指定目录,手动编译

cd /home/aistudio/PSENet/models/post_processing/pse
python setup.py build_ext --inplace

编译完成后你会在/home/aistudio/PSENet/models/post_processing/pse得到build/temp.linux-x86_64-3.7/pse.o文件和pse.cpython-37m-x86_64-linux-gnu.so文件

注意:本项目已经全部配置完成,这一步无需操作

训练

需要注意的是,在paddlepaddle-2.0.2中并不支持字典数据读取,因此我在/home/aistudio/PSENet/utils/data_loader.py利用迭代器重写了DataLoader这拉慢了数据读取的速度,会导致训练速度略慢,例如在使用psenet_r50_ic15_1024_finetune.py训练一个epoch需要512.4秒,另外paddlepaddle2.0.2暂不支持Identity方法,因此我在/home/aistudio/PSENet/models/utils/fuse_conv_bn.py通过继承Paddle.nn.Layer写了Identity

cd /home/aistudio/PSENet/
python train.py ${CONFIG_FILE}

例如:

cd /home/aistudio/PSENet/
python train.py config/psenet/psenet_r50_ic15_736.py

训练开启时,会生成一个类似/home/aistudio/PSENet/checkpoints/psenet_r50_ic15_1024_finetune的文件夹,里面将保存权重和优化器参数

测试

cd /home/aistudio/PSENet/
python test.py ${CONFIG_FILE} ${CHECKPOINT_FILE}

例如:

cd /home/aistudio/PSENet/
python test.py config/psenet/psenet_r50_ic15_736.py PSENet/PretrainedModel/checkpoint_ic15_736.pdparams

评估

你需要注意的是:测试和评估是递进的,通过测试生成文件后,进行评估

ICDAR 2015

cd /home/aistudio/PSENet/eval
`./eval_ic15.sh` or `bash ./eval_ic15.sh`

你会得到如下类似信息:

Calculated!{"precision": 0.8620689655172413, "recall": 0.7944150216658642, "hmean": 0.826860435980957, "AP": 0}

以下是paddlepaddle预训练模型测试指标

Method Backbone Fine-tuning Scale Config Precision (%) Recall (%) F-measure (%) Model
PSENet ResNet50 N Shorter Side: 736 psenet_r50_ic15_736.py 83.6 74.0 78.5 checkpoint_ic15_736
PSENet ResNet50 N Shorter Side: 1024 psenet_r50_ic15_1024.py 84.4 76.3 80.2 checkpoint_ic15_1024
PSENet ResNet50 Y Shorter Side: 736 psenet_r50_ic15_736_finetune.py 85.3 76.8 80.9 checkpoint_ic15_736_finetune
PSENet ResNet50 Y Shorter Side: 1024 psenet_r50_ic15_1024_finetune.py 86.2 79.4 82.7 checkpoint_ic15_1024_finetune

Total-Text

Text detection

cd /home/aistudio/PSENet/eval
./eval_tt.sh or `bash ./eval_tt.sh`

你会得到如下类似信息:

Precision:_0.8727937336814604_______/Recall:_0.7786751361161512/Hmean:_0.8230524859472805

pb

以下是paddlepaddle预训练模型测试指标

Method Backbone Fine-tuning Config Precision (%) Recall (%) F-measure (%) Model
PSENet ResNet50 N psenet_r50_tt.py 87.3 77.9 82.3 checkpoint_tt
PSENet ResNet50 Y psenet_r50_tt_finetune.py 89.3 79.6 84.2 checkpoint_tt_finetune

速度测试

python test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --report_speed

例如:

cd /home/aistudio/PSENet/
python test.py config/psenet/psenet_r50_ic15_736.py PSENet/PretrainedModel/checkpoint_ic15_736.pdparams --report_speed

你会得到如下类似信息

Testing 283/3000
backbone_time: 0.0152
neck_time: 0.0029
det_head_time: 0.0005
det_pse_time: 0.0660
FPS: 11.8
Testing 284/3000
backbone_time: 0.0152
neck_time: 0.0029
det_head_time: 0.0005
det_pse_time: 0.0660
FPS: 11.8
Testing 285/3000
backbone_time: 0.0152
neck_time: 0.0029
det_head_time: 0.0005
det_pse_time: 0.0660
FPS: 11.8
Testing 286/3000
backbone_time: 0.0152
neck_time: 0.0029
det_head_time: 0.0005
det_pse_time: 0.0660
FPS: 11.8

Citation

@inproceedings{wang2019shape,
  title={Shape robust text detection with progressive scale expansion network},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9336--9345},
  year={2019}
}
Owner
QuanHao Guo
master at UESTC
QuanHao Guo
Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Scene-Text-Detection-with-SPCNET Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.0

121 Oct 15, 2021
Some codes from PyImageSearch course's and external projects.

👨‍💻 Some codes and projects 👨‍💻 💡 Technologies 📜 Projects 📍 Chrome Dinosaur Controller 📦 Script 📍 Coins Counter 📦 Script 🤓 Author Lucas Biv

Lucas Bivar 25 Oct 24, 2021
Deep learning based page layout analysis

Deep Learning Based Page Layout Analyze This is a Python implementaion of page layout analyze tool. The goal of page layout analyze is to segment page

186 Dec 29, 2022
This is the code for our paper DAAIN: Detection of Anomalous and AdversarialInput using Normalizing Flows

Merantix-Labs: DAAIN This is the code for our paper DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows which can be found at

Merantix 14 Oct 12, 2022
Msos searcher - A half-hearted attempt at finding a magic square of squares

MSOS searcher A half-hearted attempt at finding (or rather searching) a MSOS (Magic Square of Squares) in the spirit of the Parker Square. Running I r

Niels Mündler 1 Jan 02, 2022
Fast image augmentation library and easy to use wrapper around other libraries. Documentation: https://albumentations.ai/docs/ Paper about library: https://www.mdpi.com/2078-2489/11/2/125

Albumentations Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to inc

11.4k Jan 02, 2023
Forked from argman/EAST for the ICPR MTWI 2018 CHALLENGE

EAST_ICPR: EAST for ICPR MTWI 2018 CHALLENGE Introduction This is a repository forked from argman/EAST for the ICPR MTWI 2018 CHALLENGE. Origin Reposi

Haozheng Li 157 Aug 23, 2022
Using python libraries to track hands

Python-HandTracking Using python libraries to track hands on a camera Uses cv2 and mediapipe libraries custom hand tracking module PyCharm IDE Final E

Martin Matsudaira 1 Dec 17, 2021
Demo processor to illustrate OCR-D Python API

ocrd_vandalize/ Demo processor to illustrate the OCR-D/core Python API Description :TODO: write docs :) Installation From PyPI pip3 install ocrd_vanda

Konstantin Baierer 5 May 05, 2022
FastOCR is a desktop application for OCR API.

FastOCR FastOCR is a desktop application for OCR API. Installation Arch Linux fastocr-git @ AUR Build from AUR or install with your favorite AUR helpe

Bruce Zhang 58 Jan 07, 2023
A general list of resources to image text localization and recognition 场景文本位置感知与识别的论文资源与实现合集 シーンテキストの位置認識と識別のための論文リソースの要約

Scene Text Localization & Recognition Resources Read this institute-wise: English, 简体中文. Read this year-wise: English, 简体中文. Tags: [STL] (Scene Text L

Karl Lok (Zhaokai Luo) 901 Dec 11, 2022
Slice a single image into multiple pieces and create a dataset from them

OpenCV Image to Dataset Converter Slice a single image of Persian digits into mu

Meysam Parvizi 14 Dec 29, 2022
Code release for our paper, "SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo"

SimNet: Enabling Robust Unknown Object Manipulation from Pure Synthetic Data via Stereo Thomas Kollar, Michael Laskey, Kevin Stone, Brijen Thananjeyan

68 Dec 14, 2022
This is used to convert a string to an Image with Handwritten Characters.

Text-to-Handwriting-using-python This is used to convert a string to an Image with Handwritten Characters. text_to_handwriting(string: str, save_to: s

Akashdeep Mahata 3 Aug 15, 2022
Automatic Number Plate Recognition (ANPR) is a highly accurate system capable of reading vehicle number plates without human intervention

ANPR ANPR is therefore the underlying technology used to find a vehicle license/number plate and it, in turn, supplies this information to a next stag

Melih Emin Kılıçoğlu 1 Jan 09, 2022
Creating a virtual tv using opencv in python3.

Virtual-TV Creating a virtual tv using opencv in python3. In order to run the code follow the below given steps: Make sure the desired videos which ar

Vamsi 1 Jan 01, 2022
A curated list of promising OCR resources

Call for contributor(paper summary,dataset generation,algorithm implementation and any other useful resources) awesome-ocr A curated list of promising

wanghaisheng 1.6k Jan 04, 2023
MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition

MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition Python 2.7 Python 3.6 MORAN is a network with rectification mechanism for

Canjie Luo 595 Dec 27, 2022
A dataset handling library for computer vision datasets in LOST-fromat

A dataset handling library for computer vision datasets in LOST-fromat

8 Dec 15, 2022
OCR, Object Detection, Number Plate, Real Time

README.md PrePareded anaconda env requirements.txt clova AI → deep text recognition → trained weights (ex, .pth) wpod-net weights (ex, .h5 , .json) ht

Kaven Lee 7 Dec 06, 2022