Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Overview

Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network

Paddle-PANet

目录

结果对比

CTW1500

Method Backbone Fine-tuning Config Precision (%) Recall (%) F-measure (%) Model Log
mmocr_PANet Resnet18 N ctw_config 77.6 83.8 80.6 -- --
PAN (paper) ResNet18 N config 84.6 77.7 81.0 - -
PaddlePaddle_PANet ResNet18 N panet_r18_ctw.py 84.51 78.62 81.46 Model Log

论文介绍

背景简介

这是发在2019ICCV上的一篇一阶段场景文本检测论文。主要是PSENet的升级版。PSENet虽然处理速度很快,准确度很高,但后处理过程繁琐,而且没办法和网络模型融合在一起,实现训练。PANet很好的解决了这一问题,把后处理过程也放入网络中,预测出三个loss,最后进行融合。

网络结构

上图为PAN的整个网络结构,网络主要由Backbone + Segmentation Head(FPEM + FFM) + Output(Text Region、Kernel、Similarity Vector)组成。

本文使用ResNet-18作为PAN的默认Backbone,并提出了低计算量的Segmentation Head(FPFE + FFM)以解决因为使用ResNet-18而导致的特征提取能力较弱,特征感受野较小且表征能力不足的缺点。

此外,为了精准地重建完整的文字实例(text instance),提出了一个可学习的后处理方法——像素聚合法(PA),它能够通过预测出的相似向量来引导文字像素聚合到正确的kernel上去。

下面将详细介绍一下上面的各个部分。

Backbone

Backbone选择的是resnet18, 提取stride为4,8,16,32的conv2,conv3,conv4,conv5的输出作为高低层特征。每层的特征图的通道数都使用1*1卷积降维至128得到轻量级的特征图Fr。

Segmentation Head

PAN使用resNet-18作为网络的默认backbone,虽减少了计算量,但是backbone层数的减少势必会带来模型学习能力的下降。为了提高效率,作者在 resNet-18基础上提出了一个低计算量但可高效增强特征的分割头Segmentation Head。它由两个关键模块组成:特征金字塔增强模块(Feature Pyramid Enhancement Module,FPEM)、特征融合模块(Feature Fusion Module,FFM)。

FPEM

Feature Pyramid Enhancement Module(FPEM),即特征金字塔增强模块。FPEM呈级联结构且计算量小,可以连接在backbone后面让不同尺寸的特征更深、更具表征能力,结构如下:

FPEM是一个U形模组,由两个阶段组成,up-scale增强、down-scale增强。up-scale增强作用于输入的特征金字塔,它以步长32,16,8,4像素在特征图上迭代增强。在down-scale阶段,输入的是由up-scale增强生成的特征金字塔,增强的步长从4到32,同时,down-scale增强输出的的特征金字塔就是最终FPEM的输出。 FPEM模块可以看成是一个轻量级的FPN,只不过这个FPEM计算量不大,可以不停级联以达到不停增强特征的作用。

FFM

Feature Fusion Module(FFM)模块用于融合不同尺度的特征,其结构如下:

最后通过上采样将它们Concatenate到一起。

模型最后预测三种信息: 1、文字区域 2、文字kernel 3、文字kernel的相似向量

Loss

其中文字区域和kernel预测loss为:

快速安装

Recommended environment

Python 3.6+
paddlepaddle-gpu 2.0.2
nccl 2.0+
mmcv 0.2.12
editdistance
Polygon3
pyclipper
opencv-python 3.4.2.17
Cython

Install env

Install paddle following the official tutorial.

pip install -r requirement.txt
./compile.sh

Dataset

Please refer to dataset/README.md for dataset preparation.

Pretrain Backbone

download resent18 pre-train model in pretrain/resnet18.pdparams

pretrain_resnet18 password: j5g3

Training

CUDA_VISIBLE_DEVICES=0,1,2,3 python dist_train.py ${CONFIG_FILE}

For example:

CUDA_VISIBLE_DEVICES=0,1,2,3 python dist_train.py config/pan/pan_r18_ctw.py
#checkpoint continue
python3.7 dist_train.py config/pan/pan_r18_ctw_train.py --nprocs 1 --resume checkpoints/pan_r18_ctw_train

Evaluation

The evaluation scripts of CTW 1500 dataset. CTW

Text detection

./start_test.sh

License

This project is developed and maintained by IMAGINE [email protected] Key Laboratory for Novel Software Technology, Nanjing University.

This project is released under the Apache 2.0 license.

@inproceedings{wang2019efficient,
  title={Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network},
  author={Wang, Wenhai and Xie, Enze and Song, Xiaoge and Zang, Yuhang and Wang, Wenjia and Lu, Tong and Yu, Gang and Shen, Chunhua},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={8440--8449},
  year={2019}
}
Owner
Dreams Are Messages From The Deep🪐
Code for EMNLP2020 long paper: BERT-Attack: Adversarial Attack Against BERT Using BERT

BERT-ATTACK Code for our EMNLP2020 long paper: BERT-ATTACK: Adversarial Attack Against BERT Using BERT Dependencies Python 3.7 PyTorch 1.4.0 transform

Linyang Li 142 Jan 04, 2023
Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation

Alias-Free Generative Adversarial Networks (StyleGAN3) Official PyTorch implementation

NVIDIA Research Projects 4.8k Jan 09, 2023
Implementation of Gans

GAN Generative Adverserial Networks are an approach to generative data modelling using Deep learning methods. I have currently implemented : DCGAN on

Sibam Parida 5 Sep 07, 2021
Algorithmic encoding of protected characteristics and its implications on disparities across subgroups

Algorithmic encoding of protected characteristics and its implications on disparities across subgroups This repository contains the code for the paper

Team MIRA - BioMedIA 15 Oct 24, 2022
Real-Time High-Resolution Background Matting

Real-Time High-Resolution Background Matting Official repository for the paper Real-Time High-Resolution Background Matting. Our model requires captur

Peter Lin 6.1k Jan 03, 2023
An algorithmic trading bot that learns and adapts to new data and evolving markets using Financial Python Programming and Machine Learning.

ALgorithmic_Trading_with_ML An algorithmic trading bot that learns and adapts to new data and evolving markets using Financial Python Programming and

1 Mar 14, 2022
Graph-total-spanning-trees - A Python script to get total number of Spanning Trees in a Graph

Total number of Spanning Trees in a Graph This is a python script just written f

Mehdi I. 0 Jul 18, 2022
PyTorch implementation of MoCo: Momentum Contrast for Unsupervised Visual Representation Learning

MoCo: Momentum Contrast for Unsupervised Visual Representation Learning This is a PyTorch implementation of the MoCo paper: @Article{he2019moco, aut

Meta Research 3.7k Jan 02, 2023
Video Corpus Moment Retrieval with Contrastive Learning (SIGIR 2021)

Video Corpus Moment Retrieval with Contrastive Learning PyTorch implementation for the paper "Video Corpus Moment Retrieval with Contrastive Learning"

ZHANG HAO 42 Dec 29, 2022
Tello Drone Trajectory Tracking

With this library you can track the trajectory of your tello drone or swarm of drones in real time.

Kamran Asgarov 2 Oct 12, 2022
Package for extracting emotions from social media text. Tailored for financial data.

EmTract: Extracting Emotions from Social Media Text Tailored for Financial Contexts EmTract is a tool that extracts emotions from social media text. I

13 Nov 17, 2022
InferPy: Deep Probabilistic Modeling with Tensorflow Made Easy

InferPy: Deep Probabilistic Modeling Made Easy InferPy is a high-level API for probabilistic modeling written in Python and capable of running on top

PGM-Lab 141 Oct 13, 2022
This project aims to be a handler for input creation and running of multiple RICEWQ simulations.

What is autoRICEWQ? This project aims to be a handler for input creation and running of multiple RICEWQ simulations. What is RICEWQ? From the descript

Yass Fuentes 1 Feb 01, 2022
Dirty Pixels: Towards End-to-End Image Processing and Perception

Dirty Pixels: Towards End-to-End Image Processing and Perception This repository contains the code for the paper Dirty Pixels: Towards End-to-End Imag

50 Nov 18, 2022
🧮 Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model after All

Accompanying source code to the paper "Matrix Factorization for Collaborative Filtering is just Solving an Adjoint Latent Dirichlet Allocation Model A

Florian Wilhelm 39 Dec 03, 2022
This is the source code of the solver used to compete in the International Timetabling Competition 2019.

ITC2019 Solver This is the source code of the solver used to compete in the International Timetabling Competition 2019. Building .NET Core (2.1 or hig

Edon Gashi 8 Jan 22, 2022
A more easy-to-use implementation of KPConv

A more easy-to-use implementation of KPConv This repo contains a more easy-to-use implementation of KPConv based on PyTorch. Introduction KPConv is a

Zheng Qin 35 Dec 14, 2022
This repo is the official implementation of "L2ight: Enabling On-Chip Learning for Optical Neural Networks via Efficient in-situ Subspace Optimization".

L2ight is a closed-loop ONN on-chip learning framework to enable scalable ONN mapping and efficient in-situ learning. L2ight adopts a three-stage learning flow that first calibrates the complicated p

Jiaqi Gu 9 Jul 14, 2022
MoveNet Single Pose on DepthAI

MoveNet Single Pose tracking on DepthAI Running Google MoveNet Single Pose models on DepthAI hardware (OAK-1, OAK-D,...). A convolutional neural netwo

64 Dec 29, 2022
Suite of 500 procedurally-generated NLP tasks to study language model adaptability

TaskBench500 The TaskBench500 dataset and code for generating tasks. Data The TaskBench dataset is available under wget http://web.mit.edu/bzl/www/Tas

Belinda Li 20 May 17, 2022