Official implementations of PSENet, PAN and PAN++.

Overview

News

  • (2021/11/03) Paddle implementation of PAN, see Paddle-PANet. Thanks @simplify23.
  • (2021/04/08) PSENet and PAN are included in MMOCR.

Introduction

This repository contains the official implementations of PSENet, PAN, PAN++, and FAST [coming soon].

Text Detection
Text Spotting

Installation

First, clone the repository locally:

git clone https://github.com/whai362/pan_pp.pytorch.git

Then, install PyTorch 1.1.0+, torchvision 0.3.0+, and other requirements:

conda install pytorch torchvision -c pytorch
pip install -r requirement.txt

Finally, compile codes of post-processing:

# build pse and pa algorithms
sh ./compile.sh

Dataset

Please refer to dataset/README.md for dataset preparation.

Training

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py ${CONFIG_FILE}

For example:

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py config/pan/pan_r18_ic15.py

Testing

Evaluate the performance

python test.py ${CONFIG_FILE} ${CHECKPOINT_FILE}
cd eval/
./eval_{DATASET}.sh

For example:

python test.py config/pan/pan_r18_ic15.py checkpoints/pan_r18_ic15/checkpoint.pth.tar
cd eval/
./eval_ic15.sh

Evaluate the speed

python test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} --report_speed

For example:

python test.py config/pan/pan_r18_ic15.py checkpoints/pan_r18_ic15/checkpoint.pth.tar --report_speed

Citation

Please cite the related works in your publications if it helps your research:

PSENet

@inproceedings{wang2019shape,
  title={Shape Robust Text Detection with Progressive Scale Expansion Network},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Hou, Wenbo and Lu, Tong and Yu, Gang and Shao, Shuai},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9336--9345},
  year={2019}
}

PAN

@inproceedings{wang2019efficient,
  title={Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network},
  author={Wang, Wenhai and Xie, Enze and Song, Xiaoge and Zang, Yuhang and Wang, Wenjia and Lu, Tong and Yu, Gang and Shen, Chunhua},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={8440--8449},
  year={2019}
}

PAN++

@article{wang2021pan++,
  title={PAN++: Towards Efficient and Accurate End-to-End Spotting of Arbitrarily-Shaped Text},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Liu, Xuebo and Liang, Ding and Zhibo, Yang and Lu, Tong and Shen, Chunhua},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2021},
  publisher={IEEE}
}

FAST

@misc{chen2021fast,
  title={FAST: Searching for a Faster Arbitrarily-Shaped Text Detector with Minimalist Kernel Representation}, 
  author={Zhe Chen and Wenhai Wang and Enze Xie and ZhiBo Yang and Tong Lu and Ping Luo},
  year={2021},
  eprint={2111.02394},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

License

This project is developed and maintained by IMAGINE [email protected] Key Laboratory for Novel Software Technology, Nanjing University.

IMAGINE Lab

This project is released under the Apache 2.0 license.

Comments
  • Evaluation of the performance result

    Evaluation of the performance result

    Hello Author, First of all, I would like to appreciate your work and effort. I have tried your repo. The evaluation code gives me an error of the "The sample 199 not present in GT," but the label text is there. When I tried to see the result via visualizing it on the images, it seems good. Let me know if there is any solution from your side.

    opened by dikubab 9
  • _pickle.PicklingError: Can't pickle <class 'cPolygon.Error'>: import of module 'cPolygon' failed

    _pickle.PicklingError: Can't pickle : import of module 'cPolygon' failed

    more complete log as belows: Epoch: [1 | 600] /data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/site-packages/torch/nn/functional.py:2941: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.") /data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. "See the documentation of nn.Upsample for details.".format(mode)) (1/374) LR: 0.001000 | Batch: 2.668s | Total: 0min | ETA: 17min | Loss: 1.619 | Loss(text/kernel/emb/rec): 0.680/0.193/0.746/0.000 | IoU(text/kernel): 0.324/0.335 | Acc rec: 0.000 Traceback (most recent call last): File "/data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/multiprocessing/queues.py", line 236, in _feed obj = _ForkingPickler.dumps(obj) File "/data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps cls(buf, protocol).dump(obj) _pickle.PicklingError: Can't pickle <class 'cPolygon.Error'>: import of module 'cPolygon' failed

    the code runs normally when using the CTW1500 datasets. but encounter errors when using my own datasets.

    it seems fine in the first run (1/374), what is wrong ? I have no idea.

    opened by Zhang-O 5
  • 关于训练的问题

    关于训练的问题

    您好!我现在在自己的数据上进行训练,训练过程是这样的 image Epoch: [212 | 600] (1/198) LR: 0.000677 | Batch: 3.934s | Total: 0min | ETA: 13min | Loss: 0.752 | Loss(text/kernel/emb/rec): 0.493/0.199/0.059/0.000 | IoU(text/kernel): 0.055/0.553 | Acc rec: 0.000 (21/198) LR: 0.000677 | Batch: 1.089s | Total: 0min | ETA: 3min | Loss: 0.731 | Loss(text/kernel/emb/rec): 0.478/0.199/0.054/0.000 | IoU(text/kernel): 0.048/0.482 | Acc rec: 0.000 (41/198) LR: 0.000677 | Batch: 1.022s | Total: 1min | ETA: 3min | Loss: 0.732 | Loss(text/kernel/emb/rec): 0.478/0.198/0.056/0.000 | IoU(text/kernel): 0.049/0.476 | Acc rec: 0.000 这个Acc rec一直是0,我终止训练后,在测试数据上进行测试时,output输出的是空的,请问是怎么回事呢,感谢啦!

    opened by mayidu 3
  • 关于后处理的疑问

    关于后处理的疑问

    1. 后处理的代码中当kernel中两个连通域的面积比大于max_rate时,将这两个连通域的flag赋值为1,在扩充时,必须同时满足当前扩充的点所属的连通域的flag值为1且与kernal的similar vector距离大于3时才不扩充该点。请问设flag这步操作的作用是什么,直接判断与Kernel的similar vector的距离可以吗?
    2. 论文中扩充的点与kernel相似向量的欧式距离thresh值为6,代码中为3,请问实际应用中这个值跟什么有关系,是数据集的某些特点吗?
    opened by jewelc92 3
  • Regarding pa.pyx

    Regarding pa.pyx

    Hi,

    I try to run your code and figure out that in your last line in pa.pyx

    return _pa(kernels[:-1], emb, label, cc, kernel_num, label_num, min_area)

    Looks like this should be

    return _pa(kernels, emb, label, cc, kernel_num, label_num, min_area)

    So that we can scan over all kernels (you skip the last kernel) and there is no crash in this function. Am I correct?

    Thanks.

    opened by liuch37 3
  • AttributeError: 'Namespace' object has no attribute 'resume'

    AttributeError: 'Namespace' object has no attribute 'resume'

    PAN++ic15,An error appears when trying to test the model:

    reading type: pil. Traceback (most recent call last): File "test.py", line 155, in main(args) File "test.py", line 138, in main print("No checkpoint found at '{}'".format(args.resume)) AttributeError: 'Namespace' object has no attribute 'resume'

    opened by lrjj 2
  • 训练Total Text时遇到的问题

    训练Total Text时遇到的问题

    运行 python train.py config/pan/pan_r18_tt.py 后,出现如下情况: p1 Traceback (most recent call last): File "/home/dell2/anaconda3/envs/pannet/lib/python3.6/multiprocessing/queues.py", line 234, in _feed obj = _ForkingPickler.dumps(obj) File "/home/dell2/anaconda3/envs/pannet/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps cls(buf, protocol).dump(obj) _pickle.PicklingError: Can't pickle <class 'cPolygon.Error'>: import of module 'cPolygon' failed 似乎是迭代过程中出现的问题且只出现在训练TT数据集的时候 请问出现这种情况该怎样解决呢?谢谢您

    opened by mashumli 2
  • 执行test.py提示TypeError: 'module' object is not callable

    执行test.py提示TypeError: 'module' object is not callable

    将模型路径和config文件路径配置好了之后,执行python test.py,提示如下: Traceback (most recent call last): File "test.py", line 117, in main(args) File "test.py", line 107, in main test(test_loader, model, cfg) File "test.py", line 56, in test outputs = model(**data) File "/home/ethony/anaconda3/envs/ocr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in call result = self.forward(*input, **kwargs) File "/media/ethony/C14D581BDA18EBFA/lyg_datas_and_code/OCR_work/pan_pp.pytorch-master/models/pan.py", line 104, in forward det_res = self.det_head.get_results(det_out, img_metas, cfg) File "/media/ethony/C14D581BDA18EBFA/lyg_datas_and_code/OCR_work/pan_pp.pytorch-master/models/head/pa_head.py", line 65, in get_results label = pa(kernels, emb) TypeError: 'module' object is not callable 看提示应该是model/post_processing下的pa没有正确导入,导入为模块了,这应该怎么解决呢

    opened by ethanlighter 2
  • problems in train.py

    problems in train.py

    Hi. When I run 'python train.py config/pan/pan_r18_ic15.py' , the errors are as followings: Do you know how to solve the problem? Thank you very much. Traceback (most recent call last): File "train.py", line 234, in main(args) File "train.py", line 216, in main train(train_loader, model, optimizer, epoch, start_iter, cfg) File "train.py", line 41, in train for iter, data in enumerate(train_loader): File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 435, in next data = self._next_data() File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1085, in _next_data return self._process_data(data) File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 1111, in _process_data data.reraise() File "D:\Anaconda3\lib\site-packages\torch_utils.py", line 428, in reraise raise self.exc_type(msg) TypeError: function takes exactly 5 arguments (1 given)

    opened by YUDASHUAI916 2
  • not sure about run compile.sh

    not sure about run compile.sh

    (zyl_torch16) [email protected]:/data/zhangyl/pan_pp.pytorch-master$ sh ./compile.sh Compiling pa.pyx because it depends on /data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/site-packages/numpy/init.pxd. [1/1] Cythonizing pa.pyx /data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /data/zhangyl/pan_pp.pytorch-master/models/post_processing/pa/pa.pyx tree = Parsing.p_module(s, pxd, full_module_name) running build_ext building 'pa' extension creating build creating build/temp.linux-x86_64-3.7 gcc -pthread -B /data/tools/anaconda3/envs/zyl_torch16/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/site-packages/numpy/core/include -I/data/tools/anaconda3/envs/zyl_torch16/include/python3.7m -c pa.cpp -o build/temp.linux-x86_64-3.7/pa.o -O3 cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ In file included from /data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1822:0, from /data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:12, from /data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4, from pa.cpp:647: /data/tools/anaconda3/envs/zyl_torch16/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp] #warning "Using deprecated NumPy API, disable it with "
    ^~~~~~~ g++ -pthread -shared -B /data/tools/anaconda3/envs/zyl_torch16/compiler_compat -L/data/tools/anaconda3/envs/zyl_torch16/lib -Wl,-rpath=/data/tools/anaconda3/envs/zyl_torch16/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/pa.o -o /data/zhangyl/pan_pp.pytorch-master/models/post_processing/pa/pa.cpython-37m-x86_64-linux-gnu.so (zyl_torch16) [email protected]:/data/zhangyl/pan_pp.pytorch-master$

    this is the compile history, I am not sure whether is successully build or not.

    opened by Zhang-O 2
  • morphology operations from kornia

    morphology operations from kornia

    Hi,

    Your FAST paper is really amazing! While you already have an implementation of erosion/dilation, let me offer using our set of morphology, implemented in pyre pytorch: https://kornia.readthedocs.io/en/latest/morphology.html

    https://kornia-tutorials.readthedocs.io/en/master/morphology_101.html

    Best, Dmytro.

    opened by ducha-aiki 1
  • The sample 199 not present in GT

    The sample 199 not present in GT

    Hello Author, First of all, I would like to appreciate your work and effort. I have tried your repo. The evaluation code gives me an error of the "The sample 199 not present in GT," but the label text is there. When I tried to see the result via visualizing it on the images, it seems good. Let me know if there is any solution from your side.

    opened by zeng-cy 1
  • How  to predict a new image using the training weight?it doesn't work below.

    How to predict a new image using the training weight?it doesn't work below.

    How to predict a new image using the training weight?it doesn't work below.

    python test.py config/pan/pan_r18_ic15.py checkpoints/pan_r18_ic15/checkpoint.pth.tar cd eval/ ./eval_ic15.sh

    please inform me with [email protected] or wechat SanQian-2012,thanks you so much.

    Originally posted by @Devin521314 in https://github.com/whai362/pan_pp.pytorch/issues/91#issuecomment-1233810612

    opened by Devin521314 0
  • Why rec encoder use EOS? not SOS

    Why rec encoder use EOS? not SOS

    hi: I find there is no 'SOS' in code, I understand SOS should be embedding at the beginning. Please tell me ,thanks! ---------------code----------------------------------------------- class Encoder(nn.Module): def init(self, hidden_dim, voc, char2id, id2char): super(Encoder, self).init() self.hidden_dim = hidden_dim self.vocab_size = len(voc) self.START_TOKEN = char2id['EOS'] self.emb = nn.Embedding(self.vocab_size, self.hidden_dim) self.att = MultiHeadAttentionLayer(self.hidden_dim, 8)

    def forward(self, x):
        batch_size, feature_dim, H, W = x.size()
        x_flatten = x.view(batch_size, feature_dim, H * W).permute(0, 2, 1)
        st = x.new_full((batch_size,), self.START_TOKEN, dtype=torch.long)
        emb_st = self.emb(st)
        holistic_feature, _ = self.att(emb_st, x_flatten, x_flatten)
        return 
    
    opened by Patickk 0
Releases(v1)
Implementation of Bidirectional Recurrent Independent Mechanisms (Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention over Modules)

BRIMs Bidirectional Recurrent Independent Mechanisms Implementation of the paper Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neura

Sarthak Mittal 26 May 26, 2022
Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

Fangjinhua Wang 127 Jan 04, 2023
Official implementation of the paper "Steganographer Detection via a Similarity Accumulation Graph Convolutional Network"

SAGCN - Official PyTorch Implementation | Paper | Project Page This is the official implementation of the paper "Steganographer detection via a simila

ZHANG Zhi 1 Nov 26, 2021
This is an official implementation for "SimMIM: A Simple Framework for Masked Image Modeling".

SimMIM By Zhenda Xie*, Zheng Zhang*, Yue Cao*, Yutong Lin, Jianmin Bao, Zhuliang Yao, Qi Dai and Han Hu*. This repo is the official implementation of

Microsoft 674 Dec 26, 2022
A Sign Language detection project using Mediapipe landmark detection and Tensorflow LSTM's

sign-language-detection A Sign Language detection project using Mediapipe landmark detection and Tensorflow LSTM. The project is built for a vocabular

Hashim 4 Feb 06, 2022
Code for Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019)

Talking Face Generation by Adversarially Disentangled Audio-Visual Representation (AAAI 2019) We propose Disentangled Audio-Visual System (DAVS) to ad

Hang_Zhou 750 Dec 23, 2022
Simultaneous NMT/MMT framework in PyTorch

This repository includes the codes, the experiment configurations and the scripts to prepare/download data for the Simultaneous Machine Translation wi

<a href=[email protected]"> 37 Sep 29, 2022
StyleGAN2-ADA - Official PyTorch implementation

Need Help? If you’re new to StyleGAN2-ADA and looking to get started, please check out this video series from a course Lia Coleman and I taught in Oct

Derrick Schultz 217 Jan 04, 2023
Unified tracking framework with a single appearance model

Paper: Do different tracking tasks require different appearance model? [ArXiv] (comming soon) [Project Page] (comming soon) UniTrack is a simple and U

ZhongdaoWang 300 Dec 24, 2022
Deep Learning for 3D Point Clouds: A Survey (IEEE TPAMI, 2020)

🔥Deep Learning for 3D Point Clouds (IEEE TPAMI, 2020)

Qingyong 1.4k Jan 08, 2023
joint detection and semantic segmentation, based on ultralytics/yolov5,

Multi YOLO V5——Detection and Semantic Segmentation Overeview This is my undergraduate graduation project which based on ultralytics YOLO V5 tag v5.0.

477 Jan 06, 2023
"SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image", Dejia Xu, Yifan Jiang, Peihao Wang, Zhiwen Fan, Humphrey Shi, Zhangyang Wang

SinNeRF: Training Neural Radiance Fields on Complex Scenes from a Single Image [Paper] [Website] Pipeline Code Environment pip install -r requirements

VITA 250 Jan 05, 2023
Ladder Variational Autoencoders (LVAE) in PyTorch

Ladder Variational Autoencoders (LVAE) PyTorch implementation of Ladder Variational Autoencoders (LVAE) [1]: where the variational distributions q at

Andrea Dittadi 63 Dec 22, 2022
Generate pixel-style avatars with python.

face2pixel Generate pixel-style avatars with python. Run: Clone the project: git clone https://github.com/theodorecooper/face2pixel install requiremen

Theodore Cooper 2 May 11, 2022
PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 2021

Neural Scene Flow Fields PyTorch implementation of paper "Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes", CVPR 20

Zhengqi Li 585 Jan 04, 2023
Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis in JAX

SYMPAIS: Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis Overview | Installation | Documentation | Examples | Notebo

Yicheng Luo 4 Sep 13, 2022
Natural Intelligence is still a pretty good idea.

Human Learn Machine Learning models should play by the rules, literally. Project Goal Back in the old days, it was common to write rule-based systems.

vincent d warmerdam 641 Dec 26, 2022
Code and Resources for the Transformer Encoder Reasoning Network (TERN)

Transformer Encoder Reasoning Network Code for the cross-modal visual-linguistic retrieval method from "Transformer Reasoning Network for Image-Text M

Nicola Messina 53 Dec 30, 2022
LaneDetectionAndLaneKeeping - Lane Detection And Lane Keeping

LaneDetectionAndLaneKeeping This project is part of my bachelor's thesis. The go

5 Jun 27, 2022
PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation (TPAMI).

PFENet This is the implementation of our paper PFENet: Prior Guided Feature Enrichment Network for Few-shot Segmentation that has been accepted to IEE

DV Lab 230 Dec 31, 2022