This is a pytorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector.

Overview

EAST: An Efficient and Accurate Scene Text Detector

Description:

This version will be updated soon, please pay attention to this work. The motivation of this version is to build a easy-training model. This version can automatically update best_model by comparing current hmean and the former. At the same time, we can see evaluation info about every sample easily.

  • 1.train
  • 2.predict
  • 3.compress
  • 4.compute Hmean(if Hmean is higher than before, update best_weight.pkl)
  • 5.visualization(blue, green, red)
  • 6.multi-scale test (update soon) multi-scale vis. (vis with score, scales)

Thanks

The version is ported from argman/EAST, from Tensorflow to Pytorch

Check On Website

If you have no confidence of the result of our program, you could use submit.zip to submit on website,then you can see result of every image.

Performance

  • right -- green || wrong -- red || miss -- blue visualization visualization

  • recall/precision/hmean for every test image hmean

Introduction

This is a pytorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector. The features are summarized blow:

  • Only RBOX part is implemented.
  • A fast Locality-Aware NMS in C++ provided by the paper's author.(g++/gcc version 6.0 + will be ok)
  • Evalution see here for the detailed results.
  • Differences from original paper
    • Use ResNet-50 rather than PVANET
    • Use dice loss (optimize IoU of segmentation) rather than balanced cross entropy
    • Use linear learning rate decay rather than staged learning rate decay

Thanks for the author's (@zxytim) help! Please cite his paper if you find this useful.

Contents

  1. Installation
  2. Download
  3. Prepare dataset/pretrain
  4. Test
  5. Train
  6. Examples

Installation

  1. Any version of pytorch version > 0.4.0 should be ok.

Download

  1. Pretrained model is not provided temporarily. Web site is updating now, please continue to pay attention

Prepare dataset/pretrain weight

[1]. dataset(you need to prepare for dataset for train and test) suggestions: you could do a soft-link to root_to_this_program/dataset/train/img/*.jpg

  • -- train ./dataset/train/img/img_###.jpg ./dataset/train/gt/img_###.txt (you need to change name)
  • -- test ./data/test/img_###.jpg (img only)
  • -- gt.zip ./result/gt.zip(ICDAR15 gt.zip is avaliable on website

** Note: you can download dataset here

[2]. pretrained

  • In config.py set resume True and set checkpoint path/to/weight/file
  • I will provide pretrianed weight soon

[3]. check GPUs and CPUs you can use following to check aviliable gpu, this is for train

watch -n 0.1 nvidia-smi

then, you will see 2,3 is avaliable, modify config.py gpu_ids = [0,1], gpu = 2, and modify run.sh - CUDA_VISIBLE_DEVICES=2,3

Train

If you want to train the model, you should provide the dataset path in config.py and run

sh run.py

** Note: you should modify run.sh to specify your gpu id

If you have more than one gpu, you can pass gpu ids to gpu_list(like gpu_list=0,1,2,3) in config.py

** Note: you should change the gt text file of icdar2015's filename to img_*.txt instead of gt_img_*.txt(or you can change the code in icdar.py), and some extra characters should be removed from the file. See the examples in training_samples/**

Test

By default, we set train-eval process into integer. If you want to use eval independently, you can do it by yourself. Any question can contact me.

Examples

Here are some test examples on icdar2015, enjoy the beautiful text boxes! image_1 image_2 image_3 image_4 image_5

Owner
Dejia Song
Computer Vision & Machine Learning
Dejia Song
Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd.

Head Detector Code for the head detector (HeadHunter) proposed in our CVPR 2021 paper Tracking Pedestrian Heads in Dense Crowd. The head_detection mod

Ramana Subramanyam 76 Dec 06, 2022
Visual Attention based OCR

Attention-OCR Authours: Qi Guo and Yuntian Deng Visual Attention based OCR. The model first runs a sliding CNN on the image (images are resized to hei

Yuntian Deng 1.1k Jan 02, 2023
A tool to make dumpy among us GIFS

Among Us Dumpy Gif Maker Made by ThatOneCalculator & Pixer415 With help from Telk, karl-police, and auguwu! Please credit this repository when you use

Kainoa Kanter 535 Jan 07, 2023
A Screen Translator/OCR Translator made by using Python and Tesseract, the user interface are made using Tkinter. All code written in python.

About An OCR translator tool. Made by me by utilizing Tesseract, compiled to .exe using pyinstaller. I made this program to learn more about python. I

Fauzan F A 41 Dec 30, 2022
Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"

TableNet Unofficial implementation of ICDAR 2019 paper : TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from

Jainam Shah 243 Dec 30, 2022
kaldi-asr/kaldi is the official location of the Kaldi project.

Kaldi Speech Recognition Toolkit To build the toolkit: see ./INSTALL. These instructions are valid for UNIX systems including various flavors of Linux

Kaldi 12.3k Jan 05, 2023
OpenCV-Erlang/Elixir bindings

evision [WIP] : OS : arch Build Status Ubuntu 20.04 arm64 Ubuntu 20.04 armv7 Ubuntu 20.04 s390x Ubuntu 20.04 ppc64le Ubuntu 20.04 x86_64 macOS 11 Big

Cocoa 194 Jan 05, 2023
Image augmentation for machine learning experiments.

imgaug This python library helps you with augmenting images for your machine learning projects. It converts a set of input images into a new, much lar

Alexander Jung 13.2k Jan 02, 2023
Table recognition inside douments using neural networks

TableTrainNet A simple project for training and testing table recognition in documents. This project was developed to make a neural network which reco

Giovanni Cavallin 93 Jul 24, 2022
(CVPR 2021) Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds

BRNet Introduction This is a release of the code of our paper Back-tracing Representative Points for Voting-based 3D Object Detection in Point Clouds,

86 Oct 05, 2022
Fusion 360 Add-in that creates a pair of toothed curves that can be used to split a body and create two pieces that slide and lock together.

Fusion-360-Add-In-PuzzleSpline Fusion 360 Add-in that creates a pair of toothed curves that can be used to split a body and create two pieces that sli

Michiel van Wessem 1 Nov 15, 2021
Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector

CRAFT: Character-Region Awareness For Text detection Packaged, Pytorch-based, easy to use, cross-platform version of the CRAFT text detector | Paper |

188 Dec 28, 2022
This is an API written in python that uses FastAPI. It is a simple API that can detect discord tokens in Images.

Welcome This is an API written in python that uses FastAPI. It is a simple API that can detect discord tokens in Images. Installation There are curren

8 Jul 29, 2022
Official implementation of Character Region Awareness for Text Detection (CRAFT)

CRAFT: Character-Region Awareness For Text detection Official Pytorch implementation of CRAFT text detector | Paper | Pretrained Model | Supplementary

Clova AI Research 2.5k Jan 03, 2023
A simple python program to record security cam footage by detecting a face and body of a person in the frame.

SecurityCam A simple python program to record security cam footage by detecting a face and body of a person in the frame. This code was created by me,

1 Nov 08, 2021
A bot that plays TFT using OCR. Keeps track of bench, board, items, and plays the user defined team comp.

NOTES: To ensure best results, make sure you are running this on a computer that has decent specs. 1920x1080 fullscreen is required in League, game mu

francis 125 Dec 30, 2022
Code for the ACL2021 paper "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction"

CSCBLI Code for our ACL Findings 2021 paper, "Combining Static Word Embedding and Contextual Representations for Bilingual Lexicon Induction". Require

Jinpeng Zhang 12 Oct 08, 2022
([email protected]) Boosting Co-teaching with Compression Regularization for Label Noise

Nested-Co-teaching ([email protected]) Pytorch implementation of paper "Boosting Co-tea

YINGYI CHEN 41 Jan 03, 2023
This is a pytorch re-implementation of EAST: An Efficient and Accurate Scene Text Detector.

EAST: An Efficient and Accurate Scene Text Detector Description: This version will be updated soon, please pay attention to this work. The motivation

Dejia Song 544 Dec 20, 2022
Bu uygulamada Python ve Opencv kullanarak bilgisayar kamerasından yüz tespiti yapıyoruz.

opencv_yuz_bulma Bu uygulamada Python ve Opencv kullanarak bilgisayar kamerasından yüz tespiti yapıyoruz. Bilgisarın kendi kamerasını kullanmak için;

Ahmet Haydar Ornek 6 Apr 16, 2022