TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network.

Last update: Nov 11, 2022

Overview

FOTS: Fast Oriented Text Spotting with a Unified Network

I am still working on this repo. Updates and detailed instructions are coming soon!

Table of Contents

TensorFlow Versions
Other Requirements
Trained Models
Datasets
Train
- Pre-train with SynthText
- Finetune with ICDAR 2015, ICDAR 2017 MLT or ICDAR 2013
Test
References

TensorFlow Versions

So far, the pre-training code has been tested on TensorFlow 1.12, 1.14 and 1.15. I may implement a 2.x version in the future.

Other Requirements

GCC >= 6

Trained Models

Temporary pre-trained model
Trained model coming soon

Datasets

Pre-training
- Synth800k (the dataset is available only for non-commercial research and educational purposes)
Fine-tuning
- ICDAR 2015, ICDAR 2017 MLT, ICDAR 2013

Train

Pre-train with SynthText

Download the pre-trained ResNet-50 checkpoint from the TensorFlow-Slim image classification model library page and place it in the ckpt/resnet_v1_50/ directory.

cd ckpt/resnet_v1_50
wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
tar -zxvf resnet_v1_50_2016_08_28.tar.gz
rm resnet_v1_50_2016_08_28.tar.gz

Download the Synth800k dataset and place it in the data/SynthText/ directory to pre-train the whole network.
Transform (pre-process) the SynthText data into the ICDAR data format.

python data_provider/SynthText2ICDAR.py
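
For orientation, the ICDAR 2015 ground-truth format lists one text instance per line: the eight coordinates of the four corner points, followed by the transcription. The sketch below shows how a single word box could be serialized into that form. It is not the code in data_provider/SynthText2ICDAR.py; the function name to_icdar_line and the sample coordinates are made up for illustration.

```python
def to_icdar_line(quad, text):
    """Serialize one word box as an ICDAR 2015-style ground-truth line.

    quad: four (x, y) corner points of the word's bounding quadrilateral.
    text: the word's transcription.
    """
    if len(quad) != 4:
        raise ValueError("expected exactly four corner points")
    # The ground-truth files use integer pixel coordinates, so round first.
    coords = [str(int(round(v))) for point in quad for v in point]
    return ",".join(coords + [text])


# Hypothetical word box, as might come from a SynthText annotation.
quad = [(10.2, 20.7), (110.0, 22.1), (109.6, 58.4), (9.8, 57.0)]
print(to_icdar_line(quad, "hello"))  # 10,21,110,22,110,58,10,57,hello
```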

Train with SynthText for 10 epochs (with 1 GPU).

python train.py \
  --max_steps=715625 \
  --gpu_list='0' \
  --checkpoint_path=ckpt/synthText_10eps/ \
  --pretrained_model_path=ckpt/resnet_v1_50/resnet_v1_50.ckpt \
  --training_img_data_dir=data/SynthText/ \
  --training_gt_data_dir=data/SynthText/ \
  --icdar=False
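
The --max_steps value can be reproduced from the epoch count, assuming the full SynthText set of 858,750 images and a batch size of 12 on a single GPU. The batch size is inferred from the numbers rather than stated in this README, so check the default in train.py before relying on it. The helper name steps_for_epochs is hypothetical.

```python
def steps_for_epochs(num_images, batch_size, epochs):
    """Number of training steps needed to pass over the dataset `epochs` times."""
    total_samples = num_images * epochs
    # Integer ceiling division: a partial final batch still costs one step.
    return (total_samples + batch_size - 1) // batch_size


SYNTHTEXT_IMAGES = 858_750  # assumed size of SynthText (Synth800k); verify locally
BATCH_SIZE = 12             # assumed; inferred from max_steps, not stated in the README

print(steps_for_epochs(SYNTHTEXT_IMAGES, BATCH_SIZE, epochs=10))  # 715625
```

With a different batch size or a subset of the data, the same function gives the value to pass as --max_steps.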

Visualize pre-training progress with TensorBoard.

tensorboard --logdir=ckpt/synthText_10eps/

Finetune with ICDAR 2015, ICDAR 2017 MLT or ICDAR 2013

(If you are using the pre-trained model, place all of its files in ckpt/synthText_10eps/.)

Combine ICDAR data before training.
1. Place the ICDAR data under the tmp/ folder.
2. Run the following script to combine the data.
```
python combine_ICDAR_data.py --year [year of ICDAR to train(13 or 15 or 17)]
```

ICDAR 2017 MLT/pre-finetune for ICDAR 2013 or ICDAR 2015 (text detection task only)

Train the pre-trained model with 9,000 images from the ICDAR 2017 MLT training and validation datasets (with 1 GPU).

python train.py \
  --gpu_list='0' \
  --checkpoint_path=ckpt/ICDAR17MLT/ \
  --pretrained_model_path=ckpt/synthText_10eps/ \
  --train_stage=0 \
  --training_img_data_dir=data/ICDAR17MLT/imgs/ \
  --training_gt_data_dir=data/ICDAR17MLT/gts/

ICDAR 2015

Train the model with 1,000 images from the ICDAR 2015 training dataset and 229 images from the ICDAR 2013 training dataset (with 1 GPU).

python train.py \
  --gpu_list='0' \
  --checkpoint_path=ckpt/ICDAR15/ \
  --pretrained_model_path=ckpt/ICDAR17MLT/ \
  --training_img_data_dir=data/ICDAR15+13/imgs/ \
  --training_gt_data_dir=data/ICDAR15+13/gts/

ICDAR 2013 (horizontal text only)

Train the model with 229 images from the ICDAR 2013 training dataset (with 1 GPU).

python train.py \
  --gpu_list='0' \
  --checkpoint_path=ckpt/ICDAR13/ \
  --pretrained_model_path=ckpt/ICDAR17MLT/ \
  --training_img_data_dir=data/ICDAR13/imgs/ \
  --training_gt_data_dir=data/ICDAR13/gts/
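
The fine-tuning commands above form a chain: each stage starts from the checkpoint directory written by an earlier one. The sketch below only assembles the train.py argument lists from the paths and flags shown in this section, so that the dependency between stages is explicit. The STAGES table and the build_command helper are illustrative names, not part of the repository.

```python
# stage name -> (checkpoint dir to write, checkpoint dir to start from,
#                data dir, extra flags); all values copied from the commands above.
STAGES = {
    "ICDAR17MLT": ("ckpt/ICDAR17MLT/", "ckpt/synthText_10eps/",
                   "data/ICDAR17MLT/", ["--train_stage=0"]),
    "ICDAR15": ("ckpt/ICDAR15/", "ckpt/ICDAR17MLT/", "data/ICDAR15+13/", []),
    "ICDAR13": ("ckpt/ICDAR13/", "ckpt/ICDAR17MLT/", "data/ICDAR13/", []),
}


def build_command(stage, gpu_list="0"):
    """Return the train.py argument list for one fine-tuning stage."""
    checkpoint, pretrained, data_dir, extra = STAGES[stage]
    return [
        "python", "train.py",
        f"--gpu_list={gpu_list}",
        f"--checkpoint_path={checkpoint}",
        f"--pretrained_model_path={pretrained}",
        *extra,
        f"--training_img_data_dir={data_dir}imgs/",
        f"--training_gt_data_dir={data_dir}gts/",
    ]


# Print the two commands needed to reach an ICDAR 2015 model.
for stage in ("ICDAR17MLT", "ICDAR15"):
    print(" ".join(build_command(stage)))
```

Passing a returned list to subprocess.run(cmd, check=True) would launch that stage; printing it first makes it easy to confirm the paths before a long training run.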

Test

Place some images in the test_imgs/ directory and specify a trained checkpoint path to see the test results.

python test.py --test_data_path test_imgs/ --checkpoint_path [checkpoint path]

Owner

Masao Taketani