text detection mainly based on ctpn model in tensorflow, id card detect, connectionist text proposal network

Overview

text-detection-ctpn

Scene text detection based on CTPN (Connectionist Text Proposal Network), implemented in TensorFlow. The original paper can be found here, and the original Caffe repo can be found here. For more detail about the paper and code, see this blog. If you have any questions, check the issues first; if the problem persists, open a new issue.


NOTICE: Thanks to banjin-xjy, banjin and I have reconstructed this repo. The old repo was based on Faster-RCNN and retained a lot of unused code and dependencies, which made it hard to understand and maintain, so we reconstructed it. The old code is saved in branch master.


roadmap

  • reconstruct the repo
  • cython nms and bbox utils
  • loss function as referred in paper
  • oriented text connector
  • BLSTM

setup

The NMS and bbox utils are written in Cython, so you have to build the library first.

cd utils/bbox
chmod +x make.sh
./make.sh

It will generate nms.so and bbox.so in the current folder.
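
A quick way to confirm that the build step produced both libraries is to look for them from Python. This is a minimal sketch, not part of the repo: the helper name find_built_extensions is hypothetical, and it assumes the compiled files may carry a platform tag (for example nms.cpython-36m-x86_64-linux-gnu.so) instead of the plain nms.so name.

```python
from pathlib import Path


def find_built_extensions(folder, names=("nms", "bbox")):
    """Map each expected module name to the compiled files found in `folder`."""
    folder = Path(folder)
    if not folder.is_dir():
        # Nothing has been built if the folder itself is missing.
        return {name: [] for name in names}
    found = {}
    for name in names:
        # Match both nms.so and platform-tagged names such as nms.cpython-*.so.
        found[name] = sorted(p.name for p in folder.glob(name + "*.so"))
    return found


if __name__ == "__main__":
    for name, files in find_built_extensions("utils/bbox").items():
        print(name, "->", files if files else "missing; rerun make.sh")
```

Run it from the repo root after make.sh finishes; an empty list for either name suggests the build did not complete.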


demo

  • follow setup to build the library
  • download the ckpt file from google drive or baidu yun
  • put checkpoints_mlt/ in text-detection-ctpn/
  • put your images in data/demo, then run the demo from the repo root; the results will be saved in data/res
python ./main/demo.py

training

prepare data

  • First, download the pre-trained VGG net model and put it in data/vgg_16.ckpt. You can download it from tensorflow/models.
  • Second, download the dataset we prepared from google drive or baidu yun. Put the downloaded data in data/dataset/mlt, then start the training.
  • Alternatively, you can prepare your own dataset with the following steps.
  • Modify DATA_FOLDER and OUTPUT in utils/prepare/split_label.py according to your dataset, then run split_label.py from the repo root:
python ./utils/prepare/split_label.py
  • It will generate the prepared data in data/dataset/.
  • A sample input file for split_label.py is gt_img_859.txt, and the corresponding output file is img_859.txt. A demo image of the prepared data is shown below.
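
CTPN works on fixed-width text proposals (16 pixels wide in the paper), so a preparation step of this kind has to cut each ground-truth text box into narrow vertical strips. The sketch below illustrates the idea for an axis-aligned box only. The function name split_box_into_strips and the inclusive-coordinate convention are assumptions made for illustration; they are not taken from split_label.py, which may handle rotated boxes and image resizing differently.

```python
def split_box_into_strips(xmin, ymin, xmax, ymax, stride=16):
    """Cut an axis-aligned box into strips aligned to a `stride`-pixel grid.

    Coordinates are inclusive pixel indices. Returns (x0, y0, x1, y1) tuples.
    """
    strips = []
    # Start at the grid column that contains xmin.
    grid_x = (xmin // stride) * stride
    while grid_x <= xmax:
        x0 = max(grid_x, xmin)               # clip the first strip to the box
        x1 = min(grid_x + stride - 1, xmax)  # clip the last strip to the box
        strips.append((x0, ymin, x1, ymax))
        grid_x += stride
    return strips


if __name__ == "__main__":
    for strip in split_box_into_strips(10, 5, 50, 20):
        print(strip)
```

Aligning strips to a fixed grid, rather than starting at xmin, keeps them consistent with anchors placed every 16 pixels on the feature map.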


train

Simply run

python ./main/train.py
  • The model provided in checkpoints_mlt was trained on a GTX1070 for 50k iters. It takes about 0.25s per iter, so 50k iterations take about 3.5 hours.
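
The estimate above is plain arithmetic, and the same calculation can be reused for other hardware or iteration counts. In this small sketch the helper name training_hours is hypothetical, and the 0.25 s per iteration figure is the one quoted for the GTX1070.

```python
def training_hours(iterations, seconds_per_iter):
    """Estimated wall-clock training time in hours."""
    return iterations * seconds_per_iter / 3600.0


if __name__ == "__main__":
    # 50000 iters * 0.25 s = 12500 s
    print("{:.2f} hours".format(training_hours(50000, 0.25)))
```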

some results

NOTICE: all the photos used below are collected from the internet. If it affects you, please contact me to delete them.


oriented text connector

  • The oriented text connector has been implemented. It works, but still needs further improvement.
  • The left figure is the result for DETECT_MODE H, the right figure for DETECT_MODE O.

Comments
  • How to export model for Tensorflow Serving?

    Following some tutorials on exporting models for TensorFlow Serving, I came up with the trial below:

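    import os

    import tensorflow as tf

    # Repo-local imports, as used by ctpn/demo.py in the old code layout
    from lib.fast_rcnn.config import cfg, cfg_from_file
    from lib.networks.factory import get_network

    cfg_from_file('ctpn/text.yml')
    config = tf.ConfigProto(allow_soft_placement=True)
    with tf.Session(config=config) as sess:
            net = get_network("VGGnet_test")
    
            saver = tf.train.Saver()
            # get_checkpoint_state returns None when no checkpoint is found
            ckpt = tf.train.get_checkpoint_state(cfg.TEST.checkpoints_path)
            if ckpt is None or not ckpt.model_checkpoint_path:
                raise IOError('Missing pre-trained model in: {}'.format(cfg.TEST.checkpoints_path))
            saver.restore(sess, ckpt.model_checkpoint_path)
    
            # The main export trial is here
            #############################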
            export_path = os.path.join(
                tf.compat.as_bytes('/tmp/ctpn'),
                tf.compat.as_bytes(str(1)))
            builder = tf.saved_model.builder.SavedModelBuilder(export_path)
    
            freezing_graph = sess.graph
            prediction_signature = tf.saved_model.signature_def_utils.predict_signature_def(
                inputs={'input': freezing_graph.get_tensor_by_name('Placeholder:0')},
                outputs={'output': freezing_graph.get_tensor_by_name('Placeholder_1:0')}
            )
    
            builder.add_meta_graph_and_variables(
                sess,
                [tf.saved_model.tag_constants.SERVING],
                signature_def_map={
                    tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: prediction_signature
                },
                clear_devices=True)
    
            builder.save()
            print('[INFO] Export SavedModel into {}'.format(export_path))
            #############################
    

    The output for freezing_graph is:

    Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32)
    Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("rpn_cls_score/Reshape_1:0", shape=(?, ?, ?, 20), dtype=float32)
    Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32)
    Tensor("Reshape_2:0", shape=(?, ?, ?, 20), dtype=float32)
    Tensor("rpn_bbox_pred/Reshape_1:0", shape=(?, ?, ?, 40), dtype=float32)
    Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32)
    

    After exporting the model to /tmp/ctpn/1, I tried to load it into the TensorFlow Serving server:

    tensorflow_model_server --port=9000 --model_name=ctpn --model_base_path=/tmp/ctpn
    

    But it produced an error:

    Loading servable: {name: detector version: 1} failed: Not found: Op type not registered 'PyFunc' in binary running on [...]. Make sure the Op and Kernel are registered in the binary running in this process.
    

    So there are 2 questions:

    • Am I right about the inputs (Placeholder:0) and the outputs (Placeholder_1:0) of prediction_signature?
    • Where am I missing PyFunc?
    opened by hiepph 20
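
On the PyFunc error in the issue above: the op type 'PyFunc' is what tf.py_func creates, and the tracebacks in other issues on this page show a node named rois/PyFunc built by proposal_layer. The Python function behind such an op is not serialized with the graph, so a standalone serving binary has nothing to run for it. A minimal sketch for locating these ops before exporting follows. The helper name find_ops_of_type is hypothetical, and it only relies on each operation exposing .name and .type, as the objects returned by sess.graph.get_operations() do.

```python
from collections import namedtuple


def find_ops_of_type(operations, op_type="PyFunc"):
    """Return the names of all operations whose `.type` equals `op_type`.

    `operations` is any iterable of objects exposing `.name` and `.type`,
    for example the list returned by `sess.graph.get_operations()`.
    """
    return [op.name for op in operations if op.type == op_type]


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without TensorFlow installed.
    FakeOp = namedtuple("FakeOp", ["name", "type"])
    ops = [
        FakeOp("Placeholder", "Placeholder"),
        FakeOp("rois/PyFunc", "PyFunc"),
        FakeOp("rpn_cls_prob", "Reshape"),
    ]
    print(find_ops_of_type(ops))
```

With a real session, passing sess.graph.get_operations() lists every node that would have to be replaced by native TensorFlow ops, or moved into client-side post-processing, before the graph can be served.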
  • Excuse me, do you have any tricks for training the model? Why does it detect nothing after I train on the MLT dataset with default parameters for 50000 iters?

    Excuse me, do you have any tricks for training the model? I trained on the MLT dataset with the default parameters, and after 50000 iters it detects nothing. Why?

    opened by cjt222 15
  • BiLSTM and Training Time

    Thanks for sharing your implementation with us. I have implemented CTPN with Caffe, but it failed to converge when I added the LSTM. First, have you added the BiLSTM in your code? I am new to TensorFlow, and after looking at the code I think you implemented only an LSTM, not a BiLSTM. Is that right? Second, how long did you train your model? I ran your training script on a GPU, and it seems it would take 5-6 days to finish the first 180000 iterations.

    Thanks very much.

    opened by TaoDream 14
  • I try to change code from python2 to python3

    I am trying to port the code from Python 2 to Python 3. After fixing all the errors I found, running the code raises the error below. I do not know where it comes from or how to fix it. What can I do? Thank you so much.

    2017-10-26 14:07:03.163176: W tensorflow/core/framework/op_kernel.cc:1158] Unknown: KeyError: b'TEST'
    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
        return fn(*args)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
        status, run_metadata)
      File "/usr/local/lib/python3.6/contextlib.py", line 88, in __exit__
        next(self.gen)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
        pywrap_tensorflow.TF_GetCode(status))
    tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST'
      [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5/_75, rpn_bbox_pred/Reshape/_77, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]
      [[Node: rois/PyFunc/_79 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_290_rois/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "ctpn/demo.py", line 95, in <module>
        _, _ = test_ctpn(sess, net, im)
      File "/root/chengjuntao/text-detection-ctpn/lib/fast_rcnn/test.py", line 171, in test_ctpn
        rois = sess.run([net.get_output('rois')[0]],feed_dict=feed_dict)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
        run_metadata_ptr)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
        feed_dict_string, options, run_metadata)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
        target_list, options, run_metadata)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST'
      [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5/_75, rpn_bbox_pred/Reshape/_77, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]
      [[Node: rois/PyFunc/_79 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_290_rois/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

    Caused by op 'rois/PyFunc', defined at:
      File "ctpn/demo.py", line 85, in <module>
        net = get_network("VGGnet_test")
      File "/root/chengjuntao/text-detection-ctpn/lib/networks/factory.py", line 20, in get_network
        return VGGnet_test()
      File "/root/chengjuntao/text-detection-ctpn/lib/networks/VGGnet_test.py", line 14, in __init__
        self.setup()
      File "/root/chengjuntao/text-detection-ctpn/lib/networks/VGGnet_test.py", line 68, in setup
        .proposal_layer(_feat_stride, anchor_scales, 'TEST', name='rois'))
      File "/root/chengjuntao/text-detection-ctpn/lib/networks/network.py", line 28, in layer_decorated
        layer_output = op(self, layer_input, *args, **kwargs)
      File "/root/chengjuntao/text-detection-ctpn/lib/networks/network.py", line 241, in proposal_layer
        [tf.float32,tf.float32])
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 198, in py_func
        input=inp, token=token, Tout=Tout, name=name)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_script_ops.py", line 38, in _py_func
        name=name)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
        self._traceback = _extract_stack()

    UnknownError (see above for traceback): KeyError: b'TEST'
      [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_5/_75, rpn_bbox_pred/Reshape/_77, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]
      [[Node: rois/PyFunc/_79 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/gpu:0", send_device="/job:localhost/replica:0/task:0/cpu:0", send_device_incarnation=1, tensor_name="edge_290_rois/PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"]()]]

    opened by cjt222 14
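
The KeyError: b'TEST' in the issue above points at a bytes-versus-str mismatch. The rois/PyFunc node takes a DT_STRING input, and under Python 3 a string tensor reaches the wrapped Python function as bytes, so a config lookup keyed by the str 'TEST' fails. A minimal sketch of the usual remedy is to decode the key before the lookup. The helper name normalize_cfg_key and the stand-in cfg dictionary are hypothetical; where exactly the key is read inside the repo's proposal layer is not shown in the traceback.

```python
def normalize_cfg_key(cfg_key):
    """Return `cfg_key` as str, decoding it first if it arrived as bytes."""
    if isinstance(cfg_key, bytes):
        return cfg_key.decode("utf-8")
    return cfg_key


if __name__ == "__main__":
    # Hypothetical stand-in for the repo's config object.
    cfg = {"TEST": {"example_option": 1}}
    key_from_py_func = b"TEST"  # what Python 3 receives for a DT_STRING input
    print(cfg[normalize_cfg_key(key_from_py_func)])
```

Under Python 2 the key is already a str, so the helper returns it unchanged and the same code runs on both versions.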
  • ModuleNotFoundError: No module named 'lib'

    When I try to execute demo.py, I receive this error:

      File "demo.py", line 9, in <module>
        from lib.networks.factory import get_network
    ModuleNotFoundError: No module named 'lib'

    Can someone please help me with this?

    opened by SuryaprakashNSM 12
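
For the missing 'lib' module in the issue above: the tracebacks on this page show lib/ sitting at the repo root next to ctpn/, so the import only resolves when the repo root is on sys.path, for example when the script is started from the root as python ./ctpn/demo.py. The sketch below adds the root explicitly. The helper name add_repo_root_to_path is hypothetical, and it assumes the script lives one folder below the repo root.

```python
import os
import sys


def add_repo_root_to_path(script_path, levels_up=1):
    """Put the directory `levels_up` above the script's folder on sys.path."""
    root = os.path.dirname(os.path.abspath(script_path))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


if __name__ == "__main__":
    # For <repo>/ctpn/demo.py this yields <repo>, the folder that contains lib/.
    print(add_repo_root_to_path(os.path.join("text-detection-ctpn", "ctpn", "demo.py")))
```

Inside demo.py itself the call would pass __file__ as script_path, placed before the from lib... imports.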
  • AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'

      File "/home/hjq/PycharmProjects/Tensorflow-OCR/text-detection-ctpn-master/ctpn/demo.py", line 103, in <module>
        raise 'Check your pretrained {:s}'.format(ckpt.model_checkpoint_path)
    AttributeError: 'NoneType' object has no attribute 'model_checkpoint_path'

    opened by seawater668 11
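
The AttributeError in the issue above arises because tf.train.get_checkpoint_state returns None when the directory it is given has no checkpoint index file (a small text file literally named checkpoint), and the error-reporting line then dereferences that None. A minimal pre-flight check, with the hypothetical helper name checkpoint_index_exists, could look like this:

```python
import os


def checkpoint_index_exists(checkpoint_dir):
    """True if `checkpoint_dir` holds the `checkpoint` index file TensorFlow reads."""
    return os.path.isfile(os.path.join(checkpoint_dir, "checkpoint"))


if __name__ == "__main__":
    # Directory name taken from the demo instructions in this README.
    folder = "checkpoints_mlt"
    if not checkpoint_index_exists(folder):
        print("No checkpoint index in {!r}; download the ckpt files first.".format(folder))
```

Calling it before get_checkpoint_state turns the confusing AttributeError into a clear message about the missing download.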
  • After CTPN. What is your idea?

    Hello, eragonruan! I was very impressed with your code. Once again, thank you so much.

    I have a problem. When the image passes through the CTPN, a green box is drawn on it. If I want to feed this image to the OCR engine, how should I separate out the region marked by the green box? My idea is to use OpenCV, but the green box is drawn on top of the text, so it hides the text. Can I draw a thinner line to solve the problem?

    opened by rudebono 10
  • win10 on cpu encounter KeyError: b'TEST' error

    @eragonruan My environment is win10 + CPU + tensorflow 1.3. I have spent a lot of time on this and I'm still confused. Please help me in your spare time. Thanks!

    Here is the error:

    2018-07-26 11:11:58.723047: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
    2018-07-26 11:11:58.728604: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
    Tensor("Placeholder:0", shape=(?, ?, ?, 3), dtype=float32)
    Tensor("conv5_3/conv5_3:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("rpn_conv/3x3/rpn_conv/3x3:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("lstm_o/Reshape_2:0", shape=(?, ?, ?, 512), dtype=float32)
    Tensor("rpn_cls_score/Reshape_1:0", shape=(?, ?, ?, 20), dtype=float32)
    Tensor("rpn_cls_prob:0", shape=(?, ?, ?, ?), dtype=float32)
    Tensor("Reshape_2:0", shape=(?, ?, ?, 20), dtype=float32)
    Tensor("rpn_bbox_pred/Reshape_1:0", shape=(?, ?, ?, 40), dtype=float32)
    Tensor("Placeholder_1:0", shape=(?, 3), dtype=float32)
    Loading network VGGnet_test... Restoring from checkpoints/VGGnet_fast_rcnn_iter_50000.ckpt...
    done
    2018-07-26 11:12:09.006369: W C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\framework\op_kernel.cc:1192] Unknown: KeyError: b'TEST'
    Traceback (most recent call last):
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call
        return fn(*args)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1306, in _run_fn
        status, run_metadata)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\contextlib.py", line 88, in __exit__
        next(self.gen)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
        pywrap_tensorflow.TF_GetCode(status))
    tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST'
      [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_2, rpn_bbox_pred/Reshape_1, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "./ctpn/demo.py", line 97, in <module>
        _, _ = test_ctpn(sess, net, im)
      File "D:\workspace\text-detection-ctpn-master\lib\fast_rcnn\test.py", line 51, in test_ctpn
        rois = sess.run([net.get_output('rois')[0]],feed_dict=feed_dict)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 895, in run
        run_metadata_ptr)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1124, in _run
        feed_dict_tensor, options, run_metadata)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run
        options, run_metadata)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call
        raise type(e)(node_def, op, message)
    tensorflow.python.framework.errors_impl.UnknownError: KeyError: b'TEST'
      [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_2, rpn_bbox_pred/Reshape_1, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]

    Caused by op 'rois/PyFunc', defined at:
      File "./ctpn/demo.py", line 82, in <module>
        net = get_network("VGGnet_test")
      File "D:\workspace\text-detection-ctpn-master\lib\networks\factory.py", line 8, in get_network
        return VGGnet_test()
      File "D:\workspace\text-detection-ctpn-master\lib\networks\VGGnet_test.py", line 15, in __init__
        self.setup()
      File "D:\workspace\text-detection-ctpn-master\lib\networks\VGGnet_test.py", line 56, in setup
        .proposal_layer(_feat_stride, anchor_scales, 'TEST', name='rois'))
      File "D:\workspace\text-detection-ctpn-master\lib\networks\network.py", line 21, in layer_decorated
        layer_output = op(self, layer_input, *args, **kwargs)
      File "D:\workspace\text-detection-ctpn-master\lib\networks\network.py", line 215, in proposal_layer
        [tf.float32,tf.float32])
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\ops\script_ops.py", line 203, in py_func
        input=inp, token=token, Tout=Tout, name=name)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\ops\gen_script_ops.py", line 36, in _py_func
        name=name)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
        op_def=op_def)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\ops.py", line 2630, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "C:\Users\d00455280.CHINA\AppData\Local\Continuum\Anaconda3\envs\ocr\lib\site-packages\tensorflow\python\framework\ops.py", line 1204, in __init__
        self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

    UnknownError (see above for traceback): KeyError: b'TEST'
      [[Node: rois/PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT, DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/cpu:0"](Reshape_2, rpn_bbox_pred/Reshape_1, _arg_Placeholder_1_0_1, rois/PyFunc/input_3, rois/PyFunc/input_4, rois/PyFunc/input_5)]]

    opened by Crocodiles 9
  • UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.

    When I try to train on my own dataset, I get this core dump:

    Computing bounding-box regression targets...
    bbox target means:
    [[ 0. 0. 0. 0.]
     [ 0. 0. 0. 0.]]
    [ 0. 0. 0. 0.]
    bbox target stdevs:
    [[ 0.1 0.1 0.2 0.2]
     [ 0.1 0.1 0.2 0.2]]
    [ 0.1 0.1 0.2 0.2]
    Normalizing targets done
    Solving...
    /data/resys/var/python2.7.3/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:91: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
      "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
    Segmentation fault (core dumped)

    The fault is raised in the function train_model() in lib/fast_rcnn/train.py, when it runs train_op = opt.apply_gradients(list(zip(grads, tvars)), global_step=global_step).

    Have you ever faced this error?

    opened by louisly 8
  • WHERE ARE gt_img_1001.txt ... gt_img_6000.txt FILES?!

    I cloned the project, downloaded the pretrained model and the training data, and unzipped them, which gave me a folder named TEXTVOC containing other folders (Annotations, ImagesSets and JPEGImages). I placed TEXTVOC in the data folder and edited the paths in split_label.py as follows: path = '/home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/JPEGImages' and gt_path = '/home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/Annotations'. But when I ran python lib/prepare_training_data/split_label.py, I got this error:

    /home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/JPEGImages/img_1001.jpg
    Traceback (most recent call last):
      File "lib/prepare_training_data/split_label.py", line 34, in <module>
        with open(gt_file, 'r') as f:
    FileNotFoundError: [Errno 2] No such file or directory: '/home/hani/Desktop/text-detection-ctpn/data/TEXTVOC/Annotations/gt_img_1001.txt'
    

    As shown, gt_img_1001.txt is missing, which means either gt_path is wrong or those files do not exist. I looked into all the folders and didn't find any file starting with gt_*. PS: I also ran this command: ln -s TEXTVOC VOCdevkit2007, so I didn't miss any required step.

    opened by maky-hnou 7
  • how to run the train scripts

    split_label.py contains two paths, path = '/media/D/code/OCR/text-detection-ctpn/data/mlt_english+chinese/image' and gt_path = '/media/D/code/OCR/text-detection-ctpn/data/mlt_english+chinese/label', and the code reads the files under each of them. What kind of files should be placed in ....../label? Thanks.

    opened by jibadallz 7
  • Bump tensorflow-gpu from 1.4.0 to 2.9.3

    Bumps tensorflow-gpu from 1.4.0 to 2.9.3.

    Release notes

    Sourced from tensorflow-gpu's releases.

    TensorFlow 2.9.3

    Release 2.9.3

    This release introduces several vulnerability fixes:

    TensorFlow 2.9.2

    Release 2.9.2

    This releases introduces several vulnerability fixes:

    ... (truncated)

    Changelog

    Sourced from tensorflow-gpu's changelog.

    Release 2.9.3

    This release introduces several vulnerability fixes:

    Release 2.8.4

    This release introduces several vulnerability fixes:

    ... (truncated)

    Commits
    • a5ed5f3 Merge pull request #58584 from tensorflow/vinila21-patch-2
    • 258f9a1 Update py_func.cc
    • cd27cfb Merge pull request #58580 from tensorflow-jenkins/version-numbers-2.9.3-24474
    • 3e75385 Update version numbers to 2.9.3
    • bc72c39 Merge pull request #58482 from tensorflow-jenkins/relnotes-2.9.3-25695
    • 3506c90 Update RELEASE.md
    • 8dcb48e Update RELEASE.md
    • 4f34ec8 Merge pull request #58576 from pak-laura/c2.99f03a9d3bafe902c1e6beb105b2f2417...
    • 6fc67e4 Replace CHECK with returning an InternalError on failing to create python tuple
    • 5dbe90a Merge pull request #58570 from tensorflow/r2.9-7b174a0f2e4
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump numpy from 1.14.2 to 1.22.0

    Bumps numpy from 1.14.2 to 1.22.0.

    Release notes

    Sourced from numpy's releases.

    v1.22.0

    NumPy 1.22.0 Release Notes

    NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

    • Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
    • A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
    • NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
    • New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.
    • A new configurable allocator for use by downstream projects.

    These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

    The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

    Expired deprecations

    Deprecated numeric style dtype strings have been removed

    Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

    (gh-19539)

    Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

    numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

    (gh-19615)

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 0
  • Model Quantization

    The pretrained model works well and detects text with high accuracy. However, inference on CPU takes a long time, which is not acceptable for production deployment. Is there a way to quantize the model?

    What I've tried is to convert the checkpoint to the saved_model format, then load the saved_model and quantize it with the TFLite converter. The code snippet is as follows:

    # Load checkpoint and convert to saved_model
    import tensorflow as tf

    trained_checkpoint_prefix = "checkpoints_mlt/ctpn_50000.ckpt"
    export_dir = "exported_model"
    
    graph = tf.Graph()
    with tf.compat.v1.Session(graph=graph) as sess:
        # Restore from checkpoint
        loader = tf.compat.v1.train.import_meta_graph(trained_checkpoint_prefix + ".meta")
        loader.restore(sess, trained_checkpoint_prefix)
    
        # Export checkpoint to SavedModel while the session is still open
        builder = tf.compat.v1.saved_model.builder.SavedModelBuilder(export_dir)
        builder.add_meta_graph_and_variables(sess,
                                             [tf.saved_model.TRAINING, tf.saved_model.SERVING],
                                             strip_default_attrs=True)
        builder.save()
    

    As a result, I got a .pb file and a variables folder with checkpoint and index files inside. An error was then raised when I tried to perform quantization:

    converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8  # or tf.uint8
    converter.inference_output_type = tf.int8  # or tf.uint8
    tflite_quant_model = converter.convert()
    

    This is the error:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-4-03205673177f> in <module>
         11 converter.inference_input_type = tf.int8  # or tf.uint8
         12 converter.inference_output_type = tf.int8  # or tf.uint8
    ---> 13 tflite_quant_model = converter.convert()
    
    ~/virtualenvironment/tf2/lib/python3.6/site-packages/tensorflow/lite/python/lite.py in convert(self)
        450     # TODO(b/130297984): Add support for converting multiple function.
        451     if len(self._funcs) != 1:
    --> 452       raise ValueError("This converter can only convert a single "
        453                        "ConcreteFunction. Converting multiple functions is "
        454                        "under development.")
    
    ValueError: This converter can only convert a single ConcreteFunction. Converting multiple functions is under development.
    

    My understanding is that this error is raised because the model requires multiple inputs, input_image and input_im_info. I would appreciate it if anyone could help.
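For readers unfamiliar with what the int8 settings in the converter snippet request, here is a minimal pure-Python sketch of per-tensor affine quantization, the mapping real_value = (int8_value - zero_point) * scale used by TFLite's 8-bit quantization scheme. It is only an illustration of the arithmetic, not the converter's implementation, and it does not address the ValueError above. The function names choose_qparams, quantize and dequantize and the sample values are hypothetical.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # representable range of a signed 8-bit integer


def choose_qparams(values: List[float]) -> Tuple[float, int]:
    """Pick (scale, zero_point) so the observed value range maps onto int8."""
    # Extend the range to include 0.0 so that zero is represented exactly.
    lo = min(0.0, min(values))
    hi = max(0.0, max(values))
    if hi == lo:
        return 1.0, 0  # degenerate case: every value is zero
    scale = (hi - lo) / (QMAX - QMIN)
    zero_point = int(round(QMIN - lo / scale))
    zero_point = max(QMIN, min(QMAX, zero_point))
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Map floats to int8 codes, clamping to the representable range."""
    return [max(QMIN, min(QMAX, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floats: real_value = (code - zero_point) * scale."""
    return [(c - zero_point) * scale for c in codes]


# Hypothetical activation values, used only to exercise the functions.
activations = [-1.0, 0.0, 0.5, 2.0]
scale, zero_point = choose_qparams(activations)
codes = quantize(activations, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
print(codes, restored)
```

Each restored value differs from the original by at most about half of scale, which is the precision traded away for the smaller 8-bit representation.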

    opened by leeshien 0
  • Bump ipython from 5.1.0 to 7.16.3

    Bump ipython from 5.1.0 to 7.16.3

    Bumps ipython from 5.1.0 to 7.16.3.

    dependencies 
    opened by dependabot[bot] 0
Releases(untagged-48d74c6337a71b6b5f87)
Owner
Shaohui Ruan
Interested in machine learning & computer vision