Line based ATR Engine based on OCRopy

Overview

An OCR engine based on OCRopy and Kraken, written for Python 3. It is designed both to be easy to use from the command line and to be modular, so that it can be integrated into and customized from other Python scripts.

Pretrained model repository

Pretrained models are available at https://github.com/Calamari-OCR/calamari_models. The current release can be accessed here (336 MB).

Installing

Installation using Pip

The suggested method is to install calamari into a virtual environment using pip:

virtualenv -p python3 PATH_TO_VENV_DIR  # e. g. virtualenv calamari_venv
source PATH_TO_VENV_DIR/bin/activate
pip install calamari_ocr

which will install calamari and all of its dependencies.

To install the package without a virtual environment simply run

pip install calamari_ocr

To install the package from its source, download the source code and run

python setup.py install

Installation using Conda

Run

conda env create -f environment_master_gpu.yml

Alternatively you can install the cpu versions or the current dev version instead of the stable master.

Command line interface (Standard User)

If you simply want to apply existing models to your text lines and optionally train new models, you should probably use the command line interface of calamari, which is very similar to that of OCRopy.

Note that if you installed calamari into a virtual environment, you have to activate that environment to make the command line scripts available.

Prediction of a page

Currently only OCR on lines is supported. For segmenting pages into lines (and for the preceding preprocessing steps), we refer to the solutions provided by OCRopus, Kraken, Tesseract, etc. For users (especially less technical ones) in need of an all-in-one package, OCR4all might be worth a look.

The prediction step, which is the core feature of calamari, uses deep neural networks implemented in Tensorflow. Run it with:

calamari-predict --checkpoint path_to_model.ckpt --files your_images.*.png

Calamari also supports several voting algorithms that combine the predictions of different models into an improved result. To enable voting, simply pass several models to the --checkpoint argument:

calamari-predict --checkpoint path_to_model_1.ckpt path_to_model_2.ckpt ... --files your_images.*.png

The voting algorithm can be changed with the --voter flag. Possible values are: confidence_voter_default_ctc (default), sequence_voter. Note that confidence voters depend on the loss function used for training a model, while the sequence voter can be used for all models but might yield slightly worse results.
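To illustrate the general idea of combining the output of several models, here is a deliberately simplified sketch: a majority vote over whole predicted lines. This is not one of calamari's voters (which, as described above, work with confidences or sequences); the function name `majority_vote` and the sample strings are made up for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most frequent whole-line prediction.

    Ties are resolved in favor of the model listed first.
    This is a simplification, not calamari's voting algorithm.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for text in predictions:
        if counts[text] == best:
            return text


# Hypothetical predictions of three models for the same line image
votes = ["The quick brown fox", "The quick brovvn fox", "The quick brown fox"]
print(majority_vote(votes))
```

A whole-line vote only helps when at least two models agree on the complete line, which is one reason finer-grained voters are preferable in practice.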

Training of a model

In calamari you can either train a single model on a given data set or train a fold of several (default 5) models to generate different voters for a voted prediction.

Training a single model

A single model can be trained with the calamari-train script. Given a data set with its ground truth, you can train the default model by calling:

calamari-train --files your_images.*.png

Note that calamari expects each image file (.png) to have a corresponding ground truth text file (.gt.txt) at the same location with the same base name.
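Before starting a training run it can be useful to check that every image really has its ground truth file. The following is a minimal sketch, assuming that the "base name" is the part of the file name before the first dot (so that `0001.bin.png` pairs with `0001.gt.txt`); that rule and the helper names are assumptions, not taken from calamari's code.

```python
from pathlib import Path


def ground_truth_path(image_path):
    """Map e.g. 'lines/0001.bin.png' to 'lines/0001.gt.txt' (assumed naming rule)."""
    image_path = Path(image_path)
    base = image_path.name.split(".", 1)[0]  # text before the first dot
    return image_path.with_name(base + ".gt.txt")


def missing_ground_truth(image_paths):
    """Return the images whose ground truth file does not exist."""
    return [p for p in image_paths if not ground_truth_path(p).is_file()]


if __name__ == "__main__":
    # Check all PNG files in the current directory
    images = sorted(Path(".").glob("*.png"))
    for image in missing_ground_truth(images):
        print("no ground truth for", image)
```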

There are several important parameters to adjust the training. For a full list type calamari-train --help.

  • --network=cnn=40:3x3,pool=2x2,cnn=60:3x3,pool=2x2,lstm=200,dropout=0.5: Specify the network structure in a simple language. The default network consists of a stack of two pairs of CNN and pooling layers, followed by an LSTM layer. It is trained with the default CTC loss implemented in Tensorflow and a dropout rate of 0.5. The corresponding creation string is: cnn=40:3x3,pool=2x2,cnn=60:3x3,pool=2x2,lstm=200,dropout=0.5. To add or remove layers, add or remove the respective entries in the comma-separated list. Note that the order is important!
  • --line_height=48: The height of each rescaled input file passed to the network.
  • --num_threads=1: The number of threads used during training and line preprocessing.
  • --batch_size=1: The number of lines processed in parallel.
  • --display=1: (epochs) How often an informative string about the current training process is printed in the shell
  • --output_dir: A path where to store checkpoints
  • --checkpoint_frequency: (epochs) How often a model shall be written as checkpoint to the drive
  • --epochs: The maximum number of training iterations (batches) for training. Note: this is the upper boundary if you use early stopping.
  • --samples_per_epoch: The number of samples to process per epoch (by default the size of the dataset)
  • --validation=None: Provide a second data set (images with corresponding .gt.txt) to enable early stopping.
  • --early_stopping_frequency=checkpoint_frequency: How often to check for early stopping on the validation dataset.
  • --early_stopping_nbest=10: How many successive models must be worse than the current best model to break the training loop
  • --early_stopping_best_model_output_dir=output_dir: Output dir for the current best model
  • --early_stopping_best_model_prefix=best: Prefix for the best model (output name will be {prefix}.ckpt)
  • --n_augmentations=0: Data augmentation on the training set.
  • --weights: Load network weights from a given pretrained model. Note that the codec will probably change its size to match the codec of the provided ground truth files. To ensure that certain characters are not deleted, use --whitelist.
  • --whitelist=[] --whitelist_files=[]: Specify either individual characters or a text file listing all white list characters stored as string.
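To make the structure of the --network string more concrete, the sketch below splits it into an ordered list of layers. It is not calamari's own parser; the dictionary keys and the reading of `3x3` and `2x2` as two integer dimensions are assumptions made for illustration.

```python
def parse_network(spec):
    """Split a layer string into an ordered list of (layer_type, settings) pairs."""
    layers = []
    for item in spec.split(","):
        kind, _, value = item.strip().partition("=")
        if kind == "cnn":
            # e.g. "40:3x3" -> 40 filters, 3x3 kernel
            filters, _, kernel = value.partition(":")
            first, second = (int(v) for v in kernel.split("x"))
            layers.append(("cnn", {"filters": int(filters), "kernel": (first, second)}))
        elif kind == "pool":
            first, second = (int(v) for v in value.split("x"))
            layers.append(("pool", {"size": (first, second)}))
        elif kind == "lstm":
            layers.append(("lstm", {"units": int(value)}))
        elif kind == "dropout":
            layers.append(("dropout", {"rate": float(value)}))
        else:
            raise ValueError(f"unknown layer type: {kind!r}")
    return layers


default = "cnn=40:3x3,pool=2x2,cnn=60:3x3,pool=2x2,lstm=200,dropout=0.5"
for layer in parse_network(default):
    print(layer)
```

Because the result is a list, the order of the entries in the string is preserved, which mirrors the remark that the order of layers matters.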

Hint: If you want to use early stopping but don't have a separate validation set, you can train a single fold with the calamari-cross-fold-train script (see next section).
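The stopping rule described for --early_stopping_nbest can be sketched as a simple counter: training ends once a given number of successive validation checks fail to beat the best model seen so far. The function name and the sample error values are hypothetical, and treating a tie as "not better" is an assumption.

```python
def stopping_step(validation_errors, nbest=10):
    """Return the index of the check at which training would stop, or None."""
    best = float("inf")
    worse_in_a_row = 0
    for step, error in enumerate(validation_errors):
        if error < best:
            # New best model: remember it and reset the counter
            best = error
            worse_in_a_row = 0
        else:
            worse_in_a_row += 1
            if worse_in_a_row >= nbest:
                return step
    return None  # the stopping criterion was never met


errors = [0.30, 0.22, 0.25, 0.24, 0.23]  # made-up validation errors
print(stopping_step(errors, nbest=3))
```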

Training an n-fold of models

To train n more-or-less individual models on a given training set, you can use the calamari-cross-fold-train script. The default call is

calamari-cross-fold-train --files your_images*.*.png --best_models_dir some_dir

By default this will train 5 default models using 80%=(n-1)/n of the provided data for training and 20%=1/n for validation. These independent models can then be used to predict lines using a voting mechanism. There are several important parameters to adjust the training. For a full list type calamari-cross-fold-train --help.

  • Almost all parameters of calamari-train can be used to affect the training
  • --n_folds=5: The number of folds
  • --weights=None: Specify one or n_folds models to use for pretraining.
  • --best_models_dir=REQUIRED: Directory where to store the best model determined on the validation data set
  • --best_model_label={id}: The prefix for the best model of each fold. A string that will be formatted. {id} will be replaced by the number of the fold, i. e. 0, ..., n-1.
  • --temporary_dir=None: A directory where to store temporary files, e. g. checkpoints of the scripts to train an individual model. By default a temporary dir created with Python's tempfile module is used.
  • --max_parallel_models=n_folds: The number of models that shall be run in parallel. By default all models are trained in parallel.
  • --single_fold=[]: Use this parameter to train only a subset, e. g. a single fold out of all n_folds.
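The (n-1)/n versus 1/n split described above can be sketched as follows. How calamari actually assigns lines to folds is not stated here, so the round-robin assignment, the function name, and the file names are assumptions for illustration only.

```python
def cross_fold_splits(files, n_folds=5):
    """Yield (fold_id, train_files, validation_files) for each fold."""
    # Round-robin assignment of files to folds (an assumed strategy)
    folds = [files[i::n_folds] for i in range(n_folds)]
    for fold_id in range(n_folds):
        validation = folds[fold_id]
        train = [f for i, fold in enumerate(folds) if i != fold_id for f in fold]
        yield fold_id, train, validation


files = [f"line_{i:03d}.png" for i in range(10)]  # hypothetical file names
for fold_id, train, validation in cross_fold_splits(files):
    print(fold_id, len(train), len(validation))
```

With 10 files and 5 folds, each model sees 8 files for training and 2 for validation, and every file is used for validation exactly once.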

To use all models to predict and then vote for a set of lines you can use the calamari-predict script and provide all models as checkpoint:

calamari-predict --checkpoint best_models_dir/*.ckpt.json --files your_images.*.png

Evaluating a model

To compute the performance of a model, you first need to predict your evaluation data set (see calamari-predict). Afterwards run

calamari-eval --gt *.gt.txt

on the ground truth files to compute an evaluation measure including the full confusion matrix. By default, the predicted sentences produced by the calamari-predict script end in .pred.txt. You can change the default behavior of the evaluation script with the following parameters:

  • --gt=REQUIRED: The ground truth txt files.
  • --pred=None: The prediction files. If None it is expected that the prediction files have the same base name as the ground truth files but with --pred_ext as suffix.
  • --pred_ext=.pred.txt: The suffix of the prediction files if --pred is not specified
  • --n_confusions=-1: Print only the top n_confusions most common errors.
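The exact measure reported by calamari-eval is not spelled out here. A common choice for line-based OCR is the character error rate (edit distance divided by ground truth length), which the following sketch computes for pairs of ground truth and predicted text; the function names and sample strings are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def character_error_rate(pairs):
    """Total edit distance divided by the total ground truth length."""
    errors = sum(edit_distance(gt, pred) for gt, pred in pairs)
    chars = sum(len(gt) for gt, _ in pairs)
    return errors / chars if chars else 0.0


# (ground truth, prediction) pairs, e.g. read from *.gt.txt and *.pred.txt
pairs = [("calamari", "calamary"), ("ocr", "ocr")]
print(character_error_rate(pairs))
```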

Experimenting with different network hyperparameters (experimental)

To find a good set of hyperparameters (e. g. network structure, learning rate, batch size, ...) you can use the experiment.py script, which both trains models using the cross-fold algorithm and evaluates them on a given evaluation data set. The script directly outputs the performance of each individual fold, the average and its standard deviation, plus the results using the different voting algorithms. If you want to use this experimental script, have a look at its parameters (experiment.py --help).

Comments
  • After training, how do I create models?

    I ran the train command and it saved checkpoints in the output directory. Now, how can I use these as a model? How do I make models? Which command do I use? Is there any more detailed documentation for this?

    opened by UlasSAYGINIM 27
  • Allow namespace prefixes other than 'None' in PageXML

    Eynollah, e.g., produces PageXML files that use an explicit prefix (xmlns:pc="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15").

    calamari_ocr/ocr/dataset/datareader/pagexml/reader.py, however, expects the prefix to be 'None' and throws an error when processing an eynollah pagexml.

    When I change line 120 of reader.py from

    ns = {"ns": root.nsmap[None]} 
    

    to

    ns = {'ns' : 'http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15'}
    

    it works. I'm not sure if you can generalize the namespace dictionary to cover both output styles. Maybe xpath's local-name function (instead of lxml find or findall) is an alternative.

    opened by alexander-winkler 17
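One way to cover both PageXML output styles from the issue above is to read the namespace URI from the root element's tag instead of looking it up under a fixed prefix. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml, and the tiny XML documents are made up for illustration; it is a possible approach under these assumptions, not the fix adopted in reader.py.

```python
import xml.etree.ElementTree as ET


def page_namespace(root):
    """Return {'ns': uri} taken from the root tag, whatever prefix the file uses."""
    # Namespaced tags are stored as '{uri}LocalName'
    if root.tag.startswith("{"):
        uri = root.tag[1:].split("}", 1)[0]
        return {"ns": uri}
    return {}


PAGE = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
# Hypothetical minimal documents: one with an explicit prefix, one without
prefixed = f'<pc:PcGts xmlns:pc="{PAGE}"><pc:Page><pc:TextLine id="l1"/></pc:Page></pc:PcGts>'
unprefixed = f'<PcGts xmlns="{PAGE}"><Page><TextLine id="l1"/></Page></PcGts>'

for document in (prefixed, unprefixed):
    root = ET.fromstring(document)
    ns = page_namespace(root)
    print([line.get("id") for line in root.findall(".//ns:TextLine", ns)])
```

Because the URI is read from the document itself, the same lookup also works for other PAGE schema versions without hard-coding the date in the namespace.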
  • performance degradation for versions > 0.2.5

    It seems that a performance issue was introduced between 0.2.5 and 0.3.0 releases. I tested separately on environments with tensorflow cpu and gpu. Tensorflow version: 1.13.1

    Hardware: GPU: NVIDIA Tesla M60; CPU: Intel i7 4710HQ (8 threads)

    I've got images already in memory, so I use RawDataSet. Then I wrap it with InputDataset. And finally I use Predictor directly in code. The code is here: https://gist.github.com/wosiu/9fa50de9e47615b5fa08b23637e1f947

    | version | GPU time | CPU time |
    | --- | --- | --- |
    | 0.2.5 | 1440 ms | 2100 ms |
    | 0.3.0 | not tested | 5700 ms |
    | 0.3.1 | 5859 ms | 6000 ms |

    And some logs I get, not sure if related:

    1. tensorflow-gpu, calamari 0.3.1:
    2019-05-16 15:41:16,329 INFO 21006 140466380875520 calamari_wrapper.py:70 Found 10 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    2019-05-16 15:41:18,694 INFO 21006 140466380875520 calamari_wrapper.py:70 Found 13 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    2019-05-16 15:41:20,461 INFO 21006 140466380875520 calamari_wrapper.py:70 Found 2 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    2019-05-16 15:41:22,187 INFO 21006 140466380875520 metrics2.py:126 ocr_ms took 5859 ms
    
    2. tensorflow cpu, calamari 0.3.1:
    2019-05-16 15:50:48,571 INFO 23732 140378726676224 calamari_wrapper.py:70 Found 10 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24599 tid 24599 thread 0 bound to OS proc set 0
    OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24582 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24583 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24605 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24608 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24585 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24582 tid 24610 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24611 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24609 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24606 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24613 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24583 tid 24607 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24585 tid 24612 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24584 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24615 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24621 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 24584 tid 24620 thread 2 bound to OS proc set 0-7
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24614 tid 24614 thread 0 bound to OS proc set 0
    2019-05-16 15:50:50.630679: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.630721: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.677730: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.677766: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.717138: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.717175: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.753602: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.753637: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.775119: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.775151: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.793670: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.793716: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.827669: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.827717: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.865883: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.865927: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.912646: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.912676: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.955744: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50.955777: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:50,960 INFO 23732 140378726676224 calamari_wrapper.py:70 Found 13 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24643 tid 24643 thread 0 bound to OS proc set 0
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24655 tid 24655 thread 0 bound to OS proc set 0
    2019-05-16 15:50:52.560072: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.560110: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.599294: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.599324: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.621620: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.621653: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.641955: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.641989: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.663661: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.663695: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.684411: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.684477: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.704166: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.704195: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.720328: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.720356: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.735363: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.735405: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.754581: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.754612: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.764184: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.764352: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.780335: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52.780366: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:52,783 INFO 23732 140378726676224 calamari_wrapper.py:70 Found 2 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24679 tid 24679 thread 0 bound to OS proc set 0
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 24684 tid 24684 thread 0 bound to OS proc set 0
    2019-05-16 15:50:54.533429: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:54.533478: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:54.565486: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:54.565553: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:50:54,569 INFO 23732 140378726676224 metrics2.py:126 ocr_ms took 5999 ms
    
    3. tensorflow cpu, calamari 0.3.0:
    2019-05-16 15:57:36,407 INFO 24999 140185064949504 calamari_wrapper.py:70 Found 10 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25755 tid 25755 thread 0 bound to OS proc set 0
    OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25738 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25740 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25760 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25741 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25761 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25738 tid 25762 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25766 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25765 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25767 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25768 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25741 tid 25770 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25740 tid 25763 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25739 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25772 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25778 thread 3 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 25739 tid 25777 thread 2 bound to OS proc set 0-7
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25771 tid 25771 thread 0 bound to OS proc set 0
    2019-05-16 15:57:38.043087: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.043131: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.105437: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.105465: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.158275: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.158300: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.213394: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.213423: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.234949: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.235122: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.263308: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.263334: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.310647: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.310687: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.387153: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.387188: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.428633: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.428661: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.477382: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38.477543: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:38,481 INFO 24999 140185064949504 calamari_wrapper.py:70 Found 13 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25808 tid 25808 thread 0 bound to OS proc set 0
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25814 tid 25814 thread 0 bound to OS proc set 0
    2019-05-16 15:57:40.048510: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.048545: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.091460: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.091488: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.121078: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.121109: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.151877: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.152048: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.173696: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.173745: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.197971: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.198044: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.219307: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.219341: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.238584: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.238613: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.263360: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.263388: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.288734: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.288763: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.308051: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.308091: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.329646: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.329827: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.342140: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40.342170: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:40,343 INFO 24999 140185064949504 calamari_wrapper.py:70 Found 2 files in the dataset
    WARNING: RawData set should always be used with a RawInputDataSet to avoid excessive thread creation
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25839 tid 25839 thread 0 bound to OS proc set 0
    OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
    OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
    OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 0-7
    OMP: Info #156: KMP_AFFINITY: 8 available OS procs
    OMP: Info #157: KMP_AFFINITY: Uniform topology
    OMP: Info #179: KMP_AFFINITY: 1 packages x 4 cores/pkg x 2 threads/core (4 total cores)
    OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
    OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 0 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 1 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 1 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 2 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 2 thread 1 
    OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 3 thread 0 
    OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 3 thread 1 
    OMP: Info #250: KMP_AFFINITY: pid 25847 tid 25847 thread 0 bound to OS proc set 0
    2019-05-16 15:57:42.054612: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:42.054845: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:42.100764: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:42.100795: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 15:57:42,104 INFO 24999 140185064949504 metrics2.py:126 ocr_ms took 5698 ms
    
    1. tensorflow-gpu, calamari 0.2.5: no warnings or errors
    2019-05-16 16:05:10,802 INFO 30495 139657482069760 metrics2.py:126 ocr_ms took 1440 ms
    
    1. tensorflow cpu, calamari 0.2.5:
    2019-05-16 16:00:12,076 INFO 26062 140471660128000 calamari_wrapper.py:70 Found 10 files in the dataset
    OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26569 thread 0 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26590 thread 1 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26591 thread 2 bound to OS proc set 0-7
    OMP: Info #250: KMP_AFFINITY: pid 26569 tid 26592 thread 3 bound to OS proc set 0-7
    2019-05-16 16:00:12.699564: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.699610: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.739422: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.739451: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.774491: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.774525: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.808151: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.808183: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.830500: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.830530: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.863712: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.863750: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.939775: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.939804: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.980605: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:12.980636: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.015485: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.015515: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.059779: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.059824: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13,064 INFO 26062 140471660128000 calamari_wrapper.py:70 Found 13 files in the dataset
    2019-05-16 16:00:13.477228: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.477268: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.506554: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.506587: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.521381: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.521414: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.534570: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.534615: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.548266: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.548300: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.572625: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.572665: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.595008: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.595052: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.616656: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.616823: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.647358: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.647398: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.667300: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.667423: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.680838: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.680964: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.696618: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.696664: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.716621: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13.716766: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:13,718 INFO 26062 140471660128000 calamari_wrapper.py:70 Found 2 files in the dataset
    2019-05-16 16:00:14.027123: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:14.027158: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:14.096210: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:14.096237: E tensorflow/core/common_runtime/bfc_allocator.cc:373] tried to deallocate nullptr
    2019-05-16 16:00:14,099 INFO 26062 140471660128000 metrics2.py:126 ocr_ms took 2023 ms
    
    opened by wosiu 14
  • Issue on CTC loss when training on new data

    Hi,

    When training Calamari on my dataset, I got this error: tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.

    Can you help me? Thank you
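
    One frequent cause of this TensorFlow message is a line whose ground truth is too long for the number of time steps the network emits for that image. Whether that applies here is an assumption, since the dataset is not shown. CTC needs at least one frame per character plus one blank frame between every pair of identical neighbouring characters. The sketch below checks that condition; `min_ctc_frames`, `ctc_feasible`, and the sample values are hypothetical names for illustration and are not part of Calamari.

```python
def min_ctc_frames(label: str) -> int:
    """Smallest number of time steps for which CTC can emit `label`."""
    # One frame per character, plus a blank frame between equal neighbours.
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


def ctc_feasible(label: str, num_frames: int) -> bool:
    """True if at least one valid CTC alignment exists."""
    return num_frames >= min_ctc_frames(label)


if __name__ == "__main__":
    # Hypothetical samples: (ground truth, frames left after the network's pooling).
    samples = [("hello", 12), ("hello", 5)]
    for text, frames in samples:
        print(repr(text), frames, ctc_feasible(text, frames))
```

    Lines that fail such a check are usually very narrow images paired with long transcriptions, so they are worth inspecting first.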

    opened by realjoenguyen 14
  • Prediction API Error

    I used the CLI to train on the SROIE2019 dataset (the original images were preprocessed into line images) with:

    calamari-train \
    --device.gpus 0 \
    --trainer.gen SplitTrain \
    --trainer.gen.validation_split_ratio=0.2  \
    --trainer.output_dir /data/model_output \
    --trainer.epochs 25 \
    --early_stopping.frequency=1 \
    --early_stopping.n_to_go=3 \
    --train.images /data/*.jpg
    

    Training went smoothly; the logs are in train.log.

    After training, I tried to load the model as mentioned here, but I get the following error:

    >>> predictor = Predictor.from_checkpoint(params=PredictorParams(), checkpoint='/data/model_output/best.ckpt')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/predict/predictor.py", line 31, in from_checkpoint
        keras.models.load_model(
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/save.py", line 206, in load_model
        return hdf5_format.load_model_from_hdf5(filepath, custom_objects,
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 182, in load_model_from_hdf5
        model_config = json_utils.decode(model_config.decode('utf-8'))
    AttributeError: 'str' object has no attribute 'decode'
    

    I tried loading a pretrained model from antiqua_historical and got the same error:

    >>> predictor = Predictor.from_checkpoint(params=PredictorParams(), checkpoint='/data/model_output/antiqua_historical/0.ckpt')
    /usr/local/lib/python3.8/dist-packages/paiargparse/dataclass_json_overrides.py:78: RuntimeWarning: `NoneType` object value of non-optional type tfaip_commit_hash detected when decoding CalamariScenarioParams.
      warnings.warn(f"`NoneType` object {warning}.", RuntimeWarning)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/predict/predictor.py", line 26, in from_checkpoint
        ckpt = SavedCalamariModel(checkpoint, auto_update=auto_update_checkpoints)
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 31, in __init__
        self.update_checkpoint()
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 56, in update_checkpoint
        self._single_upgrade()
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/saved_model.py", line 88, in _single_upgrade
        update_model(self.dict, self.ckpt_path)
      File "/usr/local/lib/python3.8/dist-packages/calamari_ocr/ocr/savedmodel/migrations/version3_4to5.py", line 22, in update_model
        pred_model.load_weights(path + ".h5")
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py", line 2234, in load_weights
        hdf5_format.load_weights_from_hdf5_group(f, self.layers)
      File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/saving/hdf5_format.py", line 662, in load_weights_from_hdf5_group
        original_keras_version = f.attrs['keras_version'].decode('utf8')
    AttributeError: 'str' object has no attribute 'decode'
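
    This kind of `AttributeError` typically appears when an HDF5 loader written for h5py 2.x runs with h5py 3.x, which returns string attributes as `str` instead of `bytes`. That this is the cause here is an assumption, because the installed versions are not shown. The sketch below prints the installed versions and shows a type-tolerant conversion; `as_text` and `installed_version` are hypothetical helpers for illustration, not part of Calamari or TensorFlow.

```python
from importlib import metadata


def as_text(value) -> str:
    """Return `value` as str whether the HDF5 layer produced bytes or str."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def installed_version(package: str):
    """Installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("h5py:", installed_version("h5py"))
    print("tensorflow:", installed_version("tensorflow"))
    # Both attribute styles end up as the same string.
    print(as_text(b"2.4.0"), as_text("2.4.0"))
```

    If h5py turns out to be 3.x, a commonly reported workaround is to install an h5py release below 3 in the same environment.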
    
    
    
    opened by Mageswaran1989 12
  • Prediction step using very deep neural networks feature of calamari

    Hi, I installed calamari-0.2.4 and tried to test it on this simple example (https://user-images.githubusercontent.com/33478216/46499779-a909b480-c829-11e8-87f2-d4a34d84ab69.png) with:

    calamari-predict --checkpoint calamari_models/default/ModernEnglish.ckpt --files data.png

    It returns this error:

    Found 1 files in the dataset
    Traceback (most recent call last):
      File "/home/pc/my_calamari_env/bin/calamari-predict", line 11, in <module>
        load_entry_point('calamari-ocr==0.2.4', 'console_scripts', 'calamari-predict')()
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/scripts/predict.py", line 151, in main
        run(args)
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/scripts/predict.py", line 61, in run
        predictor = MultiPredictor(checkpoints=args.checkpoint, batch_size=args.batch_size, processes=args.processes)
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/predictor.py", line 202, in __init__
        self.predictors = [Predictor(cp, batch_size=batch_size, processes=processes) for cp in checkpoints]
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/predictor.py", line 202, in <listcomp>
        self.predictors = [Predictor(cp, batch_size=batch_size, processes=processes) for cp in checkpoints]
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/predictor.py", line 100, in __init__
        ckpt = Checkpoint(checkpoint, auto_update=self.auto_update_checkpoints)
      File "/home/pc/my_calamari_env/lib/python3.5/site-packages/calamari_ocr-0.2.4-py3.5.egg/calamari_ocr/ocr/checkpoint.py", line 20, in __init__
        self.json = json.load(f)
      File "/usr/lib/python3.5/json/__init__.py", line 268, in load
        parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
      File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
        return _default_decoder.decode(s)
      File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 7 column 1 (char 6)
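
    The error position "line 7 column 1 (char 6)" means the file handed to `json.load` in checkpoint.py starts with six newline characters followed by something that is not JSON. A quick way to see what the file really contains is sketched below. `check_json_file` is a hypothetical helper, and which file Calamari opens for a given checkpoint is not visible in the traceback, so the path has to be supplied by hand.

```python
import json
import sys


def check_json_file(path: str) -> bool:
    """Report whether `path` parses as JSON; print a short preview if not."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        text = f.read()
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError as err:
        print(f"{path}: not valid JSON ({err})")
        # The first characters usually reveal what the file really is,
        # for example a web page saved in place of the raw file.
        print(repr(text[:200]))
        return False


if __name__ == "__main__":
    # Pass the JSON file that belongs to the checkpoint as the first argument.
    if len(sys.argv) > 1:
        print(check_json_file(sys.argv[1]))
```

    If the preview shows markup or other unexpected content, downloading the model files again is the likely fix.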

    Thanks for your help :)

    opened by Tailor2019 12
  • Applying data processor Text Normalizer

    @ChWick I have updated calamari-ocr to version 2.0.0 and now training takes ages to start. Previously, Calamari computed the codec and then started. Now it takes 2+ days to apply text normalization, and I can't afford to wait 3 days for training to start. Can someone help?

    opened by abhikatoldtrafford 11
  • Error: Process finished with code 1 in cross-fold

    It worked with the new code, but after fold 0 finished without finding a better model than the one at 99.056858%, I get an error again:

    FOLD 0 | Storing checkpoint to 'I:\BIQE\CALAMARI\projects\voetius\TRAINING\crosstrainen\fold_0\model_00019470.ckpt'
    FOLD 0 | Checking early stopping model
    Prediction: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 1948/1948 [04:06<00:00, 6.38it/s]
    FOLD 0 | No better model found. Currently accuracy of 99.056858% at iter 11682 (remaining nbest = 0)
    FOLD 0 | Early stopping now.
    FOLD 0 | Total time 11274.687343358994s for 19469 iterations.
    multiprocessing.pool.RemoteTraceback:
    """
    Traceback (most recent call last):
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\multiprocessing\pool.py", line 119, in worker
        result = (True, func(*args, **kwds))
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\multiprocessing\pool.py", line 44, in mapstar
        return list(map(*args))
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\ocr\cross_fold_trainer.py", line 27, in train_individual_model
        ], args.get("run", None), {"threads": args.get('num_threads', -1)}), verbose=args.get("verbose", False)):
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\utils\multiprocessing.py", line 87, in run
        raise Exception("Error: Process finished with code {}".format(process.returncode))
    Exception: Error: Process finished with code 1
    """

    The above exception was the direct cause of the following exception:

    Traceback (most recent call last):
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\runpy.py", line 193, in _run_module_as_main
        "__main__", mod_spec)
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\runpy.py", line 85, in _run_code
        exec(code, run_globals)
      File "C:\Users\drsjh\Anaconda3\envs\calamaridev\Scripts\calamari-cross-fold-train.exe\__main__.py", line 9, in <module>
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\scripts\cross_fold_train.py", line 80, in main
        temporary_dir=args.temporary_dir, keep_temporary_files=args.keep_temporary_files,
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\site-packages\calamari_ocr\ocr\cross_fold_trainer.py", line 151, in run
        pool.map_async(train_individual_model, run_args).get()
      File "c:\users\drsjh\anaconda3\envs\calamaridev\lib\multiprocessing\pool.py", line 644, in get
        raise self._value
    Exception: Error: Process finished with code 1

    opened by cornerman57 11
  • Use pre-trained Calamari models

    Thanks for the great work!

    I installed Calamari and calamari_models on a new AWS P2 instance and tried to test them on a simple example with:

    calamari-predict --checkpoint calamari_models/default/ModernEnglish.ckpt --files data.png
    

    The detected text is way off. I guess this is related to how the model is loaded.

    I got these warnings:

    Found 1 files in the dataset
    2018-08-05 17:12:16.976735: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
    Attempting a workaround: New graph and load weights
    Using CUDNN compatible LSTM backend on CPU
    WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/calamari/lib/python3.6/site-packages/tensorflow/python/ops/rnn.py:417: calling reverse_sequence (from tensorflow.python.ops.array_ops) with seq_dim is deprecated and will be removed in a future version.
    Instructions for updating:
    seq_dim is deprecated, use seq_axis instead
    WARNING:tensorflow:From /home/ubuntu/anaconda3/envs/calamari/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py:432: calling reverse_sequence (from tensorflow.python.ops.array_ops) with batch_dim is deprecated and will be removed in a future version.
    Instructions for updating:
    batch_dim is deprecated, use batch_axis instead
    2018-08-05 17:12:20.637472: W tensorflow/core/framework/op_kernel.cc:1318] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key Minimum/ExponentialMovingAverage not found in checkpoint
    Attempting workaround: only loading trainable variables
    Loading Dataset: 100%|█████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 109.32it/s]
    Data Preprocessing: 100%|██████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 104.47it/s]
    Prediction: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.74it/s]
    Prediction of 1 models took 0.14934062957763672s
    

    Is it due to the TensorFlow version that the ExponentialMovingAverage variables are not loaded? Installing calamari currently installs tensorflow 1.9. Which TF version do you use for development?

    Thanks!

    opened by zhangxiangnick 10
  • TypeError: metaclass conflict (problem with tfaip?)

    Hello!

    I'm not sure if I'm missing something, but as there has already been a problem with tfaip (#205), I wanted to point out an issue I'm struggling with when installing the latest version of calamari.

    Here is my output. Any hints are welcome; the hack provided in the above-mentioned issue does not work.

     [email protected]:~/virtualenvs/calamari_2-1-1/calamari(master)$ calamari-train --version
    Traceback (most recent call last):
      File "/home/user/virtualenvs/calamari_2-1-1/bin/calamari-train", line 33, in <module>
        sys.exit(load_entry_point('calamari-ocr==2.1.1', 'console_scripts', 'calamari-train')())
      File "/home/user/virtualenvs/calamari_2-1-1/bin/calamari-train", line 25, in importlib_load_entry_point
        return next(matches).load()
      File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/importlib_metadata-4.0.1-py3.6.egg/importlib_metadata/__init__.py", line 166, in load
        module = import_module(match.group('module'))
      File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 994, in _gcd_import
      File "<frozen importlib._bootstrap>", line 971, in _find_and_load
      File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 678, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/calamari_ocr-2.1.1-py3.6.egg/calamari_ocr/scripts/train.py", line 5, in <module>
        from tfaip.util.logging import logger, WriteToLogFile
      File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/tfaip-1.1.1-py3.6.egg/tfaip/__init__.py", line 37, in <module>
        from tfaip.scenario.scenariobaseparams import ScenarioBaseParams
      File "/home/user/virtualenvs/calamari_2-1-1/lib/python3.6/site-packages/tfaip-1.1.1-py3.6.egg/tfaip/scenario/scenariobaseparams.py", line 48, in <module>
        class ScenarioBaseParams(Generic[TDataParams, TModelParams], ABC, metaclass=ScenarioBaseParamsMeta):
    TypeError: metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases
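
    The paths in the traceback show Python 3.6. In that version `typing.Generic` still has its own metaclass (`GenericMeta`); it was removed in Python 3.7. A class that combines `Generic[...]` with a custom metaclass not derived from `GenericMeta` therefore fails on 3.6 with exactly this error. That this is what happens inside tfaip is an assumption based on the failing line. The sketch below reproduces the general rule with hypothetical metaclasses `MetaA` and `MetaB`, which stand in for the real ones.

```python
class MetaA(type):
    """Stands in for the metaclass that a base class already uses."""


class MetaB(type):
    """Stands in for an unrelated custom metaclass."""


class Base(metaclass=MetaA):
    pass


def conflicting_definition() -> str:
    """Try to define a class whose metaclass is unrelated to its base's."""
    try:
        class Broken(Base, metaclass=MetaB):
            pass
    except TypeError as err:
        return str(err)
    return "no conflict"


class MetaAB(MetaA, MetaB):
    """Derives from both metaclasses, so it is valid for both."""


class Fixed(Base, metaclass=MetaAB):
    pass


if __name__ == "__main__":
    print(conflicting_definition())
    print(type(Fixed).__name__)
```

    For an installed package, the practical equivalent is usually to run it on a Python version where the base classes no longer carry conflicting metaclasses, rather than patching the library.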
    
    opened by alexander-winkler 9
  • training: shuffle data between epochs

    First of all, thank you for this fantastic framework! I've been using tesseract for more than a year, but this one is way better for single-line processing :)

    Proposal: From the training logs, it seems that the input images are not shuffled at all. It would be nice if they were shuffled at least once at the very beginning, and ideal if the data were also reshuffled after each epoch so that different batches are created.
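
    The proposed behaviour can be sketched in a few lines. This is an illustration of per-epoch reshuffling in plain Python, not Calamari's data pipeline; `epoch_batches` and the file names are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def epoch_batches(
    samples: Sequence[T], batch_size: int, epochs: int, seed: int = 0
) -> Iterator[List[T]]:
    """Yield batches, reshuffling the sample order at the start of each epoch."""
    rng = random.Random(seed)
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # new permutation per epoch, so batches differ
        for start in range(0, len(order), batch_size):
            yield [samples[i] for i in order[start:start + batch_size]]


if __name__ == "__main__":
    files = [f"line_{i:02d}.png" for i in range(6)]  # hypothetical file names
    for batch in epoch_batches(files, batch_size=4, epochs=2):
        print(batch)
```

    Every sample is still seen exactly once per epoch; only the grouping into batches changes from epoch to epoch.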

    opened by wosiu 9
  • Add CodeQL workflow for GitHub code scanning

    Hi Calamari-OCR/calamari!

    This is a one-off automatically generated pull request from LGTM.com :robot:. You might have heard that we’ve integrated LGTM’s underlying CodeQL analysis engine natively into GitHub. The result is GitHub code scanning!

    With LGTM fully integrated into code scanning, we are focused on improving CodeQL within the native GitHub code scanning experience. In order to take advantage of current and future improvements to our analysis capabilities, we suggest you enable code scanning on your repository. Please take a look at our blog post for more information.

    This pull request enables code scanning by adding an auto-generated codeql.yml workflow file for GitHub Actions to your repository — take a look! We tested it before opening this pull request, so all should be working :heavy_check_mark:. In fact, you might already have seen some alerts appear on this pull request!

    Where needed and if possible, we’ve adjusted the configuration to the needs of your particular repository. But of course, you should feel free to tweak it further! Check this page for detailed documentation.

    Questions? Check out the FAQ below!

    FAQ


    How often will the code scanning analysis run?

    By default, code scanning will trigger a scan with the CodeQL engine on the following events:

    • On every pull request — to flag up potential security problems for you to investigate before merging a PR.
    • On every push to your default branch and other protected branches — this keeps the analysis results on your repository’s Security tab up to date.
    • Once a week at a fixed time — to make sure you benefit from the latest updated security analysis even when no code was committed or PRs were opened.

    What will this cost?

    Nothing! The CodeQL engine will run inside GitHub Actions, making use of your unlimited free compute minutes for public repositories.

    What types of problems does CodeQL find?

    The CodeQL engine that powers GitHub code scanning is the exact same engine that powers LGTM.com. The exact set of rules has been tweaked slightly, but you should see almost exactly the same types of alerts as you were used to on LGTM.com: we’ve enabled the security-and-quality query suite for you.

    How do I upgrade my CodeQL engine?

    No need! New versions of the CodeQL analysis are constantly deployed on GitHub.com; your repository will automatically benefit from the most recently released version.

    The analysis doesn’t seem to be working

    If you get an error in GitHub Actions that indicates that CodeQL wasn’t able to analyze your code, please follow the instructions here to debug the analysis.

    How do I disable LGTM.com?

    If you have LGTM’s automatic pull request analysis enabled, then you can follow these steps to disable the LGTM pull request analysis. You don’t actually need to remove your repository from LGTM.com; it will automatically be removed in the next few months as part of the deprecation of LGTM.com (more info here).

    Which source code hosting platforms does code scanning support?

    GitHub code scanning is deeply integrated within GitHub itself. If you’d like to scan source code that is hosted elsewhere, we suggest that you create a mirror of that code on GitHub.

    How do I know this PR is legitimate?

    This PR is filed by the official LGTM.com GitHub App, in line with the deprecation timeline that was announced on the official GitHub Blog. The proposed GitHub Action workflow uses the official open source GitHub CodeQL Action. If you have any other questions or concerns, please join the discussion here in the official GitHub community!

    I have another question / how do I get in touch?

    Please join the discussion here to ask further questions and send us suggestions!

    opened by lgtm-com[bot] 0
  • calamari-ocr 2.2.2 on ubuntu 22.04 partial success, difficulty with GPU software


    Hi, I installed calamari-ocr 2.2.2 on Ubuntu 22.04 with TensorFlow 2.6 and Python 3.9 in a venv. I had to remove keras 2.11, which came with TensorFlow 2.6, and replace it with keras 2.6.0 to get rid of an error. It works great on CPU. So far so good.

    With TensorFlow 2.6, it seems I am forced into a narrow range of CUDA 11.2 and nvidia 360 drivers. I have not been able to get either installed successfully. Does anyone have any success stories with an Nvidia GPU, Ubuntu 22.04 and calamari 2.2.2? Thanks!

    opened by ocrwork 0
  • calamari-eval: unknown arguments


    I am on Calamari 2.2.2, and when freely combining the arguments listed by --help

    calamari-eval --checkpoint hsbfraktur.cala/best.ckpt.json --gt.preload false --n_worst_lines 10   --gt.texts /dev/shm/hsbfraktur.val/*.gt.txt --evaluator.progress_bar false
    

    …I end up with the following cryptic error message…

                 tfaip.util.logging: Uncaught exception
    Traceback (most recent call last):
      File "/home/h1/rosa992c/my-kernel/powerai-kernel2/bin/calamari-eval", line 8, in <module>
        sys.exit(run())
      File "/home/h1/rosa992c/my-kernel/powerai-kernel2/lib/python3.7/site-packages/calamari_ocr/scripts/eval.py", line 200, in run
        main(parse_args())
      File "/home/h1/rosa992c/my-kernel/powerai-kernel2/lib/python3.7/site-packages/calamari_ocr/scripts/eval.py", line 206, in parse_args
        return parser.parse_args(args=args).root
      File "/home/h1/rosa992c/my-kernel/powerai-kernel2/lib/python3.7/site-packages/paiargparse/main_parser.py", line 93, in parse_args
        raise UnknownArgumentError(f"Unknown Arguments {' '.join(argv)}. Possible alternatives:{''.join(help_str)}")
    paiargparse.dataclass_parser.UnknownArgumentError: Unknown Arguments  . Possible alternatives:
    
    opened by bertsky 6
  • featreq: when warmstart-training, init weights of new chars from existing ones


    I have the following feature request: Often one needs to finetune a model to add diacritics. Luckily, we can finetune with --warmstart ... --codec.keep_loaded False. In such cases the actual witnesses of the diacritics are usually still sparse in the GT. So it would likely be helpful if the weights of the additional characters / codepoints could be initialized from those of characters that are similar looking or similar in function. Perhaps as an option --codec.init_new_from_old '["à": "a", "ś": "s" ...]' ...
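The idea in this request can be illustrated with NumPy. The sketch assumes a simple dense output layer with one weight column per character; it is not Calamari's implementation, it ignores details such as the CTC blank label, and `extend_output_layer` and its parameters are hypothetical. The `--codec.init_new_from_old` option named above is the poster's proposal, not an existing flag.

```python
import numpy as np


def extend_output_layer(weights, bias, old_chars, new_chars, init_from, seed=0):
    """Append one output column per new character.

    weights has shape (hidden, len(old_chars)); bias has shape (len(old_chars),).
    init_from maps a new character to the existing character whose weights
    it should copy; unmapped characters get small random weights.
    """
    if not new_chars:
        return weights.copy(), bias.copy(), list(old_chars)

    rng = np.random.default_rng(seed)
    index = {char: i for i, char in enumerate(old_chars)}
    columns, biases = [], []
    for char in new_chars:
        source = init_from.get(char)
        if source in index:
            # Start from the weights of the look-alike character.
            columns.append(weights[:, index[source]].copy())
            biases.append(bias[index[source]])
        else:
            columns.append(rng.normal(0.0, 0.01, size=weights.shape[0]))
            biases.append(0.0)

    new_weights = np.concatenate([weights, np.stack(columns, axis=1)], axis=1)
    new_bias = np.concatenate([bias, np.array(biases, dtype=bias.dtype)])
    return new_weights, new_bias, list(old_chars) + list(new_chars)


old_w = np.arange(6, dtype=float).reshape(3, 2)
old_b = np.array([0.5, -0.5])
w, b, chars = extend_output_layer(
    old_w, old_b, ["a", "s"], ["à", "ś", "q"], {"à": "a", "ś": "s"}
)
print(w.shape, chars)
```

Copied columns give the new characters a starting point close to their base letters, so fine-tuning only has to learn the difference introduced by the diacritic.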

    enhancement 
    opened by bertsky 2
  • HDF5 dataset format: how to convert


    I presume training on HDF5 will be more efficient than any of the other formats. And at least against the line GT file pairs, filesystem performance might be much better, too.

    So my question is: how do I convert existing datasets into HDF5 format?

    opened by bertsky 4
Releases (v2.2.2)
Captcha Recognition

The objective of this project is to recognize the target numbers in the captcha images correctly which would tell us how good or bad a captcha system has been built.

Mohit Kaushik 5 Feb 20, 2022
Convert PDF/Image to TXT using EasyOcr - the best OCR engine available!

PDFImage2TXT - DOWNLOAD INSTALLER HERE What can you do with it? Convert scanned PDFs to TXT. Convert scanned Documents to TXT. No coding required!! In

Hans Alemão 2 Feb 22, 2022
An expandable and scalable OCR pipeline

Overview Nidaba is the central controller for the entire OGL OCR pipeline. It oversees and automates the process of converting raw images into citable

81 Jan 04, 2023
A synthetic data generator for text recognition

TextRecognitionDataGenerator A synthetic data generator for text recognition What is it for? Generating text image samples to train an OCR software. N

Edouard Belval 2.5k Jan 04, 2023
Image processing in Python

scikit-image: Image processing in Python Website (including documentation): https://scikit-image.org/ Mailing list: https://mail.python.org/mailman3/l

Image Processing Toolbox for SciPy 5.2k Dec 30, 2022
Python tool that takes the OCR.space JSON output as input and draws a text overlay on top of the image.

OCR.space OCR Result Checker = Draw OCR overlay on top of image Python tool that takes the OCR.space JSON output as input, and draws an overlay on to

a9t9 4 Oct 18, 2022
A Python script to capture images from multiple webcams at once and save them into your local machine

Capturing multiple images at once from Webcam Using OpenCV Capture multiple image by accessing the webcam of your system and save it to your machine.

Fazal ur Rehman 2 Apr 16, 2022
This is a Computer vision package that makes its easy to run Image processing and AI functions. At the core it uses OpenCV and Mediapipe libraries.

CVZone This is a Computer vision package that makes its easy to run Image processing and AI functions. At the core it uses OpenCV and Mediapipe librar

CVZone 648 Dec 30, 2022
Face Detection with DLIB

Face Detection with DLIB In this project, we have detected our face with dlib and opencv libraries. Setup This Project Install DLIB & OpenCV You can i

Can 2 Jan 16, 2022
Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

OCRopus 3.2k Dec 31, 2022
Links to awesome OCR projects

Awesome OCR This list contains links to great software tools and libraries and literature related to Optical Character Recognition (OCR). Contribution

Konstantin Baierer 2.2k Jan 02, 2023
A simple OCR API server, seriously easy to be deployed by Docker, on Heroku as well

ocrserver Simple OCR server, as a small working sample for gosseract. Try now here https://ocr-example.herokuapp.com/, and deploy your own now. Deploy

Hiromu OCHIAI 541 Dec 28, 2022
TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network.

FOTS: Fast Oriented Text Spotting with a Unified Network I am still working on this repo. updates and detailed instructions are coming soon! Table of

Masao Taketani 52 Nov 11, 2022
A tool to enhance your old/damaged pictures built using python & opencv.

Breathe Life into your Old Pictures Table of Contents About The Project Getting Started Prerequisites Usage Contact Acknowledgments About The Project

Shah Anwaar Khalid 5 Dec 16, 2021
A Vietnamese personal card OCR website built with Django.

Django VietCardOCR Installation Creation of virtual environments is done by executing the command venv: python -m venv venv That will create a new fol

Truong Hoang Thuan 4 Sep 04, 2021
STEFANN: Scene Text Editor using Font Adaptive Neural Network

STEFANN: Scene Text Editor using Font Adaptive Neural Network @ The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020.

Prasun Roy 208 Dec 11, 2022
Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"

SEE: Towards Semi-Supervised End-to-End Scene Text Recognition Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text

Christian Bartz 572 Jan 05, 2023
pulse2percept: A Python-based simulation framework for bionic vision

pulse2percept: A Python-based simulation framework for bionic vision Retinal degenerative diseases such as retinitis pigmentosa and macular degenerati

67 Dec 29, 2022
Generate a list of papers with publicly available source code in the daily arxiv

2021-06-08 paper code optimal network slicing for service-oriented networks with flexible routing and guaranteed e2e latency networkslicing multi-moda

79 Jan 03, 2023
Open Source Computer Vision Library

OpenCV: Open Source Computer Vision Library Resources Homepage: https://opencv.org Courses: https://opencv.org/courses Docs: https://docs.opencv.org/m

OpenCV 65.7k Jan 03, 2023