Document Layout Analysis

Overview

Eynollah

Introduction

This tool performs document layout analysis (segmentation) from image data and returns the results as PAGE-XML.

It can currently detect the following layout classes/elements:

In addition, the tool can be used to detect the ReadingOrder of regions. The final goal is to feed the output to an OCR model.

The tool uses a combination of various models and heuristics (see flowchart below for the different stages and how they interact):

The first three stages are based on pixel-wise segmentation.

Border detection

For the purpose of text recognition (OCR) and in order to avoid noise being introduced from texts outside the printspace, one first needs to detect the border of the printed frame. This is done by a binary pixel-wise segmentation model trained on a dataset of 2,000 documents, of which about 1,200 come from the dhSegment project (you can download the dataset from here) and the remainder were annotated at SBB. For border detection, the model needs to be fed the whole image at once rather than separate patches.

Layout detection

As a next step, text regions need to be identified by means of layout detection. Again, a pixel-wise segmentation model was trained on 131 labeled images from the SBB digital collections, including some data augmentation. Since the target of this tool is historical documents, we consider text regions, separators, images, tables and background to be the main region types, each with its own subclasses; text regions, for example, have subclasses such as header/heading, drop capital and main body text. While it would be desirable to detect and classify each of these classes in a granular way, this is limited by the need for a suitably large and balanced training set. Accordingly, the current version of this tool is focussed on the main region types background, text region, image and separator.

Textline detection

In a subsequent step, binary pixel-wise segmentation is used again to classify pixels in a document that constitute textlines. For textline segmentation, a model was initially trained on documents with only one column/block of text and some augmentation with regard to scaling. By fine-tuning the parameters also for multi-column documents, additional training data was produced that resulted in a much more robust textline detection model.

Image enhancement

This is an image-to-image model whose input is a low-quality version of an image and whose label is the original image. Since no ground truth (GT) was available for this task, we decreased the quality of documents from SBB and fed the degraded versions into the model.
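The pairing of a degraded input with the original image as its label can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how quality was decreased, so the downsample-and-repeat degradation and the function names `degrade` and `make_training_pair` are hypothetical.

```python
import numpy as np

def degrade(image: np.ndarray, factor: int = 2) -> np.ndarray:
    """Return a lower-quality copy of `image` with the same shape.

    Assumed degradation: keep every `factor`-th pixel, then blow the
    result back up by pixel repetition, which discards fine detail.
    """
    small = image[::factor, ::factor]
    restored = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    # Repetition may overshoot when the size is not a multiple of `factor`.
    return restored[: image.shape[0], : image.shape[1]]

def make_training_pair(original: np.ndarray):
    """Return (model input, label): the degraded image and the original."""
    return degrade(original), original
```

With such pairs, the original scans serve as labels without any manual annotation.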

Scale classification

This is simply an image classifier that classifies images by their scale or, more precisely, by their number of columns.

Heuristic methods

Some heuristic methods are also employed to further improve the model predictions:

  • After border detection, the largest contour is determined by a bounding box, and the image cropped to these coordinates.
  • For text region detection, the image is scaled up to make it easier for the model to detect background space between text regions.
  • A minimum area is defined for text regions in relation to the overall image dimensions, so that very small regions that are noise can be filtered out.
  • Deskewing is applied on the text region level (due to regions having different degrees of skew) in order to improve the textline segmentation result.
  • After deskewing, a calculation of the pixel distribution on the X-axis allows the separation of textlines (foreground) and background pixels.
  • Finally, using the derived coordinates, bounding boxes are determined for each textline.
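Two of the heuristics above, cropping to the detected border and separating textlines by their pixel distribution, can be sketched with NumPy. This is a simplified illustration, not the tool's actual code: the crop uses the bounding box of all mask pixels rather than of the largest contour, and the function names and the `min_pixels` threshold are assumptions.

```python
import numpy as np

def crop_to_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop `image` to the bounding box of the non-zero pixels in `mask`."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    if rows.size == 0:
        return image  # nothing detected, keep the full image
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def textline_row_ranges(binary_region: np.ndarray, min_pixels: int = 1):
    """Return (top, bottom) row ranges of textlines, bottom exclusive.

    Foreground pixels are summed along the X-axis, giving one count per
    row; consecutive rows with at least `min_pixels` form one textline.
    """
    profile = binary_region.sum(axis=1)
    is_text = profile >= min_pixels
    # Pad with False so that lines touching the edges are closed properly.
    padded = np.concatenate(([False], is_text, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[::2], changes[1::2])]
```

Each returned row range, combined with the leftmost and rightmost foreground column inside it, yields a bounding box for one textline.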

Installation

pip install . or

pip install -e . for editable installation

Alternatively, you can also use make with these targets:

make install or

make install-dev for editable installation

Models

In order to run this tool you also need trained models. You can download our pretrained models from qurator-data.de.

Alternatively, running make models will download and extract models to $(PWD)/models_eynollah.

Usage

The basic command-line interface can be called like this:

eynollah \
-i <image file name> \
-o <directory to write output xml or enhanced image> \
-m <directory of models> \
-fl <if true, the tool will perform full layout analysis> \
-ae <if true, the tool will resize and enhance the image and produce the resulting image as output> \
-as <if true, the tool will check whether the document needs rescaling or not> \
-cl <if true, the tool will extract the contours of curved textlines instead of rectangle bounding boxes> \
-si <if a directory is given here, the tool will output image regions inside documents there>

The tool accepts both original (RGB) and binarized images, but works better on the originals.
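When calling the tool from a script, the argument list can be assembled programmatically. The sketch below is only an illustration: the helper name `build_eynollah_command` is hypothetical, and it assumes the boolean options are bare switches, as in the invocation quoted in the comments further down, while -si takes a directory.

```python
import shlex

def build_eynollah_command(image, out_dir, model_dir, full_layout=False,
                           allow_enhancement=False, allow_scaling=False,
                           curved_line=False, save_images_dir=None):
    """Build an argument list for the eynollah command-line interface."""
    cmd = ["eynollah", "-i", str(image), "-o", str(out_dir), "-m", str(model_dir)]
    if full_layout:
        cmd.append("-fl")   # full layout analysis
    if allow_enhancement:
        cmd.append("-ae")   # resize and enhance the image
    if allow_scaling:
        cmd.append("-as")   # check whether rescaling is needed
    if curved_line:
        cmd.append("-cl")   # contours instead of rectangular boxes
    if save_images_dir is not None:
        cmd += ["-si", str(save_images_dir)]  # directory for image regions
    return cmd

command = build_eynollah_command("page.png", "out", "models_eynollah",
                                 full_layout=True)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

The returned list can be passed directly to `subprocess.run`, which avoids quoting problems with paths that contain spaces.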

--full-layout vs --no-full-layout

Here are the differences in the elements detected depending on the --full-layout/--no-full-layout command line flags:

                           --full-layout   --no-full-layout
reading order              x               x
header regions             x               -
text regions               x               x
text regions / text line   x               x
drop-capitals              x               -
marginals                  x               x
marginals / text line      x               x
image region               x               x

How to use

The tool makes use of up to 9 trained models, which are responsible for different operations such as size detection, column classification, image enhancement, page extraction, main layout detection, full layout detection and textline detection. That does not mean that all 9 models are required for every document. Based on the document characteristics and the parameters specified, different scenarios can be applied.

  • If none of the parameters is set to true, the tool will perform a layout detection of main regions (background, text, images, separators and marginals). An advantage of this tool is that it tries to extract main text regions separately as much as possible.

  • If you set the -ae (allow image enhancement) parameter to true, the tool first checks the ppi (pixels per inch) of the image; if it is below 300, the tool resizes the image, and only then does image enhancement occur. Image enhancement can also take place without this option, but with it enabled, the layout XML data (e.g. coordinates) will be based on the resized and enhanced image instead of the original image.

  • For some documents the quality is good but the scale is very large, and the performance of the tool decreases. In such cases you can set -as (allow scaling) to true. With this option enabled, the tool will try to rescale the image, and only then will the layout detection process begin.

  • If you care about drop capitals (initials) and headings, you can set -fl (full layout) to true. With this setting, the tool can currently distinguish 7 document layout classes/elements.

  • In cases where the document includes curved headers or curved lines, rectangular bounding boxes for textlines are not a good fit. In such cases it is strongly recommended to set the flag -cl (curved lines) to true to find the contours of curved lines instead of rectangular bounding boxes. Be advised that enabling this option increases the processing time of the tool.

  • To crop and save image regions inside the document, set the parameter -si (save images) to true and provide a directory path to store the extracted images.

  • This tool is actively being developed. If problems occur, or the performance does not meet your expectations, we welcome your feedback via issues.

Comments
  • trying to get running...

    Hi. I am trying to get this running on Windows 10 using Visual Studio Code.

    If I cd into the repo and run a command like: eynollah -i C:/Users/Scott/Desktop/Python2/Kpages/Pages/076v.jpg -o C:/Users/Scott/Desktop/Python2/Kpages -m C:/Users/Scott/Desktop/Python2/eynollah/models_eynollah -si C:/Users/Scott/Desktop/Python2/Kpages it doesn't appear to run. A new command prompt comes up after a couple of seconds, but there is no output and no error message.

    Any guidance would be appreciated.

    opened by SB2020-eye 53
  • Eynollah on Python 3.8

    Hi, Eynollah's requirements include Tensorflow < 2. This option is not supported on Python 3.8+. It will work on 3.7, but I'd prefer not to install a dedicated environment for this. Will it break with a newer version? Do you have plans for upgrading it to TF 2.0+? Thank you.

    enhancement 
    opened by nacho-pancho 18
  • Receiving error "TypeError: can't pickle _thread.RLock objects"

    Hi, I am excited to be trying out your code. I installed it on my Windows 10 machine (Ryzen 3700x cpu, Nvidia RTX 2070 Super gpu) under Anaconda (python 3.6.15, tensorflow 2.6.2, cudatoolkit 11.2.2), and it gets pretty far along before it crashes.
    Here is my command line...

    eynollah --image sn98062568_1933-11-18_ed-1_seq-3.png --out test1 --model models_eynollah --save_layout test1 --full-layout --enable-plotting --allow-enhancement --allow_scaling --log-level DEBUG
    

    And I get sn98062568_1933-11-18_ed-1_seq-3_enhanced.png and sn98062568_1933-11-18_ed-1_seq-3_layout_main.png images generated that look reasonable. But here is the output stream just before and including the error...

    14:32:25.982 INFO eynollah - detection of marginals took 4.2s
    14:32:25.982 DEBUG eynollah - enter run_boxes_full_layout
    14:32:26.780 DEBUG eynollah - enter extract_text_regions
    14:32:26.894 DEBUG eynollah - enter start_new_session_and_model (model_dir=models_eynollah/model_3up_new_good_no_augmentation.h5)
    14:32:28.952 DEBUG eynollah - enter do_prediction
    14:32:28.954 DEBUG eynollah - Patch size: 896x896
    14:32:32.797 DEBUG eynollah - enter do_prediction
    14:32:32.799 DEBUG eynollah - Patch size: 896x896
    14:32:41.277 DEBUG eynollah - exit extract_text_regions
    14:32:42.255 DEBUG eynollah - enter extract_text_regions
    14:32:42.256 DEBUG eynollah - enter start_new_session_and_model (model_dir=models_eynollah/model_no_patches_class0_30eopch.h5)
    14:32:44.120 DEBUG eynollah - enter do_prediction
    14:32:45.507 DEBUG eynollah - exit extract_text_regions
    14:32:46.658 DEBUG eynollah - exit run_boxes_full_layout
    14:33:52.914 DEBUG eynollah - enter get_slopes_and_deskew_new
    Traceback (most recent call last):
    Traceback (most recent call last):
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\runpy.py", line 193, in _run_module_as_main
      File "<string>", line 1, in <module>
        "__main__", mod_spec)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\spawn.py", line 105, in spawn_main
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\runpy.py", line 85, in _run_code
        exitcode = _main(fd)
        exec(code, run_globals)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\spawn.py", line 115, in _main
      File "C:\Users\Steve\anaconda3\envs\qurator-spk\Scripts\eynollah.exe\__main__.py", line 7, in <module>
        self = reduction.pickle.load(from_parent)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\click\core.py", line 1128, in __call__
    EOFError: Ran out of input
        return self.main(*args, **kwargs)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\click\core.py", line 1053, in main
        rv = self.invoke(ctx)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\click\core.py", line 1395, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\click\core.py", line 754, in invoke
        return __callback(*args, **kwargs)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\qurator\eynollah\cli.py", line 151, in main
        pcgts = eynollah.run()
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\qurator\eynollah\eynollah.py", line 2458, in run
        slopes, all_found_texline_polygons, boxes_text, txt_con_org, contours_only_text_parent, all_box_coord, index_by_text_par_con = self.get_slopes_and_deskew_new(txt_con_org, contours_only_text_parent, textline_mask_tot_ea, image_page_rotated, boxes_text, slope_deskew)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\site-packages\qurator\eynollah\eynollah.py", line 828, in get_slopes_and_deskew_new
        processes[i].start()
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\process.py", line 105, in start
        self._popen = self._Popen(self)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\context.py", line 223, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\context.py", line 322, in _Popen
        return Popen(process_obj)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
        reduction.dump(process_obj, to_child)
      File "c:\users\steve\anaconda3\envs\qurator-spk\lib\multiprocessing\reduction.py", line 60, in dump
        ForkingPickler(file, protocol).dump(obj)
    TypeError: can't pickle _thread.RLock objects
    

    Do you have any idea of what the problem may be, and what I can do to fix it? Thanks!
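The last frames of the traceback show multiprocessing pickling the process object (reduction.dump) before the child starts, which is what the spawn start method on Windows does. A minimal sketch, independent of eynollah's actual code, of why that step fails when the pickled state holds a lock; the `state` dictionary and the helper `try_pickle` are hypothetical stand-ins:

```python
import pickle
import threading

def try_pickle(obj):
    """Return None if `obj` can be pickled, else a description of the error."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"
    return None

# Hypothetical stand-in for the state that would be sent to a child process.
state = {"name": "worker", "lock": threading.RLock()}

print(try_pickle({"name": "worker"}))  # plain data pickles fine
print(try_pickle(state))               # the RLock makes pickling fail
```

Under this reading, anything passed to a spawned process has to be free of locks, loggers holding locks, and similar unpicklable members.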

    opened by sjscotti 17
  • Documentation: Should the OCR-D processor run on RGB or binarized images?

    Should the OCR-D processor run on an RGB or a binarized image input group?

    I think it would be best if the README listed an example, e.g.:

    ocrd-eynollah-segment -I <WHICH ONE?> -O SEG-LINE -P xyz abc

    documentation 
    opened by mikegerber 14
  • Running results in OCR-D

    Hello again. :)

    In this closed issue, @kba kindly recommended the following workflow to use eynollah results in an OCR-D workflow:

    ocrd workspace init
    ocrd workspace add -G IMG -i IMG_1 -g page1 image1.png
    ocrd workspace add -G SEG -i SEG_1 -g page1 image1.xml
    ocrd-tesserocr-recognize -P segmentation_level none -P textequiv_level line
    

    I'm having some challenges implementing this. It may just have to do with folders and paths, or maybe some "blanks" I failed to fill in...

    Everything goes smoothly until the last line. (I believe it wants an input parameter?) The output is:

            Input fileGrp[@USE='INPUT'] not in METS!
    

    If I try adding -I SEG, output includes the following:

    Traceback (most recent call last):
      File "/home/scott/src/github/OCR-D/ocrd_all/venv/lib/python3.6/site-packages/ocrd/workspace.py", line 111, in download_file
        raise Exception("Not already downloaded, moving on")
    Exception: Not already downloaded, moving on
    

    and FileNotFoundError: File path passed as 'url' to download_to_directory does not exist: C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg and FileNotFoundError: File path passed as 'url' to download_to_directory does not exist: /mnt/c/users/scott/desktop/python2/k/eyn_test2/C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg and Exception: Already tried prepending baseurl '/mnt/c/users/scott/desktop/python2/k/eyn_test2'. Cannot retrieve '/mnt/c/users/scott/desktop/python2/k/eyn_test2/C:/Users/Scott/Desktop/Python2/K/eyn_test2/F073r.jpg'

    If I try adding -I SEG_1, the output is:

            Input fileGrp[@USE='SEG_1'] not in METS!
    

    Any suggestions welcome and appreciated!

    opened by SB2020-eye 14
  • Reverse text line order from OCR-D

    Hi, using eynollah in an OCR-D workflow produced a reverse text line order within each region, so that the last actual line is line_001 in the PAGE XML.

    I'm new to eynollah and OCR-D, so I might have made a mistake somewhere. Any ideas anyone? Thanks!
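A reversed line order of this kind can be checked by comparing the topmost y coordinate of consecutive TextLine elements in a region. The sketch below is an illustration only: it matches elements by local name so that no namespace URI has to be assumed, the function names are hypothetical, and the sample region uses made-up, simplified coordinates.

```python
import xml.etree.ElementTree as ET

def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]

def line_tops(region):
    """Return (line id, smallest y) for each TextLine, in document order."""
    tops = []
    for line in region:
        if local_name(line.tag) != "TextLine":
            continue
        coords = next(c for c in line if local_name(c.tag) == "Coords")
        ys = [int(p.split(",")[1]) for p in coords.get("points").split()]
        tops.append((line.get("id"), min(ys)))
    return tops

def is_top_to_bottom(region) -> bool:
    """True if every line starts at or below the line before it."""
    ys = [y for _, y in line_tops(region)]
    return all(a <= b for a, b in zip(ys, ys[1:]))

def ids_sorted_by_top(region):
    """Line ids reordered from the top of the page to the bottom."""
    return [i for i, _ in sorted(line_tops(region), key=lambda t: t[1])]

# Made-up sample: the first line in the file sits lowest on the page.
SAMPLE = """<TextRegion id="r1">
  <TextLine id="r1_line_0001"><Coords points="55,888 276,888 276,914 55,914"/></TextLine>
  <TextLine id="r1_line_0002"><Coords points="50,859 690,859 690,888 50,888"/></TextLine>
</TextRegion>"""
sample_region = ET.fromstring(SAMPLE)
print(is_top_to_bottom(sample_region), ids_sorted_by_top(sample_region))
```

Sorting by the top coordinate is a reasonable check for single-column regions; it says nothing about which processing step introduced the reversal.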

    I used this workflow:

    ocrd process \
      "sbb-binarize -I OCR-D-IMG -O OCR-D-BIN -P model default" \
      "eynollah-segment -I OCR-D-BIN -O OCR-D-SEG -P models default -P curved_line true" \
      "calamari-recognize -I OCR-D-SEG -O OCR-D-OCR -P checkpoint_dir qurator-gt4histocr-1.0"
    

    Used image

    PageView screenshot

    And here's the xml section corresponding to the first news paragraph:

    XML
     <pc:TextRegion id="region_0010" type="paragraph">
    <pc:Coords points="372,501 371,501 363,501 363,502 357,502 356,501 350,501 349,502 347,502 347,503 345,505 345,506 345,506 345,520 345,521 345,524 348,526 348,529 361,529 361,528 364,528 365,529 373,529 373,530 373,530 374,531 373,531 372,531 371,532 360,532 360,533 359,533 358,532 348,532 348,533 346,533 345,533 343,533 342,534 341,534 341,535 327,535 327,539 326,540 321,540 321,539 318,539 317,538 316,538 315,538 313,538 313,537 311,537 310,536 309,536 308,536 306,536 305,535 303,535 302,535 294,535 293,535 291,535 290,535 284,535 283,535 281,535 281,535 276,535 275,535 271,535 271,536 263,536 262,536 251,536 250,536 235,536 234,535 224,535 223,536 208,536 207,535 204,535 203,535 181,535 180,535 172,535 171,536 165,536 165,536 165,537 164,536 159,536 158,537 158,537 157,536 156,536 155,536 153,536 153,535 127,535 126,536 118,536 117,535 106,535 105,536 102,536 101,536 101,537 101,538 101,540 100,541 100,542 99,543 99,544 98,545 98,551 99,551 99,555 97,556 96,556 96,557 95,557 94,558 92,558 91,558 90,558 89,559 81,559 80,560 78,560 78,560 76,560 75,561 73,561 72,561 66,561 65,562 60,562 59,563 51,563 50,564 50,574 50,575 50,621 50,621 50,665 49,665 49,681 50,681 50,701 50,701 51,702 51,712 51,713 51,718 52,719 51,720 51,726 52,727 52,731 51,731 51,735 51,735 51,753 51,753 51,757 51,758 51,766 51,767 51,779 52,780 52,795 53,796 53,808 52,809 52,819 53,820 53,830 52,830 52,832 51,833 51,840 52,840 52,850 53,850 53,881 53,881 53,882 53,890 53,890 53,903 53,904 53,909 53,910 53,910 54,910 55,911 127,911 128,911 132,911 133,912 133,912 134,913 138,913 138,913 143,913 144,913 152,913 153,913 155,913 156,914 158,914 159,915 190,915 190,914 205,914 205,915 211,915 212,915 218,915 219,915 224,915 225,913 226,913 227,913 232,913 233,912 235,912 236,911 240,911 240,911 250,911 250,910 270,910 271,910 271,905 271,904 271,898 272,898 272,895 273,894 273,893 273,893 273,892 275,890 276,890 276,890 283,890 283,889 286,889 286,888 288,888 288,888 290,888 290,887 290,886 
296,886 296,886 297,886 298,885 302,885 303,886 305,886 306,886 313,886 314,886 323,886 323,885 328,885 328,885 329,885 332,885 333,886 354,886 355,885 357,885 358,885 377,885 378,885 379,885 380,886 394,886 395,885 396,885 396,885 404,885 405,885 410,885 410,886 418,886 419,886 419,888 420,888 428,888 428,888 429,888 429,886 430,886 430,886 431,885 444,885 445,884 452,884 453,885 471,885 471,884 473,884 474,883 478,883 479,884 496,884 496,883 563,883 563,883 565,883 566,883 578,883 578,884 580,884 580,885 591,885 591,884 595,884 596,883 608,883 608,883 633,883 633,883 635,883 635,884 641,884 642,883 650,883 651,883 658,883 658,883 661,883 662,885 663,885 664,886 685,886 686,885 686,885 686,883 687,883 687,880 688,880 688,858 687,858 687,848 686,847 686,841 687,841 687,787 686,786 686,771 686,771 686,769 686,768 686,765 687,765 687,731 686,731 686,725 686,725 686,720 686,719 686,706 686,706 686,698 686,698 686,690 687,689 687,676 686,675 686,668 686,668 686,666 686,666 686,650 686,649 686,633 685,633 685,619 686,618 686,609 685,608 685,598 685,598 685,591 685,590 685,578 685,578 685,541 684,540 684,540 683,539 683,538 683,538 682,538 675,538 675,537 675,536 674,535 674,533 673,532 673,531 666,531 665,532 660,532 660,533 658,533 657,532 654,532 653,531 636,531 635,532 606,532 606,533 603,533 603,532 598,532 598,531 586,531 586,531 581,531 581,530 576,530 576,530 575,530 575,529 575,528 576,528 576,528 577,528 578,527 581,527 581,526 587,526 588,526 594,526 595,525 596,525 597,525 600,525 601,524 601,524 601,523 602,523 630,523 630,522 631,521 642,521 643,522 648,522 648,521 658,521 658,516 657,516 657,515 656,514 655,514 653,512 653,510 653,510 653,508 652,508 634,508 633,507 633,506 632,505 632,503 631,503 626,503 626,503 601,503 600,504 588,504 588,503 583,503 583,503 570,503 570,503 553,503 552,503 532,503 531,503 493,503 493,503 486,503 486,503 469,503 468,504 463,504 463,503 453,503 453,504 440,504 439,503 435,503 434,503 400,503 400,502 392,502 391,501 390,501 
390,501"/>
                <pc:TextLine id="region_0010_line_0001">
                    <pc:Coords points="55,888 55,888 53,888 52,889 52,890 51,890 51,893 51,893 48,893 48,898 48,899 48,910 49,910 50,910 50,913 50,913 51,914 64,914 65,913 159,913 160,915 160,916 161,916 161,917 176,917 176,916 180,916 181,916 181,915 182,914 183,914 184,913 191,913 191,914 211,914 211,915 223,915 225,913 248,913 249,913 266,913 266,912 268,911 268,911 269,910 276,910 276,910 276,909 276,894 268,894 266,893 251,893 250,892 250,892 249,891 249,891 248,890 248,889 227,889 226,888 226,888 207,888 206,888 203,888 202,889 195,889 195,890 194,889 184,889 183,888 171,888 170,890 160,890 160,891 159,891 158,891 158,892 158,893 135,893 135,893 134,893 133,892 133,891 132,891 132,889 131,888 118,888 118,889 118,890 117,891 98,891 97,890 97,890 96,889 96,888 73,888 72,888 55,888"/>
                    <pc:TextEquiv conf="0.996653318405151">
                        <pc:Unicode>Kulturkampfes halten.</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0002">
                    <pc:Coords points="655,859 654,860 622,860 622,863 621,865 611,865 610,865 583,865 581,863 581,863 580,863 573,863 572,861 572,861 571,860 558,860 558,863 556,865 556,865 555,865 516,865 515,865 515,865 513,863 512,863 511,863 504,863 503,861 474,861 473,863 473,863 472,864 472,865 471,866 412,866 411,865 411,862 390,862 389,863 389,865 386,867 300,867 298,866 298,863 276,863 276,864 276,865 276,866 275,867 270,867 269,868 256,868 256,867 252,867 251,866 251,863 233,863 233,863 231,865 223,865 223,866 216,866 215,866 213,866 213,867 210,867 208,865 208,862 185,862 184,863 157,863 157,865 155,866 153,866 152,866 146,866 145,865 130,865 129,866 129,867 128,868 88,868 88,868 87,868 86,867 86,866 86,866 86,863 50,863 50,863 49,863 49,868 48,868 48,868 48,885 49,885 50,885 50,886 51,888 51,888 53,889 65,889 66,888 92,888 93,888 116,888 116,888 190,888 190,888 202,888 203,888 240,888 240,888 253,888 253,888 260,888 261,888 287,888 288,888 292,888 293,887 415,887 415,888 415,888 416,889 416,890 417,891 431,891 431,891 431,890 432,889 432,888 433,887 435,887 435,886 457,886 458,886 475,886 475,886 501,886 501,886 583,886 583,885 595,885 596,885 597,885 598,885 625,885 626,885 646,885 647,884 656,884 656,885 687,885 688,884 689,884 689,881 690,881 690,861 689,861 683,861 683,860 670,860 670,860 667,860 666,859 655,859"/>
                    <pc:TextEquiv conf="0.99672269821167">
                        <pc:Unicode>haben, weil ſie ihn für einen Gegner Bismarcks und des</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0003">
                    <pc:Coords points="541,835 540,836 540,837 538,839 528,839 527,840 482,840 481,839 468,839 468,840 466,841 443,841 443,841 410,841 410,841 401,841 400,842 384,842 383,841 360,841 358,840 358,838 330,838 330,838 329,839 329,840 328,841 328,841 327,841 313,841 312,842 308,842 308,843 298,843 298,842 284,842 283,841 274,841 273,840 273,840 268,840 267,838 267,838 266,837 211,837 210,838 198,838 197,837 181,837 180,838 160,838 159,839 159,841 158,842 158,843 158,843 156,843 155,843 123,843 122,841 122,840 121,839 121,838 91,838 91,839 90,840 89,840 89,841 88,842 71,842 71,843 65,843 64,842 63,842 62,841 62,838 49,838 49,843 48,843 47,843 47,855 48,855 48,860 49,860 50,860 50,861 53,861 54,862 92,862 93,863 107,863 108,862 123,862 123,863 156,863 157,863 159,863 160,864 180,864 181,863 195,863 195,862 211,862 212,863 234,863 235,863 246,863 248,862 310,862 310,863 310,863 311,864 311,865 311,866 351,866 351,865 352,865 352,864 354,862 388,862 389,861 433,861 435,863 435,864 451,864 451,863 451,863 453,861 456,861 456,861 469,861 470,860 475,860 475,861 500,861 501,860 539,860 540,861 551,861 552,860 560,860 560,860 612,860 613,859 620,859 621,860 636,860 637,859 674,859 675,858 686,858 688,856 690,856 690,840 688,840 687,840 686,840 685,838 685,838 684,838 671,838 671,838 666,838 666,839 641,839 640,837 640,836 618,836 615,840 580,840 579,840 577,840 576,840 568,840 567,838 567,836 566,835 541,835"/>
                    <pc:TextEquiv conf="0.995206952095032">
                        <pc:Unicode>bereiteten Feierlichkeiten zeigten, mag darin ſeinen Grund</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0004">
                    <pc:Coords points="405,810 404,811 404,811 403,812 386,812 386,811 381,811 380,811 346,811 346,811 318,811 318,812 318,815 316,816 307,816 306,816 255,816 255,817 254,816 217,816 216,815 215,815 215,815 201,815 200,815 199,815 198,815 198,815 197,814 197,813 196,813 196,812 196,811 176,811 176,812 163,812 163,815 162,815 162,816 161,817 98,817 97,816 95,816 93,815 89,815 87,813 87,812 53,812 52,813 51,813 51,813 50,814 50,815 50,816 50,817 49,818 48,818 48,834 50,834 50,835 50,836 51,836 52,836 53,837 69,837 70,838 75,838 75,838 88,838 89,838 95,838 95,837 121,837 122,838 123,838 123,838 123,839 124,840 137,840 138,839 138,838 138,838 142,838 143,837 143,838 160,838 160,838 161,838 162,839 183,839 184,838 197,838 198,837 248,837 249,838 249,838 263,838 265,837 282,837 283,838 283,839 297,839 297,838 298,836 346,836 347,837 359,837 360,836 394,836 395,836 416,836 416,835 516,835 516,835 537,835 538,835 577,835 578,835 624,835 625,834 651,834 652,835 653,835 653,835 653,836 654,836 667,836 667,836 668,835 668,835 668,834 670,834 671,833 688,833 689,833 689,832 690,831 690,815 689,815 688,815 688,814 683,814 683,813 646,813 646,813 630,813 630,812 612,812 611,813 607,813 606,814 583,814 581,811 581,811 580,810 563,810 561,812 561,813 561,813 561,814 560,815 528,815 528,814 516,814 516,815 513,815 513,815 483,815 482,815 481,815 478,812 478,810 431,810 430,811 418,811 418,810 405,810"/>
                    <pc:TextEquiv conf="0.986474871635437">
                        <pc:Unicode>Schwarzen ſich weniger zurückhaltend bei den dem Kronprinzen</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0005">
                    <pc:Coords points="657,783 656,784 626,784 626,785 626,786 626,788 625,788 608,788 607,789 595,789 594,790 579,790 578,789 578,789 577,788 577,785 533,785 531,786 531,789 531,790 530,790 529,790 524,790 523,791 512,791 511,790 496,790 495,790 465,790 465,789 465,786 451,786 451,787 451,788 443,788 442,788 437,788 436,788 426,788 426,788 425,788 424,788 424,786 406,786 406,788 405,788 405,789 405,790 403,790 402,790 400,790 399,791 396,791 395,790 383,790 383,791 370,791 369,790 367,790 366,790 365,790 364,789 364,785 346,785 346,786 318,786 318,786 313,786 313,789 312,790 306,790 305,790 289,790 288,790 282,790 281,791 268,791 267,790 266,790 266,790 263,790 263,789 253,789 252,790 250,790 250,790 249,791 248,791 230,791 230,791 211,791 210,791 195,791 195,791 188,791 187,792 173,792 172,791 151,791 149,790 148,790 147,788 147,788 142,788 141,787 141,786 108,786 107,787 105,787 105,788 98,788 98,788 92,788 91,788 80,788 79,789 78,788 78,786 65,786 65,787 62,787 61,788 61,789 61,790 60,790 60,790 58,791 48,791 48,808 49,808 50,808 50,811 50,811 98,811 99,812 120,812 120,811 175,811 176,811 192,811 193,811 220,811 221,811 225,811 226,811 250,811 251,812 251,813 251,813 251,815 267,815 268,814 271,814 272,813 282,813 283,812 298,812 299,811 312,811 313,811 321,811 321,811 322,811 323,813 353,813 354,812 366,812 366,811 367,811 368,811 368,811 369,810 407,810 408,811 428,811 429,810 436,810 437,811 440,811 441,811 464,811 465,811 483,811 485,812 485,813 500,813 500,813 501,812 501,811 503,810 506,810 506,809 508,809 509,810 563,810 565,811 580,811 582,810 591,810 591,809 597,809 598,808 621,808 622,809 628,809 629,810 642,810 643,808 655,808 656,808 688,808 688,807 690,807 690,788 684,788 683,788 681,788 680,787 680,783 657,783"/>
                    <pc:TextEquiv conf="0.997538805007935">
                        <pc:Unicode>allenthalben einen ſympathiſchen Empfang. Daß auch die</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0006">
                    <pc:Coords points="570,758 570,758 565,758 564,759 564,762 563,763 557,763 556,764 555,764 554,765 532,765 531,764 516,764 514,762 514,761 500,761 500,761 500,762 488,762 486,764 472,764 471,765 463,765 462,764 446,764 445,763 423,763 421,765 406,765 404,763 404,761 403,760 387,760 386,761 382,761 381,761 381,763 381,764 381,765 380,765 335,765 335,765 334,765 333,764 333,761 320,761 319,762 319,764 318,765 318,765 317,765 293,765 293,766 251,766 249,763 245,763 244,763 239,763 238,762 236,762 236,761 217,761 216,762 213,762 211,763 211,764 210,765 179,765 178,766 178,766 177,766 151,766 151,766 149,766 148,765 148,765 146,764 146,763 135,763 134,764 129,764 128,763 103,763 101,762 98,762 98,761 98,761 53,761 52,761 49,761 49,765 48,766 47,766 47,782 49,782 50,783 50,786 90,786 91,787 103,787 104,786 113,786 114,787 126,787 127,786 130,786 130,786 218,786 219,786 255,786 255,786 296,786 297,785 321,785 322,786 335,786 336,785 408,785 409,785 420,785 421,785 423,785 425,787 430,787 431,788 445,788 446,787 446,786 448,785 480,785 480,784 495,784 495,785 507,785 508,784 523,784 523,783 577,783 578,784 580,784 580,785 580,785 592,785 593,784 600,784 600,783 605,783 605,784 619,784 620,785 646,785 647,785 650,785 651,783 654,783 655,783 674,783 675,782 686,782 686,781 687,781 687,781 688,780 690,780 690,763 687,763 686,763 686,763 684,761 684,760 672,760 672,761 671,761 670,761 669,761 654,761 653,761 653,761 653,759 652,758 652,758 640,758 639,759 624,759 623,758 605,758 605,758 570,758"/>
                    <pc:TextEquiv conf="0.988161742687225">
                        <pc:Unicode>Reiches. der in Bahern mehrere Truppenrevüen abhielt, fand</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0007">
                    <pc:Coords points="640,732 640,733 639,733 632,733 631,735 631,736 630,736 629,736 628,736 613,736 611,735 611,735 610,734 609,734 608,733 591,733 590,734 589,734 588,735 574,735 573,735 551,735 551,738 550,738 528,738 527,738 511,738 510,737 493,737 492,738 491,738 491,738 481,738 480,739 467,739 466,738 463,738 461,736 461,735 443,735 442,736 442,736 440,738 439,738 438,739 430,739 429,738 421,738 420,738 420,738 418,736 418,736 417,736 416,736 416,735 415,735 414,735 400,735 400,735 398,735 398,736 397,736 396,736 396,737 395,738 395,739 394,740 353,740 353,740 349,740 348,739 336,739 335,740 291,740 290,739 290,735 250,735 250,735 221,735 221,735 221,738 220,739 220,740 220,740 198,740 197,740 176,740 176,740 176,741 156,741 155,740 155,737 133,737 133,738 133,738 132,739 132,740 131,741 110,741 108,739 108,737 98,737 98,736 81,736 81,740 80,741 80,741 79,741 55,741 55,742 47,742 47,758 48,758 48,759 48,761 49,761 68,761 68,761 91,761 91,761 109,761 110,761 175,761 176,761 176,763 176,763 196,763 196,763 198,761 209,761 210,760 215,760 216,761 238,761 240,763 253,763 254,762 254,761 255,761 255,761 256,760 295,760 296,760 368,760 369,760 397,760 398,760 423,760 424,759 438,759 439,760 453,760 453,759 503,759 504,758 516,758 517,759 534,759 535,760 535,760 535,761 535,761 548,761 548,760 549,760 549,759 550,758 551,758 552,758 639,758 640,759 666,759 668,757 688,757 689,756 689,755 690,754 690,738 681,738 681,737 666,737 665,736 665,732 640,732"/>
                    <pc:TextEquiv conf="0.992116212844849">
                        <pc:Unicode>gendſten Truppeninſpektionen vor. Der Kronprinz des Deutſchen</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0008">
                    <pc:Coords points="530,707 530,708 525,708 524,708 519,708 518,709 516,709 515,708 472,708 472,710 471,710 468,710 468,711 468,711 468,712 467,713 466,713 466,713 448,713 447,714 433,714 433,713 406,713 406,714 385,714 384,713 372,713 371,714 366,714 366,715 365,715 364,714 350,714 350,713 347,713 346,713 334,713 334,713 333,714 328,714 327,715 303,715 301,712 300,712 299,711 278,711 278,711 278,710 277,709 246,709 245,710 235,710 234,710 233,710 232,710 217,710 216,710 201,710 201,711 201,711 201,713 200,714 200,715 200,715 171,715 170,714 170,713 170,713 170,711 152,711 151,711 131,711 131,712 110,712 110,711 98,711 97,710 50,710 49,711 49,714 48,715 47,715 47,731 48,731 49,731 49,736 50,737 50,738 81,738 81,737 83,735 152,735 153,736 171,736 172,735 238,735 238,736 261,736 262,735 263,735 263,735 278,735 278,735 305,735 306,735 395,735 396,734 408,734 408,733 409,734 430,734 430,733 464,733 465,734 470,734 471,735 471,735 472,735 472,736 473,737 496,737 497,736 508,736 508,736 510,733 526,733 527,734 531,734 532,735 532,736 561,736 562,736 562,735 563,735 563,734 563,733 565,733 566,733 581,733 581,732 626,732 626,733 643,733 644,732 661,732 661,731 680,731 680,730 680,730 690,730 690,713 680,713 678,712 646,712 646,711 646,711 645,710 645,708 629,708 629,710 628,710 628,711 628,711 622,711 621,712 610,712 610,713 595,713 594,712 593,712 592,711 592,708 570,708 570,710 569,710 568,710 568,710 550,710 550,711 550,712 549,712 548,711 548,708 548,707 530,707"/>
                    <pc:TextEquiv conf="0.981864213943481">
                        <pc:Unicode>ſich des veſten Wohlſeins und nimmt noch häufig die anſtren—</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0009">
                    <pc:Coords points="516,683 516,683 501,683 500,684 495,684 494,685 494,686 492,688 485,688 484,687 476,687 475,686 475,686 474,686 473,686 473,685 472,685 471,685 471,685 470,684 454,684 453,685 453,685 451,686 451,687 450,688 425,688 425,688 423,688 423,687 422,687 421,686 421,685 421,685 421,684 409,684 408,685 403,685 402,684 366,684 366,685 365,685 365,685 365,686 365,687 364,688 355,688 354,688 345,688 343,686 343,685 342,685 322,685 320,686 320,687 320,688 320,689 319,690 302,690 302,691 301,691 269,691 268,691 268,691 266,690 242,690 241,689 241,689 240,688 240,686 227,686 226,685 226,685 211,685 210,685 189,685 188,686 184,686 183,686 173,686 171,685 155,685 155,686 154,685 134,685 134,688 132,690 110,690 110,691 98,691 98,690 77,690 75,688 75,686 49,686 49,688 48,689 48,690 48,691 47,691 47,707 48,707 50,708 50,711 78,711 79,710 98,710 98,710 120,710 121,710 156,710 156,710 170,710 171,710 176,710 177,711 178,711 178,711 191,711 192,711 193,711 194,710 214,710 215,711 215,711 216,711 226,711 226,710 227,709 268,709 268,708 268,708 311,708 311,706 312,706 320,706 321,706 321,707 321,708 322,708 323,708 323,708 324,709 325,709 326,710 338,710 339,709 386,709 386,708 415,708 415,708 418,708 419,708 431,708 431,708 439,708 440,708 456,708 456,709 470,709 470,708 478,708 479,708 501,708 503,709 523,709 523,708 530,708 531,708 560,708 560,707 575,707 576,708 591,708 591,707 608,707 608,706 630,706 630,707 646,707 646,706 687,706 688,706 688,706 688,705 689,704 690,704 690,688 688,688 687,687 687,686 650,686 649,686 648,686 648,685 648,684 646,684 646,683 636,683 636,685 634,686 622,686 621,687 613,687 612,688 609,688 608,687 592,687 591,686 591,686 590,686 590,683 567,683 566,683 549,683 548,683 543,683 542,684 540,684 540,683 536,683 536,683 516,683"/>
                    <pc:TextEquiv conf="0.989803791046143">
                        <pc:Unicode>ſehen und begünſtigen. Se. M. der Deutſche Kaiſer erfreut</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0010">
                    <pc:Coords points="561,657 561,659 560,660 560,661 560,661 556,661 556,662 546,662 546,663 523,663 521,661 521,661 521,660 521,658 505,658 505,661 504,661 500,661 499,661 481,661 481,661 480,661 479,662 474,662 473,663 464,663 463,661 463,661 462,660 462,658 441,658 439,660 439,661 438,662 431,662 431,663 415,663 413,661 413,659 413,658 413,658 398,658 397,658 368,658 367,659 367,660 366,661 366,662 365,663 359,663 358,664 330,664 329,663 326,663 325,663 324,663 323,661 323,660 322,660 322,659 255,659 255,660 254,660 253,660 253,660 253,661 252,661 252,662 251,663 251,663 251,664 221,664 220,663 220,662 220,661 220,660 219,659 176,659 176,660 165,660 165,663 164,664 163,664 162,665 151,665 150,665 145,665 144,666 126,666 126,665 119,665 118,665 85,665 84,663 84,662 83,661 83,660 48,660 48,661 48,665 47,666 47,682 48,682 48,683 48,684 50,686 51,686 52,687 84,687 86,685 142,685 143,686 143,686 181,686 182,685 183,685 183,685 188,685 188,685 201,685 201,685 207,685 208,685 221,685 222,685 223,685 223,684 281,684 281,685 282,685 283,685 295,685 296,684 318,684 318,683 364,683 365,684 400,684 401,685 414,685 415,683 416,683 416,683 445,683 446,683 460,683 461,683 481,683 481,683 482,683 483,684 486,684 486,685 515,685 515,684 516,683 535,683 535,682 539,682 540,683 540,683 541,684 556,684 557,685 570,685 571,683 583,683 584,683 585,683 596,683 597,683 598,683 598,682 605,682 606,681 614,681 615,682 631,682 632,683 632,684 633,685 646,685 646,684 646,683 648,683 648,683 649,683 649,682 650,681 650,681 651,681 686,681 687,680 688,680 688,680 689,679 690,679 690,663 688,663 688,662 688,661 679,661 678,661 663,661 662,661 651,661 650,662 621,662 620,663 619,663 618,662 606,662 606,661 605,661 604,660 604,658 603,657 561,657"/>
                    <pc:TextEquiv conf="0.992499232292175">
                        <pc:Unicode>höheren geiſtlichen Behörden ſolche Vorpoftengefechte gerne</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0011">
                    <pc:Coords points="656,631 655,632 646,632 645,633 635,633 635,632 633,632 632,631 618,631 618,634 617,635 617,635 616,636 616,636 615,636 608,636 608,637 589,637 588,636 561,636 560,637 515,637 513,636 513,635 500,635 500,635 498,637 465,637 464,638 409,638 408,638 407,638 406,636 406,634 376,634 376,635 375,635 375,636 374,638 372,638 371,638 345,638 344,638 343,638 342,637 330,637 328,638 311,638 310,638 305,638 303,636 283,636 282,636 264,636 264,636 263,637 263,638 263,638 261,638 261,639 183,639 183,638 183,638 170,638 169,638 168,638 167,638 163,638 162,637 153,637 153,636 140,636 138,639 124,639 123,640 121,640 120,639 112,639 111,638 111,636 91,636 91,636 90,636 90,637 88,637 88,638 87,640 61,640 60,639 48,639 48,640 47,640 47,656 48,657 48,659 81,659 81,660 118,660 119,660 119,662 135,662 135,661 135,661 135,660 136,660 152,660 153,659 161,659 162,658 163,658 163,659 175,659 176,658 181,658 182,659 220,659 220,660 236,660 237,659 240,659 241,658 312,658 313,658 363,658 363,658 387,658 388,658 427,658 428,659 431,659 431,660 443,660 443,659 445,658 448,658 449,657 531,657 532,656 538,656 538,657 562,657 563,656 578,656 579,657 591,657 591,656 611,656 612,656 613,656 614,656 646,656 646,656 685,656 685,655 688,655 688,653 688,653 690,653 690,636 688,636 687,636 680,636 678,634 678,631 656,631"/>
                    <pc:TextEquiv conf="0.987055063247681">
                        <pc:Unicode>der Tagesordnung und werden ſolange vorkommen, als die</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0012">
                    <pc:Coords points="518,606 518,606 503,606 503,609 503,610 503,610 501,611 501,611 500,612 480,612 480,613 443,613 441,611 441,608 439,608 438,608 406,608 406,609 405,610 405,611 404,612 397,612 396,613 368,613 366,611 366,611 366,610 366,609 363,609 363,608 346,608 346,609 345,610 340,610 339,610 337,610 336,610 326,610 325,609 325,609 324,608 306,608 305,609 296,609 295,608 276,608 276,609 275,609 274,610 274,611 272,613 271,613 270,613 250,613 250,613 249,613 248,612 248,611 248,611 248,608 217,608 217,608 216,609 189,609 189,611 188,611 188,612 187,613 181,613 181,614 176,614 175,615 147,615 146,614 78,614 77,615 73,615 72,614 64,614 63,613 63,613 62,613 61,613 60,611 48,611 48,612 47,613 47,631 48,633 48,636 64,636 65,636 65,635 65,635 66,635 66,634 120,634 121,635 121,637 136,637 136,636 138,634 170,634 171,635 171,636 171,637 183,637 183,636 183,636 184,636 184,635 185,634 200,634 201,633 223,633 223,634 224,634 225,635 225,636 248,636 248,636 248,635 250,634 250,634 251,633 275,633 277,636 277,636 278,637 311,637 311,636 312,636 315,636 315,635 315,635 316,635 316,634 343,634 343,633 344,633 345,633 385,633 386,632 404,632 406,634 414,634 415,635 415,635 416,635 438,635 438,635 441,635 443,633 444,633 445,632 506,632 506,633 507,633 508,634 512,634 513,635 516,635 516,635 564,635 565,635 565,634 567,631 568,631 568,631 615,631 616,631 621,631 622,632 623,632 623,633 635,633 636,631 637,631 638,631 643,631 643,630 657,630 658,631 686,631 686,628 686,628 690,628 690,611 689,611 688,611 688,606 676,606 676,608 676,608 676,610 675,610 663,610 663,611 639,611 638,610 637,610 635,608 635,608 635,607 619,607 619,609 617,611 575,611 573,610 573,609 573,608 573,607 562,607 561,606 553,606 552,606 518,606"/>
                    <pc:TextEquiv conf="0.996099174022675">
                        <pc:Unicode>ſperrungen zelotiſcher Hetzkapläne ſtehen auch jetzt noch auf</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0013">
                    <pc:Coords points="640,581 639,581 639,583 638,584 638,585 638,585 601,585 600,584 600,581 564,581 564,583 563,583 563,585 562,586 514,586 513,586 513,586 512,585 500,585 500,586 499,587 453,587 452,588 439,588 438,587 436,587 435,586 435,583 396,583 395,583 395,584 395,585 395,586 393,588 346,588 345,588 332,588 331,587 330,587 330,586 330,584 306,584 306,585 305,585 305,586 305,587 304,587 303,588 293,588 292,588 283,588 283,589 269,589 268,588 246,588 246,588 234,588 233,588 200,588 198,586 198,584 192,584 191,583 181,583 181,583 145,583 145,583 135,583 135,584 129,584 128,583 107,583 106,584 103,584 103,585 96,585 95,584 75,584 75,585 61,585 61,584 48,584 47,585 47,608 48,609 60,609 61,610 77,610 78,611 91,611 92,610 93,610 93,609 95,609 96,610 116,610 116,609 143,609 143,610 144,610 145,610 145,611 156,611 156,610 158,609 168,609 168,610 178,610 179,610 191,610 193,608 199,608 200,608 231,608 231,607 236,607 236,608 265,608 266,608 281,608 282,610 332,610 333,609 333,608 333,608 336,608 337,607 392,607 393,606 394,606 395,607 401,607 401,608 414,608 415,607 455,607 456,608 456,610 471,610 472,609 481,609 481,608 482,608 486,608 486,607 496,607 497,606 503,606 503,606 606,606 607,605 642,605 643,606 656,606 657,605 678,605 679,604 679,603 680,602 690,602 690,586 678,586 678,585 667,585 666,585 666,585 665,584 665,583 665,583 665,581 640,581"/>
                    <pc:TextEquiv conf="0.993716180324554">
                        <pc:Unicode>d. h. Nachrichten von größerem Belange, denn kleine Ein—</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0014">
                    <pc:Coords points="346,557 345,558 333,558 333,558 333,559 332,560 332,561 331,562 330,562 329,563 296,563 295,562 295,562 293,561 293,560 293,559 269,559 268,558 254,558 254,561 252,563 226,563 226,562 225,562 224,561 223,561 223,561 223,560 211,560 211,561 210,561 203,561 202,562 197,562 196,563 171,563 170,563 140,563 140,563 139,563 138,562 138,561 138,561 138,560 137,559 135,559 135,558 130,558 129,558 82,558 81,558 70,558 69,559 47,559 47,583 48,584 118,584 119,585 131,585 132,583 169,583 170,583 206,583 206,582 232,582 233,583 255,583 256,583 268,583 269,583 271,583 273,585 273,586 285,586 285,585 286,584 286,583 288,582 315,582 315,583 328,583 330,584 331,584 331,585 333,585 333,585 351,585 352,584 352,583 353,582 378,582 378,581 400,581 401,582 427,582 428,581 433,581 434,582 464,582 465,581 465,581 466,582 467,582 468,583 468,583 468,584 468,585 485,585 485,584 485,583 486,581 521,581 522,581 558,581 559,580 570,580 571,581 583,581 585,583 601,583 601,582 603,581 606,581 607,580 636,580 636,580 668,580 668,580 683,580 683,578 683,578 690,578 690,561 681,561 680,560 669,560 668,560 657,560 656,559 644,559 643,560 642,560 641,560 589,560 588,561 523,561 521,560 510,560 508,561 500,561 500,561 491,561 491,561 485,561 485,560 483,560 483,560 481,560 480,558 480,558 465,558 464,558 455,558 455,559 438,559 438,560 438,560 436,561 418,561 416,559 400,559 400,560 400,561 399,561 398,560 378,560 378,561 376,561 375,560 375,558 375,558 368,558 368,557 346,557"/>
                    <pc:TextEquiv conf="0.993075370788574">
                        <pc:Unicode>Nachrichten in der letzten Zeit etwas ſparſamer geworden,</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0015">
                    <pc:Coords points="658,528 658,529 655,529 655,530 630,530 629,529 597,529 596,530 596,532 595,533 595,535 595,535 555,535 555,535 555,531 531,531 531,530 503,530 502,531 502,533 501,533 493,533 491,535 485,535 485,536 462,536 461,535 461,532 461,531 460,531 460,531 446,531 445,531 445,531 444,532 444,535 443,536 441,536 441,536 406,536 405,536 404,536 403,535 403,532 403,531 359,531 358,533 358,535 357,536 356,536 355,537 309,537 308,536 306,536 305,536 305,531 273,531 271,533 270,533 270,533 263,533 263,534 263,535 262,535 261,536 258,536 258,536 258,537 226,537 225,536 211,536 210,536 210,537 210,538 177,538 176,537 175,537 175,536 162,536 160,538 142,538 141,538 124,538 123,536 122,536 121,536 121,533 121,532 104,532 103,533 101,533 101,534 100,535 97,535 96,536 95,536 95,537 94,538 94,539 93,540 93,554 103,554 103,555 103,555 104,556 105,556 105,556 131,556 132,557 132,558 160,558 161,557 162,557 163,558 205,558 205,557 261,557 262,558 271,558 273,560 273,560 305,560 305,560 308,557 318,557 319,556 323,556 323,557 340,557 341,556 358,556 359,557 397,557 398,556 423,556 424,556 440,556 440,556 490,556 491,558 491,559 492,560 520,560 520,559 520,558 521,556 522,556 523,556 539,556 540,555 594,555 595,556 607,556 608,555 635,555 635,555 645,555 646,554 655,554 656,555 681,555 681,554 686,554 686,553 686,551 687,550 690,550 690,536 689,535 689,535 688,534 678,534 677,533 677,529 671,529 671,528 658,528"/>
                    <pc:TextEquiv conf="0.975517272949219">
                        <pc:Unicode>Von den deutfchen Cultur-Kampfſtätten ſind die</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextLine id="region_0010_line_0016">
                    <pc:Coords points="549,500 548,500 543,500 542,501 516,501 515,501 515,502 515,503 514,503 514,503 513,504 513,505 513,505 510,505 510,506 482,506 481,505 480,505 480,505 479,505 478,504 466,504 465,505 465,505 464,506 451,506 450,505 436,505 435,504 435,501 404,501 403,501 390,501 389,501 375,501 375,501 349,501 348,502 348,505 348,506 345,506 345,506 340,506 340,523 346,523 346,523 346,528 360,528 361,527 386,527 387,526 406,526 406,527 418,527 419,526 420,526 420,526 460,526 461,525 467,525 468,526 487,526 488,525 571,525 571,526 583,526 584,525 598,525 598,525 656,525 656,522 656,521 663,521 663,511 662,511 662,510 661,509 660,509 658,507 658,505 656,505 655,505 646,505 645,504 636,504 636,503 636,502 635,501 635,500 622,500 621,500 621,500 620,501 620,503 619,504 596,504 596,503 591,503 591,503 588,503 588,502 578,502 577,501 570,501 568,500 549,500"/>
                    <pc:TextEquiv conf="0.9727663397789">
                        <pc:Unicode>Roſenheim, den 5. September.</pc:Unicode>
                    </pc:TextEquiv>
                </pc:TextLine>
                <pc:TextEquiv>
                    <pc:Unicode>Kulturkampfes halten.
    haben, weil ſie ihn für einen Gegner Bismarcks und des
    bereiteten Feierlichkeiten zeigten, mag darin ſeinen Grund
    Schwarzen ſich weniger zurückhaltend bei den dem Kronprinzen
    allenthalben einen ſympathiſchen Empfang. Daß auch die
    Reiches. der in Bahern mehrere Truppenrevüen abhielt, fand
    gendſten Truppeninſpektionen vor. Der Kronprinz des Deutſchen
    ſich des veſten Wohlſeins und nimmt noch häufig die anſtren—
    ſehen und begünſtigen. Se. M. der Deutſche Kaiſer erfreut
    höheren geiſtlichen Behörden ſolche Vorpoftengefechte gerne
    der Tagesordnung und werden ſolange vorkommen, als die
    ſperrungen zelotiſcher Hetzkapläne ſtehen auch jetzt noch auf
    d. h. Nachrichten von größerem Belange, denn kleine Ein—
    Nachrichten in der letzten Zeit etwas ſparſamer geworden,
    Von den deutfchen Cultur-Kampfſtätten ſind die
    Roſenheim, den 5. September.</pc:Unicode>
                </pc:TextEquiv>
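
In the sample above, the region-level text lists the lines from the bottom of the region upwards: `region_0010_line_0007` spans roughly y = 732 to 763, while `region_0010_line_0016` spans roughly y = 500 to 528, and smaller y values are higher on the page. A minimal sketch of re-ordering lines top to bottom by their `Coords` is shown below. The namespace version, the shortened sample data and the helper names `top_y` and `lines_top_to_bottom` are assumptions made for illustration; this is not how Eynollah itself orders lines.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE namespace version; use whatever xmlns the real file declares.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
NS = {"pc": PAGE_NS}

# Shortened, made-up sample with two lines stored bottom-first.
SAMPLE = f"""<pc:TextRegion xmlns:pc="{PAGE_NS}" id="r1">
  <pc:TextLine id="l1">
    <pc:Coords points="47,758 690,758 690,787 47,787"/>
    <pc:TextEquiv><pc:Unicode>second line</pc:Unicode></pc:TextEquiv>
  </pc:TextLine>
  <pc:TextLine id="l2">
    <pc:Coords points="340,500 663,500 663,528 340,528"/>
    <pc:TextEquiv><pc:Unicode>first line</pc:Unicode></pc:TextEquiv>
  </pc:TextLine>
</pc:TextRegion>"""


def top_y(line):
    """Smallest y coordinate of a TextLine polygon (its upper edge)."""
    points = line.find("pc:Coords", NS).get("points")
    return min(int(pair.split(",")[1]) for pair in points.split())


def lines_top_to_bottom(region):
    """Return the text of each TextLine, sorted from top to bottom."""
    lines = sorted(region.findall("pc:TextLine", NS), key=top_y)
    return [
        line.findtext("pc:TextEquiv/pc:Unicode", default="", namespaces=NS)
        for line in lines
    ]


region = ET.fromstring(SAMPLE)
print(lines_top_to_bottom(region))
```

Sorting by the upper edge alone is only adequate for single-column regions with roughly horizontal lines.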
    
    bug 
    opened by aurichje 13
  • IndexError: list index out of range (slopes[region_idx])


    Hi, when running Eynollah on this image, using

    en = Eynollah("models_eynollah", imgfile, dir_out=path.split(imgfile)[0], curved_line=True, full_layout=True)
    pcgts = en.run()
    

    it fails with

    Traceback (most recent call last):
      File "<stdin>", line 6, in <module>
      File "venv/lib/python3.6/site-packages/qurator/eynollah/eynollah.py", line 2074, in run
        pcgts = self.writer.build_pagexml_full_layout(contours_only_text_parent, contours_only_text_parent_h, page_coord, order_text_new, id_of_texts_tot, all_found_texline_polygons, all_found_texline_polygons_h, all_box_coord, all_box_coord_h, polygons_of_images, polygons_of_tabels, polygons_of_drop_capitals, polygons_of_marginals, all_found_texline_polygons_marginals, all_box_coord_marginals, slopes, slopes_marginals, cont_page, polygons_lines_xml)
      File "venv/lib/python3.6/site-packages/qurator/eynollah/writer.py", line 221, in build_pagexml_full_layout
        self.serialize_lines_in_region(textregion, all_found_texline_polygons_h, mm, page_coord, all_box_coord_h, slopes, counter)
      File "venv/lib/python3.6/site-packages/qurator/eynollah/writer.py", line 117, in serialize_lines_in_region
        if self.curved_line and np.abs(slopes[region_idx]) <= 45:
    IndexError: list index out of range
    

    Some other pages from the book seem to work, and the results look really good (except for drop caps, but those are not easy even for humans to identify and put in the correct order). I'm segmenting the rest of the book now and will see whether there are more errors like this one.
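
The traceback above fails on `slopes[region_idx]`, which suggests that `slopes` holds fewer entries than there are regions being serialized. A defensive lookup could be sketched as follows. The helper name `is_near_horizontal` and the choice to return `False` when no slope is recorded are assumptions for illustration, not the project's actual fix.

```python
import numpy as np


def is_near_horizontal(slopes, region_idx, max_abs_slope=45):
    """True if a slope is recorded for the region and lies within the limit."""
    if not 0 <= region_idx < len(slopes):
        # Hypothetical fallback: no slope was recorded for this region.
        return False
    return bool(np.abs(slopes[region_idx]) <= max_abs_slope)


print(is_near_horizontal([0.5, -60.0], 0))
print(is_near_horizontal([0.5, -60.0], 2))
```

A guard like this only hides the symptom; the mismatch between the number of regions and the number of slopes would still need to be traced.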

    opened by andbue 9
  • Irritating "Image dimension" log message

    I am processing a 4000x6000 image using ocrd-eynollah-segment and get - among other messages - this message:

    14:32:10.541 INFO eynollah - Image dimensions: 448x672
    

    Should this read "Patch dimensions" and maybe get a log level of DEBUG?
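
What the question proposes could look like the sketch below, using the standard `logging` module. The logger name and the function `log_dimensions` are hypothetical and do not come from the Eynollah code base.

```python
import logging

logger = logging.getLogger("eynollah.sketch")  # hypothetical logger name


def log_dimensions(image_size, patch_size):
    """Log the full image size at INFO and the per-patch size at DEBUG."""
    logger.info("Image dimensions: %dx%d", *image_size)
    logger.debug("Patch dimensions: %dx%d", *patch_size)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log_dimensions((4000, 6000), (448, 672))
```

With the level set to INFO, only the image line appears; the patch line shows up once the level is lowered to DEBUG.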

    documentation 
    opened by mikegerber 8
  • drop_capitals.py: ValueError: attempt to get argmin of an empty sequence


    Hi, I think I found another one:

    wget https://api.digitale-sammlungen.de/iiif/image/v2/bsb00052981_00339/full/full/0/default.png
    eynollah -i default.png -o . -m eynollah/models_eynollah -fl -cl
    
    13:16:03.204 INFO eynollah - resize and enhance image
    13:16:03.204 INFO eynollah - Detected 230 DPI
    13:16:19.326 INFO eynollah - Found 3 columns ([[1.6621375e-26 1.6978607e-38 1.0000000e+00 2.5424867e-32 9.4024474e-31
      0.0000000e+00]])
    13:16:33.584 INFO eynollah - Image is enhanced
    13:16:33.726 INFO eynollah - Enhancing took 30.522119998931885s
    13:16:39.280 INFO eynollah - Image dimensions: 448x672
    13:16:58.684 INFO eynollah - Image dimensions: 448x672
    13:17:19.415 INFO eynollah - Image dimensions: 448x672
    13:17:39.792 INFO eynollah - ratio_of_two_models: 99.93604678448163
    13:17:40.588 INFO eynollah - Textregion detection took 66.86148571968079s
    13:17:47.636 INFO eynollah - Graphics detection took 7.048167943954468s
    13:17:47.636 INFO eynollah - cont_page [array([[  88,   87],
           [2933,   87],
           [2933, 4525],
           [  88, 4525]])]
    13:17:52.956 INFO eynollah - Image dimensions: 448x672
    13:18:04.696 INFO eynollah - textline detection took 17.060104370117188s
    13:18:21.939 INFO eynollah - slope_deskew: -0.3636363636363633
    13:18:21.939 INFO eynollah - deskewing took 17.242716073989868s
    13:18:21.962 INFO eynollah - detection of marginals took 0.022979736328125s
    13:18:27.893 INFO eynollah - Image dimensions: 896x896
    13:18:33.513 INFO eynollah - Image dimensions: 896x896
    13:18:53.899 INFO eynollah - areas_cnt_text [6.06679334e-05 3.96004787e-08 1.24939510e-03 1.53873996e-02
     3.28577052e-03 5.43809614e-03 5.36713208e-03 6.94196391e-05
     1.72341283e-04 1.30660354e-01 1.54637414e-01 7.77194243e-02
     3.97628407e-04 1.18769756e-03 4.26853560e-04]
    Traceback (most recent call last):
      File "/.../bin/eynollah", line 33, in <module>
      File "/.../lib/python3.7/site-packages/click/core.py", line 1137, in __call__
        return self.main(*args, **kwargs)
      File "/.../lib/python3.7/site-packages/click/core.py", line 1062, in main
        rv = self.invoke(ctx)
      File "/.../lib/python3.7/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/.../lib/python3.7/site-packages/click/core.py", line 763, in invoke
        return __callback(*args, **kwargs)
      File "/.../lib/python3.7/site-packages/qurator/eynollah/cli.py", line 142, in main
        pcgts = eynollah.run()
      File "/.../lib/python3.7/site-packages/qurator/eynollah/eynollah.py", line 2024, in run
        all_found_texline_polygons = adhere_drop_capital_region_into_corresponding_textline(text_regions_p, polygons_of_drop_capitals, contours_only_text_parent, contours_only_text_parent_h, all_box_coord, all_box_coord_h, all_found_texline_polygons, all_found_texline_polygons_h, kernel=KERNEL, curved_line=self.curved_line)
      File "/.../lib/python3.7/site-packages/qurator/eynollah/utils/drop_capitals.py", line 157, in adhere_drop_capital_region_into_corresponding_textline
        arg_min = np.argmin(np.abs(y_lines - y_min_d[i_drop]))
      File "<__array_function__ internals>", line 6, in argmin
      File "/.../lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 1267, in argmin
        return _wrapfunc(a, 'argmin', axis=axis, out=out)
      File "/.../lib/python3.7/site-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
        return bound(*args, **kwds)
    ValueError: attempt to get argmin of an empty sequence
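
The final frame shows `np.argmin` being called on an empty array, which NumPy rejects with exactly this `ValueError`. Below is a minimal reproduction together with one possible guard. The helper `nearest_line_index` and its `None` return value are assumptions for illustration; how Eynollah should treat a drop capital with no candidate text lines is a separate design question.

```python
import numpy as np


def nearest_line_index(y_lines, y_target):
    """Index of the line closest to y_target, or None if there are no lines."""
    y_lines = np.asarray(y_lines, dtype=float)
    if y_lines.size == 0:
        return None  # hypothetical fallback instead of raising
    return int(np.argmin(np.abs(y_lines - y_target)))


# Reproduce the error from the traceback.
try:
    np.argmin(np.abs(np.array([]) - 10.0))
except ValueError as exc:
    print("raised:", exc)

print(nearest_line_index([], 10.0))
print(nearest_line_index([100, 140, 180], 150))
```

The same pattern applies to `np.argmax`, which raises the analogous error on an empty input.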
    
    
    opened by andbue 6
  • Unable to process document due to ValueError: attempt to get argmax of an empty sequence


    For this workspace:

    PPN729186350.zip

    I get the following error:

    
    Traceback (most recent call last):
      File "/usr/local/bin/ocrd-eynollah-segment", line 8, in <module>
        sys.exit(main())
      File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 829, in __call__
        return self.main(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 782, in main
        rv = self.invoke(ctx)
      File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1066, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 610, in invoke
        return callback(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/qurator/eynollah/ocrd_cli.py", line 8, in main
        return ocrd_cli_wrap_processor(EynollahProcessor, *args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/ocrd/decorators/__init__.py", line 91, in ocrd_cli_wrap_processor
        run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/ocrd/processor/helpers.py", line 72, in run_processor
        processor.process()
      File "/usr/local/lib/python3.6/dist-packages/qurator/eynollah/processor.py", line 57, in process
        Eynollah(**eynollah_kwargs).run()
      File "/usr/local/lib/python3.6/dist-packages/qurator/eynollah/eynollah.py", line 1744, in run
        contours_biggest = contours_only_text_parent[np.argmax(areas_cnt_text)]
      File "<__array_function__ internals>", line 6, in argmax
      File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 1186, in argmax
        return _wrapfunc(a, 'argmax', axis=axis, out=out)
      File "/usr/local/lib/python3.6/dist-packages/numpy/core/fromnumeric.py", line 61, in _wrapfunc
        return bound(*args, **kwds)
    ValueError: attempt to get argmax of an empty sequence
    

    Command line used:

    ocrd-eynollah-segment --overwrite -I OCR-D-IMG-BIN -O OCR-D-SEG-LINE -P models /var/lib/eynollah
    
    opened by mikegerber 6
  • How to get Image segment as img as Shown in repo for newspaper cutting


    Hi Sir, can you please tell me how I can get the result as a segmented image in which the detected layout regions are drawn in different colors, like the example displayed in the repo: https://user-images.githubusercontent.com/952378/102350683-8a74db80-3fa5-11eb-8c7e-f743f7d6eae2.jpg? Thank you.

    documentation 
    opened by dhirendraAL 6
  • ValueError: bad marshal data


    I am using Ubuntu 20.04 Linux

    conda create -n eynollah -y python=3.8
    conda activate eynollah
    // in eynollah's directory
    pip install -e .
    make models

    // executing the following command generates "ValueError: bad marshal data"
    eynollah -i data/input/1.jpg -o data/output -m models_eynollah

    The full error message is below.

    Traceback (most recent call last):
      File "/anaconda/envs/eynollah/bin/eynollah", line 33, in <module>
        sys.exit(load_entry_point('eynollah', 'console_scripts', 'eynollah')())
      File "/anaconda/envs/eynollah/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
        return self.main(*args, **kwargs)
      File "/anaconda/envs/eynollah/lib/python3.8/site-packages/click/core.py", line 1055, in main
        rv = self.invoke(ctx)
      File "/anaconda/envs/eynollah/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/anaconda/envs/eynollah/lib/python3.8/site-packages/click/core.py", line 760, in invoke
        return __callback(*args, **kwargs)
      File "/home/mylogin/notebooks/eynollah/qurator/eynollah/cli.py", line 151, in main
        pcgts = eynollah.run()
      File "/home/mylogin/notebooks/eynollah/qurator/eynollah/eynollah.py", line 2307, in run
        img_res, is_image_enhanced, num_col_classifier, num_column_is_classified = self.run_enhancement()
      File "/home/mylogin/notebooks/eynollah/qurator/eynollah/eynollah.py", line 1990, in run_enhancement
        is_image_enhanced, img_org, img_res, num_col_classifier, num_column_is_classified, img_bin = self.resize_and_enhance_image_with_column_classifier()
      File "/home/mylogin/notebooks/eynollah/qurator/eynollah/eynollah.py", line 408, in resize_and_enhance_image_with_column_classifier
        _, page_coord = self.early_page_for_num_of_column_classification(img_bin)
      File "/home/mylogin/notebooks/eynollah/qurator/eynollah/eynollah.py", line 648, in early_page_for_num_of_column_classification
        model_page, session_page = self.start_new_session_and_model(self.model_page_dir)
      File "/home/mylogin/notebooks/eynollah/qurator/eynollah/eynollah.py", line 518, in start_new_session_and_model
        model = load_model(model_dir, compile=False)
      File "/anaconda/envs/eynollah/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "/anaconda/envs/eynollah/lib/python3.8/site-packages/keras/utils/generic_utils.py", line 103, in func_load
        code = marshal.loads(raw_code)
    ValueError: bad marshal data (unknown type code)

    opened by YanZhangADS 0
  • performance with high-res images


    Sometimes the input comes at 600 DPI or beyond. It seems to me this makes eynollah much slower. Higher resolution might be needed for newspapers, but there is always a point beyond which result quality no longer increases. I would assume that a single downscaling interpolation after import should not be too costly.

    The documentation of allow_scaling says that it would also scale down images. But the implementation does not look like that's the case:

    https://github.com/qurator-spk/eynollah/blob/8d5079c909b662eda0b4acf5ae2908455f0ff939/qurator/eynollah/eynollah.py#L437-L444

    IIUC only too small images get upsampled. I'd expect a secondary DPI_THRESHOLD2 at which downsampling would begin.
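
The proposed second threshold could be sketched as a plain size calculation. The function name `downscale_size` and the default of 300 DPI are assumptions chosen for illustration; the issue names `DPI_THRESHOLD2` but does not give a value, and this is not existing Eynollah behaviour.

```python
def downscale_size(width, height, dpi, max_dpi=300):
    """Return (new_width, new_height, scale) so the image is at most max_dpi.

    max_dpi plays the role of the proposed DPI_THRESHOLD2; 300 is a made-up
    default. Images at or below the threshold are returned unchanged.
    """
    if dpi <= max_dpi:
        return width, height, 1.0
    scale = max_dpi / dpi
    return max(1, round(width * scale)), max(1, round(height * scale)), scale


print(downscale_size(4000, 6000, 600))
print(downscale_size(4000, 6000, 300))
```

The resulting size would then be passed to a single resize call after loading, and the polygon coordinates in the output would need to be scaled back by `1 / scale`.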

    opened by bertsky 2
  • What is the known working GPU config?


    I am using an Amazon pressed Ubuntu 16 Deep Learning AMI which contains CUDA 10, 10.1, 10.2, and 11.

    I am using Mambaforge with Python 3.6 or 3.7

    Tensorflow 2 is automatically used. I plan to try Tensorflow 1.x next.

    The process is loaded into GPU memory, but the GPU is never used.

    Is there a known working full stack config for eynollah on the GPU (OS+version, CUDA+version, Python+version, Tensorflow+version, etc) that you don't mind sharing?

    Thanks,

    opened by centerofexcellence 1
  • No segmentation results for specific image - (due to detecting 6 columns when there is only 1?)


    Hi, I have run over 2000 images through eynollah as an OCR-D processor, but only one gave me this problem. No error was reported, but the mets.xml file has no segmentation results. The only thing I know is that the image was detected as having 6 columns when there is actually only 1 column. The data for this case is below. Thanks in advance!

    Image processed attached...

    bqwndyazflxxtnmszruzzmlyofrxvtzc_s089_1561992285623

    OCR-D eynollah command and output to console...

    (qurator) D:\qurator>ocrd-eynollah-segment -I OCR-D-IMG -O OCR-D-IMG-SEG -P models eynollah/models_eynollah -P dpi 360 -P allow_scaling true
    10:32:03.361 INFO eynollah - INPUT FILE P_00738 (1/1)
    
    10:32:03.809 INFO eynollah - Resizing and enhancing image...
    10:32:03.809 INFO eynollah - Detected 360 DPI
    1/1 [==============================] - 3s 3s/step
    1/1 [==============================] - 1s 646ms/step
    10:32:11.003 INFO eynollah - Found 6 columns ([[0.09252959 0.01236904 0.0052445  0.05544147 0.01741231 0.8170032 ]])
    10:32:11.003 INFO eynollah - Image was not enhanced.
    1/1 [==============================] - 1s 732ms/step
    1/1 [==============================] - 1s 628ms/step
    10:32:15.064 INFO eynollah - Found 6 columns ([[0.09252959 0.01236904 0.0052445  0.05544147 0.01741231 0.8170032 ]])
    1/1 [==============================] - 1s 833ms/step
    1/1 [==============================] - 0s 16ms/step
    1/1 [==============================] - 0s 22ms/step
    
    ...
    NOTE: many similar lines
    ...
    
    1/1 [==============================] - 0s 24ms/step
    1/1 [==============================] - 0s 24ms/step
    1/1 [==============================] - 0s 24ms/step
    1/1 [==============================] - 0s 28ms/step
    10:34:32.242 INFO eynollah - Textregion detection took 114.8s
    1/1 [==============================] - 1s 733ms/step
    10:34:35.931 INFO eynollah - Graphics detection took 3.7s
    1/1 [==============================] - 1s 769ms/step
    1/1 [==============================] - 0s 16ms/step
    1/1 [==============================] - 0s 31ms/step
    
    ...
    NOTE: many similar lines
    ...
    
    1/1 [==============================] - 0s 16ms/step
    1/1 [==============================] - 0s 16ms/step
    1/1 [==============================] - 0s 31ms/step
    1/1 [==============================] - 0s 22ms/step
    10:34:59.506 INFO eynollah - textline detection took 23.6s
    10:36:15.179 INFO eynollah - slope_deskew: -90.0
    10:36:15.179 INFO eynollah - deskewing took 75.7s
    10:36:15.332 INFO eynollah - detection of marginals took 0.2s
    1/1 [==============================] - 2s 2s/step
    1/1 [==============================] - 0s 16ms/step
    1/1 [==============================] - 0s 16ms/step
    
    ...
    NOTE: many similar lines
    ...
    
    1/1 [==============================] - 0s 22ms/step
    1/1 [==============================] - 0s 31ms/step
    1/1 [==============================] - 1s 764ms/step
    10:39:20.846 INFO eynollah - Job done in 437.0s
    10:39:21.100 INFO ocrd.process.profile - Executing processor 'ocrd-eynollah-segment' took 437.728415s (wall) 813.187500s (CPU)( [--input-file-grp='OCR-D-IMG' --output-file-grp='OCR-D-IMG-SEG' --parameter='{"models": "eynollah/models_eynollah", "dpi": 360, "allow_scaling": true, "full_layout": true, "curved_line": false, "headers_off": false}' --page-id='']
    10:39:21.100 INFO ocrd.workspace.save_mets - Saving mets 'D:\qurator\mets.xml'
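
The "Found 6 columns" line in the log above prints a vector of six values that sum to roughly 1. A plausible reading, stated here as an assumption rather than a confirmed description of eynollah's internals, is that entry i is the classifier's score for "i + 1 columns" and the reported count is the position of the largest score. A minimal sketch of that reading:

```python
# Scores copied from the "Found 6 columns" log line.
scores = [0.09252959, 0.01236904, 0.0052445, 0.05544147, 0.01741231, 0.8170032]


def most_likely_columns(scores):
    """Return (column_count, score), assuming index i stands for i + 1 columns."""
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index + 1, scores[best_index]


num_columns, confidence = most_likely_columns(scores)
print(num_columns, round(confidence, 3))  # 6 0.817
```

Under this reading, the single-column class received a score of only about 0.09, so the classifier was fairly confident in its six-column result for this page.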
    

    Below are the contents of the mets.xml file:

    <?xml version="1.0" encoding="UTF-8"?>
    <mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="info:lc/xmlns/premis-v2 http://www.loc.gov/standards/premis/v2/premis-v2-0.xsd http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-6.xsd http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.loc.gov/mix/v10 http://www.loc.gov/standards/mix/mix10/mix10.xsd">
      <mets:metsHdr CREATEDATE="2022-07-22T10:29:16.958375">
        <mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="CREATOR">
          <mets:name>ocrd/core v2.34.0</mets:name>
        </mets:agent>
        <mets:agent TYPE="OTHER" OTHERTYPE="SOFTWARE" ROLE="OTHER" OTHERROLE="layout/segmentation/region">
          <mets:name>ocrd-eynollah-segment v0.0.11</mets:name>
          <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="input-file-grp">OCR-D-IMG</mets:note>
          <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="output-file-grp">OCR-D-IMG-SEG</mets:note>
          <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="parameter">{"models": "eynollah/models_eynollah", "dpi": 360, "allow_scaling": true, "full_layout": true, "curved_line": false, "headers_off": false}</mets:note>
          <mets:note xmlns:ocrd="https://ocr-d.de" ocrd:option="page-id"/>
        </mets:agent>
      </mets:metsHdr>
      <mets:dmdSec ID="DMDLOG_0001">
        <mets:mdWrap MDTYPE="MODS">
          <mets:xmlData>
            <mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
              <mods:identifier type="purl">'test'</mods:identifier>
            </mods:mods>
          </mets:xmlData>
        </mets:mdWrap>
      </mets:dmdSec>
      <mets:amdSec ID="AMD">
        </mets:amdSec>
      <mets:fileSec>
        <mets:fileGrp USE="OCR-D-IMG">
          <mets:file ID="OCR-D-IMG_00738" MIMETYPE="image/png">
            <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="FILE" xlink:href="OCR-D-IMG\bqwndyazflxxtnmszruzzmlyofrxvtzc_s089_1561992285623.png"/>
          </mets:file>
        </mets:fileGrp>
        <mets:fileGrp USE="OCR-D-IMG-SEG">
          <mets:file ID="OCR-D-IMG-SEG_00738" MIMETYPE="application/vnd.prima.page+xml">
            <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="FILE" xlink:href="OCR-D-IMG-SEG\OCR-D-IMG-SEG_00738.xml"/>
          </mets:file>
        </mets:fileGrp>
      </mets:fileSec>
      <mets:structMap TYPE="PHYSICAL">
        <mets:div TYPE="physSequence">
          <mets:div TYPE="page" ID="P_00738">
            <mets:fptr FILEID="OCR-D-IMG_00738"/>
            <mets:fptr FILEID="OCR-D-IMG-SEG_00738"/>
          </mets:div>
        </mets:div>
      </mets:structMap>
    </mets:mets>
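
The mets.xml above does reference a PAGE-XML file in the OCR-D-IMG-SEG file group, so one way to narrow the problem down is to check whether that file contains any regions at all. The sketch below uses only the standard library; the helper name `count_regions` is hypothetical, and matching on the local tag name is a choice made here so that the check does not depend on a particular PAGE schema version.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_regions(page_xml_path):
    """Count elements whose local name ends in 'Region' (TextRegion, ImageRegion, ...)."""
    counts = Counter()
    for element in ET.parse(page_xml_path).iter():
        # Tags look like "{namespace}TextRegion"; keep only the local part.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name.endswith("Region"):
            counts[local_name] += 1
    return counts


# Example call, with the path taken from the FLocat entry in mets.xml:
# print(count_regions("OCR-D-IMG-SEG/OCR-D-IMG-SEG_00738.xml"))
```

An empty result would indicate that the PAGE-XML file was written but no regions were stored in it.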
    
    opened by sjscotti 3
  • Flag for OCR-D processor to periodically save mets.xml file (a suggestion)

    Hi, eynollah sporadically crashes on one image out of a large batch when I run it as an OCR-D processor. The crash may come after many images have already been processed, which takes many hours. Because eynollah currently adds the newly created segmentation files to mets.xml only when the processor completes, all results from that run are missing from mets.xml, so OCR cannot be performed on the successful segmentations. The two alternatives seem to be: 1) debug why eynollah is crashing (or remove the image causing the crash) and rerun all the images, or 2) edit mets.xml by hand to add the entries for the segmentations that succeeded before the crash. Is there another approach for this case? If not, how about adding a flag to the OCR-D processor so that it periodically updates mets.xml with the successful segmentations? Thanks!
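
Alternative 2 above (adding the missing entries to mets.xml) can be scripted rather than done by hand. The following is a rough sketch, not part of eynollah or OCR-D: the function name `register_page_files` is hypothetical, and it assumes that the PAGE-XML files sit in a directory named after the file group next to mets.xml, and that a file called `<file group>_00738.xml` belongs to the page with ID `P_00738`, as in the mets.xml shown in the previous issue. It should be tried on a backup copy first, since ElementTree rewrites the whole file and drops XML comments.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

NAMESPACES = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
    "mods": "http://www.loc.gov/mods/v3",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
    "ocrd": "https://ocr-d.de",
}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)  # keep readable prefixes when writing

PAGE_MIME = "application/vnd.prima.page+xml"


def mets(tag):
    """Qualified name of a METS element, e.g. '{http://www.loc.gov/METS/}file'."""
    return f"{{{NAMESPACES['mets']}}}{tag}"


def register_page_files(mets_path, file_grp):
    """Add unregistered PAGE-XML files of file_grp to mets.xml; return the new file IDs."""
    mets_path = Path(mets_path)
    tree = ET.parse(str(mets_path))
    root = tree.getroot()
    file_sec = root.find(mets("fileSec"))
    grp = next((g for g in file_sec.findall(mets("fileGrp"))
                if g.get("USE") == file_grp), None)
    if grp is None:
        grp = ET.SubElement(file_sec, mets("fileGrp"), {"USE": file_grp})
    known_ids = {f.get("ID") for f in root.iter(mets("file"))}
    pages = {d.get("ID"): d for d in root.iter(mets("div"))
             if d.get("TYPE") == "page"}
    added = []
    for xml_file in sorted((mets_path.parent / file_grp).glob("*.xml")):
        file_id = xml_file.stem
        if file_id in known_ids:
            continue  # already registered
        # Assumption: the suffix after the last "_" identifies the page.
        page = pages.get("P_" + file_id.rsplit("_", 1)[-1])
        if page is None:
            continue  # no matching page, leave this file alone
        entry = ET.SubElement(grp, mets("file"),
                              {"ID": file_id, "MIMETYPE": PAGE_MIME})
        ET.SubElement(entry, mets("FLocat"), {
            "LOCTYPE": "OTHER",
            "OTHERLOCTYPE": "FILE",
            f"{{{NAMESPACES['xlink']}}}href": f"{file_grp}/{xml_file.name}",
        })
        ET.SubElement(page, mets("fptr"), {"FILEID": file_id})
        added.append(file_id)
    tree.write(str(mets_path), encoding="UTF-8", xml_declaration=True)
    return added
```

Calling `register_page_files("mets.xml", "OCR-D-IMG-SEG")` after a crash would then register the segmentations that were written before the failure, and calling it a second time adds nothing.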

    opened by sjscotti 1
Releases(v0.0.10)
Owner
QURATOR-SPK
Curation Technologies