docTR by Mindee (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Overview

Optical Character Recognition made seamless & accessible to anyone, powered by TensorFlow 2 (PyTorch in beta)

What you can expect from this repository:

  • efficient ways to parse textual information (localize and identify each word) from your documents
  • guidance on how to integrate this in your current architecture

Quick Tour

Getting your pretrained model

End-to-end OCR is achieved in DocTR using a two-stage approach: text detection (localizing words), then text recognition (identifying all characters in each word). As such, you can select the architecture used for text detection and the one used for text recognition from the list of available implementations.

from doctr.models import ocr_predictor

model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True)
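To make the two-stage idea concrete, here is a toy sketch that needs no deep learning framework. It is not DocTR's implementation: the "image" is a grid of characters, and `detect_words`, `recognize_word` and `run_ocr` are hypothetical stand-ins for a detection model, a recognition model and the predictor that chains them.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max bounds exclusive
Image = List[List[str]]          # toy "image": rows of characters instead of pixels


def detect_words(image: Image) -> List[Box]:
    """Toy detector: each run of non-space characters on a row becomes one box."""
    boxes: List[Box] = []
    for y, row in enumerate(image):
        start = None
        for x, ch in enumerate(row + [" "]):  # trailing sentinel closes the last run
            if ch != " " and start is None:
                start = x
            elif ch == " " and start is not None:
                boxes.append((start, y, x, y + 1))
                start = None
    return boxes


def recognize_word(image: Image, box: Box) -> str:
    """Toy recognizer: read the characters found inside the box."""
    x0, y0, x1, _ = box
    return "".join(image[y0][x0:x1])


def run_ocr(
    image: Image,
    detector: Callable[[Image], List[Box]],
    recognizer: Callable[[Image, Box], str],
) -> List[Tuple[Box, str]]:
    """Stage 1 localizes words, stage 2 reads each localized crop."""
    return [(box, recognizer(image, box)) for box in detector(image)]


page = [list("hello  world"), list(" docTR")]
results = run_ocr(page, detect_words, recognize_word)
for box, word in results:
    print(box, word)
```

Because the two stages only communicate through boxes, either one can be swapped independently, which is what the `det_arch` and `reco_arch` arguments express.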

Reading files

Documents can be interpreted from PDF or images:

from doctr.io import DocumentFile
# PDF
pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Image
single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
# Webpage
webpage_doc = DocumentFile.from_url("https://www.yoursite.com").as_images()
# Multiple page images
multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])

Putting it together

Let's use the default pretrained model for an example:

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
# PDF
doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
# Analyze
result = model(doc)

To make sense of your model's predictions, you can visualize them interactively as follows:

result.show(doc)

Or even rebuild the original document from its predictions:

import matplotlib.pyplot as plt

plt.imshow(result.synthesize()); plt.axis('off'); plt.show()

The ocr_predictor returns a Document object with a nested structure (with Page, Block, Line, Word, Artefact). To get a better understanding of our document model, check our documentation.

You can also export the predictions as a nested dict, which is better suited to JSON serialization:

json_output = result.export()

For examples & further details about the export format, please refer to this section of the documentation.
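As a rough illustration of working with such a nested dict, the sketch below walks pages, blocks, lines and words to collect the recognized text. The key names (`pages`, `blocks`, `lines`, `words`, `value`, `confidence`, `page_idx`) and the hand-written `sample_export` are assumptions made for this example; check them against the export of your installed version. `iter_words` is a hypothetical helper, not part of the library.

```python
import json
from typing import Any, Dict, Iterator, Tuple

# Hand-written stand-in for the output of result.export() (assumed layout).
sample_export: Dict[str, Any] = {
    "pages": [
        {
            "page_idx": 0,
            "blocks": [
                {
                    "lines": [
                        {
                            "words": [
                                {"value": "Hello", "confidence": 0.99},
                                {"value": "world", "confidence": 0.87},
                            ]
                        }
                    ]
                }
            ],
        }
    ]
}


def iter_words(export: Dict[str, Any]) -> Iterator[Tuple[int, str, float]]:
    """Yield (page index, word text, confidence) for every word in the export."""
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                for word in line.get("words", []):
                    yield page.get("page_idx"), word["value"], word["confidence"]


words = list(iter_words(sample_export))
text = " ".join(value for _, value, _ in words)
print(text)

# The dict only holds plain Python types, so it serializes to JSON directly.
serialized = json.dumps(sample_export, ensure_ascii=False)
```

Filtering on the confidence value while iterating is a simple way to drop uncertain words before further processing.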

Installation

Prerequisites

Python 3.6 (or higher) and pip are required to install DocTR. Additionally, you will need to install at least one of TensorFlow or PyTorch.

Since we use weasyprint, you will need extra dependencies if you are not running Linux.

For MacOS users, you can install them as follows:

brew install cairo pango gdk-pixbuf libffi

For Windows users, those dependencies are included in GTK. You can find the latest installer over here.

Latest release

You can then install the latest release of the package from PyPI using pip as follows:

pip install python-doctr

We try to keep framework-specific dependencies to a minimum. But if you encounter missing ones, you can install framework-specific builds as follows:

# for TensorFlow
pip install python-doctr[tf]
# for PyTorch
pip install python-doctr[torch]

Developer mode

Alternatively, you can install it from source, which will require you to install Git. First clone the project repository, then install it in editable mode:

git clone https://github.com/mindee/doctr.git
pip install -e doctr/.

Again, if you prefer to avoid the risk of missing dependencies, you can install the TensorFlow or the PyTorch build:

# for TensorFlow
pip install -e doctr/.[tf]
# for PyTorch
pip install -e doctr/.[torch]

Model architectures

Credit where it's due: this repository implements, among others, architectures from published research papers.

Text Detection

Text Recognition

More goodies

Documentation

The full package documentation is available here for detailed specifications.

Demo app

A minimal demo app is provided for you to play with the text detection model!

You will need an extra dependency (Streamlit) for the app to run:

pip install -r demo/requirements.txt

You can then easily run your app in your default browser by running:

streamlit run demo/app.py

Docker container

If you wish to deploy containerized environments, you can use the provided Dockerfile to build a Docker image:

docker build . -t <YOUR_IMAGE_TAG>

Example script

An example script is provided for a simple document analysis of a PDF or image file:

python scripts/analyze.py path/to/your/doc.pdf

All script arguments can be checked using python scripts/analyze.py --help

Minimal API integration

Looking to integrate DocTR into your API? Here is a template to get you started with a fully working API using the wonderful FastAPI framework.

Deploy your API locally

Specific dependencies are required to run the API template, which you can install as follows:

pip install -r api/requirements.txt

You can now run your API locally:

uvicorn --reload --workers 1 --host 0.0.0.0 --port=8002 --app-dir api/ app.main:app

Alternatively, if you prefer, you can run the same server in a Docker container using:

PORT=8002 docker-compose up -d --build

What you have deployed

Your API should now be running locally on your port 8002. Access your automatically-built documentation at http://localhost:8002/redoc and enjoy your three functional routes ("/detection", "/recognition", "/ocr"). Here is an example with Python to send a request to the OCR route:

import requests
import io
with open('/path/to/your/doc.jpg', 'rb') as f:
    data = f.read()
response = requests.post("http://localhost:8002/ocr", files={'file': io.BytesIO(data)}).json()

Citation

If you wish to cite this project, feel free to use this BibTeX reference:

@misc{doctr2021,
    title={DocTR: Document Text Recognition},
    author={Mindee},
    year={2021},
    publisher = {GitHub},
    howpublished = {\url{https://github.com/mindee/doctr}}
}

Contributing

If you scrolled down to this section, you most likely appreciate open source. Do you feel like extending the range of our supported characters? Or perhaps submitting a paper implementation? Or contributing in any other way?

You're in luck, we compiled a short guide (cf. CONTRIBUTING) for you to easily do so!

License

Distributed under the Apache 2.0 License. See LICENSE for more information.

Comments
  • Adding hocr output to generate Pdf/A

    Hi, adding the option to get the output in hocr format or directly as pdf/A (searchable text layer) would be great :) Any chance to receive this currently ? Best regards

    type: enhancement 
    opened by felixdittrich92 36
  • cannot load library 'pango-1.0': error 0x7e.  Additionally, ctypes.util.find_library() did not manage to locate a library called 'pango-1.0'

    Bug description

    I installed the library and the GTK project, but it's still throwing an error like

    cannot load library 'pango-1.0': error 0x7e. Additionally, ctypes.util.find_library() did not manage to locate a library called 'pango-1.0'

    Can anyone let me know the solution?

    I'm using python 3.7.x version

    Code snippet to reproduce the bug

    from doctr.models import ocr_predictor

    Error traceback

    ~\AppData\Roaming\Python\Python37\site-packages\cffi\api.py in _make_ffi_library(ffi, libname, flags)
        830 def _make_ffi_library(ffi, libname, flags):
        831     backend = ffi._backend
    --> 832     backendlib = _load_backend_lib(backend, libname, flags)
        833     #
        834     def accessor_function(name):
    
    ~\AppData\Roaming\Python\Python37\site-packages\cffi\api.py in _load_backend_lib(backend, name, flags)
        825         if first_error is not None:
        826             msg = "%s.  Additionally, %s" % (first_error, msg)
    --> 827         raise OSError(msg)
        828     return backend.load_library(path, flags)
        829 
    
    OSError: cannot load library 'pango-1.0': error 0x7e.  Additionally, ctypes.util.find_library() did not manage to locate a library called 'pango-1.0'
    

    Environment

    I installed the library and when trying to run the code, it's throwing an error

    type: bug topic: build awaiting response os: windows 
    opened by nithinreddyy 32
  • [datasets] Extend the range of public datasets supported in docTR

    Currently, we support FUNSD, CORD and SROIE but we should look at extending the range of supported datasets. Among others, we could include handwritten, and in-the-wild situations.

    Here is a list of datasets you can usually find in OCR-related benchmarks:

    • [x] IIIT-5k (https://cvit.iiit.ac.in/research/projects/cvit-projects/the-iiit-5k-word-dataset) #589
    • [x] SVT (http://vision.ucsd.edu/~kai/svt/) #597 #620
    • [x] IC03 (http://www.iapr-tc11.org/mediawiki/index.php?title=ICDAR_2003_Robust_Reading_Competitions) #653
    • [x] IC13 (http://dagdata.cvc.uab.es/icdar2013competition/?ch=2&com=downloads) #662
    • [x] SVHN (http://ufldl.stanford.edu/housenumbers/) #634
    • [x] SynthText (https://github.com/ankush-me/SynthText) #624
    • [x] IMGUR5K (https://github.com/facebookresearch/IMGUR5K-Handwriting-Dataset) #785

    Of course, the list goes on

    type: enhancement help wanted module: datasets 
    opened by fg-mindee 23
  • [documents] Benchmark PDF document reading + numpy conversion options

    Currently, the core reading of PDF document is made with PyMuPDF. This needs to be benchmarked against alternatives to ensure we use the optimal backend here.

    type: enhancement module: io benchmark 
    opened by fg-mindee 23
  • [Fix] PyTorch SAR_Resnet31 implementation

    This PR:

    • fix for AttentionModule

    • fix SARDecoder forward step

    • [x] test with a full run (will do this on Thursday), or maybe you can trigger a run with your internal dataset @charlesmindee ?

    Any feedback is welcome :hugs: @frgfm let me know what you think :+1:

    BTW the url in conftest mock_text_box_stream was failing again can we upload an image for it to the repo please :sweat_smile:

    Issue: #802

    module: models framework: pytorch topic: text recognition 
    opened by felixdittrich92 20
  • Require detailed explanation on few points

    Hello fg-mindee & charlesmindee. The end-to-end OCR library you developed, docTR, is fantastic: it is very easy to use and gives very good results. I am currently working on an OCR-related project and have obtained good results running docTR on sample images. However, I have a few questions, listed below, and would be grateful for explanations.

    Questions:

    1) Which db_resnet50 model are you using: the one pretrained on the SynthText dataset, or the one tested on real-world datasets as mentioned in the paper?
    2) How can we fine-tune the model?
    3) Is there any way to get the output after detection without postprocessing?
    4) How can we improve the accuracy of detection?
    5) When will your private dataset be available?
    6) How much training data do we need to get good results on our dataset? (The dataset would consist of forms, invoices, receipts, etc.)
    7) You also mention that, to train the model, each JSON file must contain 3 lists of boxes. Why are 3 boxes needed for a single image?

    question 
    opened by vishveshtrivedi 18
  • Recognition Training with Tensorflow

    Bug description

    I want to use docTR for training of Recognition of Turkish language data. I have created a dataset as specified in documentation. I have also added the vocab for Turkish in vocabs.py. Turkish language consists of VOCAB['English'] + some turkish chars. For example: i, ö, ü, İ. Therefore, there are some Turkish special characters in the train images and json file in the dataset. When I start the training with these dataset, I get an error like the one below. As can be seen in the Target section, the entered Turkish characters are not read as in the json file. For example, capital i in "TARİH: 24.08.2021" and "NAKİT".

    And this is not just a situation specific to Turkish. When I start a training run for Portuguese, it similarly does not accept the dataset containing values that match the characters in the VOCAB file.

    Can someone explain why I am getting this error and what has to be changed? How can I train the dataset containing some special characters?
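The debug output in the traceback further down prints "TARÄ°H" where the JSON file contains "TARİH". That pattern is what you get when UTF-8 bytes are decoded with a single-byte Windows code page, so one plausible cause is that the labels file is read with the platform default encoding instead of UTF-8. The sketch below reproduces the effect with the standard library only; `missing_chars` is a hypothetical helper and the vocab is a shortened stand-in, neither comes from docTR.

```python
from typing import List


def missing_chars(text: str, vocab: str) -> List[str]:
    """Return the characters of `text` that are absent from `vocab` (no repeats)."""
    missing: List[str] = []
    for ch in text:
        if ch not in vocab and ch not in missing:
            missing.append(ch)
    return missing


# Shortened stand-in for a Turkish vocab (assumption for this example).
vocab = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ:." + "İÖÜÇ"

label = "TARİH:24.08.2021"
raw = label.encode("utf-8")      # bytes as stored in a UTF-8 JSON file
garbled = raw.decode("cp1252")   # what a cp1252 reader makes of those bytes

print(garbled)                         # the two bytes of "İ" become two characters
print(missing_chars(label, vocab))     # correctly decoded label
print(missing_chars(garbled, vocab))   # garbled label contains out-of-vocab characters
```

If this is indeed the cause, opening the labels file with an explicit `encoding="utf-8"` should make the targets match the vocab again.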

    Code snippet to reproduce the bug

    Here is my training script

    python references/recognition/train_tensorflow.py crnn_vgg16_bn --train_path references/doctr-train --val_path references/doctr-valid --epochs 5 --vocab turkish
    

    Error traceback

    Namespace(arch='crnn_vgg16_bn', train_path='references/doctr-train', val_path='references/doctr-valid', train_samples=1000, val_samples=20, font='FreeMono.ttf,FreeSans.ttf,FreeSerif.ttf', min_chars=1, max_chars=12, name=None, epochs=5, batch_size=64, input_size=32, lr=0.001, workers=None, resume=None, vocab='turkish', test_only=False, show_samples=False, wb=False, push_to_hub=False, pretrained=False, amp=False, find_lr=False)
    b'{\r\n    "1.png": "KDV",\r\n    "2.png": "*400,00",\r\n    "3.png": "TAR\xc4\xb0H:24.08.2021",\r\n    "4.png": "NAK\xc4\xb0T",\r\n    "5.png": "TOPLAM",\r\n    "6.png": "Bakiye:"\r\n}'
    Validation set loaded in 0.001001s (6 samples in 1 batches)
    Train set loaded in 0.0009604s (6 samples in 0 batches)
    TARÄ°H:24.08.2021 //for debug, it prints file content
    0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~iİöÖÇçÜü // for debug it prints my turkish vocab
    Traceback (most recent call last):
      File "C:\Users\x\Desktop\training-DOCTR\doctr\doctr\references\recognition\train_tensorflow.py", line 404, in <module>
        main(args)
      File "C:\Users\x\Desktop\training-DOCTR\doctr\doctr\references\recognition\train_tensorflow.py", line 330, in main
        val_loss, exact_match, partial_match = evaluate(model, val_loader, batch_transforms, val_metric)
      File "C:\Users\x\Desktop\training-DOCTR\doctr\doctr\references\recognition\train_tensorflow.py", line 115, in evaluate
        out = model(images, targets, return_preds=True, training=False)
      File "C:\Users\x\Desktop\training-DOCTR\doctr\doctr\trainvenv\lib\site-packages\keras\utils\traceback_utils.py", line 67, in error_handler
        raise e.with_traceback(filtered_tb) from None
      File "c:\users\x\desktop\training-doctr\doctr\doctr\doctr\models\recognition\crnn\tensorflow.py", line 229, in call
        out['loss'] = self.compute_loss(logits, target)
      File "c:\users\x\desktop\training-doctr\doctr\doctr\doctr\models\recognition\crnn\tensorflow.py", line 184, in compute_loss
        gt, seq_len = self.build_target(target)
      File "c:\users\x\desktop\training-doctr\doctr\doctr\doctr\models\recognition\core.py", line 35, in build_target
        encoded = encode_sequences(
      File "c:\users\x\desktop\training-doctr\doctr\doctr\doctr\datasets\utils.py", line 153, in encode_sequences
        for idx, seq in enumerate(map(partial(encode_string, vocab=vocab), sequences)):
      File "c:\users\x\desktop\training-doctr\doctr\doctr\doctr\datasets\utils.py", line 80, in encode_string
        raise ValueError("some characters cannot be found in 'vocab'")
    ValueError: Exception encountered when calling layer "crnn" (type CRNN).
    
    some characters cannot be found in 'vocab'
    
    Call arguments received by layer "crnn" (type CRNN):
      • x=tf.Tensor(shape=(6, 32, 128, 3), dtype=float32)
      • target=["'KDV'", "'*400,00'", "'TARÄ°H:24.08.2021'", "'NAKÄ°T'", "'TOPLAM'", "'Bakiye:'"]
      • return_model_output=False
      • return_preds=True
      • beam_width=1
      • top_paths=1
      • kwargs={'training': 'False'}
    

    Environment

    --

    Deep Learning backend

    is_tf_available: True is_torch_available: False

    type: bug module: datasets ext: references framework: tensorflow awaiting response 
    opened by zahidetastan 17
  • [models] Ensure all PyTorch models are ONNX exportable

    Most users of the library are more interested in existing pretrained models to use for inference rather than training. For this reason, it's important to ensure we can easily export those trained models.

    • General

      • [x] Add a unittest/CI job to check that all models can be exported with ONNX (#976, #979, #980 )
    • Classification

      • [x] VGG (#830)
      • [x] ResNet (#830)
      • [x] MobileNet V3 (#830)
      • [x] MAGC ResNet (#830)
    • Text Detection

      • [x] LinkNet #979
      • [x] DBNet #979
    • Text Recognition

      • [x] CRNN #980
      • [x] SAR #980
      • [x] MASTER #980 (#986)

      NOTE: MASTER is exportable, but loading fails because of an internal onnxruntime issue (https://github.com/microsoft/onnxruntime/issues/10994)

    critical module: models framework: pytorch topic: onnx 
    opened by fg-mindee 16
  • Feat: Make detection training and inference Multiclass

    Hello, this PR is to make the detection part of DocTR multiclass.

    :warning: This PR introduces a breaking change in the use of DocTR. The attribute blocks of class Page is no longer a list of blocks but a dictionary whose keys are the class names.

    • [x] Training detection model tf/torch working with new data format to include class and also old data format
    • [x] Inference with ocr_predictor working tf/torch
    • [x] visualization and export
    • [x] middleware and hf api not changed should work without changes
    • [x] minimal API: some changes in the docker file and doc to make it work

    A first review of the code was done on my fork. you can find it here: https://github.com/aminemindee/doctr/pull/4

    Any feedback is welcome!

    topic: documentation type: breaking change module: datasets ext: demo ext: references framework: pytorch framework: tensorflow topic: text detection type: new feature 
    opened by aminemindee 15
  • refactor: Switched from PyMuPDF to pypdfium2

    This PR introduces the following modifications:

    • switched PDF backend from PyMuPDF to pypdfium2
    • deprecated annotation reading (not supported by pypdfium2 for now)
    • updates unittests & documentation
    • only uses PyMuPDF for PDF mocking in the unittests (does impact license for users)

    Closes #486

    Any feedback is welcome!

    topic: documentation critical module: io ext: tests type: breaking change type: misc ext: docs 
    opened by fg-mindee 15
  • Rotate page

    This is part of the solutions discussed in #225 and addresses the point "Integrate the feature in the predictor while being disabled by default" of the "Page-level orientation" part.

    It implements the function rotate_page and tracks the 'page_angles'

    module: models module: utils ext: tests type: new feature 
    opened by Rob192 15
  • [Bug] json encoding error on Windows where utf-8 is not the default

    Bug description

    I tried to train a Korean language recognition model with doctr/references/recognition/train_pytorch.py script, and I got encoding errors.

    first error is here. https://github.com/mindee/doctr/blob/e66ce01544ccd778b4145f6d9fab906f6133b969/doctr/datasets/recognition.py#L39-L40

    and second is here. https://github.com/mindee/doctr/blob/e66ce01544ccd778b4145f6d9fab906f6133b969/doctr/models/factory/hub.py#L86-L87

    I'm using Korean Windows 11, and the default encoding is 'cp949', so it is an error that could not read 'utf-8'.
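A common way to avoid depending on the platform default is to pass `encoding="utf-8"` explicitly whenever a JSON file is opened. The sketch below shows the pattern with the standard library only; `save_json`, `load_json` and the sample `config` are hypothetical names for this example, not docTR functions, and it is not presented as the fix adopted in the repository.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_json(obj: Any, path: Path) -> None:
    """Write JSON as UTF-8 regardless of the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII characters readable in the file,
        # which is only safe because the file encoding is pinned to UTF-8.
        json.dump(obj, f, indent=2, ensure_ascii=False)


def load_json(path: Path) -> Any:
    """Read JSON that was written as UTF-8."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Sample config with characters outside ASCII (assumption for this example).
config = {"arch": "vitstr_small", "vocab": "abc£한글"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "config.json"
    save_json(config, path)
    restored = load_json(path)
    raw_bytes = path.read_bytes()

print(restored == config)
```

With the encoding pinned on both the write and the read side, the round trip no longer depends on the locale of the machine.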

    Code snippet to reproduce the bug

    # doctr/datasets/vocabs.py
    VOCABS["korean"] = VOCABS["english"] + "ᄀᄁᄂᄃᄄᄅᄆᄇᄈᄉᄊᄋᄌᄍᄎᄏᄐᄑᄒ" + "ᅡᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵ" + "ᆨᆩᆪᆫᆬᆭᆮᆯᆰᆱᆲᆳᆴᆵᆶᆷᆸᆹᆺᆻᆼᆽᆾᆿᇀᇁᇂ"
    
    python train_pytorch.py  \
        vitstr_small  \
        --train_path train  \
        --val_path validation  \
        --name vitstr_small-korean  \
        --workers 8  \
        --vocab korean  \
        --wb  \
        --push-to-hub  \
        --amp
    

    It probably won't give an error on windows using utf-8.

    dataset: https://drive.google.com/file/d/1RN6pQAELWGYmwt1y6xnF6Xj0dO5RKU-Q/view?usp=share_link (344k images, 1GB)

    Error traceback

    Cloning https://huggingface.co/Bingsu/vitstr_small-korean into local empty directory.
    WARNING:huggingface_hub.repository:Cloning https://huggingface.co/Bingsu/vitstr_small-korean into local empty directory.
    Pulling changes ...
    WARNING:huggingface_hub.repository:Pulling changes ...
    Upload file pytorch_model.bin:  87%|███████████████████████████████████████▉      | 71.1M/81.9M [00:06<00:00, 14.8MB/s]remote: Scanning LFS files for validity, may be slow...
    remote: LFS file scan complete.
    To https://huggingface.co/Bingsu/vitstr_small-korean
       5aefb96..101a2bf  main -> main
    
    WARNING:huggingface_hub.repository:remote: Scanning LFS files for validity, may be slow...
    remote: LFS file scan complete.
    To https://huggingface.co/Bingsu/vitstr_small-korean
       5aefb96..101a2bf  main -> main
    
    Upload file pytorch_model.bin: 100%|██████████████████████████████████████████████| 81.9M/81.9M [00:09<00:00, 9.45MB/s]
    Traceback (most recent call last):
      File "C:\Users\smartmind\Desktop\workspace\test\train_ocr\train_pytorch.py", line 468, in <module>
        main(args)
      File "C:\Users\smartmind\Desktop\workspace\test\train_ocr\train_pytorch.py", line 405, in main
        push_to_hf_hub(model, exp_name, task="recognition", run_config=args)
      File "C:\Users\smartmind\miniconda3\envs\ocr\lib\site-packages\doctr\models\factory\hub.py", line 179, in push_to_hf_hub
        _save_model_and_config_for_hf_hub(model, repo.local_dir, arch=arch, task=task)
      File "C:\Users\smartmind\miniconda3\envs\ocr\lib\site-packages\doctr\models\factory\hub.py", line 87, in _save_model_and_config_for_hf_hub
        json.dump(model_config, f, indent=2, ensure_ascii=False)
      File "C:\Users\smartmind\miniconda3\envs\ocr\lib\json\__init__.py", line 180, in dump
        fp.write(chunk)
    UnicodeEncodeError: 'cp949' codec can't encode character '\xa3' in position 98: illegal multibyte sequence
    

    Environment

    DocTR version: N/A
    TensorFlow version: N/A
    PyTorch version: 1.13.1 (torchvision 0.14.1)
    OpenCV version: 4.7.0
    OS: Microsoft Windows 11 Pro
    Python version: 3.10.8
    Is CUDA available (TensorFlow): N/A
    Is CUDA available (PyTorch): Yes
    CUDA runtime version: 11.7.99
    GPU models and configuration: Could not collect
    Nvidia driver version: Could not collect
    cuDNN version: Could not collect
    

    Deep Learning backend

    is_tf_available: False
    is_torch_available: True
    
    type: bug 
    opened by Bing-su 0
  • Dockerfile fails on "pip install" due to tensorflow-addons Lib issue

    Bug description

    tensorflow-addons releases have very limited versions, and the Dockerfile in the repo does not install as-is.

    To fix this....

    Replace "FROM python:3.8" with something like "FROM brunneis/python:3.8.3-ubuntu-20.04"

    Code snippet to reproduce the bug

    just build docker

    Error traceback

    Could not find a version that satisfies the requirement tensorFlow-addons

    Environment

    The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine. qemu: uncaught target signal 6 (Aborted) - core dumped Aborted

    Deep Learning backend

    Type "help", "copyright", "credits" or "license" for more information.

    from doctr.file_utils import is_tf_available, is_torch_available The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine. qemu: uncaught target signal 6 (Aborted) - core dumped Aborted

    type: bug 
    opened by clarkdong 0
  • chore(deps-dev): update shapely requirement from <2.0.0,>=1.6.0 to >=1.6.0,<3.0.0

    Updates the requirements on shapely to permit the latest version.

    Changelog

    Sourced from shapely's changelog.

    2.0.0 (2022-12-12)

    Shapely version 2.0.0 is a major release featuring a complete refactor of the internals and new vectorized (element-wise) array operations providing considerable performance improvements.

    For a full changelog, see https://shapely.readthedocs.io/en/latest/release/2.x.html#version-2-0-0

    Relevant changes in behaviour compared to 2.0rc3:

    • Added temporary support for unpickling shapely<2.0 geometries.

    2.0rc1 (2022-11-26)

    Relevant changes in behaviour compared to 2.0b2:

    • The Point(..) constructor no longer accepts a sequence of coordinates consisting of more than one coordinate pair (previously, subsequent coordinates were ignored) (#1600).
    • Fix performance regression in the LineString() constructor when passing a numpy array of coordinates (#1602).

    Wheels for 2.0rc1 published on PyPI include GEOS 3.11.1.

    2.0b2 (2022-10-29)

    Relevant changes in behaviour compared to 2.0b1:

    • Fix for compatibility with PyPy (#1577).
    • Fix to the Point() constructor to accept arrays of length 1 for the x and y coordinates (fix compatibility with Shapely 1.8).
    • Raise ValueError for non-finite distance in the buffer() and offset_curve() methods on the Geometry classes (consistent with Shapely 1.8).

    2.0b1 (2022-10-17)

    Relevant changes in behaviour compared to 2.0a1:

    • Renamed the tolerance keyword to max_segment_length in the segmentize function.
    • Renamed the quadsegs keyword in the top-level buffer and offset_curve functions and the resolution keyword in the Geometry class buffer and offset_curve methods all to quad_segs.
    • Added use of GEOSGeom_getExtent to speed up bounds calculations for

    ... (truncated)

    Commits
    • 7c67808 RLS: 2.0.0
    • 31c81be DOC/RLS: update release notes for final 2.0.0 (#1659)
    • ced8260 COMPAT: Add compat for unpickling shapely<2 geometries (#1657)
    • 3a6f7b9 TST: skip intermittent remove_repeated_points failure for GEOS 3.11 (#1656)
    • 0a4435e DOC: Fix typo in from_ragged_array docstring (#1658)
    • 9ec5c16 RLS: 2.0rc3
    • 882bf37 DOC: Improve STRtree docstrings (#1626)
    • eba985c DOC: add note about prepare + touches to contains_xy/intersects_xy docstrings...
    • cc9250e DEV: update valgrind Dockerfile (#1649)
    • a4a65be Fix versioneer configuration (#1654)
    • Additional commits viewable in compare view

    topic: build 
    opened by dependabot[bot] 1
  • Documentation for running on docker

    🚀 The feature

    I run all my applications in Docker, but I found the Docker documentation very shallow. I created an image, but is that enough to run it? The guide page gives a docker-compose command, but there is no example docker-compose.yml file.

    Motivation, pitch

    If you need help I can provide something more complete, images in docker compiled in several architectures, but I noticed that the Dockerfile is very simple, it can be improved, already configuring a port, since you need one to be used, volume for persistence. I can create an example docker-compose.yml to use.

    But I need more information, how do I run a test using docker?

    Alternatives

    No response

    Additional context

    No response

    type: enhancement 
    opened by talesam 0
  • 16 bit precision support in predictors

    🚀 The feature

    Currently, the various predictor classes such as OCRPredictor don't support 16 bit precision and it would be great if they did. The PyTorch flavor of predictors are just nn.Modules so you should be able to write

    model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', pretrained=True).half()
    output = model(document_tensor.half())
    

    and have it work but it doesn't.

    Motivation, pitch

    Doing text detection and OCR on many documents is time consuming even with multiple high-end GPUs. 16 bit precision offers a nearly 2x speedup for free on supported GPUs. You can currently use 16 bit precision with the bare models but this is very cumbersome given the amount of pre- and post-processing involved in OCR.

    Alternatives

    No response

    Additional context

    No response

    type: enhancement 
    opened by Ryul0rd 0
Releases(v0.6.0)
  • v0.6.0(Sep 29, 2022)

    Highlights of the release:

    Note: doctr 0.6.0 requires either TensorFlow >= 2.9.0 or PyTorch >= 1.8.0.

    Full integration with Huggingface Hub (docTR meets Huggingface)

    • Loading from hub:
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor, from_hub
    image = DocumentFile.from_images(['data/example.jpg'])
    # Load a custom detection model from huggingface hub
    det_model = from_hub('Felix92/doctr-torch-db-mobilenet-v3-large')
    # Load a custom recognition model from huggingface hub
    reco_model = from_hub('Felix92/doctr-torch-crnn-mobilenet-v3-large-french')
    # You can easily plug these models into the OCR predictor
    predictor = ocr_predictor(det_arch=det_model, reco_arch=reco_model)
    result = predictor(image)
    
    • Pushing to the hub:
    from doctr.models import recognition, login_to_hub, push_to_hf_hub
    login_to_hub()
    my_awesome_model = recognition.crnn_mobilenet_v3_large(pretrained=True)
    push_to_hf_hub(my_awesome_model, model_name='doctr-crnn-mobilenet-v3-large-french-v1', task='recognition', arch='crnn_mobilenet_v3_large')
    

    Documentation: https://mindee.github.io/doctr/using_doctr/sharing_models.html

    Predefined datasets can now also be used for the recognition task

    from doctr.datasets import CORD
    # Crop boxes as is (can contain irregular)
    train_set = CORD(train=True, download=True, recognition_task=True)
    # Crop rotated boxes (always regular)
    train_set = CORD(train=True, download=True, use_polygons=True, recognition_task=True)
    img, target = train_set[0]
    

    Documentation: https://mindee.github.io/doctr/using_doctr/using_datasets.html

    New models (both frameworks)

    • classification: VisionTransformer (ViT)
    • recognition: Vision Transformer for Scene Text Recognition (ViTSTR)

    Bug fixes recognition models

    • MASTER and SAR architectures are now operational in both frameworks (TensorFlow and PyTorch)

    ONNX support (experimental)

    • All models can now be exported into ONNX format (only TF mobilenet left for 0.7.0)

    NOTE: a full production pipeline with ONNX / build is planned for 0.7.0 (for now, the models can only be exported up to the logits, without any post-processing included)
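
    Because the exported graph stops at the logits, decoding is left to the caller. As a rough illustration only, here is a minimal sketch of greedy CTC decoding for a CRNN-style recognition output. It assumes the blank class is the last index (len(vocab)); the function name ctc_greedy_decode and the toy vocabulary are hypothetical and not part of the docTR API, and other architectures may need a different decoding scheme.

```python
from typing import List, Sequence


def ctc_greedy_decode(logits: Sequence[Sequence[float]], vocab: str) -> str:
    """Greedy CTC decoding of one sequence of per-timestep logits.

    Assumption: the blank class is the last index, i.e. len(vocab).
    """
    blank = len(vocab)
    # Best class per timestep (argmax on raw logits; softmax would not change it)
    best = [max(range(len(step)), key=step.__getitem__) for step in logits]
    chars: List[str] = []
    prev = None
    for idx in best:
        # Collapse consecutive repeats, then drop blanks
        if idx != prev and idx != blank:
            chars.append(vocab[idx])
        prev = idx
    return "".join(chars)


vocab = "abc"  # toy vocabulary, for illustration only
# 6 timesteps x 4 classes (3 characters + blank)
logits = [
    [5.0, 0.1, 0.2, 0.3],  # a
    [4.0, 0.1, 0.2, 0.3],  # a again (repeat, collapsed)
    [0.1, 0.2, 0.3, 6.0],  # blank
    [3.0, 0.1, 0.2, 0.3],  # a (new character after a blank)
    [0.1, 7.0, 0.2, 0.3],  # b
    [0.1, 0.2, 0.3, 2.0],  # blank
]
decoded = ctc_greedy_decode(logits, vocab)
print(decoded)
```

    The blank between the second and third "a" is what lets CTC emit a doubled character; without it the repeats would be merged into one.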

    Further features

    • our demo is now also PyTorch compatible, thanks to @odulcy-mindee
    • it is now possible to detect the language of the extracted text, thanks to @aminemindee

    What's Changed

    Breaking Changes 🛠

    • feat: :sparkles: allow beam width > 1 in the CRNN postprocessor by @khalidMindee in https://github.com/mindee/doctr/pull/630
    • [Fix] TensorFlow SAR_Resnet31 implementation by @felixdittrich92 in https://github.com/mindee/doctr/pull/925

    New Features

    • [onnx] classification models export by @felixdittrich92 in https://github.com/mindee/doctr/pull/830
    • feat: Added Vietnamese entry in VOCAB by @calibretaliation in https://github.com/mindee/doctr/pull/878
    • feat: Added Czech to the set of vocabularies in datasets/vocabs.py by @Xargonus in https://github.com/mindee/doctr/pull/885
    • feat: Add ability to upload PT/TF models to Huggingface Hub by @felixdittrich92 in https://github.com/mindee/doctr/pull/881
    • [feature][tf/pt] integrate from_hub for all tasks by @felixdittrich92 in https://github.com/mindee/doctr/pull/892
    • [feature] Part 2 from use datasets for recognition by @felixdittrich92 in https://github.com/mindee/doctr/pull/891
    • [datasets] Add MJSynth (Synth90K) by @felixdittrich92 in https://github.com/mindee/doctr/pull/827
    • [docu]: add documentation for datasets by @felixdittrich92 in https://github.com/mindee/doctr/pull/905
    • add a Slack Community badge by @fharper in https://github.com/mindee/doctr/pull/936
    • Feat/add language detection by @aminemindee in https://github.com/mindee/doctr/pull/1023
    • add ViT as classification model TF and PT by @felixdittrich92 in https://github.com/mindee/doctr/pull/1050
    • [models] add ViTSTR TF and PT and update ViT to work as backbone by @felixdittrich92 in https://github.com/mindee/doctr/pull/1055

    Bug Fixes

    • [PyTorch][references] fix pretrained with different vocabs by @felixdittrich92 in https://github.com/mindee/doctr/pull/874
    • [classification] Fix cfgs by @felixdittrich92 in https://github.com/mindee/doctr/pull/883
    • docs: Fixed typo in installation instructions by @frgfm in https://github.com/mindee/doctr/pull/901
    • [Fix] imgur5k test by @felixdittrich92 in https://github.com/mindee/doctr/pull/903
    • fix: Fixed load_pretrained_params in PyTorch when ignoring keys by @frgfm in https://github.com/mindee/doctr/pull/902
    • [Fix]: Documentation add missing in vocabs and correct tab in sharing models by @felixdittrich92 in https://github.com/mindee/doctr/pull/904
    • Fix links in readme by @jsn5 in https://github.com/mindee/doctr/pull/937
    • [Fix] PyTorch MASTER implementation by @felixdittrich92 in https://github.com/mindee/doctr/pull/941
    • [Fix] MJSynth dataset: filter corrupted or missing images by @felixdittrich92 in https://github.com/mindee/doctr/pull/956
    • [Fix] SVT dataset: clip box values and add shape and label check by @felixdittrich92 in https://github.com/mindee/doctr/pull/955
    • [Fix] Tensorflow MASTER implementation by @felixdittrich92 in https://github.com/mindee/doctr/pull/949
    • [FIX] MASTER AMP and onnxruntime issue with master PT by @felixdittrich92 in https://github.com/mindee/doctr/pull/986
    • pytest-api test: fix ping server step by @odulcy-mindee in https://github.com/mindee/doctr/pull/997
    • docs/index: fix two minor typos by @mara004 in https://github.com/mindee/doctr/pull/1002
    • Fix orientation details export by @aminemindee in https://github.com/mindee/doctr/pull/1022
    • Changed return type of multithread_exec to iterator by @mtvch in https://github.com/mindee/doctr/pull/1019
    • [datasets] Fix recognition parts of SynthText and IMGUR5K by @felixdittrich92 in https://github.com/mindee/doctr/pull/1038
    • [Fix] rotation classifier input move to model device by @felixdittrich92 in https://github.com/mindee/doctr/pull/1039
    • [models] Vit: fix intermediate size scale and unify TF to PT by @felixdittrich92 in https://github.com/mindee/doctr/pull/1063

    Improvements

    • chore: Applied post release modifications v0.5.1 by @felixdittrich92 in https://github.com/mindee/doctr/pull/870
    • [refactor][fix]: Part1 from use datasets for recognition task by @felixdittrich92 in https://github.com/mindee/doctr/pull/889
    • ci: Add swagger ping in API CI job by @frgfm in https://github.com/mindee/doctr/pull/906
    • [docs] Add naming conventions for upload models to hf hub by @felixdittrich92 in https://github.com/mindee/doctr/pull/921
    • docs: Improved error message of encode_string by @frgfm in https://github.com/mindee/doctr/pull/929
    • [Refactor] PyTorch SAR_Resnet31 make it ONNX exportable (again) by @felixdittrich92 in https://github.com/mindee/doctr/pull/930
    • Add support page in README by @jonathanMindee in https://github.com/mindee/doctr/pull/946
    • [references] Add eval recognition and update eval detection scripts by @felixdittrich92 in https://github.com/mindee/doctr/pull/933
    • update pypdfium2 dep and improve code quality by @felixdittrich92 in https://github.com/mindee/doctr/pull/953
    • docs: Moved need help section after code snippet by @frgfm in https://github.com/mindee/doctr/pull/959
    • chore: Updated TF requirements to fix grouped convolutions on CPU by @frgfm in https://github.com/mindee/doctr/pull/963
    • style: Fixed mypy and moved tool configs to pyproject.toml by @frgfm in https://github.com/mindee/doctr/pull/966
    • Updating the readme by @Atomme1 in https://github.com/mindee/doctr/pull/938
    • Update docs in using_doctr by @odulcy-mindee in https://github.com/mindee/doctr/pull/993
    • feat: add a basic example of text detection by @ianardee in https://github.com/mindee/doctr/pull/999
    • Add pytorch demo by @odulcy-mindee in https://github.com/mindee/doctr/pull/1008
    • [build] move requirements to pyproject.toml by @felixdittrich92 in https://github.com/mindee/doctr/pull/1031
    • Migrate static data from github to monitoring middleware. by @marvinmindee in https://github.com/mindee/doctr/pull/1033
    • Changes needed to be able to use doctr on AWS Lambda by @mtvch in https://github.com/mindee/doctr/pull/1017
    • [Fix] unify recognition dataset parts return signature by @felixdittrich92 in https://github.com/mindee/doctr/pull/1041
    • Updated README.md for custom fonts by @carl-krikorian in https://github.com/mindee/doctr/pull/1051
    • [refactor] detection script by @felixdittrich92 in https://github.com/mindee/doctr/pull/1060
    • [models] ViT add checkpoints and some rework to use pretrained ViT backbone in ViTSTR by @felixdittrich92 in https://github.com/mindee/doctr/pull/1072
    • upgrade pypdfium2 by @felixdittrich92 in https://github.com/mindee/doctr/pull/1075
    • ViTSTR disable pretrained backbone by default by @felixdittrich92 in https://github.com/mindee/doctr/pull/1080

    Miscellaneous

    • [Refactor] commit tags by @felixdittrich92 in https://github.com/mindee/doctr/pull/871
    • Update io/pdf.py to new pypdfium2 API by @mara004 in https://github.com/mindee/doctr/pull/944
    • docs: Documentation the reason for keras version specifier by @frgfm in https://github.com/mindee/doctr/pull/958
    • [datasets] update IC / SROIE / FUNSD / CORD by @felixdittrich92 in https://github.com/mindee/doctr/pull/983
    • [datasets] revert whitespace filtering and fix svhn reco by @felixdittrich92 in https://github.com/mindee/doctr/pull/987
    • fix: update tensorflow-addons to match tensorflow version by @ianardee in https://github.com/mindee/doctr/pull/998
    • move transformers implementation to modules by @felixdittrich92 in https://github.com/mindee/doctr/pull/1013
    • [FIX] revert dev deps mistake by @felixdittrich92 in https://github.com/mindee/doctr/pull/1047
    • [models] update vit and transformer layer norm by @felixdittrich92 in https://github.com/mindee/doctr/pull/1059
    • make pretrained backbone flexible in predictor by @felixdittrich92 in https://github.com/mindee/doctr/pull/1061
    • handle LocalizationConfusion memory consuption and upgrade min weasyprint version by @felixdittrich92 in https://github.com/mindee/doctr/pull/1062
    • Fixed small typo in references recognition by @carl-krikorian in https://github.com/mindee/doctr/pull/1070
    • [docs] install extras for MacBooks with M1 chip by @felixdittrich92 in https://github.com/mindee/doctr/pull/1076
    • update version for minor release by @felixdittrich92 in https://github.com/mindee/doctr/pull/1073

    New Contributors

    • @calibretaliation made their first contribution in https://github.com/mindee/doctr/pull/878
    • @Xargonus made their first contribution in https://github.com/mindee/doctr/pull/885
    • @khalidMindee made their first contribution in https://github.com/mindee/doctr/pull/630
    • @frgfm made their first contribution in https://github.com/mindee/doctr/pull/901
    • @jsn5 made their first contribution in https://github.com/mindee/doctr/pull/937
    • @fharper made their first contribution in https://github.com/mindee/doctr/pull/936
    • @jonathanMindee made their first contribution in https://github.com/mindee/doctr/pull/946
    • @Atomme1 made their first contribution in https://github.com/mindee/doctr/pull/938
    • @odulcy-mindee made their first contribution in https://github.com/mindee/doctr/pull/993
    • @ianardee made their first contribution in https://github.com/mindee/doctr/pull/998
    • @aminemindee made their first contribution in https://github.com/mindee/doctr/pull/1022
    • @mtvch made their first contribution in https://github.com/mindee/doctr/pull/1019
    • @marvinmindee made their first contribution in https://github.com/mindee/doctr/pull/1033
    • @carl-krikorian made their first contribution in https://github.com/mindee/doctr/pull/1051

    Full Changelog: https://github.com/mindee/doctr/compare/v0.5.1...v0.6.0

    Source code(tar.gz)
    Source code(zip)
  • v0.5.1(Mar 22, 2022)

    This minor release includes documentation improvements thanks to @felixdittrich92, bug fixes, rotation support extended to the TensorFlow backend, a switch from PyMuPDF to pypdfium2, and a nice integration with the Hugging Face Hub thanks to @fg-mindee!

    Note: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

    Highlights

    Improvement of the documentation

    The documentation has been improved with a new theme and illustrations, and the docstrings have been completed and expanded. This is how it renders:

    doc Capture d’écran de 2022-03-22 11-08-31

    Rotated text detection extended to TensorFlow backend

    We provide weights for the linknet_resnet18_rotation model, which has been deeply modified: we implemented a new loss (based on Dice loss and focal loss), we changed the target computation so that polygons are shrunk the same way as in DBNet, which greatly improves the precision of the segmenter, and we trained the model while preserving the aspect ratio of the images. All these improvements led to much better results, and the pretrained model is now very robust.
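
    To make the loss description more concrete, here is a minimal pure-Python sketch of a Dice loss and a focal loss on flattened per-pixel probabilities. The release notes do not say how the two terms are weighted or combined, so the plain sum, the single alpha weight, and the default alpha/gamma values below are assumptions for illustration, not the library's implementation.

```python
import math
from typing import Sequence


def dice_loss(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-8) -> float:
    """1 - Dice coefficient between predicted probabilities and binary targets."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def focal_loss(
    probs: Sequence[float],
    targets: Sequence[int],
    alpha: float = 0.5,  # assumed weight, not taken from the release notes
    gamma: float = 2.0,  # assumed focusing parameter
    eps: float = 1e-8,
) -> float:
    """Mean binary focal loss: easy pixels are down-weighted by (1 - p_t) ** gamma."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p  # probability assigned to the true class
        total += -alpha * (1.0 - p_t) ** gamma * math.log(max(p_t, eps))
    return total / len(probs)


targets = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.2]  # confident and correct
bad = [0.4, 0.6, 0.3, 0.7]   # mostly wrong
combined_good = dice_loss(good, targets) + focal_loss(good, targets)
combined_bad = dice_loss(bad, targets) + focal_loss(bad, targets)
print(combined_good < combined_bad)
```

    The Dice term looks at the overlap of the whole mask, while the focal term concentrates the gradient on hard pixels, which is why the two are often paired for segmentation with few foreground pixels.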

    Preserving the aspect ratio in the detection task

    You can now choose to preserve the aspect ratio in the detection_predictor:

    >>> from doctr.models import detection_predictor
    >>> predictor = detection_predictor('db_resnet50_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
    

    This option can also be activated in the high level end-to-end predictor:

    >>> from doctr.models import ocr_predictor
    >>> model = ocr_predictor('linknet_resnet18_rotation', pretrained=True, assume_straight_pages=False, preserve_aspect_ratio=True)
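
    As an illustration of what preserving the aspect ratio means at resize time, the sketch below computes the resized shape and the padding needed to fill a fixed target size. The helper name resize_keep_aspect_ratio and the choice to pad on the bottom/right are assumptions made for this example; the library may pad differently (for instance symmetrically).

```python
from typing import Tuple

Shape = Tuple[int, int]  # (height, width)


def resize_keep_aspect_ratio(src_hw: Shape, target_hw: Shape) -> Tuple[Shape, Shape]:
    """Return the resized (h, w) and the (bottom, right) padding to reach target_hw."""
    src_h, src_w = src_hw
    tgt_h, tgt_w = target_hw
    # Use the smaller scale so that both sides fit inside the target
    scale = min(tgt_h / src_h, tgt_w / src_w)
    new_h = min(tgt_h, max(1, round(src_h * scale)))
    new_w = min(tgt_w, max(1, round(src_w * scale)))
    return (new_h, new_w), (tgt_h - new_h, tgt_w - new_w)


# A portrait page resized into a square model input
resized, padding = resize_keep_aspect_ratio((2000, 1000), (1024, 1024))
print(resized, padding)
```

    Without this option, the page would be stretched to the target shape directly, which distorts character proportions on pages that are far from square.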
    

    Integration with the Hugging Face Hub

    The artefact detection model is now available on the Hugging Face Hub, which is amazing:

    Capture d’écran de 2022-03-22 11-33-14

    In docTR, you can now use the from_hub() function, so that these two snippets are equivalent:

    # Pretrained
    from doctr.models.obj_detection import fasterrcnn_mobilenet_v3_large_fpn
    model = fasterrcnn_mobilenet_v3_large_fpn(pretrained=True)
    

    and:

    # HF Hub
    from doctr.models.obj_detection.factory import from_hub
    model = from_hub("mindee/fasterrcnn_mobilenet_v3_large_fpn")
    
    

    Breaking changes

    Replacing the PyMuPDF dependency with pypdfium2, which is license compatible

    We replaced the PyMuPDF dependency with pypdfium2 because of a license-compatibility issue, so we lose the extraction of words and objects from source PDFs, which was done with PyMuPDF. It wasn't used in any models, so it is not a big issue, but we will work on re-integrating such a feature in the future.

    Full changelog

    What's Changed

    Breaking Changes 🛠

    • fix: polygon orientation + line aggregation by @charlesmindee in https://github.com/mindee/doctr/pull/801
    • refactor: Switched from PyMuPDF to pypdfium2 by @fg-mindee in https://github.com/mindee/doctr/pull/829

    New Features

    • feat: Added RandomHorizontalFLip in TF by @SiddhantBahuguna in https://github.com/mindee/doctr/pull/779
    • Imgur5k dataset integration by @felixdittrich92 in https://github.com/mindee/doctr/pull/785
    • feat: Added support of GPU for predictors in PyTorch by @fg-mindee in https://github.com/mindee/doctr/pull/808
    • Add SynthWordGenerator to text reco training scripts by @felixdittrich92 in https://github.com/mindee/doctr/pull/825
    • fix: Fixed some ResNet architecture imprecisions by @fg-mindee in https://github.com/mindee/doctr/pull/828
    • feat: Added shadow augmentation for all backends by @fg-mindee in https://github.com/mindee/doctr/pull/811
    • feat: Added loading method for PyTorch artefact detection models from HF Hub by @fg-mindee in https://github.com/mindee/doctr/pull/836
    • feat: add rotated linknet_resnet18 tensorflow ckpts by @charlesmindee in https://github.com/mindee/doctr/pull/817

    Bug Fixes

    • fix: Fixed rotation of img + target by @fg-mindee in https://github.com/mindee/doctr/pull/784
    • fix: show sample when batch size is 1 by @charlesmindee in https://github.com/mindee/doctr/pull/787
    • ci: Fixed PR label check job by @fg-mindee in https://github.com/mindee/doctr/pull/792
    • ci: Fixed typo in the script ref by @fg-mindee in https://github.com/mindee/doctr/pull/794
    • [datasets] fix description by @felixdittrich92 in https://github.com/mindee/doctr/pull/795
    • fix: linknet target computation by @charlesmindee in https://github.com/mindee/doctr/pull/803
    • ci: Fixed issue templates by @fg-mindee in https://github.com/mindee/doctr/pull/806
    • fix: Reverted mistake in demo by @fg-mindee in https://github.com/mindee/doctr/pull/810
    • Restore remap boxes by @Rob192 in https://github.com/mindee/doctr/pull/812
    • fix: Fixed SAR model for training and inference in PyTorch by @fg-mindee in https://github.com/mindee/doctr/pull/831
    • fix: Fixed expand_line for horizontal & vertical cases by @fg-mindee in https://github.com/mindee/doctr/pull/842
    • fix: Fixes inplace target modifications for AbstractDatasets by @fg-mindee in https://github.com/mindee/doctr/pull/848
    • fix: Fixed landing page and title underlines by @fg-mindee in https://github.com/mindee/doctr/pull/860
    • docs: Fixed HTML title by @fg-mindee in https://github.com/mindee/doctr/pull/864

    Improvements

    • docs: Updated headers of python files by @fg-mindee in https://github.com/mindee/doctr/pull/781
    • [datasets] unify np_dtype and fix comments by @felixdittrich92 in https://github.com/mindee/doctr/pull/782
    • fix: Clip in rotation transform + eval_straight mode for training by @charlesmindee in https://github.com/mindee/doctr/pull/786
    • refactor: Avoids instantiating orientation predictor when unnecessary by @fg-mindee in https://github.com/mindee/doctr/pull/809
    • feat: add straight-eval arg in evaluate script by @charlesmindee in https://github.com/mindee/doctr/pull/793
    • feat: add dice loss in linknet by @charlesmindee in https://github.com/mindee/doctr/pull/816
    • feat: add shrinked target in linknet + dilation in postprocessing by @charlesmindee in https://github.com/mindee/doctr/pull/822
    • feat: replace bce by focal loss in linknet loss by @charlesmindee in https://github.com/mindee/doctr/pull/824
    • docs: add rotation in docs by @charlesmindee in https://github.com/mindee/doctr/pull/846
    • feat: add aspect ratio for ocr predictor by @charlesmindee in https://github.com/mindee/doctr/pull/835
    • feat: add target to resize transform for aspect ratio training (detection task) by @charlesmindee in https://github.com/mindee/doctr/pull/823
    • update bug report ticket with Active backend field by @felixdittrich92 in https://github.com/mindee/doctr/pull/853
    • Theme + css #1 by @felixdittrich92 in https://github.com/mindee/doctr/pull/856
    • docs: Adds illustration in the docstrings of doctr.datasets by @felixdittrich92 in https://github.com/mindee/doctr/pull/857
    • docs: Updated docstrings of io, transforms & utils by @felixdittrich92 in https://github.com/mindee/doctr/pull/859
    • docs: Updated folder hierarchy of doc source and nootbooks to rst file by @felixdittrich92 in https://github.com/mindee/doctr/pull/862
    • Doc models #5 by @felixdittrich92 in https://github.com/mindee/doctr/pull/861
    • fix: linknet hyperparameters postprocessing + demo for rotation model by @charlesmindee in https://github.com/mindee/doctr/pull/865

    Miscellaneous

    • chore: Applied post release modifications by @fg-mindee in https://github.com/mindee/doctr/pull/780
    • Switch to new pypdfium2 API by @mara004 in https://github.com/mindee/doctr/pull/845

    New Contributors

    • @mara004 made their first contribution in https://github.com/mindee/doctr/pull/845

    Full Changelog: https://github.com/mindee/doctr/compare/v0.5.0...v0.5.1

    Source code(tar.gz)
    Source code(zip)
    doctr-need-help.png(861.59 KB)
    vit_b-103002d1.pt(325.51 MB)
    vit_b-983c86b5.zip(300.61 MB)
    vit_s-7a23bea4.zip(75.53 MB)
    vit_s-cd3472bd.pt(81.78 MB)
    word-crop.png(1.32 KB)
  • v0.5.0(Dec 31, 2021)

    This release adds support of rotated documents, and extends both the model & dataset zoos.

    Note: doctr 0.5.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

    Highlights

    :upside_down_face: :smiley: Rotation-aware text detection :upside_down_face: :smiley:

    It's no secret: this release's focus was to bring the same level of performance to rotated documents!

    predictions

    docTR is meant to be your best tool for seamless document processing, and it couldn't do without supporting a very natural & common augmentation of input documents. This large project was subdivided into three parts:

    Straightening pages before text detection

    Developing a heuristic-based method to estimate the page skew, and rotate it before forwarding it to any deep learning model. Our thanks to @Rob192 for his contribution on this part :pray:

    This behaviour can be enabled to avoid retraining the text detection models. However, the heuristic approach has its limits in terms of robustness.
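
    The release notes do not detail the heuristic itself. Purely as an illustration of the general idea, the sketch below estimates a page skew as the median angle of text baselines, which makes it tolerant to a few outliers. The function name estimate_skew_degrees and the baseline input format are hypothetical, and how baselines would be extracted from the page is not covered here.

```python
import math
from statistics import median
from typing import Iterable, Tuple

Segment = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def estimate_skew_degrees(baselines: Iterable[Segment]) -> float:
    """Median angle of the given segments, in degrees, folded into (-90, 90].

    Note: in image coordinates the y axis points down, so the sign convention
    of the returned angle depends on how the page is later rotated.
    """
    angles = []
    for x1, y1, x2, y2 in baselines:
        angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
        # Fold so that the direction of the segment does not matter
        if angle > 90:
            angle -= 180
        elif angle <= -90:
            angle += 180
        angles.append(angle)
    return median(angles) if angles else 0.0


dy = 100 * math.tan(math.radians(5))
lines = [
    (0.0, 0.0, 100.0, dy),        # 5 degree baseline
    (0.0, 50.0, 100.0, 50 + dy),  # another 5 degree baseline
    (0.0, 0.0, 100.0, 100 * math.tan(math.radians(40))),  # outlier (e.g. a logo edge)
]
skew = estimate_skew_degrees(lines)
print(round(skew, 3))
```

    Once the angle is known, the page can be rotated by the opposite angle before being passed to the detection model.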

    Text detection training with rotated images

    doctr_sample

    The core of this project was to enable our text detection models to produce non-degraded heatmaps & localization candidates when processing a rotated page.

    Crop orientation resolution

    rot2

    Finally, once the localization candidates have been extracted, there is no guarantee that a given candidate reads from left to right. To remove this doubt, a lightweight image orientation classifier was added to refine the crops that will be sent to text recognition!
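
    A minimal sketch of how a crop could be straightened once its orientation class is known. It assumes, for illustration only, that class k means the crop is rotated by k * 90 degrees counter-clockwise relative to upright; the actual class mapping used by the classifier is not stated here, and the helper names are hypothetical.

```python
from typing import List

Image = List[List[int]]  # toy grayscale image as nested lists


def rot90_ccw(img: Image) -> Image:
    """Rotate an image by 90 degrees counter-clockwise."""
    return [list(col) for col in zip(*img)][::-1]


def straighten_crop(crop: Image, predicted_class: int) -> Image:
    """Undo a rotation of predicted_class * 90 degrees (assumed counter-clockwise)."""
    # Completing the full turn brings the crop back to upright
    for _ in range((4 - predicted_class) % 4):
        crop = rot90_ccw(crop)
    return crop


upright = [[1, 2, 3], [4, 5, 6]]
rotated = rot90_ccw(upright)        # what the classifier would label as class 1
restored = straighten_crop(rotated, 1)
print(restored == upright)
```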

    :zebra: A wider pretrained classification model zoo :zebra:

    Training stability in deep learning for complex tasks has mostly been helped by leveraging transfer learning. As such, OCR tasks usually require a backbone as a feature extractor. For this reason, all checkpoints of classification models in both PyTorch & TensorFlow have been updated :rocket: Those were trained using our synthetic character classification dataset; for more details, cf. Character classification training

    :framed_picture: New public datasets join the fray

    Thanks to @felixdittrich92, the list of supported datasets has considerably grown :partying_face: This includes widely popular datasets used for benchmarks on OCR-related tasks; you can find the full list over here :point_right: #587

    Synthetic text recognition dataset

    Additionally, we followed up on the existing CharGenerator by introducing WordGenerator:

    • generates an image of a word whose length is randomly sampled within a specified range, with characters randomly sampled from the specified vocab.
    • you can even pass a list of fonts so that each word font family is randomly picked among them

    Below are some samples using a font_size=32: wordgenerator_sample
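
    The sampling logic described above can be sketched with the standard library alone. This only covers picking the text and the font, not rendering the image; sample_word, its parameters, and the font file names are hypothetical and do not reflect the actual WordGenerator signature.

```python
import random
from typing import Optional, Sequence, Tuple


def sample_word(
    vocab: str,
    min_chars: int,
    max_chars: int,
    fonts: Optional[Sequence[str]] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[str, Optional[str]]:
    """Sample a random word and, if fonts are given, a random font for it."""
    rng = rng or random.Random()
    # Word length is drawn uniformly within the inclusive range
    length = rng.randint(min_chars, max_chars)
    # Each character is drawn independently from the vocabulary
    word = "".join(rng.choice(vocab) for _ in range(length))
    font = rng.choice(fonts) if fonts else None
    return word, font


rng = random.Random(0)  # seeded for reproducibility
fonts = ["FontA.ttf", "FontB.ttf"]  # placeholder font names
word, font = sample_word("abcdefghijklmnopqrstuvwxyz", 3, 8, fonts, rng)
print(word, font)
```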

    :bookmark_tabs: New notebooks

    Two new notebooks have made their way into the documentation:

    • producing searchable PDFs from docTR analysis results
    • introduction to document artefact detection (QR code, bar codes, ID pictures, etc.) with docTR

    image

    Breaking changes

    Revamp of classification models

    With the retraining of all classification backbones, several changes have been introduced:

    • Model naming: linknet16 --> linknet_resnet18
    • Architecture changes: all classification backbones are available with a classification head now.

    Enforcing relative coordinates in datasets

    In order to unify our data pipelines, we forced the conversion to relative coordinates on all datasets!

    0.4.1:

    >>> from doctr.datasets import FUNSD
    >>> ds = FUNSD(train=True, download=True)
    >>> img, target = ds[0]
    >>> print(target['boxes'].dtype, target['boxes'].max())
    (dtype('int64'), 862)

    0.5.0:

    >>> from doctr.datasets import FUNSD
    >>> ds = FUNSD(train=True, download=True)
    >>> img, target = ds[0]
    >>> print(target['boxes'].dtype, target['boxes'].max())
    (dtype('float32'), 0.98341835)
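
    If downstream code still expects pixel coordinates, the conversion between the two conventions is a simple rescale. The sketch below assumes boxes in (xmin, ymin, xmax, ymax) order and an image shape given as (height, width); the helper names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), assumed order


def to_relative(boxes: List[Box], img_hw: Tuple[int, int]) -> List[Box]:
    """Convert absolute pixel boxes to coordinates relative to the image size."""
    h, w = img_hw
    return [(xmin / w, ymin / h, xmax / w, ymax / h) for xmin, ymin, xmax, ymax in boxes]


def to_absolute(boxes: List[Box], img_hw: Tuple[int, int]) -> List[Tuple[int, int, int, int]]:
    """Convert relative boxes back to integer pixel coordinates."""
    h, w = img_hw
    return [
        (round(xmin * w), round(ymin * h), round(xmax * w), round(ymax * h))
        for xmin, ymin, xmax, ymax in boxes
    ]


img_hw = (1000, 800)  # (height, width)
abs_boxes = [(100, 50, 200, 100)]
rel_boxes = to_relative(abs_boxes, img_hw)
print(rel_boxes)
print(to_absolute(rel_boxes, img_hw))
```

    Relative coordinates stay valid when the image is resized, which is what makes them convenient for a shared data pipeline.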

    Full changelog

    Breaking Changes 🛠

    • refacto: :wrench: postprocessing with rotated boxes by @charlesmindee in https://github.com/mindee/doctr/pull/641
    • refactor: Refactored LinkNet by @fg-mindee in https://github.com/mindee/doctr/pull/733
    • refactor: Renamed DataLoader arg "workers" into "num_workers" by @fg-mindee in https://github.com/mindee/doctr/pull/737
    • refactor: Unified return_preds flags across all tasks by @fg-mindee in https://github.com/mindee/doctr/pull/741
    • refactor: Introduces img + target transforms in Datasets by @fg-mindee in https://github.com/mindee/doctr/pull/750
    • refactor: refactoring rotated boxes by @charlesmindee in https://github.com/mindee/doctr/pull/731
    • refactor: Enforced relative coordinates for all dataset geometries by @fg-mindee in https://github.com/mindee/doctr/pull/775

    New Features

    • SynthText dataset integration by @felixdittrich92 in https://github.com/mindee/doctr/pull/624
    • [notebooks] add export_as_pdfa notebook by @felixdittrich92 in https://github.com/mindee/doctr/pull/650
    • ICDAR2003 dataset integration by @felixdittrich92 in https://github.com/mindee/doctr/pull/653
    • feat: Implements erosion & dilation in PyTorch & TF by @fg-mindee in https://github.com/mindee/doctr/pull/669
    • Rotate page by @Rob192 in https://github.com/mindee/doctr/pull/488
    • feat: Added option to use AMP with TF scripts by @fg-mindee in https://github.com/mindee/doctr/pull/682
    • feat: Added support of FasterRCNN for PyTorch by @fg-mindee in https://github.com/mindee/doctr/pull/691
    • ICDAR2013 dataset integration by @felixdittrich92 in https://github.com/mindee/doctr/pull/662
    • feat: Added LR finder option in PyTorch training scripts by @fg-mindee in https://github.com/mindee/doctr/pull/703
    • feat: Added line reading for source PDFs by @fg-mindee in https://github.com/mindee/doctr/pull/707
    • feat: Added plot_samples support to visualize the images along with the targets by @SiddhantBahuguna in https://github.com/mindee/doctr/pull/704
    • SVHN dataset integration by @felixdittrich92 in https://github.com/mindee/doctr/pull/634
    • feat: Added checkpoint for obj_detection by @SiddhantBahuguna in https://github.com/mindee/doctr/pull/713
    • feat: add classification module for crop orientation by @charlesmindee in https://github.com/mindee/doctr/pull/721
    • feat: Added inference+post processing script for artefact detection by @SiddhantBahuguna in https://github.com/mindee/doctr/pull/728
    • feat: Added latency evaluation scripts for all tasks by @fg-mindee in https://github.com/mindee/doctr/pull/746
    • docs: Added colab link in the Read me for artefact detection by @SiddhantBahuguna in https://github.com/mindee/doctr/pull/755
    • feat: Added LR Finder for TensorFlow scripts by @fg-mindee in https://github.com/mindee/doctr/pull/747
    • feat: Added latency evaluation & benchmark for image classification by @fg-mindee in https://github.com/mindee/doctr/pull/757
    • feat: Adds GaussianBlur, random font for CharGenerator and improves training scripts by @fg-mindee in https://github.com/mindee/doctr/pull/758
    • feat: Added WordGenerator dataset by @fg-mindee in https://github.com/mindee/doctr/pull/760
    • feat: Added dedicated evaluation scripts for text detection by @fg-mindee in https://github.com/mindee/doctr/pull/761
    • feat: Refactored & retrained all classification models by @fg-mindee in https://github.com/mindee/doctr/pull/763
    • feat: add rotated ckpts for pytorch DBNet + fix line resolution for rotated pages by @charlesmindee in https://github.com/mindee/doctr/pull/743
    • feat: Added torchvision photometric augmentations in artefact detection training by @SiddhantBahuguna in https://github.com/mindee/doctr/pull/764
    • feat: Added random noise augmentation to object detection by @SiddhantBahuguna in https://github.com/mindee/doctr/pull/654
    • feat: add rotation option to both detection training scripts by @charlesmindee in https://github.com/mindee/doctr/pull/765
    • feat: Added ChannelShuffle transformation and fixes RandomCrop by @fg-mindee in https://github.com/mindee/doctr/pull/768
    • feat: Added Gaussian Noise implementation in Tensorflow by @SiddhantBahuguna in https://github.com/mindee/doctr/pull/771
    • feat: Added Random Horizontal Flip augmentation by @SiddhantBahuguna in https://github.com/mindee/doctr/pull/773
    • ci: Added release helper actions by @fg-mindee in https://github.com/mindee/doctr/pull/776

    Bug Fixes

    • docs: Fixed documentation build by @fg-mindee in https://github.com/mindee/doctr/pull/644
    • fix: :bug: bug canvas dtype for threshold target by @charlesmindee in https://github.com/mindee/doctr/pull/645
    • fix: :bug: assume_straight_pages in predictor by @charlesmindee in https://github.com/mindee/doctr/pull/647
    • ci: Fixed silent isort failure by @fg-mindee in https://github.com/mindee/doctr/pull/655
    • fix: Fixed W&B config log by @fg-mindee in https://github.com/mindee/doctr/pull/656
    • fix: Updates Makefile to match CI by @fg-mindee in https://github.com/mindee/doctr/pull/661
    • docs: Fixed typo in the docstrings of metrics by @fg-mindee in https://github.com/mindee/doctr/pull/664
    • fix: rotation arg in training scripts by @charlesmindee in https://github.com/mindee/doctr/pull/657
    • feat: Added missing output classes param in DBNet by @fg-mindee in https://github.com/mindee/doctr/pull/666
    • fix: Fixed LinkNet target & loss computation by @fg-mindee in https://github.com/mindee/doctr/pull/670
    • fix: box angle rectification according to the quadrant by @charlesmindee in https://github.com/mindee/doctr/pull/667
    • fix: rotate_boxes angle by @charlesmindee in https://github.com/mindee/doctr/pull/678
    • fix: Fixed param override of backbone by @fg-mindee in https://github.com/mindee/doctr/pull/689
    • fix: Added missing AMP flags in training scripts by @fg-mindee in https://github.com/mindee/doctr/pull/690
    • fix: Added a 0-sized crop safeguard in split_crops by @fg-mindee in https://github.com/mindee/doctr/pull/693
    • fix: Fixed MASTER recognition architecture by @fg-mindee in https://github.com/mindee/doctr/pull/687
    • fix: Added safeguard for extreme aspect ratio in Resize by @fg-mindee in https://github.com/mindee/doctr/pull/695
    • fix: Fixed W&B logger in object detection training script by @fg-mindee in https://github.com/mindee/doctr/pull/697
    • fix: Fixed geometry utils for polygon <--> rbox conversions by @fg-mindee in https://github.com/mindee/doctr/pull/700
    • fix: Fixed build_target for detection models with rotated targets by @fg-mindee in https://github.com/mindee/doctr/pull/698
    • fix: box computing when assume straight pages is false by @charlesmindee in https://github.com/mindee/doctr/pull/720
    • test: Fixed TF loss unittest by @fg-mindee in https://github.com/mindee/doctr/pull/725
    • fix: Fixed edge cases of DB loss in PyTorch by @fg-mindee in https://github.com/mindee/doctr/pull/726
    • fix: Fixed computation of Mean IoU by @fg-mindee in https://github.com/mindee/doctr/pull/734
    • fix: Fixed detection training script by @fg-mindee in https://github.com/mindee/doctr/pull/742
    • fix: Fixed the bin_thresh of LinkNet by @fg-mindee in https://github.com/mindee/doctr/pull/745
    • test: Increased flexibility of loss test by @fg-mindee in https://github.com/mindee/doctr/pull/744
    • fix: Fixed mask computation of DBNet by @fg-mindee in https://github.com/mindee/doctr/pull/753
    • test: Fixed TensorFlow predictor unittest by @fg-mindee in https://github.com/mindee/doctr/pull/767
    • fix: Fixed the box cropping from RandomCrop by @fg-mindee in https://github.com/mindee/doctr/pull/772
    • ci: Fixed CI training job for TF by @fg-mindee in https://github.com/mindee/doctr/pull/770
    • docs: Fixed README link & update documentation by @fg-mindee in https://github.com/mindee/doctr/pull/774
    • fix: target DB by @charlesmindee in https://github.com/mindee/doctr/pull/777

    Improvements

    • style: Fixed isort and typing checks by @fg-mindee in https://github.com/mindee/doctr/pull/643
    • docs: Added TFJS demo ref in README by @fg-mindee in https://github.com/mindee/doctr/pull/651
    • fix: Added automatic worker resolution to remaining training scripts by @fg-mindee in https://github.com/mindee/doctr/pull/649
    • feat: Added rbox_iou function with a memory-savvy option by @fg-mindee in https://github.com/mindee/doctr/pull/659
    • style: Cleaned codebase with Codacy hints by @fg-mindee in https://github.com/mindee/doctr/pull/665
    • feat: Added file existence check in DetectionDataset by @fg-mindee in https://github.com/mindee/doctr/pull/672
    • fix: pymupdf version by @charlesmindee in https://github.com/mindee/doctr/pull/673
    • [refactor] SROIE dataset by @felixdittrich92 in https://github.com/mindee/doctr/pull/660
    • fix: target_ar split crops by @charlesmindee in https://github.com/mindee/doctr/pull/681
    • feat: add line resolution for rotated boxes by @charlesmindee in https://github.com/mindee/doctr/pull/677
    • feat: add rboxes rectification in Linknet postprocessing by @charlesmindee in https://github.com/mindee/doctr/pull/679
    • docs: Added minimal docstring sanity check by @fg-mindee in https://github.com/mindee/doctr/pull/686
    • fix: Fixed deprecation warnings from numpy & PyMuPDF by @fg-mindee in https://github.com/mindee/doctr/pull/692
    • refactor: Removed postprocessor from high-level init by @fg-mindee in https://github.com/mindee/doctr/pull/688
    • feat: Added possibility to change the cache dir of datasets by @fg-mindee in https://github.com/mindee/doctr/pull/694
    • Mock Sroie / Funsd / Cord / Synthtext / DocArtefacts / IIIT5K / SVT / IC03 (all ^^) by @felixdittrich92 in https://github.com/mindee/doctr/pull/722
    • refactor: Refactored detection post-processing by @fg-mindee in https://github.com/mindee/doctr/pull/724
    • ci: Fixed CI job name and ignored .idea files by @fg-mindee in https://github.com/mindee/doctr/pull/727
    • feat: integration of the classifier in the ocr predictor by @charlesmindee in https://github.com/mindee/doctr/pull/723
    • test: Switch to a fully mocked PDF for unittests by @fg-mindee in https://github.com/mindee/doctr/pull/735
    • test: Silenced PyMuPDF warnings by @fg-mindee in https://github.com/mindee/doctr/pull/740
    • refactor: Removed contiguous param since it's included in torch>=1.7 by @fg-mindee in https://github.com/mindee/doctr/pull/756
    • feat: add preserve aspect ratio to predictor and visualization utils by @charlesmindee in https://github.com/mindee/doctr/pull/766
    • ci: Optimized CI jobs to speed up development process by @fg-mindee in https://github.com/mindee/doctr/pull/759
    • feat: Updated timing to more accurate one by @fg-mindee in https://github.com/mindee/doctr/pull/769

    Miscellaneous

    • chore: Applied post release modifications by @fg-mindee in https://github.com/mindee/doctr/pull/642

    Full Changelog: https://github.com/mindee/doctr/compare/v0.4.1...v0.5.0

  • v0.4.1(Nov 22, 2021)

    This patch release brings the support of AMP for PyTorch training to docTR along with artefact object detection.

    Note: doctr 0.4.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

    Highlights

    Automatic Mixed Precision (AMP) :zap:

    Training scripts with the PyTorch back-end now benefit from AMP, which reduces the memory footprint and potentially increases the maximum batch size! This comes in especially handy for text detection, which requires high-resolution inputs!

    Artefact detection :flying_saucer:

    Document understanding goes beyond textual elements, as information can be encoded in other visual forms. For this reason, we have extended the range of supported tasks by adding object detection. This will be focused on non-textual elements in documents, including QR codes, barcodes, ID pictures, and logos.

    Here are some early results:

    2x3_art(1)

    This release comes with a training & validation set, DocArtefacts, and a reference training script. Keep an eye out for the models we will publish in the next release!

    Get more of docTR with Colab tutorials :book:

    You've been waiting for it: from now on, we will regularly add new tutorials for docTR in the form of Jupyter notebooks that you can open and run locally or, for instance, on Google Colab!

    Check the new page in the documentation to have an updated list of all our community notebooks: https://mindee.github.io/doctr/latest/notebooks.html

    Breaking changes

    Deprecated support of FP16 for datasets

    Reduced floating-point precision can be leveraged in deep learning to decrease the memory footprint of training runs. The common data type float32 has a lower-precision counterpart, float16, which is usually only supported on GPU for common deep learning operations. Initially, we were planning to make all our operations available in both types to reduce the memory footprint.

    However, with recent developments in deep learning frameworks and their Automatic Mixed Precision mechanisms, this is no longer required and only adds constraints on the development side. We have thus deprecated this feature in our datasets and predictors:

    0.4.0:

    >>> from doctr.datasets import FUNSD
    >>> ds = FUNSD(train=True, download=True, fp16=True)
    >>> print(getattr(ds, "fp16"))
    True

    0.4.1:

    >>> from doctr.datasets import FUNSD
    >>> ds = FUNSD(train=True, download=True)
    >>> print(getattr(ds, "fp16"))
    None
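
    To make the float32 / float16 trade-off described above more concrete, here is a minimal sketch that relies only on Python's standard struct module (it does not use docTR or any deep learning framework). It compares the storage size of both formats and the rounding error introduced when storing the value 0.1; the helper name round_trip is purely illustrative.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given IEEE 754 format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# "e" is half precision (float16), "f" is single precision (float32).
half_bytes = struct.calcsize("e")
single_bytes = struct.calcsize("f")

value = 0.1
half_error = abs(round_trip(value, "e") - value)
single_error = abs(round_trip(value, "f") - value)

print(f"float16: {half_bytes} bytes, rounding error {half_error:.2e}")
print(f"float32: {single_bytes} bytes, rounding error {single_error:.2e}")
```

    Half precision halves the storage per value but loses accuracy, which is why mixed-precision mechanisms keep sensitive operations in float32.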

    Detailed changes

    New features

    • Adds Arabic to supported vocabs in #514 (@mzeidhassan)
    • Adds XML export method to DocumentBuilder in #544 (@felixdittrich92)
    • Adds flags to control the behaviour with rotated elements in #551 (@charlesmindee)
    • Adds unittest to ensure headers are correct in #556 (@fg-mindee)
    • Adds isort ordering & dedicated CI check in #557 (@fg-mindee)
    • Adds IIIT-5K to supported datasets in #589 (@felixdittrich92)
    • Adds support of AMP to all PyTorch training scripts in #604 (@fg-mindee)
    • Adds DocArtefacts dataset for object detection on non-textual elements in #583 (@SiddhantBahuguna)
    • Speeds up CTC decoding in PyTorch by x10 in #633 (@fg-mindee)
    • Added train script for artefact detection in #593 (@SiddhantBahuguna)
    • Added GPU support for classification and improve memory pinning in #629 (@fg-mindee)
    • Added an object detection metric in #628 (@fg-mindee)
    • Split DocArtefacts into subsets and updated its class mapping in #601 (@fg-mindee)
    • Added README specific for the API with route examples in #612 (@fg-mindee)
    • Added SVT dataset integration in #620 (@felixdittrich92)
    • Added links to tutorial notebooks in the documentation in #619 (@fg-mindee)
    • Added new architectures to model selection in demo in #600 (@fg-mindee)
    • Add det/reco_predictor arch in OCRPredictor.__repr__ in #595 (@RBMindee)
    • Improves coverage by adding missing unittests in #545 (@fg-mindee)
    • Resolve both lines and blocks by default when building a doc in #548 (@charlesmindee)
    • Relocated test/ to tests/ and made contribution process easier in #598 (@fg-mindee)
    • Fixed Makefile by converting spaces to tabs in #615 (@fg-mindee)
    • Updated flake8 config to spot unused imports & undefined variables in #623 (@fg-mindee)
    • Adds 2 new rotation flags in the ocr_predictor in #632 (@charlesmindee)

    Bug fixes

    • Fixed evaluation script clipping issue in #522 (@charlesmindee)
    • Fixed API template issues with new httpx version in #535 (@fg-mindee)
    • Fixed TransformerDecoder for PyTorch 1.10 in #539 (@fg-mindee)
    • Fixed a bug in resolve_lines in #537 (@charlesmindee)
    • Fixed target computation in MASTER model (PyTorch backend) in #546 (@charlesmindee)
    • Fixed portuguese entry in VOCAB in #571 (@fmobrj)
    • Fixed header check typo in #557 (@fg-mindee)
    • Fixed keras version constraint in #579 (@fg-mindee)
    • Updated streamlit version in demo app in #611 (@charlesmindee)
    • Updated environment collection script in #575 (@fg-mindee)
    • Removed console print in builder in #566 (@fg-mindee)
    • Fixed docstring and export as xml dim bug in #586 (@felixdittrich92)
    • Fixed README instruction for page synthesis in #590 (@fg-mindee)
    • Adds missing console log and removed Tensorboard in #626 (@fg-mindee)
    • Fixed docstrings of datasets in #603 (@felixdittrich92)
    • Fixed documentation build requirements in #549 (@fg-mindee)

    Improvements

    • Applied post release modifications in #520 (@fg-mindee)
    • Updated benchmark entry of crnn_mobilenet_v3 small in #523 (@charlesmindee)
    • Updated perf crnn_mobilenet_v3_large performances in doc (TF) in #526 (@charlesmindee)
    • Added automatic detection of rotated bbox in training utils in #534 (@fg-mindee)
    • Cleaned rotation transforms in #536 (@fg-mindee)
    • Updated library name spelling in #541 (@fg-mindee)
    • Updates README of detection training in #542 (@k-for-code)
    • Updated package index in #543 (@fg-mindee)
    • Updated README in #555 (@fg-mindee)
    • Updated CONTRIBUTING and issue templates in #560 (@fg-mindee)
    • Removed unused imports and prevents XML attacks in #582 (@fg-mindee)
    • Updated references to demo in README in #599 (@fg-mindee)
    • Updated readme and help in analyze.py in #596 (@RBMindee)
    • Specified that the API template only supports images for now in #609 (@fg-mindee)
    • Updated command to install tf/pytorch build in #614 (@charlesmindee)
    • Added checkpoint format to gitignore in #613 (@fg-mindee)
    • Specified comment in SAR about symbol encoding in #617 (@fg-mindee)
    • Drops support of np.float16 in #627 (@fg-mindee)

    New Contributors

    Our thanks & warm welcome to the following persons for their first contributions: @mzeidhassan @k-for-code @felixdittrich92 @SiddhantBahuguna @RBMindee @thentgesMindee :pray:

    Full Changelog: https://github.com/mindee/doctr/compare/v0.4.0...v0.4.1

  • v0.4.0(Oct 1, 2021)

    This release brings the support of PyTorch out of beta, makes text recognition more robust, and provides light architectures for complex tasks.

    Note: doctr 0.4.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

    Highlights

    No more width limitation for text recognition

    Some documents, such as French ID cards, include very long strings that can be challenging to transcribe:

    fr_id_card_sample (copy)

    This release enables a smart split/merge strategy for wide crops to avoid performance drops. Previously, the whole crop was analyzed at once; now it is split into reasonably sized crops, inference is performed on them as a batch, and the predictions are merged back together.

    The following snippet:

    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor
    
    doc = DocumentFile.from_images('path/to/img.png')
    predictor = ocr_predictor(pretrained=True)
    print(predictor(doc).pages[0])
    

    used to yield:

    Page(
      dimensions=(447, 640)
      (blocks): [Block(
        (lines): [Line(
          (words): [
            Word(value='1XXXXXX', confidence=0.0023),
            Word(value='1XXXX', confidence=0.0018),
          ]
        )]
        (artefacts): []
      )]
    )
    

    and now yields:

    Page(
      dimensions=(447, 640)
      (blocks): [Block(
        (lines): [Line(
          (words): [
            Word(value='IDFRABERTHIER<<<<<<<<<<<<<<<<<<<<<<', confidence=0.49),
            Word(value='8806923102858CORINNE<<<<<<<6512068F6', confidence=0.22),
          ]
        )]
        (artefacts): []
      )]
    )
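
    The split/merge idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not docTR's implementation: the function names, the max_ratio threshold of 8 and the naive concatenation used for merging are all hypothetical, and a real merge step would also need to handle characters cut at piece boundaries.

```python
from typing import List, Tuple


def split_columns(height: int, width: int, max_ratio: float = 8.0) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges so that each piece has width / height <= max_ratio."""
    max_width = max(1, int(max_ratio * height))
    if width <= max_width:
        return [(0, width)]
    num_pieces = -(-width // max_width)  # ceiling division
    # Spread the columns as evenly as possible across the pieces.
    bounds = [(i * width) // num_pieces for i in range(num_pieces + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def merge_words(pieces: List[str]) -> str:
    """Naively merge per-piece transcriptions by concatenation (assumption: no overlap)."""
    return "".join(pieces)


ranges = split_columns(height=32, width=1000)
print(ranges)
print(merge_words(["IDFRABERTHIER", "<<<<<<<<"]))
```

    Each range would be cropped from the wide image, all pieces would be sent to the recognition model as one batch, and the resulting strings would be merged in left-to-right order.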
    

    Framework specific predictors

    PyTorch support is no longer in beta, so we have unified the predictor interface to make switching from one deep learning backend to the other seamless :raised_hands: Predictors are the recommended interface for inference with your models!

    0.3.1 (TensorFlow):

    >>> from doctr.models import detection_predictor
    >>> predictor = detection_predictor(pretrained=True)
    >>> out = predictor(doc, training=False)

    0.3.1 (PyTorch):

    >>> from doctr.models import detection_predictor
    >>> import torch
    >>> predictor = detection_predictor(pretrained=True)
    >>> predictor.model.eval()
    >>> with torch.no_grad(): out = predictor(doc)

    0.4.0:

    >>> from doctr.models import detection_predictor
    >>> predictor = detection_predictor(pretrained=True)
    >>> out = predictor(doc)

    An ever-growing model zoo :zebra:

    As PyTorch support leaves beta, we have closed the gap in pretrained model availability between PyTorch & TensorFlow. Additionally, by leveraging our integration of light backbones, this release comes with lighter architectures for text detection and text recognition:

    • db_mobilenet_v3_large
    • crnn_mobilenet_v3_small
    • crnn_mobilenet_v3_large

    The full list of supported architectures is available :point_right: here

    Demo live on HuggingFace Spaces

    If you have enjoyed the Streamlit demo but prefer not to run it on your own hardware, feel free to check out the online version on HuggingFace Spaces.

    Courtesy of @osanseviero for deploying it, and HuggingFace for hosting & serving :pray:

    Breaking changes

    Deprecated crnn_resnet31 & sar_vgg16_bn

    After reviewing backbone compatibility and re-assessing whether all combinations should be trained, DocTR now focuses on reproducing architectures as their authors intended, or improving upon them. As such, we have deprecated the following recognition models (which had no pretrained params): crnn_resnet31, sar_vgg16_bn.

    Deprecated models.export

    Since doctr.models.export was specific to TensorFlow and it didn't bring much more value than TensorFlow tutorials, we added instructions in the documentation and deprecated the submodule.

    New features

    Datasets

    Resources to access data in efficient ways

    • Added entry in vocabs for Portuguese #464 (@fmobrj), English, Spanish & German #467 (@fg-mindee), ancient Greek #500 (@fg-mindee)

    IO

    Features to manipulate input & outputs

    • Added .synthesize method to Page and Document #472 (@fg-mindee)

    Models

    Deep learning model building and inference

    • Add dynamic crop splitting for wide inputs to recognition models #465 (@charlesmindee)
    • Added MobileNets with rectangular pooling #483 (@fg-mindee)
    • Added pretrained params for db_mobilenet_v3_large #485 #487, crnn_vgg16_bn #487, db_resnet50 #489, crnn_mobilenet_v3_small & crnn_mobilenet_v3_large #517 #516 (@charlesmindee)

    Utils

    Utility features relevant to the library use cases.

    • Added automatic font resolution function #472 (@fg-mindee)

    Transforms

    Data transformations operations

    • Added RandomCrop transformation #448 (@charlesmindee)

    Test

    Verifications of the package well-being before release

    • Added a unittest for RandomCrop #448 (@charlesmindee)
    • Added a unittest for crop split/merge in recognition models #465 (@charlesmindee)
    • Added unittests for PyTorch OCR model zoo #499 (@fg-mindee)

    Documentation

    Online resources for potential users

    • Added entry for RandomCrop #448 (@charlesmindee)
    • Added explanations about model export / compression #463 (@fg-mindee)
    • Added benchmark entry for db_mobilenet_v3_large #485 in the documentation (@charlesmindee)
    • Added badge with hyperlink to HuggingFace Spaces demo #501 (@osanseviero)

    References

    Reference training scripts

    • Added option to select vocab in the training of character classification and text recognition #502 (@fg-mindee)

    Others

    Other tools and implementations

    • Added CI job to validate the demo, the evaluation script and the environment collection scripts #456 (@fg-mindee), the character classification training script #457 (@fg-mindee), the analysis & evaluation scripts in PyTorch #458 (@fg-mindee), the text recognition scripts #469 (@fg-mindee), the text detection scripts #491 (@fg-mindee)
    • Added support of PyTorch for the analysis & evaluation scripts #458 (@fg-mindee)

    Bug fixes

    Datasets

    • Fixed submodule import #451 (@fg-mindee)
    • Added missing characters in French vocab #467 (@fg-mindee)

    Models

    • Fixed PyTorch preprocessor shape resolution #453 (@charlesmindee)
    • Fixed Tensor cropping for channels_first format #458 #461 (@fg-mindee)
    • Replaced recognition models' MobileNet backbones by their rectangular pooling counterparts #483 (@fg-mindee)
    • Fixed crop extraction for PyTorch tensors #484 (@charlesmindee)
    • Fixed crop filtering on multi-page inference #497 (@fg-mindee)

    Transforms

    • Fixed rounding errors in RandomCrop #473 (@fg-mindee)

    Utils

    • Fixed page synthesis for characters outside of latin-1 #496 (@fg-mindee)

    Documentation

    • Fixed READMEs of training scripts #504 #491 (@fg-mindee)

    References

    • Fixed the requirements of the training scripts #494 #491 (@fg-mindee)

    Others

    • Fixed the requirements of the streamlit demo #492 (@osanseviero), the API template #494 (@fg-mindee)

    Improvements

    Datasets

    • Merged DocDataset & OCRDataset #474 (@charlesmindee)
    • Updated DetectionDataset label format #491 (@fg-mindee)

    Models

    • Deprecated doctr.models.export #463 (@fg-mindee)
    • Deprecated crnn_resnet31 & sar_vgg16_bn recognition models #468 (@fg-mindee)
    • Relocated DocumentBuilder to doctr.models.builder, split predictor into framework-specific objects #481 (@fg-mindee)
    • Added more robust argument checks in DocumentBuilder & refactored crop preparation and result processing in ocr predictors #497 (@fg-mindee)
    • Reflected changes of detection target formats on detection models #491 (@fg-mindee)

    Utils

    • Improved page synthesis with dynamic font size #472 (@fg-mindee)

    Documentation

    • Updated README badge & added release-specific documentation index #451 (@fg-mindee)
    • Added logo in README & documentation #459 (@charlesmindee)
    • Updated hyperlink to documentation in the README #462 (@fg-mindee)
    • Updated vocab description in the documentation #467 (@fg-mindee)
    • Added favicon in the documentation #466 (@fg-mindee)
    • Removed benchmark entry of deprecated models #468 (@fg-mindee)
    • Updated README of the text recognition training script #469 (@fg-mindee)
    • Updated performance benchmark with crop splitting #471 (@charlesmindee)
    • Added page synthesis example in README #472 (@fg-mindee)
    • Made copyright mention dynamic, improved the landing & installation pages in the documentation #475 (@fg-mindee)
    • Restructured the documentation #519 (@fg-mindee)

    Tests

    • Removed legacy unittests of doctr.models.export #463 (@fg-mindee)
    • Removed unittests for deprecated models #468 (@fg-mindee)
    • Updated unittests with the new doctr.utils.font submodule #472 (@fg-mindee)
    • Reflected changes from predictor refactor #481 (@fg-mindee)
    • Extended unittest of crop extraction #484 (@charlesmindee)
    • Reflected changes from predictor crop preparation improvement #497 (@fg-mindee)
    • Reflected changes from detection target format #491 (@fg-mindee)

    References

    • Reflected changes of detection dataset target format #491 (@fg-mindee)

    Others

    • Specified import of file_utils #447 (@zalakbhalani)
    • Updated package version #451 (@fg-mindee)
    • Updated PIL version constraint to fix vulnerability #460 (@fg-mindee)
    • Updated model selection in the demo #468 (@fg-mindee)
    • Removed some MacOS CI jobs that were slowing down PR checks #470 (@fg-mindee)
    • Reflected page synthesis changes in demo #477 (@fg-mindee)
    • Reflected changes from predictor refactor in API & demo #481 (@fg-mindee)
    • Updated author_email in setup #493 (@fg-mindee)
    • Split CI jobs for pytest in common, pytorch & tensorflow #498 #503 #506 (@fg-mindee)
    • Removed unused imports #507 (@fg-mindee)

    Many thanks to our contributors, we are delighted to see that there are more every week!

  • v0.3.1(Aug 27, 2021)

    This release stabilizes the support for the PyTorch backend while extending the range of features (new task, superior pretrained models, speed-ups).

    Brought to you by @fg-mindee & @charlesmindee

    Note: doctr 0.3.1 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

    Highlights

    Improved pretrained parameters for your favorite models :rocket:

    With each release, we hope to bring you improved models and more comprehensive evaluation results. As part of the 0.3.1 release, we provide you with:

    • improved params for crnn_vgg16_bn & sar_resnet31
    • evaluation results on a new private dataset (US tax forms)

    Lighter backbones for faster architectures :zap:

    Unsurprisingly, like many other libraries, DocTR will have to balance speed against raw performance. To make this choice available to you, we added support for MobileNet V3 and pretrained it for character classification in both PyTorch & TensorFlow.

    Speeding up preprocessors & datasets :train2:

    Whether you are a user looking for inference speed, or a dedicated model trainer looking for optimal data loading, you will be thrilled to know that we have greatly improved our data loading/processing by leveraging multi-threading!

    Better demo app :art:

    We value the accessibility of this project and thus commit to improving tools for entry-level users. Deploying a demo from a Python library is not the expertise of every developer, so this release improves the existing demo:

    new_demo

    Page selection was added for multi-page documents, the predictions are used to produce a synthesized version of the initial document, and you get the JSON export! We're looking forward to your feedback :hugs:

    [beta] Character classification

    As DocTR continues to move forward with more complex tasks, paving the way for a consistent training procedure will become necessary. Pretraining has shown potential in many deep learning tasks, and we want to explore opportunities to make training for OCR even more accessible.

    char_classif

    So this release takes a big step forward by adding an on-the-fly character generator and training scripts, which allow you to train a character classifier without any pre-existing data :hushed:
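
    To illustrate what "on-the-fly" means here, the sketch below draws each sample at random from a vocabulary instead of reading it from stored data. It is a hypothetical stand-in, not docTR's character generator: the class name CharacterSampler is invented for this example, and the rendering of each character into an image is deliberately omitted.

```python
import random
from typing import Tuple


class CharacterSampler:
    """Hypothetical stand-in for an on-the-fly character dataset (image rendering omitted)."""

    def __init__(self, vocab: str, num_samples: int, seed: int = 0) -> None:
        self.vocab = vocab
        self.num_samples = num_samples
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return self.num_samples

    def __getitem__(self, index: int) -> Tuple[str, int]:
        if not 0 <= index < self.num_samples:
            raise IndexError(index)
        # Every sample is drawn at random, so no stored data is ever read.
        label = self._rng.randrange(len(self.vocab))
        return self.vocab[label], label


dataset = CharacterSampler(vocab="0123456789abcdef", num_samples=4)
samples = [dataset[i] for i in range(len(dataset))]
print(samples)
```

    A classifier would be trained on the rendered character as input and the class index as target, so the only required resource is the vocabulary itself.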

    Breaking changes

    Default dtype of TF datasets

    In order to harmonize data processing between frameworks, the default data type of dataloaders has been switched to float32 for TensorFlow backend:

    0.3.0:

    >>> from doctr.datasets import FUNSD
    >>> ds = FUNSD()
    >>> img, target = ds[0]
    >>> print(img.dtype)
    <dtype: 'uint8'>
    >>> print(img.numpy().min(), img.numpy().max())
    0 255

    0.3.1:

    >>> from doctr.datasets import FUNSD
    >>> ds = FUNSD()
    >>> img, target = ds[0]
    >>> print(img.dtype)
    <dtype: 'float32'>
    >>> print(img.numpy().min(), img.numpy().max())
    0.0 1.0

    I/O module

    Whether it is for exporting predictions or loading input data, the library lets you play around with inputs and outputs using minimal code. Since its usage is constantly expanding, the doctr.documents module was repurposed into doctr.io.

    0.3.0:

    >>> from doctr.documents import DocumentFile
    >>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()

    0.3.1:

    >>> from doctr.io import DocumentFile
    >>> pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()

    It now also includes an image submodule for easy tensor <--> numpy conversion for all supported data types.

    Multithreading relocated

    As multithreading is increasingly used to boost performance across the library, it has been moved from the utilities of TF-only datasets to doctr.utils.multithreading:

    0.3.0:

    >>> from doctr.datasets.multithreading import multithread_exec
    >>> results = multithread_exec(lambda x: x ** 2, [1, 4, 8])

    0.3.1:

    >>> from doctr.utils.multithreading import multithread_exec
    >>> results = multithread_exec(lambda x: x ** 2, [1, 4, 8])
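
    For readers curious about what such a helper does, here is a rough standard-library equivalent. It is a sketch under assumptions rather than docTR's actual code: the name multithread_exec_sketch and its threads parameter are hypothetical, and the real function may differ in signature and behaviour.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, List, Optional


def multithread_exec_sketch(
    func: Callable[[Any], Any],
    seq: Iterable[Any],
    threads: Optional[int] = None,
) -> List[Any]:
    """Apply `func` to each element of `seq` using a pool of threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=threads) as executor:
        # executor.map yields results in the order of the inputs.
        return list(executor.map(func, seq))


results = multithread_exec_sketch(lambda x: x ** 2, [1, 4, 8])
print(results)
```

    Threads are a good fit for I/O-bound work such as reading image files; CPU-bound pure-Python functions gain little because of the global interpreter lock.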

    New features

    Datasets

    Resources to access data in efficient ways

    • Added support of FP16 (#367)
    • Added option to merge subsets for recognition datasets (#376)
    • Added dynamic sequence encoding (#393)
    • Added support of new label format datasets (#407)
    • Added character generator dataset for image classification (#412, #418)

    IO

    Features to manipulate input & outputs

    • Added Element creation from dictionary (#386)
    • Added byte decoding function for PyTorch and TF (#390)
    • Added extra tensor conversion functions (#412)

    Models

    Deep learning model building and inference

    • Added crnn_resnet31 as a recognition model (#361)
    • Added a uniform comprehensive preprocessing mechanism for both frameworks (#370)
    • Added support of FP16 (#382)
    • Added MobileNet V3 for TensorFlow as a backbone (#372, #410, #420)
    • Added superior pretrained params for crnn_vgg16_bn in TF (#395)
    • Added pretrained params for master in TF (#396)
    • Added mobilenet backbone availability to detection & recognition models (#398, #399)
    • Added pretrained params for mobilenets on character classification (#415, #421, #424)
    • Added superior pretrained params for sar_resnet31 in TF (#395)

    Utils

    Utility features relevant to the library use cases.

    • Added box rotation function (#358)
    • Added box visualization feature (#384)

    Transforms

    Data transformations operations

    • Added rotate function (#358) and its corresponding augmentation module (#363)
    • Added cropping function (#366)
    • Added support of FP16 (#388)

    Test

    Verifications of the package well-being before release

    • Added unittests for rotation functions (#358)
    • Added test cases for recognition zoo (#361)
    • Added unittests for RandomRotate (#363)
    • Added test cases for recognition dataset merging (#376)
    • Added unittests for Element creation from dicts (#386)
    • Added test case for mobilenet backbone (#372)
    • Added unittests for datasets with new format (#407)
    • Added test cases for the character generator (#412)

    Documentation

    Online resources for potential users

    • Added entry for RandomRotate (#363)
    • Added entry for CharacterGenerator (#412)
    • Added evaluation on US tax forms in the documentation (#419)

    References

    Reference training scripts

    • Added PyTorch training reference scripts (#359, #394)
    • Added LR scheduler to TensorFlow script (#360, #374, #397) & PyTorch scripts (#381)
    • Added possibility to use multi-folder datasets (#377)
    • Added character classification training script (#414, #420)

    Others

    Other tools and implementations

    • Added page selection and result JSON display in demo (#369)
    • Added an entry for MASTER in model selection of the demo (#400)

    Bug fixes

    Datasets

    • Fixed image shape resolution in custom datasets (#354)
    • Fixed box clipping to avoid rounding errors in datasets (#355)

    Models

    • Fixed GPU compatibility of detection models (#359) & recognition models (#361)
    • Fixed recognition model loss computation in PyTorch (#379)
    • Fixed loss computation of CRNN in PyTorch (#434)
    • Fixed loss computation of MASTER in PyTorch (#440)

    Transforms

    • Fixed Resize transformation when aspect ratio matches the target (#357)
    • Fixed box rotation (#378)
    • Fixed image expansion while rotating (#438)

    Documentation

    • Fixed installation instructions (#437)

    References

    • Fixed missing import in utils (#389)
    • Fixed GPU support for PyTorch in the recognition script (#427)
    • Fixed gradient clipping in PyTorch scripts (#432)

    Others

    • Fixed trigger of script testing CI job (#351)
    • Constrained PIL version due to issues with version 8.3 (#362)
    • Added missing mypy config ignore (#365)
    • Fixed PDF page rendering for demo & analysis script (#368)
    • Constrained weasyprint version due to issues with version 53.0 (#404)
    • Constrained matplotlib version due to issues with version 3.4.3 (#413)

    Improvements

    Datasets

    • Improved typing of doctr.datasets (#354)
    • Improved PyTorch data loading (#362)
    • Switched default dtype of TF to tf.float32 instead of tf.uint8 (#367, #375)
    • Optimized sequence encoding through multithreading (#393)

    IO

    • Relocated doctr.documents to doctr.io (#390)

    Models

    • Updated bottleneck channel multiplier in MASTER (#350)
    • Added dropout in MASTER (#349)
    • Renamed MASTER attribute for consistency across models (#356)
    • Added target validation for detection models (#355)
    • Optimized preprocessing by leveraging multithreading (#370)
    • Added dtype dynamic resolution and specified error messages (#382)
    • Harmonized parameter loading in PyTorch (#425)
    • Enabled backbone pretraining in complex architectures (#435)
    • Made head & FPN sizing dynamic using the feature extractor in detection models (#435)

    Utils

    • Moved multithreading to doctr.utils (#371)
    • Improved format validation for visualization features (#392)
    • Moved doctr.models._utils.rotate_page to doctr.utils.geometry.rotate_image (#371)

    Documentation

    • Updated pypi badge & documentation changelog (#346)
    • Added export example in documentation (#348)
    • Reflected relocation of doctr.documents to doctr.io in documentation and README (#390)
    • Updated recognition benchmark (#395, #441)
    • Updated model entries (#435)
    • Updated authors & maintainer references in setup.py and in README (#444)

    Tests

    • Added test case for same aspect ratio resizing (#357)
    • Extended testing of datasets (#354)
    • Added test cases for FP16 support of datasets (#367), models (#382), transforms (#388)
    • Moved multithreading unittests (#371)
    • Extended test cases of preprocessors (#370)
    • Reflected relocation of doctr.documents to doctr.io (#390)
    • Removed unused imports (#391)
    • Extended test cases for visualization (#392)
    • Updated unittests of sequence encoding (#393)
    • Added extra test cases for rotation validation (#438)

    References

    • Added optimal selection of workers for all scripts (#362)
    • Reflected changes of switching to tf.float32 by default for datasets (#367)
    • Removed legacy script arg (#380)
    • Removed unused imports (#391)
    • Improved device selection for PyTorch training script (#427)
    • Improved metric logging when the value is undefined (#432)

    Others

    • Updated package version (#346)
    crnn_mobilenet_v3_large_pt-f5259ec2.pt(17.39 MB)
    crnn_mobilenet_v3_small-7f36edec.zip(7.38 MB)
    crnn_mobilenet_v3_small_pt-3b919a02.pt(8.03 MB)
    crnn_vgg16_bn-9762b0b0.pt(60.35 MB)
    db_mobilenet_v3_large-8c16d5bf.zip(15.42 MB)
    db_mobilenet_v3_large-fd62154b.pt(16.15 MB)
    db_resnet50-ac60cadc.pt(97.24 MB)
    Logo-docTR-white.png(7.39 KB)
    Logo_doctr.gif(74.66 KB)
    synthesized_sample.png(39.89 KB)
    toy_detection_set-bbbb4243.zip(8.68 KB)
    toy_recogition_set-036a4d80.zip(6.87 KB)
  • v0.3.0(Jul 2, 2021)

    This release adds support for PyTorch backend & rotated text elements.

    Release brought to you by @fg-mindee & @charlesmindee

    Note: doctr 0.3.0 requires either TensorFlow 2.4.0 or PyTorch 1.8.0.

    Highlights

    [beta] Welcome PyTorch :tada:

    This release comes with exciting news: we added support of PyTorch for the whole library!

    If you have both TensorFlow & PyTorch installed, simply switch the DocTR backend by using the USE_TORCH and USE_TF environment variables.

    export USE_TORCH='1'
    

    Then DocTR will do the rest for you to play along with PyTorch:

    import torch
    from doctr.models import db_resnet50
    model = db_resnet50(pretrained=True).eval()
    with torch.no_grad():
        out = model(torch.rand(1, 3, 1024, 1024))
    

    More pretrained models to come in the next releases!
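
To make the environment-variable switch concrete, here is a minimal sketch of how a library could resolve its backend from USE_TORCH and USE_TF. The helper `select_backend`, the set of accepted truthy values, and the fallback to TensorFlow when neither or both flags are set are all assumptions made for illustration; the notes above do not describe DocTR's actual resolution logic.

```python
import os

# Values treated as "enabled" in this sketch (an assumption, not DocTR's list)
TRUTHY = {"1", "ON", "YES", "TRUE"}


def select_backend(env=None):
    """Return 'torch' or 'tf' based on USE_TORCH / USE_TF (hypothetical helper)."""
    env = os.environ if env is None else env
    use_torch = env.get("USE_TORCH", "").upper() in TRUTHY
    use_tf = env.get("USE_TF", "").upper() in TRUTHY
    if use_torch and not use_tf:
        return "torch"
    if use_tf and not use_torch:
        return "tf"
    # Neither or both flags set: this sketch arbitrarily falls back to TensorFlow
    return "tf"


backend = select_backend({"USE_TORCH": "1"})
```

Passing a plain dictionary instead of reading `os.environ` keeps the function easy to test.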

    Support of rotated boxes

    Users might be tempted to filter text recognition predictions, which was not easy previously without a prediction's confidence. We harmonized our recognition models to provide the sequence prediction probability.

    Rotated bounding boxes

    Page reconstruction

    Following up on some feedback about the lack of clarity for visualization of dense predictions, we added a page reconstruction feature.

    import matplotlib.pyplot as plt
    from doctr.utils.visualization import synthesize_page
    from doctr.documents import DocumentFile
    from doctr.models import ocr_predictor
    
    model = ocr_predictor(pretrained=True)
    # PDF
    doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
    # Analyze
    result = model(doc)
    
    # Reconstruct the first page
    reconstructed_page = synthesize_page(result.export()[0])
    plt.imshow(reconstructed_page); plt.show()
    

    Original image Page reconstruction

    Using the predictions from our models, we try to synthesize the document with only its textual information!

    Breaking changes

    Renamed LinkNet

    While the paper doesn't introduce different versions of the LinkNet architecture, we want to keep the possibility of adding more. In order to stabilize the interface early on, we renamed linknet to linknet16.

    0.2.1 | 0.3.0
    -- | --
    >>> from doctr.models import linknet | >>> from doctr.models import linknet16
    >>> model = linknet(pretrained=True) | >>> model = linknet16(pretrained=True)

    New features

    Datasets

    Resources to access data in efficient ways

    • Added option to yield rotated bounding boxes as target (#281)
    • Added support of PyTorch for all datasets (#319)

    Documents

    Features to manipulate document information

    • Added support of rotated bboxes (#281)
    • Added entry for MASTER (#300)
    • Updated LinkNet entry (#313)
    • Added code of conduct (#325)

    Models

    Deep learning model building and inference

    • Added rotated cropping feature & inference mode (#281)
    • Added spatial masked loss support for LinkNet (#296)
    • Added page orientation estimation feature (#293)
    • Added box target rotation feature (#297)
    • Added support of MASTER recognition model & transformer (#300, #342)
    • Added Focal loss support to linknet (#304, #311)
    • Added PyTorch support for DBNet (#310, #313, #316), LinkNet (#317), conv_sequence & parameter loading (#323), resnet31 (#327), vgg16_bn (#328), CRNN (#318), SAR (#333), MASTER (#329, #335, #340, #342)
    • Added cleaner verified file downloading function (#319)
    • Added upfront page orientation estimation (#324) by @Rob192
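
As an illustration of what verified file downloading can involve, the sketch below checks a downloaded file against a hash prefix embedded in its name. It assumes that the 8-character suffix in asset names such as `db_resnet50-ac60cadc.pt` is the beginning of the file's SHA256 digest; the notes do not state this, and `matches_embedded_hash` is a hypothetical helper, not a DocTR function.

```python
import hashlib
import os
import re
import tempfile


def matches_embedded_hash(file_path):
    """Check that the SHA256 of a file starts with the prefix in its name.

    Assumes names shaped like '<name>-<8 hex chars>.<ext>'.
    """
    match = re.search(r"-([0-9a-f]{8})\.\w+$", os.path.basename(file_path))
    if match is None:
        raise ValueError("no hash prefix found in file name")
    with open(file_path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return digest.startswith(match.group(1))


# Usage with throwaway files: one named after its real hash, one mislabeled
payload = b"dummy weights"
prefix = hashlib.sha256(payload).hexdigest()[:8]
with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, f"toy_model-{prefix}.pt")
    bad_path = os.path.join(tmp_dir, "toy_model-00000000.pt")
    for path in (good_path, bad_path):
        with open(path, "wb") as f:
            f.write(payload)
    is_valid = matches_embedded_hash(good_path)
    is_invalid = matches_embedded_hash(bad_path)
```

A real downloader would typically read large files in chunks rather than all at once.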

    Utils

    Utility features relevant to the library use cases.

    • Added Mask IoU computation (#290)
    • Added straight <--> rotated bbox conversion and metric computation support (#281)
    • Added page synthesis feature (#320)
    • Added IoA, and NMS (#332)
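
The straight <--> rotated conversion listed above can be sketched in a few lines. The box layouts used here, (xmin, ymin, xmax, ymax) for straight boxes and (x_center, y_center, width, height, angle in degrees) for rotated ones, are assumptions for illustration, and both function names are hypothetical rather than taken from doctr.utils.

```python
import math


def straight_to_rotated(box):
    """(xmin, ymin, xmax, ymax) -> (x_center, y_center, width, height, angle=0)."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin, 0.0)


def rotated_to_straight(rbox):
    """Smallest axis-aligned box enclosing a rotated box (angle in degrees)."""
    x, y, w, h, angle = rbox
    cos_a = math.cos(math.radians(angle))
    sin_a = math.sin(math.radians(angle))
    xs, ys = [], []
    # Rotate each corner offset around the center, then translate
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        xs.append(x + dx * cos_a - dy * sin_a)
        ys.append(y + dx * sin_a + dy * cos_a)
    return (min(xs), min(ys), max(xs), max(ys))


rbox = straight_to_rotated((0.0, 0.0, 2.0, 1.0))
round_trip = rotated_to_straight(rbox)
quarter_turn = rotated_to_straight((1.0, 0.5, 2.0, 1.0, 90.0))
```

Converting a rotated box back to a straight one loses the angle, so the round trip is only exact for boxes that were axis-aligned to begin with.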

    Transforms

    Data transformations operations

    • Added support of custom Resize in PyTorch (#313), ColorInversion (#322)

    Test

    Verifications of the package well-being before release

    • Added unittest for mask IoU computation (#290)
    • Added unittests for rotated bbox support (#281, #297)
    • Added unittests for page orientation estimation (#293, #324)
    • Added unittests for MASTER (#300, #309)
    • Added test case for the focal loss of LinkNet (#304)
    • Added unittests for PyTorch integration (#310, #313, #317, #319, #322, #323, #327, #318, #329, #335, #340, #342)
    • Added unittests for IoA & NMS (#332)

    Documentation

    Online resources for potential users

    • Added instructions to install DocTR with PyTorch or TF (#306)
    • Added specific instructions to run checks in CONTRIBUTING (#321)

    References

    Reference training scripts

    • Added support of rotated bounding box targets (#281)

    Others

    Other tools and implementations

    • Added support of rotated bounding box target & inference mode (#281)
    • Added framework availability check (#306, #314, #315)
    • Added CI job for PyTorch unittests (#310)
    • Added CI jobs to build DocTR with multiple Python versions, environments and frameworks (#314, #315)
    • Updated demo to add page reconstruction (#320)
    • Added PyTorch & torchvision to environment collection script (#345) & updated the bug template

    Bug fixes

    Documentation

    • Fixed entry of datasets (#344)

    Tests

    • Fixed ColorInversion unittest (#298, #339)

    References

    • Fixed missing import of wandb in the detection script (#288)
    • Fixed edge case of recognition model output unpacking in the recognition training script (#291)
    • Fixed model output unpacking in the detection script (#301)
    • Fixed wandb config for training scripts (#302)

    Others

    • Fixed edge case of recognition model output unpacking in the evaluation script (#291)
    • Fixed mypy config and related typing annotations (#308, #312, #314, #336)

    Improvements

    Datasets

    • Improved constructors of OCRDataset and CORD (#289, #299)
    • Silenced numpy dtype warnings (#336)

    Documents

    • Updated README badge & documentation versioning (#287)
    • Harmonized benchmark table formatting of figures (#281)
    • Updated demo illustration in README (#326)

    Documentation

    • Updated documentation font and mentioned PyTorch support in README & docs (#344)

    Tests

    • Updated unittest image (#337)
    • Cleaned up unittest folder separation (#338)

    References

    • Reordered script option to save time for test-only (#294)

    Others

    • Updated package version (#287)
    • Removed unused imports (#295, #307, #336)
    • Updated API requirements for security and cleaned Dockerfile (#303)
    • Improved setuptools classifiers and installation process (#306)

    :pray: Thanks to our contributors :pray: @Rob192

    crnn_vgg16_bn-76b7f2c6.zip(56.03 MB)
    demo_update.png(172.94 KB)
    master-bade6eae.zip(241.18 MB)
    mobilenet_v3_large-a0aea820.pt(16.79 MB)
    mobilenet_v3_large-d27d66f2.zip(15.47 MB)
    mobilenet_v3_small-69c7267d.pt(6.36 MB)
    mobilenet_v3_small-d624c4de.zip(5.82 MB)
    mock_receipt.jpeg(397.19 KB)
    sar_resnet31-9ee49970.zip(204.02 MB)
  • v0.2.1(May 28, 2021)

    This patch release fixes issues with preprocessor and greatly improves text detection models.

    Brought to you by @fg-mindee & @charlesmindee

    Note: doctr 0.2.1 requires TensorFlow 2.4.0 or higher.

    Highlights

    Improved text detection

    With this iteration, DocTR brings you a set of newly pretrained parameters for db_resnet50 which was trained using a much wider range of data augmentations!

    architecture | FUNSD recall | FUNSD precision | CORD recall | CORD precision
    -- | -- | -- | -- | --
    db_resnet50 + crnn_vgg16_bn (v0.2.0) | 64.8 | 70.3 | 67.7 | 78.4
    db_resnet50 + crnn_vgg16_bn (v0.2.1) | 70.08 | 74.77 | 82.19 | 79.67

    OCR sample

    Sequence prediction confidence

    Users might be tempted to filter text recognition predictions, which was not easy previously without a prediction's confidence. We harmonized our recognition models to provide the sequence prediction probability.

    Using the following image: reco_sample

    with this snippet

    from doctr.documents import DocumentFile
    from doctr.models import recognition_predictor
    predictor = recognition_predictor(pretrained=True)
    doc = DocumentFile.from_images("path/to/reco_sample.jpg")
    print(predictor(doc))
    

    will get you a list of tuples (word value, sequence confidence):

    [('invite', 0.9302278757095337)]
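
With (word value, sequence confidence) tuples available, filtering becomes a one-liner. The sketch below is independent of DocTR: `filter_by_confidence` is a hypothetical helper, the 0.5 threshold is arbitrary, and the second prediction is made up for illustration.

```python
def filter_by_confidence(predictions, min_confidence=0.5):
    """Keep (word, confidence) pairs whose confidence reaches the threshold."""
    return [(word, conf) for word, conf in predictions if conf >= min_confidence]


# The first tuple is the output quoted above; the second one is invented
predictions = [("invite", 0.9302278757095337), ("lnvlte", 0.21)]
kept = filter_by_confidence(predictions, min_confidence=0.5)
```

A suitable threshold depends on the model and the data, so it is worth tuning on a validation set rather than hard-coding.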
    

    More comprehensive representation of predictors

    For those who play around with the predictors' components, understanding how they are composed is valuable. To provide a cleaner interface, we improved the representation of all predictor components.

    The following snippet:

    from doctr.models import ocr_predictor
    print(ocr_predictor())
    

    now yields a much cleaner representation of the predictor composition

    OCRPredictor(
      (det_predictor): DetectionPredictor(
        (pre_processor): PreProcessor(
          (resize): Resize(output_size=(1024, 1024), method='bilinear')
          (normalize): Compose(
            (transforms): [
              LambdaTransformation(),
              Normalize(mean=[0.7979999780654907, 0.7850000262260437, 0.7720000147819519], std=[0.2639999985694885, 0.27489998936653137, 0.28700000047683716]),
            ]
          )
        )
        (model): DBNet(
          (feat_extractor): IntermediateLayerGetter()
          (fpn): FeaturePyramidNetwork(channels=128)
          (probability_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f645f58e0>
          (threshold_head): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7ce15310>
          (postprocessor): DBPostProcessor(box_thresh=0.1, max_candidates=1000)
        )
      )
      (reco_predictor): RecognitionPredictor(
        (pre_processor): PreProcessor(
          (resize): Resize(output_size=(32, 128), method='bilinear', preserve_aspect_ratio=True, symmetric_pad=False)
          (normalize): Compose(
            (transforms): [
              LambdaTransformation(),
              Normalize(mean=[0.5, 0.5, 0.5], std=[1.0, 1.0, 1.0]),
            ]
          )
        )
        (model): CRNN(
          (feat_extractor): <doctr.models.backbones.vgg.VGG object at 0x7f6f7d866040>
          (decoder): <tensorflow.python.keras.engine.sequential.Sequential object at 0x7f6f7cce2430>
          (postprocessor): CTCPostProcessor(vocab_size=118)
        )
      )
      (doc_builder): DocumentBuilder(resolve_lines=False, resolve_blocks=False, paragraph_break=0.035)
    )
    

    Breaking changes

    Metrics' granularity

    Renamed ExactMatch to TextMatch since the metric now produces different levels of flexibility for the evaluation. Additionally, the constructor flags have been deprecated since the summary will provide all different types of evaluation.

    0.2.0 | 0.2.1
    -- | --
    >>> from doctr.utils.metrics import ExactMatch | >>> from doctr.utils.metrics import TextMatch
    >>> metric = ExactMatch(ignore_case=True) | >>> metric = TextMatch()
    >>> metric.update(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"]) | >>> metric.update(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"])
    >>> print(metric.summary()) | >>> print(metric.summary())
    0.75 | {'raw': 0.5, 'caseless': 0.75, 'unidecode': 0.5, 'unicase': 0.75}

    Raw being the exact match, caseless being the exact match of lower case counterparts, unidecode being the exact match of unidecoded counterparts, and unicase being the exact match of unidecoded lower-case counterparts.
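
The four levels can be reproduced with a short standalone sketch. It is not the TextMatch implementation: `text_match_summary` is a hypothetical function, and accent stripping with the standard library's `unicodedata` is used as a rough stand-in for the unidecode package, which handles many more characters.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after NFKD decomposition (stand-in for unidecode)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(char for char in decomposed if not unicodedata.combining(char))


def text_match_summary(gts, preds):
    """Compute the four match rates described above for paired word lists."""
    if len(gts) != len(preds) or len(gts) == 0:
        raise ValueError("gts and preds must be non-empty and of equal length")
    pairs = list(zip(gts, preds))
    total = len(pairs)
    raw = sum(gt == pred for gt, pred in pairs)
    caseless = sum(gt.lower() == pred.lower() for gt, pred in pairs)
    unidecode = sum(strip_accents(gt) == strip_accents(pred) for gt, pred in pairs)
    unicase = sum(
        strip_accents(gt).lower() == strip_accents(pred).lower() for gt, pred in pairs
    )
    return {
        "raw": raw / total,
        "caseless": caseless / total,
        "unidecode": unidecode / total,
        "unicase": unicase / total,
    }


summary = text_match_summary(["i", "am", "a", "jedi"], ["I", "am", "a", "sith"])
```

On the example from the table, "i" vs "I" only matches once case is ignored and "jedi" vs "sith" never matches, which gives 2/4 for the case-sensitive levels and 3/4 for the caseless ones.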

    New features

    Models

    Deep learning model building and inference

    • Added detection features of faces (#258), bar codes (#260)
    • Added new pretrained weights for db_resnet50 (#277)
    • Added sequence probability in text recognition (#284)

    Utils

    Utility features relevant to the library use cases.

    • Added granularity on recognition metrics (#274)
    • Added visualization option to display artefacts (#273)

    Transforms

    Data transformations operations

    • Added option to switch padding between symmetric and left for resizing while preserving aspect ratio (#277)
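
The difference between the two padding modes is easiest to see on the size arithmetic alone. The sketch below assumes that "left" padding keeps the resized content in the top-left corner and pads the bottom and right, while symmetric padding centers it; `resize_with_padding` is a hypothetical helper that only computes sizes and does not touch pixels. The `symmetric_pad` name mirrors the flag shown in the Resize representation of this release.

```python
def resize_with_padding(input_size, output_size, symmetric_pad=False):
    """Return the resized (h, w) and the (top, bottom, left, right) padding."""
    in_h, in_w = input_size
    out_h, out_w = output_size
    # Largest scale that fits the input inside the output without distortion
    scale = min(out_h / in_h, out_w / in_w)
    new_h, new_w = round(in_h * scale), round(in_w * scale)
    pad_h, pad_w = out_h - new_h, out_w - new_w
    if symmetric_pad:
        top, left = pad_h // 2, pad_w // 2
    else:
        top, left = 0, 0
    return (new_h, new_w), (top, pad_h - top, left, pad_w - left)


left_mode = resize_with_padding((64, 128), (32, 128))
symmetric_mode = resize_with_padding((64, 128), (32, 128), symmetric_pad=True)
```

For a 64x128 crop resized to the 32x128 recognition input, the content becomes 32x64 and the remaining 64 columns are either all on the right or split evenly on both sides.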

    Test

    Verifications of the package well-being before release

    • Added unittests for artefact detection (#258, #260)
    • Added detailed unittests for granular metrics (#274)
    • Extended unittests for resizing (#277)

    Documentation

    Online resources for potential users

    • Added installation instructions for Mac & Windows users (#268)
    • Added benchmark of models on private datasets (#269)
    • Added changelog to the documentation (#279)
    • Added BibTeX citation in README (#279)
    • Added parameter count in performance benchmarks (#280)
    • Added OCR illustration in README (#283) and documentation (#285)

    References

    Reference training scripts

    • Added support of Weights & Biases logging for training scripts (#286)
    • Added option to start using pretrained models (#286)

    Others

    Other tools and implementations

    • Added CI job to build for MacOS & Windows (#268)

    Bug fixes

    Datasets

    • Fixed blank image handling in OCRDataset (#270)

    Documents

    • Fixed channel order for PDF render into images (#276)

    Models

    • Fixed normalization step in preprocessors (#277)

    Utils

    • Fixed OCRMetric update edge case (#267)

    Transforms

    • Fixed Resize when preserving aspect ratio (#266)
    • Fixed RandomSaturation (#277)

    Documentation

    • Fixed documentation of OCRDataset (#274)
    • Improved documentation of doctr.documents.elements (#274)

    References

    • Fixed resizing in recognition script (#266)

    Others

    • Fixed demo for multi-page examples (#276)
    • Fixed image decoding in API routes (#282)
    • Fixed preprocessing in API routes (#282)

    Improvements

    Datasets

    • Added file existence check in dataset constructors (#277)
    • Refactored dataset methods (#278)

    Models

    • Improved DBNet box computation (#272)
    • Refactored preprocessors using transforms (#277)
    • Improved repr of preprocessors and models (#277)
    • Removed ignore_case and ignore_accents from recognition postprocessors (#284)

    Documents

    • Updated performance benchmarks (#272, #277)

    Documentation

    • Updated badges in README & documentation versions (#254)
    • Updated landing page of documentation (#279, #285)
    • Updated repo folder description in CONTRIBUTING (#282)
    • Improved the README's instructions to run the API (#282)

    Tests

    • Improved unittest of resizing transforms (#266)
    • Improved unittests of OCRMetric (#267)
    • Improved unittest of PDF rendering (#276)
    • Extended unittest of OCRDataset (#278)
    • Updated unittest of DocumentBuilder and recognition models (#284)

    References

    • Updated training scripts (#284)

    Others

    • Updated requirements (#274)
    • Updated evaluation script (#277, #284)
    bitmap30.png(62.00 KB)
    demo.png(943.96 KB)
    Grace_Hopper.jpg(72.01 KB)
    master_new.zip(246.76 MB)
    tmp_checkpoint-6f0ce0e6.pt(2.08 KB)
  • v0.2.0(May 11, 2021)

    This release improves model performances and extends library features considerably (including a minimal API template, new datasets, newly trained models).

    Release handled by @fg-mindee & @charlesmindee

    Note: doctr 0.2.0 requires TensorFlow 2.4.0 or higher.

    Highlights

    New pretrained weights

    Enjoy our newly trained detection and recognition models with improved robustness and performance! Check the full benchmark in the documentation for further details.

    Improved Line & block detection

    This release comes with a large improvement of line detection. While it is only done in post-processing for now, we considered many cases to make sure you get a consistent and helpful result:

    Before | After -- | -- Before | After
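
The notes do not detail the post-processing itself, so the following is only a generic sketch of how words can be grouped into lines from their boxes: words whose vertical centers are close (relative to word height) end up on the same line, and each line is then read left to right. The tuple layout (text, xmin, ymin, xmax, ymax), the `max_center_gap` parameter and the function name are assumptions, not DocTR's DocumentBuilder logic.

```python
def group_words_into_lines(words, max_center_gap=0.5):
    """Group (text, xmin, ymin, xmax, ymax) words into reading-order lines."""
    lines = []
    # Visit words from top to bottom, based on their vertical center
    for word in sorted(words, key=lambda w: (w[2] + w[4]) / 2):
        center = (word[2] + word[4]) / 2
        height = word[4] - word[2]
        if lines:
            last = lines[-1][-1]
            last_center = (last[2] + last[4]) / 2
            if abs(center - last_center) <= max_center_gap * height:
                lines[-1].append(word)
                continue
        lines.append([word])
    # Inside each line, order words from left to right and keep only the text
    return [[w[0] for w in sorted(line, key=lambda w: w[1])] for line in lines]


words = [
    ("world", 0.5, 0.10, 0.8, 0.15),
    ("Hello", 0.1, 0.11, 0.4, 0.16),
    ("Bye", 0.1, 0.30, 0.3, 0.35),
]
lines = group_words_into_lines(words)
```

Real documents need more care (multiple columns, skewed scans, very different font sizes), which is presumably why many cases had to be considered.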

    File reading from any source

    You can now read images or PDFs from files, binary streams, or even URLs. We completely revamped our document reading pipeline with the new DocumentFile class methods:

    from doctr.documents import DocumentFile
    # PDF
    pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf").as_images()
    # Image
    single_img_doc = DocumentFile.from_images("path/to/your/img.jpg")
    # Multiple page images
    multi_img_doc = DocumentFile.from_images(["path/to/page1.jpg", "path/to/page2.jpg"])
    # Web page
    webpage_doc = DocumentFile.from_url("https://www.yoursite.com").as_images()
    

    If your PDF is a source file (web pages are converted into such PDFs) rather than a scanned version, you can also read the information it contains directly:

    from doctr.documents import DocumentFile
    pdf_doc = DocumentFile.from_pdf("path/to/your/doc.pdf")
    # Retrieve bounding box and text information
    words = pdf_doc.get_words()
    

    Reference scripts for training

    By adding multithreading dataloaders and transformations in DocTR, we can now provide you with reference training scripts to train models on your own!

    Text detection script (additional details available in README)

    python references/detection/train.py /path/to/dataset db_resnet50 -b 8 --input-size 512 --epochs 20
    

    Text recognition script (additional details available in README)

    python references/recognition/train.py /path/to/dataset crnn_vgg16_bn -b 8 --epochs 20
    

    Minimal API

    If you enjoy DocTR, you might want to integrate it in your API. For your convenience, we added a minimal API template with routes for text detection, text recognition or plain OCR!

    Run it as follows in a docker container:

    PORT=8050 docker-compose up -d --build
    

    Your API is now running locally on port 8050! Navigate to http://localhost:8050/redoc to check your API documentation.

    Or start making your first request!

    import requests
    import io
    with open('/path/to/your/image.jpeg', 'rb') as f:
        data = f.read()
    response = requests.post("http://localhost:8050/recognition", files={'file': io.BytesIO(data)})
    

    Breaking changes

    Support dropped for TF < 2.4.0

    In order to ensure that all compression features are fully functional in DocTR, support for TensorFlow < 2.4.0 has been dropped.

    Less confusing predictor's inputs

    OCRPredictor used to take a list of documents as input; it now only takes a list of pages.

    0.1.1 | 0.2.0
    -- | --
    >>> predictor = ... | >>> predictor = ...
    >>> page = np.zeros((h, w, 3), dtype=np.uint8) | >>> page = np.zeros((h, w, 3), dtype=np.uint8)
    >>> out = predictor([[page]]) | >>> out = predictor([page])
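
Code written against 0.1.1 that still holds a list of documents can be adapted by flattening it into a list of pages before calling the predictor. A minimal sketch, where `flatten_documents` is a hypothetical helper and strings stand in for the page arrays:

```python
def flatten_documents(documents):
    """Turn a list of documents (each a list of pages) into a flat list of pages."""
    return [page for document in documents for page in document]


# Strings stand in for page arrays of shape (h, w, 3)
documents = [["doc1_page1", "doc1_page2"], ["doc2_page1"]]
pages = flatten_documents(documents)
```

If results have to be mapped back to their documents afterwards, keeping the page count of each document alongside the flat list is enough to split the output again.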

    Model calls

    To gain more flexibility on the training side, the model call method was changed to yield a dictionary with multiple entries

    0.1.1 | 0.2.0
    -- | --
    >>> from doctr.models import db_resnet50, DBPostProcessor | >>> from doctr.models import db_resnet50
    >>> model = db_resnet50(pretrained=True) | >>> model = db_resnet50(pretrained=True)
    >>> postprocessor = DBPostProcessor() |
    >>> prob_map = model(input_t, training=False) | >>> out = model(input_t, training=False)
    >>> boxes = postprocessor(prob_map) | >>> boxes = out['boxes']

    New features

    Datasets

    Easy-to-use datasets for OCR

    • Added support of SROIE (#165) and CORD (#197)
    • Added recognition dataloader (#163)
    • Added sequence encoding function (#184)
    • Added DataLoader as a dataset wrapper for parallel high performance data reading (#198, #201)
    • Added support of OCRDataset (#244)

    Documents

    • Added class methods for flexible file reading (#172)
    • Added visualization method to Document and Page (#174)
    • Added support for webpage conversion to document (#221, #222)
    • Added block detection in documents (#224)

    Models

    Deep learning model building and inference

    • Added pretrained weights for crnn_resnet31 recognition model (#160)
    • Added target building (#162) & loss computation (#171) methods in DBNet
    • Added loss computation for SAR (#185)
    • Added LinkNet detection architecture (#191, #200, #202)

    Utils

    Utility features relevant to the library use cases.

    • Added reset method to metric objects (#175)

    Transforms

    Data transformations operations

    • Added Compose, Resize, Normalize & LambdaTransformation (#205)
    • Added color transformations (#206, #207, #211)

    Test

    Verifications of the package well-being before release

    • Added unittest for preprocessors and DB target building (#162) & loss computation (#171)
    • Added unittests for dataloaders (#163, #198, #201)
    • Added unittests for file reading in all input forms and document format (#172, #221, #222, #240)
    • Added unittests for element display (#174)
    • Added unittests for metric reset (#175) & IoU computation (#176)
    • Added unittests for sequence encoding (#184)
    • Added unittests for loss computation of recognition models (#185, #186)
    • Added unittests for CORD datasets (#197)
    • Added unittests for transformations (#205, #206, #207)
    • Added unittests for OCRDataset (#244)

    Documentation

    Online resources for potential users

    • Added performances of crnn_resnet31 (#160)
    • Added instructions to use Docker in README (#174)
    • Added references to implemented papers in the README (#191)
    • Added DataLoader section (#198) as well as transforms (#205, #211) in the documentation
    • Added FPS in model benchmark section of the documentation (#209)
    • Added vocab descriptions in documentation (#238)
    • Added explanations to export models as SavedModel (#246)

    References

    • Added reference training script for recognition (#164) & detection (#178)
    • Added checkpoint resuming (#210), test-only (#228), backbone freezing (#231), sample display (#249) options
    • Added data augmentations (#211)

    Others

    Other tools and implementations

    • Added localization and end-to-end visualization in demo app (#188)
    • Added minimal API implementation using FastAPI (#242, #245, #247)
    • Added CI workflow to run API unittests (#242)

    Bug fixes

    Datasets

    • Fixed inplace modifications of boxes in detection dataset (#212)
    • Fixed box dtype in detection datasets (#226)

    Models

    • Fixed edge case of DBNet target computation (#163, #217)
    • Fixed edge case of resizing with zero-sized inputs (#177)
    • Fixed CTC loss computation in CRNN (#195)
    • Fixed NaN from a rare zero division by adding eps (#228)
    • Fixed LinkNet loss computation (#250)

    Utils

    • Fixed IoU computation with zero-sized inputs (#176, #177)
    • Fixed localization metric update (#227)

    Documentation

    • Fixed usage instructions in README (#174)

    References

    • Fixed resizing (#194) and validation transforms (#214) in recognition script
    • Fixed dataset args in training scripts (#195)
    • Fixed tensor scaling in recognition training (#241)
    • Fixed validation loop (#250)

    Others

    • Fixed pypi publishing CI job (#159)
    • Fixed typo in evaluation script (#177)
    • Fixed demo app inference of detection model (#219)

    Improvements

    Datasets

    • Refactored dataloaders (#193)

    Models

    • Refactored task predictors (#169)
    • Harmonized preprocessor call method (#162)
    • Switched input type of OCRPredictor from list of docs to list of pages (#170)
    • Added a backbone submodule with all corresponding models (#187) and refactored recognition models
    • Added improved pretrained weights of DBNet (#196)
    • Moved box score computation to core detection postprocessor (#203)
    • Refactored loss computation for detection models (#208)
    • Improved line detection in DocumentBuilder (#220)
    • Made detection postprocessing more robust to image size (#230)
    • Moved post processors inside the model to have a more flexible call (#248, #250)

    Documents

    • Improved error for image reading when the file cannot be found (#229)
    • Increased default DPI for PDF to image rendering (#240)
    • Improved speed of PDF to image conversion (#251)

    Utils

    • Made page display size dynamic while preserving aspect ratio (#173)
    • Improved visualization size resolution (#174)

    Documentation

    • Added hyperlinks for license and CONTRIBUTING in the README (#169)
    • Enlarged column width in documentation (#169)
    • Added visualization script GIF in README (#173)
    • Revamped README and documentation (#182)
    • Rearranged model benchmark tables in documentation (#196, #199)
    • Improved documentation landing page (#239)

    Tests

    • Added more thorough test cases for vision datasets (#165)
    • Refactored loader unittests (#193)
    • Added unittest for edge case in metric computation (#227)

    References

    • Added preprocessing in training scripts (#180)
    • Added recognition loss computation (#189)
    • Added resize transformations (#205) in scripts
    • Added proper console metric logging (#208, #210)
    • Added dataset information console print (#228)

    Others

    • Added version index override possibility for setup (#159)
    • Enabled TF gpu growth in demo & scripts (#179)
    • Added support of images and model selection in demo app (#183)
    • Improved PDF resizing in demo app & eval script (#237)
    • Dropped support for TF < 2.4 (#243)
    db_resnet50-adcafc63.zip(89.81 MB)
    ocr.png(1.97 MB)
  • v0.1.1(Mar 18, 2021)

    This patch release fixes several bugs, introduces OCR datasets and improves model performance.

    Release handled by @fg-mindee & @charlesmindee

    Note: doctr 0.1.1 requires TensorFlow 2.3.0 or higher.

    Highlights

    Introduction of vision datasets

    Whether this is for training or evaluation purposes, DocTR provides you with objects to easily download and manipulate datasets. Access OCR datasets within a few lines of code:

    from doctr.datasets import FUNSD
    train_set = FUNSD(train=True, download=True)
    img, target = train_set[0]
    

    Model evaluation

    While DocTR 0.1.0 gave you access to pretrained models, you had no way to find the performances of these models apart from computing them yourselves. As of now, we have added a performance benchmark in our documentation for all our models and made the evaluation script available for seamless reproducibility:

    python scripts/evaluate.py ocr_db_crnn_vgg
    

    Demo app

    Since we want to make DocTR a convenience for you to build OCR-related applications and services, we made a minimal Streamlit demo app to showcase its text detection capabilities. You can run the demo with the following commands:

    streamlit run demo/app.py
    

    Here is how it renders performing text detection on a sample document: doctr_demo

    Breaking changes

    Metric update & summary

    For improved clarity, the evaluation metrics' methods were renamed.

    0.1.0 | 0.1.1
    -- | --
    >>> from doctr.utils import ExactMatch | >>> from doctr.utils import ExactMatch
    >>> metric = ExactMatch() | >>> metric = ExactMatch()
    >>> metric.update_state(['Hello', 'world'], ['hello', 'world']) | >>> metric.update(['Hello', 'world'], ['hello', 'world'])
    >>> metric.result() | >>> metric.summary()

    Renaming of high-level predictors

    As the range of backbones and combinations evolves, we have updated the names of high-level predictors:

    0.1.0 | 0.1.1
    -- | --
    >>> from doctr.models import ocr_db_crnn | >>> from doctr.models import ocr_db_crnn_vgg

    New features

    Datasets

    Easy-to-use datasets for OCR

    • Added predefined vocabs (#116)
    • Added string encoding/decoding utilities (#116)
    • Added FUNSD dataset (#136, #141)
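
String encoding and decoding against a vocab boils down to a character-to-index mapping and its inverse. The sketch below uses a toy vocabulary and hypothetical function names; the predefined vocabs and the utilities added in #116 may be shaped differently.

```python
# Hypothetical toy vocabulary; DocTR's predefined vocabs may differ
VOCAB = "abcdefghijklmnopqrstuvwxyz0123456789"


def encode_string(text, vocab):
    """Map each character to its index in vocab; unknown characters raise."""
    index_of = {char: idx for idx, char in enumerate(vocab)}
    try:
        return [index_of[char] for char in text]
    except KeyError as error:
        raise ValueError(f"character {error.args[0]!r} is not in the vocab") from None


def decode_sequence(indices, vocab):
    """Map indices back to characters and join them into a string."""
    return "".join(vocab[idx] for idx in indices)


encoded = encode_string("ocr", VOCAB)
decoded = decode_sequence(encoded, VOCAB)
```

Failing loudly on out-of-vocab characters is a design choice of this sketch; a training pipeline might instead skip such samples or map them to a dedicated unknown index.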

    Models

    Deep learning model building and inference

    • Added ResNet-31 backbone to SAR (#132) and CRNN (#148)

    Utils

    Utility features relevant to the library use cases.

    • Added localization (#117) & end-to-end OCR (#122, #141) metrics

    Test

    Verifications of the package well-being before release

    • Added unittests for evaluation metrics (#117, #122)
    • Added unittests for string encoding/decoding (#116)
    • Added unittests for datasets (#136, #141)
    • Added unittests for pretrained crnn_resnet31 (#148), and OCR predictors (#150)

    Documentation

    Online resources for potential users

    • Added pypi badge to README (#114)
    • Added pypi installation instructions to documentation (#114)
    • Added evaluation metric section (#117, #122, #158)
    • Added multi-version documentation deployment (#123)
    • Added datasets page in documentation (#136, #154)
    • Added performance benchmark on FUNSD in documentation (#143, #149, #150, #155)
    • Added instructions in README to run the demo app (#146)
    • Added sar_resnet31 to recognition models documentation (#150)

    Others

    Other tools and implementations

    • Added default label to bug report issues (#121)
    • Updated CI job for documentation build (#123)
    • Added CI job to ensure analyze.py script runs (#142)
    • Added evaluation script (#141, #145, #151)
    • Added text detection demo app (#146)

    Bug fixes

    Models

    • Fixed no-detection predictor export (#119)
    • Fixed edge case of polygon to box computation (#139)
    • Fixed DB bitmap_to_boxes method (#155)

    Utils

    • Fixed typo in ExactMatch (#120)
    • Fixed IoU computation when boxes are distant (#140)
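
A common pitfall with distant boxes is multiplying a negative overlap width by a negative overlap height, which yields a positive "intersection". Whether this is what #140 addressed is not stated in the notes; the sketch below simply shows an IoU for (xmin, ymin, xmax, ymax) boxes that clamps each overlap dimension separately, with `box_iou` as a hypothetical helper.

```python
def box_iou(box_a, box_b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes; 0.0 when the union is empty."""
    # Clamp width and height separately so disjoint boxes give no overlap
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


overlapping = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
distant = box_iou((0, 0, 1, 1), (5, 5, 6, 6))
```

The guard on the union also covers degenerate zero-sized boxes, which would otherwise trigger a division by zero.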

    Test

    Documentation

    • Fixed docstring examples of predictors (#126)
    • Fixed multi-version documentation build (#138)
    • Fixed docstrings of VisionDataset and FUNSD (#147)
    • Fixed usage instructions in README (#150)
    • Fixed installation instructions in documentation (#154)

    Others

    • Fixed pypi release CI job (#153)

    Improvements

    Models

    • Added dimension check on predictor's inputs (#126)
    • Updated pretrained DBNet URLs (#129, #150)
    • Improved DBNet post-processing (#130, #150, #155, #157)
    • Moved normalization parameters to config (#133, #150)
    • Refactored file downloading (#136)
    • Increased default batch size for recognition (#143)
    • Updated max_length and input_shape of SAR (#143)
    • Added support of absolute coordinates for crop extraction (#145)
    • Added proper kernel sizing to silence TF unresolved checkpoints warnings (#152, #156)

    Utils

    • Renamed state updating and summarizing methods of metrics (#117)
    • Updated text distance computation backend (#128)
    • Simplified repr of NestedObject when it has no children (#137)

    Documentation

    • Cleaned README prerequisites & URLs (#125)
    • Added usage example for images in README (#125)
    • Updated installation instructions in README (#154)
    • Added docstring examples to FUNSD (#154)
    • Added docstring examples to evaluation metrics (#154)

    Others

    • Updated environment collection script & bug report template (#135)
    • Enabled GPU on analyze.py script (#141)
    Source code(tar.gz)
    Source code(zip)
    cord_test.zip(204.80 MB)
    cord_train.zip(1557.47 MB)
    crnn_resnet31-69ab71db.zip(173.91 MB)
    db_resnet50-98ba765d.zip(89.62 MB)
    doctr_demo_app.png(193.86 KB)
    doctr_example_script.gif(67.45 KB)
    sroie2019_test.zip(164.36 MB)
    sroie2019_train_task1.zip(276.29 MB)
  • v0.1.0(Mar 5, 2021)

    This first release adds pretrained models for end-to-end OCR and document manipulation utilities.

    Release handled by @fg-mindee & @charlesmindee

    Note: doctr 0.1.0 requires TensorFlow 2.3.0 or newer.

    Highlights

    Easy & high-performing document reading

    Since document processing is at the core of this project, being able to read documents efficiently is a priority. This release covers PDF and image files.

    PDF reading wraps the PyMuPDF backend for fast file reading:

    from doctr.documents import read_pdf
    # from path
    doc = read_pdf("path/to/your/doc.pdf")
    # from stream
    with open("path/to/your/doc.pdf", 'rb') as f:
        doc = read_pdf(f.read())
    

    Image reading relies on the OpenCV backend:

    from doctr.documents import read_img
    page = read_img("path/to/your/img.jpg")
    

    Pretrained End-to-End OCR predictors

    Whether you need text detection, text recognition or end-to-end OCR, this release provides pretrained models and high-level predictors that handle preprocessing, model inference and post-processing for you.

    Text detection

    Currently, only DBNet-based architectures are supported; more will come in upcoming releases.

    from doctr.documents import read_pdf
    from doctr.models import db_resnet50_predictor
    model = db_resnet50_predictor(pretrained=True)
    doc = read_pdf("path/to/your/doc.pdf")
    result = model(doc)
    

    Text recognition

    Two architectures are implemented for recognition: CRNN and SAR.

    from doctr.models import crnn_vgg16_bn_predictor
    model = crnn_vgg16_bn_predictor(pretrained=True)
    

    End-to-End OCR

    By combining a detection model and a recognition model into a two-stage architecture, OCR predictors offer the easiest way to analyze your document:

    from doctr.documents import read_pdf
    from doctr.models import ocr_db_crnn
    
    model = ocr_db_crnn(pretrained=True)
    doc = read_pdf("path/to/your/doc.pdf")
    result = model([doc])
    

    New features

    Documents

    Document reading and manipulation

    • Added PDF (#8, #18, #25, #83) and image (#30, #79) reading utilities
    • Added document structured elements for export (#16, #26, #61, #102)

    Models

    Deep learning model building and inference

    • Added model export methods (#10)
    • Added preprocessing module (#20, #25, #36, #50, #55, #77)
    • Added text detection model and post-processing (#24, #32, #36, #43, #49, #51, #84): DBNet
    • Added image cropping function (#33, #44)
    • Added model param loading function (#49, #60)
    • Added text recognition post-processing (#35, #36, #37, #38, #43, #45, #49, #51, #63, #65, #74, #78, #84, #101, #107, #108, #111, #112): SAR & CRNN
    • Added task-specific predictors (#39, #52, #58, #62, #85, #98, #102)
    • Added VGG16 (#36), Resnet31 (#70) backbones

    Utils

    Utility features relevant to the library use cases.

    • Added page interactive prediction visualization (#54, #82)
    • Added custom types (#87)
    • Added abstract auto-repr object (#102)
    • Added metric module (#110)

    Test

    Checks of the package's health before release

    • Added pytest unit tests (#7, #59, #75, #76, #80, #92, #104)

    Documentation

    Online resources for potential users

    • Updated README (#9, #48, #67, #68, #95)
    • Added CONTRIBUTING (#7, #29, #48, #67)
    • Added sphinx built documentation (#12, #36, #55, #86, #90, #91, #93, #96, #99, #106)

    Others

    Other tools and implementations

    • Added python package setup (#7, #21, #67)
    • Added CI verifications (#7, #67, #69, #73)
    • Added dockerized environment with library installed (#17, #19)
    • Added issue template (#34)
    • Added environment collection script (#81)
    • Added analysis script (#85, #95, #103)
    Source code(tar.gz)
    Source code(zip)
    crnn_vgg16_bn-748c855f.zip(55.97 MB)
    crnn_vgg16_bn-f29aa0aa.zip(52.14 MB)
    db_resnet50-091c08a5.zip(89.87 MB)
    db_resnet50-4448d997.zip(89.79 MB)
    db_resnet50-df8d0071.zip(89.87 MB)
    sample.pdf(28.53 KB)
    sar_resnet31-ea202587.zip(204.02 MB)