OpenMMLab Text Detection, Recognition and Understanding Toolbox

Overview

Introduction

MMOCR is an open-source toolbox based on PyTorch and mmdetection for text detection, text recognition, and the corresponding downstream tasks including key information extraction. It is part of the OpenMMLab project.

The main branch works with PyTorch 1.6+.

Documentation: https://mmocr.readthedocs.io/en/latest/.

Major Features

  • Comprehensive Pipeline

    The toolbox supports not only text detection and text recognition, but also their downstream tasks such as key information extraction.

  • Multiple Models

    The toolbox supports a wide variety of state-of-the-art models for text detection, text recognition and key information extraction.

  • Modular Design

    The modular design of MMOCR enables users to define their own optimizers, data preprocessors, and model components such as backbones, necks, heads, and losses. Please refer to getting_started.md for how to construct a customized model.

  • Numerous Utilities

    The toolbox provides a comprehensive set of utilities that help users assess model performance. It includes visualizers for images, ground truths, and predicted bounding boxes, as well as a validation tool for evaluating checkpoints during training. It also includes data converters that show how to convert your own data into annotation files the toolbox supports.
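The modular design rests on a registry-and-config pattern. Each component registers itself under a name, and a config dict whose `type` key names a registered component selects which one gets built. The sketch below is a hypothetical, stripped-down registry written only to illustrate that pattern. It is not MMCV's `Registry` class, and `TinyBackbone` is an invented component.

```python
class Registry:
    """Hypothetical minimal registry mapping a class name to the class."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        def _register(cls):
            if cls.__name__ in self._modules:
                raise KeyError(f'{cls.__name__} is already registered in {self.name}')
            self._modules[cls.__name__] = cls
            return cls
        return _register

    def build(self, cfg):
        args = dict(cfg)                      # copy so the caller's config is untouched
        cls = self._modules[args.pop('type')]
        return cls(**args)                    # remaining keys become constructor kwargs


BACKBONES = Registry('backbone')


@BACKBONES.register_module()
class TinyBackbone:
    # Invented component used only for this example.
    def __init__(self, depth=18):
        self.depth = depth


model = BACKBONES.build(dict(type='TinyBackbone', depth=34))
print(type(model).__name__, model.depth)  # TinyBackbone 34
```

Because the remaining config keys are passed straight to the constructor, swapping a component only requires changing the config. Any key the constructor does not declare fails with a `TypeError`.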

Model Zoo

Supported algorithms:

Text Detection
Text Recognition
  • CRNN (TPAMI'2016)
  • NRTR (ICDAR'2019)
  • RobustScanner (ECCV'2020)
  • SAR (AAAI'2019)
  • SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)
  • SegOCR (Manuscript'2021)
Key Information Extraction
Named Entity Recognition

Please refer to model_zoo for more details.

License

This project is released under the Apache 2.0 license.

Citation

If you find this project useful in your research, please consider citing:

@article{mmocr2021,
    title={MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding},
    author={Kuang, Zhanghui and Sun, Hongbin and Li, Zhizhong and Yue, Xiaoyu and Lin, Tsui Hin and Chen, Jianyong and Wei, Huaqiang and Zhu, Yiqin and Gao, Tong and Zhang, Wenwei and Chen, Kai and Zhang, Wayne and Lin, Dahua},
    journal= {arXiv preprint arXiv:2108.06543},
    year={2021}
}

Changelog

v0.3.0 was released on 2021-08-25.

Installation

Please refer to install.md for installation.

Get Started

Please see getting_started.md for the basic usage of MMOCR.

Contributing

We appreciate all contributions to improve MMOCR. Please refer to CONTRIBUTING.md for the contributing guidelines.

Acknowledgement

MMOCR is an open-source project contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable feedback. We hope the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new OCR methods.

Projects in OpenMMLab

  • MMCV: OpenMMLab foundational library for computer vision.
  • MIM: MIM Installs OpenMMLab Packages.
  • MMClassification: OpenMMLab image classification toolbox and benchmark.
  • MMDetection: OpenMMLab detection toolbox and benchmark.
  • MMDetection3D: OpenMMLab's next-generation platform for general 3D object detection.
  • MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark.
  • MMAction2: OpenMMLab's next-generation action understanding toolbox and benchmark.
  • MMPose: OpenMMLab's pose estimation toolbox and benchmark.
  • MMTracking: OpenMMLab video perception toolbox and benchmark.
  • MMEditing: OpenMMLab image editing toolbox and benchmark.
  • MMOCR: A Comprehensive Toolbox for Text Detection, Recognition and Understanding.
  • MMGeneration: OpenMMLab image and video generative models toolbox.
  • MMFlow: OpenMMLab optical flow toolbox and benchmark.
Comments
  • Not getting accuracy on MMRecognition SAR Model Training

    Hi @gaotongxiao,

    I am training the SAR model for text recognition on a custom dataset of a little over 7K training images. Could you please tell me roughly how long I should expect to wait for a trained model? Training has now completed 65 epochs, and the accuracy metrics at epoch 65 are as follows:

    2022-03-21 08:06:45,433 - mmocr - INFO - Epoch(val) [65][100] 0_word_acc: 0.0000, 0_word_acc_ignore_case: 0.0000, 0_word_acc_ignore_case_symbol: 0.0000, 0_char_recall: 0.1346, 0_char_precision: 0.1089, 0_1-N.E.D: 0.0776

    As you can see, precision and recall are very low.
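    The metric names in that log line are easier to interpret with a small sketch. It assumes common definitions: word accuracy is an exact match; character precision and recall are computed from matched character blocks; and 1-N.E.D is one minus the mean edit distance, with each distance normalized by the longer string. MMOCR's own evaluation may also lowercase or strip symbols, as the `_ignore_case` variants suggest, so treat this as illustrative rather than a copy of its code.

```python
from difflib import SequenceMatcher


def true_positive_chars(pred, gt):
    """Count characters inside 'equal' blocks when aligning pred against gt."""
    matcher = SequenceMatcher(None, pred, gt)
    return sum(e2 - s2 for tag, _, _, s2, e2 in matcher.get_opcodes() if tag == 'equal')


def edit_distance(a, b):
    """Levenshtein distance computed with a single rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def recog_metrics(preds, gts):
    tp = sum(true_positive_chars(p, g) for p, g in zip(preds, gts))
    ned = sum(edit_distance(p, g) / max(len(p), len(g), 1) for p, g in zip(preds, gts))
    return {
        'word_acc': sum(p == g for p, g in zip(preds, gts)) / len(gts),
        'char_precision': tp / max(sum(len(p) for p in preds), 1),
        'char_recall': tp / max(sum(len(g) for g in gts), 1),
        '1-N.E.D': 1 - ned / len(gts),
    }


print(recog_metrics(['hello', 'world'], ['hallo', 'world']))
# {'word_acc': 0.5, 'char_precision': 0.9, 'char_recall': 0.9, '1-N.E.D': 0.9}
```

    Under these definitions, a word accuracy of 0.0000 together with character precision near 0.11 means that no prediction matched its label exactly and only a small fraction of predicted characters aligned with the ground truth.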

    Also, could you please suggest any preprocessing techniques you know of that help achieve good accuracy on text recognition tasks?

    Here is a screenshot of the training continuing: Training_66th_epoch

    awaiting response 
    opened by payal211 31
  • [Feature] Add Tesserocr Inference

    Adds an MMOCR.tesseract_det_inference() method, which can be called as follows:

    from mmocr.utils.ocr import MMOCR
    ocr = MMOCR(det='Tesseract')
    results = ocr.readtext('demo/demo_kie.jpeg', print_result=True, imshow=True)

    opened by garvan2021 30
  • TypeError: 'DataContainer' object is not subscriptable/TypeError: SDMGR: __init__() got an unexpected keyword argument 'pretrained'

    Checklist

    1. I have searched related issues but cannot get the expected help.

    Describe the bug

    I have trained the SDMGR model on 10 images and am now testing it to visualize the predictions, but I get TypeError: 'DataContainer' object is not subscriptable. Please help me resolve it.

    Reproduction

    1. What command or script did you run?

    I ran the following command:

    python /disk2/mmocr/tools/test.py /disk2/mmocr/configs/kie/sdmgr/sdmgr_unet16_60e_wildreceipt_min_DS.py /disk2/mmocr/sdmgr/latest.pth --show-dir /disk2/mmocr/mmocr_kie_output

    Use load_from_local loader
    [ ] 0/1, elapsed: 0s, ETA:
    Traceback (most recent call last):
      File "/disk2/mmocr/tools/test.py", line 243, in <module>
        main()
      File "/disk2/mmocr/tools/test.py", line 213, in main
        args.show_score_thr)
      File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/apis/test.py", line 31, in single_gpu_test
        if batch_size == 1 and isinstance(data['img'][0], torch.Tensor):
    TypeError: 'DataContainer' object is not subscriptable

    I also use a script (ocr_kie_config_test.py); when I run it, I get the error TypeError: SDMGR: __init__() got an unexpected keyword argument 'pretrained'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "ocr_kie_config_test.py", line 13, in <module>
        model = init_detector(cfg, checkpoint, device="cuda:0")
      File "/disk2/mmocr/mmocr/apis/inference.py", line 40, in init_detector
        model = build_detector(config.model, test_cfg=config.get('test_cfg'))
      File "/disk2/mmocr/mmocr/models/builder.py", line 140, in build_detector
        cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
      File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 210, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 26, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/disk2/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 54, in build_from_cfg
        raise type(e)(f'{obj_cls.__name__}: {e}')
    TypeError: SDMGR: __init__() got an unexpected keyword argument 'pretrained'

    2. Did you make any modifications to the code or config? Did you understand what you have modified? I changed a few parameters in the config file sdmgr_unet16_60e_wildreceipt.py.
    3. What dataset did you use? The WildReceipt dataset.
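    The second error above is a plain Python `TypeError`. The registry line in the traceback re-raises whatever the constructor threw, and here the constructor received a `pretrained` keyword that it does not declare. One way to find such keys is to compare a model config against the class signature with `inspect.signature`. `HypotheticalKIEModel` below is an invented stand-in whose arguments are not SDMGR's real signature, so run the check against the actual class in your installed version.

```python
import inspect


def unexpected_kwargs(cls, cfg):
    """Return config keys (besides 'type') that cls.__init__ cannot accept."""
    params = inspect.signature(cls.__init__).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []  # a **kwargs parameter accepts any keyword
    return sorted(key for key in cfg if key != 'type' and key not in params)


class HypotheticalKIEModel:
    # Invented stand-in; NOT the real SDMGR signature.
    def __init__(self, backbone, bbox_head=None, train_cfg=None, test_cfg=None):
        self.backbone = backbone


model_cfg = dict(type='HypotheticalKIEModel', backbone=dict(type='UNet'), pretrained=None)
print(unexpected_kwargs(HypotheticalKIEModel, model_cfg))  # ['pretrained']
```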

    Environment

    1. Please run python mmocr/utils/collect_env.py to collect necessary environment information and paste it here.

    sys.platform: linux
    Python: 3.7.11 (default, Jul 27 2021, 14:32:16) [GCC 7.5.0]
    CUDA available: True
    GPU 0: Tesla K80
    CUDA_HOME: /usr/local/cuda-10.1
    NVCC: Cuda compilation tools, release 10.1, V10.1.243
    GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.6.0
    PyTorch compiling details: PyTorch built with:
    • GCC 7.3
    • C++ Version: 201402
    • Intel(R) oneAPI Math Kernel Library Version 2021.3-Product Build 20210617 for Intel(R) 64 architecture applications
    • Intel(R) MKL-DNN v1.5.0 (Git Hash e2ac1fac44c5078ca927cb9b90e1b3066a0b2ed0)
    • OpenMP 201511 (a.k.a. OpenMP 4.5)
    • NNPACK is enabled
    • CPU capability usage: AVX2
    • CUDA Runtime 10.1
    • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
    • CuDNN 7.6.3
    • Magma 2.5.2
    • Build settings: BLAS=MKL, BUILD_TYPE=Release, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN_WRAPPER -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, USE_CUDA=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_STATIC_DISPATCH=OFF,

    TorchVision: 0.7.0
    OpenCV: 4.2.0
    MMCV: 1.3.8
    MMCV Compiler: GCC 7.3
    MMCV CUDA Compiler: 10.1
    MMOCR: 0.3.0+76c9570

    opened by pushpalatha1405 30
  • Training data and code for key-information-extraction

    Hi. Amazing work at https://mmocr.readthedocs.io/en/latest/demo.html#example-4-text-detection-recognition-key-information-extraction. Where can I find the code for training the key information extraction model?

    P.S. If it is not in the docs yet, I can send a PR.

    opened by INF800 25
  • [Errno 21] Is a directory: 'data/mixture/Syn90k/label.lmdb'

    When I tried to train MASTER on GPUs, it raised the error below. However, I had organized my data correctly, and the directory "label.lmdb" definitely contains two files named "data.mdb" and "lock.mdb".
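    `[Errno 21] Is a directory` is what Python's `open()` raises on Linux when it is given a directory. One plausible reading is that the loader receiving `label.lmdb` treats it as a plain annotation file rather than an LMDB environment, although the loader setting itself is not shown here. The hypothetical helper below (not an MMOCR API) only classifies a path before it reaches a loader. Its `data.mdb`/`lock.mdb` check mirrors the two files mentioned above.

```python
import os
import tempfile


def describe_ann_path(path):
    """Classify an annotation path (hypothetical helper, not an MMOCR API)."""
    if os.path.isdir(path):
        names = ('data.mdb', 'lock.mdb')
        if all(os.path.isfile(os.path.join(path, n)) for n in names):
            return 'lmdb-dir'
        return 'plain-dir'
    if os.path.isfile(path):
        return 'file'
    return 'missing'


with tempfile.TemporaryDirectory() as root:
    lmdb_dir = os.path.join(root, 'label.lmdb')
    os.makedirs(lmdb_dir)
    for name in ('data.mdb', 'lock.mdb'):
        open(os.path.join(lmdb_dir, name), 'wb').close()
    print(describe_ann_path(lmdb_dir))  # lmdb-dir
    try:
        open(lmdb_dir)  # roughly what a plain text-file loader would do
    except (IsADirectoryError, PermissionError) as err:
        # Linux raises IsADirectoryError (errno 21); Windows may raise PermissionError.
        print(type(err).__name__)
```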

    opened by MingyuLau 22
  • Getting AssertionError: UniformConcatDataset: OCRDataset: AnnFileLoader:  when build dataset

    When building the dataset, an AssertionError is raised.

    The complete traceback is below.

    AssertionError: 
    
    
    During handling of the above exception, another exception occurred:
    
    AssertionError                            Traceback (most recent call last)
    
    AssertionError: AnnFileLoader: 
    
    
    During handling of the above exception, another exception occurred:
    
    AssertionError                            Traceback (most recent call last)
    
    AssertionError: OCRDataset: AnnFileLoader: 
    
    
    During handling of the above exception, another exception occurred:
    
    AssertionError                            Traceback (most recent call last)
    
    /usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py in build_from_cfg(cfg, registry, default_args)
         67     except Exception as e:
         68         # Normal TypeError does not print class name.
    ---> 69         raise type(e)(f'{obj_cls.__name__}: {e}')
         70 
         71 
    
    AssertionError: UniformConcatDataset: OCRDataset: AnnFileLoader:
    

    In this case, I am using annotations in the jsonl format. Hence, the parser of the loader is set to LineJsonParser for both training and testing.

    loader_dt_train = dict(type='AnnFileLoader',
                           repeat=1,
                           file_format='jsonl',
                           file_storage_backend='disk',
                           parser=dict(type='LineJsonParser',
                                       keys=['filename', 'text']))

    loader_dt_test = dict(type='AnnFileLoader',
                          repeat=1,
                          file_format='jsonl',
                          file_storage_backend='disk',
                          parser=dict(type='LineJsonParser',
                                      keys=['filename', 'text']))

    train_datasets1 = dict(type='OCRDataset',
                           img_prefix=img_prefix,
                           ann_file=train_anno_file1,
                           loader=loader_dt_train,
                           pipeline=None,
                           test_mode=False)

    val_dataset = dict(type='OCRDataset',
                       img_prefix=img_prefix,
                       ann_file=train_anno_file1,
                       loader=loader_dt_test,
                       pipeline=None,
                       test_mode=True)
    

    I think the types (e.g., AnnFileLoader and OCRDataset) have been assigned properly.

    Could you tell me what the issue is?

    The full code and issue can be reproduced via this Notebook.
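    The empty message is itself a clue. The registry line in the traceback (`raise type(e)(f'{obj_cls.__name__}: {e}')`) prefixes a class name every time it re-raises. So `UniformConcatDataset: OCRDataset: AnnFileLoader:` with nothing after it means a bare `assert` failed at the innermost level, inside `AnnFileLoader`. Because each re-raise happens inside an `except` block, Python also prints the repeated "During handling of the above exception" sections. The sketch below reproduces that chaining with hypothetical stand-in classes. The `('txt', 'lmdb')` allow-list is an assumption chosen for illustration, so check which argument values the installed `AnnFileLoader` actually asserts on (for example `file_format`).

```python
def build(cls, **kwargs):
    """Build cls, re-raising any error with the class name prefixed (registry-style)."""
    try:
        return cls(**kwargs)
    except Exception as e:
        raise type(e)(f'{cls.__name__}: {e}')


class AnnFileLoader:
    # Hypothetical stand-in; the allowed values are an assumption for illustration.
    def __init__(self, file_format='txt'):
        assert file_format in ('txt', 'lmdb')  # bare assert: empty message


class OCRDataset:
    # Hypothetical stand-in that builds its loader from a nested config.
    def __init__(self, loader):
        self.loader = build(AnnFileLoader, **loader)


class UniformConcatDataset:
    # Hypothetical stand-in that builds one wrapped dataset.
    def __init__(self, dataset):
        self.dataset = build(OCRDataset, **dataset)


message = None
try:
    build(UniformConcatDataset, dataset=dict(loader=dict(file_format='jsonl')))
except AssertionError as err:
    message = str(err)
print(repr(message))  # 'UniformConcatDataset: OCRDataset: AnnFileLoader: '
```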

    bug 
    opened by balandongiv 21
  • KIE annotation tool

    Hey! I really appreciate your excellent work! I want to add some of my own examples of annotated receipts to the wildreceipt dataset to train the model with my dataset. Is there any annotation tool available? Or is there a converter from other formats?
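    Without a dedicated tool, one option is a small converter from whatever your annotator exports. The sketch below writes one JSON object per image. It assumes the layout of WildReceipt's annotation lines: `file_name`, `height`, `width`, and an `annotations` list whose entries carry a flattened 4-point `box`, a `text`, and an integer `label`. Verify those keys against the `train.txt` in your own copy before relying on it. The input record, file name, and label id are hypothetical.

```python
import json


def to_wildreceipt_line(file_name, height, width, regions):
    """Serialize one image's regions as a single JSON line (assumed WildReceipt layout).

    regions: iterable of (points, text, label), where points is four (x, y) pairs.
    """
    annotations = []
    for points, text, label in regions:
        if len(points) != 4:
            raise ValueError(f'expected 4 corner points, got {len(points)}')
        box = [coord for xy in points for coord in xy]  # flatten to x1, y1, ..., x4, y4
        annotations.append({'box': box, 'text': text, 'label': int(label)})
    record = {'file_name': file_name, 'height': height, 'width': width,
              'annotations': annotations}
    return json.dumps(record, ensure_ascii=False)


line = to_wildreceipt_line(
    'image_files/my_receipt_001.jpeg', 1200, 800,                # hypothetical file and size
    [([(10, 20), (110, 20), (110, 45), (10, 45)], 'TOTAL', 1)],  # label id is made up
)
print(line)
```

    Appending one such line per receipt to the training annotation file keeps the data in the same line-per-image form. The label ids must match the class list your KIE config uses.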

    opened by VtlNmnk 20
  • Permission denied: '/mmocr/mmocr/core/font.TTF'

    When I run the recognition demo like this: python mmocr/utils/ocr.py %INPUT_FOLDER_PATH% --det None --recog CRNN_TPS --batch-mode --single-batch-size 10 --output %OUTPUT_FOLDER_PATH%, a PermissionError occurs.

    Matplotlib created a temporary config/cache directory at /tmp/matplotlib-wz8itdit because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
    load checkpoint from local path: /gdata1/huyc/hzx/weights/sar_r31_parallel_decoder_chineseocr_20210507-b4be8214.pth
    /mmocr/mmocr/apis/utils.py:53: UserWarning: Remove "MultiRotateAugOCR" to support batch inference since samples_per_gpu > 1.
      warnings.warn(warning_msg)
    /mmocr/mmocr/apis/utils.py:53: UserWarning: Remove "MultiRotateAugOCR" to support batch inference since samples_per_gpu > 1.
      warnings.warn(warning_msg)
    Downloading https://download.openmmlab.com/mmocr/data/font.TTF ... (for security reasons, I can't download it directly from the web)
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/shutil.py", line 788, in move
        os.rename(src, real_dst)
    PermissionError: [Errno 13] Permission denied: '/tmp/tmpt7zujgkl' -> '/mmocr/mmocr/core/font.TTF'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "mmocr/utils/ocr.py", line 718, in <module>
        main()
      File "mmocr/utils/ocr.py", line 714, in main
        ocr.readtext(**vars(args))
      File "mmocr/utils/ocr.py", line 438, in readtext
        pp_result = self.single_pp(result, model)
      File "mmocr/utils/ocr.py", line 479, in single_pp
        res_img = model.show_result(arr, res, out_file=output)
      File "/mmocr/mmocr/models/textrecog/recognizer/base.py", line 218, in show_result
        img = imshow_text_label(
      File "/mmocr/mmocr/core/visualize.py", line 357, in imshow_text_label
        pred_img = draw_texts_by_pil(img, [pred_label], None)
      File "/mmocr/mmocr/core/visualize.py", line 608, in draw_texts_by_pil
        shutil.move(local_filename, font_path)
      File "/opt/conda/lib/python3.8/shutil.py", line 802, in move
        copy_function(src, real_dst)
      File "/opt/conda/lib/python3.8/shutil.py", line 432, in copy2
        copyfile(src, dst, follow_symlinks=follow_symlinks)
      File "/opt/conda/lib/python3.8/shutil.py", line 261, in copyfile
        with open(src, 'rb') as fsrc, open(dst, 'wb') as fdst:
    PermissionError: [Errno 13] Permission denied: '/mmocr/mmocr/core/font.TTF'
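    The second traceback shows the failing step. After downloading, `shutil.move` tries to place `font.TTF` inside the installed package directory (`/mmocr/mmocr/core/`), which the current user cannot write to in this environment. A generic workaround is to check writability first and fall back to a writable directory. The helper below is hypothetical and not part of MMOCR; using it would mean adapting the code that chooses `font_path`. The temp-directory fallback is likewise an assumption.

```python
import os
import tempfile


def writable_path(preferred_dir, filename, fallback_dir=None):
    """Return preferred_dir/filename if that directory is writable, else a fallback path.

    Hypothetical helper; the temp-dir fallback is an assumption, not MMOCR behavior.
    """
    if os.path.isdir(preferred_dir) and os.access(preferred_dir, os.W_OK):
        return os.path.join(preferred_dir, filename)
    fallback_dir = fallback_dir or os.path.join(tempfile.gettempdir(), 'mmocr_fonts')
    os.makedirs(fallback_dir, exist_ok=True)
    return os.path.join(fallback_dir, filename)


print(writable_path('/path/that/does/not/exist', 'font.TTF'))
```

    Note that `os.access` reflects the real user's permissions, so when running as root it reports most directories as writable regardless of their mode bits.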

    opened by formerlya 18
  • Reproduce Total-Text experiment with FCENet

    Summary: Following the MMOCR documentation, I reproduced the CTW1500 and ICDAR 2015 experiments and then created a Total-Text config based on the CTW1500 one. I downloaded the Total-Text images and annotations (.txt) from https://github.com/cs-chan/Total-Text-Dataset/tree/master/Dataset and used tools/totaltext_convert.py to convert the Total-Text annotations to the ICDAR format. I did not modify Fcenet_targets or textsnake_targets anywhere. However, running train.py fails, whether or not I use the converted annotations.

    Env: ubuntu:20.04 python:3.7.11 pytorch:1.8.0 cuda:11.5 mmcv-full:1.3.16 mmdet:2.18.0

    Error:

    ` --------------------
    2022-02-05 18:53:38,576 - mmocr - INFO - workflow: [('train', 1)], max: 1500 epochs
    2022-02-05 18:53:38,576 - mmocr - INFO - Checkpoints will be saved to /home/bill/Project/mmocr/fce_2626*2020 by HardDiskBackend.
    2022-02-05 18:53:44,399 - mmocr - INFO - Epoch [1][5/315]	lr: 1.000e-03, eta: 6 days, 6:53:38, time: 1.150, data_time: 0.543, memory: 3931, loss_text: 2.5313, loss_center: 2.2066,  loss_reg_x: 7.5278, loss_reg_y: 4.5634, loss: 34.7722
    2022-02-05 18:53:46,342 - mmocr - INFO - Epoch [1][10/315]	lr: 1.000e-03, eta: 4 days, 4:57:35, time: 0.389, data_time: 0.029, memory: 3931, loss_text: 1.7856, loss_center: 1.8217, loss_reg_x: 5.2337, loss_reg_y: 3.8988, loss: 20.8382
    2022-02-05 18:53:48,251 - mmocr - INFO - Epoch [1][15/315]	lr: 1.000e-03, eta: 3 days, 12:00:22, time: 0.382, data_time: 0.025, memory: 3931, loss_text: 1.9730, loss_center: 2.1837,loss_reg_x: 7.9185, loss_reg_y: 3.2578, loss: 21.1362
    Traceback (most recent call last):
      File "/home/bill/Project/mmocr/tools/train.py", line 221, in <module>
        main()
      File "/home/bill/Project/mmocr/tools/train.py", line 217, in main
        meta=meta)
      File "/home/bill/Project/mmocr/mmocr/apis/train.py", line 163, in train_detector
        runner.run(data_loaders, cfg.workflow)
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
        epoch_runner(data_loaders[i], **kwargs)
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
        for i, data_batch in enumerate(self.data_loader):
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 517, in __next__
        data = self._next_data()
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1199, in _next_data
        return self._process_data(data)
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1225, in _process_data
        data.reraise()
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/_utils.py", line 429, in reraise
        raise self.exc_type(msg)
    AssertionError: Caught AssertionError in DataLoader worker process 0.
    Original Traceback (most recent call last):
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
        data = fetcher.fetch(index)
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 195, in __getitem__
        data = self.prepare_train_img(idx)
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/custom.py", line 218, in prepare_train_img
        return self.pipeline(results)
      File "/home/bill/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmdet/datasets/pipelines/compose.py", line 41, in __call__
        data = t(data)
      File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/base_textdet_targets.py", line 167, in __call__
        results = self.generate_targets(results)
      File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 351, in generate_targets
        polygon_masks_ignore)
      File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 316, in generate_level_targets
        level_img_size, lv_text_polys[ind])[None]
      File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/fcenet_targets.py", line 69, in generate_center_region_mask
        _, _, top_line, bot_line = self.reorder_poly_edge(polygon_points)
      File "/home/bill/Project/mmocr/mmocr/datasets/pipelines/textdet_targets/textsnake_targets.py", line 179, in reorder_poly_edge
        assert points.shape[0] >= 4
    AssertionError
    `
    

    Config:

    `dataset_type = 'IcdarDataset'
    data_root = 'tests/data/total-text-txt'
    
    img_norm_cfg = dict(
        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
    
    train_pipeline = [
        dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
        dict(
            type='LoadTextAnnotations',
            with_bbox=True,
            with_mask=True,
            poly2mask=False),
        dict(
            type='ColorJitter',
            brightness=32.0 / 255,
            saturation=0.5,
            contrast=0.5),
        dict(type='Normalize', **img_norm_cfg),
        dict(type='RandomScaling', size=800, scale=(3. / 4, 5. / 2)),
        dict(
            type='RandomCropFlip', crop_ratio=0.5, iter_num=1, min_area_ratio=0.2),
        dict(
            type='RandomCropPolyInstances',
            instance_key='gt_masks',
            crop_ratio=0.8,
            min_side_ratio=0.3),
        dict(
            type='RandomRotatePolyInstances',
            rotate_ratio=0.5,
            max_angle=30,
            pad_with_fixed_color=False),
        dict(type='SquareResizePad', target_size=800, pad_ratio=0.6),
        dict(type='RandomFlip', flip_ratio=0.5, direction='horizontal'),
        dict(type='Pad', size_divisor=32),
        dict(
            type='FCENetTargets',
            fourier_degree=fourier_degree,
            level_proportion_range=((0, 0.25), (0.2, 0.65), (0.55, 1.0))),
            # level_proportion_range=((0, 0.4), (0.3, 0.7), (0.6, 1.0))),
        dict(
            type='CustomFormatBundle',
            keys=['p3_maps', 'p4_maps', 'p5_maps'],
            visualize=dict(flag=False, boundary_key=None)),
        dict(type='Collect', keys=['img', 'p3_maps', 'p4_maps', 'p5_maps'])
    ]
    test_pipeline = [
        dict(type='LoadImageFromFile', color_type='color_ignore_orientation'),
        dict(
            type='MultiScaleFlipAug',
            img_scale=(1080, 736),
            flip=False,
            transforms=[
                dict(type='Resize', img_scale=(2626, 2020), keep_ratio=True), #12080,800  #1920,1080 #2022,2022 cha, 2560 1600 cha
                dict(type='Normalize', **img_norm_cfg),
                dict(type='Pad', size_divisor=32),
                dict(type='ImageToTensor', keys=['img']),
                dict(type='Collect', keys=['img']),
            ])
    ]
    data = dict(
        samples_per_gpu=4,
        workers_per_gpu=2,
        val_dataloader=dict(samples_per_gpu=1),
        test_dataloader=dict(samples_per_gpu=1),
        train=dict(
            type=dataset_type,
            ann_file=data_root + '/instances_training.json',
            img_prefix=data_root + '/imgs',
            pipeline=train_pipeline),
        val=dict(
            type=dataset_type,
            ann_file=data_root + '/instances_test.json',
            img_prefix=data_root + '/imgs',
            pipeline=test_pipeline),
        test=dict(
            type=dataset_type,
            ann_file=data_root + '/instances_test.json',
            img_prefix=data_root + '/imgs',
            pipeline=test_pipeline))
    evaluation = dict(interval=1, metric='hmean-iou', save_best='auto')
    
    # optimizer
    optimizer = dict(type='SGD', lr=1e-3, momentum=0.90, weight_decay=5e-4)
    optimizer_config = dict(grad_clip=None)
    lr_config = dict(policy='poly', power=0.9, min_lr=1e-7, by_epoch=True)
    total_epochs = 1500
    
    checkpoint_config = dict(interval=150)
    # yapf:disable
    log_config = dict(
        interval=5,
        hooks=[
            dict(type='TextLoggerHook')
    
        ])
    # yapf:enable
    dist_params = dict(backend='nccl')
    log_level = 'INFO'
    load_from = None
    resume_from = None
    workflow = [('train', 1)]`
    

    Annotations (Example_poly_gt_img12.txt), original format:

    `x: [[112 149 200 210 166 134]], y: [[411 358 336 358 381 422]], ornt: [u'c'], transcriptions: [u'WOODFORD']
    x: [[212 262 316 307 257 217]], y: [[333 325 337 359 350 355]], ornt: [u'c'], transcriptions: [u'RESERVE']
    x: [[326 385 401 377 356 315]], y: [[346 391 440 442 396 365]], ornt: [u'c'], transcriptions: [u'DISTILLERY']
    x: [[199 222 245 246 230 208]], y: [[374 364 362 385 384 392]], ornt: [u'c'], transcriptions: [u'DSP']
    x: [[257 286 283 253]], y: [[363 366 388 383]], ornt: [u'm'], transcriptions: [u'KY']
    x: [[297 324 316 290]], y: [[370 384 401 391]], ornt: [u'm'], transcriptions: [u'52']
    x: [[168 251 248 167]], y: [[473 478 497 490]], ornt: [u'm'], transcriptions: [u'BOURBON']
    x: [[258 333 334 259]], y: [[479 483 503 495]], ornt: [u'm'], transcriptions: [u'WHISKEY']`
    

    or the converted annotations:

    `112,411,149,358,200,336,210,358,166,381,134,422,WOODFORD
    212,333,262,325,316,337,307,359,257,350,217,355,RESERVE
    326,346,385,391,401,440,377,442,356,396,315,365,DISTILLERY
    199,374,222,364,245,362,246,385,230,384,208,392,DSP
    257,363,286,366,283,388,253,383,KY
    297,370,324,384,316,401,290,391,52
    168,473,251,478,248,497,167,490,BOURBON
    258,479,333,483,334,503,259,495,WHISKEY`
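    The assertion that fires is `points.shape[0] >= 4` in `reorder_poly_edge`, which means a polygon reached FCENet's target generation with fewer than four vertices. A cheap first check is to confirm that every converted line above carries at least four (x, y) pairs. If the input passes, the short polygon is more likely produced later by the cropping and rotation transforms in the config, which this sketch does not cover. The sketch assumes the converted format shown above: flattened coordinates followed by a single transcription field that contains no commas. `check_line` is a hypothetical helper.

```python
def check_line(line, min_points=4):
    """Parse 'x1,y1,...,xn,yn,TEXT'; return (text, point count, usable?)."""
    *coords, text = line.strip().split(',')  # assumes TEXT contains no commas
    if len(coords) % 2 != 0:
        return text, 0, False                # odd coordinate count: malformed line
    points = [(int(x), int(y)) for x, y in zip(coords[0::2], coords[1::2])]
    return text, len(points), len(points) >= min_points


sample = [
    '257,363,286,366,283,388,253,383,KY',
    '297,370,324,384,316,401,290,391,52',
    '10,10,20,10,15,20,BROKEN',  # made-up triangle that would trip the assertion
]
for row in sample:
    print(check_line(row))
# ('KY', 4, True)
# ('52', 4, True)
# ('BROKEN', 3, False)
```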
    

    Addendum: I successfully reproduced the ctw1500 and total-text experiments. I remembered reading that Total-Text is similar to CTW1500, so I copied that config as the base and modified parts of the network to improve precision, recall, and h-mean, but then I found I could not run it. I searched the MMOCR GitHub issues for Total-Text but could not find anything, so I hope somebody can help me fix it. Sincere thanks for your help. @gaotongxiao @cuhk-hbsun

    need reproduce 
    opened by Xiangrui-Li 18
  • Problems in training MJ + ST training set using robustscanner algorithm

    At the beginning of training, data_time is small, but a few minutes later it increases gradually, which leads to low GPU utilization.
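    One way to quantify the slowdown is to extract `time` and `data_time` from the text log and track their ratio. A ratio climbing toward 1.0 means iterations are spent mostly waiting on the data pipeline rather than computing. The sketch assumes the log line format seen in the FCENet log on this page (`time: ..., data_time: ...`), and its two sample lines are shortened copies of that log.

```python
import re

# Matches "time: <float>, data_time: <float>"; \b stops "data_time" itself from matching as "time".
TIMING_RE = re.compile(r'\btime: ([\d.]+), data_time: ([\d.]+)')


def data_time_ratios(lines):
    """Return data_time / time for every log line that reports both."""
    ratios = []
    for line in lines:
        match = TIMING_RE.search(line)
        if match:
            total, data = float(match.group(1)), float(match.group(2))
            ratios.append(data / total if total > 0 else 0.0)
    return ratios


log = [
    'Epoch [1][5/315]\tlr: 1.000e-03, eta: 6 days, 6:53:38, time: 1.150, data_time: 0.543, memory: 3931',
    'Epoch [1][10/315]\tlr: 1.000e-03, eta: 4 days, 4:57:35, time: 0.389, data_time: 0.029, memory: 3931',
]
print([round(r, 3) for r in data_time_ratios(log)])  # [0.472, 0.075]
```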

    opened by GaoXinJian-USTC 18
  • seeking for totaltext .txt format ground truth support

    With Total-Text format data, we tried to follow the documentation of the Total-Text converter for the detection task: https://mmocr.readthedocs.io/en/latest/datasets.html#text-detection

    Whenever we run:

    !python /home/apsisdev/IMPORTANT/mmocr/tools/data/textdet/totaltext_converter.py /home/apsisdev/IMPORTANT/mmocr/demo/totaltext/ -o /home/apsisdev/IMPORTANT/mmocr/demo/totaltext/ --split-list training test

    we get this error:

    `Converting training into instances_training.json
    Loaded 17033 images from /home/apsisdev/IMPORTANT/mmocr/demo/totaltext/imgs/training
    [ ] 0/17033, elapsed: 0s, ETA:
    It takes 0.165757417678833s to convert totaltext annotation
    Traceback (most recent call last):
      File "/home/apsisdev/anaconda3/envs/ocr/lib/python3.8/site-packages/scipy/io/matlab/mio.py", line 39, in _open_file
        return open(file_like, mode), True
    FileNotFoundError: [Errno 2] No such file or directory: '/home/apsisdev/IMPORTANT/mmocr/demo/totaltext/annotations/training/poly_gt_img10369.mat'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/apsisdev/IMPORTANT/mmocr/tools/data/textdet/totaltext_converter.py", line 311, in <module>
        main()
      File "/home/apsisdev/IMPORTANT/mmocr/tools/data/textdet/totaltext_converter.py", line 306, in main
        image_infos = collect_annotations(files, split, nproc=args.nproc)
      File "/home/apsisdev/IMPORTANT/mmocr/tools/data/textdet/totaltext_converter.py", line 85, in collect_annotations
        images = mmcv.track_progress(load_img_info_with_split, files)
      File "/home/apsisdev/anaconda3/envs/ocr/lib/python3.8/site-packages/mmcv/utils/progressbar.py", line 92, in track_progress
        results.append(func(task, **kwargs))
      File "/home/apsisdev/IMPORTANT/mmocr/tools/data/textdet/totaltext_converter.py", line 261, in load_img_info
        img_info = load_mat_info(img_info, gt_file, split)
      File "/home/apsisdev/IMPORTANT/mmocr/tools/data/textdet/totaltext_converter.py", line 157, in load_mat_info
        contours, words = get_contours(gt_file, split)
      File "/home/apsisdev/IMPORTANT/mmocr/tools/data/textdet/totaltext_converter.py", line 108, in get_contours
        data = scio.loadmat(gt_path)
      File "/home/apsisdev/anaconda3/envs/ocr/lib/python3.8/site-packages/scipy/io/matlab/mio.py", line 224, in loadmat
        with _open_file_context(file_name, appendmat) as f:
      File "/home/apsisdev/anaconda3/envs/ocr/lib/python3.8/contextlib.py", line 113, in __enter__
        return next(self.gen)
      File "/home/apsisdev/anaconda3/envs/ocr/lib/python3.8/site-packages/scipy/io/matlab/mio.py", line 17, in _open_file_context
        f, opened = _open_file(file_like, appendmat, mode)
      File "/home/apsisdev/anaconda3/envs/ocr/lib/python3.8/site-packages/scipy/io/matlab/mio.py", line 45, in _open_file
        return open(file_like, mode), True
    FileNotFoundError: [Errno 2] No such file or directory: '/home/apsisdev/IMPORTANT/mmocr/demo/totaltext/annotations/training/poly_gt_img10369.mat'`

    Our dataset's ground truth is in .txt format rather than .mat format, so we would like either:

    1. support for .txt ground truth, or
    2. support for both .txt and .mat files.
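    Until `.txt` ground truth is supported directly, one stopgap is to convert it yourself. The sketch below parses the Total-Text `.txt` line layout quoted in the FCENet issue above (`x: [[...]], y: [[...]], ornt: [...], transcriptions: [...]`) into the flattened `x1,y1,...,TEXT` form shown there. It assumes one annotation per line, exactly as in that sample; real files may wrap long lines or quote differently. `parse_totaltext_line` and `to_flat_line` are hypothetical helpers, not part of `totaltext_converter.py`.

```python
import re

# Assumed single-line layout, e.g.
# x: [[112 149 200]], y: [[411 358 336]], ornt: [u'c'], transcriptions: [u'WOODFORD']
LINE_RE = re.compile(
    r"x: \[\[([\d\s]+)\]\], y: \[\[([\d\s]+)\]\], "
    r"ornt: \[u'([^']*)'\], transcriptions: \[u'(.*)'\]$")


def parse_totaltext_line(line):
    """Parse one Total-Text .txt annotation line into points, orientation and text."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f'unrecognized line: {line!r}')
    xs = [int(v) for v in match.group(1).split()]
    ys = [int(v) for v in match.group(2).split()]
    if len(xs) != len(ys):
        raise ValueError('x and y lists differ in length')
    return {'points': list(zip(xs, ys)), 'ornt': match.group(3), 'text': match.group(4)}


def to_flat_line(ann):
    """Interleave the points as x1,y1,...,xn,yn and append the transcription."""
    coords = [str(c) for point in ann['points'] for c in point]
    return ','.join(coords + [ann['text']])


raw = ("x: [[112 149 200 210 166 134]], y: [[411 358 336 358 381 422]], "
       "ornt: [u'c'], transcriptions: [u'WOODFORD']")
print(to_flat_line(parse_totaltext_line(raw)))
# 112,411,149,358,200,336,210,358,166,381,134,422,WOODFORD
```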
    documentation enhancement 
    opened by mobassir94 17
  • Can I return a bounding box for each character of the checked image?

    What is the feature?

    When only part of the recognition result is of interest, a bounding box for each recognized character would make it possible to crop that part out of the original image.

    Any other context?

    No response

    opened by Karity 0
  • [Paper List-2] Add 10 textrecog papers

    Thanks for your contribution, we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

    Motivation

    Please describe the motivation of this PR and the goal you want to achieve through this PR.

    Modification

    Please briefly describe what modification is made in this PR.

    BC-breaking (Optional)

    Does the modification introduce changes that break the backward-compatibility of the downstream repositories? If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

    Use cases (Optional)

    If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

    Checklist

    Before PR:

    • [ ] I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
    • [ ] Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
    • [ ] Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
    • [ ] New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
    • [ ] The documentation has been modified accordingly, including docstring or example tutorials.

    After PR:

    • [ ] If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects.
    • [ ] CLA has been signed and all committers have signed the CLA in this PR.
    opened by Mountchicken 1
  • [Paper List-1] Add 10 textrecog papers

    Thanks for your contribution, we appreciate it a lot. The following instructions will make your pull request healthier and help it get feedback more easily. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

    Motivation

    Please describe the motivation of this PR and the goal you want to achieve through this PR.

    Modification

    Please briefly describe what modification is made in this PR.

    BC-breaking (Optional)

    Does the modification introduce changes that break the backward-compatibility of the downstream repositories? If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

    Use cases (Optional)

    If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

    Checklist

    Before PR:

    • [ ] I have read and followed the workflow indicated in the CONTRIBUTING.md to create this PR.
    • [ ] Pre-commit or linting tools indicated in CONTRIBUTING.md are used to fix the potential lint issues.
    • [ ] Bug fixes are covered by unit tests, the case that causes the bug should be added in the unit tests.
    • [ ] New functionalities are covered by complete unit tests. If not, please add more unit tests to ensure correctness.
    • [ ] The documentation has been modified accordingly, including docstring or example tutorials.

    After PR:

    • [ ] If the modification has potential influence on downstream or other related projects, this PR should be tested with some of those projects.
    • [ ] CLA has been signed and all committers have signed the CLA in this PR.
    opened by Mountchicken 1
  • [Fix] Deepcopy datasample in browse_dataset

    Some data transforms modify annotations in place, which can corrupt what browse_dataset.py later visualizes. Applying deepcopy to the intermediate results fixes this bug.

    opened by gaotongxiao 0
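    The in-place problem described in the fix above is easy to reproduce in plain Python: a transform that edits its input dict also changes any other reference to that dict kept for later visualization. A minimal sketch, using a hypothetical `shift_boxes` transform (not an MMOCR API); note that a shallow `copy.copy` would not help here, because the nested box lists would still be shared.

```python
import copy


def shift_boxes(results, dx):
    """Hypothetical transform that edits box annotations in place."""
    for box in results["gt_bboxes"]:
        box[0] += dx
        box[2] += dx
    return results


# Without a copy, the "kept" sample and the transformed one are the same object.
sample = {"gt_bboxes": [[10, 10, 50, 30]]}
kept = sample
shift_boxes(sample, 5)
print(kept["gt_bboxes"])  # -> [[15, 10, 55, 30]], the kept view changed as well

# Deep-copying first keeps an untouched snapshot for visualization.
sample = {"gt_bboxes": [[10, 10, 50, 30]]}
snapshot = copy.deepcopy(sample)
shift_boxes(sample, 5)
```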
Releases (v1.0.0rc4)
  • v1.0.0rc4 (Dec 6, 2022)

    Highlights

    1. Dataset Preparer can automatically generate base dataset configs at the end of the preparation process, and supports 6 more datasets: IIIT5k, CUTE80, ICDAR2013, ICDAR2015, SVT, SVTP.
    2. Introducing the projects/ folder: implementing new models and features in OpenMMLab's algorithm libraries has long been considered troublesome because of the rigorous code-quality requirements, which can hinder fast iteration on SOTA models and may discourage community members from sharing their latest work here. We now introduce the projects/ folder, where experimental features, frameworks and models can be placed as long as they meet a minimum code-quality bar. Everyone is welcome to post implementations of their great ideas in this folder! We have also added the first example project to illustrate what we expect a good project to include (check out the raw content of its README.md for more info!).
    3. Inside the projects/ folder, we are releasing the preview version of ABCNet, which is the first implementation of text spotting models in MMOCR. It's inference-only now, but the full implementation will be available very soon.

    New Features & Enhancements

    • Add SVT to dataset preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1521
    • Polish bbox2poly by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1532
    • Add SVTP to dataset preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1523
    • Iiit5k converter by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1530
    • Add cute80 to dataset preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1522
    • Add IC13 preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1531
    • Add 'Projects/' folder, and the first example project by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1524
    • Rename to {dataset-name}_task_train/test by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1541
    • Add print_config.py to the tools by @IncludeMathH in https://github.com/open-mmlab/mmocr/pull/1547
    • Add get_md5 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1553
    • Add config generator by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1552
    • Support IC15_1811 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1556
    • Update CT80 config by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1555
    • Add config generators to all textdet and textrecog configs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1560
    • Refactor TPS by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1240
    • Add TextSpottingConfigGenerator by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1561
    • Add common typing by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1596
    • Update textrecog config and readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1597
    • Support head loss or postprocessor is None for only infer by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1594
    • Textspotting datasample by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1593
    • Simplify mono_gather by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1588
    • ABCNet v1 infer by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1598

    Docs

    • Add Chinese Guidance on How to Add New Datasets to Dataset Preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1506
    • Update the qq group link by @vansin in https://github.com/open-mmlab/mmocr/pull/1569
    • Collapse some sections; update logo url by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1571
    • Update dataset preparer (CN) by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1591

    Bug Fixes

    • Fix two bugs in dataset preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1513
    • Register bug of CLIPResNet by @jyshee in https://github.com/open-mmlab/mmocr/pull/1517
    • Being more conservative on Dataset Preparer by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1520
    • python -m pip upgrade in windows by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1525
    • Fix wildreceipt metafile by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1528
    • Fix Dataset Preparer Extract by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1527
    • Fix ICDARTxtParser by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1529
    • Fix Dataset Zoo Script by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1533
    • Fix crop without padding and recog metainfo delete unuse info by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1526
    • Automatically create nonexistent directory for base configs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1535
    • Change mmcv.dump to mmengine.dump by @ProtossDragoon in https://github.com/open-mmlab/mmocr/pull/1540
    • mmocr.utils.typing -> mmocr.utils.typing_utils by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1538
    • Wildreceipt tests by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1546
    • Fix judge exist dir by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1542
    • Fix IC13 textdet config by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1563
    • Fix IC13 textrecog annotations by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1568
    • Auto scale lr by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1584
    • Fix icdar data parse for text containing separator by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1587
    • Fix textspotting ut by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1599
    • Fix TextSpottingConfigGenerator and TextSpottingDataConverter by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1604
    • Keep E2E Inferencer output simple by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1559

    New Contributors

    • @jyshee made their first contribution in https://github.com/open-mmlab/mmocr/pull/1517
    • @ProtossDragoon made their first contribution in https://github.com/open-mmlab/mmocr/pull/1540
    • @IncludeMathH made their first contribution in https://github.com/open-mmlab/mmocr/pull/1547

    Full Changelog: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc3...v1.0.0rc4

  • v1.0.0rc3 (Nov 3, 2022)

    Highlights

    1. We release several pretrained models using oCLIP-ResNet as the backbone, which is a ResNet variant trained with oCLIP and can significantly boost the performance of text detection models.

    2. Preparing datasets is troublesome and tedious, especially in the OCR domain, where multiple datasets are usually required. To free our users from this laborious work, we designed a Dataset Preparer that gets a whole set of datasets ready for use with a single command! Dataset Preparer is also built from a series of reusable modules, each responsible for one of the standardized phases of the preparation process, which shortens the development cycle for supporting new datasets.
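    As a rough mental model of that modular design, preparation can be pictured as a chain of small phase functions that each read and extend a shared state. The sketch below is hypothetical: the phase names (`obtain`, `parse`, `pack`), the tab-separated annotation line and the dict-based state are illustrative assumptions, not Dataset Preparer's actual classes or file formats.

```python
from typing import Callable, Dict, List

Phase = Callable[[Dict], Dict]


def obtain(state: Dict) -> Dict:
    # Hypothetical: a real obtainer would download and extract the dataset archives.
    state["raw"] = ["img_1.jpg\t0,0,10,0,10,5,0,5\tHELLO"]
    return state


def parse(state: Dict) -> Dict:
    # Dataset-specific step: turn each raw annotation line into a structured record.
    records = []
    for line in state["raw"]:
        img, coords, text = line.split("\t")
        poly = [int(v) for v in coords.split(",")]
        records.append({"img": img, "poly": poly, "text": text})
    state["records"] = records
    return state


def pack(state: Dict) -> Dict:
    # Task-specific step: wrap records into a common annotation structure.
    state["annotations"] = {"data_list": state["records"]}
    return state


def run_pipeline(phases: List[Phase]) -> Dict:
    state: Dict = {}
    for phase in phases:
        state = phase(state)
    return state


result = run_pipeline([obtain, parse, pack])
```

    Under this design, supporting a new dataset mostly means swapping in a dataset-specific `parse` step while reusing the rest.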

    New Features & Enhancements

    • Add Dataset Preparer by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1484
    • support modified resnet structure used in oCLIP by @HannibalAPE in https://github.com/open-mmlab/mmocr/pull/1458
    • Add oCLIP configs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1509

    Docs

    • Update install.md by @rogachevai in https://github.com/open-mmlab/mmocr/pull/1494
    • Refine some docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1455
    • Update some dataset preparer related docs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1502
    • oclip readme by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1505

    Bug Fixes

    • Fix offline_eval error caused by new data flow by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1500

    New Contributors

    • @rogachevai made their first contribution in https://github.com/open-mmlab/mmocr/pull/1494
    • @HannibalAPE made their first contribution in https://github.com/open-mmlab/mmocr/pull/1458

    Full Changelog: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc2...v1.0.0rc3

  • v0.6.3 (Nov 3, 2022)

    Highlights

    This release enhances the inference script and fixes a bug that might cause failure on TorchServe.

    Besides, a new backbone, oCLIP-ResNet, and a dataset preparation tool, Dataset Preparer, have been released in MMOCR 1.0.0rc3 (1.x branch). Check out the changelog for more information about these features, and the maintenance plan for how we will maintain MMOCR in the future.

    New Features & Enhancements

    • Convert numpy.float32 type to python built-in float type by @JunYao1020 in https://github.com/open-mmlab/mmocr/pull/1462
    • When '.' char not in output string, output is also considered to be a… by @JunYao1020 in https://github.com/open-mmlab/mmocr/pull/1457
    • Refactor issue template by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1449
    • issue template by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1489
    • Update maintainers by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1504
    • Support MMCV < 1.8.0 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1508

    Bug Fixes

    • fix ci by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1491
    • [CI] Fix CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1463

    Docs

    • [DOCs] Add MMYOLO in Readme. by @ysh329 in https://github.com/open-mmlab/mmocr/pull/1475
    • [Docs] Update contributing.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1490

    New Contributors

    • @ysh329 made their first contribution in https://github.com/open-mmlab/mmocr/pull/1475

    Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.6.2...v0.6.3

  • v1.0.0rc2 (Oct 14, 2022)

    This release relaxes the version requirement of MMEngine to >=0.1.0, < 1.0.0.
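    A range like this can be checked in code by comparing parsed version tuples. A minimal sketch, assuming plain numeric versions; `parse_version` and `mmengine_supported` are hypothetical helpers, and pre-release suffixes such as `rc1` are simply dropped rather than ordered correctly.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.Z' into integers, ignoring non-numeric suffixes such as 'rc1'."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def mmengine_supported(version: str) -> bool:
    # The constraint stated in this release: >=0.1.0, <1.0.0
    return parse_version("0.1.0") <= parse_version(version) < parse_version("1.0.0")
```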

    Full Changelog: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc1...v1.0.0rc2

  • v0.6.2 (Oct 14, 2022)

    Highlights

    It's now possible to train and test models through a Python interface. For example, you can train a model from the mmocr/ directory as follows:

    # Build the training arguments in code instead of on the command line, then launch training.
    from mmocr.tools.train import TrainArg, parse_args, run_train_cmd
    args = TrainArg(config='/path/to/config.py')  # path to the training config
    args.add_arg('--work-dir', '/path/to/dir')    # e.g. set the working directory
    args = parse_args(args.arg_list)              # parse the collected arguments
    run_train_cmd(args)                           # start training
    

    See PR #1138 for more details.

    Besides, release candidates for MMOCR 1.0 with tons of new features are now available on the 1.x branch! Check out the changelog for more information about the features, and the maintenance plan for how we will maintain MMOCR in the future.

    New Features

    • Adding test & train API to be used directly in code by @wybryan in https://github.com/open-mmlab/mmocr/pull/1138
    • Let ResizeOCR full support mmcv.impad's pad_val parameters by @hsiehpinghan in https://github.com/open-mmlab/mmocr/pull/1437

    Bug Fixes

    • Fix ABINet config by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1256
    • Fix Recognition Score Normalization Issue by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1333
    • Remove max_seq_len inconsistency by @antoniolanza1996 in https://github.com/open-mmlab/mmocr/pull/1433
    • box points ordering by @yjmm10 in https://github.com/open-mmlab/mmocr/pull/1205
    • Correct the misspelling 'preperties' to 'properties' by @JunYao1020 in https://github.com/open-mmlab/mmocr/pull/1446

    Docs

    • Demo, experiments and live inference API on Tiyaro by @Venkat2811 in https://github.com/open-mmlab/mmocr/pull/1272
    • Update 1.x info by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1369
    • Add global notes to the docs and the version switcher menu by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1406
    • Logger Hook Config Updated to Add WandB by @Nourollah in https://github.com/open-mmlab/mmocr/pull/1345

    New Contributors

    • @Venkat2811 made their first contribution in https://github.com/open-mmlab/mmocr/pull/1272
    • @wybryan made their first contribution in https://github.com/open-mmlab/mmocr/pull/1139
    • @hsiehpinghan made their first contribution in https://github.com/open-mmlab/mmocr/pull/1437
    • @yjmm10 made their first contribution in https://github.com/open-mmlab/mmocr/pull/1205
    • @JunYao1020 made their first contribution in https://github.com/open-mmlab/mmocr/pull/1446
    • @Nourollah made their first contribution in https://github.com/open-mmlab/mmocr/pull/1345

    Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.6.1...v0.6.2

  • v1.0.0rc1 (Oct 9, 2022)

    Highlights

    This release fixes a severe bug that caused inaccurate metric reports in multi-GPU training. Together with the fix, weights for all the text recognition models in the MMOCR 1.0 architecture have been released, and their inference shorthands have been added back to ocr.py. Besides, more documentation chapters are now available.

    New Features & Enhancements

    • Simplify the Mask R-CNN config by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1391
    • auto scale lr by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1326
    • Update paths to pretrain weights by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1416
    • Streamline duplicated split_result in pan_postprocessor by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1418
    • Update model links in ocr.py and inference.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1431
    • Update rec configs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1417
    • Visualizer refine by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1411
    • Support get flops and parameters in dev-1.x by @vansin in https://github.com/open-mmlab/mmocr/pull/1414
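    The "auto scale lr" entry above refers to automatic learning-rate scaling, which is commonly based on the linear scaling rule: the learning rate is multiplied by the ratio of the actual total batch size to the batch size the config was tuned for. A minimal sketch of that arithmetic; `scale_lr` and the example numbers are hypothetical, and in MMEngine-based configs this behavior is controlled through `auto_scale_lr`, whose details may differ.

```python
def scale_lr(base_lr: float, base_batch_size: int, num_gpus: int, batch_per_gpu: int) -> float:
    """Linear scaling rule: the learning rate grows in proportion to the total batch size."""
    actual_batch_size = num_gpus * batch_per_gpu
    return base_lr * actual_batch_size / base_batch_size
```

    For example, a config tuned at lr 0.007 for a total batch of 16 but trained on 4 GPUs with 8 images each would run at `scale_lr(0.007, 16, 4, 8)`, i.e. about 0.014.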

    Docs

    • intersphinx and api by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1367
    • Fix quickrun by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1374
    • Fix some docs issues by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1385
    • Add Documents for DataElements by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1381
    • config english by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1372
    • Metrics by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1399
    • Add version switcher to menu by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1407
    • Data Transforms by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1392
    • Fix inference docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1415
    • Fix some docs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1410
    • Add maintenance plan to migration guide by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1413
    • Update Recog Models by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1402

    Bug Fixes

    • clear metric.results only done in main process by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1379
    • Fix a bug in MMDetWrapper by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1393
    • Fix browse_dataset.py by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1398
    • ImgAugWrapper: Do not clip polygons if not applicable by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1231
    • Fix CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1365
    • Fix merge stage test by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1370
    • Del CI support for torch 1.5.1 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1371
    • Test windows cu111 by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1373
    • Fix windows CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1387
    • Upgrade pre commit hooks by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1429
    • Skip invalid augmented polygons in ImgAugWrapper by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1434

    New Contributors

    • @vansin made their first contribution in https://github.com/open-mmlab/mmocr/pull/1414

    Full Changelog: https://github.com/open-mmlab/mmocr/compare/v1.0.0rc0...v1.0.0rc1

  • v1.0.0rc0 (Sep 1, 2022)

    We are excited to announce the release of MMOCR 1.0.0rc0! MMOCR 1.0.0rc0 is the first version of MMOCR 1.x, a part of the OpenMMLab 2.0 projects. Built upon the new training engine, MMOCR 1.x unifies the interfaces of dataset, models, evaluation, and visualization with faster training and testing speed.

    Highlights

    1. New engines. MMOCR 1.x is based on MMEngine, which provides a general and powerful runner that allows more flexible customizations and significantly simplifies the entrypoints of high-level interfaces.

    2. Unified interfaces. As a part of the OpenMMLab 2.0 projects, MMOCR 1.x unifies and refactors the interfaces and internal logic of training, testing, datasets, models, evaluation, and visualization. All OpenMMLab 2.0 projects share the same design in these interfaces and logic, enabling the emergence of multi-task/multi-modality algorithms.

    3. Cross-project calling. Benefiting from the unified design, you can use models implemented in other OpenMMLab projects, such as MMDet. We provide an example of how to use MMDetection's Mask R-CNN through MMDetWrapper. Check our documents for more details. More wrappers will be released in the future.

    4. Stronger visualization. We provide a series of useful tools, most of which are based on brand-new visualizers. As a result, it is now more convenient for users to explore models and datasets.

    5. More documentation and tutorials. We have added a large amount of documentation and many tutorials to help users get started more smoothly. Read it here.

    Breaking Changes

    We briefly list the major breaking changes here. We also provide a migration guide with complete details and migration instructions.

    Dependencies

    • MMOCR 1.x relies on MMEngine to run. MMEngine is a new foundational library for training deep learning models in OpenMMLab 2.0. The file IO and training dependencies have been migrated from MMCV 1.x to MMEngine.
    • MMOCR 1.x relies on MMCV>=2.0.0rc0. Although MMCV no longer maintains training functionality as of 2.0.0rc0, MMOCR 1.x relies on the data transforms, CUDA operators, and image processing interfaces in MMCV. Note that since MMCV 2.0.0rc0, the mmcv package is the version that provides pre-built CUDA operators, mmcv-lite does not provide them, and mmcv-full has been deprecated.

    Training and testing

    • MMOCR 1.x uses the Runner in MMEngine rather than the one in MMCV. The new Runner implements and unifies the building logic of datasets, models, evaluation, and visualizers. Therefore, MMOCR 1.x no longer maintains the building logic of those modules in mmocr.train.apis and tools/train.py; that code has been migrated into MMEngine. Please refer to the migration guide of Runner in MMEngine for more details.
    • The Runner in MMEngine also supports testing and validation. The testing scripts have also been simplified and build the runner with logic similar to that of the training scripts.
    • The execution points of hooks in the new Runner have been enriched to allow more flexible customization. Please refer to the migration guide of Hook in MMEngine for more details.
    • Learning rate and momentum schedules have been migrated from Hook to Parameter Scheduler in MMEngine. Please refer to the migration guide of Parameter Scheduler in MMEngine for more details.
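    With MMEngine, training and testing entrypoints both reduce to building a Runner from a config. A minimal sketch, assuming MMEngine is installed and `config_path` points to a valid MMOCR 1.x config; `Config.fromfile`, `Runner.from_cfg`, `Runner.train` and `Runner.test` are MMEngine APIs, while the `run` helper and its arguments are hypothetical.

```python
from typing import Optional


def run(config_path: str, work_dir: str, mode: str = "train",
        checkpoint: Optional[str] = None):
    """Build an MMEngine Runner from a config file, then train or test with it."""
    if mode not in ("train", "test"):
        raise ValueError(f"mode must be 'train' or 'test', got {mode!r}")

    # Imported lazily so the helper can be defined without MMEngine installed.
    from mmengine.config import Config
    from mmengine.runner import Runner

    cfg = Config.fromfile(config_path)
    cfg.work_dir = work_dir  # where logs and checkpoints are written
    if checkpoint is not None:
        cfg.load_from = checkpoint  # weights to load, typically needed for testing
    runner = Runner.from_cfg(cfg)
    return runner.train() if mode == "train" else runner.test()
```

    The same `Runner.from_cfg` call serves both modes; only the final method differs, which reflects the simplification described above.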

    Configs

    Dataset

    The Dataset classes implemented in MMOCR 1.x all inherit from the BaseDetDataset, which inherits from the BaseDataset in MMEngine. There are several changes to Dataset in MMOCR 1.x.

    • All the datasets support serializing the data list to reduce memory usage when multiple workers are built to accelerate data loading.
    • The interfaces have changed accordingly.
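    The idea behind serializing the data list is to replace many small Python objects with one contiguous bytes buffer plus an offset table, so that forked data-loader workers do not keep updating per-object reference counts and duplicating memory pages through copy-on-write. A minimal stdlib sketch of that idea; `SerializedList` is a hypothetical name, and MMEngine's `BaseDataset` implements the same idea internally with NumPy arrays.

```python
import pickle
from array import array
from itertools import accumulate
from typing import Any, List


class SerializedList:
    """Store picklable items as one bytes buffer plus an array of end offsets."""

    def __init__(self, items: List[Any]):
        chunks = [pickle.dumps(item, protocol=pickle.HIGHEST_PROTOCOL) for item in items]
        self._buffer = b"".join(chunks)
        # _ends[i] is the offset just past item i inside the buffer.
        self._ends = array("q", accumulate(len(c) for c in chunks))

    def __len__(self) -> int:
        return len(self._ends)

    def __getitem__(self, idx: int) -> Any:
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError("SerializedList index out of range")
        start = self._ends[idx - 1] if idx > 0 else 0
        # Deserialize on demand; only the requested slice is unpickled.
        return pickle.loads(self._buffer[start:self._ends[idx]])
```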

    Data Transforms

    Data transforms in MMOCR 1.x all inherit from those in MMCV>=2.0.0rc0, which follow a new convention in OpenMMLab 2.0 projects. The changes are listed below:

    • The interfaces have also changed. Please refer to the API Reference.
    • The functionalities of some data transforms (e.g., Resize) are decomposed into several transforms.
    • The same data transforms in different OpenMMLab 2.0 libraries share the same augmentation implementation and the same argument semantics; e.g., Resize in MMDet 3.x and MMOCR 1.x resizes the image in exactly the same manner given the same arguments.
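    In this convention each transform takes a `results` dict and returns it, which is what makes it easy to decompose one large transform into several small ones and chain them. The plain-Python sketch below imitates that convention; `ScaleShape`, `ScaleBoxes` and `Compose` here are hypothetical stand-ins, not MMCV's `BaseTransform` or MMOCR's actual transforms.

```python
from typing import Any, Callable, Dict, List

Results = Dict[str, Any]


class ScaleShape:
    """Hypothetical transform: compute the new image shape for a fixed scale factor."""

    def __init__(self, factor: float):
        self.factor = factor

    def __call__(self, results: Results) -> Results:
        h, w = results["img_shape"]
        results["img_shape"] = (round(h * self.factor), round(w * self.factor))
        results["scale_factor"] = self.factor
        return results


class ScaleBoxes:
    """Hypothetical transform: rescale boxes using the scale factor stored upstream."""

    def __call__(self, results: Results) -> Results:
        f = results["scale_factor"]
        results["gt_bboxes"] = [[c * f for c in box] for box in results["gt_bboxes"]]
        return results


class Compose:
    """Apply transforms in order, passing the results dict along."""

    def __init__(self, transforms: List[Callable[[Results], Results]]):
        self.transforms = transforms

    def __call__(self, results: Results) -> Results:
        for t in self.transforms:
            results = t(results)
        return results


pipeline = Compose([ScaleShape(0.5), ScaleBoxes()])
out = pipeline({"img_shape": (100, 200), "gt_bboxes": [[10, 20, 30, 40]]})
```

    Because each step only reads and writes dict keys, a step can be reused, reordered or replaced without touching the others.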

    Model

    The models in MMOCR 1.x all inherit from BaseModel in MMEngine, which defines a new convention for models in OpenMMLab 2.0 projects. Users can refer to the model tutorial in MMEngine for more details. Accordingly, there are several changes, as follows:

    • The model interfaces, including the input and output formats, are significantly simplified and unified following the new convention in MMOCR 1.x. Specifically, all the input data in training and testing are packed into inputs and data_samples, where inputs contains model inputs like a list of image tensors, and data_samples contains other information of the current data sample such as ground truths and model predictions. In this way, different tasks in MMOCR 1.x can share the same input arguments, which makes the models more general and suitable for multi-task learning.
    • The model has a data preprocessor module, which pre-processes the model's input data. In MMOCR 1.x, the data preprocessor usually performs the steps needed to collate the input images into a batch, such as padding. It can also serve as a place for some special data augmentations or more efficient data transformations like normalization.
    • The internal logic of the model has changed. In MMOCR 0.x, models used forward_train and simple_test to handle the different forward logic. In MMOCR 1.x and OpenMMLab 2.0, the forward function has three modes: loss, predict, and tensor, for training, inference, and tracing or other purposes, respectively. The forward function calls self.loss(), self.predict(), and self._forward() given the modes loss, predict, and tensor, respectively.
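    The three-mode dispatch can be sketched as follows. This is a toy, framework-free imitation of the convention (`ToyRecognizer`, its fake scores and the `gt_len`/`pred_score` keys are hypothetical, and it raises `ValueError` for an unknown mode); a real MMOCR model subclasses MMEngine's `BaseModel` and works on tensors and data samples.

```python
from typing import Any, Dict, List, Optional


class ToyRecognizer:
    def forward(self, inputs: List[Any], data_samples: Optional[List[Dict]] = None,
                mode: str = "tensor"):
        # One entry point; the behavior is selected by `mode`.
        if mode == "loss":
            return self.loss(inputs, data_samples)
        if mode == "predict":
            return self.predict(inputs, data_samples)
        if mode == "tensor":
            return self._forward(inputs, data_samples)
        raise ValueError(f"Invalid mode {mode!r}; expected 'loss', 'predict' or 'tensor'")

    def _forward(self, inputs, data_samples=None):
        # Raw network output, e.g. for tracing (here: a fake score per input).
        return [float(len(str(x))) for x in inputs]

    def loss(self, inputs, data_samples):
        # Training: compare outputs with ground truth and return a dict of losses.
        outputs = self._forward(inputs)
        targets = [float(s["gt_len"]) for s in data_samples]
        return {"loss": sum(abs(o - t) for o, t in zip(outputs, targets))}

    def predict(self, inputs, data_samples):
        # Inference: attach predictions to the data samples and return them.
        for sample, score in zip(data_samples, self._forward(inputs)):
            sample["pred_score"] = score
        return data_samples
```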

    Evaluation

    MMOCR 1.x mainly implements the corresponding metrics for each task, which are driven by an Evaluator to complete the evaluation. In addition, users can build an evaluator in MMOCR 1.x to conduct offline evaluation, i.e., evaluate predictions that may not have been produced by MMOCR, as long as they follow our dataset conventions. More details can be found in the Evaluation Tutorial in MMEngine.
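    A metric in this design accumulates per-batch results and computes the final numbers once all batches have been seen; because it only needs predictions and ground truths, it can also score predictions produced elsewhere. A minimal stdlib sketch of that two-step pattern, modeled loosely on MMEngine's `BaseMetric.process` / `compute_metrics`; `ToyWordAccuracy` and the `pred_text`/`gt_text` keys are hypothetical.

```python
from typing import Dict, List


class ToyWordAccuracy:
    def __init__(self):
        self.results: List[bool] = []

    def process(self, data_samples: List[Dict[str, str]]) -> None:
        # Called once per batch: keep only what compute_metrics needs.
        for sample in data_samples:
            self.results.append(sample["pred_text"] == sample["gt_text"])

    def compute_metrics(self, results: List[bool]) -> Dict[str, float]:
        # Called once at the end with everything gathered so far.
        return {"word_acc": sum(results) / max(len(results), 1)}

    def evaluate(self) -> Dict[str, float]:
        metrics = self.compute_metrics(self.results)
        self.results.clear()  # reset for the next evaluation round
        return metrics


metric = ToyWordAccuracy()
metric.process([{"pred_text": "hello", "gt_text": "hello"},
                {"pred_text": "w0rld", "gt_text": "world"}])
metric.process([{"pred_text": "ocr", "gt_text": "ocr"}])
scores = metric.evaluate()
```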

    Visualization

    The visualization functions of MMOCR 0.x have been removed in MMOCR 1.x. Instead, OpenMMLab 2.0 projects use a Visualizer to visualize data. MMOCR 1.x implements TextDetLocalVisualizer, TextRecogLocalVisualizer, and KIELocalVisualizer, which allow visualization of ground truths, model predictions, feature maps, etc., anywhere, for the three tasks supported in MMOCR. It also supports dumping the visualization data to external visualization backends such as TensorBoard and Wandb. Check our Visualization Document for more details.

    Improvements

    • Most models enjoy a performance improvement from the new framework and refactor of data transforms. For example, in MMOCR 1.x, DBNet-R50 achieves 0.854 hmean score on ICDAR 2015, while the counterpart can only get 0.840 hmean score in MMOCR 0.x.
    • Mixed-precision training is supported for most models. The remaining models are not supported yet because the operators they use might not be representable in fp16. We will update the documentation and list the results of mixed-precision training.
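    For reference, the hmean score mentioned above is the harmonic mean of precision and recall. A minimal sketch of the formula; the precision and recall values in the usage note are made up for illustration and are not MMOCR results.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

    Because the harmonic mean is pulled toward the smaller value, `hmean(0.9, 0.5)` is about 0.643 rather than the arithmetic mean of 0.7.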

    Ongoing changes

    1. Test-time augmentation: test-time augmentation, which was supported in MMOCR 0.x, is not yet implemented in this version due to limited time. We will support it in upcoming releases with a new, simplified design.
    2. Inference interfaces: unified inference interfaces will be supported in the future to ease the use of released models.
    3. Interfaces of useful tools that can be used in notebooks: more of the tools implemented in the tools/ directory will get Python interfaces so that they can be used in notebooks and in downstream libraries.
    4. Documentation: we will add more design docs, tutorials, and migration guidance so that the community can dive deep into our new design, participate in future development, and smoothly migrate downstream libraries to MMOCR 1.x.
  • v0.6.1 (Aug 4, 2022)

    Highlights

    1. ArT dataset is available for text detection and recognition!
    2. Fix several bugs that affect the correctness of the models.
    3. Thanks to MIM, installation is much simpler now! The docs have been updated as well.

    New Features & Enhancements

    • Add ArT by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1006
    • add ABINet_Vision api by @Abdelrahman350 in https://github.com/open-mmlab/mmocr/pull/1041
    • add codespell ignore and use mdformat by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/1022
    • Add mim to extras_require in setup.py, update mminstall… by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1062
    • Simplify normalized edit distance calculation by @maxbachmann in https://github.com/open-mmlab/mmocr/pull/1060
    • Test mim in CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1090
    • Remove redundant steps by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1091
    • Update links to SDMGR links by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1252
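    The normalized edit distance mentioned in the list above is commonly defined as the Levenshtein distance divided by the length of the longer string, giving a value in [0, 1]. A stdlib sketch of that definition; `levenshtein` and `normalized_edit_distance` are hypothetical names, and MMOCR's own implementation may instead rely on a third-party library such as rapidfuzz.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute, or match for free
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred: str, gt: str) -> float:
    """Edit distance scaled by the longer string's length (0.0 for two empty strings)."""
    if not pred and not gt:
        return 0.0
    return levenshtein(pred, gt) / max(len(pred), len(gt))
```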

    Bug Fixes

    • Remove unnecessary requirements by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1000
    • Remove confusing img_scales in pipelines by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1007
    • inplace operator "+=" will cause RuntimeError when model backward by @garvan2021 in https://github.com/open-mmlab/mmocr/pull/1018
    • Fix a typo problem in MASTER by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1031
    • Fix config name of MASTER in ocr.py by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1044
    • Relax OpenCV requirement by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1061
    • Restrict the minimum version of OpenCV to avoid potential vulnerability by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1065
    • typo by @tpoisonooo in https://github.com/open-mmlab/mmocr/pull/1024
    • Fix a typo in setup.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1095
    • fix #1067: add torchserve DockerFile and fix bugs by @Hegelim in https://github.com/open-mmlab/mmocr/pull/1073
    • Incorrect filename in labelme_converter.py by @xiefeifeihu in https://github.com/open-mmlab/mmocr/pull/1103
    • Fix dataset configs by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1106
    • Fix #1098: normalize text recognition scores by @Hegelim in https://github.com/open-mmlab/mmocr/pull/1119
    • Update ST_SA_MJ_train.py by @MingyuLau in https://github.com/open-mmlab/mmocr/pull/1117
    • PSENet metafile by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1121
    • Flexible ways of getting file name by @balandongiv in https://github.com/open-mmlab/mmocr/pull/1107
    • Updating edge-embeddings after each GNN layer by @amitbcp in https://github.com/open-mmlab/mmocr/pull/1134
    • links update by @TekayaNidham in https://github.com/open-mmlab/mmocr/pull/1141
    • bug fix: access params by cfg.get by @doem97 in https://github.com/open-mmlab/mmocr/pull/1145
    • Fix a bug in LmdbAnnFileBackend that cause breaking in Synthtext detection training by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1159
    • Fix typo of --lmdb-map-size default value by @easilylazy in https://github.com/open-mmlab/mmocr/pull/1147
    • Fixed docstring syntax error of line 19 & 21 by @APX103 in https://github.com/open-mmlab/mmocr/pull/1157
    • Update lmdb_converter and ct80 cropped image source in document by @doem97 in https://github.com/open-mmlab/mmocr/pull/1164
    • MMCV compatibility due to outdated MMDet by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1192
    • Update maximum version of mmcv by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1219
    • Update ABINet links for main by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1221
    • Update owners by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1248
    • Add back some missing fields in configs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1171

    Docs

    • Fix typos by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/1001
    • Configure Myst-parser to parse anchor tag by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1012
    • Fix a error in docs/en/tutorials/dataset_types.md by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1034
    • Update readme according to the guideline by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1047
    • Limit markdown version by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1172
    • Limit extension versions by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/1210
    • Update installation guide by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1254
    • Update image link @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/1255

    New Contributors

    • @tpoisonooo made their first contribution in https://github.com/open-mmlab/mmocr/pull/1024
    • @Abdelrahman350 made their first contribution in https://github.com/open-mmlab/mmocr/pull/1041
    • @Hegelim made their first contribution in https://github.com/open-mmlab/mmocr/pull/1073
    • @xiefeifeihu made their first contribution in https://github.com/open-mmlab/mmocr/pull/1103
    • @MingyuLau made their first contribution in https://github.com/open-mmlab/mmocr/pull/1117
    • @balandongiv made their first contribution in https://github.com/open-mmlab/mmocr/pull/1107
    • @amitbcp made their first contribution in https://github.com/open-mmlab/mmocr/pull/1134
    • @TekayaNidham made their first contribution in https://github.com/open-mmlab/mmocr/pull/1141
    • @easilylazy made their first contribution in https://github.com/open-mmlab/mmocr/pull/1147
    • @APX103 made their first contribution in https://github.com/open-mmlab/mmocr/pull/1157

    Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.6.0...v0.6.1

  • v0.6.0(May 5, 2022)

    Highlights

    1. A new recognition algorithm, MASTER, has been added to MMOCR. It was the winning solution of the "ICDAR 2021 Competition on Scientific Table Image Recognition to Latex"! The model pre-trained on SynthText and MJSynth is available for testing! Credit to @JiaquanYe
    2. DBNet++ is now available! It is equipped with a new Adaptive Scale Fusion module for feature enhancement, which helps it achieve a 2% higher h-mean score than its predecessor on the ICDAR2015 dataset.
    3. Three more dataset converters are added: LSVT, RCTW and HierText. Check the dataset zoo (Det & Recog) to explore further information.
    4. To improve data storage efficiency, MMOCR now supports loading both images and labels from .lmdb format annotations for the text recognition task. The new lmdb_converter.py packs your cropped images and labels into an lmdb file. For a detailed tutorial, please refer to the following sections and the doc.
    5. Testing models on multiple datasets is a widely used evaluation strategy. MMOCR now supports automatically reporting mean scores when there is more than one dataset to evaluate, which enables a more convenient comparison between checkpoints. Doc
    6. Evaluation is now more flexible and customizable. For text detection tasks, you can set the score threshold range within which to search for the best results. (Doc) If too many metrics are flooding your text recognition training log, you can trim it by specifying a subset of metrics in the evaluation config. Check out the Evaluation section for details.
    7. MMOCR provides a script to convert the .json labels obtained by the popular annotation toolkit Labelme to MMOCR-supported data format. @Y-M-Y contributed a log analysis tool that helps users gain a better understanding of the entire training process. Read tutorial docs to get started.
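    Highlight 6 describes searching a score-threshold range for the best detection h-mean. The plain-Python sketch below illustrates that idea under simplifying assumptions: each prediction is a hypothetical (score, matched) pair whose match against the ground truth has already been computed, and the range is given by hypothetical lo, hi and step arguments. It is not MMOCR's eval_hmean implementation.

```python
def hmean(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def best_hmean(preds, num_gt, lo=0.3, hi=0.9, step=0.1):
    """Sweep score thresholds in [lo, hi] and return (best_thr, best_hmean).

    preds: list of (score, matched) tuples, where matched is True if the
    prediction was matched to a ground-truth box (hypothetical, precomputed).
    """
    best_thr, best_h = None, -1.0
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        # Round to avoid float drift such as 0.30000000000000004.
        thr = round(lo + i * step, 10)
        kept = [matched for score, matched in preds if score >= thr]
        tp = sum(kept)
        precision = tp / len(kept) if kept else 0.0
        recall = tp / num_gt if num_gt else 0.0
        h = hmean(precision, recall)
        if h > best_h:  # keep the lowest threshold among ties
            best_thr, best_h = thr, h
    return best_thr, best_h
```

    Sweeping a range rather than fixing one threshold lets the evaluation report the operating point where precision and recall are best balanced.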

    Lmdb Dataset

    Reading images or labels from individual files can be slow when the data volume is large, e.g. millions of samples. Besides, in academia, most scene text recognition datasets are stored in lmdb format, including both images and labels. To align with this mainstream practice and improve data storage efficiency, MMOCR now officially supports loading images and labels from lmdb datasets via a new pipeline, LoadImageFromLMDB. This section is a quick walkthrough to help you master this update and apply it to your research.

    Specifications

    To better align with the academic community, MMOCR now requires the following specifications for lmdb datasets:

    • The number of samples in the dataset is stored under the key num-samples instead of total_number (deprecated).
    • Images and labels are stored with keys in the form of image-000000001 and label-000000001, respectively.
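    To make this key layout concrete, the sketch below builds the records of such a dataset as a plain Python dict rather than a real lmdb environment, an assumption made to keep it dependency-free. pack_records and read_sample are hypothetical helpers, not part of MMOCR; indices start at 1 and are zero-padded to nine digits, matching the key examples above.

```python
def pack_records(samples):
    """Build lmdb-style key/value pairs from (image_bytes, label) samples.

    Keys follow the image-000000001 / label-000000001 layout, plus a
    num-samples entry holding the sample count. All keys and values are
    bytes, since lmdb stores raw byte strings.
    """
    records = {}
    for idx, (img_bytes, label) in enumerate(samples, start=1):
        records[f'image-{idx:09d}'.encode()] = img_bytes
        records[f'label-{idx:09d}'.encode()] = label.encode('utf-8')
    records[b'num-samples'] = str(len(samples)).encode()
    return records


def read_sample(records, idx):
    """Fetch the (image_bytes, label) pair stored under 1-based index idx."""
    img = records[f'image-{idx:09d}'.encode()]
    label = records[f'label-{idx:09d}'.encode()].decode('utf-8')
    return img, label
```

    With the lmdb Python package, each pair could then be written with txn.put(key, value) inside an env.begin(write=True) transaction.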

    Usage

    1. Use an existing academic lmdb dataset if it meets the specifications, or pack your images and annotations into an lmdb dataset with the tool provided by MMOCR.
    • Previously, MMOCR had a function txt2lmdb (deprecated) that only supported converting labels to lmdb format, which is quite different from academic lmdb datasets that usually contain both images and labels. MMOCR now provides a new utility, lmdb_converter, to convert recognition datasets with both images and labels to lmdb format.

    • Say that your recognition data in MMOCR's format are organized as follows. (See an example in ocr_toy_dataset).

      # Directory structure
      
      ├──img_path
      |      |—— img1.jpg
      |      |—— img2.jpg
      |      |—— ...
      |——label.txt (or label.jsonl)
      
      # Annotation format
      
      label.txt:  img1.jpg HELLO
                  img2.jpg WORLD
                  ...
      
      label.jsonl:    {"filename": "img1.jpg", "text": "HELLO"}
                      {"filename": "img2.jpg", "text": "WORLD"}
                      ...
      
    • Then pack these files up:

      python tools/data/utils/lmdb_converter.py  {PATH_TO_LABEL} {OUTPUT_PATH} --i {PATH_TO_IMAGES}
      
    • Check out tools.md for more details.

    2. Modify the configuration files. For example, to train CRNN on the MJ and ST datasets:
    • Set parser to LineJsonParser and file_format to 'lmdb' in the dataset config:

      # configs/_base_/recog_datasets/ST_MJ_train.py
      train1 = dict(
          type='OCRDataset',
          img_prefix=train_img_prefix1,
          ann_file=train_ann_file1,
          loader=dict(
              type='AnnFileLoader',
              repeat=1,
              file_format='lmdb',
              parser=dict(
                  type='LineJsonParser',
                  keys=['filename', 'text'],
              )),
          pipeline=None,
          test_mode=False)
      
    • Use LoadImageFromLMDB in pipeline:

      # configs/_base_/recog_pipelines/crnn_pipeline.py
      train_pipeline = [
          dict(type='LoadImageFromLMDB', color_type='grayscale'),
          ...  # the rest of the pipeline
      ]
      
    3. You are good to go! Start training and MMOCR will load data from your lmdb dataset.
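    With LineJsonParser, each label is read as one JSON line and only the configured keys (here filename and text) are kept. The sketch below mimics that parsing step in plain Python; parse_json_line is a hypothetical stand-in written for illustration, not MMOCR's actual parser.

```python
import json


def parse_json_line(line, keys=('filename', 'text')):
    """Parse one JSON-line annotation and keep only the requested keys.

    Raises KeyError if a requested key is missing, so malformed
    annotations fail loudly instead of silently producing empty labels.
    """
    record = json.loads(line)
    missing = [k for k in keys if k not in record]
    if missing:
        raise KeyError(f'missing keys {missing} in: {line!r}')
    return {k: record[k] for k in keys}
```

    Note that JSON requires double quotes: a line such as {'filename': 'img1.jpg'} is rejected by json.loads with a JSONDecodeError.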

    New Features & Enhancements

    • Add analyze_logs in tools and its description in docs by @Y-M-Y in https://github.com/open-mmlab/mmocr/pull/899
    • Add LSVT Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/896
    • Add RCTW dataset converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/914
    • Support computing mean scores in UniformConcatDataset by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/981
    • Support loading images and labels from lmdb file by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/982
    • Add recog2lmdb and new toy dataset files by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/979
    • Add labelme converter for textdet and textrecog by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/972
    • Update CircleCI configs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/918
    • Update Git Action by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/930
    • More customizable fields in dataloaders by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/933
    • Skip CIs when docs are modified by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/941
    • Rename Github tests, fix ignored paths by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/946
    • Support latest MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/959
    • Support dynamic threshold range in eval_hmean by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/962
    • Update the version requirement of mmdet in docker by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/966
    • Replace opencv-python-headless with open-python by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/970
    • Update Dataset Configs by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/980
    • Add SynthText dataset config by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/983
    • Automatically report mean scores when applicable by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/995
    • Add DBNet++ by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/973
    • Add MASTER by @JiaquanYe in https://github.com/open-mmlab/mmocr/pull/807
    • Allow choosing metrics to report in text recognition tasks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/989
    • Add HierText converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/948
    • Fix lint_only in CircleCI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/998

    Bug Fixes

    • Fix CircleCi Main Branch Accidentally Run PR Stage Test by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/927
    • Fix a deprecate warning about mmdet.datasets.pipelines.formating by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/944
    • Fix a Bug in ResNet plugin by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/967
    • revert a wrong setting in db_r18 cfg by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/978
    • Fix TotalText Anno version issue by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/945
    • Update installation step of albumentations by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/984
    • Fix ImgAug transform by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/949
    • Fix GPG key error in CI and docker by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/988
    • update label.lmdb by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/991
    • correct meta key by @garvan2021 in https://github.com/open-mmlab/mmocr/pull/926
    • Use new image by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/976
    • Fix Data Converter Issues by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/955

    Docs

    • Update CONTRIBUTING.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/905
    • Fix the misleading description in test.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/908
    • Update recog.md for lmdb Generation by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/934
    • Add MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/954
    • Add wechat QR code to CN readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/960
    • Update CONTRIBUTING.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/947
    • Use QR codes from MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/971
    • Renew dataset_types.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/997

    New Contributors

    • @Y-M-Y made their first contribution in https://github.com/open-mmlab/mmocr/pull/899

    Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.5.0...v0.6.0

  • v0.5.0(Mar 31, 2022)

    Highlights

    1. MMOCR now supports SPACE recognition! (What a prominent feature!) Users only need to convert the recognition annotations that contain spaces from a plain .txt file to JSON line format .jsonl, and then revise a few configurations to enable the LineJsonParser. For more information, please read our step-by-step tutorial.
    2. Tesseract is now available in MMOCR! While MMOCR is flexible enough to support various downstream tasks, users may sometimes be dissatisfied with DL models and prefer an effective legacy solution. We therefore offer this option in mmocr.utils.ocr by wrapping Tesseract as a detector and/or recognizer. Users can easily create an MMOCR object with MMOCR(det='Tesseract', recog='Tesseract'). Credit to @garvan2021
    3. We release data converters for 16 widely used OCR datasets, including multiple scenarios such as document, handwritten, and scene text. Now it is more convenient to generate annotation files for these datasets. Check the dataset zoo ( Det & Recog ) to explore further information.
    4. Special thanks to @EighteenSprings @BeyondYourself @yangrisheng, who actively participated in documentation translation!
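    Highlight 1 relies on converting plain-text labels to JSON lines. In a .txt label such as img1.jpg NEW YORK, splitting on every space would cut the text apart, so the sketch below splits only on the first space. txt_to_jsonl is a hypothetical helper for illustration, not the converter shipped with MMOCR, and it assumes file names contain no spaces.

```python
import json


def txt_to_jsonl(lines):
    """Convert 'filename text' lines into JSON-line strings.

    Only the first space separates the filename from the text, so spaces
    inside the text are preserved. Blank lines are skipped.
    """
    out = []
    for line in lines:
        line = line.rstrip('\r\n')
        if not line.strip():
            continue
        parts = line.split(' ', 1)
        text = parts[1] if len(parts) > 1 else ''
        out.append(json.dumps({'filename': parts[0], 'text': text},
                              ensure_ascii=False))
    return out
```

    ensure_ascii=False keeps non-Latin characters readable in the output instead of escaping them.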

    Migration Guide - ResNet

    Refactoring is still in progress. For text recognition models, we have unified the ResNet-like architectures used as backbones. By introducing stage-wise and block-wise plugins, the refactored ResNet is flexible enough to support existing models, like ResNet31 and ResNet45, as well as future ResNet variants.

    Plugin

    • A plugin is a module registered in MMCV's PLUGIN_LAYERS registry. It can be inserted between the stages of a ResNet or into a BasicBlock. A simple plugin implementation can be found in mmocr/models/textrecog/plugins/common.py and is shown below.

      Plugin Example
      import torch.nn as nn
      from mmcv.cnn import PLUGIN_LAYERS
      
      
      @PLUGIN_LAYERS.register_module()
      class Maxpool2d(nn.Module):
          """A wrapper around nn.MaxPool2d().
      
          Args:
              kernel_size (int or tuple(int)): Kernel size for max pooling layer
              stride (int or tuple(int)): Stride for max pooling layer
              padding (int or tuple(int)): Padding for pooling layer
          """
      
          def __init__(self, kernel_size, stride, padding=0, **kwargs):
              super(Maxpool2d, self).__init__()
              self.model = nn.MaxPool2d(kernel_size, stride, padding)
      
          def forward(self, x):
              """
              Args:
                  x (Tensor): Input feature map
      
              Returns:
                  Tensor: The tensor after Maxpooling layer.
              """
              return self.model(x)
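    To see what such a plugin does to a feature map, one can apply the standard nn.MaxPool2d output-size formula (dilation 1, ceil_mode=False) per dimension: out = floor((in + 2 * padding - kernel) / stride) + 1. The plain-Python sketch below evaluates it; pool_out_hw is a hypothetical helper, and the 32x100 input used in the checks is an arbitrary example size.

```python
def _pair(v):
    """Normalize an int or (h, w) tuple to an (h, w) tuple."""
    return (v, v) if isinstance(v, int) else tuple(v)


def pool_out_hw(hw, kernel_size, stride=None, padding=0):
    """Output (height, width) of a 2D max pool with dilation 1, ceil_mode=False."""
    k, p = _pair(kernel_size), _pair(padding)
    # As in nn.MaxPool2d, the stride defaults to the kernel size.
    s = _pair(stride if stride is not None else kernel_size)
    return tuple((size + 2 * pp - kk) // ss + 1
                 for size, kk, ss, pp in zip(hw, k, s, p))
```

    For example, a (2, 1) kernel with stride (2, 1) halves the height while keeping the width, which preserves horizontal resolution along a text line.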
      

    Stage-wise Plugins

    • ResNet is composed of stages, and each stage is composed of blocks. E.g., ResNet18 is composed of 4 stages, and each stage is composed of BasicBlocks. For each stage, we provide two ports for inserting stage-wise plugins, configured through the plugins argument of ResNet.

      [port1: before stage] ---> [stage] ---> [port2: after stage]
      
    • For example, take a ResNet with four stages. Suppose we want to insert an additional convolution layer before each stage, and another one after stages 1, 2 and 4. Then you can define this special ResNet18 as follows:

      resnet18_special = ResNet(
          # for simplicity, some required
          # parameters are omitted
          plugins=[
              # a 3x3 ConvModule before every stage
              dict(
                  cfg=dict(
                      type='ConvModule',
                      kernel_size=3,
                      stride=1,
                      padding=1,
                      norm_cfg=dict(type='BN'),
                      act_cfg=dict(type='ReLU')),
                  stages=(True, True, True, True),
                  position='before_stage'),
              # a 3x3 ConvModule after stages 1, 2 and 4
              dict(
                  cfg=dict(
                      type='ConvModule',
                      kernel_size=3,
                      stride=1,
                      padding=1,
                      norm_cfg=dict(type='BN'),
                      act_cfg=dict(type='ReLU')),
                  stages=(True, True, False, True),
                  position='after_stage')
          ])
      
    • You can also insert more than one plugin into each port; those plugins are executed in the order they are listed. Let's take the ResNet in MASTER as an example:

      Multiple Plugins Example
      • The ResNet in MASTER is based on ResNet31. After each stage, a module named GCAModule is applied, inserted before the stage-wise convolution layer of ResNet31. As a result, there are two plugins at the after_stage port at the same time.

        resnet_master = ResNet(
                        # for simplicity, some required
                        # parameters are omitted
                        plugins=[
                            dict(
                                cfg=dict(type='Maxpool2d', kernel_size=2, stride=(2, 2)),
                                stages=(True, True, False, False),
                                position='before_stage'),
                            dict(
                                cfg=dict(type='Maxpool2d', kernel_size=(2, 1), stride=(2, 1)),
                                stages=(False, False, True, False),
                                position='before_stage'),
                            dict(
                                cfg=dict(type='GCAModule', kernel_size=3, stride=1, padding=1),
                                stages=[True, True, True, True],
                                position='after_stage'),
                            dict(
                                cfg=dict(
                                    type='ConvModule',
                                    kernel_size=3,
                                    stride=1,
                                    padding=1,
                                    norm_cfg=dict(type='BN'),
                                    act_cfg=dict(type='ReLU')),
                                stages=(True, True, True, True),
                                position='after_stage')
                        ])
        
        
    • ResNet passes two arguments, in_channels and out_channels, to each stage-wise plugin, so that plugins whose operations depend on the current number of channels can be built.
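    The selection rule described above can be sketched in plain Python: for a given stage index and port, collect the cfg of every plugin whose stages flag is set for that stage and whose position matches, preserving list order. plugins_for is a hypothetical helper written for illustration, not MMOCR's internal code; only the position names 'before_stage' and 'after_stage' come from the configs above.

```python
def plugins_for(plugins, stage_idx, position):
    """Return the cfgs of plugins to run at `position` of stage `stage_idx`.

    `stage_idx` is 0-based. Plugins are returned in the order they appear
    in `plugins`, which is the order in which they run at that port.
    """
    return [p['cfg'] for p in plugins
            if p['position'] == position and p['stages'][stage_idx]]
```

    Applied to the MASTER example, the after_stage port of the first stage yields the GCAModule cfg followed by the ConvModule cfg.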

Block-wise Plugin (Experimental)

  • We also refactored the BasicBlock used in ResNet. Now it can be customized with block-wise plugins. Check here for more details.

  • BasicBlock is composed of two convolution layers in the main branch and a shortcut branch. We provide four ports for inserting plugins.

        [port1: before_conv1] ---> [conv1] --->
        [port2: after_conv1] ---> [conv2] --->
        [port3: after_conv2] ---> +(shortcut) ---> [port4: after_shortcut]
    
  • ResNet passes in_channels to each block-wise plugin, so that plugins whose operations depend on the current number of channels can be built.

  • E.g. Build a ResNet with customized BasicBlock with an additional convolution layer before conv1:

    Block-wise Plugin Example
    resnet_31 = ResNet(
            in_channels=3,
            stem_channels=[64, 128],
            block_cfgs=dict(type='BasicBlock'),
            arch_layers=[1, 2, 5, 3],
            arch_channels=[256, 256, 512, 512],
            strides=[1, 1, 1, 1],
            plugins=[
                dict(
                    cfg=dict(type='Maxpool2d',
                    kernel_size=2,
                    stride=(2, 2)),
                    stages=(True, True, False, False),
                    position='before_stage'),
                dict(
                    cfg=dict(type='Maxpool2d',
                    kernel_size=(2, 1),
                    stride=(2, 1)),
                    stages=(False, False, True, False),
                    position='before_stage'),
                dict(
                    cfg=dict(
                    type='ConvModule',
                    kernel_size=3,
                    stride=1,
                    padding=1,
                    norm_cfg=dict(type='BN'),
                    act_cfg=dict(type='ReLU')),
                    stages=(True, True, True, True),
                    position='after_stage')
            ])
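The four block-wise ports can be illustrated with a plain-Python stand-in for the BasicBlock forward pass, where the convolutions and plugins are arbitrary callables on numbers. This is a simplified sketch, not MMOCR's implementation: it omits BN and activations, and it assumes the shortcut branch starts from the output of port 1.

```python
def run_port(x, plugins):
    """Apply the plugins registered at one port, in order."""
    for plugin in plugins:
        x = plugin(x)
    return x


def basic_block(x, conv1, conv2, ports, shortcut=lambda t: t):
    """Simplified BasicBlock: two convs plus a shortcut, with four plugin ports.

    `ports` maps a port name to a list of callables; missing ports are empty.
    """
    x = run_port(x, ports.get('before_conv1', []))
    identity = shortcut(x)  # assumption: shortcut starts after port 1
    out = conv1(x)
    out = run_port(out, ports.get('after_conv1', []))
    out = conv2(out)
    out = run_port(out, ports.get('after_conv2', []))
    out = out + identity
    return run_port(out, ports.get('after_shortcut', []))
```

With conv1 doubling and conv2 adding one, an input of 3 gives 3 * 2 + 1 + 3 = 10 when no plugins are registered.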
    

Full Examples

ResNet without plugins
  • ResNet45 is used in ASTER and ABINet without any plugins.

    resnet45_aster = ResNet(
        in_channels=3,
        stem_channels=[64, 128],
        block_cfgs=dict(type='BasicBlock', use_conv1x1=True),
        arch_layers=[3, 4, 6, 6, 3],
        arch_channels=[32, 64, 128, 256, 512],
        strides=[(2, 2), (2, 2), (2, 1), (2, 1), (2, 1)])
    
    resnet45_abi = ResNet(
        in_channels=3,
        stem_channels=32,
        block_cfgs=dict(type='BasicBlock', use_conv1x1=True),
        arch_layers=[3, 4, 6, 6, 3],
        arch_channels=[32, 64, 128, 256, 512],
        strides=[2, 1, 2, 1, 1])
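One way to sanity-check these configs is to multiply the per-stage strides into an overall downsampling factor. The sketch below does this in plain Python; total_stride is a hypothetical helper, an int stride is assumed to mean the same stride in height and width, and any downsampling in the stem is ignored.

```python
def total_stride(strides):
    """Multiply per-stage strides into an overall (height, width) factor."""
    h, w = 1, 1
    for s in strides:
        sh, sw = (s, s) if isinstance(s, int) else s
        h *= sh
        w *= sw
    return h, w
```

Under these assumptions, resnet45_aster downsamples by (32, 4), so a 32x100 input would leave a 1x25 feature map, while resnet45_abi downsamples by (4, 4).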
    
ResNet with plugins
  • ResNet31 is a typical architecture that uses stage-wise plugins: a max-pooling layer is inserted before each of the first three stages, and a convolution layer with BN and ReLU is added after each stage.

    resnet_31 = ResNet(
        in_channels=3,
        stem_channels=[64, 128],
        block_cfgs=dict(type='BasicBlock'),
        arch_layers=[1, 2, 5, 3],
        arch_channels=[256, 256, 512, 512],
        strides=[1, 1, 1, 1],
        plugins=[
            dict(
                cfg=dict(type='Maxpool2d',
                kernel_size=2,
                stride=(2, 2)),
                stages=(True, True, False, False),
                position='before_stage'),
            dict(
                cfg=dict(type='Maxpool2d',
                kernel_size=(2, 1),
                stride=(2, 1)),
                stages=(False, False, True, False),
                position='before_stage'),
            dict(
                cfg=dict(
                type='ConvModule',
                kernel_size=3,
                stride=1,
                padding=1,
                norm_cfg=dict(type='BN'),
                act_cfg=dict(type='ReLU')),
                stages=(True, True, True, True),
                position='after_stage')
        ])
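In this ResNet31 config all stage strides are 1, so the downsampling comes entirely from the Maxpool2d plugins. The sketch below, a hypothetical helper plugin_downsample that is not part of MMOCR, multiplies the stride of every Maxpool2d plugin once for each stage where it is enabled; it reads only the plugin dicts and ignores all other layers.

```python
def plugin_downsample(plugins):
    """Overall (height, width) downsampling contributed by Maxpool2d plugins."""
    h, w = 1, 1
    for p in plugins:
        cfg = p['cfg']
        if cfg.get('type') != 'Maxpool2d':
            continue  # only max-pooling plugins are counted here
        stride = cfg.get('stride')
        if stride is None:
            stride = cfg['kernel_size']  # nn.MaxPool2d default
        sh, sw = (stride, stride) if isinstance(stride, int) else stride
        n = sum(bool(flag) for flag in p['stages'])  # stages using this plugin
        h *= sh ** n
        w *= sw ** n
    return h, w
```

For the config above this gives (8, 4): two (2, 2) poolings followed by one (2, 1) pooling.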
    

Migration Guide - Dataset Annotation Loader

The annotation loaders LmdbLoader and HardDiskLoader have been unified into AnnFileLoader for a more consistent design and wider support for different file formats and storage backends. AnnFileLoader can load annotations from disk (default), http and petrel backends, and parse annotations in txt or lmdb format. LmdbLoader and HardDiskLoader are deprecated, and users are advised to update their configs to use AnnFileLoader. The following example shows how to migrate a config that uses the legacy HardDiskLoader:

# Legacy config
train = dict(
    type='OCRDataset',
    ...
    loader=dict(
        type='HardDiskLoader',
        ...))

# Suggested config
train = dict(
    type='OCRDataset',
    ...
    loader=dict(
        type='AnnFileLoader',
        file_storage_backend='disk',
        file_format='txt',
        ...))

Similarly, using AnnFileLoader with file_format='lmdb' instead of LmdbLoader is strongly recommended.
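For configs with many datasets, this migration can also be scripted. The sketch below returns an updated copy of a dataset config; migrate_loader is a hypothetical helper, not an MMOCR API, and it assumes the legacy loaders read from local disk, mapping HardDiskLoader to file_format='txt' and LmdbLoader to file_format='lmdb' as described above while keeping all other loader keys.

```python
import copy

# Assumed mapping from legacy loader types to AnnFileLoader file formats.
_LEGACY_FORMATS = {'HardDiskLoader': 'txt', 'LmdbLoader': 'lmdb'}


def migrate_loader(dataset_cfg):
    """Return a copy of a dataset config whose legacy loader uses AnnFileLoader."""
    cfg = copy.deepcopy(dataset_cfg)  # never mutate the caller's config
    loader = cfg.get('loader', {})
    fmt = _LEGACY_FORMATS.get(loader.get('type'))
    if fmt is not None:
        loader['type'] = 'AnnFileLoader'
        loader.setdefault('file_storage_backend', 'disk')
        loader['file_format'] = fmt
    return cfg
```

Configs that already use AnnFileLoader pass through unchanged.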

New Features & Enhancements

  • Update mmcv install by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/775
  • Upgrade isort by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/771
  • Automatically infer device for inference if not speicifed by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/781
  • Add open-mmlab precommit hooks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/787
  • Add windows CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/790
  • Add CurvedSyntext150k Converter by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/719
  • Add FUNSD Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/808
  • Support loading annotation file with petrel/http backend by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/793
  • Support different seeds on different ranks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/820
  • Support json in recognition converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/844
  • Add args and docs for multi-machine training/testing by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/849
  • Add warning info for LineStrParser by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/850
  • Deploy openmmlab-bot by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/876
  • Add Tesserocr Inference by @garvan2021 in https://github.com/open-mmlab/mmocr/pull/814
  • Add LV Dataset Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/871
  • Add SROIE Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/810
  • Add NAF Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/815
  • Add DeText Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/818
  • Add IMGUR Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/825
  • Add ILST Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/833
  • Add KAIST Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/835
  • Add IC11 (Born-digital Images) Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/857
  • Add IC13 (Focused Scene Text) Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/861
  • Add BID Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/862
  • Add Vintext Converter by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/864
  • Add MTWI Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/867
  • Add COCO Text v2 Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/872
  • Add ReCTS Data Converter by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/892
  • Refactor ResNets by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/809

Bug Fixes

  • Bump mmdet version to 2.20.0 in Dockerfile by @GPhilo in https://github.com/open-mmlab/mmocr/pull/763
  • Update mmdet version limit by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/773
  • Minimum version requirement of albumentations by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/769
  • Disable worker in the dataloader of gpu unit test by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/780
  • Standardize the type of torch.device in ocr.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/800
  • Use RECOGNIZER instead of DETECTORS by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/685
  • Add num_classes to configs of ABINet by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/805
  • Support loading space character from dict file by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/854
  • Description in tools/data/utils/txt2lmdb.py by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/870
  • ignore_index in SARLoss by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/869
  • Fix a bug that may cause inplace operation error by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/884
  • Use hyphen instead of underscores in script args by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/890

Docs

  • Add deprecation message for deploy tools by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/801
  • Reorganizing OpenMMLab projects in readme by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/806
  • Add demo/README_zh.md by @EighteenSprings in https://github.com/open-mmlab/mmocr/pull/802
  • Add detailed version requirement table by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/778
  • Correct misleading section title in training.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/819
  • Update README_zh-CN document URL by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/823
  • translate testing.md. by @yangrisheng in https://github.com/open-mmlab/mmocr/pull/822
  • Fix confused description for load-from and resume-from by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/842
  • Add documents getting_started in docs/zh by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/841
  • Add the model serving translation document by @BeyondYourself in https://github.com/open-mmlab/mmocr/pull/845
  • Update docs about installation on Windows by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/852
  • Update tutorial notebook by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/853
  • Update Instructions for New Data Converters by @xinke-wang in https://github.com/open-mmlab/mmocr/pull/900
  • Brief installation instruction in README by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/897
  • update doc for ILST, VinText, BID by @Mountchicken in https://github.com/open-mmlab/mmocr/pull/902
  • Fix typos in readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/903
  • Recog dataset doc by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/893
  • Reorganize the directory structure section in det.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/894

New Contributors

  • @GPhilo made their first contribution in https://github.com/open-mmlab/mmocr/pull/763
  • @xinke-wang made their first contribution in https://github.com/open-mmlab/mmocr/pull/801
  • @EighteenSprings made their first contribution in https://github.com/open-mmlab/mmocr/pull/802
  • @BeyondYourself made their first contribution in https://github.com/open-mmlab/mmocr/pull/823
  • @yangrisheng made their first contribution in https://github.com/open-mmlab/mmocr/pull/822
  • @Mountchicken made their first contribution in https://github.com/open-mmlab/mmocr/pull/844
  • @garvan2021 made their first contribution in https://github.com/open-mmlab/mmocr/pull/814

Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.4.1...v0.5.0

  • v0.4.1(Jan 27, 2022)

    Highlights

    1. Visualizing edge weights in OpenSet KIE is now supported! https://github.com/open-mmlab/mmocr/pull/677
    2. Some configurations have been optimized to significantly speed up the training and testing processes! Don't worry - you can still tune these parameters in case these modifications do not work. https://github.com/open-mmlab/mmocr/pull/757
    3. Now you can use CPU to train/debug your model! https://github.com/open-mmlab/mmocr/pull/752
    4. We have fixed a severe bug that prevented users from calling mmocr.apis.test with our pre-built wheels. https://github.com/open-mmlab/mmocr/pull/667

    New Features & Enhancements

    • Show edge score for openset kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/677
    • Download flake8 from github as pre-commit hooks by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/695
    • Deprecate the support for 'python setup.py test' by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/722
    • Disable multi-processing feature of cv2 to speed up data loading by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/721
    • Extend ctw1500 converter to support text fields by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/729
    • Extend totaltext converter to support text fields by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/728
    • Speed up training by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/739
    • Add setup multi-processing both in train and test.py by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/757
    • Support CPU training/testing by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/752
    • Support specify gpu for testing and training with gpu-id instead of gpu-ids and gpus by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/756
    • Remove unnecessary custom_import from test.py by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/758
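The data-loading speedups listed above come from configuring multi-processing before the data loader workers start. The following is a minimal sketch under stated assumptions: the helper name `setup_multi_processes_sketch` is hypothetical and this is not MMOCR's actual code. It disables OpenCV's internal thread pool (via the real `cv2.setNumThreads`) and caps OpenMP/MKL threads so that parallel worker processes do not oversubscribe the CPU.

```python
import os


def setup_multi_processes_sketch(num_workers: int) -> None:
    """Hypothetical helper: limit per-process threading before workers start."""
    try:
        import cv2  # OpenCV is treated as optional in this sketch
        # Disable OpenCV's own thread pool; parallelism comes from worker processes.
        cv2.setNumThreads(0)
    except ImportError:
        pass
    if num_workers > 1:
        # Only set defaults; keep any value the user has already exported.
        os.environ.setdefault('OMP_NUM_THREADS', '1')
        os.environ.setdefault('MKL_NUM_THREADS', '1')


setup_multi_processes_sketch(num_workers=4)
print(os.environ['OMP_NUM_THREADS'])
```

Using `setdefault` keeps the sketch from overriding thread counts that a user has tuned deliberately.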

    Bug Fixes

    • Fix satrn onnxruntime test by @AllentDan in https://github.com/open-mmlab/mmocr/pull/679
    • Support both ConcatDataset and UniformConcatDataset by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/675
    • Fix bugs of show_results in single_gpu_test by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/667
    • Fix a bug for sar decoder when bi-rnn is used by @MhLiao in https://github.com/open-mmlab/mmocr/pull/690
    • Fix opencv version to avoid some bugs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/694
    • Fix py39 ci error by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/707
    • Update visualize.py by @TommyZihao in https://github.com/open-mmlab/mmocr/pull/715
    • Fix link of config by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/726
    • Use yaml.safe_load instead of load by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/753
    • Add necessary keys to test_pipelines to enable test-time visualization by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/754

    Docs

    • Fix recog.md by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/674
    • Add config tutorial by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/683
    • Add MMSelfSup/MMRazor/MMDeploy in readme by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/692
    • Add recog & det model summary by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/693
    • Update docs link by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/710
    • add pull request template.md by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/711
    • Add website links to readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/731
    • update readme according to standard by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/742

    New Contributors

    • @MhLiao made their first contribution in https://github.com/open-mmlab/mmocr/pull/690
    • @TommyZihao made their first contribution in https://github.com/open-mmlab/mmocr/pull/715

    Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.4.0...v0.4.1

  • v0.4.0 (Dec 15, 2021)

    Highlights

    1. We release a new text recognition model - ABINet (CVPR 2021, Oral). With dedicated model design and useful data augmentation transforms, ABINet achieves the best performance on irregular text recognition tasks. Check it out!
    2. We are also working hard to fulfill requests from our community. OpenSet KIE is one such result: it extends SDMGR from text node classification to node-pair relation extraction. We also provide a demo script to convert WildReceipt to the open set domain, though it may not take full advantage of the OpenSet format. For more information, read our tutorial.
    3. Model APIs can now be exposed through TorchServe; see the TorchServe docs for details.

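To make node-pair relation extraction more concrete, here is a toy sketch. It assumes that each text node already has a predicted role ('key' or 'value') and that the model outputs a pairwise edge score matrix. The function name, the roles, and the 0.5 threshold are all illustrative assumptions; this is not SDMGR's actual post-processing.

```python
from typing import Dict, List


def link_key_value_pairs(texts: List[str], roles: List[str],
                         edge_scores: List[List[float]],
                         threshold: float = 0.5) -> Dict[str, str]:
    """Hypothetical post-processing: pair each key node with its best-scoring value node."""
    pairs = {}
    for i, role in enumerate(roles):
        if role != 'key':
            continue
        # Collect (score, index) for every value node reachable from key node i.
        candidates = [(edge_scores[i][j], j)
                      for j, r in enumerate(roles) if r == 'value']
        if not candidates:
            continue
        best_score, best_j = max(candidates)
        if best_score >= threshold:
            pairs[texts[i]] = texts[best_j]
    return pairs


texts = ['Total', '$12.50', 'Date', '2021-12-15']
roles = ['key', 'value', 'key', 'value']
scores = [[0.0, 0.9, 0.1, 0.2],
          [0.9, 0.0, 0.1, 0.1],
          [0.1, 0.1, 0.0, 0.8],
          [0.2, 0.1, 0.8, 0.0]]
print(link_key_value_pairs(texts, roles, scores))
# {'Total': '$12.50', 'Date': '2021-12-15'}
```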
    Breaking Changes & Migration Guide

    Postprocessor

    Some refactoring is still in progress. For all text detection models, we unified their decoding implementations into a new module category, POSTPROCESSOR, which is responsible for decoding different raw outputs into boundary instances. In all text detection configs, the text_repr_type argument in bbox_head is deprecated and will be removed in a future release.

    Migration guide: find a line like the following in the detection model's config:

    text_repr_type=xxx,
    

    And replace it with

    postprocessor=dict(type='{MODEL_NAME}Postprocessor', text_repr_type=xxx),
    

    Take a snippet of PANet's config as an example. Before the change, its config for bbox_head looks like:

        bbox_head=dict(
            type='PANHead',
            text_repr_type='poly',
            in_channels=[128, 128, 128, 128],
            out_channels=6,
            loss=dict(type='PANLoss')),
    

    Afterwards:

        bbox_head=dict(
            type='PANHead',
            in_channels=[128, 128, 128, 128],
            out_channels=6,
            loss=dict(type='PANLoss'),
            postprocessor=dict(type='PANPostprocessor', text_repr_type='poly')),
    

    There are other postprocessors, each taking different arguments. Interested users can find their interfaces and implementations in mmocr/models/textdet/postprocess or in our API docs.

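For users who maintain many custom configs, the migration can be scripted. Below is a minimal sketch that assumes configs are plain Python dicts and that the postprocessor type follows the '{MODEL_NAME}Postprocessor' pattern described above. Here the model name is derived by stripping the 'Head' suffix from the head type. That derivation holds for PANHead, but you should verify it for other models. `migrate_bbox_head` is a hypothetical helper and is not part of MMOCR.

```python
import copy


def migrate_bbox_head(bbox_head: dict) -> dict:
    """Hypothetical helper: move text_repr_type from bbox_head into a postprocessor dict."""
    new_head = copy.deepcopy(bbox_head)  # leave the caller's config untouched
    if 'text_repr_type' not in new_head:
        return new_head  # already migrated, nothing to do
    text_repr_type = new_head.pop('text_repr_type')
    head_type = new_head['type']
    # Assumption: 'PANHead' -> 'PAN' -> 'PANPostprocessor'.
    model_name = head_type[:-len('Head')] if head_type.endswith('Head') else head_type
    new_head['postprocessor'] = dict(
        type=f'{model_name}Postprocessor', text_repr_type=text_repr_type)
    return new_head


old_head = dict(type='PANHead', text_repr_type='poly',
                in_channels=[128, 128, 128, 128], out_channels=6,
                loss=dict(type='PANLoss'))
print(migrate_bbox_head(old_head)['postprocessor'])
# {'type': 'PANPostprocessor', 'text_repr_type': 'poly'}
```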
    New Config Structure

    We reorganized the configs/ directory by extracting reusable sections into configs/_base_. Now the directory tree of configs/_base_ is organized as follows:

    _base_
    ├── det_datasets
    ├── det_models
    ├── det_pipelines
    ├── recog_datasets
    ├── recog_models
    ├── recog_pipelines
    └── schedules
    

    Most model configs now make full use of base configs, which makes the overall structure clearer and facilitates fair comparison across models. Despite the seemingly significant hierarchical changes, backward compatibility is preserved because the names of model configs remain the same.

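Model configs reuse these base files through inheritance: a config lists its base files and then overrides only the fields it needs. The sketch below imitates, in simplified form, the recursive dict merge that MMCV's Config applies. It is an illustration written for this note, not MMCV's implementation, and the base dict shown is hypothetical. A child value replaces the base value unless both are dicts, in which case they are merged key by key. A child dict containing `_delete_=True` (a convention MMCV supports) replaces the base dict instead of being merged into it.

```python
import copy


def merge_config(base: dict, child: dict) -> dict:
    """Simplified recursive merge in which child fields override base fields."""
    merged = copy.deepcopy(base)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            value = dict(value)  # copy so popping _delete_ does not mutate the child
            if value.pop('_delete_', False):
                merged[key] = copy.deepcopy(value)  # discard the base dict entirely
            else:
                merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical base section, standing in for a file under configs/_base_/schedules.
base_schedule = dict(optimizer=dict(type='SGD', lr=0.007, momentum=0.9),
                     total_epochs=1200)
child = dict(optimizer=dict(lr=0.001), total_epochs=600)
print(merge_config(base_schedule, child))
# {'optimizer': {'type': 'SGD', 'lr': 0.001, 'momentum': 0.9}, 'total_epochs': 600}
```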
    New Features

    • Support openset kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/498
    • Add converter for the Open Images v5 text annotations by Krylov et al. by @baudm in https://github.com/open-mmlab/mmocr/pull/497
    • Support Chinese for kie show result by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/464
    • Add TorchServe support for text detection and recognition by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/522
    • Save filename in text detection test results by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/570
    • Add codespell pre-commit hook and fix typos by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/520
    • Avoid duplicate placeholder docs in CN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/582
    • Save results to json file for kie. by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/589
    • Add SAR_CN to ocr.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/579
    • mim extension for windows by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/641
    • Support multiple pipelines for different datasets by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/657
    • ABINet Framework by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/651

    Refactoring

    • Refactor textrecog config structure by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/617
    • Refactor text detection config by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/626
    • refactor transformer modules by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/618
    • refactor textdet postprocess by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/640

    Docs

    • C++ example section by @apiaccess21 in https://github.com/open-mmlab/mmocr/pull/593
    • install.md Chinese section by @A465539338 in https://github.com/open-mmlab/mmocr/pull/364
    • Add Chinese Translation of deployment.md. by @fatfishZhao in https://github.com/open-mmlab/mmocr/pull/506
    • Fix a model link and add the metafile for SATRN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/473
    • Improve docs style by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/474
    • Enhancement & sync Chinese docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/492
    • TorchServe docs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/539
    • Update docs menu by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/564
    • Docs for KIE CloseSet & OpenSet by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/573
    • Fix broken links by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/576
    • Docstring for text recognition models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/562
    • Add MMFlow & MIM by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/597
    • Add MMFewShot by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/621
    • Update model readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/604
    • Add input size check to model_inference by @mpena-vina in https://github.com/open-mmlab/mmocr/pull/633
    • Docstring for textdet models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/561
    • Add MMHuman3D in readme by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/644
    • Use shared menu from theme instead by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/655
    • Refactor docs structure by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/662
    • Docs fix by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/664

    Enhancements

    • Use bounding box around polygon instead of within polygon by @alexander-soare in https://github.com/open-mmlab/mmocr/pull/469
    • Add CITATION.cff by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/476
    • Add py3.9 CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/475
    • update model-index.yml by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/484
    • Use container in CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/502
    • CircleCI Setup by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/611
    • Remove unnecessary custom_import from train.py by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/603
    • Change the upper version of mmcv to 1.5.0 by @zhouzaida in https://github.com/open-mmlab/mmocr/pull/628
    • Update CircleCI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/631
    • Pass custom_hooks to MMCV by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/609
    • Skip CI when some specific files were changed by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/642
    • Add markdown linter in pre-commit hook by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/643
    • Use shape from loaded image by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/652
    • Cancel previous runs that are not completed by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/666
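The first enhancement above (PR 469) switches to using the bounding box drawn around a text polygon. As a small illustration, the axis-aligned box that encloses every vertex is simply the min/max of the coordinates. Both the helper name and the flat [x1, y1, x2, y2, ...] coordinate layout are assumptions made for this sketch.

```python
from typing import List, Tuple


def polygon_to_enclosing_bbox(polygon: List[float]) -> Tuple[float, float, float, float]:
    """Return (x_min, y_min, x_max, y_max) enclosing a flat [x1, y1, x2, y2, ...] polygon."""
    if len(polygon) < 6 or len(polygon) % 2:
        raise ValueError('expected at least 3 (x, y) pairs')
    xs = polygon[0::2]  # even positions hold x coordinates
    ys = polygon[1::2]  # odd positions hold y coordinates
    return min(xs), min(ys), max(xs), max(ys)


print(polygon_to_enclosing_bbox([10, 5, 40, 8, 38, 30, 12, 25]))
# (10, 5, 40, 30)
```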

    Bug Fixes

    • Modify algorithm "sar" weights path in metafile by @ShoupingShan in https://github.com/open-mmlab/mmocr/pull/581
    • Fix Cuda CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/472
    • Fix image export in test.py for KIE models by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/486
    • Allow invalid polygons in intersection and union by default by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/471
    • Update checkpoints' links for SATRN by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/518
    • Fix converting to onnx bug because of changing key from img_shape to resize_shape by @Harold-lkk in https://github.com/open-mmlab/mmocr/pull/523
    • Fix PyTorch 1.6 incompatible checkpoints by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/540
    • Fix paper field in metafiles by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/550
    • Unify recognition task names in metafiles by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/548
    • Fix py3.9 CI by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/563
    • Always map location to cpu when loading checkpoint by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/567
    • Fix wrong model builder in recog_test_imgs by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/574
    • Improve dbnet r50 by fixing img std by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/578
    • Fix resource warning: unclosed file by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/577
    • Fix bug that same start_point for different texts in draw_texts_by_pil by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/587
    • Keep original texts for kie by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/588
    • Fix random seed by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/600
    • Fix DBNet_r50 config by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/625
    • Change SBC case to DBC case by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/632
    • Fix kie demo by @innerlee in https://github.com/open-mmlab/mmocr/pull/610
    • fix type check by @cuhk-hbsun in https://github.com/open-mmlab/mmocr/pull/650
    • Remove deprecated image validator in totaltext converter by @gaotongxiao in https://github.com/open-mmlab/mmocr/pull/661
    • Fix change locals() dict by @Fei-Wang in https://github.com/open-mmlab/mmocr/pull/663
    • fix #614: textsnake targets by @HolyCrap96 in https://github.com/open-mmlab/mmocr/pull/660
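The "Change SBC case to DBC case" fix above refers to converting full-width (SBC) characters, which are common in CJK text input, into their half-width (DBC) equivalents. Here is a minimal sketch of the standard conversion; MMOCR's own implementation may differ. Full-width ASCII variants occupy U+FF01 to U+FF5E and map to U+0021 to U+007E when you subtract 0xFEE0. The ideographic space U+3000 maps to an ordinary space.

```python
def sbc_to_dbc(text: str) -> str:
    """Convert full-width (SBC) characters to their half-width (DBC) equivalents."""
    chars = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic (full-width) space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:  # full-width ASCII variants
            code -= 0xFEE0
        chars.append(chr(code))
    return ''.join(chars)


print(sbc_to_dbc('ＭＭＯＣＲ　２０２１！'))
# MMOCR 2021!
```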

    New Contributors

    • @alexander-soare made their first contribution in https://github.com/open-mmlab/mmocr/pull/469
    • @A465539338 made their first contribution in https://github.com/open-mmlab/mmocr/pull/364
    • @fatfishZhao made their first contribution in https://github.com/open-mmlab/mmocr/pull/506
    • @baudm made their first contribution in https://github.com/open-mmlab/mmocr/pull/497
    • @ShoupingShan made their first contribution in https://github.com/open-mmlab/mmocr/pull/581
    • @apiaccess21 made their first contribution in https://github.com/open-mmlab/mmocr/pull/593
    • @zhouzaida made their first contribution in https://github.com/open-mmlab/mmocr/pull/628
    • @mpena-vina made their first contribution in https://github.com/open-mmlab/mmocr/pull/633
    • @Fei-Wang made their first contribution in https://github.com/open-mmlab/mmocr/pull/663

    Full Changelog: https://github.com/open-mmlab/mmocr/compare/v0.3.0...v0.4.0

  • v0.3.0 (Aug 25, 2021)

    Highlights

    1. We have added a new text recognition model, SATRN! Its pretrained checkpoint achieves the best performance among the text recognition models we provide. A lighter version of SATRN is also released, which obtains ~98% of the original model's performance at only 45 MB in size. (@2793145003) #405
    2. We improved the demo script, ocr.py, which supports applying end-to-end text detection, text recognition and key information extraction models to images with easy-to-use commands. Users can find its full documentation in the demo section. (@samayala22, @manjrekarom) #371, #386, #400, #374, #428
    3. Our documentation has been reorganized into a clearer structure. More useful content is on the way! #409, #454
    4. The dependency on Polygon3 has been removed since that project is no longer maintained or distributed. All of its usages have been replaced with equivalents from shapely. #448

    Breaking Changes & Migration Guide

    1. Upgrade version requirement of MMDetection to 2.14.0 to avoid bugs #382
    2. MMOCR now has its own model and layer registries inherited from MMDetection's or MMCV's counterparts (#436). The model registries are now organized hierarchically as follows:
    mmcv.MODELS -> mmdet.BACKBONES -> BACKBONES
    mmcv.MODELS -> mmdet.NECKS -> NECKS
    mmcv.MODELS -> mmdet.ROI_EXTRACTORS -> ROI_EXTRACTORS
    mmcv.MODELS -> mmdet.HEADS -> HEADS
    mmcv.MODELS -> mmdet.LOSSES -> LOSSES
    mmcv.MODELS -> mmdet.DETECTORS -> DETECTORS
    mmcv.ACTIVATION_LAYERS -> ACTIVATION_LAYERS
    mmcv.UPSAMPLE_LAYERS -> UPSAMPLE_LAYERS
    

    To migrate your old implementation to our new backend, you need to change the import path of any registries and their corresponding builder functions (including build_detectors) from mmdet.models.builder to mmocr.models.builder. If you have referred to any model or layer of MMDetection or MMCV in your model config, you need to add mmdet. or mmcv. prefix to its name to inform the model builder of the right namespace to work on.

    Interested users may check out MMCV's tutorial on Registry for an in-depth explanation of its mechanism.

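To illustrate the lookup behaviour described above, here is a toy registry with a parent link and scope prefixes. It is a simplified model written for this note, not MMCV's Registry class, and the registered classes are empty placeholders. A plain name is resolved only in the registry itself. A prefixed name such as 'mmdet.ResNet' is routed up the parent chain to the registry whose scope matches the prefix. This mirrors why configs that refer to MMDetection or MMCV modules need the prefix.

```python
from typing import Dict, Optional, Type


class ToyRegistry:
    """Simplified scoped registry; not MMCV's implementation."""

    def __init__(self, name: str, scope: str,
                 parent: Optional['ToyRegistry'] = None):
        self.name = name
        self.scope = scope
        self.parent = parent
        self._modules: Dict[str, Type] = {}

    def register(self, cls: Type) -> Type:
        # Usable as a class decorator; registers the class under its own name.
        self._modules[cls.__name__] = cls
        return cls

    def get(self, key: str) -> Type:
        scope, _, name = key.rpartition('.')  # 'mmdet.ResNet' -> ('mmdet', '.', 'ResNet')
        if not scope or scope == self.scope:
            if name in self._modules:
                return self._modules[name]
            raise KeyError(f'{name} is not registered in {self.scope}.{self.name}')
        # Walk up the parent chain to find the registry that owns the requested scope.
        node = self.parent
        while node is not None:
            if node.scope == scope:
                return node.get(name)
            node = node.parent
        raise KeyError(f'no registry with scope {scope!r} above {self.scope}.{self.name}')


mmcv_models = ToyRegistry('MODELS', scope='mmcv')
mmdet_backbones = ToyRegistry('BACKBONES', scope='mmdet', parent=mmcv_models)
mmocr_backbones = ToyRegistry('BACKBONES', scope='mmocr', parent=mmdet_backbones)


@mmdet_backbones.register
class ResNet:  # placeholder standing in for an MMDetection backbone
    pass


@mmocr_backbones.register
class ResNet31OCR:  # placeholder standing in for an MMOCR backbone
    pass


print(mmocr_backbones.get('ResNet31OCR').__name__)   # ResNet31OCR
print(mmocr_backbones.get('mmdet.ResNet').__name__)  # ResNet
```

In this toy model, calling `mmocr_backbones.get('ResNet')` without the prefix raises a KeyError, because plain names are looked up only in the local registry.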
    New Features

    • Automatically replace SyncBN with BN for inference #420, #453
    • Support batch inference for CRNN and SegOCR #407
    • Support exporting documentation in pdf or epub format #406
    • Support persistent_workers option in data loader #459
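SyncBN is meant for distributed training, which is why the first feature above swaps it for plain BN at inference time. Here is a minimal sketch of such a swap on a config dict. It assumes that normalization layers are configured through dicts whose 'type' is 'SyncBN'. `replace_syncbn` is a hypothetical helper, not MMOCR's implementation, which may instead operate on the built model.

```python
from typing import Any


def replace_syncbn(cfg: Any) -> Any:
    """Return a copy of cfg with every type='SyncBN' changed to type='BN'."""
    if isinstance(cfg, dict):
        new_cfg = {k: replace_syncbn(v) for k, v in cfg.items()}
        if new_cfg.get('type') == 'SyncBN':
            new_cfg['type'] = 'BN'
        return new_cfg
    if isinstance(cfg, (list, tuple)):
        # Preserve the container type while recursing into its items.
        return type(cfg)(replace_syncbn(v) for v in cfg)
    return cfg


model = dict(
    backbone=dict(type='ResNet', norm_cfg=dict(type='SyncBN', requires_grad=True)),
    neck=dict(type='FPN', norm_cfg=dict(type='BN')))
print(replace_syncbn(model)['backbone']['norm_cfg'])
# {'type': 'BN', 'requires_grad': True}
```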

    Bug Fixes

    • Remove deprecated key in kie_test_imgs.py #381
    • Fix dimension mismatch in batch testing/inference of DBNet #383
    • Fix dice loss staying at 1 when an empty target is given #408
    • Fix a wrong link in ocr.py (@naarkhoo) #417
    • Fix undesired assignment to "pretrained" in test.py #418
    • Fix a problem in polygon generation of DBNet #421, #443
    • Skip invalid annotations in totaltext_converter #438
    • Add zero division handler in poly utils, remove Polygon3 #448
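The dice loss fix above (#408) concerns empty targets. With the unsmoothed form 1 - 2|P∩T| / (|P| + |T|), an empty target makes the intersection zero for any prediction, so the loss is pinned at 1 and provides no useful signal. The sketch below is a generic smoothed soft dice loss written in plain Python for illustration; the actual fix in MMOCR may differ. Adding a small eps to the numerator and denominator lets the loss fall to 0 as the prediction approaches the empty target.

```python
from typing import Sequence


def dice_loss(pred: Sequence[float], target: Sequence[float],
              eps: float = 1e-3) -> float:
    """Soft dice loss 1 - (2*sum(p*t) + eps) / (sum(p) + sum(t) + eps) over flattened maps."""
    if len(pred) != len(target):
        raise ValueError('pred and target must have the same length')
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (union + eps)


empty_target = [0.0, 0.0, 0.0, 0.0]
print(round(dice_loss([0.9, 0.8, 0.7, 0.6], empty_target), 4))  # 0.9997 (false positives)
print(round(dice_loss([0.0, 0.0, 0.0, 0.0], empty_target), 4))  # 0.0 (correct empty output)
```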

    Improvements

    • Replace lanms-proper with lanms-neo to support installation on Windows (with special thanks to @gen-ko who has re-distributed this package!)
    • Support MIM #394
    • Add tests for PyTorch 1.9 in CI #401
    • Enable fullscreen layout in readthedocs #413
    • General documentation enhancement #395
    • Update version checker #427
    • Add copyright info #439
    • Update citation information #440

    Contributors

    We thank @2793145003, @samayala22, @manjrekarom, @naarkhoo, @gen-ko, @duanjiaqi, @gaotongxiao, @cuhk-hbsun, @innerlee, @wdsd641417025 for their contribution to this release!

  • v0.2.1 (Jul 20, 2021)

    Highlights

    1. Upgrade to MMCV-full >= 1.3.8 and MMDetection >= 2.13.0 for the latest features
    2. Add ONNX and TensorRT export tool, supporting the deployment of DBNet, PSENet, PANet and CRNN (experimental) #278, #291, #300, #328
    3. Unify the parameter initialization method, which now uses init_cfg in config files #365

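With the unified initialization method (#365), a module's initialization is declared in its config through init_cfg. The fragment below is for illustration only. The backbone fields other than init_cfg, and the head name 'CustomHead', are placeholder assumptions. The init_cfg forms follow MMCV's init_cfg conventions: 'Pretrained' with a checkpoint, and per-layer initializers such as 'Kaiming' and 'Constant'.

```python
# Illustrative config fragment; values other than the init_cfg entries are placeholders.
backbone = dict(
    type='ResNet',
    depth=50,
    # Initialize the whole backbone from torchvision's ImageNet weights.
    init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50'),
)

head = dict(
    type='CustomHead',  # hypothetical head name
    # Per-layer initializers: Kaiming for convolutions, constant 1 for BatchNorm.
    init_cfg=[
        dict(type='Kaiming', layer='Conv2d'),
        dict(type='Constant', val=1, layer='BatchNorm2d'),
    ],
)
```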
    New Features

    • Support TextOCR dataset #293
    • Support Total-Text dataset #266, #273, #357
    • Support grouping text detection box into lines #290, #304
    • Add benchmark_processing script that benchmarks data loading process #261
    • Add SynthText preprocessor for text recognition models #351, #361
    • Support batch inference during testing #310
    • Add user-friendly OCR inference script #366

    Bug Fixes

    • Fix improper class ignoring in SDMGR loss #221
    • Fix potential numerical zero division error in DRRG #224
    • Fix installing requirements with pip and mim #242
    • Fix dynamic input error of DBNet #269
    • Fix space parsing error in LineStrParser #285
    • Fix textsnake decode error #264
    • Correct isort setup #288
    • Fix a bug in SDMGR config #316
    • Fix kie_test_img for KIE nonvisual #319
    • Fix metafiles #342
    • Fix different device problem in FCENet #334
    • Ignore improper trailing empty characters in annotation files #358
    • Docs fixes #247, #255, #265, #267, #268, #270, #276, #287, #330, #355, #367
    • Fix NRTR config #356, #370

    Improvements

    • Add backend for resizeocr #244
    • Skip image processing pipelines in SDMGR novisual #260
    • Speedup DBNet #263
    • Update mmcv installation method in workflow #323
    • Add part of Chinese documentations #353, #362
    • Add support for ConcatDataset with two workflows #348
    • Add list_from_file and list_to_file utils #226
    • Speed up sort_vertex #239
    • Support distributed evaluation of KIE #234
    • Add pretrained FCENet on IC15 #258
    • Support CPU for OCR demo #227
    • Avoid extra image pre-processing steps #375
  • v0.2.0 (May 18, 2021)

    Highlights

    1. Add the NER approach Bert-softmax (NAACL'2019)
    2. Add the text detection method DRRG (CVPR'2020)
    3. Add the text detection method FCENet (CVPR'2021)
    4. Improve ease of use by adding an end-to-end text detection and recognition demo and a Colab online demo.
    5. Simplify the installation.

    New Features

    Bug Fixes

    • Fix the duplicated point bug due to transform for textsnake #130
    • Fix CTC loss NaN #159
    • Fix error raised if result is empty in demo #144
    • Fix results missing if one image has a large number of boxes #98
    • Fix package missing in dockerfile #109

    Improvements

    • Simplify the installation procedure by removing compilation #188
    • Speed up panet post processing so that it can detect dense texts #188
    • Add zh-CN README #70 #95
    • Support windows #89
    • Add Colab #147 #199
    • Add 1-step installation using conda environment #193 #194 #195
  • v0.1.0 (Apr 13, 2021)

    Main Features

    • Support text detection, text recognition and the corresponding downstream tasks such as key information extraction.
    • For text detection, support both single-step (PSENet, PANet, DBNet, TextSnake) and two-step (MaskRCNN) methods.
    • For text recognition, support the CTC-loss-based method CRNN; the encoder-decoder (with attention) methods SAR and RobustScanner; the segmentation-based method SegOCR; and the Transformer-based method NRTR.
    • For key information extraction, support GCN based method SDMG-R.
    • Provide checkpoints and log files for all of the methods above.
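CRNN is trained with CTC loss. At inference time, its per-frame outputs are usually turned into text by CTC decoding. Below is a minimal sketch of greedy (best-path) decoding. It assumes that per-frame class indices have already been obtained via argmax, that index 0 is the blank, and that a toy lowercase character dictionary is used. MMOCR's own label converter may differ in details.

```python
from typing import List, Optional, Sequence

BLANK = 0  # assumption: index 0 is reserved for the CTC blank


def ctc_greedy_decode(frame_indices: Sequence[int], charset: str) -> str:
    """Collapse consecutive repeats, then drop blanks (best-path CTC decoding)."""
    chars: List[str] = []
    prev: Optional[int] = None
    for idx in frame_indices:
        if idx != prev and idx != BLANK:
            chars.append(charset[idx - 1])  # shift by 1 because 0 is the blank
        prev = idx
    return ''.join(chars)


# Toy dictionary: index 1 -> 'a', 2 -> 'b', ..., 26 -> 'z'
charset = 'abcdefghijklmnopqrstuvwxyz'
# Frames: h h <blank> e l l <blank> l o o
print(ctc_greedy_decode([8, 8, 0, 5, 12, 12, 0, 12, 15, 15], charset))
# hello
```

The blank between the two runs of 'l' is what allows a doubled letter to survive the collapsing step.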