Image transformations designed for Scene Text Recognition (STR) data augmentation. Published at ICCV 2021 Workshop on Interactive Labeling and Data Augmentation for Vision.

Last update: Dec 28, 2022

Overview

Data Augmentation for Scene Text Recognition (ICCV 2021 Workshop)

(Pronounced as "strog")

Paper

Arxiv

Why it matters?

Scene Text Recognition (STR) requires data augmentation functions that differ from those used in object recognition. STRAug is a data augmentation library designed for STR. It offers 36 augmentation functions sorted into 8 groups. Each function supports 3 levels (magnitudes) of severity.

Given a source image:

it can be transformed as follows:

warp.py - to generate Curve, Distort, Stretch (or Elastic) deformations

`Curve`	`Distort`	`Stretch`

geometry.py - to generate Perspective, Rotation, Shrink deformations

`Perspective`	`Rotation`	`Shrink`

pattern.py - to create different grids: Grid, VGrid, HGrid, RectGrid, EllipseGrid

`Grid`	`VGrid`	`HGrid`	`RectGrid`	`EllipseGrid`

blur.py - to generate synthetic blur: GaussianBlur, DefocusBlur, MotionBlur, GlassBlur, ZoomBlur

`GaussianBlur`	`DefocusBlur`	`MotionBlur`	`GlassBlur`	`ZoomBlur`

noise.py - to add noise: GaussianNoise, ShotNoise, ImpulseNoise, SpeckleNoise

`GaussianNoise`	`ShotNoise`	`ImpulseNoise`	`SpeckleNoise`

weather.py - to simulate certain weather conditions: Fog, Snow, Frost, Rain, Shadow

`Fog`	`Snow`	`Frost`	`Rain`	`Shadow`

camera.py - to simulate camera sensor tuning and image compression/resizing: Contrast, Brightness, JpegCompression, Pixelate

`Contrast`	`Brightness`	`JpegCompression`	`Pixelate`

process.py - all other image processing issues: Posterize, Solarize, Invert, Equalize, AutoContrast, Sharpness, Color

`Posterize`	`Solarize`	`Invert`	`Equalize`

`AutoContrast`	`Sharpness`	`Color`
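The grouping above can be written down as a small lookup table, which is handy for picking a random augmentation during training. The following is a minimal sketch, not part of the library: `STRAUG_GROUPS`, `pick_augmentation`, and `load_augmentation` are hypothetical names, and it assumes every group is importable as `straug.<group>` and every class can be constructed without arguments, as the usage example does for `straug.warp.Curve`.

```python
import importlib
import random

# Group (module file stem) -> class names, transcribed from the list above.
STRAUG_GROUPS = {
    "warp": ["Curve", "Distort", "Stretch"],
    "geometry": ["Perspective", "Rotation", "Shrink"],
    "pattern": ["Grid", "VGrid", "HGrid", "RectGrid", "EllipseGrid"],
    "blur": ["GaussianBlur", "DefocusBlur", "MotionBlur", "GlassBlur", "ZoomBlur"],
    "noise": ["GaussianNoise", "ShotNoise", "ImpulseNoise", "SpeckleNoise"],
    "weather": ["Fog", "Snow", "Frost", "Rain", "Shadow"],
    "camera": ["Contrast", "Brightness", "JpegCompression", "Pixelate"],
    "process": ["Posterize", "Solarize", "Invert", "Equalize",
                "AutoContrast", "Sharpness", "Color"],
}


def pick_augmentation(rng):
    """Pick one group uniformly at random, then one class name within it."""
    group = rng.choice(sorted(STRAUG_GROUPS))
    name = rng.choice(STRAUG_GROUPS[group])
    return group, name


def load_augmentation(group, name):
    """Import straug.<group> and instantiate <name>.

    Assumption: the module path mirrors the file names listed above and the
    class takes no constructor arguments. Requires straug to be installed.
    """
    module = importlib.import_module(f"straug.{group}")
    return getattr(module, name)()
```

Sampling the group first gives each of the 8 groups equal weight regardless of how many functions it contains; sampling directly from the flat list of 36 names would instead favor the larger groups such as `process`.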

Pip install

pip3 install straug

How to use

Interactive Python session (for example, with nokia.png as the input image):

>>> from straug.warp import Curve
>>> from PIL import Image
>>> img = Image.open("nokia.png")
>>> img = Curve()(img, mag=3)
>>> img.save("curved_nokia.png")
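To compare severity levels side by side, the same call can be repeated over several `mag` values. The sketch below is illustrative only: `sweep_magnitudes`, `output_name`, and `demo` are hypothetical helpers, the output file naming is invented for the example, and the valid `mag` values are not listed in this README, so they are left as a parameter rather than hard-coded.

```python
from pathlib import Path


def sweep_magnitudes(op, img, mags):
    """Return {mag: op(img, mag=mag)} for every magnitude in mags.

    `op` is any callable that follows the `op(img, mag=...)` convention
    shown in the example above, e.g. an instance of straug.warp.Curve.
    """
    return {mag: op(img, mag=mag) for mag in mags}


def output_name(src, op_name, mag):
    """Build a file name such as 'nokia_Curve_mag3.png' from 'nokia.png'.

    The naming scheme is an assumption made for this sketch.
    """
    src = Path(src)
    return f"{src.stem}_{op_name}_mag{mag}{src.suffix}"


def demo(path, mags):
    """Apply Curve to one image at each magnitude and save the results."""
    # Imported here so the helpers above work without straug or Pillow.
    from PIL import Image
    from straug.warp import Curve

    img = Image.open(path)
    for mag, out in sweep_magnitudes(Curve(), img, mags).items():
        out.save(output_name(path, "Curve", mag))
```

Calling `demo("nokia.png", mags=[...])` with the magnitudes of interest writes one file per level into the current directory.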

Python script (see test.py):

python3 test.py --image=<target image>

For example:

python3 test.py --image=images/telekom.png

The corrupted images are saved in the results directory.

Reference

Image corruptions (e.g., blur, noise, camera effects, fog, frost) are based on the work of Hendrycks et al.

Citation

If you find this work useful, please cite:

@inproceedings{atienza2021data,
  title={Data Augmentation for Scene Text Recognition},
  author={Atienza, Rowel},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)},
  year={2021},
  pubstate={published},
  tppubtype={inproceedings}
}


Owner

Rowel Atienza
