Fast image augmentation library and easy to use wrapper around other libraries. Documentation: Paper about library:




Albumentations is a Python library for image augmentation. Image augmentation is used in deep learning and computer vision tasks to increase the quality of trained models. The purpose of image augmentation is to create new training samples from the existing data.

Here is an example of how you can apply some augmentations from Albumentations to create new images from the original one:

[Image: parrot]

Why Albumentations

  • Albumentations supports all common computer vision tasks such as classification, semantic segmentation, instance segmentation, object detection, and pose estimation.
  • The library provides a simple unified API to work with all data types: images (RGB images, grayscale images, multispectral images), segmentation masks, bounding boxes, and keypoints.
  • The library contains more than 70 different augmentations to generate new training samples from the existing data.
  • Albumentations is fast. We benchmark each new release to ensure that augmentations provide maximum speed.
  • It works with popular deep learning frameworks such as PyTorch and TensorFlow. By the way, Albumentations is a part of the PyTorch ecosystem.
  • Written by experts. The authors have experience both working on production computer vision systems and participating in competitive machine learning. Many core team members are Kaggle Masters and Grandmasters.
  • The library is widely used in industry, deep learning research, machine learning competitions, and open source projects.



Alexander Buslaev — Computer Vision Engineer at Mapbox | Kaggle Master

Alex Parinov — Computer Vision Architect at X5 Retail Group | Kaggle Master

Vladimir I. Iglovikov — Senior Computer Vision Engineer at Lyft Level5 | Kaggle Grandmaster

Eugene Khvedchenya — AI/ML Advisor and Independent researcher | Kaggle Master

Mikhail Druzhinin — Computer Vision Engineer at Simicon | Kaggle Expert


Albumentations requires Python 3.6 or higher. To install the latest version from PyPI:

pip install -U albumentations

Other installation options are described in the documentation.


The full documentation is available at

A simple example

import albumentations as A
import cv2

# Declare an augmentation pipeline
transform = A.Compose([
    A.RandomCrop(width=256, height=256),
])

# Read an image with OpenCV and convert it to the RGB colorspace
image = cv2.imread("image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Augment an image
transformed = transform(image=image)
transformed_image = transformed["image"]

Getting started

I am new to image augmentation

Please start with the introduction articles about why image augmentation is important and how it helps to build better models.

I want to use Albumentations for a specific task such as classification or segmentation

If you want to use Albumentations for a specific task such as classification, segmentation, or object detection, refer to the set of articles that has an in-depth description of this task. We also have a list of examples on applying Albumentations for different use cases.

I want to know how to use Albumentations with deep learning frameworks

We have examples of using Albumentations along with PyTorch and TensorFlow.

I want to explore augmentations and see Albumentations in action

Check the online demo of the library. With it, you can apply augmentations to different images and see the result. Also, we have a list of all available augmentations and their targets.

Who is using Albumentations

See also:

List of augmentations

Pixel-level transforms

Pixel-level transforms will change just an input image and will leave any additional targets such as masks, bounding boxes, and keypoints unchanged. The list of pixel-level transforms:

Spatial-level transforms

Spatial-level transforms will simultaneously change both an input image as well as additional targets such as masks, bounding boxes, and keypoints. The following table shows which additional targets are supported by each transform.

Transform Image Masks BBoxes Keypoints
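
To make the difference between the two groups concrete, here is a minimal plain-Python sketch (not the library's implementation) of what happens to a bounding box under each kind of transform. The helper names and the normalized (x_min, y_min, x_max, y_max) box layout are assumptions made for this illustration only.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)


def hflip_bbox(bbox: BBox) -> BBox:
    """Spatial-level example: mirror a normalized box around the vertical axis."""
    x_min, y_min, x_max, y_max = bbox
    return (1.0 - x_max, y_min, 1.0 - x_min, y_max)


def brightness_shift_bbox(bbox: BBox) -> BBox:
    """Pixel-level example: only pixel values change, so the box is returned as-is."""
    return bbox


box = (0.25, 0.25, 0.5, 0.75)
flipped = hflip_bbox(box)               # the box moves to the mirrored position
unchanged = brightness_shift_bbox(box)  # the box stays where it was
```

A spatial-level transform therefore has to be applied to the image and to every additional target with the same parameters, which is why the targets are passed to the pipeline together.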

A few more examples of augmentations

Semantic segmentation on the Inria dataset


Medical imaging


Object detection and semantic segmentation on the Mapillary Vistas dataset


Keypoints augmentation

Benchmarking results

To run the benchmark yourself, follow the instructions in benchmark/

Results for running the benchmark on the first 2000 images from the ImageNet validation set using an Intel Xeon Gold 6140 CPU. All outputs are converted to a contiguous NumPy array with the np.uint8 data type. The table shows how many images per second can be processed on a single core; higher is better.

torchvision (Pillow-SIMD backend)
HorizontalFlip 9909 2821 2267 873 2301 6223
VerticalFlip 4374 2218 1952 4339 1968 3562
Rotate 371 296 163 27 60 345
ShiftScaleRotate 635 437 147 28 - -
Brightness 2751 1178 419 229 418 2300
Contrast 2756 1213 352 - 348 2305
BrightnessContrast 2738 699 195 - 193 1179
ShiftRGB 2757 1176 - 348 - -
ShiftHSV 597 284 58 - - 137
Gamma 2844 - 382 - - 946
Grayscale 5159 428 709 - 1064 1273
RandomCrop64 175886 3018 52103 - 41774 20732
PadToSize512 3418 - 574 - - 2874
Resize512 1003 634 1036 - 1016 977
RandomSizedCrop_64_512 3191 939 1594 - 1529 2563
Posterize 2778 - - - - -
Solarize 2762 - - - - -
Equalize 644 413 - - 735 -
Multiply 2727 1248 - - - -
MultiplyElementwise 118 209 - - - -
ColorJitter 368 78 57 - - -

Python and library versions: Python 3.8.6 (default, Oct 13 2020, 20:37:26) [GCC 8.3.0], numpy 1.19.2, pillow-simd 7.0.0.post3, opencv-python, scikit-image 0.17.2, scipy 1.5.2.
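
The figures above are images per second on a single core. As a rough illustration of how such a number can be measured, the sketch below times a transform over a fixed set of inputs and divides the image count by the elapsed time. It is a simplified stand-in, not the script in benchmark/; the function name, the fake images, and the fake transform are all made up for the example.

```python
import time
from typing import Callable, Sequence


def images_per_second(transform: Callable, images: Sequence, repeats: int = 3) -> float:
    """Return the best observed throughput of `transform` over `images`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for image in images:
            transform(image)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return len(images) / best if best > 0 else float("inf")


# Stand-in "images" (nested lists) and a stand-in transform that mirrors each row.
fake_images = [[[0] * 64 for _ in range(64)] for _ in range(200)]
throughput = images_per_second(lambda img: [row[::-1] for row in img], fake_images)
print(f"{throughput:.0f} images/s")
```

Taking the best of several repeats reduces the influence of one-off delays such as cache warm-up.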


To create a pull request to the repository, follow the documentation at


In some systems, in the multiple GPU regime, PyTorch may deadlock the DataLoader if OpenCV was compiled with OpenCL optimizations. Adding the following two lines before the library import may help. For more details



If you find this library useful for your research, please consider citing Albumentations: Fast and Flexible Image Augmentations:

    AUTHOR = {Buslaev, Alexander and Iglovikov, Vladimir I. and Khvedchenya, Eugene and Parinov, Alex and Druzhinin, Mikhail and Kalinin, Alexandr A.},
    TITLE = {Albumentations: Fast and Flexible Image Augmentations},
    JOURNAL = {Information},
    VOLUME = {11},
    YEAR = {2020},
    NUMBER = {2},
    ARTICLE-NUMBER = {125},
    URL = {},
    ISSN = {2078-2489},
    DOI = {10.3390/info11020125}
  • [TensorFlow] Failed to get reproducible trainings with albumentations included to the data pipeline


    🐛 Bug

    I could not get my training to work in a reproducible way when albumentations is added to the data pipeline. I followed this thread and fixed all possible seeds, so overall my snippet that should have enabled reproducible experiments looks like this:

    import os
    import random
    import numpy as np
    import tensorflow as tf
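    def set_random_seed(seed: int = 42):
        """Globally fix all possible sources of randomness to keep the experiment reproducible."""
        # Seed the Python, NumPy, and TensorFlow generators that the imports above provide
        random.seed(seed)
        np.random.seed(seed)
        tf.random.set_seed(seed)
        os.environ['PYTHONHASHSEED'] = str(seed)
        os.environ['TF_DETERMINISTIC_OPS'] = '1'
        os.environ['TF_CUDNN_DETERMINISTIC'] = '1'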

    Unfortunately, this doesn't help me to get reproducible results. I have executed training process 6 times and got all different results. You can also see the whole picture in W&B:

    • (best_val_acc: 0.7104, best_epoch: 3)
    • (best_val_acc: 0.7875, best_epoch: 8)
    • (best_val_acc: 0.6771, best_epoch: 8)
    • (best_val_acc: 0.7729, best_epoch: 6)
    • (best_val_acc: 0.7208, best_epochs: 0 and 8)
    • (best_val_acc: 0.8, best_epoch: 9)
    • Mean: 0.74478
    • Std: 0.044726

    Also, I tried to set random.seed() right before passing my batch into a.Compose() pipeline. That did not really help.

    However, when I comment out albumentations from my data pipeline or replace it with some pure TF augmentations, I can get my training reproducible.

    Any clues what's wrong here?

    To Reproduce

    Steps to reproduce the behavior:

    1. Clone the project state at 0.1.0-bugrep tag:
    git clone --depth 1 --branch 0.1.0-bugrep
    2. Pull dataset:
    cd data
    kaggle datasets download --unzip frtgnn/rock-paper-scissor
    3. Install project deps:
    poetry install
    4. Uncomment any of the reported augmentations in the config file (they are all commented out in the git):

    5. Run training a couple of times and you get results that differ by a lot:


    Expected behavior

    In order to do experiments that analyze impact of different ideas and changes, I would like to see my training process reproducible.


    • Albumentations version (e.g., 0.1.8): 0.5.2
    • Python version (e.g., 3.7): 3.8.6
    • OS (e.g., Linux): Ubuntu 20.10
    • How you installed albumentations (conda, pip, source): poetry (pip-like)
    • tensorflow-gpu: 2.5.0 (for the sake of compatibility with RTX3070 (ampere arch.))

    Additional context

    This report is reproduced in a project that is also mentioned in

    The data pipeline is the same for both issues:

    def augment_image(inputs, labels, augmentation_pipeline: a.Compose):
        def apply_augmentation(images):
            aug_data = augmentation_pipeline(image=images.astype('uint8'))
            return aug_data['image']
        inputs = tf.numpy_function(func=apply_augmentation, inp=[inputs], Tout=tf.uint8)
        return inputs, labels
    def get_dataset(
            dataset_path: str,
            subset_type: str,
            augmentation_pipeline: a.Compose,
            validation_fraction: float = 0.2,
            batch_size: int = 32,
            image_size: Tuple[int, int] = (300, 300),
            seed: int = 42
    ) ->
        augmentation_func = partial(
        dataset = image_dataset_from_directory(
        return dataset \
            .map(augmentation_func, num_parallel_calls=AUTOTUNE) \
    Tensorflow Reproducibility 
    opened by roma-glushko 19
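
The report above does not identify a cause. One possible factor (an assumption, not something confirmed in the report) is that a shared global random state is consumed in a different order on each run when map is called with num_parallel_calls=AUTOTUNE. The toy sketch below shows a general technique that sidesteps this kind of order dependence: seeding a local generator per sample. The augment function and the sample data are hypothetical and do not use Albumentations or TensorFlow.

```python
import random
from typing import List


def augment(sample: List[int], index: int, base_seed: int = 42) -> List[int]:
    """Toy augmentation: shuffle a sample with a generator seeded per sample."""
    rng = random.Random(base_seed + index)  # local RNG, independent of global state
    shuffled = list(sample)
    rng.shuffle(shuffled)
    return shuffled


samples = [list(range(8)) for _ in range(4)]

# Process the samples in two different orders, as parallel workers might.
forward = {i: augment(samples[i], i) for i in range(4)}
backward = {i: augment(samples[i], i) for i in reversed(range(4))}
assert forward == backward  # the result depends only on the sample index
```

With this scheme a sample receives the same augmentation every epoch; mixing the epoch number into the seed would restore variety while keeping runs repeatable.
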
  • ValueError: Expected x_max for bbox (0.94375, 0.5775173611111111, 1.003125, 0.6372395833333333, 0) to be in the range [0.0, 1.0], got 1.003125.


    🐛 Bug

    I tried to use transforms such as VerticalFlip, RandomSizedBBoxSafeCrop, and other bounding-box coordinate transformations, but I always got the error "Expected x_max for bbox (0.9515625, 0.5316840277777778, 1.003125, 0.6955729166666667, 0) to be in the range [0.0, 1.0], got 1.003125". If I replace the lines x_min, x_max = x_min / cols, x_max / cols and y_min, y_max = y_min / rows, y_max / rows in the normalize_bbox method with x_min, x_max = min(x_min / cols, 1.0), min(x_max / cols, 1.0) and y_min, y_max = min(y_min / rows, 1.0), min(y_max / rows, 1.0), it works correctly.

    To Reproduce

    Steps to reproduce the behavior:

    1. transforms = [ VerticalFlip(), RandomBrightnessContrast(), RandomShadow(p=0.5), RandomSnow(p=0.5), RandomFog(), JpegCompression()] augmentor = Compose(transforms, bbox_params=BboxParams(format='yolo', label_fields=['category_id']))
    2. Input bboxes [[0.492578125, 0.5118055555555555, 0.01328125, 0.02638888888888889], [0.501171875, 0.5013888888888889, 0.01171875, 0.019444444444444445], [0.509765625, 0.5020833333333333, 0.01328125, 0.020833333333333332], [0.51640625, 0.51875, 0.0265625, 0.034722222222222224], [0.581640625, 0.5131944444444444, 0.02265625, 0.029166666666666667], [0.613671875, 0.5145833333333333, 0.02734375, 0.034722222222222224], [0.7546875, 0.5319444444444444, 0.0859375, 0.08055555555555556], [0.46796875, 0.5423611111111111, 0.065625, 0.10138888888888889], [0.9734375, 0.6097222222222223, 0.0515625, 0.1638888888888889]]

    Traceback (most recent call last):
      File "/home/robo/Code/Python/ONNX/", line 655, in <module>
        for batch_data, boxes in det_dataset.get_batchGPU(batch_size):
      File "/home/robo/Code/Python/ONNX/", line 609, in get_batchGPU
        max_length_box = self.get_image(start_index, batch_size, batch, labels)
      File "/home/robo/Code/Python/ONNX/", line 579, in get_image
        sample = self.getItemGPURandomGreed(start_index)
      File "/home/robo/Code/Python/ONNX/", line 569, in getItemGPURandomGreed
        return self.getItemGPUVariableGreed(indx, np.random.randint(1, 3), np.random.randint(1, 3))
      File "/home/robo/Code/Python/ONNX/", line 564, in getItemGPUVariableGreed
        aug = augmentor(**annotation)
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/core/", line 174, in __call__
        p.preprocess(data)
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/core/", line 63, in preprocess
        data[data_name] = self.check_and_convert(data[data_name], rows, cols, direction="to")
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/core/", line 71, in check_and_convert
        return self.convert_to_albumentations(data, rows, cols)
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/augmentations/", line 51, in convert_to_albumentations
        return convert_bboxes_to_albumentations(data, self.params.format, rows, cols, check_validity=True)
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/augmentations/", line 305, in convert_bboxes_to_albumentations
        return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/augmentations/", line 305, in <listcomp>
        return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/augmentations/", line 253, in convert_bbox_to_albumentations
        check_bbox(bbox)
      File "/home/robo/.local/lib/python3.6/site-packages/albumentations/augmentations/", line 332, in check_bbox
        "to be in the range [0.0, 1.0], got {value}.".format(bbox=bbox, name=name, value=value)
    ValueError: Expected x_max for bbox (0.9515625, 0.5316840277777778, 1.003125, 0.6955729166666667, 0) to be in the range [0.0, 1.0], got 1.003125.


    • Albumentations version 0.4.2:
    • Python version 3.6.8:
    • OS Ubuntu 18.04:
    • pip :
    opened by adelkaiarullin 19
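
The workaround described in the report above amounts to clamping normalized coordinates into the valid range before they are validated. A minimal plain-Python sketch of that idea follows; clip_bbox is a hypothetical helper written for this example, not a function of the library, and unlike the one-sided min(...) in the report it also clamps values below 0.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)


def clip_bbox(bbox: BBox) -> BBox:
    """Clamp every coordinate of a normalized box into [0.0, 1.0]."""
    x_min, y_min, x_max, y_max = (min(max(value, 0.0), 1.0) for value in bbox)
    return (x_min, y_min, x_max, y_max)


# The coordinates from the error message in the issue title (class label dropped).
reported = (0.94375, 0.5775173611111111, 1.003125, 0.6372395833333333)
clipped = clip_bbox(reported)  # x_max becomes 1.0, the other values are unchanged
```

Clamping hides the symptom rather than the rounding that produced the out-of-range value, so it changes the box slightly; whether that is acceptable depends on the application.
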
  • RandomShadow input type wrong


    🐛 Bug

    Weather transformation. For RandomRain, RandomSnow, RandomSunFlare the inputs are just numpy uint8. However, RandomShadow does not allow the same input format.

    To Reproduce

    albu_shadow = albu.RandomShadow(p=1, num_shadows_lower=1, num_shadows_upper=1, shadow_dimension=5, shadow_roi=(0, 0.5, 1, 1))
    x_np = albu_shadow(image=x_np)['image']


    TypeError: Expected Ptr<cv::UMat> for argument img

    Expected behavior

    It should take the same uint8 numpy array as input.


    • Albumentations version: 0.4.5
    • Python version: 3.7.6
    • OS (e.g., Linux): Linux
    • How you installed albumentations: pip
    opened by shamangary 15
  • Random Crop yields incorrect value for bounding box


    🐛 Bug

    The bbox_random_crop function does not produce a reasonable result. Consider the following snippet of code.

    To Reproduce

    from albumentations import functional as F
    bbox = (0.129, 0.7846, 0.1626, 0.818)
    cropped_bbox = F.bbox_random_crop(bbox=bbox, crop_height=100, crop_width=100, h_start=0.7846, w_start=0.12933, rows=1500, cols=1500)
    # Output: (0.125, 0.788999, 0.629, 1.29)
    # Notice y2 is outside of the crop range,
    # but the following assert passes:
    assert bbox[3] < (100/1500) + 0.7846
    assert all([(y >= 0) & (y<=1) for y in list(cropped_bbox)])

    Expected behavior

    An augmented bbox that is fully within the image crop. The crop_height plus the start of the crop is larger than the y2 of the bounding box, but the 1.29 coordinate in the cropped box suggests it is outside of the crop area.


    • Albumentations version (e.g., 0.1.8): 0.5.2
    • Python version (e.g., 3.7): 3.7
    • OS (e.g., Linux): OSX
    • How you installed albumentations (conda, pip, source): pip
    • Any other relevant information:

    Additional context

    I am making a custom augmentation to Zoom in on a given bounding box. CropSafe (but not all boxes). Is there syntax that i'm misunderstanding, it doesn't feel like this could be the case. Dtype issue?

    opened by bw4sz 14
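
The numbers in the report above are consistent with h_start and w_start being interpreted as fractions of the free margin (rows - crop_height and cols - crop_width) rather than fractions of the full image size. The plain-Python sketch below is written under that assumption; random_crop_bbox is a hypothetical re-implementation for illustration, not the library's code, but it reproduces the reported output.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # normalized (x_min, y_min, x_max, y_max)


def random_crop_bbox(bbox: BBox, crop_height: int, crop_width: int,
                     h_start: float, w_start: float, rows: int, cols: int) -> BBox:
    """Re-express `bbox` relative to a crop whose offset is a fraction of the free margin."""
    # Assumption: the crop origin is h_start * (rows - crop_height), not h_start * rows.
    y_off = int((rows - crop_height) * h_start)
    x_off = int((cols - crop_width) * w_start)
    x_min, y_min, x_max, y_max = bbox
    return (
        (x_min * cols - x_off) / crop_width,
        (y_min * rows - y_off) / crop_height,
        (x_max * cols - x_off) / crop_width,
        (y_max * rows - y_off) / crop_height,
    )


bbox = (0.129, 0.7846, 0.1626, 0.818)
cropped = random_crop_bbox(bbox, 100, 100, h_start=0.7846, w_start=0.12933, rows=1500, cols=1500)
print(cropped)  # approximately (0.125, 0.789, 0.629, 1.29)
```

Under this reading the crop covers rows 1098 to 1198 while the box spans rows of about 1177 to 1227, so the box really does extend past the crop. To start a crop at pixel row y0, the matching value would be h_start = y0 / (rows - crop_height).
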
  • Changed downscale interpolation to avoid aliasing


    Hi !

    I recently used the Albumentations library for a Kaggle competition, and more particularly the Downscale transform.

    After looking at the result the transform gave me I was a little bit surprised:

    Result using bilinear interpolation


    Result using bicubic interpolation


    We can see a lot of artifacts and aliasing happening here.

    After checking the source code, I noticed that the same interpolation method was used both for the downscaling part and for the upscaling to the original size part. However, as the OpenCV doc mentions:

    cv2.INTER_AREA: resampling using pixel area relation. It may be a preferred method for image decimation, as it gives moire’-free results. But when the image is zoomed, it is similar to the INTER_NEAREST method

    This was indeed the case, here are the results of the same image but with cv2.INTER_AREA applied for the downscaling part:

    Result using bilinear interpolation


    Result using bicubic interpolation


    So we can see that the images are now of much better quality and better recreate what actual image resizing might look like, which is the main goal of the data transformation.

    Awaiting merge 
    opened by nathanhubens 14
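
The aliasing described in the pull request above can be illustrated with a one-dimensional toy example in plain Python. This is not how OpenCV implements its interpolation modes; the two helper functions are hypothetical and only contrast picking single samples with averaging over the area that each output sample covers.

```python
from typing import List


def decimate_nearest(signal: List[float], factor: int) -> List[float]:
    """Keep every `factor`-th sample and discard the rest."""
    return signal[::factor]


def decimate_area(signal: List[float], factor: int) -> List[float]:
    """Average each block of `factor` samples (complete blocks only)."""
    usable = len(signal) - len(signal) % factor
    return [sum(signal[i:i + factor]) / factor for i in range(0, usable, factor)]


stripes = [0.0, 255.0] * 4                # fine alternating pattern
nearest = decimate_nearest(stripes, 2)    # [0.0, 0.0, 0.0, 0.0]: the stripes vanish
area = decimate_area(stripes, 2)          # [127.5, 127.5, 127.5, 127.5]: mean brightness kept
```

Sampling single values turns the fine pattern into a uniform black strip, while area averaging preserves its overall brightness, which is the kind of difference the before and after images show.
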
  • RandomSunFlare dump


    🐛 Bug

    To Reproduce

    add function A.RandomSunFlare(p=0.2)

    Steps to reproduce the behavior:

    /Pytorch/lib/python3.6/site-packages/albumentations/augmentations/", line 863, in add_sun_flare, (x, y), rad3, (r_color, g_color, b_color), -1) cv2.error: OpenCV(4.5.4) :-1: error: (-5:Bad argument) in function 'circle'

    Overload resolution failed:

    • Can't parse 'center'. Sequence item with index 1 has a wrong type
    • Can't parse 'center'. Sequence item with index 1 has a wrong type

    Expected behavior


    • Albumentations version (e.g., 0.1.8): albumentations 1.1.0
    • Python version (e.g., 3.6): 3.6
    • OS (e.g., Linux): linux
    • How you installed albumentations (conda, pip, source): pip
    • Any other relevant information:

    Additional context

    bug Need more info 
    opened by Tim5Tang 13
  • YOLO format without normalization and denormalization


    Since yolo and albumentations are normalized formats, we don't need to normalize and denormalize the values in the conversion step. The previous approach gave round-off errors.

    These changes should fix the following issues:

    • #922
    • #903
    • #862
    • #883
    • #848
    • #679
    opened by Dipet 12
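
The idea in the pull request above, converting between two normalized formats without passing through pixel units, can be sketched in plain Python as follows. The function names are hypothetical and this is not the code from the pull request; it only shows that the conversion needs neither the image size nor integer rounding.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def yolo_to_corners(bbox: Box) -> Box:
    """Convert normalized (x_center, y_center, width, height) to (x_min, y_min, x_max, y_max)."""
    x_center, y_center, width, height = bbox
    half_w, half_h = width / 2, height / 2
    return (x_center - half_w, y_center - half_h, x_center + half_w, y_center + half_h)


def corners_to_yolo(bbox: Box) -> Box:
    """Inverse conversion, again without touching pixel units."""
    x_min, y_min, x_max, y_max = bbox
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, x_max - x_min, y_max - y_min)


box = (0.5, 0.5, 0.25, 0.5)
corners = yolo_to_corners(box)         # (0.375, 0.25, 0.625, 0.75)
round_trip = corners_to_yolo(corners)  # (0.5, 0.5, 0.25, 0.5)
```

Because no value is multiplied by the image size and truncated to an integer, a box that starts inside [0, 1] stays there up to floating-point precision.
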
  • Implementation of #617. `check_validity` BBox parameter


    Fixes #617. A check_validity parameter is added to BboxParams. Setting it to False gives a way to handle bounding boxes extending beyond the image. See the motivation for it in #617.

    Need more info Branch conflict 
    opened by IlyaOvodov 12
  • ToTransform before Normalize causes Tensor no attribute astype Error


    This is my albumentations transform. Before, this was Normalize --> ToTensor. Changing the order (which I think is the right order) produces an error.

    def get_transforms(phase, mean, std):
        list_transforms = []
        if phase == "train":
                        shift_limit=0,  # no resizing
                        rotate_limit=10, # rotate
                    # albu.RandomRotate90(),
                    # albu.Cutout(),
                    # albu.RandomBrightnessContrast(
                    #     brightness_limit=0.2, contrast_limit=0.2, p=0.3
                    # ),
                    # # albu.GridDistortion(p=0.3),
                    # albu.HueSaturationValue(p=0.2),
                    # albu.RandomContrast(p=0.2),
                    # albu.MedianBlur(p=0.2)
                    # Resize(320, 480),
                Normalize(mean=mean, std=std, p=1),
        list_trfms = Compose(list_transforms)
        return list_trfms

    When loading using DataLoader, it generates an error

         90     denominator = np.reciprocal(std, dtype=np.float32)
    ---> 92     img = img.astype(np.float32)
         93     img -= mean
         94     img *= denominator
    AttributeError: 'Tensor' object has no attribute 'astype'
    opened by sarmientoj24 11
  • HorizontalFlip and VerticalFlip inconsistent with multilabel masks


    🐛 Bug

    Given an image and its corresponding mask, the augmented mask is not consistent with the augmented image when HorizontalFlip() and VerticalFlip() are included in the augmentations.

    To Reproduce

    The following snippet is a small dataloader that i usually use. Can't share code.

        self.aug = Compose([
                            # HorizontalFlip(),
                            # VerticalFlip(),
    def __getitem__(self, patient_id):
        image_path = os.path.join(self.df.iloc[patient_id, 0])
        z = np.load(image_path)
        image = z['patch']
        gt_data = z['patch_gt']
        # print("Pre : ", image.shape, gt_data.shape)
        gt_data = gt_data.swapaxes(0, 2)
        # print("Pre swapped: ", image.shape, gt_data.shape)
        # gt_data = gt_data[:4, :, :]
        if not self.valid:
            augmented = self.aug(image=image, mask=gt_data)
            image = augmented['image']
            gt_data = augmented['mask']
        image = (image/255).astype(np.float32)
        # print("Post Augment :", image.shape, gt_data.shape)
        image = image.swapaxes(0, 2)
        gt_data = gt_data.swapaxes(0, 2)
        # print("Post Augment Swapped: ", image.shape, gt_data.shape)
        image = torch.FloatTensor(image)
        gt_data = torch.FloatTensor(gt_data)
        #mask.shape = (5, 256, 256)
        #image.shape = (256, 256, 3)
        return image, gt_data

    Expected behavior

    The augmentation over the images for horizontal and vertical flip should be working fine for both the mask and the image, but for some reason, there are errors in mask augmentations during horizontal and vertical flips.

    Image shape : 256, 256, 3 Mask Shape : 5, 256, 256


    • Albumentations version : 0.43.
    • Python version : 3.6
    • OS (e.g., Linux): Linux
    • How you installed albumentations (conda, pip, source): pip
    • Any other relevant information:

    Additional context

    opened by Geeks-Sid 11
  • Shift augmentation in `ShiftScaleRotate` works incorrect for keypoints and bboxes


    Version: 1.12. The shift augmentation in ShiftScaleRotate works incorrectly for keypoints and bboxes. Please compare how it's applied to img:


    and keypoints:

    'dx' and 'dy' are percentage values of the image width and height. As we don't have access to the image shape during these transforms, it may be good to set the shift range in pixels rather than in percent.

    opened by mortido 11
  • Test-Time-Augmentation Demo-Notebook for Tensorflow


    I posted an issue in the "Albumentations-Example" group but it doesn't seem to be getting any attention. Cross-posting here to see if it can get the required attention from Albumentations team. Test-Time-Augmentation Demo-Notebook for Tensorflow Thank you.

    opened by RachelRamirez 0
  • RandomGridShuffle error


    🐛 Bug

    RandomGridShuffle does not function properly.

    To Reproduce

    display_ims = 20
    grid = (3, 3)
    p = 0.5
    for i in range(display_ims):   
        tfs = A.Compose([A.RandomGridShuffle()])
        tfs_im = tfs(image=im)
        plt.subplot(4, display_ims // 4, i+1)


    • Albumentations version (1.3.0):
    • Python version (3.9):
    • OS (Linux):
    • How you installed albumentations (pip):

    Additional context

    height_split = np.linspace(0, height, n + 1,
    width_split = np.linspace(0, width, m + 1,

    AttributeError: module 'numpy' has no attribute 'int'

    opened by bekhzod-olimov 0
  • Support different rgb to grayscale methods.


    Support different methods for grayscale conversions:

    • Luminosity with different coefficients.
    • Lightness
    • Average

    Some info can be found on the wiki:
    good first issue feature request 
    opened by Dipet 0
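
The three methods named in the feature request above can be sketched for a single RGB pixel in plain Python. The function names are hypothetical, and the default luminosity weights are the ITU-R BT.601 luma coefficients, chosen here only as one common example of "different coefficients"; the request itself does not specify any.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def luminosity(pixel: RGB, weights: RGB = (0.299, 0.587, 0.114)) -> float:
    """Weighted sum of channels; the default weights are the ITU-R BT.601 luma coefficients."""
    return sum(channel * weight for channel, weight in zip(pixel, weights))


def lightness(pixel: RGB) -> float:
    """Midpoint between the largest and smallest channel value."""
    return (max(pixel) + min(pixel)) / 2


def average(pixel: RGB) -> float:
    """Plain arithmetic mean of the three channels."""
    return sum(pixel) / 3


pixel = (200.0, 100.0, 0.0)
print(luminosity(pixel), lightness(pixel), average(pixel))
```

Passing another weight triple, for example the BT.709 coefficients (0.2126, 0.7152, 0.0722), gives a different luminosity variant with the same function.
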
  • Rotate & SafeRotate doesn't properly rotate the label in YOLO format


    🐛 Bug

    I'm using the following code to rotate the image and its label -

    def bboxes2TxtFile(bboxes, category_ids, output_path):
        with open(output_path, 'w') as f:
            for i, bbox in enumerate(bboxes):
                f.write(f"{category_ids[i]} {bbox[0]} {bbox[1]} {bbox[2]} {bbox[3]}\n")
    transform = A.Compose(
                    A.SafeRotate(always_apply=True, p=1.0, limit=(-360, 360), interpolation=0, border_mode=0, value=(0, 0, 0), mask_value=None)
                bbox_params=A.BboxParams(format='yolo', label_fields=['category_ids']),
            transformed = transform(image=image, bboxes=bboxes, category_ids=category_ids)
            cv2.imwrite(outputImageDir+"SR_"+imageFile, transformed['image'])

    But I'm getting incorrect labels for the same. I used A.Rotate too but still, the same error persists. I'm attaching screenshots of the visualization.

    Rotated Label's visualization (Red lines drawn manually represent the expected output) - image

    Image without augmentation - image

    Test image and label for testing -

    opened by pillai-karthik 0
  • The `add_targets` method sets targets instead of adding them


    Suppose you do

    import albumentations as A
    t = A.ToGray()  # Works with all transformations
    t.add_targets({"my_image1": "image"})
    t.add_targets({"my_image2": "image"})

    You get {'my_image2': 'image'}. That seems to be what you want since you (albu) wrote

    class BasicTransform(Serializable):
        def add_targets(self, additional_targets):
            self._additional_targets = additional_targets

    But, given the name of the function, and the docstring "Add targets to transform them the same way as one of existing targets", I expected {'my_image1': "image", 'my_image2': "image"}.

    My own use case is very uncommon, but I could see other people adding some targets in 2 different places for some other reason. I would suggest either:

    • Replace self._additional_targets = additional_targets by self._additional_targets.update(additional_targets)
    • Rename add_targets to set_targets / set_additional_targets

    For completeness purpose, my own use case is that I have a child class (named Tee) of BasicTransform which outputs 2 keys from one key. So my pipeline looks like:

    before_tee1 = A.SomeTransform(...)
    before_tee2 = A.SomeTransform(...)
    tee = Tee(...)
    after_tee1 = A.SomeTransform(...)
    after_tee2 = A.SomeTransform(...)
    for transfo in [after_tee1, after_tee2]:
        transfo.add_targets({"image_copy": "image"})
    dynamic_composed_transfo = A.Compose(
        [before_tee1, before_tee2, tee, after_tee1, after_tee2], additional_targets=dynamic_targets)
    opened by ernest-tg 2
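
The first suggestion in the issue above, merging new targets into the existing mapping instead of replacing it, can be sketched with a small stand-in class. ToyTransform is hypothetical and models only the target bookkeeping; it is not the library's BasicTransform.

```python
from typing import Dict


class ToyTransform:
    """Stand-in for a transform; only the target bookkeeping is modelled."""

    def __init__(self) -> None:
        self._additional_targets: Dict[str, str] = {}

    def add_targets(self, additional_targets: Dict[str, str]) -> None:
        # Merge into the existing mapping instead of replacing it.
        self._additional_targets.update(additional_targets)


t = ToyTransform()
t.add_targets({"my_image1": "image"})
t.add_targets({"my_image2": "image"})
print(t._additional_targets)  # {'my_image1': 'image', 'my_image2': 'image'}
```

With update, the second call keeps the entry from the first one, which matches the behavior the method name and docstring lead a reader to expect.
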
  • 1.3.0(Sep 20, 2022)


    Breaking changes

    • Renamed the method parameter of Rotate to rotate_method to keep parameter naming consistent. (#1258 by @Dipet, thanks to @MichaelMonashev)

    New augmentations

    • RandomCropFromBorders - Crops image based on indents from image borders. (#1240 by @Dipet based on #476 by @ZFTurbo)
    • BBoxSafeRandomCrop - Crops image without loss of bboxes. Unlike RandomSizedBBoxSafeCrop, this implementation does not resize the result to a target size. (#579 by @SunQpark)
    • Spatter - Simulates corruption which can occlude a lens in the form of rain or mud. (#573 by @akarsakov)
    • Defocus - Imitates lens defocusing. (#551 by @akarsakov)
    • ZoomBlur - Imitates lens blur on zooming. (#551 by @akarsakov)


    • Fixed wrong result in RandomBrightnessContrast when brightness_by_max=False. (#487 by @Dipet)
    • Fixed wrong bbox clipping inside Perspective and Affine. (#1231 by @Dipet)
    • Fixed incorrect removal of bboxes when min_visibility=0 or min_visibility=1. (#616 by @IlyaOvodov)
    • Fixed wrong keypoint's cropping inside Rotate when crop_border=True. (#1250 by @Dipet, thanks to @jonkoi)
    • Fixed wrong propagation of always_apply Compose children. (#561 by @albu)
    • RandomSunFlare now works correctly with src_color and uses all three color values. (#1285 by @hoel-bagard)
    • RandomGamma now correctly works with float gamma_limit. (#1286 by @zahragolpa)

    Minor changes:

    • Sped up Normalize by up to 2 times in some cases. (#563 by @Dipet)
    • GridDistortion, ElasticTransform and OpticalDistortion now support bbox targets. (#476, #1262 by @ZFTurbo and @Dipet)
    • MotionBlur now supports the allow_shifted flag. When its value is False, only non-shifted kernels are generated. (#1239 by @Dipet)
    • Updated versions of type formatters. (#1245 by @ternaus)
    • GridDistortion now supports the normalized flag. When it is set to True, the distortion is applied inside the image border. (#722 by @poke1024)
    • You can now specify the downscale and upscale interpolation methods for Downscale. This is needed to avoid interpolation artifacts. (#584 by @nathanhubens)
    • Refactoring. Spatial transforms moved to geometric files. (#1241 by @ternaus)
    • Refactoring. Common functions moved into (#1260 by @Dipet)
    • Refactoring. Blur transforms moved into albumentations.augmentations.blur. (#1259 by @Dipet)
  • 1.2.1(Jul 12, 2022)


    Minor changes

    • A.Rotate and A.ShiftScaleRotate now support new rotation method for bounding boxes, ellipse. (#1203 by @victor1cea)
    • A.Rotate now supports new argument crop_border. If set to True, the rotated image will be cropped as much as possible to eliminate pixel values at the edges that were not well defined after rotation. (#1214 by @bonlime)
    • Tests that use multiprocessing now run much faster (#1218 by @Dipet)
    • Improved type hints (#1219 by @Dipet )
    • Fixed a deprecation warning in match_histograms. (#1121 by @BloodAxe)


    • A.CropNonEmptyMaskIfExists modified the first element of masks in-place. Now, this behavior is fixed and A.CropNonEmptyMaskIfExists doesn't do in-place modification of input masks. (#1193 by @ORippler).
    • Albumentations now correctly serializes and deserializes the fill_value and mask_fill_value parameters of A.GridDropout. (#1191 by @victor1cea)
    • A.ColorJitter now correctly works with A.ReplayCompose. (#1199 by @zakajd)
    • Fixed incorrect behavior of A.ColorJitter for np.float32 input images when contrast is set to 0 (previously, all values were set to 0.5 instead of using the average value). (#1207 by @Dipet)
    • A.Rotate, A.Affine and A.ShiftScaleRotate now do rotation in the same way. Fixed incorrect rotation angle for A.Affine. A.Rotate and A.ShiftScaleRotate now correctly rotate the keypoints 90 degrees and don't leave black lines around the edges of the image. (#1091 by @Dipet )
  • 1.2.0(Jun 15, 2022)

    New augmentations


    • A.UnsharpMask. This transform sharpens the input image using Unsharp Masking processing and overlays the result with the original image. (#1063 by @zakajd)
    • A.RingingOvershoot. This transform creates ringing or overshoot artifacts by convolving the image with a 2D sinc filter. (#1064 by @zakajd)
    • A.AdvancedBlur. This transform blurs the input image using a Generalized Normal filter with randomly selected parameters. It also adds multiplicative noise to generated kernel before convolution. (#1066 by @zakajd)
    • A.PixelDropout. This transformation randomly replaces pixels with the passed value. (#1082 by @Dipet)

    Bug fixes


    • Fixed a problem that prevented A.RandomShadow from working with non-contiguous input. (#1117 by @i-aki-y)
    • A.PadIfNeeded now works with an arbitrary number of channels. (#1069 by @BloodAxe)
    • Fixed all np.random use cases to prevent identical values when using multiprocessing. (#1070 by @Dipet)
    • The slant param now has an effect in A.RandomRain. (#1179 by @victor1cea)
    • translate_percent now uses 0 as a default value in the A.Affine transform. (#1183 by @victor1cea)
    • A.SafeRotate no longer loses bboxes and keypoints. (#1109 by @Dipet)
    • A.CropAndPad now correctly handles bboxes when keep_size=True. (#1059 by @cannon)
    • A.RandomCrop, A.RandomSizedCrop, and A.RandomSizedBBoxSafeCrop now sample last pixel. (#1080 by @Multihuntr)

    Minor changes:

    • Old code is refactored, and more type hints are added (#1052 by @Dipet).
    • A.Compose now warns the user if it receives a single augmentation instead of a sequence of augmentations. (#1055 by @Dipet)
    • A.CoarseDropout and A.RandomGridShuffle now support keypoints. (#1084 by @BloodAxe)
    • A.ToTensorV2 now supports the masks target. (#1097 by @alessiobonfiglio)
    • A.PadIfNeeded now supports random padding. (#1160 by @mys007 )
    • Improved and corrected documentation: #1047 by @shyn, #1164 by @notplus, #1105 by @i-aki-y
    • Sped up tests by removing unnecessary tests. (#1188 by @creafz)
    • A.Affine now has keep_ratio flag. (#1104 by @i-aki-y)
  • 1.1.0(Oct 4, 2021)


    New augmentations

    • TemplateTransform. This transform allows the blending of an input image with specified templates. (#572 by @akarsakov )
    • PixelDistributionAdaptation. A new domain adaptation augmentation. It fits a simple transform on both the original and reference image, transforms the original image with transform trained on this image, and performs inverse transformation using transform fitted on the reference image. See the examples of this transform in the qudida repository. (#959 by @arsenyinfo)

    Minor changes:

    • LongestMaxSize and SmallestMaxSize now can also accept a list of sizes as their max_size argument and the actual max_size value will be sampled randomly from this list. (#930 by @kmistry-wx )
    • A.Affine now accepts mask_interpolation as a parameter. (#975 by @dskkato )
    • A.RandomRain now alters brightness in HSV space instead of HLS space to prevent image corruption. (#990 by @ErlingLie)
    • Albumentations now raises ValueError if bbox_params is not specified and bbox transformation is called (#1013 by @VirajBagal)
    • CoarseDropout can now set the height and width of holes based on the fraction of original image height and width (#1014 by @VirajBagal )
    • ElasticTransform got performance optimizations. (#1004 by @b0nce)
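
    The list-valued max_size option for LongestMaxSize described above can be pictured with a small sketch. This is not the library's implementation: the function name longest_max_size_shape and the rounding rule are assumptions made for illustration. It only shows how a size is drawn from the list and how the output shape follows from it.

```python
import random


def longest_max_size_shape(height, width, max_size):
    """Return the (height, width) an image would have after rescaling.

    max_size may be a single int or a sequence of ints; in the latter
    case one value is sampled at random (hypothetical helper).
    """
    if not isinstance(max_size, int):
        max_size = random.choice(list(max_size))
    # Scale so that the longest side becomes max_size, keeping aspect ratio.
    scale = max_size / max(height, width)
    return round(height * scale), round(width * scale)


fixed_shape = longest_max_size_shape(480, 640, 320)
sampled_shape = longest_max_size_shape(480, 640, [256, 512])
print(fixed_shape, sampled_shape)
```

    Each call with a list may return a different shape, which is what makes the option useful for multi-scale training.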

    Bug fixes


    • Fixed a bug where CropNonEmptyMaskIfExists threw an error when it was used with keypoints, even though keypoints were listed as a supported target. (#986 by @GalDude33 )
    • Fixed KeyError with RandomCropNearBBox when it received values with x_min <= 0 or y_min <= 0 (#993 by @Dipet )
  • 1.0.3(Jul 15, 2021)

    • Fixed problem with incorrect shape at keypoints and bboxes processors after ToTensorV2 #963
    • Fixed problems with float values in YOLO format in edge cases #958
  • 1.0.2(Jul 9, 2021)

    1. Fixed YOLO format conversion problem when bbox greater than image by 1 pixel. Now YOLO bbox will be converted to Albumentations format without bbox denormalization. More info in PR: #924
    2. Removed redundant search of first & last dual transform #946
  • 1.0.1(Jul 6, 2021)

    Added position argument to PadIfNeeded (#933 by @yisaienkov)

    Possible values: center, top_left, top_right, bottom_left, bottom_right, with center being the default value.

    One possible use case for this feature is object detection, where you need to pad an image to a square but want the predicted bounding boxes to match the bounding boxes of the unpadded image.
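
    To make the effect of position concrete, here is a minimal sketch of how the padding on each side could be derived. It assumes that position names the corner where the original image ends up, so top_left puts all padding at the bottom and on the right. The helper compute_padding is hypothetical and is not the library's code.

```python
def compute_padding(height, width, min_height, min_width, position="center"):
    """Return (top, bottom, left, right) padding in pixels (hypothetical helper)."""
    pad_h = max(0, min_height - height)
    pad_w = max(0, min_width - width)

    if position == "center":
        top, left = pad_h // 2, pad_w // 2
    elif position == "top_left":
        top, left = 0, 0
    elif position == "top_right":
        top, left = 0, pad_w
    elif position == "bottom_left":
        top, left = pad_h, 0
    elif position == "bottom_right":
        top, left = pad_h, pad_w
    else:
        raise ValueError(f"Unknown position: {position}")

    # Whatever is not placed before the image goes after it.
    return top, pad_h - top, left, pad_w - left


# Padding a 300x500 (height x width) image to a 500x500 square.
print(compute_padding(300, 500, 500, 500, position="top_left"))
print(compute_padding(300, 500, 500, 500, position="center"))
```

    With top_left, the top and left padding are both zero, so pixel coordinates of boxes in the padded image are identical to those in the original image.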


  • 1.0.0(Jun 1, 2021)

    Breaking changes

    • imgaug dependency is now optional, and by default, Albumentations won't install it. This change was necessary to prevent simultaneous install of both opencv-python-headless and opencv-python (you can read more about the problem in this issue). If you still need imgaug as a dependency, you can use the pip install -U albumentations[imgaug] command to install Albumentations with imgaug.
    • Deprecated augmentation ToTensor that converts NumPy arrays to PyTorch tensors is completely removed from Albumentations. You will get a RuntimeError exception if you try to use it. Please switch to ToTensorV2 in your pipelines.

    New augmentations

    • A.RandomToneCurve. See a notebook for examples of this augmentation (#839 by @aaroswings)
    • SafeRotate. Safely Rotate Images Without Cropping (#888 by @deleomike)
    • SomeOf transform that applies N augmentations from a list. A generalization of OneOf (#889 by @henrique)
    • We are deprecating imgaug transforms and providing Albumentations' implementations for them. (#786 by @KiriLev, #787 by @KiriLev, #790, #843, #844, #849, #885, #892)

    By default, Albumentations doesn't require imgaug as a dependency. But if you need imgaug, you can install it along with Albumentations by running pip install -U albumentations[imgaug].

    Here is a table of deprecated imgaug augmentations and respective augmentations from Albumentations that you should use instead:

    | Old deprecated augmentation | New augmentation |
    |-----------------------------|------------------|
    | IAACropAndPad               | CropAndPad       |
    | IAAFliplr                   | HorizontalFlip   |
    | IAAFlipud                   | VerticalFlip     |
    | IAAEmboss                   | Emboss           |
    | IAASharpen                  | Sharpen          |
    | IAAAdditiveGaussianNoise    | GaussNoise       |
    | IAAPerspective              | Perspective      |
    | IAASuperpixels              | Superpixels      |
    | IAAAffine                   | Affine           |
    | IAAPiecewiseAffine          | PiecewiseAffine  |

    Major changes

    • Serialization logic is updated. Previously, Albumentations used the full classpath to identify an augmentation (e.g. albumentations.augmentations.transforms.RandomCrop). With the updated logic, Albumentations will use only the class name for augmentations defined in the library (e.g., RandomCrop). For custom augmentations created by users and not distributed with Albumentations, the library will continue to use the full classpath to avoid name collisions (e.g., when a user creates a custom augmentation named RandomCrop and uses it in a pipeline).

      This new logic will allow us to refactor the code without breaking serialized augmentation pipelines created using previous versions of Albumentations. This change will also reduce the size of YAML and JSON files with serialized data.

      The new serialization logic is backward compatible. You can load serialized augmentation pipelines created in previous versions of Albumentations because Albumentations supports the old format.
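
      The naming rule described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper serialized_name and the my_project.transforms module are hypothetical, and the two classes are created on the fly only to mimic a library class and a user class that share a name.

```python
def serialized_name(cls):
    """Pick the identifier stored for a transform class (hypothetical helper).

    Classes that live in the albumentations package are stored by class
    name only; anything else keeps its full classpath to avoid collisions.
    """
    top_level_package = cls.__module__.split(".")[0]
    if top_level_package == "albumentations":
        return cls.__name__
    return f"{cls.__module__}.{cls.__name__}"


# Stand-ins for a library transform and a user-defined transform with the same name.
LibraryRandomCrop = type(
    "RandomCrop", (), {"__module__": "albumentations.augmentations.transforms"}
)
CustomRandomCrop = type("RandomCrop", (), {"__module__": "my_project.transforms"})

print(serialized_name(LibraryRandomCrop))
print(serialized_name(CustomRandomCrop))
```

      Because the library class is stored without its module path, moving it to another module inside the package does not change the stored name.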

    Bug fixes


    • Fixed a bug that prevented A.ReplayCompose from working correctly with bounding boxes and keypoints. (#748)
    • A.GlassBlur now correctly works with float32 inputs (#826)
    • MultiplicativeNoise now correctly works with gray images with shape [h, w, 1]. (#793)

    Minor changes

    • Code for geometric transforms moved to a standalone module albumentations.augmentations.geometric. (#784)
    • Code for crop transforms moved to a standalone module albumentations.augmentations.crops. (#791)
    • CI now runs tests under Python 3.9 as well (#830)
    • Linters and code formatters for CI and pre-commit hooks are updated to the latest versions (#831)
    • The logic that detects existing installations of OpenCV now also looks for opencv-contrib-python and opencv-contrib-python-headless (#837 by @agchang-cgl)
  • 0.5.2(Nov 29, 2020)

    Minor changes

    • ToTensorV2 now automatically expands grayscale images with the shape [H, W] to the shape [H, W, 1]. PR #604 by @Ingwar.
    • CropNonEmptyMaskIfExists now also works with multiple masks that are provided by the masks argument to the transform function. Previously this augmentation worked only with a single mask provided by the mask argument. PR #761
  • 0.5.1(Nov 2, 2020)

    Breaking changes

    • The API for A.FDA is changed to resemble the API of A.HistogramMatching. Now, both transformations expect to receive a list of reference images, a function to read those images, and additional augmentation parameters. (#734)
    • A.HistogramMatching now uses read_rgb_image as the default read_fn. This function reads an image from the disk as an RGB NumPy array. Previously, the default read_fn was cv2.imread, which reads an image as a BGR NumPy array. (#734)

    New transformations

    • A.Sequential transform that can apply augmentations in a sequence. This transform is not intended to be a replacement for A.Compose. Instead, it should be used inside A.Compose the same way as A.OneOf or A.OneOrOther. For instance, you can combine A.OneOf with A.Sequential to create an augmentation pipeline containing multiple sequences of augmentations and apply one randomly chosen sequence to input data. (#735)

    Minor changes

    • A.ShiftScaleRotate now has two additional optional parameters: shift_limit_x and shift_limit_y. If either of those parameters (or both of them) is set, A.ShiftScaleRotate will use the set values to shift images on the respective axis. (#735)
    • A.ToTensorV2 now supports an additional argument transpose_mask (False by default). If the argument is set to True and an input mask has 3 dimensions, A.ToTensorV2 will transpose dimensions of a mask tensor in addition to transposing dimensions of an image tensor. (#735)

    Bug fixes


    • A.FDA now correctly uses coordinates of the center of an image. (#730)
    • Fixed problems with grayscale images for A.HistogramMatching. (#734)
    • Fixed a bug that led to an exception when A.load() was called to deserialize a pipeline that contained A.ToTensor or A.ToTensorV2, but those transforms were not imported in the code before the call. (#735)
  • 0.5.0(Oct 19, 2020)

    Breaking changes

    • Albumentations now explicitly checks that all inputs to augmentations are named arguments and raise an exception otherwise. So if an augmentation receives input like aug(image) instead of aug(image=image), Albumentations will raise an exception. (#560)
    • Dropped support of Python 3.5 (#709)
    • Keypoints and bboxes are checked for visibility after each transform (#566)
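
    The named-argument check from the first item above can be illustrated with a short sketch. The class NamedArgsOnly is hypothetical, and the exception type used here (TypeError) is an assumption, since the note only says that an exception is raised.

```python
class NamedArgsOnly:
    """Hypothetical callable that accepts data only as named arguments."""

    def __call__(self, *args, **data):
        if args:
            # Positional input such as aug(image) is rejected outright.
            raise TypeError(
                "Pass data as named arguments, for example: aug(image=image)"
            )
        # A real augmentation would transform the targets here.
        return data


aug = NamedArgsOnly()
result = aug(image="pixels", mask="labels")
print(sorted(result))
```

    Requiring names is what lets a pipeline tell an image from a mask or a list of bounding boxes without guessing from position.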

    New transformations

    • A.FDA transform for Fourier-based domain adaptation. (#685)
    • A.HistogramMatching transform that applies histogram matching. (#708)
    • A.ColorJitter transform that behaves similarly to ColorJitter from torchvision (though there are some minor differences due to different internal logic for working with HSV colorspace in Pillow, which is used in torchvision and OpenCV, which is used in Albumentations). (#705)

    Minor changes

    • A.PadIfNeeded now accepts additional pad_width_divisor and pad_height_divisor arguments (None by default) to ensure that the image width and height are divisible by the given values. (#700)
    • Added support to apply A.CoarseDropout to masks via mask_fill_value. (#699)
    • A.GaussianBlur now supports the sigma parameter that sets standard deviation for Gaussian kernel. (#674, #673)
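
    The divisor-based padding mentioned above for A.PadIfNeeded comes down to rounding each side up to the next multiple of the divisor. A minimal sketch of that arithmetic, with a hypothetical helper name:

```python
def padded_size(size, divisor):
    """Smallest value >= size that is divisible by divisor (hypothetical helper)."""
    remainder = size % divisor
    if remainder == 0:
        return size
    return size + divisor - remainder


# A 250x300 image padded so that both sides are multiples of 32.
print(padded_size(250, 32), padded_size(300, 32))
```

    This is commonly needed for networks whose downsampling path requires input sides to be multiples of a fixed stride.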

    Bug fixes


    • Fixed bugs in A.HueSaturationValue for float dtype. (#696, #710)
    • Fixed incorrect rounding error on bboxes in YOLO format. (#688)
  • 0.4.6(Jul 19, 2020)


    • Change the ImgAug dependency version from "imgaug>=0.2.5,<0.2.7" to "imgaug>=0.4.0". Now Albumentations won't downgrade your existing ImgAug installation to the old version. PR #658.
    • Do not try to resize an image if it already has the required height and width. That eliminates the redundant call to the OpenCV function that requires additional copying of the input data. PR #639.
    • ReplayCompose is now serializable. PR #623 by IlyaOvodov
    • Documentation fixes and updates.

    Bug Fixes

    • Fix a bug that causes some keypoints and bounding boxes to lie outside the visible part of the augmented image if an augmentation pipeline contained augmentations that increase the height and width of an image (such as PadIfNeeded). That happened because Albumentations checked which bounding boxes and keypoints lie outside the image only after applying all augmentations. Now Albumentations will check and remove keypoints and bounding boxes that lie outside the image after each augmentation. If, for some reason, you need the old behavior, pass check_each_transform=False in your KeypointParams or BboxParams. Issue #565 and PR #566.
    • Fix a bug that causes an exception when Albumentations received images with the number of color channels that are even but are not multiples of 4 (such as 6, 10, etc.). PR #638.
    • Fix the off-by-one error in applying steps for GridDistortion. Commit 9c225a99a379594098dbea2a077fd22da684ade9
    • Fix bugs that prevent serialization of ImageCompression and GaussNoise. PR #569
    • Fix a bug that causes errors with some values for label_fields in BboxParams. PR #504 by IlyaOvodov
    • Fix a bug that prevents HueSaturationValue from working with grayscale images. PR #500.
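
    The first fix in the list above (checking keypoints after each augmentation rather than only at the end) is easiest to see on a toy pipeline. The sketch below uses hypothetical crop, pad, and inside helpers on plain (x, y) tuples; it is not the library's code, and only mirrors the idea behind check_each_transform.

```python
def crop(points, x_min, y_min, x_max, y_max):
    """Move keypoints into the crop's coordinate frame; return points and (width, height)."""
    shifted = [(x - x_min, y - y_min) for x, y in points]
    return shifted, (x_max - x_min, y_max - y_min)


def pad(points, size, pad_px):
    """Pad every side by pad_px pixels; return points and the new (width, height)."""
    width, height = size
    shifted = [(x + pad_px, y + pad_px) for x, y in points]
    return shifted, (width + 2 * pad_px, height + 2 * pad_px)


def inside(points, size):
    """Keep only keypoints that fall within the image bounds."""
    width, height = size
    return [(x, y) for x, y in points if 0 <= x < width and 0 <= y < height]


def run_pipeline(points, check_each_transform):
    points, size = crop(points, 20, 20, 60, 60)
    if check_each_transform:
        points = inside(points, size)
    points, size = pad(points, size, 10)
    # The final check always runs, as in the old behavior.
    return inside(points, size)


keypoints = [(30, 30), (15, 30)]  # the second point is cut off by the crop
print(run_pipeline(keypoints, check_each_transform=True))
print(run_pipeline(keypoints, check_each_transform=False))
```

    With the per-transform check, the cropped-out keypoint is dropped immediately. Without it, padding moves the point back inside the image bounds, so it survives the final check even though it lies in the padded border.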
  • 0.4.0(Oct 14, 2019)

    New transforms

    ISONoise Target: image

    This transform mimics the noise that images will have if the ISO parameter of the camera is high.

    Solarize Targets: image

    Solarize inverts all pixels above some threshold. It is an essential part of the work AutoAugment: Learning Augmentation Policies from Data.
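
    As a rough illustration of the operation on 8-bit values, here is a minimal sketch. It works on a flat list of pixel intensities, and it assumes that values equal to the threshold are inverted as well; the function is hypothetical and is not the library's implementation.

```python
def solarize(pixels, threshold=128, max_value=255):
    """Invert every pixel value at or above the threshold (hypothetical helper)."""
    return [max_value - p if p >= threshold else p for p in pixels]


row = [0, 100, 128, 200, 255]
print(solarize(row))
```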

    Equalize Target: image

    Equalizes image histogram. It is an essential part of the work AutoAugment: Learning Augmentation Policies from Data.

    Posterize Target: image

    Reduce the number of bits for each pixel. It is an essential part of the work AutoAugment: Learning Augmentation Policies from Data.
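
    Bit reduction on 8-bit values can be sketched with a mask that keeps only the highest bits. The helper below is hypothetical and operates on a flat list of intensities; it illustrates the idea rather than the library's code.

```python
def posterize(pixels, num_bits=4):
    """Keep only the num_bits most significant bits of each 8-bit value."""
    if not 0 <= num_bits <= 8:
        raise ValueError("num_bits must be between 0 and 8")
    # For num_bits=4 the mask is 0b11110000.
    mask = (0xFF << (8 - num_bits)) & 0xFF
    return [p & mask for p in pixels]


row = [0, 37, 130, 255]
print(posterize(row, num_bits=4))
```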


    Target: image

    Decreases image quality by applying JPEG or WebP compression.

    Downscale Target: image

    Decreases image quality by downscaling and upscaling back.

    RandomResizedCrop Targets: image, mask, bboxes, keypoints

    Crops the given image to a random size and aspect ratio. This transform is an essential part of many image classification pipelines and is very popular for ImageNet classification.

    It has the same API as RandomResizedCrop in torchvision.

    RandomGridShuffle Targets: image, mask

    Partition an image into tiles. Shuffle them and merge back.


    Targets: image, mask, bboxes, keypoints

    Crop area with a mask if the mask is non-empty, else make a random crop.

    ToTensorV2 Targets: image, mask

    Convert image and mask to torch.Tensor

    New features

    Added YOLO format to bounding boxes.

    A bounding box in the YOLO format is written as [x_center, y_center, width, height], where all values are normalized to the size of the image. Ex: [0.3, 0.1, 0.05, 0.07]
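
    A small conversion sketch may help when reading such values. It assumes that the first two numbers are the box center, as in the usual YOLO convention, and converts to pixel corner coordinates. The helper name and the 640x480 image size are illustrative only.

```python
def yolo_to_pascal_voc(bbox, image_width, image_height):
    """Convert a normalized YOLO box to [x_min, y_min, x_max, y_max] in pixels."""
    x_center, y_center, width, height = bbox
    x_min = (x_center - width / 2) * image_width
    y_min = (y_center - height / 2) * image_height
    x_max = (x_center + width / 2) * image_width
    y_max = (y_center + height / 2) * image_height
    return [x_min, y_min, x_max, y_max]


pixel_box = yolo_to_pascal_voc([0.3, 0.1, 0.05, 0.07], image_width=640, image_height=480)
print(pixel_box)
```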

    Added Deterministic / Replay mode

    An augmentation pipeline involves a lot of randomness, which makes it hard to debug. We added a Deterministic / Replay mode in which you can track what parameters were applied to the input and apply precisely the same transform to another input if necessary.

    Jupyter notebook with an example.
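
    The idea behind replay can be sketched without the library: a transform returns the random parameters it sampled alongside its output, and the saved parameters can then be applied to other data. The class ReplayableShift below is entirely hypothetical and only illustrates the mechanism.

```python
import random


class ReplayableShift:
    """Hypothetical transform that adds one random integer offset to all values."""

    def __init__(self, limit=10):
        self.limit = limit

    @staticmethod
    def apply(values, offset):
        return [v + offset for v in values]

    def __call__(self, values):
        # Sample the parameter once and record it next to the result.
        offset = random.randint(-self.limit, self.limit)
        return {"values": self.apply(values, offset), "replay": {"offset": offset}}

    def replay(self, saved, values):
        # Reuse the recorded parameter instead of sampling a new one.
        return self.apply(values, saved["offset"])


transform = ReplayableShift(limit=10)
first = transform([0, 1, 2])
second = transform.replay(first["replay"], [100, 200])
print(first["replay"], second)
```

    Saving the sampled parameters also makes it possible to inspect exactly which values produced a suspicious training sample.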

    Added fill_value to the Cutout transform.

    Separated fill_value for images and masks

    One of the use cases is to use a mask_value that is equal to the ignore_index of your loss. This will decrease the level of noise and may improve convergence.

    Speedup in the RGBShift

    3.2 times faster for uint8 images.

    Speedup in HueSaturationValue

    2 times faster for uint8 images.

    Speedup in RandomBrightnessContrast

    2.7 times faster for uint8 images.

    Speedup in RandomGamma

    4 times faster for uint8 images.

    Added support for images and masks with more than 3 channels

    Added keypoints support. Not all spatial transforms have keypoints support yet. In this release we added Crop, CropNonEmptyMaskIfExists, LongestMaxSize, RandomCropNearBBox, Resize, SmallestMaxSize, and Transpose.

    Add per channel transform composition

    Bug Fixes

    • Bugfix in the GaussNoise
    • Bugfix in the RandomGamma
    • Bugfix in the RandomSizedBBoxSafeCrop

    Documentation Updated

    Added a page that lists pre-prints and papers that cite albumentations

    We are delighted that albumentations is helpful to the academic community. We extended the documentation with a page that lists all papers and preprints that cite albumentations in their work. This page is automatically generated by parsing Google Scholar. At this moment, this number is 24.

    Added a page that lists competitions in which top teams used albumentations.

    We are delighted that albumentations helps people to get top results in machine learning competitions at Kaggle and other platforms. We added a "Hall of Fame" where people can share their achievements. This page is manually created. We encourage people to add more information about their results with pull requests, following the contributing guide.

    People that made this release happen

    @albu @Dipet @creafz @BloodAxe @ternaus @vfdev-5 @arsenyinfo @qubvel @toshiks @Jae-Hyuck @BelBES @alekseynp @timeous @jveitchmichaelis @bfialkoff

  • 0.3.0(Jun 26, 2019)

    Added serialization / deserialization

    • Now we can define transformations in a python dictionary, json, yaml files and they will be deserialized and used in the code.
    • Now we can define transformations in the code and serialize them in python dictionary, json and yaml files.

    Jupyter notebook with an example

    Special thanks to @creafz

    Added new transformations

    Special thanks to @vfdev-5 @ternaus @BloodAxe @kirillbobyrev

    Bugfixes and improvements

    Special thanks to @qubvel @ternaus @albu @BloodAxe

  • 0.2.0(Mar 4, 2019)

    Added support for the keypoint transformations to

    Notebook with an example

    Special thanks to Evegene Khvedchenya (@BloodAxe) for the work.

    Added an option to apply the same transformation to more than one target of the same type.

    Possible use cases are image2image or stereo-image pipelines.

    Notebook with an example

    Special thanks to Alexander Buslaev (@albu) for the work.

    Added new transformations

    Speed up in

    Bug fixes

    And many others.


    • Performance benchmark was extended to the Augmentor and Solt libraries.
    • Added table to Readme that shows all implemented transformations with the set of possible targets: images, bounding boxes, masks, key points. (Special thanks to Alex Parinov @creafz )
    • The library can be installed in anaconda.


    @BloodAxe @albu @creafz @ternaus @erikgaas @marcocaccin @libfun @DBusAI @alexobednikov @StrikerRUS @IlyaOvodov @ZFTurbo @Vcv85 @georgymironov @LinaShiryaeva @vfdev-5 @daisukelab @cdicle

  • v0.1.1(Sep 26, 2018)

    Bounding boxes support

    Transformations that support bounding boxes

    The main change in this release is the addition of the operations on bounding boxes to the

    Supported formats

    The following bounding box formats are currently supported:

    1. COCO: [x_min, y_min, width, height], ex [97, 12, 150, 200]
    2. Pascal VOC: [x_min, y_min, x_max, y_max], ex [97, 12, 247, 212]
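
    The two formats differ only in how the far corner is expressed, so converting between them is a matter of adding or subtracting the box size. A minimal sketch with hypothetical helper names, using the example values from the list above:

```python
def coco_to_pascal_voc(bbox):
    """Convert [x_min, y_min, width, height] to [x_min, y_min, x_max, y_max]."""
    x_min, y_min, width, height = bbox
    return [x_min, y_min, x_min + width, y_min + height]


def pascal_voc_to_coco(bbox):
    """Convert [x_min, y_min, x_max, y_max] to [x_min, y_min, width, height]."""
    x_min, y_min, x_max, y_max = bbox
    return [x_min, y_min, x_max - x_min, y_max - y_min]


coco_box = [97, 12, 150, 200]
voc_box = coco_to_pascal_voc(coco_box)
print(voc_box)
```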

    Bounding box filtering

    After a transformation, a large part of a bounding box may end up cropped out of the image, and such boxes often need to be excluded.

    We support bounding box filtering based on:

    • Bounding box area, measured in pixels.
    • Visible box area, measured in percent.
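
    The two criteria can be sketched as a simple filter over boxes in [x_min, y_min, x_max, y_max] pixel coordinates. This is an illustration, not the library's code: the helper names are hypothetical, the parameters are called min_area and min_visibility here for convenience, and visibility is expressed as a fraction between 0 and 1 rather than in percent.

```python
def box_area(box):
    x_min, y_min, x_max, y_max = box
    return max(0, x_max - x_min) * max(0, y_max - y_min)


def clip_box(box, image_width, image_height):
    """Cut the box down to the part that lies inside the image."""
    x_min, y_min, x_max, y_max = box
    return [max(0, x_min), max(0, y_min), min(image_width, x_max), min(image_height, y_max)]


def filter_boxes(boxes, image_width, image_height, min_area=0.0, min_visibility=0.0):
    """Keep clipped boxes that are large enough and visible enough."""
    kept = []
    for box in boxes:
        original_area = box_area(box)
        clipped = clip_box(box, image_width, image_height)
        visible_area = box_area(clipped)
        if original_area == 0 or visible_area < min_area:
            continue  # too small in absolute terms
        if visible_area / original_area < min_visibility:
            continue  # too much of the box was cut away
        kept.append(clipped)
    return kept


boxes = [[10, 10, 50, 50], [-30, 0, 10, 40], [90, 90, 100, 100]]
print(filter_boxes(boxes, 100, 100, min_area=200, min_visibility=0.5))
```

    In this example the second box is dropped because only a quarter of it remains visible, and the third because its area is below the pixel threshold.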

    Smaller changes

    • Added support for 8-bit images.
    • We changed all np.random occurrences to random due to the numpy behavior reported in
    • Multiple bugfixes.

    Added notebooks with examples
