
CV Study Notes [1]: transforms

2022-07-17 05:10:00 【zzzyzh】

Contents

Preface
1. Importing libraries
2. Cropping
2. Flipping and rotation
3. Image transforms
4. Operating on transforms
Summary

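Preface

Image transformation is often the first step in a CV pipeline: images usually have to be brought to a suitable size before they can be fed into a network for training.
This article groups and explains the transforms-related methods in the TorchVision documentation. Usage examples are given in the cropping section; the methods in later sections are called in the same way.

1. Importing libraries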

from torchvision import transforms
import PIL.Image as Image
import torch

image = Image.open("cat.png")
image.size, image.format, image.mode

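2. Cropping

transforms.CenterCrop(size)
Crops the image at its center.

size (sequence or int)
Desired output size of the crop
- If size is an int, a square crop (size, size) is made
- If size is a sequence, a crop of the corresponding size (height, width) is made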

center_crop = transforms.CenterCrop(40)
image_crop = center_crop(image)
image_crop.size

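transforms.RandomCrop(size, padding=None, pad_if_needed=False, fill=0, padding_mode='constant')
Crops the image at a random location.

size (sequence or int)
Desired output size of the crop
- If size is an int, a square crop (size, size) is made
- If size is a sequence, a crop of the corresponding size (height, width) is made
padding (sequence or int, optional)
Padding added to the borders of the image before cropping. Default: None (no padding)
- int: pads all four borders by that many pixels
- sequence of length 2: padding for left/right and top/bottom respectively
- sequence of length 4: padding for left, top, right and bottom respectively
pad_if_needed (boolean)
If the image is smaller than the desired size, it is padded first so that no exception is raised.
fill
Pixel fill value for constant padding. Default: 0. A tuple of length 3 fills the R, G and B channels respectively.
Only used when padding_mode is 'constant'.
padding_mode
Type of padding. Default: 'constant'
- 'constant': pads with a constant value, given by fill
- 'edge': pads with the last value at the edge of the image
- 'reflect': pads with a reflection of the image, without repeating the edge value
- 'symmetric': pads with a reflection of the image, repeating the edge value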

random_crop = transforms.RandomCrop(100, pad_if_needed=True)
image_crop = random_crop(image)
image_crop.size

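# Seed torch's RNG so that the random crop is reproducible. To crop a label with the
# same region, call torch.random.manual_seed(seed) again right before cropping the label.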
seed = torch.random.seed()
torch.random.manual_seed(seed)
random_crop = transforms.RandomCrop(300)
image_crop = random_crop(image)
image_crop.size

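transforms.RandomResizedCrop(size, scale=(0.08, 1.0), ratio=(0.75, 1.3333333333333333), interpolation=InterpolationMode.BILINEAR)
Crops a random portion of the image with a random aspect ratio, then resizes it to the given size.

size (sequence or int)
Desired output size after resizing
- If size is an int, the output is square (size, size)
- If size is a sequence, the output has the corresponding size (height, width)
scale (tuple of float)
Lower and upper bounds for the area of the crop, as a fraction of the original image area, before resizing
Default: 0.08 to 1.0 of the original image
ratio (tuple of float)
Lower and upper bounds for the aspect ratio of the crop, before resizing
Default: 3/4 to 4/3
interpolation (InterpolationMode)
Interpolation used for resizing
- Default: InterpolationMode.BILINEAR
- Supported values: InterpolationMode.NEAREST, InterpolationMode.BILINEAR and InterpolationMode.BICUBIC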

random_size_crop = transforms.RandomResizedCrop(100)
image_crop = random_size_crop(image)
image_crop.size

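transforms.FiveCrop(size)
Crops the image at its four corners and its center, returning a tuple of five images of the given size.

size (sequence or int)
Desired output size of the crop
- If size is an int, a square crop (size, size) is made
- If size is a sequence, a crop of the corresponding size (height, width) is made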

five_crop = transforms.FiveCrop(40)
image_crop = five_crop(image)
image_crop

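transforms.TenCrop(size, vertical_flip=False)
Crops the given image at its four corners and its center, and adds the flipped version of each crop (horizontal flipping by default).
This transform returns a tuple of images, so the number of inputs and the number of targets returned by your dataset may no longer match.

size (sequence or int)
Desired output size of the crop
- If size is an int, a square crop (size, size) is made
- If size is a sequence, a crop of the corresponding size (height, width) is made
vertical_flip (bool)
If True, vertical flipping is used instead of horizontal flipping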

ten_crop = transforms.TenCrop(40)
image_crop = ten_crop(image)
image_crop

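2. Flipping and rotation

transforms.RandomHorizontalFlip(p=0.5)
Horizontally flips the given image at random with the given probability.

p (float)
Probability of the image being flipped. Default: 0.5
The whole image is either flipped or left unchanged; p is not the fraction of the image that gets flipped.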

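transforms.RandomVerticalFlip(p=0.5)
Vertically flips the given image at random with the given probability.

p (float)
Probability of the image being flipped. Default: 0.5
The whole image is either flipped or left unchanged; p is not the fraction of the image that gets flipped.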

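transforms.RandomRotation(degrees, resample=False, expand=False, center=None)
Rotates the image by a random angle.

degrees (sequence or float or int)
Range of degrees to select from
- If degrees is a number, the range is (-degrees, +degrees)
- If degrees is a sequence like (min, max), the range is (min, max)
resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC})
Optional resampling filter. Newer torchvision releases replace this argument with interpolation (InterpolationMode).
expand (bool, optional)
Optional expansion flag.
- If true, the output is expanded so that it is large enough to hold the whole rotated image
- If false or omitted, the output image has the same size as the input image
- Note that the expand flag assumes rotation around the center and no translation
center (2-tuple, optional)
Optional center of rotation.
- The origin is the upper left corner. Default: the center of the image.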

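3. Image transforms

transforms.Resize(size, interpolation=InterpolationMode.BILINEAR)
Resizes the input image to the given size.

size (sequence or int)
Desired output size
- If size is an int, the smaller edge of the image is matched to size and the aspect ratio is kept
- If size is a sequence (height, width), the output is resized to exactly that size
interpolation (InterpolationMode)
Interpolation
- Default: InterpolationMode.BILINEAR
- Supported values: InterpolationMode.NEAREST, InterpolationMode.BILINEAR and InterpolationMode.BICUBIC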

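transforms.Normalize(mean, std)
Normalizes a tensor image with a mean and standard deviation: output[channel] = (input[channel] - mean[channel]) / std[channel]

mean (sequence)
Sequence of means, one per channel.
std (sequence)
Sequence of standard deviations, one per channel.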

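transforms.ToTensor()
Converts a PIL Image or numpy.ndarray to a tensor.

If the PIL image belongs to one of the modes (L, LA, P, I, F, RGB, YCbCr, RGBA, CMYK, 1),
or the ndarray has dtype = np.uint8,
the PIL image or numpy.ndarray (H x W x C) in the range [0, 255] is converted to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].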

# Instantiate a ToTensor object
to_tensor = transforms.ToTensor()
image_crop = to_tensor(image)
image_crop

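transforms.Pad(padding, fill=0, padding_mode='constant')
Pads the image on all sides with the given value.

padding (sequence or int)
Padding on each border (required, no default)
- int: pads all four borders by that many pixels
- sequence of length 2: padding for left/right and top/bottom respectively
- sequence of length 4: padding for left, top, right and bottom respectively
fill (int or tuple)
Pixel fill value for constant padding. Default: 0. A tuple of length 3 fills the R, G and B channels respectively.
Only used when padding_mode is 'constant'.
padding_mode (str)
Type of padding. Default: 'constant'
- 'constant': pads with a constant value, given by fill
- 'edge': pads with the last value at the edge of the image
- 'reflect': pads with a reflection of the image, without repeating the edge value
- 'symmetric': pads with a reflection of the image, repeating the edge value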

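transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0)
Randomly changes the brightness, contrast, saturation and hue of an image.

brightness (float or tuple(min, max))
- Range, depending on the argument type
  - float: [max(0, 1 - brightness), 1 + brightness]
  - tuple: [min, max]
- brightness_factor is chosen uniformly from that range
contrast (float or tuple(min, max))
- Range, depending on the argument type
  - float: [max(0, 1 - contrast), 1 + contrast]
  - tuple: [min, max]
- contrast_factor is chosen uniformly from that range
saturation (float or tuple(min, max))
- Range, depending on the argument type
  - float: [max(0, 1 - saturation), 1 + saturation]
  - tuple: [min, max]
- saturation_factor is chosen uniformly from that range
hue (float or tuple(min, max))
- Range, depending on the argument type
  - float: [-hue, hue], with 0 <= hue <= 0.5
  - tuple: [min, max], with -0.5 <= min <= max <= 0.5
- hue_factor is chosen uniformly from that range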

transforms.Grayscale(num_output_channels=1)
Converts the image to grayscale.

num_output_channels (int)
- If num_output_channels == 1 : returned image is single channel
- If num_output_channels == 3 : returned image is 3 channel with r == g == b

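transforms.RandomGrayscale(p=0.1)
Converts the image to grayscale at random with probability p.

p (float)
- Probability that the image is converted to grayscale
The number of output channels is determined by the input
- If input image is 1 channel: grayscale version is 1 channel
- If input image is 3 channel: grayscale version is 3 channel with r == g == b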

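transforms.LinearTransformation(transformation_matrix, mean_vector)
Transforms a tensor image with a square transformation matrix and a mean vector computed offline. This transform does not support PIL images. Given transformation_matrix and mean_vector, it flattens the torch.*Tensor, subtracts mean_vector from it, computes the dot product with the transformation matrix, and reshapes the tensor to its original shape.

transformation_matrix (Tensor) – tensor [D x D], D = C x H x W
mean_vector (Tensor) – tensor [D], D = C x H x W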

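transforms.RandomAffine(degrees, translate=None, scale=None, shear=None, interpolation=<InterpolationMode.NEAREST: 'nearest'>, fill=0, fillcolor=None, resample=None, center=None)
Applies a random affine transformation to the image while keeping its center invariant.

degrees (sequence or float or int)
Range of degrees to select from
- If degrees is a number, the range is (-degrees, +degrees)
- If degrees is a sequence like (min, max), the range is (min, max)
translate (tuple, optional)
Tuple (a, b) of maximum absolute fractions for horizontal and vertical translation
- The horizontal shift dx is sampled in the range -img_width * a < dx < img_width * a
- The vertical shift dy is sampled in the range -img_height * b < dy < img_height * b
scale (tuple, optional)
Scaling factor interval (a, b); the scale is sampled randomly from a <= scale <= b
shear (sequence or int or float, optional)
Range of degrees to select from
- int or float: shear parallel to the x axis in the range (-shear, +shear)
- sequence
  - 2 values: shear parallel to the x axis in the range (min, max)
  - 4 values: shear parallel to the x axis in the range (min_x, max_x) and parallel to the y axis in the range (min_y, max_y)
interpolation (InterpolationMode)
Interpolation
- Default: InterpolationMode.NEAREST
- Supported values: InterpolationMode.NEAREST, InterpolationMode.BILINEAR and InterpolationMode.BICUBIC
fill (int or tuple)
Pixel fill value for the area outside the transformed image. Default: 0. A tuple of length 3 fills the R, G and B channels respectively.
fillcolor (sequence)
Deprecated, use fill instead. Optional fill color (R, G, B) for the area outside the transform in the output image
resample ({PIL.Image.NEAREST, PIL.Image.BILINEAR, PIL.Image.BICUBIC})
Deprecated, use interpolation instead. Optional resampling filter
center (2-tuple, optional)
Optional center of rotation.
- The origin is the upper left corner. Default: the center of the image.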

transforms.ToPILImage(mode=None)
Converts a tensor or ndarray to a PIL image.

Converts a torch.*Tensor of shape C x H x W or a numpy ndarray of shape H x W x C to a PIL image while preserving the value range.

transforms.Lambda(lambd)
Applies a user-defined lambda as a transform.

lambd (function)
- Lambda/function to be used for transform.

4. Operating on transforms

transforms.Compose(transforms)
Composes several transforms together.

transforms (list of Transform objects)

transforms.Compose([
    transforms.CenterCrop(10),
    transforms.PILToTensor(),
    transforms.ConvertImageDtype(torch.float),
])

transforms.RandomChoice(transforms)
Applies a single transform picked at random from a list.

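transforms.RandomApply(transforms, p=0.5)
Applies a list of transforms at random with the given probability.

To make the transform scriptable with torch.jit.script, pass a torch.nn.ModuleList as transforms instead of a plain list.
transforms (list of Transform objects)
p (float)
Probability of applying the whole list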

transforms.RandomOrder(transforms)
Applies a list of transforms in a random order.

transforms (list of Transform objects)
The transforms to apply; all of them are applied, only the order is random

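Summary

Based on "PyTorch 学习笔记（三）：transforms的二十二个方法" (PyTorch Study Notes (3): The 22 Methods of transforms)

Original source

Copyright notice
This article was written by [zzzyzh]. Please include a link to the original when reposting. Thank you.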
https://blog.csdn.net/HoraceYan/article/details/125744166