This section explains data loading, preprocessing, formatting, and data augmentation. These operations run sequentially as a pipeline: each step receives a dict and outputs a dict for the next step. Once every step has run, the final dict is returned.
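The dict-in, dict-out contract can be illustrated with a minimal sketch. The names `run_pipeline`, `add_shape`, and `scale_pixels` are hypothetical stand-ins for illustration only and are not part of the library.

```python
from typing import Callable, Dict, List

Transform = Callable[[Dict], Dict]


def run_pipeline(transforms: List[Transform], results: Dict) -> Dict:
    """Apply each transform in order, feeding its output dict to the next."""
    for transform in transforms:
        results = transform(results)
    return results


# Two toy steps standing in for real transforms such as Resize or Normalize.
def add_shape(results: Dict) -> Dict:
    results["img_shape"] = (len(results["img"]), len(results["img"][0]))
    return results


def scale_pixels(results: Dict) -> Dict:
    results["img"] = [[p / 255 for p in row] for row in results["img"]]
    return results


out = run_pipeline([add_shape, scale_pixels], {"img": [[0, 255], [51, 102]]})
print(out["img_shape"])  # (2, 2)
```

Because every step shares the same signature, steps can be reordered or swapped through the configuration alone.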
The configuration items of YOLOv4 related to the pipeline are given below.
img_scale = 608
train_pipeline = [
    dict(
        type='Mosaic',
        img_scale=img_scale,
        transforms=[
            dict(type='LoadImageFromFile', to_rgb=True),
            dict(type='Resize', img_scale=img_scale, letterbox=False),
        ]),
    dict(type='RandomAffine', degrees=0, translate=0, scale=.5, shear=0.0),
    dict(type='RandomHSV'),
    dict(type='RandomFlip'),
    dict(type='Normalize'),
    dict(type='ImageToTensor'),
    dict(type='Collect'),
]
test_pipeline = [
    dict(type='LoadImageFromFile', to_rgb=True),
    dict(type='Resize', img_scale=img_scale, letterbox=True, auto=True, scaleup=True),
    dict(type='Normalize'),
    dict(type='ImageToTensor'),
    dict(type='Collect', keys=['img']),
]
val_pipeline = [
    dict(type='LoadImageFromFile', to_rgb=True),
    dict(type='Resize', img_scale=img_scale, letterbox=True, auto=False, scaleup=False),
    dict(type='Normalize'),
    dict(type='ImageToTensor'),
    dict(type='Collect'),
]
For each operation, the dict fields that it adds, updates, or removes are listed below. The fields are:
- img: the image as a 3D numpy array. Right after loading it holds the original RGB image; after subsequent augmentation, formatting, etc. it holds the processed image.
- ori_shape: the size of the original image, [h, w, d].
- img_shape: the size of the transformed image, [h, w, d].
- gt_bboxes: ground-truth boxes in [x1, y1, x2, y2] format. They are normalized to the range [0, 1] when the data is loaded: x1 = x1/w, y1 = y1/h, x2 = x2/w, y2 = y2/h.
- gt_class: category index.
- gt_score: score; the default is 1. If the MixUp method is used, gt_score may differ from 1.
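The box normalization described for gt_bboxes can be sketched as follows. The function name `normalize_bboxes` is hypothetical, and the sketch assumes boxes arrive as an (N, 4) array of pixel coordinates.

```python
import numpy as np


def normalize_bboxes(bboxes: np.ndarray, img_shape) -> np.ndarray:
    """Divide x coordinates by the width and y coordinates by the height."""
    h, w = img_shape[:2]
    scale = np.array([w, h, w, h], dtype=np.float32)
    return bboxes.astype(np.float32) / scale


boxes = np.array([[32, 48, 320, 240]])           # [x1, y1, x2, y2] in pixels
norm = normalize_bboxes(boxes, (480, 640, 3))    # image shape is [h, w, d]
print(norm)
```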
LoadImageFromFile
- Added: img, img_shape, ori_shape, gt_bboxes, gt_class, gt_score
- Description: Loads basic information such as images and labels, as specified in the configuration file.
Mosaic
- Update: img, img_shape, gt_bboxes, gt_class
- Description: The mosaic operation randomly stitches four images into one.
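A deliberately simplified sketch of the stitching idea is given below. It assumes four images of equal size tiled in a fixed 2x2 grid with normalized boxes; the actual Mosaic transform may pick a random stitching center and crop the tiles, which is not modeled here. The function name `simple_mosaic` is hypothetical.

```python
import numpy as np


def simple_mosaic(imgs, bboxes_list):
    """Tile four equally sized images 2x2 and remap their normalized boxes."""
    assert len(imgs) == 4 and len(bboxes_list) == 4
    h, w, c = imgs[0].shape
    canvas = np.zeros((2 * h, 2 * w, c), dtype=imgs[0].dtype)
    merged = []
    for i, (img, boxes) in enumerate(zip(imgs, bboxes_list)):
        row, col = divmod(i, 2)
        canvas[row * h:(row + 1) * h, col * w:(col + 1) * w] = img
        # Each tile covers half of the canvas on both axes, so a normalized
        # coordinate is halved and shifted by the tile's normalized offset.
        offset = np.array([col, row, col, row], dtype=np.float32) * 0.5
        merged.append(np.asarray(boxes, dtype=np.float32) * 0.5 + offset)
    return canvas, np.concatenate(merged, axis=0)


tiles = [np.full((4, 4, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
full_box = [np.array([[0.0, 0.0, 1.0, 1.0]])] * 4
mosaic_img, mosaic_boxes = simple_mosaic(tiles, full_box)
print(mosaic_img.shape, mosaic_boxes[3])
```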
Resize
- Update: img, img_shape, gt_bboxes, gt_class
- letterbox: If False, the image is only scaled proportionally. If True, the image is scaled proportionally and centered, and the two sides that fall short are padded with gray (114).
- auto: Only takes effect when letterbox=True. It controls how far the short sides are padded: if True, they are padded up to the smallest multiple of 32; if False, they are padded up to img_scale.
- Description: Resizes the image, preserving its aspect ratio.
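The size arithmetic behind letterbox and auto can be sketched as below. The function name `letterbox_geometry` is hypothetical. The treatment of scaleup (when False, the image is never enlarged) is an assumption borrowed from common YOLO letterbox implementations, since the parameter is not described above.

```python
def letterbox_geometry(img_shape, img_scale=608, auto=True, scaleup=True, stride=32):
    """Return the resized (h, w) and the total padding (pad_h, pad_w)."""
    h, w = img_shape[:2]
    ratio = min(img_scale / h, img_scale / w)   # keep the aspect ratio
    if not scaleup:
        ratio = min(ratio, 1.0)                 # assumption: only shrink
    new_h, new_w = round(h * ratio), round(w * ratio)
    pad_h, pad_w = img_scale - new_h, img_scale - new_w
    if auto:
        # Pad only up to the next multiple of the stride.
        pad_h, pad_w = pad_h % stride, pad_w % stride
    # The padding would be split evenly between both sides to center the image.
    return (new_h, new_w), (pad_h, pad_w)


print(letterbox_geometry((480, 640, 3), auto=True))    # ((456, 608), (24, 0))
print(letterbox_geometry((480, 640, 3), auto=False))   # ((456, 608), (152, 0))
```

With auto=True the padded image is 480x608, both multiples of 32; with auto=False it is the full 608x608 square.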
RandomAffine
- Update: img, img_shape, gt_bboxes, gt_class
- degrees: rotation angle.
- translate: translation range.
- scale: scaling ratio.
- shear: shear magnitude.
- Description: Random affine transformation, mainly including rotation, scaling, shearing, and translation.
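One way to combine these four parameters into a single 3x3 matrix is sketched below. The sampling ranges (rotation in [-degrees, degrees], zoom in [1 - scale, 1 + scale], translation as a fraction of the image size, shear in degrees) are assumptions for illustration and may differ from the actual transform. The function name `random_affine_matrix` is hypothetical.

```python
import math
import random
from typing import Optional

import numpy as np


def random_affine_matrix(img_shape, degrees=0.0, translate=0.0, scale=0.5,
                         shear=0.0, rng: Optional[random.Random] = None):
    """Build a 3x3 matrix: rotate and zoom about the center, shear, translate."""
    rng = rng or random.Random()
    h, w = img_shape[:2]

    # Assumed sampling ranges, see the note above.
    angle = math.radians(rng.uniform(-degrees, degrees))
    zoom = rng.uniform(1 - scale, 1 + scale)
    shear_x = math.tan(math.radians(rng.uniform(-shear, shear)))
    shear_y = math.tan(math.radians(rng.uniform(-shear, shear)))
    shift_x = rng.uniform(-translate, translate) * w
    shift_y = rng.uniform(-translate, translate) * h

    cos_a, sin_a = zoom * math.cos(angle), zoom * math.sin(angle)
    to_origin = np.array([[1, 0, -w / 2], [0, 1, -h / 2], [0, 0, 1]], dtype=np.float64)
    rotate_zoom = np.array([[cos_a, -sin_a, 0], [sin_a, cos_a, 0], [0, 0, 1]])
    shear_m = np.array([[1, shear_x, 0], [shear_y, 1, 0], [0, 0, 1]])
    back = np.array([[1, 0, w / 2 + shift_x], [0, 1, h / 2 + shift_y], [0, 0, 1]])
    # Matrices apply right to left: move the center to the origin first.
    return back @ shear_m @ rotate_zoom @ to_origin


M = random_affine_matrix((480, 640, 3), rng=random.Random(0))
center = M @ np.array([320.0, 240.0, 1.0])
print(center)  # the image center stays fixed when translate=0
```

The same matrix would be applied to both the image and the box corners so that gt_bboxes stays aligned with img.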
RandomHSV
- Update: img
- Description: Random color jitter in HSV space, covering hue, saturation, and exposure (the V channel).
RandomFlip
- Update: img, gt_bboxes
- Description: Randomly flips the image.
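A flip must update the boxes as well as the pixels. The sketch below assumes a horizontal flip and normalized [x1, y1, x2, y2] boxes; the flip direction and probability used by the actual transform are not stated above, and the function name `horizontal_flip` is hypothetical.

```python
import numpy as np


def horizontal_flip(img, gt_bboxes):
    """Mirror the image left-right and mirror the normalized x coordinates."""
    flipped = img[:, ::-1].copy()
    boxes = np.asarray(gt_bboxes, dtype=np.float32).copy()
    # The old right edge becomes the new left edge: x1' = 1 - x2, x2' = 1 - x1.
    boxes[:, [0, 2]] = 1.0 - boxes[:, [2, 0]]
    return flipped, boxes


img = np.arange(2 * 3 * 1).reshape(2, 3, 1)
flipped_img, flipped_boxes = horizontal_flip(img, [[0.1, 0.2, 0.4, 0.6]])
print(flipped_boxes)
```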
Normalize
- Update: img
- Description: Normalizes the image to the range [0, 1] by dividing each RGB channel value by 255.
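As a minimal sketch, assuming an 8-bit input image stored as a numpy array (the function name `normalize_image` is hypothetical):

```python
import numpy as np


def normalize_image(img: np.ndarray) -> np.ndarray:
    """Convert to float and divide every channel value by 255."""
    return img.astype(np.float32) / 255.0


pixels = np.array([[[0, 128, 255]]], dtype=np.uint8)
normalized = normalize_image(pixels)
print(normalized.dtype, normalized.min(), normalized.max())
```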
ImageToTensor
- Update: img
- Description: Converts the image into a PyTorch tensor and rearranges it into the shape required by the model input, [d, h, w].
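The axis reordering from [h, w, d] to [d, h, w] can be sketched with numpy alone. The final conversion to a PyTorch tensor (for example with `torch.from_numpy`) is left out so that the sketch runs without PyTorch installed; the function name `to_chw` is hypothetical.

```python
import numpy as np


def to_chw(img: np.ndarray) -> np.ndarray:
    """Move the channel axis to the front: [h, w, d] -> [d, h, w]."""
    return np.ascontiguousarray(img.transpose(2, 0, 1))


hwc = np.arange(4 * 6 * 3).reshape(4, 6, 3)
chw = to_chw(hwc)
print(chw.shape)  # (3, 4, 6)
```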
Collect
- Add: img_meta (its keys are specified by the meta_keys parameter)
- Remove: all keys except those specified by the keys parameter
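The filtering performed by Collect can be sketched as below. The default values shown for keys and meta_keys are assumptions chosen for illustration, not the library's actual defaults, and the function name `collect` is hypothetical.

```python
def collect(results,
            keys=("img", "gt_bboxes", "gt_class", "gt_score"),   # assumed default
            meta_keys=("ori_shape", "img_shape")):               # assumed default
    """Keep only `keys` and gather `meta_keys` into an img_meta dict."""
    out = {key: results[key] for key in keys}
    out["img_meta"] = {key: results[key] for key in meta_keys if key in results}
    return out


sample = {
    "img": "tensor placeholder",
    "gt_bboxes": [[0.1, 0.2, 0.4, 0.6]],
    "gt_class": [3],
    "gt_score": [1.0],
    "ori_shape": (480, 640, 3),
    "img_shape": (608, 608, 3),
}
collected = collect(sample, keys=["img"])   # mirrors keys=['img'] in test_pipeline
print(sorted(collected))  # ['img', 'img_meta']
```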