Image Data Augmentation: From Random Crops to AutoAugment

Data augmentation is a go-to technique for fighting overfitting during training in computer vision. This post covers a few common image augmentations first, then some automated ones.

Common image augmentations

Random resized crop randomly crops the input and scales it to a target size. In PyTorch it’s torchvision.transforms.RandomResizedCrop(size, scale, ratio): size is the final size; scale is a range for the cropped area as a fraction of the original; ratio is a range for the crop’s aspect ratio.

Cutout masks out a random square region of the image (sets pixels to 0):

Cutout on CIFAR-10
Cutout on CIFAR-10: mask a random square.

Random erase is like Cutout — also masking a random region — except it uses a rectangle rather than a square:

Random erase
Random erase: mask a random rectangle.

Mixup creates “virtual” training samples by randomly blending two inputs and their labels:

mixup creates virtual samples
mixup: linearly blend two images and their labels with a random weight λ∈[0,1].

There are plenty of others too: random flips, random rotations, adjusting contrast, brightness, sharpness, and so on.

Automated augmentation

AutoAugment organizes many augmentation operations and their magnitudes into a search space, then automatically searches for the best-performing combination (each policy has two operations, each with a “probability” and a “magnitude” parameter).

AutoAugment search space
AutoAugment’s search space: combinations of augmentation operations and magnitudes.

Its search resembles NAS — an RNN samples policies and reinforcement learning updates them. Each candidate policy requires training a network to validate it, which is very expensive.

AutoAugment search flow
AutoAugment: RNN sampling + RL updates, validating each policy by training a network.

Applying this combinatorial augmentation to ImageNet:

AutoAugment results on ImageNet
Policies found by AutoAugment, applied to ImageNet.

Fast AutoAugment is much faster and won the AutoCV competition; I also used it to win the image track of the AutoDL competition. Its speedup comes mainly from splitting the dataset into parts, searching the best augmentation policy on each, then merging them. In the competition I also swapped the paper’s Bayesian optimization for random search, using AutoAugment’s results as the search space.

Fast AutoAugment workflow
Fast AutoAugment: search augmentation policies on data splits in parallel, then merge.

One last note: beyond training-time augmentation, there’s Test Time Augmentation (TTA) — augment a test sample, run inference on each variant, then ensemble the results into the final output.

References

  • DeVries, Terrance, Taylor, Graham W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv:1708.04552, 2017.
  • Zhong, Zhun, et al. Random Erasing Data Augmentation. AAAI, 2020.
  • Zhang, Hongyi, et al. mixup: Beyond Empirical Risk Minimization. arXiv:1710.09412, 2017.
  • Shorten, Connor, Khoshgoftaar, Taghi M. A Survey on Image Data Augmentation for Deep Learning. Journal of Big Data, 2019.
  • Cubuk, Ekin D., et al. AutoAugment: Learning Augmentation Policies from Data. arXiv:1805.09501, 2018.
  • Lim, Sungbin, et al. Fast AutoAugment. NeurIPS, 2019.
  • Zoph, Barret, Le, Quoc V. Neural Architecture Search with Reinforcement Learning. arXiv:1611.01578, 2016.