Notes on a Survey of Automated Data Augmentation Methods

Today I went through a few automated data augmentation papers I read recently and wrote down their main ideas and experimental results.

AutoAugment: Searching for Augmentation Policies with Reinforcement Learning

Paper: AutoAugment: Learning Augmentation Policies from Data

The core idea of this work is to use an RNN as a policy controller, trained with reinforcement learning, to search for data augmentation policies, that is, which operations to apply and with what probability and magnitude. Concretely, the workflow is: the controller samples a set of augmentation policies, a child model is trained with those policies, and the accuracy the child model reaches on the validation set is fed back to the controller as the reward. This loop repeats until the search budget runs out. You could call it a textbook case of the “throw a big RNN at it and wonders happen” approach.
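To make the sample, train, reward loop concrete, here is a minimal sketch under heavy assumptions. The RNN controller is swapped for a bare softmax over four candidate operations, the child-model training is replaced by a hypothetical lookup table of made-up validation accuracies (`TOY_REWARD`), and the update is plain REINFORCE with a moving-average baseline. None of the names or numbers come from the paper; only the shape of the loop does.

```python
import math
import random

# Candidate operations and a hypothetical "validation accuracy" for each.
# In the real method this number would come from training a child model.
OPS = ["ShearX", "Rotate", "Color", "Invert"]
TOY_REWARD = {"ShearX": 0.90, "Rotate": 0.95, "Color": 0.92, "Invert": 0.60}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def search_toy_policy(steps=500, lr=0.1, seed=0):
    """Toy controller: REINFORCE over a categorical choice of one operation."""
    rng = random.Random(seed)
    logits = [0.0] * len(OPS)
    baseline = 0.0  # exponential moving average of past rewards
    for _ in range(steps):
        probs = softmax(logits)
        # 1) The controller samples a "policy" (here: a single operation).
        i = rng.choices(range(len(OPS)), weights=probs)[0]
        # 2) Stand-in for "train a child model, measure validation accuracy".
        reward = TOY_REWARD[OPS[i]]
        # 3) Feed the reward back: policy-gradient step on the logits.
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for j in range(len(OPS)):
            grad = (1.0 if j == i else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return dict(zip(OPS, softmax(logits)))


if __name__ == "__main__":
    for op, p in search_toy_policy().items():
        print(f"{op}: {p:.3f}")
```

The expensive part in practice is step 2: every reward costs one full child-model training run, which is where the GPU hours in the table below come from.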

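Dataset	GPU hours	Best published error (%)	AutoAugment error (%)
CIFAR-10	5000	2.1	1.5
CIFAR-100	0	12.2	10.7
SVHN	1000	1.3	1.0
Stanford Cars	0	5.9	5.2
ImageNet	15000	3.9	3.5

As for the experimental results, all numbers are error rates (%), so lower is better. The ImageNet row reports top-5 error while the other datasets report top-1 error, so be careful to distinguish them when comparing. A GPU-hours value of 0 means no separate search was run on that dataset; the policy was transferred from another dataset.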

RandAugment: Drastically Shrinking the Search Space

Paper: RandAugment: Practical automated data augmentation with a reduced search space

AutoAugment’s search cost is too high. This work takes a more direct approach: instead of searching for a specific probability and magnitude for each transformation, it uses just two global hyperparameters, the number of augmentation operations N and a shared transformation magnitude M. For each image it randomly picks N transformations from the candidate set and applies them in sequence, all at magnitude M. The search space shrinks by orders of magnitude, to the point where a simple grid search over N and M is enough.
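The procedure is short enough to sketch directly. The version below is only an illustration: the images are plain nested lists of 0..255 intensities, the four candidate transformations are toy stand-ins I made up rather than the paper's operation set, and the 0..30 magnitude scale (`MAX_M`) is an assumption.

```python
import random

MAX_M = 30  # assumed magnitude scale: m runs from 0 to MAX_M


def identity(img, m):
    return [row[:] for row in img]


def brighten(img, m):
    """Add a magnitude-dependent offset, clipped at 255."""
    shift = round(128 * m / MAX_M)
    return [[min(255, p + shift) for p in row] for row in img]


def solarize(img, m):
    """Invert pixels at or above a threshold that drops as m grows."""
    threshold = 256 - round(256 * m / MAX_M)
    return [[255 - p if p >= threshold else p for p in row] for row in img]


def shift_right(img, m):
    """Shift each row right by up to half its width, padding with zeros."""
    k = round(len(img[0]) * m / MAX_M / 2)
    return [[0] * k + row[:len(row) - k] for row in img]


CANDIDATES = [identity, brighten, solarize, shift_right]


def rand_augment(img, n, m, rng=random):
    """Pick n transformations uniformly (with replacement), apply in order."""
    for op in rng.choices(CANDIDATES, k=n):
        img = op(img, m)
    return img


if __name__ == "__main__":
    image = [[0, 64, 128, 255], [10, 20, 30, 40]]
    print(rand_augment(image, n=2, m=9, rng=random.Random(0)))
```

Tuning then comes down to trying a handful of (N, M) pairs and keeping the one with the best validation accuracy, with no controller or child-model search loop involved.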

Judging from the experimental results (accuracy in % in the tables below), RandAugment ultimately reaches very good accuracy too, matching or coming within a few tenths of a point of AutoAugment, but at a far lower computational cost.

Method	Search space	CIFAR-10 PyramidNet	SVHN WRN	ImageNet ResNet	ImageNet EfficientNet-B7
Baseline	0	97.3	98.5	76.3	84.0
AA	$10^{32}$	98.5	98.9	77.6	84.4
Fast AA	$10^{32}$	98.3	98.8	77.6	-
PBA	$10^{61}$	98.5	98.9	-	-
RA	$10^{2}$	98.5	99.0	77.6	85.0
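To get a feel for the "Search space" column, the arithmetic can be checked in a few lines. The AutoAugment count follows the discretization described in the AutoAugment paper (16 operations, 10 magnitude levels, 11 probability levels, 2 operations per sub-policy, 5 sub-policies per policy). The RandAugment grid below is only my assumption of a plausible setup (N in 1..3, M in 0..30) that lands at the same order of magnitude as the table.

```python
import math

# AutoAugment: each operation slot chooses (operation, magnitude, probability).
choices_per_op = 16 * 10 * 11
ops_per_policy = 2 * 5  # 2 operations per sub-policy, 5 sub-policies
aa_space = choices_per_op ** ops_per_policy
print(f"AutoAugment: about 10^{math.log10(aa_space):.1f}")  # about 10^32.5

# RandAugment: a hypothetical grid over the two global hyperparameters.
n_values = [1, 2, 3]          # assumed candidate values for N
m_values = list(range(0, 31))  # assumed magnitude scale 0..30
ra_space = len(n_values) * len(m_values)
print(f"RandAugment: {ra_space} settings, about 10^{math.log10(ra_space):.1f}")  # 93 settings, about 10^2.0
```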

Dataset / Model	Baseline	PBA	Fast AA	AA	RA
CIFAR-10 · Wide-ResNet-28-2	94.9	-	-	95.9	95.8
CIFAR-10 · Wide-ResNet-28-10	96.1	97.4	97.3	97.4	97.3
CIFAR-10 · Shake-Shake	97.1	98.0	98.0	98.0	98.0
CIFAR-10 · PyramidNet	97.3	98.5	98.3	98.5	98.5
CIFAR-100 · Wide-ResNet-28-2	75.4	-	-	78.5	78.3
CIFAR-100 · Wide-ResNet-28-10	81.2	83.3	82.7	82.9	83.3
SVHN (core set) · Wide-ResNet-28-2	96.7	-	-	98.0	98.3
SVHN (core set) · Wide-ResNet-28-10	96.9	-	-	98.1	98.3
SVHN · Wide-ResNet-28-2	98.2	-	-	98.7	98.7
SVHN · Wide-ResNet-28-10	98.5	98.9	98.8	98.9	99.0

Fast AutoAugment: Speeding Up the Search by Merging Policies

Paper: Fast AutoAugment

The idea here is to split the training data into K folds, search on each fold for the top-N augmentation policies using Bayesian optimization, and then merge everything found into one large policy set for the final training. Compared with AutoAugment’s reinforcement learning search, which needs a freshly trained child model for every reward, search efficiency improves noticeably.

Algorithm 1: Fast AutoAugment
Input: (θ, D_train, K, T, B, N)

1: Split D_train into K-fold data D_train^(k) = {(D_M^(k), D_A^(k))}   // stratified shuffling
2: for k ∈ {1, ..., K} do
3:     T*^(k) ← ∅,  (D_M, D_A) ← (D_M^(k), D_A^(k))                    // initialize
4:     Train θ on D_M
5:     for t ∈ {0, ..., T-1} do
6:         B ← BayesOptim(T, L(θ | T(D_A)), B)                        // explore-and-exploit
7:         T_t ← Select top-N policies in B
8:         T*^(k) ← T*^(k) ∪ T_t                                      // merge augmentation policies
9: return T* = ⋃_k T*^(k)
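The control flow of Algorithm 1 can be sketched in Python as follows. This is a toy version under stated assumptions, not the paper's implementation: the data are plain floats, the "model" θ is just the mean of D_M, a "policy" is a single additive shift, the loss is a squared error, the split is a plain shuffle rather than a stratified one, and `BayesOptim` is replaced by random search over B candidates per round. All function names are made up for illustration.

```python
import random


def train(d_m):
    """Toy 'model' theta: the mean of the training split D_M."""
    return sum(d_m) / len(d_m)


def loss(theta, policy, d_a):
    """Toy loss of theta on D_A after applying the policy (an additive shift)."""
    return sum((x + policy - theta) ** 2 for x in d_a) / len(d_a)


def fast_autoaugment_sketch(d_train, K=5, T=2, B=20, N=3, seed=0):
    rng = random.Random(seed)
    data = d_train[:]
    rng.shuffle(data)  # plain shuffle (the algorithm uses stratified shuffling)
    folds = [data[k::K] for k in range(K)]
    merged = []  # T*, the final merged policy set
    for k in range(K):
        d_a = folds[k]  # split used to evaluate candidate policies
        d_m = [x for j in range(K) if j != k for x in folds[j]]
        theta = train(d_m)  # trained once per fold, reused for every candidate
        for _ in range(T):
            # Random-search stand-in for BayesOptim: draw B candidate policies.
            candidates = [rng.uniform(-1.0, 1.0) for _ in range(B)]
            candidates.sort(key=lambda p: loss(theta, p, d_a))
            merged.extend(candidates[:N])  # keep the top-N, merge into T*
    return merged


if __name__ == "__main__":
    toy_data = [float(i % 7) for i in range(50)]
    policies = fast_autoaugment_sketch(toy_data)
    print(len(policies), "policies merged")  # K * T * N = 30
```

The point to notice is that θ is trained once per fold and only evaluated on augmented D_A afterwards, so no model is retrained inside the inner search loop.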

Summary

The thread running through the three works is fairly clear. AutoAugment proved that automatically searching for augmentation policies is feasible, but at an extremely high search cost. Fast AutoAugment improved search efficiency, cutting the overhead by searching on data folds and merging the candidate policies found. RandAugment took another route entirely, compressing the search space to just two hyperparameters plus random sampling, and struck a good balance between practicality and final accuracy.

Today I also organized the experimental results for model compression along the way, and tried modifying ResNet-18 to adapt it to the Apollon dataset. I’ll record the detailed results later.

2020 · 02 · 29