Notes on a Survey of Automated Data Augmentation Methods
Today I organized a few automated data augmentation papers I have read recently, writing down the ideas and experimental results.
AutoAugment: Searching for Augmentation Policies with Reinforcement Learning
Paper: AutoAugment: Learning Augmentation Policies from Data
The core idea of this work is to use an RNN as a policy controller, paired with reinforcement learning, to optimize the sampling probabilities for data augmentation. Concretely, the workflow is: the controller samples a set of data augmentation policies, trains a child model with this set of policies, and then feeds the accuracy the child model achieves on the validation set back to the controller as a reward, iterating the search continuously. You could call it a textbook case of the “throw a big RNN at it and wonders happen” approach.
| Dataset | Search cost (GPU hours) | Best published result (error %) | AutoAugment (error %) |
|---|---|---|---|
| CIFAR-10 | 5000 | 2.1 | 1.5 |
| CIFAR-100 | 0 | 12.2 | 10.7 |
| SVHN | 1000 | 1.3 | 1.0 |
| Stanford Cars | 0 | 5.9 | 5.2 |
| ImageNet | 15000 | 3.9 | 3.5 |
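As for the experimental results, the table reports error rates (%), so lower is better. The ImageNet numbers are top-5 error, while the other datasets use top-1 error, so be careful to distinguish them when comparing. A GPU-hours entry of 0 means no search was run on that dataset, which presumably means a policy found on another dataset was reused.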
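The sample, train, reward loop described above can be sketched in a few lines. This is only a toy under stated assumptions: the RNN controller is swapped for a plain softmax over four operations trained with REINFORCE, and `child_validation_accuracy` is a made-up noisy score standing in for "train a child model and measure validation accuracy". The names `OPS`, `sample_policy`, `child_validation_accuracy`, and `search` are all hypothetical.

```python
import math
import random

# Hypothetical candidate operations; the real search space is far larger.
OPS = ["rotate", "shear_x", "color", "invert"]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_policy(logits, rng):
    """Controller step: sample one operation index from the current distribution."""
    probs = softmax(logits)
    idx = rng.choices(range(len(OPS)), weights=probs, k=1)[0]
    return idx, probs

def child_validation_accuracy(op_idx, rng):
    """Stand-in for training a child model with the sampled policy.
    Assumption: an invented noisy score in which 'rotate' happens to work best."""
    base = [0.85, 0.70, 0.60, 0.50][op_idx]
    return base + rng.uniform(-0.02, 0.02)

def search(steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(OPS)
    baseline = None  # moving average of rewards, reduces gradient variance
    for _ in range(steps):
        idx, probs = sample_policy(logits, rng)
        reward = child_validation_accuracy(idx, rng)  # reward fed back to the controller
        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # REINFORCE for a softmax: d log p(idx) / d logit_j = 1[j == idx] - p_j
        for j in range(len(OPS)):
            grad = (1.0 if j == idx else 0.0) - probs[j]
            logits[j] += lr * advantage * grad
    return softmax(logits)

if __name__ == "__main__":
    for name, p in zip(OPS, search()):
        print(f"{name}: {p:.3f}")
```

Even in this toy, every reward costs one "child training run", which hints at why the real search is so expensive.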
RandAugment: Drastically Shrinking the Search Space
Paper: RandAugment: Practical automated data augmentation with a reduced search space
AutoAugment’s search cost is too high. This work takes a more direct approach: instead of searching for the specific probability of each transformation, it uses just two global hyperparameters—the number N of augmentation operations and the transformation magnitude M—and then randomly picks N transformations from the candidate set and applies them in sequence. The search space immediately shrinks by orders of magnitude.
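A minimal sketch of the "pick N operations, apply them at magnitude M" idea, assuming a toy image stored as a list of rows of floats in [0, 1]. The operations, the magnitude scale `MAX_MAGNITUDE`, and the function name `rand_augment` are illustrative assumptions, not the paper's actual transformation set.

```python
import random

MAX_MAGNITUDE = 30  # assumed magnitude scale for this sketch

def clip(v):
    return min(1.0, max(0.0, v))

def identity(img, level):
    return [row[:] for row in img]

def brightness(img, level):
    # Brighten every pixel by up to +0.5 at full magnitude.
    return [[clip(p + 0.5 * level) for p in row] for row in img]

def contrast(img, level):
    # Stretch pixel values around the image mean by a factor in [1, 2].
    flat = [p for row in img for p in row]
    mean = sum(flat) / len(flat)
    return [[clip(mean + (1.0 + level) * (p - mean)) for p in row] for row in img]

def translate_x(img, level):
    # Shift columns right by up to half the width, padding with zeros.
    width = len(img[0])
    shift = int(round(level * width / 2))
    return [[0.0] * shift + row[: width - shift] for row in img]

def solarize(img, level):
    # Invert pixels at or above a threshold that drops as magnitude grows.
    threshold = 1.0 - level
    return [[1.0 - p if p >= threshold else p for p in row] for row in img]

CANDIDATE_OPS = [identity, brightness, contrast, translate_x, solarize]

def rand_augment(img, n, m, rng=random):
    """Apply n uniformly sampled ops in sequence, all at the shared magnitude m."""
    level = m / MAX_MAGNITUDE
    ops = rng.choices(CANDIDATE_OPS, k=n)  # uniform sampling with replacement
    out = img
    for op in ops:
        out = op(out, level)
    return out

if __name__ == "__main__":
    image = [[0.0, 0.25, 0.5, 0.75] for _ in range(4)]
    print(rand_augment(image, n=2, m=9, rng=random.Random(0)))
```

With only `n` and `m` left to tune, a small grid search over the two values is enough, and no controller is needed.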
Judging from the experimental results (accuracy, %), RandAugment (RA) can ultimately reach very good accuracy too, matching or coming within a few tenths of a point of AutoAugment (AA), but at a far lower computational cost.
| Method | Search space | CIFAR-10 PyramidNet | SVHN WRN | ImageNet ResNet | ImageNet EfficientNet-B7 |
|---|---|---|---|---|---|
| Baseline | 0 | 97.3 | 98.5 | 76.3 | 84.0 |
| AA | | 98.5 | 98.9 | 77.6 | 84.4 |
| Fast AA | | 98.3 | 98.8 | 77.6 | - |
| PBA | | 98.5 | 98.9 | - | - |
| RA | | 98.5 | 99.0 | 77.6 | 85.0 |
| Dataset / Model | Baseline | PBA | Fast AA | AA | RA |
|---|---|---|---|---|---|
| CIFAR-10 · Wide-ResNet-28-2 | 94.9 | - | - | 95.9 | 95.8 |
| CIFAR-10 · Wide-ResNet-28-10 | 96.1 | 97.4 | 97.3 | 97.4 | 97.3 |
| CIFAR-10 · Shake-Shake | 97.1 | 98.0 | 98.0 | 98.0 | 98.0 |
| CIFAR-10 · PyramidNet | 97.3 | 98.5 | 98.3 | 98.5 | 98.5 |
| CIFAR-100 · Wide-ResNet-28-2 | 75.4 | - | - | 78.5 | 78.3 |
| CIFAR-100 · Wide-ResNet-28-10 | 81.2 | 83.3 | 82.7 | 82.9 | 83.3 |
| SVHN (core set) · Wide-ResNet-28-2 | 96.7 | - | - | 98.0 | 98.3 |
| SVHN (core set) · Wide-ResNet-28-10 | 96.9 | - | - | 98.1 | 98.3 |
| SVHN · Wide-ResNet-28-2 | 98.2 | - | - | 98.7 | 98.7 |
| SVHN · Wide-ResNet-28-10 | 98.5 | 98.9 | 98.8 | 98.9 | 99.0 |
Fast AutoAugment: Speeding Up the Search by Merging Policies
Paper: Fast AutoAugment
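The idea here is to avoid retraining a child model for every candidate policy. The training data is split into K folds, a model is trained once per fold, and candidate policies are scored by how well that model handles the augmented held-out split. The top-N policies found at each search step are then merged directly into one large policy set for training. Compared with AutoAugment’s end-to-end reinforcement learning search, search efficiency improves noticeably.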
Algorithm 1: Fast AutoAugment
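Input: (θ, D_train, K, T, B, N)
1: Split D_train into K-fold data D_train^(k) = {(D_M^(k), D_A^(k))} // stratified shuffling
2: for k ∈ {1, ..., K} do
3:     𝒯*^(k) ← ∅, (D_M, D_A) ← (D_M^(k), D_A^(k)) // initialize
4:     Train θ on D_M
5:     for t ∈ {0, ..., T-1} do
6:         ℬ ← BayesOptim(𝒯, L(θ | 𝒯(D_A)), B) // explore-and-exploit
7:         𝒯_t ← Select top-N policies in ℬ
8:         𝒯*^(k) ← 𝒯*^(k) ∪ 𝒯_t // merge augmentation policies
9: return 𝒯* = ⋃_k 𝒯*^(k)
Notation: the plain T and B are scalar inputs (the number of search rounds per fold and the number of candidates explored by Bayesian optimization), while the script 𝒯 denotes a policy or policy set and ℬ the set of explored candidates.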
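The structure of Algorithm 1 can be sketched as follows. This is a toy under loud assumptions: random search stands in for BayesOptim, a plain shuffle stands in for stratified shuffling, D_A is taken to be one fold and D_M the remaining folds, the "model" is a nearest-centroid classifier on 1-D points, and a policy is a single hypothetical (operation, probability, magnitude) triple. None of the function names come from the paper.

```python
import random

def make_policy(rng):
    """A toy policy: one operation with an application probability and a magnitude."""
    return (rng.choice(["shift", "scale"]), rng.random(), rng.random())

def apply_policy(policy, data, rng):
    """Return an augmented copy of (x, y) pairs; labels are left unchanged."""
    op, prob, mag = policy
    out = []
    for x, y in data:
        if rng.random() < prob:
            x = x + mag if op == "shift" else x * (1.0 + mag)
        out.append((x, y))
    return out

def train(data):
    """Toy 'model': per-class means for a nearest-centroid classifier."""
    means = {}
    for label in {y for _, y in data}:
        xs = [x for x, y in data if y == label]
        means[label] = sum(xs) / len(xs)
    return means

def loss(model, data):
    """Error rate of the nearest-centroid model on data."""
    wrong = sum(1 for x, y in data
                if min(model, key=lambda c: abs(x - model[c])) != y)
    return wrong / len(data)

def fast_autoaugment(d_train, K=3, T=2, B=20, N=3, seed=0):
    rng = random.Random(seed)
    data = d_train[:]
    rng.shuffle(data)  # assumption: plain shuffle instead of stratified shuffling
    folds = [data[k::K] for k in range(K)]
    final = []
    for k in range(K):
        d_a = folds[k]                                            # held-out split D_A
        d_m = [s for j in range(K) if j != k for s in folds[j]]   # training split D_M
        theta = train(d_m)                                        # trained once per fold
        for _ in range(T):
            # Assumption: random search replaces Bayesian optimization here.
            candidates = [make_policy(rng) for _ in range(B)]
            scored = [(loss(theta, apply_policy(p, d_a, rng)), p) for p in candidates]
            scored.sort(key=lambda pair: pair[0])
            final.extend(p for _, p in scored[:N])                # keep top-N and merge
    return final

if __name__ == "__main__":
    gen = random.Random(1)
    toy = [(gen.gauss(0.0, 0.5), 0) for _ in range(60)] + \
          [(gen.gauss(5.0, 0.5), 1) for _ in range(60)]
    print(len(fast_autoaugment(toy)))
```

The point to notice is that `train` runs only K times in total, while every candidate policy costs just one evaluation pass over D_A.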
Summary
The thread running through the three works is fairly clear. AutoAugment proved the feasibility of automatically searching for augmentation policies, but at an extremely high search cost. Fast AutoAugment improved search efficiency by scoring candidate policies without retraining a model for each one and merging the best groups of candidates. RandAugment took another route entirely, compressing the search space down to the bare minimum and striking a good balance between practicality and final accuracy through random sampling plus two hyperparameters.
Today I also organized the experimental results for model compression, and tried modifying ResNet-18 to adapt it to the Apollon dataset. I’ll record the detailed results later.