Class 3: Robust Overfitting & Mitigation Methods
A blog post for Class 3 (created by Xiao Zhang on 25 May 2023).
Here are the PPT slides for the presentation.
We first did a recap of what we have learnt so far in the first two class meetings. Then we briefly summarized the problem of robust learning against small adversarial perturbations. Given a data distribution \( \mathcal{D} \) and \( \epsilon>0 \) representing the perturbation strength measured by some distance metric such as the \( \ell_p \)-norm, the goal of adversarially robust learning is to learn a classification model \( f \) such that \( f \) has both small standard risk and small adversarial risk. More specifically, the two risks are defined as follows:
$$ \mathrm{Risk}(f) = \mathrm{Pr}_{(\mathbf{x}, y)\sim \mathcal{D}} \ [f(\mathbf{x}) \neq y], $$
$$ \mathrm{AdvRisk}(f, \epsilon) = \mathrm{Pr}_{(\mathbf{x}, y)\sim \mathcal{D}} \ [\exists \mathbf{x}'\in\mathcal{B}(\mathbf{x}, \epsilon) \text{ s.t. } f(\mathbf{x}') \neq y]. $$
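To make these definitions concrete, here is a minimal PyTorch sketch (assuming an image classifier `model` with inputs in \( [0,1] \) and a data loader) that estimates both risks empirically on a finite sample, using an \( \ell_\infty \) PGD attack as one common way to approximate the inner maximization in the adversarial risk; the helper names and hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps):
    """Approximate the inner maximization in AdvRisk with an L-infinity PGD attack."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()          # gradient-sign ascent step
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)  # project back
    return x_adv.detach()

def estimate_risks(model, loader, eps=8 / 255, alpha=2 / 255, steps=10):
    """Monte-Carlo estimates of Risk(f) and (a lower bound on) AdvRisk(f, eps)."""
    model.eval()
    n = std_err = adv_err = 0
    for x, y in loader:
        with torch.no_grad():
            std_err += (model(x).argmax(dim=1) != y).sum().item()
        x_adv = pgd_attack(model, x, y, eps, alpha, steps)
        with torch.no_grad():
            adv_err += (model(x_adv).argmax(dim=1) != y).sum().item()
        n += y.size(0)
    return std_err / n, adv_err / n
```

Because PGD only searches part of \( \mathcal{B}(\mathbf{x}, \epsilon) \), the second return value is only a lower bound on the true adversarial risk, which foreshadows the distinction between attack-based and certification-based evaluation below.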
There are two central research questions for adversarially robust learning: (1) how to measure the adversarial risk of any given model \( f \), and (2) how to learn a desirable robust model \( f \). Due to the nonconvexity of neural networks, it is difficult to enumerate every possible perturbation within the perturbation set, which also poses challenges for solving the two-player game between the attacker and the defender. Existing works propose methods to approximately answer these two research questions. In particular, we can summarize the papers that we have read so far with respect to the two questions in the following figure:
It is worth noting that both attack-based robustness evaluation, used by most heuristic defenses such as adversarial training, and certification-based robustness evaluation, used by certified robust learning methods such as randomized smoothing, can be regarded as approximating the underlying adversarial risk: an attack can only lower-bound it, whereas a certificate upper-bounds it. Typically, the training method of an adversarially robust classifier makes use of its corresponding robustness evaluation method.
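As a concrete example of certification-based evaluation, below is a heavily simplified randomized-smoothing sketch in the spirit of Cohen et al. (2019): it predicts with a Gaussian-smoothed classifier and returns a certified \( \ell_2 \) radius from a Hoeffding lower confidence bound on the top-class probability. The sampling budget, the confidence level, and the omission of a separate selection round are simplifications for illustration, not the exact procedure from the paper.

```python
import math
import torch
from scipy.stats import norm

def smoothed_predict_and_certify(model, x, sigma=0.25, n=1000, batch=200, alpha=0.001):
    """Predict with the smoothed classifier g(x) = argmax_c Pr[f(x + delta) = c],
    delta ~ N(0, sigma^2 I), and certify an L2 radius around x (radius 0 on abstain)."""
    counts = None
    with torch.no_grad():
        remaining = n
        while remaining > 0:
            b = min(batch, remaining)
            noise = torch.randn((b,) + x.shape) * sigma
            logits = model(x.unsqueeze(0) + noise)
            if counts is None:
                counts = torch.zeros(logits.size(1), dtype=torch.long)
            counts += torch.bincount(logits.argmax(dim=1), minlength=logits.size(1))
            remaining -= b
    top = counts.argmax().item()
    p_hat = counts[top].item() / n
    # One-sided Hoeffding lower confidence bound on the top-class probability.
    p_lower = p_hat - math.sqrt(math.log(1 / alpha) / (2 * n))
    if p_lower <= 0.5:
        return -1, 0.0                      # abstain: cannot certify
    radius = sigma * norm.ppf(p_lower)      # certified L2 radius
    return top, radius
```

Unlike the PGD estimate above, a positive returned radius is a sound guarantee: no \( \ell_2 \) perturbation of that size can change the smoothed prediction, up to the failure probability of the confidence bound.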
After summarizing existing works, we introduced and looked into the intriguing robust overfitting phenomenon of adversarially trained models. The phenomenon is illustrated by the following figure, extracted from the paper Overfitting in Adversarially Robust Deep Learning by Rice et al., ICML 2020.
As demonstrated in the above figure, the robust test error dips immediately after the first learning rate decay at Epoch 100 but only increases beyond this point, whereas the robust train error exhibits a double descent curve. In sharp contrast, the standard test error keeps decreasing, suggesting that overfitting is not a serious issue for standard generalization. Through extensive experiments analyzing the robust overfitting phenomenon, the paper advocates early stopping (with a small validation set) as the most effective mitigation method, which boosts the test robustness of adversarially trained models from around 47% to over 53%.
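Here is a minimal sketch of what early stopping looks like in this setting, reusing the hypothetical `pgd_attack` helper from above: train with PGD adversarial training, track robust accuracy on a small held-out validation set after each epoch, and keep the best checkpoint instead of the final one. The training schedule and hyperparameters are illustrative, not those used by Rice et al.

```python
import copy
import torch
import torch.nn.functional as F

def adv_train_with_early_stopping(model, opt, scheduler, train_loader, val_loader,
                                  epochs, eps=8 / 255, alpha=2 / 255, steps=10):
    """PGD adversarial training with checkpoint selection on robust validation accuracy."""
    best_acc, best_state = 0.0, copy.deepcopy(model.state_dict())
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            x_adv = pgd_attack(model, x, y, eps, alpha, steps)   # inner maximization
            loss = F.cross_entropy(model(x_adv), y)              # outer minimization
            opt.zero_grad()
            loss.backward()
            opt.step()
        scheduler.step()

        # "Early stopping": measure robust accuracy on a small validation set
        # and remember the best checkpoint seen so far.
        model.eval()
        correct = total = 0
        for x, y in val_loader:
            x_adv = pgd_attack(model, x, y, eps, alpha, steps)
            with torch.no_grad():
                correct += (model(x_adv).argmax(dim=1) == y).sum().item()
            total += y.size(0)
        if correct / total > best_acc:
            best_acc, best_state = correct / total, copy.deepcopy(model.state_dict())

    model.load_state_dict(best_state)
    return model, best_acc
```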
Finally, we also discussed two follow-up papers related to robust overfitting: (1) Adversarial Weight Perturbation Helps Robust Generalization, Wu et al., NeurIPS 2020, and (2) Robust Overfitting May Be Mitigated by Properly Learned Smoothening, Chen et al., ICLR 2021. Both papers borrow insights from the literature on understanding standard generalization of deep learning models, such as overconfident predictions and the flatness/smoothness of the weight loss landscape, to analyze the robust overfitting phenomenon of adversarially trained models. Correspondingly, they propose overfitting mitigation techniques that leverage these insights and achieve further improvements in building more robust models. A simplified sketch of the weight-perturbation idea follows below.
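To illustrate the flatness intuition behind the first paper, here is a heavily simplified training-step sketch in the spirit of Adversarial Weight Perturbation: before the gradient step, the weights themselves are perturbed in the direction that increases the robust loss (with a magnitude relative to each weight's norm), the gradient is computed at the perturbed weights, and the perturbation is removed before the optimizer update. The single ascent step, the per-parameter scaling, and the value of `gamma` are illustrative choices, not the exact algorithm of Wu et al.

```python
import torch
import torch.nn.functional as F

def awp_training_step(model, opt, x_adv, y, gamma=5e-3):
    """One robust training step with a simplified adversarial weight perturbation."""
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) Ascent step in weight space: move each weight in the direction that
    #    increases the robust loss, with a step size relative to the weight norm.
    loss = F.cross_entropy(model(x_adv), y)
    grads = torch.autograd.grad(loss, params, allow_unused=True)
    perturbs = []
    with torch.no_grad():
        for p, g in zip(params, grads):
            if g is None:
                perturbs.append(torch.zeros_like(p))
                continue
            v = gamma * p.norm() * g / (g.norm() + 1e-12)
            p.add_(v)
            perturbs.append(v)

    # 2) Compute gradients at the perturbed weights; minimizing this loss
    #    encourages flat regions of the weight loss landscape.
    opt.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()

    # 3) Remove the weight perturbation, then apply the optimizer update.
    with torch.no_grad():
        for p, v in zip(params, perturbs):
            p.sub_(v)
    opt.step()
```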