Class 2: Robustness Certification Methods
A blog post for class 2 (created by Gopal Bhattrai on 17 May 2023).
Here are the PPT slides for the presentation. The slides were also made by Gopal Bhattrai.
In recent years, machine learning and deep learning have achieved remarkable results in various fields of engineering and medicine. Today many big tech firms use these algorithms to power their products; applications like self-driving cars and object detection run deep learning models in the background. But it was recently discovered that these machine learning models can easily be fooled. So the questions we should be asking are: Is machine learning really secure? Is it worth powering products worth several billion dollars?
Before tackling that question, let’s look at a use case where this false sense of security could be dangerous. In computer vision applications, the most commonly used network is the Convolutional Neural Network (CNN). This class of model detects objects in an image efficiently: at its core, it extracts features using small filters called kernels, and based on those feature patterns it decides which object is present. Imagine we are building a self-driving car, where such a model decides when to stop and when to speed up, with the deep learning algorithm making the decision in the background. If everything is set up properly, the model will easily recognize a stop sign and decide to stop the car. But it has recently been shown that these models can be fooled by slightly modifying the input, so that the model now recognizes a stop sign as a speed-limit sign or something else entirely. Why is that the case?
Let’s look at the figure below:
The CNN model is first given the original image of a panda and predicts the correct label with 57.7% confidence. Now suppose an attacker adds a carefully crafted perturbation (here, perturbation simply means a small amount of noise); the model then predicts the panda as a ‘gibbon’ with very high confidence. The important question is: why is this happening?
To understand this, let’s look at the image below. It shows a typical classification setting: the task is to predict whether XSS is used or not. This is a binary classification problem, which means we have just two classes, and we need to find a decision boundary that separates them. The decision boundary is the dotted black line: everything on its left side belongs to the positive class, and everything on its right side belongs to the negative class.
Let’s focus on the red highlighted sample. Suppose that sample moves horizontally towards the decision boundary and crosses it; the model will then make a wrong prediction. You might wonder how the point can move at all: when you add noise to a sample, its position changes. This is exactly why the panda was predicted as a gibbon in the first example.
Now a very important question: how does the adversary find the direction in which the classification goes wrong?
The answer lies in the data gradient. Let’s have a look at the image below.
We know that the gradient of a function with respect to the model parameters points in the direction in which the function increases when we make a small change in the parameters; this is why training moves in the opposite direction. Similarly, if we compute the data gradient, i.e., the gradient with respect to the input sample, it points in the direction in which the loss increases when we make a small change to the sample. This is exactly what the adversary wants: with access to the data gradient, they can find the direction in which the loss function increases. The data gradient can also be used to find the regions of an image that are most important, in other words, where the neuron activations are highest.
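To make the idea concrete, here is a minimal sketch of how an attacker could use the data gradient, in the style of the well-known fast gradient sign method (FGSM). The PyTorch code is illustrative only: `model`, `x`, `y`, and the step size `eps` are placeholders, not something from the original slides.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps=0.03):
    """Sketch: perturb x in the direction that increases the loss,
    using the sign of the data gradient (an FGSM-style attack)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)    # loss of the current prediction
    loss.backward()                        # fills x.grad with the data gradient
    x_adv = x + eps * x.grad.sign()        # step in the loss-increasing direction
    return x_adv.detach()
```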
So how do we define robustness when the adversary can always find a direction in which the loss increases? In practice this is a cat-and-mouse game: there is no guarantee that a proposed algorithm is robust to all kinds of adversarial examples. But can we at least define a range within which we can guarantee that the adversary cannot fool our model?
This is precisely why we need certified robustness. In general, robustness is the model’s ability to resist being fooled into mispredicting a sample by an adversarial example.
In this blog, we will study two different approaches: Randomized Smoothing and Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers.
Randomized Smoothing:
The idea of randomized smoothing is very simple. Let’s have a look at the following diagram.
Let’s focus on the point x, which lies very close to the boundary between the blue and the green regions. Chances are that if this sample is perturbed, it will end up in the green region; adversarial examples are usually found near the decision boundary. The main idea is therefore to smooth the decision boundary by sampling from a distribution around the input. What does that mean?
In extremely simple terms: if I have a sample x and I add a perturbation that pushes it into the green region, I take that perturbed point, add it to my dataset, and give it the label blue. Doing this forces the model to predict every point within a certain bound as belonging to the blue class; this bound is shown as the dotted black circle. The model is robust if the gap between the lower bound on the probability of the most predicted class and the probability of the second-best class is as large as possible. This gap defines the region within which the model remains robust to perturbations.
Ideally we use a Gaussian distribution to sample the noise, because it is symmetric: the spread is uniform in every direction within a region determined by the standard deviation. The effect of different standard deviations can be seen in the figure below.
It is clear from the diagram that increasing the standard deviation widens the region of reach and flattens the curve: noise samples now cover a larger region, so samples in that larger region are pushed to be predicted as the same class. This eventually makes the model more robust, but at the expense of a drop in classification accuracy.
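To make this concrete, here is a minimal sketch of what the smoothed classifier does at prediction time: it takes a majority vote of the base classifier over Gaussian perturbations of the input. The PyTorch code is illustrative; `base_classifier`, `sigma`, and `n_samples` are placeholder names, and a practical implementation (as in Cohen et al.) would also add statistical confidence bounds and abstention.

```python
import torch
import torch.nn.functional as F

def smoothed_predict(base_classifier, x, sigma=0.25, n_samples=100):
    """Sketch: predict with the smoothed classifier by taking a majority
    vote of the base classifier over Gaussian-noised copies of x."""
    counts = None
    with torch.no_grad():
        for _ in range(n_samples):
            noise = sigma * torch.randn_like(x)        # Gaussian perturbation
            logits = base_classifier(x + noise)
            votes = F.one_hot(logits.argmax(dim=1), logits.shape[1])
            counts = votes if counts is None else counts + votes
    return counts.argmax(dim=1)                        # most frequent class wins
```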
Let’s see how this provides a robustness bound. Have a look at the theorem below.
Firstly, no assumption is made about the base classifier f; it can be any deterministic or stochastic function. The certified radius R represents the bound within which the model is robust, and R becomes large when:
◦ The noise level σ is high.
◦ The probability of the top class cA is high.
◦ The probability of each other class is low.
The certified radius R goes to infinity as pA → 1 and pB → 0, which happens when f predicts cA (almost) everywhere around x. Φ⁻¹ denotes the inverse CDF of the standard Gaussian. The expression for R gives us the desired robustness bound.
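Written out, the theorem certifies an L2 radius R = σ/2 · (Φ⁻¹(pA) − Φ⁻¹(pB)), where pA is a lower bound on the top-class probability and pB an upper bound on the runner-up. A small sketch of this computation (the function name and example numbers are purely illustrative):

```python
from scipy.stats import norm

def certified_radius(p_a_lower, p_b_upper, sigma):
    """Sketch: certified L2 radius R = sigma/2 * (Phi^-1(pA) - Phi^-1(pB))."""
    if p_a_lower <= p_b_upper:
        return 0.0  # no certificate unless the top class clearly dominates
    return 0.5 * sigma * (norm.ppf(p_a_lower) - norm.ppf(p_b_upper))

# Example: with pA = 0.9, pB = 0.05 and sigma = 0.5, R is roughly 0.73.
print(certified_radius(0.9, 0.05, sigma=0.5))
```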
In extremely simple words, randomized smoothing just makes the model robust against Gaussian perturbations.
Provably Robust Deep Learning via Adversarially Trained Smoothed Classifiers:
Let’s again define robustness in terms of a probability distribution. In simple terms, it says that the probability of the most likely class should remain high even after the input is perturbed. The figure below shows the mathematical version of this.
In this paper, the authors adversarially train the smoothed classifier, where the smoothed classifier is nothing but the randomized smoothing model. The idea of the paper can therefore be summarized as: Adversarial Training + Randomized Smoothing = Adversarial Training of the Smoothed Classifier.
The difference between plain randomized smoothing and adversarially training the randomized smoothing model comes down to one subtle point, which is best explained with the following image:
The first expression is the training objective used for randomized smoothing: it takes an expectation over noise samples drawn from a Gaussian, where the expectation is over the losses after perturbing the sample with different noise draws. It never involves the smoothed classifier directly; it only asks the base classifier to be robust to random Gaussian noise. The second expression, on the other hand, first computes the expected prediction over the perturbed samples and then computes the loss on that expectation, so here we are working with the smoothed classifier itself. This is the objective we will work with.
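The contrast between the two objectives can be sketched in code: the first averages the loss over noisy copies (Gaussian data augmentation of the base classifier), while the second averages the predictions first and then takes the loss of that average (a soft version of the smoothed classifier). The function names, the use of softmax probabilities, and the Monte Carlo sample count `m` are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def noise_augmented_loss(model, x, y, sigma=0.25, m=4):
    """First objective (sketch): E_delta[ loss(f(x + delta), y) ] --
    the expectation is taken over the losses of the noisy samples."""
    losses = [F.cross_entropy(model(x + sigma * torch.randn_like(x)), y)
              for _ in range(m)]
    return torch.stack(losses).mean()

def smoothed_classifier_loss(model, x, y, sigma=0.25, m=4):
    """Second objective (sketch): loss( E_delta[ f(x + delta) ], y ) --
    average the predictions first, then take the loss of that average."""
    probs = [F.softmax(model(x + sigma * torch.randn_like(x)), dim=1)
             for _ in range(m)]
    avg_prob = torch.stack(probs).mean(dim=0)
    return F.nll_loss(torch.log(avg_prob + 1e-12), y)
```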
How do we calculate the gradient? We can take the second expression and differentiate it; the authors approximate this gradient with the expression below.
The final algorithm can be simply defined as:
Finally, this model is adversarially trained using the projected gradient descent (PGD) approach. The PGD update is summarized in the figure below; a rough code sketch follows it.
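As a rough sketch, a PGD attack step applied to a loss on the smoothed classifier (for example the `smoothed_classifier_loss` sketched above) might look as follows; the L2 step size `alpha`, radius `eps`, and number of steps are illustrative choices, not the paper’s exact settings.

```python
import torch

def pgd_attack(loss_fn, x, y, eps=0.5, alpha=0.1, steps=10):
    """Sketch: projected gradient ascent inside an L2 ball of radius eps.
    loss_fn(x, y) should be a loss on the (smoothed) classifier."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(x_adv, y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad / (grad.norm() + 1e-12)          # ascent step
            delta = x_adv - x
            delta = delta * min(1.0, eps / (delta.norm().item() + 1e-12)) # project into L2 ball
            x_adv = (x + delta).detach()
    return x_adv
```

Here `loss_fn` could be, for instance, `lambda x_, y_: smoothed_classifier_loss(model, x_, y_)`, i.e. the adversary attacks the smoothed classifier rather than the base classifier.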
Blog Credit:
• Professor Dr. Mario Fritz: Machine Learning in Cybersecurity 2019 class.
• Slides by Jerry Li (Microsoft Research)
• Link to the talk: https://www.youtube.com/watch?v=ZFwmyP__p_4&t=2418s