Class 5: Indiscriminate Poisoning & Backdoor Attacks
A blog post for class 5 (created by Baoshuang Zhang on 25 June 2023).
Here are the PPT slides for the presentation. The slides were also made by Baoshuang.
First, an example is presented to walk through the basic ideas of the paper.
From this example, we can see that the paper wants to discuss two things: first, how to develop defenses against indiscriminate poisoning, and second, how to certify these defenses.
However, before going further, let us define what a poisoning attack is and what makes it indiscriminate.
Suppose there is a classifier performing some task on a clean dataset, as shown in the figure below:
It is easy to train the model on this data. But if, before training, the dataset is poisoned by an attacker who adds new data points, the classifier may change its behavior as shown below:
This time, the classifier cannot do the task properly: it is heavily affected by the poisoned data points, and the overall test loss increases. From this example, we can see that a poisoning attack is a technique for manipulating or compromising a machine learning model by injecting malicious or misleading data during the training phase. Indiscriminate poisoning means the attack does not target any specific data points; instead, it attempts to increase the overall test loss of the trained model.
Problem Setting #
Now that we understand the kind of attack this paper addresses, let us formalize the problem as follows:
Suppose there are \( n \) clean data points drawn from some distribution \( P^* \), forming a clean dataset \( \mathcal{D_c} \). We consider a binary classification task mapping an input \( x \in \mathcal{X} \) to an output \( y \in \mathcal{Y} \), where \( \mathcal{Y} = \lbrace -1,+1 \rbrace \), and we define the test loss as \( \mathbf{L}(\theta) = \mathbf{E}_{(x,y)\sim P^*}[\ell(\theta;x,y)] \). The attacker adaptively chooses a “poisoned” dataset \( \mathcal{D_p} \) of \( \epsilon n \) poisoned points, where \( \epsilon \in [0,1] \). These points are added to the dataset, and the defender trains on the full dataset, now \( \mathcal{D_c} \cup \mathcal{D_p} \), to produce a model \( \hat{\theta} \) that incurs test loss \( \mathbf{L}(\hat{\theta}) \), as shown in the figure above.
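To make the setup concrete, here is a tiny synthetic sketch; the data, the poisoning fraction, and the placement of the poisoned points are purely illustrative, not the attack studied in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 1000, 2, 0.1                 # clean points, input dimension, poisoning fraction

# Clean dataset D_c drawn from some distribution P*, with labels in {-1, +1}.
X_clean = rng.normal(size=(n, d))
y_clean = np.where(X_clean[:, 0] > 0, 1, -1)

# The attacker injects eps * n poisoned points of their choosing
# (here simply placed far away with flipped labels, for illustration).
n_poison = int(eps * n)
X_poison = rng.normal(loc=3.0, size=(n_poison, d))
y_poison = -np.where(X_poison[:, 0] > 0, 1, -1)

# The defender ends up training on the union D_c ∪ D_p.
X_train = np.vstack([X_clean, X_poison])
y_train = np.concatenate([y_clean, y_poison])
```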
Data Sanitization Defenses #
From the example at the beginning, we know that if the defender does nothing to remove the poisoned data points, the classifier may be manipulated by the attacker. So, in this paper, the defender sets up a feasible set \( \mathcal{F} \) and trains the model only on the data points within \( \mathcal{F} \). The idea is captured by the following formula: $$ \hat{\theta} \overset{\text{def}}{=} \arg\min_{\theta \in \Theta} \mathbf{L}\big(\theta; (\mathcal{D_c} \cup \mathcal{D_p}) \cap \mathcal{F}\big), \quad \text{where} \quad \mathbf{L}(\theta;S) \overset{\text{def}}{=} \sum_{(x,y)\in S}\ell(\theta;x,y) $$
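As a minimal sketch of this training rule (assuming scikit-learn, and using logistic regression as a stand-in for the margin-based losses considered in the paper), training simply ignores every point that falls outside \( \mathcal{F} \):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_with_sanitization(X, y, in_feasible_set):
    """Train only on the points that pass the data-sanitization filter F.

    `in_feasible_set(x, y)` is any predicate defining the feasible set;
    concrete sphere and slab versions are sketched in the next section.
    """
    mask = np.array([in_feasible_set(x, label) for x, label in zip(X, y)])
    model = LogisticRegression()  # stand-in loss; the paper works with hinge-type losses
    model.fit(X[mask], y[mask])
    return model
```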
How to Find the Feasible Set #
Now we can turn to how to find a feasible set. The intuitive idea is to remove data points that lie too far from the corresponding class centroid. Two examples are the sphere defense, which discards points outside a sphere around the centroid, and the slab defense, which first projects each point onto the line between the two class centroids and then discards points that lie too far along this line. A minimal sketch of both checks is given below.
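The sketch below implements the two checks as predicates that plug into `train_with_sanitization` from the previous section; the centroid estimates and thresholds are inputs the defender has to supply:

```python
import numpy as np

def make_sphere_defense(centroids, radius):
    """Keep (x, y) only if x lies within `radius` of the centroid of its class y."""
    def in_feasible_set(x, y):
        return np.linalg.norm(x - centroids[y]) <= radius
    return in_feasible_set

def make_slab_defense(centroids, threshold):
    """Project x onto the line between the two class centroids and keep it only
    if its displacement from its own centroid along that line is small."""
    direction = centroids[1] - centroids[-1]
    direction = direction / np.linalg.norm(direction)
    def in_feasible_set(x, y):
        return abs(np.dot(x - centroids[y], direction)) <= threshold
    return in_feasible_set

# Example usage with the earlier sketches: estimate per-class centroids and train.
# (A fixed/oracle defense would compute centroids from clean data only; a
# data-dependent defense estimates them from the full, possibly poisoned, set.)
# centroids = {1: X_clean[y_clean == 1].mean(axis=0),
#              -1: X_clean[y_clean == -1].mean(axis=0)}
# model = train_with_sanitization(X_train, y_train,
#                                 make_sphere_defense(centroids, radius=3.0))
```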
Based on this idea, we consider two classes of defenses: fixed defenses and data-dependent defenses.
• Fixed defenses, where the feasible set \( \mathcal{F} \) does not depend on \( \mathcal{D_p} \). As the following figure shows, no matter how the poisoned data is distributed, the centroids are fixed and \( \mathcal{F} \) is not affected at all. These defenses are not implementable in practice (they may require, for example, the true centroids of the clean distribution), but they provide useful bounds.
• Data-dependent defenses, where \( \mathcal{F} \) depends on the whole dataset \( \mathcal{D_c} \cup \mathcal{D_p} \). As the figure below shows, even though the clean dataset is the same, the centroids shift under different poisoning attacks, so the resulting feasible sets differ as well.
Certified Defenses #
Now that we know how to set up the defenses, the next question is how to measure them.
Since a poisoning attack aims to mislead the classifier at test time by poisoning the training data, we want to know whether a defense still holds up when the model is tested.
The idea is that, given a defense \( \mathcal{F} \), we would like to upper bound the worst possible test loss over any attacker. If we can establish such an upper bound, we have a certified defense, since no attack can do worse.
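In the notation of the problem setting, the quantity being certified is (roughly) the worst-case loss over all admissible poisoning sets:

$$ \max_{\mathcal{D_p} \subseteq \mathcal{F},\; |\mathcal{D_p}| = \epsilon n} \mathbf{L}\big(\hat{\theta}(\mathcal{D_c} \cup \mathcal{D_p})\big) \le U, $$

where \( \hat{\theta}(\cdot) \) denotes the model trained (with sanitization) on the given dataset and \( U \) is the certified upper bound.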
Moreover, since the test loss and the training loss on the clean dataset are almost equivalent, we can work with an upper bound on the training loss instead of the test loss.
So, what does the worst case of training look like?
• The attacker places all of its poisoned points inside the feasible set \( \mathcal{F} \), so that sanitization removes none of them;
• the classifier is then fit toward the poisoned data.
Algorithm for Certified Defenses #
Now we present an online learning algorithm for a fixed defense to show how the upper bound is obtained. The basic idea is that, in each iteration, the algorithm alternates between finding the worst attack point \( (x^{(t)},y^{(t)}) \) with respect to the current model \( \theta^{(t-1)} \) and updating the model in the direction of that attack point, producing \( \theta^{(t)} \). The candidate attack set is the collection of points found in this way, since each has the largest loss in its iteration.
The steps of the algorithm are shown above; note that in the second step the upper bound on the loss is also updated, following the update formula given in the slides.
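Below is a simplified sketch of this loop, assuming a linear model with hinge loss and a finite candidate set standing in for \( \mathcal{F} \); it illustrates the alternation described above rather than the exact algorithm or bound from the paper:

```python
import numpy as np

def certify_upper_bound(X_clean, y_clean, candidate_points, eps, eta=0.1, T=100):
    """Alternate between finding the worst feasible attack point and updating
    the model in its direction, while tracking an upper-bound estimate."""

    def loss(theta, x, y):
        return max(0.0, 1.0 - y * np.dot(theta, x))          # hinge loss

    def grad(theta, x, y):
        return -y * x if y * np.dot(theta, x) < 1.0 else np.zeros_like(x)

    theta = np.zeros(X_clean.shape[1])
    attack_candidates, bounds = [], []

    for _ in range(T):
        # 1. Worst attack point (x^(t), y^(t)) w.r.t. the current model theta^(t-1).
        worst = max(candidate_points, key=lambda p: loss(theta, p[0], p[1]))
        attack_candidates.append(worst)

        # 2. Upper-bound estimate: average clean loss plus the eps-weighted worst loss.
        clean_loss = np.mean([loss(theta, x, y) for x, y in zip(X_clean, y_clean)])
        bounds.append(clean_loss + eps * loss(theta, *worst))

        # 3. Gradient step that also moves toward the attack point, producing theta^(t).
        g = np.mean([grad(theta, x, y) for x, y in zip(X_clean, y_clean)], axis=0)
        g = g + eps * grad(theta, *worst)
        theta = theta - eta * g

    return min(bounds), attack_candidates
```

The best value recorded over the iterations serves as the certified upper bound, and the collected points form the candidate attack set.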
Now let us talk about the data-dependent case, which is much more complicated. With data-dependent defenders the problem is no longer convex, and a closed-form optimal solution is essentially out of reach. To make things tractable, we think of \( \mathcal{D_p} \) as a probability distribution and instead compute the maximum expected loss over all possible distributions.
Experiments #
Oracle Defenses #
First of all, there is a successful experiment with fixed (oracle) defenses. It studies two image datasets: MNIST-1-7, a binary classification task between the digits 1 and 7, and the Dogfish dataset, another binary image classification task (dogs versus fish).
The results of this experiment are the following:
We can see that the upper bounds for both datasets stay small even when 30% poisoned data is added to the clean dataset, showing that these datasets are resilient to attack under the oracle defense.
Text Data #
This experiment also uses a fixed defense, this time on text data. Although the upper bound still provides a certificate, the loss increases quickly once just 5% poisoned data is added. One reason the attack can be so successful lies in the nature of the dataset: text features are high-dimensional and sparse, which gives the attacker far more room to place effective poisoned points inside the feasible set.
Data-Dependent Defenses #
This experiment is also based on the MNIST-1-7 dataset but changes the setting to data-dependent defense.
From the results, we can see that data-dependent defenses are much more vulnerable than the fixed defenses in the first experiment, so further work is needed on data-dependent defenses.
Conclusion #
In this paper, we obtain a tool for defending against indiscriminate poisoning attacks and for assessing the robustness of such defenses. The first step is to restrict training to a feasible set; the second is to certify the defense through an upper bound on the loss. Future work includes additional methods for choosing the feasible set and better ways of handling data-dependent defenses.
Blog Credit:
- Certified Defenses for Data Poisoning Attacks (Jacob Steinhardt, Pang Wei Koh, and Percy Liang, NeurIPS 2017)