Class 6: Targeted Poisoning Attacks & Certification
A blog post for class 6 (created by Gopal Bhattrai on 23 June 2023).
10 minutes
Here are the PPT slides for the presentation. The slides were also made by Gopal.
In recent years, machine learning and deep learning have achieved remarkable results in many fields of engineering and medicine. Today, many big tech firms use these algorithms to power their products: applications such as self-driving cars and object detection run deep learning models in the background. But it has recently been shown that these models can be fooled surprisingly easily. So the questions we should be asking are: Is machine learning really secure? Is it worth powering products worth billions of dollars?
Before tackling that question, let's look at a scenario where this false sense of security could be dangerous. In computer vision applications, the most widely used architecture is the Convolutional Neural Network (CNN). These models efficiently detect the objects in an image: at their core, they extract features using kernels, and based on those feature patterns the network decides which object is present. Imagine we are building a self-driving car, where such a model decides when to stop and when to speed up. If everything is set up properly, the model will easily recognize a stop sign and decide to stop the car. But it has been shown that these models can be fooled by slightly modifying the input, so that the model now recognizes a stop sign as a speed limit sign or something else entirely. Why is that the case?
Let's look at the figure below:
The CNN was initially given the original image of a panda and predicted the correct label with 57.7% confidence. But when an attacker adds a small, carefully crafted perturbation (one that to a human looks like random noise), the model predicts the panda to be a 'gibbon' with very high confidence. The main reason is that adding this perturbation pushes the instance to the wrong side of the decision boundary. Such attacks are called evasion attacks.
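To make the evasion idea concrete, here is a minimal sketch of the fast gradient sign method (FGSM) behind the panda example. It is written in PyTorch; `model`, `x`, and `y` are placeholders for any differentiable classifier, an input batch with pixel values in [0, 1], and the true labels.

```python
import torch

def fgsm_perturb(model, x, y, epsilon=0.01):
    """Minimal FGSM-style evasion sketch: nudge the input in the
    direction that increases the loss, so a small change can push
    it across the decision boundary."""
    x = x.clone().detach().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    # The perturbation looks like noise to a human but is structured
    # to maximally increase the model's loss.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0, 1).detach()
```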
Today, however, we are going to talk about a very different kind of attack called data poisoning. Unlike an evasion attack, where the adversary crafts an adversarial example at inference time, data poisoning happens at training time. The adversary injects poison instances so that the decision boundary of the model is affected, and as a result the test performance of the model can be dramatically degraded.
Let's understand this with the figure below:
In the figure, the first case represents an evasion attack: at test time we add noise to a sample, which moves it to the other side of the decision boundary. The second diagram represents poisoning: the adversary adds samples, often wrongly labeled, to the training set and thereby changes the decision boundary. The last case represents a clean-label poisoning attack: the poison sample crafted by the adversary is taken up by the victim and correctly labeled, yet the adversary still succeeds in shifting the decision boundary in their favor. The important point to remember is that poisoning happens at training time, unlike an evasion attack, which happens at test time.
Poisoning happens at training time, and as a result the decision boundary is shifted in the adversary's favor.
Let's look at a real-world example of how data poisoning can severely hurt model performance. The example below shows how data poisoning can make a chatbot go rogue.
Hence the problem is really dangerous and must be handled properly. Let's talk about the motivation behind the problem. In the diagram below, we can see that before the poisoned data points were injected we had a very good decision boundary. Once the adversary injected the data points marked Dp, the decision boundary changed drastically. One common defense is data sanitization, i.e., outlier detection, but the adversary can craft poisons that are not obvious outliers.
Today we will talk about two papers that address this problem:
• Deep Partition Aggregation: Provable Defenses Against General Poisoning Attacks
• Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks
Let's start with the first paper. Before getting to its actual approach to the problem, let's again formulate some important key points:
• The adversary's objective is to change the test-time behavior of a classifier by making minor distortions to the training data.
• Since classifiers are often trained on public data, this poses a serious security threat.
• Poisoning can mean inserting new training samples, distorting existing training samples, or changing the labels of training samples.
Here we consider a very general poisoning threat model in which the attacker can insert or remove up to β training samples in total. For example, if β is 10, the attacker may remove, insert, or manipulate 10 images in any combination. This is illustrated by the figure below, where the adversary removes 4 images and adds 6 samples of the standard MNIST handwritten-digit dataset; a small sketch of how this attack magnitude can be measured follows after the figure.
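As a small illustration of this threat model, the attack magnitude can be measured as the size of the symmetric difference between the clean and the poisoned training set. The sketch below assumes training samples are hashable items, e.g. (image bytes, label) tuples.

```python
def attack_magnitude(clean_set, poisoned_set):
    """Number of samples the adversary inserted plus the number removed.
    With this measure, the MNIST example above (4 removals + 6 insertions)
    has magnitude 10, i.e. it is covered by a budget of beta = 10."""
    clean, poisoned = set(clean_set), set(poisoned_set)
    removed = clean - poisoned
    inserted = poisoned - clean
    return len(removed) + len(inserted)
```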
The paper's defense is the Deep Partition Aggregation (DPA) algorithm, a certified defense against general poisoning attacks. They first split the training data into several partitions and train a separate classifier on each partition. The number of partitions affected by poisoning is then at most β, because a single poisoned sample cannot affect multiple partitions. Why? We will see in a bit. This is shown in the figure below, where the dataset is split into several partitions and the changes only touch the first and the third partition.
Hence, if there are K partitions, we train K classifiers. At inference time, a test sample is classified K times, and the final classification is based on hard voting, i.e., the class predicted by the most base classifiers. Poisoning can affect at most β of these base classifications (we will see why in a while). They also show that if the gap between the number of base classifiers returning the top class and the number returning the runner-up class is greater than 2β, we can guarantee that a poisoning of size β will not affect the classification of that test sample. This provides a robustness guarantee for each test sample, sketched below.
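Here is a minimal sketch of the aggregation and certification step, assuming we already have the K base classifiers' predictions for a single test sample; tie-breaking refinements from the paper are omitted.

```python
from collections import Counter

def dpa_predict_and_certify(base_predictions):
    """Hard-vote over the base classifiers and return (label, budget):
    'budget' is the largest beta for which the gap between the top class
    and the runner-up stays strictly larger than 2*beta, so any attack
    of size <= budget provably cannot change this prediction."""
    counts = Counter(base_predictions).most_common()
    top_label, top_votes = counts[0]
    runner_up_votes = counts[1][1] if len(counts) > 1 else 0
    gap = top_votes - runner_up_votes
    certified_budget = max((gap - 1) // 2, 0)
    return top_label, certified_budget

# Example: with 7 base classifiers voting [0,0,0,0,0,1,1], the gap is 3,
# so the prediction '0' is certified against any single poisoning (beta = 1).
label, budget = dpa_predict_and_certify([0, 0, 0, 0, 0, 1, 1])
```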
For their approach to work, it is very important to have a robust partitioning scheme, where robust means that changing one sample should not affect the partition assignment of any other sample. So how should we do the partitioning?
One way is naive partitioning, i.e., simply filling the partitions in order as we go through the data, but this has one big problem: a single insertion or deletion may shift the contents of all subsequent partitions, causing an unbounded number of base classifiers to change.
This can be seen in the figure below: adding sample 6 changes every partition.
To obtain a robust partitioning scheme, they use hashing instead: the partition of a sample is determined solely by the sample itself, so adding or removing an instance affects only the partition it hashes to.
Let T be the training set and h(·) a deterministic hash function. The partitions P1, P2, …, PK are then defined as Pi = { t ∈ T | h(t) ≡ i (mod K) }.
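A minimal sketch of such a hash-based partition assignment, assuming each training sample can be serialized to bytes (the exact hash function used in the paper may differ):

```python
import hashlib

def partition_index(sample_bytes, K):
    """Deterministic partition assignment: depends only on the sample
    itself, so inserting or removing one sample affects exactly one
    partition and leaves all other assignments unchanged."""
    digest = hashlib.sha256(sample_bytes).hexdigest()
    return int(digest, 16) % K

# Toy example: split four samples into K = 3 partitions.
K = 3
dataset = [b"sample_0", b"sample_1", b"sample_2", b"sample_3"]
partitions = [[] for _ in range(K)]
for sample in dataset:
    partitions[partition_index(sample, K)].append(sample)
```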
The hash-based partitioning can be seen in the diagram below.
This ensures that at most β partitions will be affected. On MNIST, they certify ≥ 50% of test images to be robust against 509 poisonings; on CIFAR-10, against 9 poisonings. To their knowledge, this is the first certified defense against general poisoning attacks.
Now let's talk about the next paper, "Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks". This is a clean-label poisoning attack, and it happens at training time: the adversary manipulates the behavior of the system through carefully constructed poison instances. The question of interest is: can I hand the victim some data points that they will correctly label, and still change the decision boundary? It is important to understand that this is a targeted attack, aiming to control the model's behavior on specific test instance(s), in contrast to an indiscriminate attack, which degrades overall test accuracy. Let's understand how this works with the figure below.
First, we choose a target instance from the test set. Then we sample a base instance from the base class and construct a poison from it. The poison is injected into the training data and labeled by the labeling party (correctly, from their point of view). The model is then retrained on the poisoned dataset. The attack is a success if the target is classified as belonging to the base class.
Again, the key idea is to construct a poison from a base instance: it is a kind of perturbed version of this benign example, constructed strategically so that, if added to the training set, it flips the prediction on the target instance. The poison is designed to still look like a clean base instance, so that the victim or annotator labels it with the base class.
The paper calls this idea "crafting poison data via feature collisions". Let's understand it more concretely with the diagram below.
Ideally, we want to create a poison that stays close to the benign base example; this is formulated as an optimization problem. The regularization term keeps the x we are looking for close to the benign example in input space, ensuring that anyone inspecting it will give it the benign (base) label. The other term ensures that in the embedding space of the feature extractor, the poison's features are close to the target example, e.g. the spam instance. Note that we are assuming a white-box setting here. So in input space the poison looks like the benign example, while in the embedding space it lies close to the target example.
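Concretely, the feature-collision objective in the paper has the form p = argmin_x ||f(x) − f(t)||² + β||x − b||², where f is the feature extractor of the white-box model, t is the target instance, b is the base instance, and β (the regularization weight) controls how close the poison stays to the base in input space.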
The full algorithm looks as follows:
The forward step is a gradient-descent update that minimizes the distance from the poison to the target instance in feature space. The backward step is a proximal update that minimizes the distance from the poison to the base instance in input space. β is tuned so that the poison instance looks realistic. A small sketch of these two steps follows below.
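Below is a minimal PyTorch sketch of these two steps. It is a simplification, assuming a white-box feature extractor `feature_net` (a hypothetical handle to the network's penultimate-layer features) and illustrative hyperparameter values.

```python
import torch

def craft_poison(feature_net, base_img, target_img, beta=0.25, lr=0.01, n_iters=1000):
    """Forward-backward feature-collision sketch: move toward the target
    in feature space (forward step), then take a proximal step back
    toward the base instance in input space (backward step)."""
    with torch.no_grad():
        target_feat = feature_net(target_img)        # fixed target features
    x = base_img.clone().detach()                    # start from the base instance
    for _ in range(n_iters):
        x.requires_grad_(True)
        loss = ((feature_net(x) - target_feat) ** 2).sum()
        grad, = torch.autograd.grad(loss, x)
        with torch.no_grad():
            x_tilde = x - lr * grad                                  # forward (gradient) step
            x = (x_tilde + lr * beta * base_img) / (1 + lr * beta)   # backward (proximal) step
            x = x.clamp(0, 1)                                        # keep a valid image
        x = x.detach()
    return x
```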
In feature space, the poisoned sample lies close to the target, which causes the decision boundary to change significantly after retraining.
Here we can see that the injected poison, marked 'x', lies close to the target instance in feature space while carrying the benign (base) label.
They followed this approach and showed that it works, attempting to misclassify dog examples as fish and vice versa, as shown in the figure below. Here the objective is to misclassify fish examples as dogs: the poisoned instance at the bottom, which looks like a dog, is inserted into the dataset and causes several of the fish shown above to be misclassified as dogs.
Another variant is the 'one-shot kill' attack, i.e., adding just a single poison to cause a misclassification. Here they consider a transfer-learning scenario: a pretrained CNN is used as the feature-extraction network, all lower-layer weights are frozen, and only the last (softmax) layer is retrained to adapt the network to the specific task (sketched below). Adding one poison instance is enough to cause misclassification of the target. They report a 100% success rate across 1099 trials; the success rate is so high because there are more weights (2048) than training examples (1801), so the last layer overfits the training data. The original accuracy on the test set is hardly affected, with a 0.2% average drop.
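A small sketch of this transfer-learning setup follows. It is illustrative only: the paper uses an Inception-v3 feature extractor, and here a ResNet-50 stands in, since it also has a 2048-dimensional penultimate layer.

```python
import torch
import torchvision

# Pretrained CNN as a frozen feature extractor; only the final linear
# (softmax) layer is retrained on the (possibly poisoned) dataset.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
for p in model.parameters():
    p.requires_grad = False                                        # freeze all lower layers

num_classes = 2                                                    # e.g. dog vs. fish
model.fc = torch.nn.Linear(model.fc.in_features, num_classes)      # 2048 -> 2, trainable head

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2)
criterion = torch.nn.CrossEntropyLoss()
# ...a standard training loop over the poisoned dataset would go here...
```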
Up to now we were adding just a single poison sample, but we can also use multiple bases and outliers. When the whole network is retrained rather than just the last layer, more than one poison is needed to succeed: generally around 50 poisons for a roughly 70% success rate. Using multiple poison bases causes the target instance to get pulled along with the poison instances toward the base distribution. The attack targets data outliers and the examples with the lowest classification confidence, since these are the easiest to flip.
The feature space for such a multi-poison attack is shown below: the multiple poisons overlap with the target class, leading to misclassification.
Blog Credit:
- Professor Dr. Mario Fritz: Machine Learning in Cybersecurity 2019 class.
- Deep Partition Aggregation: Provable Defenses Against General Poisoning Attacks.