Class 8: Adversarial ML Beyond Image Classification
A blog post for class 8 (created by Baoshuang Zhang on 21 July 2023).
Here are the PPT slides for the presentation. The slides were also made by Baoshuang.
In this section, we talked about adversarial machine learning beyond image classification. The first paper presents a framework that helps us understand the basic idea of adversarial attacks in the NLP field. In the second paper, we analyze an advanced ad-blocking technique and how adversarial examples can attack it from several angles, and we conclude that this technique is still quite vulnerable to adversarial attacks.
Adversarial Attacks in NLP #
In general, there are two kinds of adversarial perturbations in this field:
• Examples that are almost visually indistinguishable from the original input to humans;
• Examples that are indistinguishable in meaning from the original input.
The first kind is easy to detect: defenders can pass the input through a spell-checker to correct all misspelled words before feeding it into the model.
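A minimal sketch of this defense, assuming the third-party pyspellchecker package (any spell-checker would do):

```python
# Sketch: normalize character-level perturbations with a spell-checker
# before the text reaches the victim model (assumes pyspellchecker).
from spellchecker import SpellChecker

spell = SpellChecker()

def normalize(text):
    """Replace words the spell-checker does not recognize with its best correction."""
    words = text.split()
    unknown = spell.unknown(words)
    return " ".join((spell.correction(w) or w) if w in unknown else w for w in words)

# e.g. an input like "this movie was terible" gets its misspelling corrected
print(normalize("this movie was terible"))
```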
However, the second kind of attack is much harder to detect, and there is a large body of research on crafting such perturbations to fool machine learning models. We will talk about how to generate attack examples using the framework from the first paper.
Framework for NLP Adversarial Attacks #
There are four components in the framework, shown as follows:
Transformation #
We start from the “transformation” component. A transformation takes an input and generates a set of potential perturbations. However, the generated perturbations are not always valid. Here is an invalid attack example:
As shown above, the perturbation has the opposite meaning of the input instead of a similar one. We want to discard this kind of perturbation and keep only the indistinguishable ones to fool the model, so we need the second component.
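As a concrete illustration, here is a minimal sketch of a synonym-substitution transformation built on WordNet (assuming nltk with the WordNet corpus downloaded); note that some of the candidates it proposes will change the meaning, which is exactly why constraints are needed:

```python
# Sketch of a "transformation": propose synonym swaps for one word.
# Assumes nltk with the WordNet corpus (nltk.download("wordnet")).
from nltk.corpus import wordnet

def candidate_swaps(sentence, index, max_candidates=10):
    """Return copies of `sentence` with the word at `index` replaced by a WordNet synonym."""
    words = sentence.split()
    synonyms = set()
    for synset in wordnet.synsets(words[index]):
        for lemma in synset.lemmas():
            name = lemma.name().replace("_", " ")
            if name.lower() != words[index].lower():
                synonyms.add(name)
    perturbed = []
    for candidate in sorted(synonyms)[:max_candidates]:
        new_words = list(words)
        new_words[index] = candidate
        perturbed.append(" ".join(new_words))
    return perturbed

print(candidate_swaps("the movie was surprisingly good", 4))
```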
Constraints #
A set of constraints helps us determine whether a perturbation is valid with respect to the original input. For example, we can use the cosine similarity between the embeddings of the input and the perturbation to check validity; several other methods are also useful.
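For instance, here is a minimal sketch of a cosine-similarity constraint on sentence embeddings, assuming the sentence-transformers package and the "all-MiniLM-L6-v2" model (both are assumptions, not the paper's exact choice):

```python
# Sketch of a "constraint": keep a perturbation only if its sentence embedding
# stays close to the original's (assumes sentence-transformers).
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def is_valid(original, perturbed, threshold=0.9):
    """Accept the perturbation only if cosine similarity to the original is high enough."""
    embeddings = encoder.encode([original, perturbed], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item() >= threshold

print(is_valid("the movie was surprisingly good", "the movie was surprisingly nice"))
```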
Although these two components can already generate “good” adversarial examples, a third component makes the search much more manageable.
Search Method #
Let us go back to the sentence above. If we try to find 10 substitute words for each word in the sentence during the “transformation” step, the space of potential perturbations becomes huge even for this short example. So we need a search method, such as greedy search with word importance ranking, to decide which words to perturb first and to focus on only one or two words in the following steps.
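Here is a rough sketch of such a greedy search with word importance ranking; model_confidence and candidate_swaps are hypothetical stand-ins for the victim model's score on the true label and the transformation above, and we assume a binary classifier and single-word substitutes for simplicity:

```python
# Sketch of a "search method": greedy word substitution guided by word importance ranking.
# `model_confidence(text, label)` and `candidate_swaps(text, i)` are hypothetical helpers.
def attack_with_wir(sentence, label, model_confidence, candidate_swaps):
    words = sentence.split()
    base = model_confidence(sentence, label)

    # Rank words by how much deleting them lowers the true-label confidence.
    drops = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        drops.append((base - model_confidence(reduced, label), i))

    current = sentence
    for _, i in sorted(drops, reverse=True):        # most important words first
        candidates = candidate_swaps(current, i)
        if not candidates:
            continue
        best = min(candidates, key=lambda s: model_confidence(s, label))
        if model_confidence(best, label) < model_confidence(current, label):
            current = best
        if model_confidence(current, label) < 0.5:  # goal function: flip the binary prediction
            return current
    return None  # no successful adversarial example found
```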
Goal Function #
To generate an example, every attack has a goal function that specifies what the attack is for and determines, based on the model outputs, whether the attack is successful. Goal functions include untargeted classification, targeted classification, etc.
In general, NLP attacks can be constructed from four components (a code sketch follows this list):
• transformation that generates a list of potential \(x_{adv}\);
• constraint(s) that filter out “bad” \(x_{adv}\);
• goal function that tells us when we’ve successfully fooled the model;
• search method that applies the transformation until a successful \(x_{adv}\) is found.
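As a concrete sketch, the four components could be composed with the TextAttack package introduced below; the exact import paths may differ slightly between TextAttack versions, and the victim model name is one of the package's pre-trained models, assumed here for illustration:

```python
# Sketch: build an attack from the four components with TextAttack
# (paths/names assumed; check the installed TextAttack version).
import transformers
from textattack import Attack
from textattack.models.wrappers import HuggingFaceModelWrapper
from textattack.goal_functions import UntargetedClassification
from textattack.transformations import WordSwapEmbedding
from textattack.constraints.semantics import WordEmbeddingDistance
from textattack.search_methods import GreedyWordSwapWIR

model = transformers.AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-imdb")
tokenizer = transformers.AutoTokenizer.from_pretrained("textattack/bert-base-uncased-imdb")
victim = HuggingFaceModelWrapper(model, tokenizer)

goal_function = UntargetedClassification(victim)        # when have we fooled the model?
transformation = WordSwapEmbedding(max_candidates=10)   # propose up to 10 substitutes per word
constraints = [WordEmbeddingDistance(min_cos_sim=0.8)]  # filter out swaps that drift too far
search_method = GreedyWordSwapWIR(wir_method="delete")  # greedy search with word importance ranking

attack = Attack(goal_function, constraints, transformation, search_method)
print(attack.attack("the movie was surprisingly good", 1))  # 1 = positive label
```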
Python Package: TextAttack #
Since many adversarial attacks can be expressed in this framework, the authors of this paper developed a Python package containing 16 different attack methods, along with 82 pre-trained models for evaluating these attacks.
Besides that, this Python package is useful in many other scenarios:
• Creating new attacks as a combination of novel and pre-existing components;
• Evaluating the robustness of custom models;
• Data augmentation, using the “transformation” and “constraint” modules to expand a training dataset (see the sketch after this list);
• Adversarial training, using the default attacks in the package.
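For example, a minimal data-augmentation sketch with TextAttack's EmbeddingAugmenter (parameter values chosen arbitrarily here):

```python
# Sketch: data augmentation with TextAttack's built-in EmbeddingAugmenter,
# which pairs a word-swap transformation with constraints.
from textattack.augmentation import EmbeddingAugmenter

augmenter = EmbeddingAugmenter(pct_words_to_swap=0.2, transformations_per_example=2)
print(augmenter.augment("the movie was surprisingly good"))
```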
Conclusion for the First Paper #
In the first paper, we talked about a framework that defines an attack in four modules: a goal function, a list of constraints, a transformation, and a search method. There is also a corresponding Python package that implements many existing attacks within this framework. The package can be used to create new attacks, expand datasets, improve the robustness of models, and so on, and the framework helps us understand what adversarial attacks in the NLP field look like.
Perceptual Ad Blocking #
For the second paper, we first discuss the structure of perceptual ad-blocking and three concrete approaches.
Framework of Perceptual Ad-blocker #
There are two phases for the ad-blocker. During the offline stage, the classifier is trained on collected data. The classifiers range from classical ad-blocking models to large ML models, as shown below.
During the online phase, there are three steps to block ads on a webpage (sketched in code after this list):
• The ad-blocker optionally segments the web page into smaller chunks;
• A classifier labels each chunk as an ad or non-ad content;
• The ad-blocker acts on the underlying web page based on these predictions.
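A hypothetical sketch of this online pipeline; segment_page, classify_chunk, and hide_element are assumed stand-ins, not the paper's implementation:

```python
# Sketch of the online phase: segment, classify, act.
def block_ads(page, segment_page, classify_chunk, hide_element):
    for chunk in segment_page(page):        # step 1: (optionally) segment the page
        if classify_chunk(chunk) == "ad":   # step 2: label each chunk as ad / non-ad
            hide_element(page, chunk)       # step 3: act on the underlying web page
```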
Approaches to Perceptual Ad-blocking #
• Element-based perceptual ad-blockers: the ad-blocker segments pages into HTML elements that are likely to contain ad disclosures, for example by extracting all “img” tags. The classifier then labels each extracted element and blocks all chunks classified as “ad” (a sketch follows this list);
• Frame-based perceptual ad-blockers: the ad-blocker pre-segments pages into smaller frames using the HTML elements and classifies the rendered content;
• Page-based perceptual ad-blockers: this approach fully emulates visual detection of online ads from rendered web content alone. There is no page segmentation; the classifier works directly on a screenshot of the webpage.
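A minimal sketch of the element-based approach, assuming the beautifulsoup4 package; classify_image is a hypothetical stand-in for the visual ad-disclosure classifier:

```python
# Sketch of element-based segmentation: extract candidate elements (here, all
# "img" tags) and keep the ones the classifier labels as ads (assumes beautifulsoup4).
from bs4 import BeautifulSoup

def flag_ad_elements(html, classify_image):
    soup = BeautifulSoup(html, "html.parser")
    return [img for img in soup.find_all("img")
            if classify_image(img.get("src", "")) == "ad"]
```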
Attacks on Perceptual Ad-blocking #
As stated above, there are three steps to block advertisements, so attacks can target every component in this pipeline.
Attacks against Page Segmentation #
These attacks use standard web techniques (e.g., HTML obfuscation) and are already part of an ongoing arms race between publishers, who own or curate websites, and ad-blockers. To escape the arms race caused by these segmentation attacks, perceptual ad-blockers can operate over rendered web content (i.e., the frame-based or page-based approaches), which in turn increases the attack surface for adversarial examples against the ad-blockers’ visual classifiers.
Attacks against Classifiers #
The authors develop four kinds of attacks against seven classifiers, and the results are shown below:
These classifiers are first trained on benign datasets and perform the classification task very well. However, they can all be attacked successfully with little effort.
There are four concrete types of attacks on the seven visual classifiers (a generic perturbation sketch follows the list):
• (C1) Adversarial ad disclosures that evade detection: perturbed “AdChoices” logos that fool all element-based classifiers;
• (C2) Adversarial ads that evade detection: perturbing the ads themselves to fool frame-based ad-blockers;
• (C3) Adversarial non-ad content that alters the classifier’s output on nearby ads: a universal perturbation used to attack page-based ad-blockers;
• (C4) Adversarial honeypots: fooling the classifier into misclassifying benign non-ad elements as ads.
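To give a flavor of how these image perturbations work, here is a generic FGSM-style sketch in PyTorch; it illustrates the general idea of a small pixel-level perturbation flipping a visual classifier, and is not the paper's exact attack for C1-C4:

```python
# Generic fast-gradient-sign-method (FGSM) sketch: add a small perturbation
# that pushes the visual classifier away from the correct label.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.03):
    """Return `image` plus an imperceptible perturbation bounded by `epsilon` per pixel."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step in the direction that increases the loss, then clip back to valid pixel range.
    return (image + epsilon * image.grad.sign()).clamp(0, 1).detach()
```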
Conclusion for the Second Paper #
For the second paper, we talked about a unified architecture for visual ad classification (perceptual ad-blocking) and explored a variety of attacks on these ad-blockers that enable publishers or ad networks to evade or detect ad-blocking. We also argue that, although perceptual ad-blockers may ease the arms race around page-markup obfuscation (e.g., HTML obfuscation), a new arms race between adversarial examples and the blocking technique may arise.
Blog Credit:
[1] https://www.youtube.com/watch?v=VpLAjOQHaLU&t=3530s
[2] Tramèr, Florian, et al. “AdVersarial: Perceptual Ad Blocking meets Adversarial Machine Learning.” Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security. 2019.
[3] Morris, John X., et al. “TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP.” arXiv preprint arXiv:2005.05909 (2020).