Machine learning models are brittle, and small changes in the training data can result in different predictions. We study the problem of proving that a prediction is robust to data poisoning, where an attacker can inject a number of malicious elements into the training set to influence the learned model. We target decision tree models, a popular and simple class of machine learning models that underlies many complex learning techniques. We present a sound verification technique based on abstract interpretation and implement it in a tool called Antidote. Antidote abstractly trains decision trees for an intractably large space of possible poisoned datasets. Due to the soundness of our abstraction, Antidote can produce proofs that, for a given input, the corresponding prediction would not have changed whether or not the training set had been tampered with. We demonstrate the effectiveness of Antidote on a number of popular datasets.
Artificial intelligence, in the form of machine learning (ML), is rapidly transforming the world as we know it. Today, ML is responsible for an ever-growing spectrum of sensitive decisions—from loan decisions, to diagnosing diseases, to autonomous driving. Many recent works have shown how ML models are brittle,2,5,19,20,22 and with ML spreading across many industries, the issue of robustness in ML models has taken center stage. The research field that deals with studying robustness of ML models is referred to as adversarial machine learning. In this field, researchers have proposed many definitions that try to capture robustness to different adversaries. The majority of these works have focused on verifying or improving the model's robustness to test-time attacks,8,9,21 where an adversary can craft small perturbations to input examples that fool the ML model into changing its prediction, for example, making the model think a picture of a cat is that of a zebra.4
Data-poisoning robustness. This paper focuses on verifying data-poisoning robustness, which captures how robust a training algorithm L is to variations in a given training set T. Intuitively, applying L to the training set T results in a classifier (model) M, and in this paper, we are interested in how the trained model varies under perturbations of the input training set T.
The idea is that an adversary can produce slight modifications of the training set, for example, by supplying a small number of malicious training points, to influence the produced model and its predictions. This attack model is possible when the data is curated, for example, via crowdsourcing or from online repositories, where attackers can try to add malicious elements to the training data. For instance, one might add malicious training points to affect a malware detection model,23 or a small number of images to bypass a facial recognition model.5
Verification challenges. Data-poisoning robustness has been studied extensively.2,13,15,24,25 While some defenses have been proposed against specific attacks10,19—that is, modifications to training sets—we are not aware of any technique that can formally verify that a given learning algorithm is robust to perturbations to a given training set. Verifying data-poisoning robustness of a given learner requires solving a number of challenges. First, the datasets over which the learner operates can have thousands of elements, making the number of modified training sets we need to consider intractably large to represent and explore explicitly. Second, because learners are complicated programs that employ complex metrics (e.g., entropy and loss functions), their verification requires new techniques.
Our approach. We present a new abstract-interpretation-based approach to the problem of verifying data-poisoning robustness for decision tree learners. We choose decision trees because they are simple, widely used, interpretable models. Our main contributions are an abstraction for compactly representing sets of training sets, an abstract decision tree learner that operates over this abstraction, and an implementation, Antidote, which we evaluate on a number of popular datasets.
In this section, we give an overview of decision tree learning and the poisoning robustness problem, and we motivate our abstraction-based proof technique (Figure 1).
Figure 1. High-level overview of our approach.
Decision tree learning. Consider the dataset Tbw at the top of Figure 2. It comprises 13 elements with a single numerical feature. Each element is labeled as a white (empty) or black (solid) circle. We use x to denote the feature value of each element. Our goal is to construct a decision tree that classifies a given number as white or black.
Figure 2. Illustrative example.
For simplicity, we assume that we can only build trees of depth 1, like the one shown at the bottom of Figure 2. At each step of building a decision tree, the learning algorithm is looking for a predicate ϕ with the best score, with the goal of splitting the dataset into two pieces with least diversity; that is, most elements have the same class (formally defined usually using a notion of entropy). This is what we see in our example: Using the predicate x ≤ 10, we split the data-set into two sets, one that is mostly white (left) and one that is completely black (right). This is the best split we can have for our data, assuming we can only pick predicates of the form x ≤ c, for an integer c. Note that, while the set of predicates x ≤ c is infinite, for this dataset (and in general for any dataset), there exist only finitely many inequivalent predicates—for example, x ≤ 4 and x ≤ 5 split the dataset into the same two sets.
Given a new element x to classify, we check whether x ≤ 10, in which case we say it is white with probability 7/9—that is, the fraction of white elements among those with x ≤ 10. Otherwise, if x > 10, we say it is black with probability 1.
Data-poisoning robustness. Imagine we want to classify an input x but want to make sure the classification would not have changed had the training data been slightly different. For example, maybe some percentage of the data was maliciously added by an attacker to sway the learning algorithm, a problem known as data poisoning. Our goal is to check whether the classification of x is robust to data poisoning.
A naïve approach. Consider our running example and imagine we want to classify the number 5. Additionally, we want to prove that removing up to two elements from the training set would not change the classification of 5—that is, we assume that up to ~15% (or 2/13) of the dataset is contributed maliciously. The naïve way to do this is to consider every possible training dataset with up to two elements removed and retrain the decision tree. If all trees classify the input 5 as white, the classification is robust to this level of poisoning.
Unfortunately, this approach is intractable. Even for our tiny example, we have to train 92 trees, one for each subset obtained by removing at most two of the 13 elements. For a dataset of 1,000 elements and a poisoning of up to 10 elements, we have ~10²³ possibilities.
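To make this count concrete, here is a small Python sketch (ours, not part of Antidote) of the number of candidate training sets the naïve approach would have to retrain on:

```python
from math import comb

def num_poisoned_datasets(dataset_size: int, n: int) -> int:
    """Number of training sets obtained by removing at most n elements."""
    return sum(comb(dataset_size, i) for i in range(n + 1))

print(num_poisoned_datasets(13, 2))     # 92 trees for the running example
print(num_poisoned_datasets(1000, 10))  # ~2.7 * 10^23 possibilities
```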
An abstract approach. Our approach to efficiently proving poisoning robustness exploits a number of insights. First, we can perform decision tree learning abstractly on a symbolic set of training sets, without having to deal with a combinatorial explosion. The idea is that the operations in decision tree learning, for example, selecting a predicate and splitting the dataset, do not need to look at every concrete element of a dataset, but at aggregate statistics (counts).
Recall our running example in Figure 2. Let us say that up to two elements have been removed. No matter what two elements you choose, the predicate x ≤ 10 remains one that gives a best split for the dataset. In cases of ties between predicates, our algorithm abstractly represents all possible splits. For each predicate, we can symbolically compute best- and worst-case scores in the presence of poisoning as an interval. Similarly, we can also compute an interval that overapproximates the set of possible classification probabilities. For instance, in the left branch of the decision tree, the probability will be [0.71, 1] instead of 0.78 (or 7/9). The best-case probability of 1 is when we drop the black points 0 and 4; the worst-case probability of 0.71 (or 5/7) is when we drop any two white points.
The next insight that enables our approach is that we do not need to explicitly build the tree. Since our goal is to prove robustness of a single input point, which effectively takes a single trace through the tree, we mainly need to keep track of the abstract training sets as they propagate along those traces. This insight drastically simplifies our approach; otherwise, we would need to somehow abstractly represent sets of elements of a tree data structure, a nontrivial problem in program analysis.
Abstraction and imprecision. We note that our approach is sound but necessarily incomplete; that is, when our approach returns "robust" the answer is correct, but there are robust instances for which our approach will not be able to prove robustness. There are numerous sources of imprecision due to overapproximation, for example, we use the interval domain to capture real-valued entropy calculations of different training set splits, as well as the final probability of classification.
An involved example. To further illustrate our technique, we preview one of our experiments. We applied our approach to the MNIST-1-7 dataset, which has been used to study data poisoning for deep neural networks19 and support vector machines.2 In our experiments, we checked whether Antidote could prove data-poisoning robustness for test inputs from this dataset when training a decision tree. For example, when applying Antidote to the image of the digit in Figure 3, Antidote proves that it is poisoning-robust (always classified as a seven) for up to 192 poisoned elements in 90 seconds. This is equivalent to training on ~10⁴³² datasets!
Figure 3. Example MNIST-1-7 digit that is proven poisoning-robust by Antidote.
In this section, we begin by formally defining the data-poisoning-robustness problem. Then, we present a trace-based view of decision tree learning, which will pave the way for a poisoning robustness proof technique.
3.1. The poisoning robustness problem
In a typical supervised learning setting, we are given a learning algorithm L and a training set T ⊆ X × Y comprised of elements of some set X, each with its classification label from a finite set of classes Y. Applying L to T results in a classifier (or model): M : X → Y. For now, we assume that both the learning algorithm L and the models it learns are deterministic functions.
A perturbed set Δ(T) ⊆ 2^(X × Y) defines a set of possible neighboring datasets of T. Our robustness definitions are relative to some given perturbation Δ. (In Section 4.1, we define a specific perturbed set that captures a particular form of data poisoning.)
DEFINITION 3.1 (POISONING ROBUSTNESS). Fix a learning algorithm L, a training set T, and let Δ(T) be a perturbed set. Given an element x ∈ X, we say that x is robust to poisoning T if and only if ∀T′ ∈ Δ(T). L(T′)(x) = L(T)(x).
When T and Δ are clear from context, we will simply say that x is robust.
In other words, no matter what dataset T′ ∈ Δ(T) we use to construct the model M = L(T′), we want M to always return the same classification for x as the one returned for x by the model L(T) learned on the original training set T.
EXAMPLE 3.1. Imagine we suspect that an attacker has contributed 10 training points to T, but we do not know which ones. We can define Δ(T) to be T as well as every subset of T of size |T| – 10. If an input x is robust for this definition of Δ(T), then no matter whether the attacker has contributed 10 training items or not, the classification of x does not change.
3.2. Decision trees: a trace-based view
We will formalize a tree as the set of traces from the root to each of the leaves. As we will see, this trace-based view will help enable our proof technique.
A decision tree R is a finite set of traces, where each trace is a tuple (σ, y) such that σ is a sequence of Boolean predicates and y ∈ Y is the classification.
Semantically, a tree R is a function in X → Y. Given an input x ∈ X, applying R(x) results in a classification y from the trace (σ, y) ∈ R where x satisfies all the predicates in the sequence σ = [ϕ1, …, ϕn], that is, ϕ1(x) ∧ ⋯ ∧ ϕn(x) is true. We say a tree R is well-formed if for every x ∈ X, there exists exactly one trace (σ, y) ∈ R such that x satisfies all predicates in σ. In the following, we assume all trees are well-formed.
EXAMPLE 3.2 (DECISION TREE TRACES). Consider the decision tree in Figure 2. It contains two traces, each with a sequence of predicates containing a single predicate: ([x ≤ 10], white) and ([x > 10], black).
3.3. Tree learning: a trace-based view
We now present a simple decision tree learning algorithm, Trace. Then, in Section 4, we abstractly interpret Trace with the goal of proving poisoning robustness.
One of our key insights is that we do not need to explicitly represent the learned trees (i.e., the set of all traces), since our goal is to prove robustness of a single input point, which effectively takes a single trace through the tree. Therefore, in this section, we will define a trace-based decision tree learning algorithm. This is inspired by standard algorithms—such as CART,3 ID3,16 and C4.517—but it is input-directed, in the sense that it only builds the trace of the tree that a given input x will actually traverse.
A trace-based learner. Our trace-based learner Trace is shown in Figure 4. It takes a training set T and an input x and computes the trace traversed by x in the tree learned on T. Intuitively, if we compute the set of all traces Trace(T, x) for each x ∈ T, we get the full tree, the one that we would have traditionally learned for T.
Figure 4. Trace-based decision tree learner Trace.
The learner Trace repeats two core operations: (i) selecting a predicate ϕ with which to split the dataset (using bestSplit) and (ii) removing elements of the training set based on whether they satisfy the predicate ϕ (depending on x, using filter). The number of times the loop is repeated (d) is the maximum depth of the trace that is constructed. Throughout, we assume a fixed set of classes Y = {1, …, k}.
The mutable state of Trace is the triple (T, ϕ, σ): (i) T is the training set, which will keep getting refined (by dropping elements) as the trace is constructed. (ii) ϕ is the most recent predicate along the trace, which is initially undefined (denoted by ◇). (iii) σ is the sequence of predicates along the trace, which is initially empty.
Predicate selection. We assume that Trace is equipped with a finite set of predicates Φ with which it can construct a decision tree classifier; each predicate in Φ is a Boolean function in X → B.
bestSplit(T) computes a predicate ϕ* ∈ Φ that splits the current dataset T—usually minimizing a notion of entropy. Ideally, the learning algorithm would consider every possible sequence of predicates to partition a dataset in order to arrive at an optimal classifier. For efficiency, a decision tree-learning algorithm does this greedily: It selects the best predicate it can find for a single split and moves on to the next split. To perform this greedy choice, it measures how diverse the two datasets resulting from the split are. We formalize this below:
We use T↓ϕ to denote the subset of T that satisfies ϕ, that is, T↓ϕ ≔ {(x, y) ∈ T | ϕ(x)}.
Let Φ′ be the set of all predicates that do not trivially split the dataset: Φ′ = {ϕ ∈ Φ | T↓ϕ ≠ ∅ ∧ T↓ϕ ≠ T}. Finally, bestSplit(T) is defined as follows: bestSplit(T) ≔ arg minϕ∈Φ′ score(T, ϕ),
where score(T, ϕ) = |T↓ϕ|·ent(T↓ϕ)+|T↓¬ϕ|·ent(T↓¬ϕ). Informally, bestSplit(T) is the predicate that splits T into two sets with the lowest entropy, as defined by the function ent in Figure 5. Formally, ent computes Gini impurity, which is used, for instance, in the CART algorithm.3 Note that if Φ′ = ∅, we assume bestSplit(T) is undefined (returns ◇). If multiple predicates are possible values of bestSplit(T), we assume one is returned nondeterministically. Later, in Section 4, our abstract interpretation of Trace will actually capture all possible predicates in the case of a tie.
Figure 5. Auxiliary operator definitions. ent is Gini impurity; cprob returns a vector of classification probabilities, one element for each class i ∈ [1, k].
EXAMPLE 3.3. Recall our example from Section 2 and Figure 2. For readability, we use T instead of Tbw for the dataset. Let us compute score(T, ϕ), where ϕ is x ≤ 10. We have |T↓ϕ| = 9 and |T↓¬ϕ| = 4. For the classification probabilities, defined by cprob (Figure 5), we have cprob(T↓ϕ) = 〈7/9, 2/9〉 and cprob(T↓¬ϕ) = 〈0, 1〉 assuming the first element represents white classification; for example, in T↓ϕ, there's a 7/9 chance of being classified as white. For ent, we have ent(T↓ϕ) ≈ 0.35 and ent(T↓¬ϕ) = 0. Since T↓¬ϕ only comprises black points, its Gini impurity is 0.
The score of x ≤ 10 is therefore ~3.1. For the predicate x ≤ 11, we get the higher (worse) score of ~3.2, as it generates a more diverse split.
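As a sanity check of the numbers in Example 3.3, the following Python sketch (ours, not the paper's implementation) recomputes cprob, the Gini impurity ent, and score for the split x ≤ 10 directly from the class counts on each side:

```python
# class counts on each side of the split x <= 10, as (white, black)
left, right = (7, 2), (0, 4)

def cprob(counts):
    """Vector of classification probabilities from per-class counts."""
    total = sum(counts)
    return tuple(c / total for c in counts)

def gini(counts):
    """Gini impurity: 1 - sum_i p_i^2."""
    return 1.0 - sum(p * p for p in cprob(counts))

print(cprob(left), cprob(right))   # (0.777..., 0.222...) and (0.0, 1.0)
print(gini(left), gini(right))     # ~0.3457 and 0.0
score = sum(left) * gini(left) + sum(right) * gini(right)
print(score)                       # ~3.11, matching the ~3.1 reported for x <= 10
```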
Filtering the dataset. The operator filter removes elements of T that evaluate differently than x on ϕ. Formally, filter(T, ϕ, x) ≔ T↓ϕ if x ⊨ ϕ, and T↓¬ϕ otherwise.
Learner result. When Trace terminates in a state (Tr, ϕr, σr), we can read the classification of x as the class i with the highest number of training elements in Tr.
Using cprob, defined in Figure 5, we compute the probability of each class i for a training set T as a vector of probabilities. Trace returns the class with the highest probability, that is, the class i ∈ [1, k] maximizing cprob(Tr)i.
As before, in case of a tie in probabilities, we assume a nondeterministic choice.
EXAMPLE 3.4. Following the computation from Ex. 3.3, Trace(T, 18) terminates in state (T↓x>10, x ≤ 10, [x > 10]). Point 18 is associated with the trace [x > 10] and is classified as black because cprob(T↓x>10) = 〈0, 1〉.
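The following self-contained Python sketch illustrates the overall shape of the trace-based learner. It is our own reconstruction of the ideas behind Figure 4 (the helper names and the encoding of training sets as lists of (value, label) pairs are assumptions of ours), not the paper's exact pseudocode.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a multiset of labels: 1 - sum_i p_i^2."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def score(T, phi):
    """score(T, phi) = |T_phi| * ent(T_phi) + |T_notphi| * ent(T_notphi)."""
    left = [y for v, y in T if phi(v)]
    right = [y for v, y in T if not phi(v)]
    return len(left) * gini(left) + len(right) * gini(right)

def best_split(T, predicates):
    """Greedy choice: a predicate with minimal score among those splitting T non-trivially."""
    nontrivial = [p for p in predicates
                  if 0 < sum(1 for v, _ in T if p(v)) < len(T)]
    return min(nontrivial, key=lambda p: score(T, p)) if nontrivial else None

def trace_learner(T, x, d, predicates):
    """Follow only the path that input x takes through the tree learned on T."""
    for _ in range(d):
        labels = [y for _, y in T]
        if gini(labels) == 0:          # pure node: stop splitting
            break
        phi = best_split(T, predicates)
        if phi is None:                # no non-trivial split exists
            break
        keep = phi(x)                  # which side of the split x falls on
        T = [(v, y) for (v, y) in T if phi(v) == keep]   # filter
    labels = [y for _, y in T]
    return Counter(labels).most_common(1)[0][0]   # class with highest probability
```

On the running example of Figure 2, with predicates of the form x ≤ c, a depth-1 call for the input 5 follows the single trace [x ≤ 10] and returns white.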
In this section, we begin by defining a data-poisoning model in which an attacker contributes a number of malicious training items. Then, we demonstrate how to apply the trace-based learner Trace to abstract sets of training sets, allowing us to efficiently prove poisoning robustness.
4.1. The n-poisoning model
We consider a poisoning model where the attacker has contributed up to n elements of the training set—we call it n-poisoning. Formally, given a training set T and a natural number n ≤ |T|, we define the following perturbed set: Δn(T) ≔ {T′ ⊆ T : |T \ T′| ≤ n}.
In other words, Δn(T) captures every training set the attacker could have possibly started from to arrive at T.
This definition of dataset poisoning matches many settings studied in the literature.5,19,23 The idea is that an attacker has contributed a number of malicious data points into the training set to influence the resulting classifier.
We do not know which n points in T are the malicious ones, or if there are malicious points at all. Thus, the set Δn(T) captures every possible subset of T where we have removed up to n (potentially malicious) elements. Our goal is to prove that our classification is robust to up to n possible poisoned points added by the attacker. So if we try every possible dataset in Δn(T) and they all result in the same classification on x, then x is robust regardless of the attacker's potential contribution.
Observe that |Δn(T)| = Σi=0,…,n C(|T|, i), where C(·, ·) denotes the binomial coefficient. So even for relatively small datasets and numbers n, the set of possibilities is massive; for example, for the MNIST-1-7 dataset (§5) and n = 50, we have about 10¹⁴¹ possible training sets in Δn(T).
4.2. Abstractions for verifying n-poisoning
Our goal is to efficiently evaluate Trace on an input x for all possible training datasets in Δn(T). If all of them yield the same classification y, then we know that x is a robust input. Our insight is that we can abstractly interpret Trace on a symbolic set of training sets without having to fully expand it into all of its possible concrete instantiations.
Recall that the state of Trace is (T, ϕ, σ); for our purposes, we do not have to consider the sequence of predicates σ, as we are only interested in the final classification, which is a function of T. In this section, we present the abstract domains for each component of the learner's state.
Abstract training sets. Abstracting training sets is the main novelty of our technique. We use the abstract element 〈T′, n′〉 to denote a set of training sets; it captures the definition of Δn′(T′): For every training set T′ and number n′, the concretization function—that is, the function that maps an abstract element to the set of concrete elements it represents—is γ(〈T′, n′〉) = Δn′(T′). The abstraction function α—that is, the function that maps a set of concrete elements to an abstract one—is defined as α(Δn(T)) = 〈T, n〉, so the initial abstraction is precise. Note that an abstract element 〈T′, n′〉 succinctly captures a large number of concrete sets, Δn′(T′). Further, all operations we perform on 〈T′, n′〉 will only modify T′ and n′, without resorting to concretization.
Elements in the domain are ordered so that 〈T1, n1〉 ⊑ 〈T2, n2〉 if and only if T1 ⊆ T2 ∧ n1 ≤ n2 − |T2 \ T1|.
We can define an efficient join operation—that is, the operation that given two abstract elements, returns an abstract element that subsumes both—on two elements in the abstract domain as follows:
DEFINITION 4.1 (JOINS). Given two training sets T1, T2 and n1, n2 ∈ ℕ, 〈T1, n1〉 ⊔ 〈T2, n2〉 ≔ 〈T′, n′〉 where T′ = T1 ∪ T2 and n′ = max(|T1 \ T2| + n2, | T2 \ T1 | + n1).
Notice that the join of two sets is an overapproximation of the union of the two sets.
PROPOSITION 4.1. For any T1, T2, n1, n2: γ(〈T1, n1〉) ∪ γ(〈T2, n2〉) ⊆ γ(〈T1, n1〉 ⊔ 〈T2, n2〉).
EXAMPLE 4.1. Consider the training sets T1 = {x1, x2, x3} and T2 = {x1, x2}. Because 〈T1, 3〉 represents strictly more concrete training sets than 〈T1, 2〉, we have 〈T1, 2〉 ⊑ 〈T1, 3〉.
For the join, we have 〈T1, 2〉 ⊔ 〈T2, 2〉 = 〈{x1, x2, x3}, 3〉.
Notice how the join increased the number of poisoned elements from 2 to 3 to accommodate the additional element x3.
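A minimal Python sketch of this abstract domain (our own encoding, representing a training set as a frozenset of points), with the ordering and join defined exactly as above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AbsTrainingSet:
    """Abstract element <T, n>: every subset of T missing at most n elements."""
    T: frozenset
    n: int

    def leq(self, other):
        """<T1, n1> is below <T2, n2> iff T1 is a subset of T2 and n1 <= n2 - |T2 \\ T1|."""
        return self.T <= other.T and self.n <= other.n - len(other.T - self.T)

    def join(self, other):
        """Overapproximates the union of the two concretizations (Definition 4.1)."""
        T = self.T | other.T
        n = max(len(self.T - other.T) + other.n, len(other.T - self.T) + self.n)
        return AbsTrainingSet(T, n)

# Example 4.1: joining <{x1, x2, x3}, 2> and <{x1, x2}, 2> yields n = 3.
a = AbsTrainingSet(frozenset({"x1", "x2", "x3"}), 2)
b = AbsTrainingSet(frozenset({"x1", "x2"}), 2)
print(a.join(b).n)   # 3
```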
Abstract predicates and numeric values. When abstractly interpreting what predicates the learner might choose for different training sets, we will need to abstractly represent sets of possible predicates. Simply, a set of predicates is abstracted precisely as the corresponding set of predicates Ψ—that is, for every set Ψ, we have α(Ψ) = Ψ and γ(Ψ) = Ψ. Moreover, Ψ1 ⊔ Ψ2 = Ψ1 ∪ Ψ2. For certain operations, it will be handy for Ψ to contain a special null predicate ◇.
When abstractly interpreting numerical operations, such as cprob and ent, we will need to abstract sets of numerical values. We do so using the standard intervals abstract domain (denoted [l, u]). For instance, α({0.2, 0.4, 0.6}) = [0.2, 0.6] and γ([0.2, 0.6]) = {x | 0.2 ≤ x ≤ 0.6}. The join of two intervals is defined as [l1, u1] ⊔ [l2, u2] = [min(l1, l2), max(u1, u2)]. Interval arithmetic follows the standard definitions.
4.3. Abstract learner Trace#
We are now ready to define an abstract interpretation of the semantics of our decision tree learner, denoted Trace#.
Abstract domain. Using the domains described in Section 4.2, at each point in the learner, our abstract state is a pair (〈T′, n′〉, Ψ′) that tracks the current set of training sets and the current set of possible most recent predicates the algorithm has split on (for all considered training sets).
When verifying n-poisoning for a training set T, the initial abstract state of the learner will be the pair (〈T, n〉, {◇}). In the rest of the section, we define the abstract semantics (i.e., our abstract transformers) for all the operations performed by Trace#. For operations that only affect one element of the state, we assume the other component is unchanged.
4.4. Abstract semantics of auxiliary operators
We will begin by defining the abstract semantics of the auxiliary operations in the algorithm before proceeding to the core operations, filter and bestSplit.
We begin with 〈T, n〉↓ϕ ≔ 〈T↓ϕ, min(n, |T↓ϕ|)〉, that is, the abstract analog of T↓ϕ.
It removes the elements of T not satisfying ϕ; since the size of T↓ϕ can go below n, we take the minimum of the two.
PROPOSITION 4.2. Let T′ ∈ γ(〈T, n〉). For any predicate ϕ, we have T′↓ϕ ∈ γ(〈T, n〉↓ϕ).
Now, consider cprob(T), which returns a vector of probabilities for different classes. Its abstract version returns an interval for each probability, denoting the lower and upper bounds based on the training sets in the abstract set: cprob#(〈T, n〉) ≔ 〈[max(0, ci − n)/|T|, min(1, ci/(|T| − n))]〉i∈[1, k],
where ci = |{(x, i) ∈ T}|. In other words, for each class i, we need to consider the best- and worst-case probability based on removing n elements from the training set, as reflected in the denominator and the numerator, respectively. Note that in the corner case where n = |T|, we set cprob#(〈T, n〉) = 〈[0, 1]〉i∈[1, k].
PROPOSITION 4.3. Let T′ ∈ γ(〈T, n〉). Then, cprob(T′) ∈ γ(cprob#(〈T, n〉)),
where γ(cprob#(〈T, n〉)) is the set of all possible probability vectors in the vector of intervals.
EXAMPLE 4.2. Consider the training set on the left side of the tree in Figure 2; call it Tℓ. It has 7 white elements and 2 black elements. cprob(Tℓ) = 〈7/9, 2/9〉, where the first element is the white probability. cprob#(〈Tℓ, 2〉) produces the vector 〈[5/9, 1], [0, 2/7]〉. Notice the loss of precision in the lower bound of the first element. If we remove two white elements, we should get a probability of 5/7, but the interval domain cannot capture the relation between the numerator and denominator in the definition of cprob#.
The abstract version of the Gini impurity is identical to the concrete one, except that it performs interval arithmetic: ent#(〈T, n〉) ≔ 1 − Σi∈[1, k] ιi · ιi, where 〈ι1, …, ιk〉 = cprob#(〈T, n〉).
Each term ιi denotes an interval.
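A small Python sketch (ours) of the interval versions of cprob and the Gini impurity, written directly over per-class counts; it reproduces the vector 〈[5/9, 1], [0, 2/7]〉 of Example 4.2:

```python
def cprob_abs(counts, n):
    """Per-class probability intervals for <T, n>, where counts[i] = c_i.

    Numerator and denominator are relaxed independently, which is exactly
    the source of imprecision discussed in Example 4.2.
    """
    total = sum(counts)
    if n >= total:
        return [(0.0, 1.0) for _ in counts]
    return [(max(0, c - n) / total, min(1.0, c / (total - n))) for c in counts]

def gini_abs(intervals):
    """Interval Gini impurity: 1 - sum_i iota_i * iota_i, via interval arithmetic."""
    lo = sum(l * l for l, _ in intervals)   # probabilities are nonnegative,
    hi = sum(u * u for _, u in intervals)   # so squaring is monotone on [l, u]
    return (max(0.0, 1.0 - hi), 1.0 - lo)

print(cprob_abs([7, 2], 2))             # [(0.555..., 1.0), (0.0, 0.2857...)]
print(gini_abs(cprob_abs([7, 2], 2)))   # a sound interval containing every concrete Gini value
```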
4.5. Abstract semantics of filter
We are now ready to define the abstract version of filter. Since we are dealing with abstract training sets, as well as a set of predicates Ψ, we need to consider, for each ϕ ∈ Ψ, the cases x ⊨ ϕ and x ⊨ ¬ϕ, and take the join of all the resulting training sets (Definition 4.1). Let Ψx = {ϕ ∈ Ψ | x ⊨ ϕ} and Ψ¬x = {ϕ ∈ Ψ | x ⊨ ¬ϕ}.
Then, filter#(〈T, n〉, Ψ, x) ≔ (⊔ϕ∈Ψx 〈T, n〉↓ϕ) ⊔ (⊔ϕ∈Ψ¬x 〈T, n〉↓¬ϕ).
PROPOSITION 4.4. Let T′ ∈ γ(〈T, n〉) and ϕ′ ∈ Ψ. Then, filter(T′, ϕ′, x) ∈ γ(filter#(〈T, n〉, Ψ, x)).
EXAMPLE 4.3. Consider the full dataset Tbw from Figure 2. For readability, we write T instead of Tbw in the example. Let x denote the input with numerical feature 4, and let Ψ = {x ≤ 10}. First, note that because Ψ¬x is the empty set, the right-hand side of the result of applying the filter# operator will be the bottom element 〈Ø, 0〉 (i.e., the identity element for ⊔). Then, filter#(〈T, n〉, Ψ, x) = 〈T, n〉↓(x ≤ 10) ⊔ 〈Ø, 0〉 = 〈T↓(x ≤ 10), min(n, 9)〉, since T↓(x ≤ 10) contains nine elements.
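Building on the AbsTrainingSet sketch from Section 4.2 (the helper names are ours, and predicates are assumed to act on feature values), the abstract restriction 〈T, n〉↓ϕ and filter# can be written as follows; on Example 4.3, with Ψ = {x ≤ 10} and x = 4, the loop visits a single predicate and no join is needed:

```python
def restrict(abs_ts, pred):
    """Abstract analog of T restricted to phi: <T_phi, min(n, |T_phi|)>."""
    kept = frozenset((v, y) for (v, y) in abs_ts.T if pred(v))
    return AbsTrainingSet(kept, min(abs_ts.n, len(kept)))

def filter_abs(abs_ts, preds, x):
    """Join, over every phi in Psi, of the side of each split that input x satisfies."""
    sides = [phi if phi(x) else (lambda v, f=phi: not f(v)) for phi in preds]
    pieces = [restrict(abs_ts, side) for side in sides]
    result = pieces[0]
    for piece in pieces[1:]:
        result = result.join(piece)   # Definition 4.1; this join is the source of imprecision
    return result
```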
4.6. Abstract semantics of bestSplit
We are now ready to define the abstract version of bestSplit. We begin by defining bestSplit# without handling trivial predicates, then we refine our definition.
Minimal intervals. Recall that in the concrete case, bestSplit returns a predicate that minimizes the function score(T, ϕ). To lift bestSplit to the abstract semantics, we define score#, which returns an interval, and what it means to be a minimal interval—that is, the interval corresponding to the abstract minimal value of the objective function score#(〈T, n〉, ϕ).
Lifting score(T, ϕ) to score#(〈T, n〉, ϕ) can be done using the sound transformers for the intermediary computations: score#(〈T, n〉, ϕ) ≔ |〈T, n〉↓ϕ| · ent#(〈T, n〉↓ϕ) + |〈T, n〉↓¬ϕ| · ent#(〈T, n〉↓¬ϕ),
where |〈T, n〉| ≔ [|T| − n, |T|].
However, given a set of predicates Φ, bestSplit# must return the ones with the minimal scores. Before providing the formal definition, we illustrate the idea with an example.
EXAMPLE 4.4. Imagine a set of predicates Φ = {ϕ1, ϕ2, ϕ3, ϕ4}, each with an interval score#(〈T, n〉, ϕi).
Suppose that ϕ1 has the lowest upper bound for score among these intervals; we name this bound lubΦ and call score#(〈T, n〉, ϕ1) the minimal interval with respect to Φ. bestSplit# returns all the predicates whose scores overlap with the minimal interval score#(〈T, n〉, ϕ1); in this example, these are ϕ1, ϕ2, and ϕ3 (only ϕ4 does not overlap). This is because there is a chance that ϕ1, ϕ2, or ϕ3 is indeed the predicate with the best score, but our abstraction has lost too much precision for us to tell conclusively.
Let lb/ub be functions that return the lower/upper bound of an interval. First, we define the lowest upper bound among the abstract scores of the predicates in Φ as lubΦ ≔ minϕ∈Φ ub(score#(〈T, n〉, ϕ)).
We can now define the set of predicates whose score overlaps with the minimal interval as follows (we avoid some corner cases here for clarity): bestSplit#(〈T, n〉) ≔ {ϕ ∈ Φ | lb(score#(〈T, n〉, ϕ)) ≤ lubΦ}.
LEMMA 4.1. Let T′ ∈ γ(〈T, n〉). Then, bestSplit(T′) ∈ bestSplit#(〈T, n〉).
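A sketch of this selection step in Python; the concrete interval values below are hypothetical, chosen only to mirror the situation described in Example 4.4:

```python
def best_split_abs(scores):
    """Given abstract scores {phi: (lb, ub)}, keep every predicate whose
    interval overlaps the lowest upper bound (lub) over all intervals."""
    lub = min(ub for (_, ub) in scores.values())
    return {phi for phi, (lb, _) in scores.items() if lb <= lub}

scores = {               # hypothetical score#(<T, n>, phi_i) intervals
    "phi1": (2.0, 4.0),  # lowest upper bound: lub = 4.0
    "phi2": (3.0, 6.0),  # overlaps the minimal interval -> kept
    "phi3": (1.0, 7.0),  # overlaps the minimal interval -> kept
    "phi4": (5.0, 8.0),  # lower bound above lub         -> excluded
}
print(sorted(best_split_abs(scores)))   # ['phi1', 'phi2', 'phi3']
```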
4.7. Abstracting conditionals
We abstractly interpret conditionals in Trace, as is standard, by taking the join of all abstract states from the feasible then and else paths. In Trace, there are two branching statements of interest for our purposes, one with the condition ent(T) = 0 and one with ϕ = ◇.
Let us consider the condition ϕ = ◇. Given an abstract state (〈T, n〉, Ψ), we simply set Ψ = {◇} and propagate the state to the then branch (unless, of course, ◇ ∉ Ψ, in which case we omit this branch). For ϕ ≠ ◇, we remove ◇ from Ψ and propagate the resulting state through the else branch.
Next, consider the conditional ent(T) = 0. For the then branch, we need to restrict an abstract state (〈T, n〉, Ψ) to training sets with 0 entropy: Intuitively, this occurs when all elements have the same classification. We ask: Are there any concretizations composed of elements of the same class? We proceed through the then branch with the training set abstraction ⊔i∈[1, k] 〈T′, n − |T \ T′|〉, where T′ = {(x, i) ∈ T} and where the term for class i is replaced by the empty abstract state whenever |T \ T′| > n.
The idea is as follows: The set T′ defines a subset of T containing only elements of class i. But if we have to remove more than n elements from T to arrive at T′, then the conditional is not realizable by a concrete training set of class i, and so we return the empty abstract state.
In the case of ent(T) ≠ 0 (else branch), we soundly (imprecisely) propagate the original state without restriction.
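A sketch of this restriction, building on the earlier AbsTrainingSet class (the per-class loop and names are our own rendering of the intuition above):

```python
def restrict_to_pure(abs_ts, classes):
    """Restrict <T, n> to concretizations whose elements all share one class.

    For each class i, keep only the elements of class i; if more than n
    elements would have to be removed, class i is not realizable. The result
    is the join of the realizable per-class restrictions, or None (the empty
    abstract state) if the then branch is infeasible.
    """
    pieces = []
    for i in classes:
        T_i = frozenset((v, y) for (v, y) in abs_ts.T if y == i)
        removed = len(abs_ts.T) - len(T_i)
        if removed <= abs_ts.n:
            pieces.append(AbsTrainingSet(T_i, abs_ts.n - removed))
    if not pieces:
        return None
    result = pieces[0]
    for piece in pieces[1:]:
        result = result.join(piece)
    return result
```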
4.8. Soundness of abstract learner
Trace# soundly overapproximates the results of Trace and can therefore be used to prove robustness to n-poisoning. Let I = ([l1, u1], …, [lk, uk]) be a set of intervals. We say that interval [li, ui] dominates I if and only if li > uj for every j ≠ i.
THEOREM 4.2. Let 〈T′, n′〉 be the final abstract state of Trace#(〈T, n〉, x). If I = cprob#(〈T′, n′〉) and there exists an interval in I that dominates I (i.e., the same class is selected for every training set in γ(〈T, n〉)), then x is robust to n-poisoning of T.
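The final check then reduces to interval dominance; a minimal sketch (ours):

```python
def dominating_class(intervals):
    """Index of an interval whose lower bound exceeds every other interval's
    upper bound, or None if no interval dominates (robustness not proven)."""
    for i, (li, _) in enumerate(intervals):
        if all(li > uj for j, (_, uj) in enumerate(intervals) if j != i):
            return i
    return None

print(dominating_class([(0.71, 1.0), (0.0, 0.29)]))    # 0: class 0 always wins, so robust
print(dominating_class([(0.40, 0.80), (0.30, 0.60)]))  # None: intervals overlap, inconclusive
```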
4.9. Disjunctive abstraction
The state abstraction used by Trace# can be imprecise, mainly due to the join operations that take place, for example, during filter#. The primary concern is that we are forced to perform a very imprecise join between possibly quite dissimilar training set fragments. Consider the following example:
EXAMPLE 4.5. Let us return to Tbw from Figure 2, but imagine we have continued the computation after filtering using x ≤ 10 and have selected some best predicates. Specifically, consider a case in which we have x = 4 and Ψ = {x ≤ 3, x ≤ 4}.
Let us evaluate filter#(〈T, 1〉, Ψ, x), where T now denotes the already-filtered training set T↓(x ≤ 10). Following the definition of filter#, we will compute 〈T′, n′〉 = 〈T, 1〉↓(x ≤ 4) ⊔ 〈T, 1〉↓¬(x ≤ 3) = 〈T≤4, 1〉 ⊔ 〈T>3, 1〉,
where T≤4 = {(4, b), (3, w), (2, w), (1, w), (0, b)} and T>3 = {(4, b), (7, w), (8, w), (9, w), (10, w)}, thus giving us T′ = T (the set we began with) and n′ = 5 (much larger than what we began with). This is a large loss in precision.
To address this imprecision, we will consider a disjunctive version of our abstract domain, consisting of unboundedly many disjuncts of the previous domain, which we represent as a set {(〈T, n〉i, Ψi)}i. Our join operation becomes very simple: It is the union of the two sets of disjuncts.
DEFINITION 4.2 (JOINS). Given two disjunctive abstractions DI = {(〈T, n〉i, Ψi)}i∈I and DJ = {(〈T, n〉j, Ψj)}j∈J, we define DI ⊔ DJ ≔ DI ∪ DJ.
Adapting Trace# to operate on this domain is immediate: Each of the transformers described in the previous section is applied to each disjunct. Because our disjunctive domain trades memory and time efficiency for precision, we are able to prove more properties, but at a cost.
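A sketch of the disjunctive domain (ours): a state is a list of disjuncts, the join concatenates the lists, and every transformer from the previous sections is simply mapped over the disjuncts.

```python
def disjunctive_join(d1, d2):
    """Join of two disjunctive states: the union of their sets of disjuncts."""
    return d1 + d2

def lift(transformer):
    """Lift a transformer on a single disjunct to the disjunctive domain."""
    def lifted(disjuncts, *args):
        return [transformer(disjunct, *args) for disjunct in disjuncts]
    return lifted
```

The price is that the number of disjuncts can grow with every tie and every split, which is the memory and time cost mentioned above.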
We implemented our algorithms Trace and Trace# in C++ in a (single-threaded) prototype we call Antidote. Our evaluation aims to answer the following research questions: (RQ1) Can Antidote prove poisoning robustness for real-world datasets and non-trivial amounts of poisoning? (RQ2) Which factors most affect the precision and scalability of our approach?
5.1. Benchmarks and experimental setup
We experiment on 5 datasets (Table 1). We obtained the first three datasets from the UCI Machine Learning Repository.7 Iris is a small dataset that categorizes three related flower species; Mammographic Masses and Wisconsin Diagnostic Breast Cancer are two datasets of differing complexities related to classifying whether tumors are cancerous. We also evaluate the widely studied MNIST dataset of handwritten digits,11 which consists of 70,000 grayscale images (60,000 training, 10,000 test) of the digits zero through nine. We consider a form of MNIST that has been used in the poisoning literature, MNIST-1-7, which restricts the data to images of the digits one and seven, and we create another variant, MNIST-1-7-Real, for our evaluation (Table 1).
Table 1. Detailed metrics for the benchmark datasets considered in our evaluation.
For each dataset, we consider a decision tree learner with a maximum tree depth (i.e., the number of calls to bestSplit) ranging from 1 to 4. Table 1 shows that the test-set accuracies of the decision trees learned by Trace are reasonably high—affirmation that when we prove the robustness of their predictions, we are proving something worthwhile.
Experimental setup. For each test element, we explore the amount of poisoning (i.e., how large an n from our Δn model) for which we can prove the robustness property by increasing n until failure.
Failure occurs due to any of three cases: (i) The computed over-approximation does not conclusively prove robustness, (ii) the computation runs out of memory, or (iii) the computation exceeds a one-hour time-out. We run the entire procedure for the non-disjunctive and disjunctive abstract domains.
5.2. Effectiveness of Antidote
We evaluate how effective Antidote is at proving data-poisoning robustness. In this experiment, we consider a run of the algorithm on a single test element successful if either the non-disjunctive or disjunctive abstract domain succeeds at proving robustness (mimicking a setting in which two instances of Trace#, one for each abstract domain, are run in parallel)—we contrast the results for the different domains in the original paper. Figure 6 shows these results.
Figure 6. Fraction of test instances proven robust versus poisoning parameter n (log scale). The dotted line is a visual aid, indicating n is 1% of the training set size.
To exemplify the power of Antidote, we draw your attention to the depth-2 instance of Trace# invoked on MNIST-1-7-Real. For 38/100 test instances, we are able to verify that even if the training set had been poisoned by an attacker who contributed up to 64 poisoned elements, the attacker would not have had any power to change the resulting classification. Conventional machine learning wisdom says that, in decision tree learning, small changes to the training set can cause the model to behave quite differently. Our results add nuance to this wisdom: sometimes, there is stability. These 38 verified instances average ~800s of run time. Δ64(T) consists of ~10¹⁷⁴ concrete training sets; this is staggeringly efficient compared to a naïve enumeration baseline, which would be unable to verify robustness at this scale.
To answer RQ1, Antidote can verify robustness for real-world datasets with extremely large perturbed sets and decision tree learners with high accuracies.
We delegate the results for RQ2 to the original paper, but we note that decision tree depth and the number of poisoned elements are the most important factors for the scalability of our approach: Larger trees and more poisoning mean a higher loss of precision, and therefore fewer proofs. Similarly, larger trees lead to scalability issues, as the number of abstract datasets we maintain may grow exponentially.
Data poisoning. Data-poisoning robustness has been studied extensively from an attacker perspective.2,13,15,24,25 This body of work has demonstrated attacks that can degrade classifier accuracy (in particular, for SVMs), sometimes dramatically. Our approach instead proves that no poisoning attack exists using abstract interpretation, while existing techniques largely provide search techniques for finding poisoned training sets. Recently, we showed that our technique is robust and can handle different poisoning models,14 for example, ones in which the attacker can remove, add, or modify data. This recent work also shows how poisoning can disparately impact different parts of the test data (e.g., when dividing test data, in which each element corresponds to a person, by ethnicity), therefore opening new problems in the field of algorithmic fairness.
Some recent techniques modify the training processes and hypothesis class to prove robustness to data poisoning,12,18 while our approach proves that a certain algorithm is robust to poisoning with respect to a given data set. Additionally, these approaches provide probabilistic guarantees about certain kinds of attacks on a modified training algorithm, while our work provides deterministic guarantees.
Abstract interpretation for robustness. Abstract interpretation6 is a framework for static program analysis. Our work is inspired by abstract interpretation-based verification of neural networks.1 However, we tackle the problem of verifying training time robustness, while existing works focus on test-time robustness. The former problem requires abstracting sets of training sets, while the latter only requires abstracting sets of individual inputs.
This material is based upon work supported by the National Science Foundation under grant numbers 1652140, 1704117, 1750965, and 1918211. The work is also supported by a Facebook Award and a Microsoft Faculty fellowship.
1. Albarghouthi, A. Introduction to neural network verification. arXiv:2109.10317, 2021.
2. Biggio, B., Nelson, B., Laskov, P. Poisoning attacks against support vector machines. In Proceedings of the 29th International Conference on Machine Learning, ICML'12. Omnipress, 2012.
3. Breiman, L. Classification and Regression Trees, Routledge, 2017.
4. Carlini, N., Wagner, D. Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP). IEEE, 39–57.
5. Chen, X., Liu, C., Li, B., Lu, K., Song, D. Targeted backdoor attacks on deep learning systems using data poisoning. arXiv:1712.05526, 2017.
6. Cousot, P., Cousot, R. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints, 1977.
7. Dua, D., Graff, C. UCI machine learning repository, 2017.
8. Gehr, T., Mirman, M., Drachsler-Cohen, D., Tsankov, P., Chaudhuri, S., Vechev, M. Ai2: Safety and robustness certification of neural networks with abstract interpretation. In 2018 IEEE Symposium on Security and Privacy (SP), May 2018, 3–18.
9. Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J. Reluplex: An efficient smt solver for verifying deep neural networks. In International Conference on Computer Aided Verification. Springer, 2017, 97–117.
10. Laishram, R., Phoha, V.V. Curie: A method for protecting SVM classifier from poisoning attack. CoRR, abs/1606.01584, 2016.
11. LeCun, Y., Cortes, C., Burges, C.J.C. The MNIST database of handwritten digits.
12. Levine, A., Feizi, S. Deep partition aggregation: Provable defenses against general poisoning attacks. In International Conference on Learning Representations, 2020.
13. Mei, S., Zhu, X. Using machine teaching to identify optimal training-set attacks on machine learners. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI'15. AAAI Press, 2015, 2871–2877.
14. Meyer, A.P., Albarghouthi, A., D'Antoni, L. Certifying robustness to programmable data bias in decision trees. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems, December 6-14, 2021.
15. Newell, A., Potharaju, R., Xiang, L., Nita-Rotaru, C. On the practicality of integrity attacks on document-level sentiment analysis. In Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop (New York, NY, USA). ACM, 83–93.
16. Quinlan, J.R. Induction of decision trees. Mach. Learn 1, 1 (1986), 81–106.
17. Quinlan, J.R. C4.5: Programs for machine learning. The Morgan Kaufmann Series in Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
18. Rosenfeld, E., Winston, E., Ravikumar, P., Kolter, Z. Certified robustness to label-flipping attacks via randomized smoothing. In International Conference on Machine Learning. PMLR, 2020, 8230–8241.
19. Steinhardt, J., Koh, P.W.W., Liang, P.S. Certified defenses for data poisoning attacks. In Advances in Neural Information Processing Systems, 2017, 3517–3529.
20. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I. J., Fergus, R. Intriguing properties of neural networks. In 2nd International Conference on Learning Representations, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings.
21. Wang, S., Pei, K., Whitehouse, J., Yang, J., Jana, S. Formal security analysis of neural networks using symbolic intervals. In 27th {USENIX} Security Symposium. 2018, 1599–1614.
22. Wang, Y., Jha, S., Chaudhuri, K. Analyzing the robustness of nearest neighbors to adversarial examples. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018 (Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018), 5120–5129.
23. Xiao, H., Biggio, B., Brown, G., Fumera, G., Eckert, C., Roli, F. Is feature selection secure against training data poisoning? In International Conference on Machine Learning, 2015, 1689–1698.
24. Xiao, H., Biggio, B., Nelson, B., Xiao, H., Eckert, C., Roli, F. Support vector machines under adversarial label contamination. Neurocomput. 160, C (July 2015), 53–62.
25. Xiao, H., Xiao, H., Eckert, C. Adversarial label flips attack on support vector machines. In Proceedings of the 20th European Conference on Artificial Intelligence, ECAI'12 (Amsterdam, The Netherlands, 2012), IOS Press, 870–875.
To view the accompanying Technical Perspective, visit doi.acm.org/10.1145/3576893
The original version of this paper was published in the Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, June 20, 2020.