=Paper=
{{Paper
|id=Vol-2640/paper_13
|storemode=property
|title=Safety Augmentation in Decision Trees
|pdfUrl=https://ceur-ws.org/Vol-2640/paper_13.pdf
|volume=Vol-2640
|authors=Sumanta Dey,Pallab Dasgupta,Briti Gangopadhyay
|dblpUrl=https://dblp.org/rec/conf/ijcai/DeyDG20
}}
==Safety Augmentation in Decision Trees==
Sumanta Dey, Pallab Dasgupta and Briti Gangopadhyay
Dept of Computer Science and Engineering, IIT Kharagpur, India
sumanta.dey@iitkgp.ac.in, pallab@cse.iitkgp.ac.in, briti gangopadhyay@iitkgp.ac.in

The authors would like to thank DST, Govt of India, for partial financial support of this project. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Decision trees are among the most popular representations used in statistical machine learning. This paper studies the problem of incorporating domain knowledge into the learning of decision trees, specifically when the knowledge is given in the form of a set of known decision cases, captured using assertions. The ability to ensure that the decision making never violates a known set of safety assertions is essential for the use of decision tree learning in safety-critical domains. A pure machine learning approach does not guarantee safety, since the learning can be impaired by the lack of adequate data on possible failure scenarios. To the best of our knowledge, this is the first work on formal safety augmentation in decision trees with provable guarantees.

1 Introduction

Decision tree learning is among the most widely used and practical methods for supervised learning for both classification and regression tasks [Breiman, 2017; Song and Ying, 2015]. Decisions based on decision trees are inherently explainable and statistically supported by the data used to learn the tree. The primary goal of a decision tree learning algorithm is to construct an optimal decision tree, where quality is defined in terms of the depth of the tree (an upper bound on the number of variables to be examined before reaching a decision) and the number of nodes in the tree (which reflects the average effort in reaching a decision). Finding an optimal decision tree is an NP-complete problem [Laurent and Rivest, 1976; Murthy, 1998]. There are several greedy algorithms for learning decision trees, including ID3, C4.5, and C5.0 [Quinlan, 1986; Quinlan, 2014; Quinlan, 2007]. Greedy metrics, such as information gain, are used to order the nodes in the decision tree during its construction. For example, the decision tree developed using ID3 for the dataset of Table 1 is shown in Figure 1a. This decision tree decides whether a type of mushroom with given features, such as the CapShape, the CapColor, and the GillColor, is poisonous or edible based on the information gained from the given dataset. As can be seen in Figure 1a, the decision is taken by examining only the features CapColor and CapShape. The feature GillColor is ignored by the decision tree, and this can be statistically justified from the dataset.

For safety-critical decisions, relying on a decision tree constructed using statistical machine learning can be risky, because the dataset will typically not contain samples covering all possible combinations of features. The missing combinations may either not exist in the real world or may be so rare that they do not show up in the dataset. This is often true when the number of features and their value domains are large. It is, therefore, possible that the decision tree will make a wrong decision because it has no supervised precedent for unobserved combinations of feature values. For example, we may know from scientific knowledge that mushrooms having green GillColor and a bell-like CapShape are definitely poisonous, though there is no support for this knowledge in the data. Consequently, the decision tree of Figure 1a recommends all mushrooms with yellow CapColor, ignoring the features CapShape and GillColor, possibly leading to disastrous consequences. The decision tree also declares as edible all mushrooms with white CapColor and bell-like CapShape, whereas the above scientific knowledge should be used to eliminate the ones with green GillColor from this set.

Safety requirements gleaned from (non-statistical) domain knowledge can be provided explicitly in the form of assertions, a practice which is widely used in many safety-critical domains [SVA, 2018; Peled et al., 1999]. For example, the knowledge about mushrooms with green GillColor and bell-like CapShape can be expressed as the following propositional assertion:

(CapShape = bell ∧ GillColor = green) ⇒ Poisonous
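For readers who want to experiment with such assertions, the following short Python sketch shows one possible encoding (our own illustration, not part of the paper): a causal safety assertion is represented as an antecedent over attribute values plus a required decision, together with a test of whether a labelled sample violates it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    """A causal safety assertion: antecedent over attributes => required decision."""
    antecedent: dict   # attribute name -> required value
    consequent: str    # required value of the decision variable

def matches_antecedent(assertion, sample):
    """True iff the sample satisfies every predicate in ant(phi)."""
    return all(sample.get(attr) == val for attr, val in assertion.antecedent.items())

def violates(assertion, sample, decision):
    """True iff the sample matches ant(phi) but the decision refutes the consequent."""
    return matches_antecedent(assertion, sample) and decision != assertion.consequent

# (CapShape = bell AND GillColor = green) => Poisonous
phi = Assertion(antecedent={"CapShape": "Bell", "GillColor": "Green"},
                consequent="Poisonous")

sample = {"CapShape": "Bell", "CapColor": "Yellow", "GillColor": "Green"}
print(violates(phi, sample, decision="Edible"))   # True: an unsafe recommendation
```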
This paper studies the augmentation of decision trees with such safety assertions. We show that naive post-facto processing of a decision tree with the given assertions adversely affects the size of the decision tree. For example, such an augmentation with the above assertion leads to the decision tree of Figure 1b, where the average number of variables to be examined before reaching a decision is significantly more than that of Figure 1a. We present a methodology for biasing the information gain metric based on the safety assertions to compensate for the missing information in the dataset. This leads to considerable improvement in the safety-augmentation process. The decision tree constructed using our proposed methodology is shown in Figure 2. The average number of variables to be examined before reaching a decision in this tree is significantly better than that of Figure 1b.

The paper is organized as follows. Section 2 outlines the post-facto safety augmentation methodology. Section 3 presents the proposed integrated safety augmentation approach, and Section 4 outlines multi-assertion extensions. Experimental results on benchmark datasets are presented in Section 5. Related work and concluding remarks are given in Sections 6 and 7, respectively.

Table 1: A section of the mushroom dataset [Dua and Graff, 2017]

  CapShape  CapColor  GillColor  Poisonous
  Bell      Pink      Green      Poisonous
  Bell      Pink      White      Poisonous
  Bell      Pink      Gray       Poisonous
  Convex    Pink      Gray       Poisonous
  Convex    Pink      Brown      Poisonous
  Convex    White     Brown      Poisonous
  Convex    White     White      Poisonous
  Convex    White     Gray       Poisonous
  Convex    Yellow    Brown      Edible
  Convex    Yellow    Gray       Edible
  Convex    Yellow    White      Edible
  Bell      Yellow    White      Edible
  Bell      Yellow    Gray       Edible
  Bell      Yellow    Brown      Edible
  Bell      White     Brown      Edible
  Bell      White     Gray       Edible
  Bell      White     White      Edible

Figure 1: Decision trees for mushroom classification. (a) Decision tree from Table 1. (b) Decision tree after post-facto safety augmentation.

Figure 2: Decision tree constructed by the ID3 algorithm from the training dataset augmented by the support dataset of the safety property.

2 Post-Facto Safety Augmentation

This section examines the merit of considering the safety assertions after the decision tree has been constructed. We consider the following two options:

1. Analyzing safety assertions before using the decision tree. This is typically not a good option in practice. Safety-critical scenarios often have specific attributes which need not be examined if the decision is safe anyway. For example, suppose a safety property specifies that sulfa drugs should not be administered unless a sensitivity test rules out an allergic reaction. It does not make sense to perform a sensitivity test for allergic reactions to sulfa drugs before the decision tree recommends such a drug. In other words, examining all the predicates in the antecedent of an assertion is redundant on those paths of the decision tree where the decision is safe anyway.

2. Analyzing safety assertions after using the decision tree. We may analyze the decision recommended by the decision tree against the safety properties. For example, the decision tree of Figure 1a recommends us to eat mushrooms having white CapColor and bell-like CapShape, whereas the safety assertion tells us that mushrooms with bell-like CapShape and green GillColor are poisonous. We must therefore not go by the decision recommended by the decision tree, but examine the GillColor. This post-facto safety augmentation is shown in Figure 1b.

For a leaf node v of the decision tree, let π(v) denote the conjunction of predicates leading to v, and let dec(v) denote the decision recommended at v. For example, in Figure 1a, if v is the second leaf node from the left, then dec(v) = edible and:

π(v) = (CapColor = white) ∧ (CapShape = bell)

For an assertion ϕ, let ant(ϕ) denote the antecedent of ϕ. For example, for the assertion ϕ:

(CapShape = bell ∧ GillColor = green) ⇒ Poisonous

ant(ϕ) = (CapShape = bell ∧ GillColor = green)

Lemma 2.1. Safety augmentation is necessary in a leaf node v of a decision tree iff π(v) ∧ ant(ϕ) is satisfiable, but dec(v) refutes the consequent of ϕ.

Proof: If π(v) ∧ ant(ϕ) is unsatisfiable, then no safety augmentation is needed at v because the antecedent of ϕ does not match on this path of the decision tree. If dec(v) satisfies the consequent of ϕ, then no safety augmentation is needed at v because the decision tree recommends a safe decision anyway. Therefore it suffices to consider only the cases where π(v) ∧ ant(ϕ) is satisfiable, but dec(v) refutes the consequent of ϕ. The necessity is obvious: since dec(v) refutes the consequent of ϕ, it is a wrong decision for the cases satisfying π(v) ∧ ant(ϕ).

The methodology for post-facto safety augmentation in the necessary leaf nodes identified by Lemma 2.1 is as follows:

• Let η(ϕ) denote the set of variables referenced in ant(ϕ) but not present in π(v). For example, for the rightmost leaf node of Figure 1a, η(ϕ) = {CapShape, GillColor}.

• A sequence of nodes corresponding to the variables in η(ϕ) replaces the leaf node. This sequence is shown in red in Figure 1b.

• Decisions corresponding to the remaining branches of the new subtrees are recomputed using the decision tree learning algorithm.

The need for recomputing the decisions in the new branches of the decision tree arises when the decision taken at the original node was due to a statistical majority and not due to unanimity. In order to avoid overfitting, decision tree learning algorithms can take a specific decision at a node when the samples in favor of that decision are in an overwhelming majority. The safety augmentation splits these samples into the new branches of the decision tree, and in some of the branches the overwhelming majority may not hold anymore. Hence it is necessary to run the decision tree learning algorithm on the new branches.
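The three steps above can be sketched in code. The following Python fragment is a minimal illustration under our own assumptions (it is not the authors' implementation): a tree is either a decision string or a pair of an attribute name and a dictionary of branches, unsafe leaves are detected with the satisfiability test of Lemma 2.1 (for conjunctions of equalities this is just a consistency check), and the re-learning of the remaining branches is abstracted as a callback.

```python
# Tree representation (illustrative, not the paper's data structure):
#   leaf  -> a decision string, e.g. "Edible"
#   inner -> ("AttrName", {value: subtree, ...})

def compatible(path, antecedent):
    """pi(v) AND ant(phi) is satisfiable iff no attribute is forced to two values."""
    return all(path.get(a, v) == v for a, v in antecedent.items())

def augment(tree, antecedent, safe_decision, domains, relearn, path=None):
    """Return a tree in which every unsafe leaf (Lemma 2.1) is expanded over eta(phi)."""
    path = path or {}
    if isinstance(tree, str):                            # leaf: dec(v) = tree
        unsafe = compatible(path, antecedent) and tree != safe_decision
        if not unsafe:
            return tree
        eta = [a for a in antecedent if a not in path]   # missing antecedent variables
        return expand(eta, path, antecedent, safe_decision, domains, relearn, tree)
    attr, branches = tree
    return (attr, {val: augment(sub, antecedent, safe_decision, domains, relearn,
                                {**path, attr: val})
                   for val, sub in branches.items()})

def expand(eta, path, antecedent, safe_decision, domains, relearn, old_decision):
    """Replace a leaf by a chain of tests over the variables in eta(phi)."""
    if not eta:
        # all antecedent variables fixed on this path: force the safe decision
        return safe_decision
    attr, rest = eta[0], eta[1:]
    branches = {}
    for val in domains[attr]:
        new_path = {**path, attr: val}
        if val == antecedent[attr]:
            branches[val] = expand(rest, new_path, antecedent, safe_decision,
                                   domains, relearn, old_decision)
        else:
            # the assertion no longer applies: recompute from data (here: a callback)
            branches[val] = relearn(new_path, old_decision)
    return (attr, branches)

if __name__ == "__main__":
    domains = {"CapShape": ["Bell", "Convex"],
               "CapColor": ["Pink", "White", "Yellow"],
               "GillColor": ["Green", "White", "Gray", "Brown"]}
    # A toy tree in the spirit of Figure 1a: decide on CapColor alone.
    tree = ("CapColor", {"Pink": "Poisonous", "White": "Edible", "Yellow": "Edible"})
    antecedent = {"CapShape": "Bell", "GillColor": "Green"}
    relearn = lambda path, old: old        # placeholder for re-running ID3 on the branch
    print(augment(tree, antecedent, "Poisonous", domains, relearn))
```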
3 Integrated Safety Augmentation

The post-facto safety augmentation approaches of Section 2 do not consider the information gain resulting from the safety assertions, and may therefore produce safe decision trees which are suboptimal. For example, the decision tree shown in Figure 2 is also a safe decision tree, but it uses fewer variables in the rightmost branch as compared to Figure 1b. Specifically, the tree of Figure 2 takes advantage of the fact that GillColor is sufficient to separate the edible and poisonous mushrooms when CapColor is yellow. Such choices can be made when the safety augmentation is integrated with the learning algorithm of the decision tree.

We elucidate the integrated safety augmentation approach by considering the ID3 algorithm [Quinlan, 1986]. Given a set of training samples that are individually tagged with the decision, the key step of the algorithm is to choose the variable at the root of the decision tree based on an information gain metric. Once the root is chosen, the algorithm is recursively applied on each branch of the root corresponding to the various values that the root variable can take. Formally, the information gain, IG(a|t), of attribute a at the end of branch t is:

IG(a|t) = E(a|t) − Σ_{j=1}^{l} [ N(t ∧ a[v_j]) / N(t) ] · E(t ∧ a[v_j])    (1)

where:
• E(a|t) is the entropy of the attribute a at the end of the current branch t,
• l is the cardinality of attribute a's domain,
• N(t ∧ a[v_j]) is the count over all decision variable classes after branch t and attribute a's branch a[v_j]. Here a[v_j] is the j-th value in attribute a's domain.

Also:

E(t) = − Σ_{i=1}^{k} p(i|t) · log(p(i|t))    (2)

where k is the number of different decision variable classes and p(i|t) is the probability of decision variable class i, and therefore:

p(i|t) = TD(i|t) / Σ_{j=1}^{k} TD(j|t) = TD(i|t) / TD(t)    (3)

where:
• TD is the training dataset,
• TD(i|t) is the count of decision variable class i at the end of the branch t,
• TD(t) is the size of the dataset at the end of the branch t.

At the root of the decision tree, branch t is empty. The ID3 algorithm calculates the information gain for all the attributes using the above formulae and selects the attribute having the highest information gain as the root node. Each value of the chosen attribute becomes a branch of the root node. The training dataset is then divided according to the decision tree branches. The same procedure is then recursively applied on each of the branches with the remaining attributes and data items, until all the data items have the same decision value (or we choose to prune the tree at that branch by choosing the majority decision). We then add a leaf node with the decision value.
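As a concrete reference for the formulae above, the following sketch computes the entropy of Equation 2 from class counts and an information gain in the spirit of Equation 1, and selects the attribute with the highest gain, which is the key step of ID3. It is our own illustration; for simplicity it uses the parent entropy E(t), as in the textbook ID3 formulation, and the function and variable names are not from the paper.

```python
from math import log2
from collections import Counter

def entropy(class_counts):
    """E(t) = -sum_i p(i|t) * log2 p(i|t), computed from class counts (Equation 2)."""
    total = sum(class_counts.values())
    return -sum((c / total) * log2(c / total) for c in class_counts.values() if c > 0)

def information_gain(rows, attribute, decision_key="decision"):
    """ID3-style gain of `attribute` over `rows` (list of dicts) at the current branch."""
    parent = Counter(r[decision_key] for r in rows)
    gain = entropy(parent)
    for value in {r[attribute] for r in rows}:
        subset = [r for r in rows if r[attribute] == value]
        child = Counter(r[decision_key] for r in subset)
        gain -= (len(subset) / len(rows)) * entropy(child)
    return gain

def best_attribute(rows, attributes):
    """ID3's key step: pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a))

if __name__ == "__main__":
    # A few rows in the style of Table 1 (abridged).
    rows = [
        {"CapShape": "Bell", "CapColor": "Pink", "GillColor": "Green", "decision": "Poisonous"},
        {"CapShape": "Convex", "CapColor": "White", "GillColor": "Brown", "decision": "Poisonous"},
        {"CapShape": "Convex", "CapColor": "Yellow", "GillColor": "Gray", "decision": "Edible"},
        {"CapShape": "Bell", "CapColor": "Yellow", "GillColor": "White", "decision": "Edible"},
        {"CapShape": "Bell", "CapColor": "White", "GillColor": "Gray", "decision": "Edible"},
    ]
    print(best_attribute(rows, ["CapShape", "CapColor", "GillColor"]))
```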
3.1 The Principle of Safety Augmentation

The reason why a decision tree constructed from a given dataset violates a safety assertion is that samples corresponding to the counter-examples were not present in the data. We may address this issue by augmenting the dataset with additional samples that are relevant to the assertion. For example, consider the mushroom dataset of Table 1 and the assertion ϕ:

(CapShape = bell ∧ GillColor = green) ⇒ Poisonous

Table 1 does not contain rows for all combinations of attributes where CapShape = bell and GillColor = green. Suppose we add the following rows to Table 1:

  CapShape  CapColor  GillColor  Poisonous
  Bell      White     Green      Poisonous
  Bell      Yellow    Green      Poisonous

Adding these rows to the dataset will ensure that the decision tree learned using ID3 is safe with respect to the assertion ϕ. Formally, let A = {A_1, ..., A_n} denote the set of attributes in the dataset, and Q ⊆ A denote the set of attributes referred to in ant(ϕ). Let dom(A_i) denote the set of values of attribute A_i. The safety augmentation data corresponding to ϕ is defined as follows:

Aug(ϕ) = { ⟨v_1, ..., v_n, v_d⟩ | ∀i, v_i ∈ dom(A_i), ⟨v_1, ..., v_n⟩ satisfies ant(ϕ), and the decision variable v_d satisfies the consequent of ϕ }

Lemma 3.1. The decision tree constructed by the ID3 algorithm on the dataset augmented with Aug(ϕ) is safe with respect to assertion ϕ.

Proof sketch: Let us assume the contrary. Then there exists a leaf node v in the decision tree such that dec(v) refutes the consequent of ϕ and π(v) ∧ ant(ϕ) is satisfiable. But in the augmented dataset, by the definition of Aug(ϕ), at least 50% of the samples at v have a decision which does not satisfy dec(v). Therefore, ID3 can never prune the tree at v with a decision which disagrees with 50% of the samples.
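Since all attribute domains are finite, Aug(ϕ) can be enumerated mechanically. The sketch below (our own illustration) generates every full assignment that satisfies ant(ϕ) and labels it with the decision demanded by the consequent; tuples that already occur in the training data, such as the Pink row of Table 1, need not be added again.

```python
from itertools import product

def aug(domains, antecedent, safe_decision):
    """Enumerate Aug(phi): all full assignments satisfying ant(phi), labelled safely."""
    free = [a for a in domains if a not in antecedent]   # attributes not in ant(phi)
    rows = []
    for values in product(*(domains[a] for a in free)):
        row = dict(zip(free, values))
        row.update(antecedent)                           # force ant(phi)
        row["decision"] = safe_decision                  # consequent of phi
        rows.append(row)
    return rows

domains = {"CapShape": ["Bell", "Convex"],
           "CapColor": ["Pink", "White", "Yellow"],
           "GillColor": ["Green", "White", "Gray", "Brown"]}
antecedent = {"CapShape": "Bell", "GillColor": "Green"}

for row in aug(domains, antecedent, "Poisonous"):
    print(row)
# Three tuples, for CapColor in {Pink, White, Yellow}; the Pink tuple already
# appears in Table 1, so only the White and Yellow rows are actually new.
```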
3.2 The Safety Augmentation Methodology

The proposed safety augmentation methodology does not actually add Aug(ϕ) explicitly to the dataset. Essentially, the ID3 algorithm chooses the attribute corresponding to each node in the decision tree using the information gain metric, and this metric only requires the count of the samples having a specific decision for each value in the domain of an attribute. The proposed methodology is implemented by manipulating these counts based on our understanding of Aug(ϕ). Formally, we replace Equation 3 with the following:

p(i|t) = N(i|t) / Σ_{j=1}^{k} N(j|t) = N(i|t) / N(t)    (4)

where N(i|t) is the count of decision variable class i after the branch t, and N(t) is the count over all decision variable classes after the branch t. As we are also considering the count from Aug(ϕ), we have:

N(i|t) = R(i|t) + S(i|t)    (5)

where R(i|t) is the count of decision variable class i in the input dataset after the branch t, and S(i|t) is the count of decision variable class i corresponding to ϕ after the branch t.

The modified information gain is calculated thereafter in the standard way:

IG(a|t) = E(a|t) − Σ_{j=1}^{l} [ N(t ∧ a[v_j]) / N(t) ] · E(t ∧ a[v_j])    (6)
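A possible reading of Equations 4 and 5 in code, under our own assumptions and with hypothetical helper names: the class counts R(i|t) come from the training rows consistent with the current branch, the counts S(i|t) come from the Aug(ϕ) tuples that fall on the same branch, and the probabilities of Equation 4 are computed from their sum. For simplicity this sketch does not remove Aug(ϕ) tuples that duplicate existing training rows.

```python
from collections import Counter

def branch_counts(rows, path, decision_key="decision"):
    """R(i|t): class counts of training rows consistent with the branch predicates `path`."""
    return Counter(r[decision_key] for r in rows
                   if all(r[a] == v for a, v in path.items()))

def augmentation_counts(domains, antecedent, safe_decision, path):
    """S(i|t): class counts contributed by Aug(phi) on the same branch."""
    if any(antecedent.get(a, v) != v for a, v in path.items()):
        return Counter()                  # branch contradicts ant(phi): no contribution
    free = [a for a in domains if a not in antecedent and a not in path]
    n_tuples = 1
    for a in free:
        n_tuples *= len(domains[a])       # number of Aug(phi) tuples on this branch
    return Counter({safe_decision: n_tuples})

def modified_probabilities(rows, domains, antecedent, safe_decision, path):
    """p(i|t) of Equation 4, using N(i|t) = R(i|t) + S(i|t) (Equation 5)."""
    n = branch_counts(rows, path) + augmentation_counts(domains, antecedent,
                                                        safe_decision, path)
    total = sum(n.values())
    if total == 0:
        return {}                         # no data and no augmentation on this branch
    return {cls: c / total for cls, c in n.items()}

if __name__ == "__main__":
    rows = [{"CapShape": "Bell", "CapColor": "Pink", "GillColor": "Green", "decision": "Poisonous"},
            {"CapShape": "Bell", "CapColor": "Yellow", "GillColor": "Gray", "decision": "Edible"}]
    domains = {"CapShape": ["Bell", "Convex"],
               "CapColor": ["Pink", "White", "Yellow"],
               "GillColor": ["Green", "White", "Gray", "Brown"]}
    antecedent = {"CapShape": "Bell", "GillColor": "Green"}
    print(modified_probabilities(rows, domains, antecedent, "Poisonous",
                                 path={"CapColor": "Yellow"}))
```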
3.3 Does Safety Augmentation Create Bias?

Post-facto safety augmentation does not affect the relative positions of the attributes in the decision tree, since the method splits those leaf nodes where the decision contradicts the safety requirement, while the other nodes in the tree are not disturbed. The integrated safety augmentation method produces a more optimal decision tree by augmenting the safety into the information gain metric and then allowing the ID3 algorithm to proceed with the modified information gain. In this section, we examine whether this augmentation introduces a bias in the input, thereby affecting the choice among the other attributes in the system.

Let us return to our safety assertion ϕ:

(CapShape = bell ∧ GillColor = green) ⇒ Poisonous

As explained in Section 3.1, the proposed integrated method will affect the counts corresponding to the following cases:

  CapShape  CapColor  GillColor  Poisonous
  Bell      White     Green      Poisonous
  Bell      Yellow    Green      Poisonous

The augmentation methodology inherently assumes that these types of mushrooms, which are not represented in the data, are actually possible. This need not be true; in nature, we may not have a mushroom of yellow CapColor which has a bell-like CapShape and green GillColor, as required by ant(ϕ). Formally, safety augmentation assumes that the attributes referenced in ant(ϕ) are independent of each other and that all combinations of values of these attributes can occur.

Figure 3: Change in the information gain for the attributes C and D of the toy example when considering the support dataset. (a) Information gain for attributes C and D without considering the support dataset. (b) Information gain for attributes C and D considering the support dataset.

Such an assumption may create a bias in the input to the decision tree learning algorithm. This is explained with the case shown in Figure 3. We consider the information gain of attributes C and D in a setting where C, D, and the decision variable are all Boolean. Figure 3a shows the case when no safety augmentation has been done. In this case, at a branch of the decision tree, we have 10 samples where the decision is True and 8 samples where the decision is False. Figure 3a shows the pattern in which the attributes C and D split the samples with the decisions True and False. Accordingly, IG(C) and IG(D) are computed as shown, and therefore ID3 prefers D over C as the next attribute.

Figure 3b shows the case when safety augmentation is used to modify the counts. In this case, the safe decision is False, hence only the counts of samples with decision False are affected by the augmentation. The original ratio of (10 : 8) has now become (10 : 12). Figure 3b shows the ratios of True/False decisions when split by the attributes C and D. Accordingly, IG(C) and IG(D) are computed as shown, and therefore ID3 now prefers C over D as the next attribute.

The important thing to note here is that such an inversion of priority between attributes like C and D can happen even when they do not belong to the antecedent of an assertion ϕ. For any attribute A ∉ ant(ϕ), the increase in the count of samples corresponding to the safe decision will be equal on all branches leading out of a node for A. In the specific case outlined above, C and D were not in ant(ϕ), and therefore the increase in the counts of the False decision in both the left and right branches of these nodes is equal, namely 2. Yet the priority inversion takes place, indicating the introduction of bias.

We believe that enforcing safety always comes at the expense of bias. In reality, it is often not possible to envisage all practicable ways in which a property can fail. Some combinations of values for the variables in the antecedent of an assertion may not be possible, but this is not known to us. Therefore, the safety augmentation must assume that all combinations are possible, even at the expense of the bias it introduces. It is theoretically possible to examine ways to minimize the effects of this bias on the construction of the decision tree, but that is beyond the scope of this paper.
4 Multi-Assertion Safety Augmentation

In general, safety may be defined in terms of multiple assertions. Moreover, assertions may be causal, diagnostic, or a combination of both. In this section, we discuss these variants and show that the principle of safety augmentation presented in the previous section can be applied to all such cases.

• Causal versus diagnostic assertions. It is well known in the literature that causal and diagnostic rules are inter-convertible in logic. For example, the causal rule Cavity ⇒ Toothache can be rewritten as the diagnostic rule ¬Toothache ⇒ ¬Cavity. We have presented the safety augmentation methodology with assertions in causal form, where the consequent is a predicate over the decision variable. If an assertion is given in another form, it can be rewritten in causal form and then used in our methodology.

• Assertions with the same consequent. Given two assertions of the form α1 ⇒ β and α2 ⇒ β, we can perform the safety augmentation in any order, or together by augmenting the counts with the ways of satisfying α1 ∨ α2.

• Assertions with different consequents. Given two assertions of the form α1 ⇒ β1 and α2 ⇒ β2, where β1 ≠ β2, we can perform the safety augmentation in any order. In this case we additionally need to ensure that α1 ∧ α2 is unsatisfiable, otherwise the safety assertions contradict each other. This is done using a standard SAT engine.

In general, we may also have assertions of the form α ⇒ β where neither α nor β contains the decision variable. Such assertions convey additional information about the variables, which may not be explicit in the data, and they can also influence the counts during safety augmentation. For example, suppose the safety assertion states:

(CapShape = bell ∧ GillColor = green) ⇒ Poisonous

In the absence of any other information, as discussed earlier, we augment two cases, corresponding to CapColor of White and Yellow. Now suppose we are given the assertion:

(GillColor = green) ⇒ (CapColor = pink)

This assertion does not provide any information regarding the safety of mushrooms, but it tells us that mushrooms with green GillColor always have pink CapColor. With this information, the augmentation count drops from 2 to 0. The safety augmentation methodology computes the augmentation counts accordingly.
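When the attribute domains are small, the checks described in this section can be done by brute-force enumeration rather than with a SAT engine. The sketch below (our own illustration) tests whether the antecedents of two assertions with different consequents can be satisfied simultaneously, and shows how a constraint assertion such as (GillColor = green) ⇒ (CapColor = pink) prunes the augmentation tuples of the running example.

```python
from itertools import product

def assignments(domains):
    """All full assignments to the attributes (feasible because the domains are small)."""
    names = list(domains)
    for values in product(*(domains[a] for a in names)):
        yield dict(zip(names, values))

def satisfies(assignment, conjunct):
    """True iff the assignment satisfies a conjunction of attribute = value predicates."""
    return all(assignment[a] == v for a, v in conjunct.items())

def antecedents_conflict(domains, ant1, ant2):
    """For assertions with different consequents: is ant1 AND ant2 satisfiable?"""
    return any(satisfies(x, ant1) and satisfies(x, ant2) for x in assignments(domains))

def constrained_aug(domains, antecedent, safe_decision, constraints):
    """Aug(phi) restricted to tuples satisfying every constraint assertion a => b."""
    rows = []
    for x in assignments(domains):
        if not satisfies(x, antecedent):
            continue
        if all(not satisfies(x, a) or satisfies(x, b) for a, b in constraints):
            rows.append({**x, "decision": safe_decision})
    return rows

domains = {"CapShape": ["Bell", "Convex"],
           "CapColor": ["Pink", "White", "Yellow"],
           "GillColor": ["Green", "White", "Gray", "Brown"]}

phi_ant = {"CapShape": "Bell", "GillColor": "Green"}
# (GillColor = Green) => (CapColor = Pink): a constraint over the attributes only.
constraint = ({"GillColor": "Green"}, {"CapColor": "Pink"})

print(len(constrained_aug(domains, phi_ant, "Poisonous", [])))            # 3 tuples
print(len(constrained_aug(domains, phi_ant, "Poisonous", [constraint])))  # 1 tuple
# The single remaining tuple (CapColor = Pink) is already present in Table 1,
# so nothing new needs to be added: the augmentation count drops from 2 to 0.

other_ant = {"CapShape": "Convex"}
print(antecedents_conflict(domains, phi_ant, other_ant))  # False: they never both apply
```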
Table 2: Assertions for benchmark datasets

  Dataset        ID  Assertion
  Breast Cancer  1   (Age = 30-39 ∧ Tumor-Size = 30-34 ∧ Irradiation = Yes) ⇒ Recurrence-Events
  Mushroom       1   (Cap-Shape = Bell ∧ Gill-Color = Green) ⇒ Poisonous
  Mushroom       2   (Stalk-color-above-ring = Bell ∧ Stalk-color-below-ring = Green) ⇒ Poisonous
  Nursery        1   (Student-Health = Not-Recommended) ⇒ Not-Recommended
  Tic-Tac-Toe    1   (top-left-square = o ∧ top-middle-square = o ∧ top-right-square = o) ⇒ x-losses
  Tic-Tac-Toe    2   (middle-left-square = o ∧ middle-middle-square = o ∧ middle-right-square = o) ⇒ x-losses
  Tic-Tac-Toe    3   (bottom-left-square = o ∧ bottom-middle-square = o ∧ bottom-right-square = o) ⇒ x-losses

Table 3: Comparison of runtimes and dimensions of decision trees. For each dataset and property (Prop ID), the columns give the depth, total number of nodes, and runtime (seconds) of the original decision tree, the tree after post-facto safety augmentation, and the tree with integrated safety augmentation.

  Dataset        Prop ID | Original: Depth, Nodes, Runtime | Post-facto: Depth, Nodes, Runtime | Integrated: Depth, Nodes, Runtime
  Breast Cancer  1       | 8, 179, 0.014                   | 8, 202, 0.001                     | 7, 181, 0.012
  Mushroom       1       | 5, 29, 0.284                    | 7, 281, 0.002                     | 6, 60, 0.346
  Mushroom       2       | 5, 29, 0.284                    | 7, 281, 0.002                     | 6, 36, 0.375
  Nursery        1       | 9, 803, 0.275                   | 9, 803, 0.001                     | 9, 803, 0.260
  Tic-Tac-Toe    1       | 8, 343, 0.034                   | 8, 343, 0.001                     | 8, 318, 0.035
  Tic-Tac-Toe    2       | 8, 343, 0.034                   | 10, 463, 0.002                    | 8, 363, 0.036
  Tic-Tac-Toe    3       | 8, 343, 0.034                   | 10, 514, 0.002                    | 8, 310, 0.035

5 Experimental Results

We provide experimental results on four benchmark datasets from the UCI Machine Learning Repository [Dua and Graff, 2017], namely Breast Cancer [Zwitter and Soklic, 1988], Mushroom, Nursery, and Tic-Tac-Toe. The assertions shown in Table 2 were designed based on relevance and have various degrees of support in the datasets.

Table 3 compares the depth and the total number of nodes of the decision trees created without safety augmentation with those created with post-facto and integrated safety augmentation. Also shown are the runtimes for creating the decision trees without safety augmentation, the additional time for post-facto safety augmentation, and the time for creating the decision trees with integrated safety augmentation.

The effect of safety augmentation on the decision trees depends on the missing support for the safety assertions in the datasets. For example, the safety property for the Nursery dataset has most of its support in the original dataset and therefore requires negligible augmentation. On the other hand, supporting the assertions over the Mushroom dataset requires considerable augmentation, resulting in significant changes due to safety augmentation. Not surprisingly, integrated safety augmentation produces decision trees with fewer nodes and smaller depth as compared to post-facto safety augmentation. The runtimes indicate that the overhead of safety augmentation is marginal in most cases.
6 Related Work

Decision trees and similar structures are already in use in safety-critical domains. This includes the Culpability Tree [Reason, 2016] for dealing with staff involved in safety errors in the aviation industry, and Incident Decision Trees [Meadows et al., 2005] for helping National Health Service (NHS) managers in the United Kingdom to determine a fair and consistent course of action towards staff involved in patient safety incidents. All these trees are either manually created by domain experts or built using domain-specific tools. There exist learning algorithms and tools for building decision trees, including ones that are robust against malicious attacks such as data poisoning [Alfeld et al., 2016] and evasion attacks [Biggio et al., 2013]. The tool Antidote [Drews et al., 2019] is used to check the robustness of decision tree learning against data poisoning attacks, where an attacker can inject several malicious elements into the training set to influence the learned model. The decision tree learning algorithm Treant [Calzavara et al., 2019] generates decision tree ensembles that are accurate and nearly insensitive to evasion attacks based on a formal threat model, and minimizes an evasion-aware loss function at each step of the tree construction. In contrast, our approaches build decision trees where the branches that are responsible for predicting critical decisions are modified according to the defined safety constraints, and thus provide robustness against those defined safety constraints.

7 Conclusion

Safety augmentation is one of the primary requirements in structures learned from data using machine learning techniques, especially when the learned function is used in a safety-critical context. This paper examines the safety augmentation problem for decision trees, which are popular in practice. We present the first methodology for safety augmentation in decision trees where the safety requirement is expressed in terms of assertions. Our results indicate that augmenting the information gain metric with knowledge gleaned from the safety assertions yields safe decision trees which are considerably smaller than the ones obtained by post-facto safety augmentation.
References

[Alfeld et al., 2016] Scott Alfeld, Xiaojin Zhu, and Paul Barford. Data poisoning attacks against autoregressive models. In 30th AAAI Conference on Artificial Intelligence, 2016.

[Biggio et al., 2013] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.

[Breiman, 2017] Leo Breiman. Classification and Regression Trees. Routledge, 2017.

[Calzavara et al., 2019] Stefano Calzavara, Claudio Lucchese, Gabriele Tolomei, Seyum Assefa Abebe, and Salvatore Orlando. Treant: Training evasion-aware decision trees. arXiv preprint arXiv:1907.01197, 2019.

[Drews et al., 2019] Samuel Drews, Aws Albarghouthi, and Loris D'Antoni. Proving data-poisoning robustness in decision trees. arXiv preprint arXiv:1912.00981, 2019.

[Dua and Graff, 2017] Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2017.

[Laurent and Rivest, 1976] Hyafil Laurent and Ronald L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17, 1976.

[Meadows et al., 2005] Sandra Meadows, Karen Baker, and Jeremy Butler. The incident decision tree: Guidelines for action following patient safety incidents. In Advances in Patient Safety: From Research to Implementation (Volume 4: Programs, Tools, and Products). Agency for Healthcare Research and Quality (US), 2005.

[Murthy, 1998] Sreerama K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4):345–389, 1998.

[Peled et al., 1999] Doron A. Peled, Edmund M. Clarke, and Orna Grumberg. Model Checking. MIT Press, 1999.

[Quinlan, 1986] J. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.

[Quinlan, 2007] J. Ross Quinlan. C5. http://rulequest.com, 2007.

[Quinlan, 2014] J. Ross Quinlan. C4.5: Programs for Machine Learning. Elsevier, 2014.

[Reason, 2016] James Reason. Managing the Risks of Organizational Accidents. Routledge, 2016.

[Song and Ying, 2015] Yan-Yan Song and Lu Ying. Decision tree methods: Applications for classification and prediction. Shanghai Archives of Psychiatry, 27(2):130, 2015.

[SVA, 2018] SystemVerilog Assertions. IEEE Std 1800-2017, pages 1–1315, February 2018.

[Zwitter and Soklic, 1988] Matjaz Zwitter and Milan Soklic. Breast Cancer Data Set, 1988.