                                      Safety Augmentation in Decision Trees

                         Sumanta Dey, Pallab Dasgupta and Briti Gangopadhyay∗
                        Dept of Computer Science and Engineering, IIT Kharagpur, India
               sumanta.dey@iitkgp.ac.in, pallab@cse.iitkgp.ac.in, briti gangopadhyay@iitkgp.ac.in



∗ The authors would like to thank DST, Govt of India, for partial financial support of this project.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Decision trees are among the most popular representations used in statistical machine learning. This paper studies the problem of incorporating domain knowledge into the learning of decision trees, specifically, when the knowledge is given in the form of a set of known decision cases, captured using assertions. The ability to ensure that the decision making never violates a known set of safety assertions is essential for the use of decision tree learning in safety critical domains. A pure machine learning approach does not guarantee safety, since the learning can be impaired by the lack of adequate data on possible failure scenarios. To the best of our knowledge, this is the first work on formal safety augmentation in decision trees with provable guarantees.

1 Introduction

Decision Tree Learning is among the most widely used and practical methods for supervised learning, for both classification and regression tasks [Breiman, 2017; Song and Ying, 2015]. Decisions based on decision trees are inherently explainable and statistically supported by the data used to learn the tree. The primary goal of a decision tree learning algorithm is to construct an optimal decision tree, where the quality is defined in terms of the depth of the tree (which is an upper bound on the number of variables to be examined before reaching a decision) and the number of nodes in the tree (which reflects the average effort in reaching a decision). Finding an optimal decision tree is an NP-Complete problem [Laurent and Rivest, 1976; Murthy, 1998]. There are several greedy algorithms for learning decision trees, including ID3, C4.5, and C5.0 [Quinlan, 1986; Quinlan, 2014; Quinlan, 2007]. Greedy metrics, such as Information Gain, are used to order the nodes in the decision tree during the construction. For example, the decision tree developed using ID3 for the dataset of Table 1 is shown in Figure 1a. This decision tree decides whether a type of mushroom with given features like the CapShape, the CapColor, and the GillColor is poisonous or edible, based on the information gained from the given dataset. As can be seen in Figure 1a, the decision is taken by examining only the features CapColor and CapShape. The feature GillColor is ignored by the decision tree, and this can be statistically justified from the dataset.

For safety-critical decisions, relying on a decision tree constructed using statistical machine learning can be risky, because the dataset will typically not contain samples covering all possible combinations of features. The missing combinations may either not exist in the real world or may be so rare that they do not show up in the dataset. This is often true when the number of features and their value domains are large. It is, therefore, possible that the decision tree will make a wrong decision because it has no supervised precedent for unobserved combinations of feature values. For example, we may know from scientific knowledge that mushrooms having green GillColor and a bell-like CapShape are definitely poisonous, though there is no support for this knowledge in the data. Consequently, the decision tree of Figure 1a recommends as edible all mushrooms with yellow CapColor, ignoring the features CapShape and GillColor, possibly leading to disastrous consequences. The decision tree also declares as edible all mushrooms with white CapColor and bell-like CapShape, whereas the above scientific knowledge should be used to eliminate the ones with green GillColor from this set.

Safety requirements gleaned from (non-statistical) domain knowledge can be provided explicitly in the form of assertions, a practice which is widely used in many safety-critical domains [SVA, 2018; Peled et al., 1999]. For example, the knowledge on mushrooms with green GillColor and bell-like CapShape can be expressed as the following propositional assertion:

(CapShape = bell ∧ GillColor = green) ⇒ Poisonous

This paper studies the augmentation of decision trees with such safety assertions. We show that naive post-facto processing of a decision tree with the given assertions adversely affects the size of the decision tree. For example, such an augmentation with the above assertion leads to the decision tree of Figure 1b, where the average number of variables to be examined before reaching a decision is significantly more than that of Figure 1a.
CapShape   CapColor   GillColor   Poisonous
Bell       Pink       Green       Poisonous
Bell       Pink       White       Poisonous
Bell       Pink       Gray        Poisonous
Convex     Pink       Gray        Poisonous
Convex     Pink       Brown       Poisonous
Convex     White      Brown       Poisonous
Convex     White      White       Poisonous
Convex     White      Gray        Poisonous
Convex     Yellow     Brown       Edible
Convex     Yellow     Gray        Edible
Convex     Yellow     White       Edible
Bell       Yellow     White       Edible
Bell       Yellow     Gray        Edible
Bell       Yellow     Brown       Edible
Bell       White      Brown       Edible
Bell       White      Gray        Edible
Bell       White      White       Edible

Table 1: A section of the mushroom dataset [Dua and Graff, 2017]

We present a methodology for biasing the information gain metric based on the safety assertions to compensate for the missing information in the dataset. This leads to considerable improvement in the safety-augmentation process. The modified decision tree using our proposed methodology is shown in Figure 2. The average number of variables to be examined before reaching a decision in this tree is significantly better than that of Figure 1b.

The paper is organized as follows. Section 2 outlines the post-facto safety augmentation methodology. Section 3 presents the proposed integrated safety augmentation approach, and Section 4 outlines multi-assertion extensions. Experimental results on benchmark datasets are presented in Section 5. Related work and concluding remarks are given in Sections 6 and 7 respectively.

2 Post-Facto Safety Augmentation

This section examines the merit of considering the safety assertions after the decision tree has been constructed. We consider the following two options:

1. Analyzing safety assertions before using the decision tree. This is typically not a good option in practice. Safety-critical scenarios often have specific attributes which need not be examined if the decision is safe anyway. For example, suppose a safety property specifies that sulfa drugs should not be administered unless a sensitivity test rules out an allergic reaction. It does not make sense to perform a sensitivity test for allergic reactions to sulfa drugs before the decision tree recommends such a drug. In other words, examining all the predicates in the antecedent of an assertion is redundant in those paths of the decision tree where the decision is safe anyway.

2. Analyzing safety assertions after using the decision tree. We may analyze the decision recommended by the decision tree against the safety properties. For example, the decision tree of Figure 1a recommends eating mushrooms having white CapColor and bell-like CapShape, whereas the safety assertion tells us that mushrooms with bell-like CapShape and green GillColor are poisonous. We must therefore not go by the decision recommended by the decision tree, but examine the GillColor. This post-facto safety augmentation is shown in Figure 1b.

For a leaf node, v, of the decision tree, let π(v) denote the conjunction of predicates leading to v, and let dec(v) denote the decision recommended at v. For example, in Figure 1a, if v is the second leaf node from the left, then dec(v) = edible and:

π(v) = (CapColor = white ∧ CapShape = bell)

For an assertion, ϕ, let ant(ϕ) denote the antecedent of ϕ. For example, for the assertion, ϕ:

(CapShape = bell ∧ GillColor = green) ⇒ Poisonous

ant(ϕ) = (CapShape = bell ∧ GillColor = green)

Lemma 2.1 Safety augmentation is necessary in a leaf node v of a decision tree iff π(v) ∧ ant(ϕ) is satisfiable, but dec(v) refutes the consequent of ϕ.

Proof: If π(v) ∧ ant(ϕ) is False (unsatisfiable), then no safety augmentation is needed at v because the antecedent of ϕ does not match on this path of the decision tree. If dec(v) satisfies the consequent of ϕ, then no safety augmentation is needed at v because the decision tree recommends a safe decision anyway. Therefore it suffices to consider only the cases where π(v) ∧ ant(ϕ) is satisfiable, but dec(v) refutes the consequent of ϕ. The necessity is obvious, since dec(v) refutes the consequent of ϕ, which will be a wrong decision for the cases satisfying π(v) ∧ ant(ϕ).

The methodology for post-facto safety augmentation in the necessary leaf nodes identified by Lemma 2.1 is as follows:

• Let η(ϕ) denote the set of variables referenced in ant(ϕ) but not present in π(v). For example, for the rightmost leaf node of Figure 1a, η(ϕ) = {CapShape, GillColor}.

• A sequence of nodes corresponding to the variables in η(ϕ) replaces the leaf node. This sequence is shown in red in Figure 1b.

• Decisions corresponding to the remaining branches of the new subtrees are recomputed using the decision tree learning algorithm.

The need for recomputing the decisions in the new branches of the decision tree arises when the decision taken at the original node was due to a statistical majority and not due to unanimity. In order to avoid overfitting, decision tree learning algorithms can take a specific decision at a node when the samples in favor of that decision are in an overwhelming majority. The safety augmentation splits these samples into the new branches of the decision tree, and in some of the branches the overwhelming majority may not hold anymore. Hence it is necessary to run the decision tree learning algorithm on the new branches.
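The condition of Lemma 2.1 is mechanically checkable: since π(v) and ant(ϕ) are conjunctions of attribute = value predicates over finite domains, π(v) ∧ ant(ϕ) is satisfiable exactly when the two conjunctions do not force a common attribute to two different values. The following minimal sketch illustrates the check; encoding π(v) and the assertion as attribute–value dictionaries is an illustrative assumption, not the representation prescribed by the methodology.

    def conjunction_satisfiable(path, antecedent):
        # pi(v) /\ ant(phi) is satisfiable iff no attribute is forced to two
        # different values (both are conjunctions of attribute = value predicates).
        return all(antecedent[a] == v for a, v in path.items() if a in antecedent)

    def needs_augmentation(path, decision, antecedent, consequent):
        # Lemma 2.1: augmentation is necessary at a leaf v iff pi(v) /\ ant(phi)
        # is satisfiable and dec(v) refutes the consequent of phi.
        return conjunction_satisfiable(path, antecedent) and decision != consequent

    # Second leaf from the left in Figure 1a: white CapColor, bell CapShape, decision "edible".
    path = {"CapColor": "white", "CapShape": "bell"}
    phi_antecedent = {"CapShape": "bell", "GillColor": "green"}
    print(needs_augmentation(path, "edible", phi_antecedent, "poisonous"))  # True: this leaf must be split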
Figure 1: Decision trees for mushroom classification. (a) Decision tree learned from Table 1. (b) Decision tree after post-facto safety augmentation.

Figure 2: Decision tree constructed by the ID3 algorithm from the rectified training dataset augmented with the support dataset of the safety property.


3 Integrated Safety Augmentation

The post-facto safety augmentation approaches of Section 2 do not consider the information gain resulting from the safety assertions, and may therefore produce safe decision trees which are suboptimal. For example, the decision tree shown in Figure 2 is also a safe decision tree, but it uses fewer variables in the rightmost branch as compared to Figure 1b. Specifically, the tree of Figure 2 takes advantage of the fact that GillColor is sufficient to separate the edible and poisonous mushrooms when CapColor is yellow. Such choices can be made when the safety augmentation is integrated with the learning algorithm of the decision tree.

We elucidate the integrated safety augmentation approach by considering the ID3 algorithm [Quinlan, 1986]. Given a set of training samples that are individually tagged with the decision, the key step of the algorithm is to choose the variable at the root of the decision tree based on an information gain metric. Once the root is chosen, the algorithm is recursively applied on each branch of the root corresponding to the various values that the root variable can take. Formally, the information gain, IG(a|t), of attribute a at the end of branch t is:

    IG(a|t) = E(a|t) − Σ_{j=1}^{l} [N(t ∧ a[vj]) / N(t)] · E(t ∧ a[vj])    (1)

where:

• E(a|t) is the entropy of the attribute a at the end of the current branch t,

• l is the cardinality of attribute a's domain,

• N(t ∧ a[vj]) is the count of all decision variable classes after branch t and attribute a's branch a[vj]. Here a[vj] is the j-th value in attribute a's domain.

Also:

    E(t) = Σ_{i=1}^{k} −p(i|t) · log(p(i|t))    (2)

where k is the number of different decision variable classes and p(i|t) is the probability of decision variable class i, and therefore:

    p(i|t) = TD(i|t) / Σ_{j=1}^{k} TD(j|t) = TD(i|t) / TD(t)    (3)

where:
• TD is the Training Dataset,

• TD(i|t) is the count of decision variable class i,

• TD(t) is the size of the total dataset at the end of the branch t.

At the root of the decision tree, branch t will be empty. The ID3 algorithm calculates the Information Gain for all the attributes using the above formulae and selects the attribute having the highest Information Gain as the root node. Each value of the chosen attribute becomes a branch of the root node. The training dataset is then divided corresponding to the decision tree branches. The same procedure is then recursively applied on each of the branches with the remaining attributes and the data items, until all the data items have the same decision value (or we choose to prune the tree at that branch by choosing the majority decision). We then add a leaf node with the decision value.
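As a concrete illustration of Equations (1)–(3), the sketch below computes the entropy of the decision classes for the samples reaching a branch and the resulting information gain of an attribute, and then performs ID3's key step of selecting the attribute with the highest gain. It reads the first term of Equation (1) as the entropy of the decision classes at the current branch, as in the standard ID3 formulation; the dictionary-of-rows representation and the three-row toy dataset are illustrative assumptions, not the implementation used in our experiments.

    import math
    from collections import Counter

    def entropy(decisions):
        # E(t): entropy of the decision classes among the samples at branch t (Equation 2).
        total = len(decisions)
        return -sum((c / total) * math.log2(c / total) for c in Counter(decisions).values())

    def information_gain(samples, attribute):
        # IG(attribute | t) for the samples reaching branch t (Equation 1),
        # with p(i|t) estimated from the training counts as in Equation 3.
        remainder = 0.0
        for value in set(s[attribute] for s in samples):
            branch = [s["decision"] for s in samples if s[attribute] == value]
            remainder += (len(branch) / len(samples)) * entropy(branch)
        return entropy([s["decision"] for s in samples]) - remainder

    def best_attribute(samples, attributes):
        # ID3's key step: pick the attribute with the highest information gain.
        return max(attributes, key=lambda a: information_gain(samples, a))

    # Three illustrative rows taken from Table 1.
    rows = [
        {"CapShape": "Bell",   "CapColor": "Pink",   "GillColor": "Green", "decision": "Poisonous"},
        {"CapShape": "Convex", "CapColor": "Yellow", "GillColor": "Brown", "decision": "Edible"},
        {"CapShape": "Bell",   "CapColor": "Yellow", "GillColor": "White", "decision": "Edible"},
    ]
    print(best_attribute(rows, ["CapShape", "CapColor", "GillColor"]))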

3.1 The Principle of Safety Augmentation

The reason why a decision tree constructed out of a given data set violates a safety assertion is that samples corresponding to the counter-examples were not present in the data. We may address this issue by augmenting the data set with additional samples that are relevant to the assertion. For example, consider the mushroom data set of Table 1 and the assertion, ϕ:

(CapShape = bell ∧ GillColor = green) ⇒ Poisonous

Table 1 does not contain rows for all combinations of attributes where CapShape = bell and GillColor = green. Suppose we add the following rows to Table 1:

CapShape   CapColor   GillColor   Poisonous
Bell       White      Green       Poisonous
Bell       Yellow     Green       Poisonous

Adding these rows into the data set will ensure that the decision tree learned using ID3 is safe with respect to the assertion, ϕ. Formally, let A = {A1, . . . , An} denote the set of attributes in the data set, and Q ⊆ A denote the set of attributes referred to in ant(ϕ). Let dom(Ai) denote the set of values of attribute Ai. The safety augmentation data corresponding to ϕ is defined as follows:

Aug(ϕ) = { ⟨v1, . . . , vn, vd⟩ | ∀i, vi ∈ dom(Ai), ⟨v1, . . . , vn⟩ satisfies ant(ϕ), and the decision variable vd satisfies the consequent of ϕ }

Lemma 3.1 The decision tree constructed by the ID3 algorithm on the data set augmented with Aug(ϕ) is safe with respect to assertion ϕ.

Proof sketch: Let us assume the contrary. Then there exists a leaf node v in the decision tree such that dec(v) refutes the consequent of ϕ and π(v) ∧ ant(ϕ) is satisfiable. But, in the augmented data set, by the definition of Aug(ϕ), at least 50% of the samples at v have a decision which does not agree with dec(v). Therefore, ID3 can never prune the tree at v with a decision which disagrees with 50% of the samples.
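The set Aug(ϕ) can be enumerated mechanically from the attribute domains. The following is a minimal sketch under illustrative assumptions: the domains listed are only those values appearing in Table 1 (not the full domains of the UCI mushroom dataset), and the assertion is encoded as an antecedent dictionary plus a consequent decision.

    from itertools import product

    def augmentation_rows(domains, antecedent, consequent):
        # Enumerate Aug(phi): every attribute assignment that satisfies ant(phi),
        # each tagged with the decision required by the consequent of phi.
        free = [a for a in domains if a not in antecedent]   # attributes not fixed by ant(phi)
        rows = []
        for values in product(*(domains[a] for a in free)):
            row = dict(zip(free, values))
            row.update(antecedent)                           # force the antecedent values
            row["decision"] = consequent
            rows.append(row)
        return rows

    # Attribute domains as they appear in Table 1 (a fragment of the real domains).
    domains = {
        "CapShape": ["Bell", "Convex"],
        "CapColor": ["Pink", "White", "Yellow"],
        "GillColor": ["Green", "White", "Gray", "Brown"],
    }
    phi_antecedent = {"CapShape": "Bell", "GillColor": "Green"}
    for row in augmentation_rows(domains, phi_antecedent, "Poisonous"):
        print(row)
    # Three rows are produced (CapColor = Pink, White, Yellow). The Pink case is
    # already present in Table 1, so only the White and Yellow rows are new.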
3.2 The Safety Augmentation Methodology

The proposed safety augmentation methodology does not actually add Aug(ϕ) explicitly in the data set. Essentially, the ID3 algorithm chooses the attributes corresponding to each node in the decision tree using the information gain metric, and this metric only requires the count of the samples having a specific decision for each value in the domain of an attribute. The proposed methodology is implemented by manipulating these counts based on our understanding of Aug(ϕ). Formally, we replace Equation 3 with the following:

    p(i|t) = N(i|t) / Σ_{j=1}^{k} N(j|t) = N(i|t) / N(t)    (4)

where N(i|t) is the count of the decision variable class i after the branch t and N(t) is the count of all decision variable classes after the branch t. As we are also considering the count from Aug(ϕ), we have:

    N(i|t) = R(i|t) + S(i|t)    (5)

where R(i|t) is the count of decision variable class i in the input dataset after the branch t, and S(i|t) is the count of decision variable class i corresponding to ϕ after the branch t.

The modified information gain is calculated thereafter in the standard way:

    IG(a|t) = E(a|t) − Σ_{j=1}^{l} [N(t ∧ a[vj]) / N(t)] · E(t ∧ a[vj])    (6)
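The count manipulation of Equations (4)–(6) amounts to adding, for every decision class and every branch context, the counts contributed by Aug(ϕ) to the counts observed in the training data. A small sketch under the same illustrative row representation as above (function and variable names are assumptions, not the actual implementation):

    from collections import Counter

    def class_counts(rows, context):
        # Count decision classes among the rows consistent with the branch context t
        # (a dict of attribute = value tests already taken on the way down the tree).
        return Counter(r["decision"] for r in rows
                       if all(r.get(a) == v for a, v in context.items()))

    def augmented_probabilities(train_rows, aug_rows, context):
        # R(i|t): counts from the input dataset; S(i|t): counts from Aug(phi).
        # N(i|t) = R(i|t) + S(i|t) (Equation 5), normalised as in Equation 4.
        counts = class_counts(train_rows, context) + class_counts(aug_rows, context)
        total = sum(counts.values())
        return {cls: c / total for cls, c in counts.items()} if total else {}

These probabilities replace Equation 3 inside the entropy computation, after which the information gain of Equation 6 is evaluated exactly as before.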
3.3 Does Safety Augmentation Create Bias?

Post-facto safety augmentation does not affect the relative positions of the attributes in the decision tree, since the method splits those leaf nodes where the decision contradicts the safety requirement, while the other nodes in the tree are not disturbed. The integrated safety augmentation method produces a more optimal decision tree by augmenting the safety into the information gain metric and then allowing the ID3 algorithm to proceed with the modified information gain. In this section, we find out whether this augmentation introduces a bias in the input, thereby affecting the choice among the other attributes in the system.

Let us return to our safety assertion, ϕ:

(CapShape = bell ∧ GillColor = green) ⇒ Poisonous

As explained in Section 3.1, the proposed integrated method will affect the count corresponding to the following cases:

CapShape   CapColor   GillColor   Poisonous
Bell       White      Green       Poisonous
Bell       Yellow     Green       Poisonous

The augmentation methodology inherently assumes that these types of mushrooms, which are not represented in the data, are actually possible. This need not be true, and in nature, we may not have a mushroom of yellow CapColor which has a bell-like CapShape and green GillColor, as required by ant(ϕ). Formally, safety augmentation assumes that the attributes referenced in ant(ϕ) are independent of each other and that all combinations of values of these attributes can occur.
Figure 3: Change in the Information Gain for attributes C and D of the toy example when the support dataset is considered. (a) Information Gain for attributes C and D without considering the support dataset. (b) Information Gain for attributes C and D considering the support dataset.


Such an assumption may create a bias in the input to the decision tree learning algorithm. This is explained with the case shown in Figure 3. We consider the information gain of attributes C and D in a setting where C, D, and the decision variable are all Boolean. Figure 3a shows the case when no safety augmentation has been done. In this case, at a branch of the decision tree, we have 10 samples where the decision is True and 8 samples where the decision is False. Figure 3a shows the pattern in which the attributes C and D split the samples with the decisions True and False. Accordingly, IG(C) and IG(D) are computed as shown, and therefore, ID3 prefers D over C as the next attribute.

Figure 3b shows the case when safety augmentation is used to modify the counts. In this case, the safe decision is False, hence only the counts of samples with decision False are affected by the augmentation. The original ratio of (10 : 8) has now become (10 : 12). Figure 3b shows the ratios of True/False decisions when split by the attributes C and D. Accordingly, IG(C) and IG(D) are computed as shown, and therefore, now ID3 prefers C over D as the next attribute.

The important thing to note here is that such an inversion of priority between attributes like C and D can happen even when they do not belong to the antecedent of an assertion, ϕ. For any attribute A ∉ ant(ϕ), the count of the samples corresponding to the safe decision will be equal on all branches leading out of a node for A. In the specific case outlined above, C and D were not in ant(ϕ), and therefore the increase in the counts of the False decision in both the left and right branches of these nodes is equal, that is, 2. Yet the priority inversion takes place, indicating the introduction of bias.

We believe that enforcing safety always comes at the expense of bias. In reality, it is often not possible to envisage all practicable ways in which a property can fail. Some combinations of values for the variables in the antecedent of an assertion may not be possible, but this is not known to us. Therefore, the safety augmentation must assume that all combinations are possible, even at the expense of the bias it introduces. It is theoretically possible to examine ways to minimize the effects of this bias on the construction of the decision tree, but that is beyond the scope of this paper.
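The priority inversion discussed above can be reproduced numerically. The branch-wise counts used below are illustrative assumptions, not the exact counts of Figure 3 (of which we only use the stated totals of 10 True and 8 False samples and the addition of 2 False samples per branch); they merely demonstrate that adding an equal number of safe-decision samples to both branches of C and of D can flip the ordering of IG(C) and IG(D).

    import math

    def entropy(true_count, false_count):
        total = true_count + false_count
        h = 0.0
        for c in (true_count, false_count):
            if c:
                p = c / total
                h -= p * math.log2(p)
        return h

    def info_gain(branches):
        # branches: list of (true_count, false_count) pairs, one per branch value.
        tot_t = sum(t for t, f in branches)
        tot_f = sum(f for t, f in branches)
        total = tot_t + tot_f
        remainder = sum((t + f) / total * entropy(t, f) for t, f in branches)
        return entropy(tot_t, tot_f) - remainder

    # Illustrative splits of the 10 True / 8 False samples (safe decision = False).
    C = [(7, 2), (3, 6)]
    D = [(4, 0), (6, 8)]
    print(info_gain(C), info_gain(D))            # about 0.150 vs 0.225: ID3 prefers D

    # Safety augmentation adds 2 False samples to every branch of each attribute.
    C_aug = [(t, f + 2) for t, f in C]
    D_aug = [(t, f + 2) for t, f in D]
    print(info_gain(C_aug), info_gain(D_aug))    # about 0.099 vs 0.050: ID3 now prefers C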
4 Multi-Assertion Safety Augmentation

In general, safety may be defined in terms of multiple assertions. Moreover, assertions may be causal, diagnostic, or a combination of both. In this section, we discuss these variants and show that the principle of safety augmentation presented in the previous section can be applied to all such cases.

• Causal versus Diagnostic Assertions. It is well known in the literature that causal and diagnostic rules are inter-convertible in logic. For example, the causal rule Cavity ⇒ Toothache can be rewritten as a diagnostic rule: ¬Toothache ⇒ ¬Cavity. We have presented the safety augmentation methodology with assertions in causal form, where the consequent is a predicate over the decision variable. If the assertion is given in another form, it can be rewritten in the causal form and then used in our methodology.

• Assertions with the same consequent. Given two assertions of the form α1 ⇒ β and α2 ⇒ β, we can perform the safety augmentation in any order, or together by augmenting the counts with the ways of satisfying α1 ∨ α2.

• Assertions with different consequents. Given two assertions of the form α1 ⇒ β1 and α2 ⇒ β2, where β1 ≠ β2, we can perform the safety augmentation in any order. In this case we additionally need to ensure that α1 ∧ α2 is unsatisfiable, otherwise the safety assertions contradict each other. This is done using a standard SAT engine.

In general, we may also have assertions of the form α ⇒ β, where neither α nor β contains the decision variable. Such assertions convey additional information about the variables, which may not be explicit in the data, and they can also influence the counts during safety augmentation. For example, suppose the safety assertion states:

(CapShape = bell ∧ GillColor = green) ⇒ Poisonous

In the absence of any other information, as discussed earlier, we augment two cases, corresponding to CapColor of White and Yellow. Now suppose we are given the assertion:

(GillColor = Green) ⇒ (CapColor = Pink)

This assertion does not provide any information regarding the safety of mushrooms, but it tells us that mushrooms with green GillColor always have pink CapColor. With this information, the augmentation count drops from 2 to 0. The safety augmentation methodology computes the augmentation counts accordingly.
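Both checks described in this section become simple direct computations when antecedents and consequents are plain conjunctions of attribute = value predicates; in the general case a standard SAT engine is used, as noted above. The sketch below covers only the restricted illustrative case: the first function decides whether two assertions with different consequents are admissible (α1 ∧ α2 unsatisfiable), and the second filters the rows of Aug(ϕ) through a domain assertion such as (GillColor = Green) ⇒ (CapColor = Pink), reproducing the drop of the augmentation count from 2 to 0. The dictionary encodings and the second pair of antecedents are hypothetical.

    def conjunction_unsatisfiable(ant1, ant2):
        # alpha1 /\ alpha2 is unsatisfiable iff the two antecedents force some
        # shared attribute to two different values.
        return any(a in ant2 and ant2[a] != v for a, v in ant1.items())

    def filter_by_domain_assertion(aug_rows, dom_ant, dom_cons):
        # Keep only the augmentation rows consistent with a domain assertion
        # dom_ant => dom_cons (both attribute = value dictionaries).
        def consistent(row):
            if all(row.get(a) == v for a, v in dom_ant.items()):
                return all(row.get(a) == v for a, v in dom_cons.items())
            return True
        return [r for r in aug_rows if consistent(r)]

    # Hypothetical pair of assertions with different consequents: admissible,
    # because their antecedents can never hold together.
    print(conjunction_unsatisfiable({"GillColor": "Green"}, {"GillColor": "White"}))  # True

    # The two rows added for phi in Section 3.1 are both eliminated by the
    # domain assertion (GillColor = Green) => (CapColor = Pink).
    aug_rows = [
        {"CapShape": "Bell", "CapColor": "White",  "GillColor": "Green", "decision": "Poisonous"},
        {"CapShape": "Bell", "CapColor": "Yellow", "GillColor": "Green", "decision": "Poisonous"},
    ]
    remaining = filter_by_domain_assertion(aug_rows, {"GillColor": "Green"}, {"CapColor": "Pink"})
    print(len(remaining))   # 0: the augmentation count drops from 2 to 0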
Dataset         ID   Assertion
Breast Cancer   1    (Age = (30-39) ∧ Tumor-Size = (30-34) ∧ Irradiation = Yes) ⇒ Recurrence-Events
Mushroom        1    (Cap-Shape = Bell ∧ Gill-Color = Green) ⇒ Poisonous
                2    (Stalk-color-above-ring = Bell ∧ Stalk-color-below-ring = Green) ⇒ Poisonous
Nursery         1    (Student-Health = Not-Recommended) ⇒ Not-Recommended
Tic-Tac-Toe     1    (top-left-square = o ∧ top-middle-square = o ∧ top-right-square = o) ⇒ x-losses
                2    (middle-left-square = o ∧ middle-middle-square = o ∧ middle-right-square = o) ⇒ x-losses
                3    (bottom-left-square = o ∧ bottom-middle-square = o ∧ bottom-right-square = o) ⇒ x-losses

Table 2: Assertions for Benchmark Datasets

                         Original Decision Tree            Post-facto Safety Augmentation     Integrated Safety Augmentation
Dataset        Prop ID   Depth  Total Nodes  Runtime (s)   Depth  Total Nodes  Runtime (s)    Depth  Total Nodes  Runtime (s)
Breast Cancer  1         8      179          0.014         8      202          0.001          7      181          0.012
Mushroom       1         5      29           0.284         7      281          0.002          6      60           0.346
               2         5      29           0.284         7      281          0.002          6      36           0.375
Nursery        1         9      803          0.275         9      803          0.001          9      803          0.260
Tic-Tac-Toe    1         8      343          0.034         8      343          0.001          8      318          0.035
               2         8      343          0.034         10     463          0.002          8      363          0.036
               3         8      343          0.034         10     514          0.002          8      310          0.035

Table 3: Comparison of runtimes and dimensions of decision trees


5 Experimental Results

We provide experimental results on four benchmark datasets from the UCI Machine Learning Repository [Dua and Graff, 2017], namely Breast Cancer [Zwitter and Soklic, 1988], Mushroom, Nursery, and Tic-Tac-Toe. The assertions shown in Table 2 were designed based on relevance and have various degrees of support in the datasets.

Table 3 compares the depth and the total number of nodes in the decision trees created without safety augmentation with those created with post-facto and integrated safety augmentation. Also shown are the runtimes for creating the decision trees without safety augmentation, the additional time for post-facto safety augmentation, and the time for creating the decision trees with integrated safety augmentation.

The effect of safety augmentation on the decision trees depends on the missing support for the safety assertions in the datasets. For example, the safety property for the Nursery dataset has most of its support in the original dataset and therefore requires negligible augmentation. On the other hand, the assertions over the Mushroom dataset require considerable augmentation, resulting in significant changes due to safety augmentation. Not surprisingly, integrated safety augmentation produces decision trees with fewer nodes and lesser depth as compared to post-facto safety augmentation.

The runtimes indicate that the overhead of safety augmentation is marginal in most cases.

6 Related Work

Decision trees and similar structures are already in use in safety-critical domains. These include the Culpability Tree [Reason, 2016] for dealing with staff involved in safety errors in the aviation industry, and Incident Decision Trees [Meadows et al., 2005] for helping National Health Service (NHS) managers in the United Kingdom to determine a fair and consistent course of action towards staff involved in patient safety incidents. All these trees are either manually created by domain experts or built using domain-specific tools. There exist learning algorithms and tools for building decision trees, including ones that are robust against malicious attacks such as data poisoning [Alfeld et al., 2016] and evasion attacks [Biggio et al., 2013]. The tool Antidote [Drews et al., 2019] is used to check the robustness of decision tree learning against data poisoning attacks, where an attacker can inject several malicious elements into the training set to influence the learned model. The decision tree learning algorithm Treant [Calzavara et al., 2019] generates decision tree ensembles that are accurate and nearly insensitive to evasion attacks based on a formal threat model, and minimizes an evasion-aware loss function at each step of the tree construction. In contrast, our approaches build decision trees in which the branches responsible for predicting critical decisions are modified according to the specified safety constraints, thus providing robustness with respect to those constraints.

7 Conclusion

Safety augmentation is one of the primary requirements in structures learned from data using machine learning techniques, especially when the learned function is used in a safety-critical context. This paper examines the safety augmentation problem for decision trees, which are popular in practice. We present the first methodology for safety augmentation in decision trees where the safety requirement is expressed in terms of assertions. Our results indicate that augmenting the information gain metrics with knowledge gleaned from the safety assertions yields safe decision trees which are considerably smaller than the ones obtained by post-facto safety augmentation.
References

[Alfeld et al., 2016] Scott Alfeld, Xiaojin Zhu, and Paul Barford. Data poisoning attacks against autoregressive models. In 30th AAAI Conference on Artificial Intelligence, 2016.

[Biggio et al., 2013] Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Šrndić, Pavel Laskov, Giorgio Giacinto, and Fabio Roli. Evasion attacks against machine learning at test time. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 387–402. Springer, 2013.

[Breiman, 2017] Leo Breiman. Classification and Regression Trees. Routledge, 2017.

[Calzavara et al., 2019] Stefano Calzavara, Claudio Lucchese, Gabriele Tolomei, Seyum Assefa Abebe, and Salvatore Orlando. Treant: Training evasion-aware decision trees. arXiv preprint arXiv:1907.01197, 2019.

[Drews et al., 2019] Samuel Drews, Aws Albarghouthi, and Loris D'Antoni. Proving data-poisoning robustness in decision trees. arXiv preprint arXiv:1912.00981, 2019.

[Dua and Graff, 2017] Dheeru Dua and Casey Graff. UCI Machine Learning Repository, 2017.

[Laurent and Rivest, 1976] Hyafil Laurent and Ronald L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17, 1976.

[Meadows et al., 2005] Sandra Meadows, Karen Baker, and Jeremy Butler. The incident decision tree: guidelines for action following patient safety incidents. In Advances in Patient Safety: From Research to Implementation (Volume 4: Programs, Tools, and Products). Agency for Healthcare Research and Quality (US), 2005.

[Murthy, 1998] Sreerama K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4):345–389, 1998.

[Peled et al., 1999] Doron A. Peled, Edmund M. Clarke, and Orna Grumberg. Model Checking. MIT Press, 1999.

[Quinlan, 1986] J. Ross Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.

[Quinlan, 2007] J. Ross Quinlan. C5. http://rulequest.com, 2007.

[Quinlan, 2014] J. Ross Quinlan. C4.5: Programs for Machine Learning. Elsevier, 2014.

[Reason, 2016] James Reason. Managing the Risks of Organizational Accidents. Routledge, 2016.

[Song and Ying, 2015] Yan-Yan Song and Lu Ying. Decision tree methods: applications for classification and prediction. Shanghai Archives of Psychiatry, 27(2):130, 2015.

[SVA, 2018] SVA (SystemVerilog Assertions). IEEE Std 1800-2017, pages 1–1315, Feb 2018.

[Zwitter and Soklic, 1988] Matjaz Zwitter and Milan Soklic. Breast cancer data set, 1988.