<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Artificial Intelligence and Formal Verification, Logics, Automata and Synthesis (OVERLAY),
Rende, Italy, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Robustness Verification of Decision Tree Ensembles</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Ranzato</string-name>
          <email>ranzato@math.unipd.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Zanella</string-name>
          <email>mzanella@math.unipd.it</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>1</volume>
      <fpage>9</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>We study the problem of formally and automatically verifying robustness properties of decision tree ensemble classifiers such as random forests and gradient boosted decision tree models. A recent stream of works showed how abstract interpretation can be successfully deployed to formally verify neural networks. In this work we push forward this line of research by designing a general abstract interpretation-based framework for the formal verification of robustness and stability properties of decision tree ensemble models. Our method may induce complete robustness checks of standard adversarial perturbations or output concrete adversarial attacks. We implemented our abstract verification technique in a tool called silva, which leverages an abstract domain of not necessarily closed real hyperrectangles and is instantiated to verify random forests and gradient boosted decision trees. Our experimental evaluation on the MNIST dataset shows that silva provides a precise and efficient tool which advances the current state-of-the-art in tree ensemble verification.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Adversarial machine learning [
        <xref ref-type="bibr" rid="ref12 ref2 ref9">2, 9, 12</xref>
        ] is an active research field studying the vulnerabilities of machine learning (ML)
in adversarial scenarios. Adversarial examples have been found in diverse application fields of ML such
as image classification, spam filtering and malware detection, and current defense techniques include
adversarial model training, input validation, testing and automatic verification of learning algorithms
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Formal verification of ML classifiers has become an active field of investigation, in particular for
robustness properties of (deep) neural networks. A classifier is stable for some (typically very small)
perturbation of its input objects, representing an adversarial attack, when it assigns the same class to
all the samples within that perturbation, so that imperceptible malicious alterations of input objects cannot
deceive a stable classifier. Formal verification methods for neural networks may rely on a number of
different techniques: linear approximation of functions [
        <xref ref-type="bibr" rid="ref25 ref26">25, 26</xref>
        ], semidefinite relaxations [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], logical SMT
solvers [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ], symbolic interval propagation [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] abstract interpretation [
        <xref ref-type="bibr" rid="ref17 ref18 ref8">8, 17, 18</xref>
        ] or hybrid synergistic
approaches [
        <xref ref-type="bibr" rid="ref1 ref19 ref24">1, 19, 24</xref>
        ]. Abstract interpretation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a de facto standard technique, in use for over forty years,
for designing static analyzers and verifiers of programming languages. Recently, abstract interpretation has
been successfully applied to design precise and scalable robustness verification tools for (deep) neural
network models [
        <xref ref-type="bibr" rid="ref17 ref18 ref8">8, 17, 18</xref>
        ]. While all these verification techniques consider neural networks as the ML model,
in this work we focus on decision tree ensemble methods, such as random forests and gradient boosted
decision tree models, which are widely applied in fields with security-sensitive adversarial scenarios,
notably image classification, malware detection, intrusion detection and spam filtering. Only very
recently have adversarial attacks and robustness issues of decision tree ensembles become a subject
of investigation [
        <xref ref-type="bibr" rid="ref20 ref21 ref22 ref4 ref5 ref7">4, 5, 7, 20, 21, 22</xref>
        ].
      </p>
      <p>
        Contributions: Following the aforementioned stream of works applying abstract interpretation
for certifying ML models, we design a general abstract interpretation-based framework for the formal
verification of stability properties of decision tree ensemble models. Our verification algorithm of ensembles
of decision trees: (1) is domain agnostic, since it can be instantiated to any abstract domain which
represents properties of real vectors, such as simple hyperrectangles of real intervals or involved linear
relations; (2) is firmly based on the basic soundness principle of abstract interpretation and correspondingly
relies on sound approximations of functions used in decision tree classifiers; (3) under certain assumptions
may induce complete robustness checks against adversarial perturbations; (4) is able to output concrete
adversarial samples. Our formal verification methodology has been implemented in C in a tool called
silva (Latin for forest) which leverages an abstract domain of possibly open real hyperrectangles and
has been applied to random forests and gradient boosted decision trees. Our experimental evaluation on
the standard MNIST dataset shows that silva provides a precise and efficient tool both for deriving the
robustness metrics of forest classifiers and for performing a complete check of the robustness to adversarial
perturbations of input samples. Our evaluation also compared the performance of silva against a recent
robustness verification tool of tree ensembles [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and this showed that silva improves on the current
state-of-the-art. As an example, consider the following three images: on the left, an original image O
from the test set of MNIST, correctly classified as 7 by a random forest using 100 decision trees with
maximum depth 100; in the middle, an image A which is automatically generated by silva as an adversarial
attack on O, obtained as output of the robustness verification of O for a perturbation ±1 of the brightness
values of its pixels; on the right, an image showing which pixels were changed from O to A (gray means
unchanged, black +1, white −1). The image A is an adversarial example because it is classified as either
2 or 7 (i.e., there is a tie between the scores of 2 and 7) by the same 100×100 random forest which
classifies O as 7.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Experimental Evaluation</title>
      <p>
        We implemented a tool named silva (Latin for forest, standing for Silvarum Interpretation Valens
Lator Analysis, i.e., an efficient analyzer of forests by (abstract) interpretation), whose source code in C (about
5K LOC) is publicly available on GitHub [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>The underlying algorithm, whose details are omitted here, takes a shape representing a perturbation
applied to a sample, then iteratively decomposes it into finer abstract partitions until it manages to prove
(or disprove) stability. This procedure is guaranteed to converge due to the monotonicity of the refinement
step, hence silva is complete and returns a definitive answer for every sample.</p>
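      <p>The decompose-and-check loop just described can be sketched as follows. This is an illustrative Python sketch under simplifying assumptions (trees as nested tuples, majority voting, bisection of the widest interval); the names and the split-budget mechanism are hypothetical and do not reflect silva's actual C implementation:</p>

```python
# Illustrative sketch of abstract verification of a tree ensemble over the
# hyperrectangle domain: a box is a list of per-feature (lo, hi) intervals.
# A tree node is ("leaf", label) or ("split", feature, threshold, left, right),
# encoding hard univariate splits of the form x[feature] <= threshold.

def tree_labels(node, box):
    """Sound over-approximation of the labels one tree can output on `box`."""
    if node[0] == "leaf":
        return {node[1]}
    _, feat, thr, left, right = node
    lo, hi = box[feat]
    labels = set()
    if lo <= thr:                # some point of the box takes the left branch
        labels |= tree_labels(left, box)
    if hi > thr:                 # some point of the box takes the right branch
        labels |= tree_labels(right, box)
    return labels

def forest_labels(forest, box):
    """Labels a majority-vote forest may output on `box`: a label is possible
    only if its best-case vote count beats every other label's guaranteed count."""
    per_tree = [tree_labels(t, box) for t in forest]
    candidates = set().union(*per_tree)
    guaranteed = {m: sum(1 for s in per_tree if s == {m}) for m in candidates}
    possible = set()
    for l in candidates:
        best = sum(1 for s in per_tree if l in s)
        if all(best >= guaranteed[m] for m in candidates if m != l):
            possible.add(l)
    return possible

def verify_stable(forest, box, expected, max_splits=10_000):
    """Iterative refinement: True = stable, False = some sub-box is entirely
    misclassified, None = split budget exhausted (the analogue of a timeout)."""
    worklist, budget = [box], max_splits
    while worklist:
        b = worklist.pop()
        labels = forest_labels(forest, b)
        if labels == {expected}:
            continue                      # stability proved on this sub-box
        if expected not in labels:
            return False                  # no point of this sub-box gets `expected`
        if budget == 0:
            return None
        budget -= 1
        i = max(range(len(b)), key=lambda j: b[j][1] - b[j][0])
        lo, hi = b[i]
        mid = (lo + hi) / 2               # bisect the widest dimension
        left, right = list(b), list(b)
        left[i], right[i] = (lo, mid), (mid, hi)
        worklist += [left, right]
    return True
```

      <p>The vote-counting bound in forest_labels is one possible way to keep the per-tree label sets from exploding combinatorially; the actual precision/efficiency trade-offs in silva differ.</p>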
      <p>
        We used silva for inferring stability of random forest classifiers on MNIST, which have been trained
with different combinations of number of trees, maximum tree depth, splitting criteria (Gini and entropy
impurity measures) and voting schemes (average and max). We consider a classifier stable on a sample
with respect to a perturbation whenever every point in the perturbed region receives the same label as the
original sample. We also consider the four possible interactions between accuracy and stability: we say that
a classifier on a sample is robust when it is correct and stable, vulnerable when it is wrong but stable,
fragile when it is correct but unstable, and broken when it is wrong and unstable. We compared the performance
of silva against a recent robustness verification tool of tree ensembles called VoTE [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], both for random
forests and gradient boosted decision trees. In our experiments RFs have been trained by scikit-learn
while CatBoost has been used for GBDTs. Since all these forest classifiers rely on univariate hard splits of
type xi ≤ k, silva has been instantiated to the hyperrectangle abstract domain H. Our experiments were
run on an AMD Ryzen 7 1700X 3.0GHz CPU.
      </p>
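      <p>The four-way taxonomy above amounts to a simple classification of the two Boolean outcomes per sample, as in the following sketch (a hypothetical helper, not part of silva's interface):</p>

```python
def categorize(correct: bool, stable: bool) -> str:
    """Map the accuracy/stability outcome on one sample to the four categories."""
    if correct and stable:
        return "robust"       # right label, and no attack in the region flips it
    if stable:
        return "vulnerable"   # consistently wrong: attacks keep the wrong label
    if correct:
        return "fragile"      # right label, but some perturbation flips it
    return "broken"           # wrong and unstable
```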
      <p>Setup: An automatic verifier Ver of robustness properties of a ML classifier C could be used for two
main purposes:
P1: to assess the robustness properties of C by inferring some stability metrics on a large test set T .
P2: to check the robustness of C on an input sample x ∈ X against some adversarial perturbation Prt(x);
when Ver infers that C is not robust on x then Ver should also output a (set of) counterexample(s)
in Prt(x).
A complete verifier such as silva always outputs a True/False answer for each input sample, provided
that computational time and memory constraints are met. When a complete verifier Ver is used for
purpose P1, it makes sense to add a timeout option to Ver setting a time limit for
verifying the stability of an input sample x, so that if the timeout expires then the stability check on x is
considered inconclusive. In our experiments using silva for evaluating the robustness properties of
tree ensemble classifiers, we utilized a timeout per sample, so that the stability metrics may be given by a
percentage ranging within an interval, thus taking into account inconclusive stability checks.</p>
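      <p>Reporting stability as an interval can be sketched as follows (the outcome labels are hypothetical, not silva's actual output format): the proved-stable fraction gives a lower bound, and inconclusive (timed-out) samples may fall on either side.</p>

```python
def stability_interval(outcomes):
    """Per-sample outcomes in {"stable", "unstable", "timeout"} -> stability %
    as an interval (lower, upper): timeouts count as unstable in the lower
    bound and as stable in the upper bound."""
    n = len(outcomes)
    stable = sum(1 for o in outcomes if o == "stable")
    timeout = sum(1 for o in outcomes if o == "timeout")
    return 100.0 * stable / n, 100.0 * (stable + timeout) / n
```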
      <p>
        Our experimental evaluation makes use of the standard MNIST dataset. Let us recall that MNIST
consists of 70000 gray scale pictures of hand-written digits (with L = {0, ..., 9}), where each image of
28 × 28 pixels is encoded as a vector of 784 integer values in [0, 255] representing the brightness of a pixel
(0 black, 255 white). The standard training set of MNIST consists of 60000 samples, while its test set T
includes the remaining 10000 samples. We considered the standard perturbation Prt∞,ε [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with ε = 1,
meaning that adversarial attacks may brighten or darken each pixel by up to a 0.5% magnitude. Hence, for
each input image x, its perturbation Prt∞,1(x) includes up to 3<sup>784</sup> possible attacks.
      </p>
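      <p>Concretely, the hyperrectangle abstraction of this perturbation can be built per pixel, clamped to the valid brightness range, along the following lines (an illustrative sketch, not silva's code):</p>

```python
def linf_perturbation(image, eps=1, lo=0, hi=255):
    """Hyperrectangle abstracting the L-infinity perturbation of an image:
    each pixel may move by at most eps, clamped to the brightness range."""
    return [(max(lo, p - eps), min(hi, p + eps)) for p in image]

# With integer brightness and eps = 1, each interior pixel ranges over three
# values, so a 784-pixel MNIST image admits up to 3**784 concrete attacks.
```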
      <p>Stability of RFs: In a preliminary study we compared accuracy, stability and execution time for
random forests trained with the Gini and entropy criteria and using different voting schemes (average
and max). We observed that forests trained with the entropy criterion tend to be more stable without
negative effects on the accuracy, while the difference between the two voting schemes becomes negligible for
forests of realistic size. Based on this assessment, for the successive experiments we considered random
forests trained with the entropy criterion and using the average score, which appears to be an optimal
configuration for the stability metric. Fig.1 depicts the relationship between stability and accuracy, where
darker points represent forests with lower depth d.</p>
      <p>[Figure: MNIST, stability % vs. accuracy %, for random forests of varying maximum depth.]</p>
      <p>
The number B of trees of these RFs is not depicted in the above chart and ranges in {5, 10, 15, 25, 50, 75}.
The vertical bars represent intervals of stability due to inconclusive checks within the 1 second per sample
timeout. We observe that RFs having the same maximum depth d tend to cluster together and that for
RFs with d ≤ 10 no clear interpretation of their stability can be derived, thus suggesting that RFs with
d ≤ 10 should not be considered for assessing the stability on MNIST. On the other hand, deeper forests
with d ≥ 15 tend to align vertically, thus revealing a growing stability with comparable accuracy.
For a fixed depth d, we may observe that both accuracy and stability grow with B. It is worth remarking that
for a given accuracy, the highest stability is achieved by RFs with depth d = 25 rather than d = 50, thus
hinting that increasing d beyond some threshold may not improve the accuracy while it reduces
the stability. This finding shows that, by taking into account both accuracy and stability, the overall
effectiveness of a random forest model does not necessarily increase with its size. In particular, the
75 × 25 random forest turns out to be the most accurate and stable. Tab.1 also displays the stability-based
metrics defined earlier.</p>
      <p>Here, it is worth remarking that: (i) stability and robustness are closely related; (ii) vulnerability decreases
with the overall size B × d of the RF, and for RFs having size ≤ 50 × 25 we found a significant percentage
of misclassified input samples (≈ 0.4%) with adversarial attacks which are consistently classified with the
same wrong label; (iii) breakage appears to converge towards ≈ 3.5% for larger forests.</p>
      <p>Verification Time per Sample: When using silva as a complete verifier for checking
whether a given input sample x ∈ X is robust against the adversarial perturbation Prt∞,1(x), Tab.2
displays the average verification Time per Sample (TpS), the average verification Time for the Samples
whose verification time is above the 90th percentile of the distribution (TpS10), the Maximum verification
Time per Sample (MTpS). It is worth observing that the worst case verification time of silva never exceeds
1 minute and that the average verification time on the hardest input samples is always less than 5 seconds.</p>
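      <p>These timing metrics can be computed from the list of per-sample verification times along the following lines (an illustrative sketch; the function name is hypothetical):</p>

```python
def timing_metrics(times):
    """Return (TpS, TpS10, MTpS): mean time per sample, mean over the slowest
    10% of samples (those at or above the 90th percentile), and the maximum."""
    ts = sorted(times)
    tps = sum(ts) / len(ts)
    k = max(1, len(ts) // 10)     # size of the slowest decile
    tps10 = sum(ts[-k:]) / k
    return tps, tps10, ts[-1]
```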
      <p>
        Comparison with VoTE: Tab.3 shows the results of the comparison of silva with VoTE, a recent
robustness verifier for tree ensemble classifiers based on an abstraction-refinement approach [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. We
replicated the experiments on the MNIST dataset as described in the paper [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and compared the results
in terms of robustness and verification time (on our machine). Each experiment is run on RFs and GBDTs
trained with the same parameters: RFs are trained using scikit-learn with Gini/average parameters, while
GBDTs are trained by CatBoost with default learning rate and softmax voting scheme. We followed [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]
by setting an overall timeout of 7 hours for the verification time of the full test set of MNIST.
      </p>
      <p>The difference between silva and VoTE on the accuracies of the input RFs and GBDTs is negligible
and due to the randomness of the learning algorithm. The difference on the robustness ratios inferred by
silva and VoTE may reach 3.6% (for the RF 50 × 5): the reasons why this happens are not clear and
would require an in-depth inspection of the VoTE source code. Overall, it turns out that silva is faster
than VoTE: on the larger forests verified by VoTE within 7 hours, silva achieves a speedup factor of
≈ 1.7 on the RF of size 50 × 10 and of ≈ 6.6 on the GBDT of size 100 × 10. The table also shows that by
applying a timeout of 60 seconds per sample on the RF 75 × 5 and the GBDT 150 × 10, silva is able to
output in at most 2.5 hours a rather precise estimate (±0.4%) of the robustness metric for these large RF
and GBDT where VoTE exceeds the 7 hours limit.</p>
    </sec>
    <sec id="sec-3">
      <title>Future Work</title>
      <p>
        We believe that this work represents a step forward in applying formal verification methods, in particular
a very well known program analysis technique such as abstract interpretation, to machine learning models.
As a benefit of this principled approach, we singled out the role of abstract interpretation for designing
a complete verifier of robustness properties of decision tree classifiers. We think that more advanced
techniques could be used for abstracting combinations of paths in different decision trees, as those
successfully applied in static program analysis (e.g. trace partitioning [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]). We also plan to resort to
abstract interpretation in order to train decision tree classifiers which are provably robust, namely, to apply
abstract interpretation to training algorithms rather than to trained classification algorithms, similarly to
the approach of [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] for training provably robust neural networks.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pailoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Dillig</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          .
          <article-title>Optimization and abstraction: A synergistic approach for analyzing neural network robustness</article-title>
          .
          <source>In Proc. 40th ACM Conf. on Programming Language Design and Implementation</source>
          (PLDI
          <year>2019</year>
          ), pages
          <fpage>731</fpage>
          -
          <lpage>744</lpage>
          . ACM,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Athalye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Papernot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Brendel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rauber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tsipras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Madry</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kurakin</surname>
          </string-name>
          .
          <article-title>On evaluating adversarial robustness</article-title>
          .
          <source>CoRR</source>
          , abs/
          <year>1902</year>
          .06705,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Carlini</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Wagner</surname>
          </string-name>
          .
          <article-title>Towards evaluating the robustness of neural networks</article-title>
          .
          <source>In Proc. of 2017 IEEE Symposium on Security and Privacy (SP2017)</source>
          , pages
          <fpage>39</fpage>
          -
          <lpage>57</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Boning</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          .
          <article-title>Robust decision trees against adversarial examples</article-title>
          .
          <source>In Proc. 36th Int. Conf. on Machine Learning, ICML 2019</source>
          , volume
          <volume>97</volume>
          , pages
          <fpage>1122</fpage>
          -
          <lpage>1131</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Boning</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          .
          <article-title>Robustness verification of tree-based models</article-title>
          .
          <source>CoRR</source>
          , abs/
          <year>1906</year>
          .03849,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cousot</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Cousot</surname>
          </string-name>
          .
          <article-title>Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints</article-title>
          .
          <source>In Proc. 4th ACM Symp. on Principles of Programming Languages (POPL</source>
          <year>1977</year>
          ), pages
          <fpage>238</fpage>
          -
          <lpage>252</lpage>
          . ACM,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Einziger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sa'ar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Segall</surname>
          </string-name>
          .
          <article-title>Verifying robustness of gradient boosted models</article-title>
          .
          <source>In Proc. 33rd AAAI Conf. on Artificial Intelligence</source>
          , pages
          <fpage>2446</fpage>
          -
          <lpage>2453</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Drachsler-Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tsankov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chaudhuri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Vechev</surname>
          </string-name>
          .
          <article-title>AI2: safety and robustness certification of neural networks with abstract interpretation</article-title>
          .
          <source>In Proc. 2018 IEEE Symposium on Security and Privacy (SP2018)</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>McDaniel</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Papernot</surname>
          </string-name>
          .
          <article-title>Making machine learning robust against adversarial inputs</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>61</volume>
          (
          <issue>7</issue>
          ):
          <fpage>56</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kwiatkowska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Safety verification of deep neural networks</article-title>
          .
          <source>In Proc. Intern. Conf. on Computer Aided Verification (CAV2017)</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>29</lpage>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Barrett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Dill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Julian</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Kochenderfer</surname>
          </string-name>
          . Reluplex:
          <article-title>An efficient SMT solver for verifying deep neural networks</article-title>
          . In R. Majumdar and V. Kunčak, editors,
          <source>Proc. Intern. Conf. on Computer Aided Verification (CAV2017)</source>
          , pages
          <fpage>97</fpage>
          -
          <lpage>117</lpage>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kurakin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Adversarial machine learning at scale</article-title>
          .
          <source>In Proc. 5th Int. Conf. on Learning Representations (ICLR</source>
          <year>2017</year>
          ),
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gehr</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vechev</surname>
          </string-name>
          .
          <article-title>Differentiable abstract interpretation for provably robust neural networks</article-title>
          .
          <source>In Proc. Int. Conf. on Machine Learning (ICML 2018)</source>
          , pages
          <fpage>3575</fpage>
          -
          <lpage>3583</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Raghunathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Steinhardt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Liang</surname>
          </string-name>
          .
          <article-title>Semidefinite relaxations for certifying robustness to adversarial examples</article-title>
          .
          <source>In Proc. Annual Conf. on Neural Information Processing Systems (NeurIPS 2018)</source>
          , pages
          <fpage>10900</fpage>
          -
          <lpage>10910</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ranzato</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Zanella</surname>
          </string-name>
          .
          <article-title>Silva, a tool for robustness verification of decision tree ensembles</article-title>
          . https://github.com/abstract-machine-learning/silva,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>X.</given-names>
            <surname>Rival</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Mauborgne</surname>
          </string-name>
          .
          <article-title>The trace partitioning abstract domain</article-title>
          .
          <source>ACM Trans. Program. Lang. Syst.</source>
          ,
          <volume>29</volume>
          (
          <issue>5</issue>
          ), Aug.
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Püschel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Vechev</surname>
          </string-name>
          .
          <article-title>Fast and effective robustness certification</article-title>
          .
          <source>In Proc. Annual Conf. on Neural Information Processing Systems (NeurIPS 2018)</source>
          , pages
          <fpage>10825</fpage>
          -
          <lpage>10836</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>G.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Püschel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vechev</surname>
          </string-name>
          .
          <article-title>An abstract domain for certifying neural networks</article-title>
          .
          <source>Proc. ACM Program. Lang.</source>
          ,
          <volume>3</volume>
          (POPL):
          <fpage>41:1</fpage>
          -
          <lpage>41:30</lpage>
          , Jan.
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>G.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Püschel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Vechev</surname>
          </string-name>
          .
          <article-title>Boosting robustness certification of neural networks</article-title>
          .
          <source>In Proc. 7th Int. Conf. on Learning Representations (ICLR 2019)</source>
          . OpenReview.net,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>J.</given-names>
            <surname>Törnblom</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Nadjm-Tehrani</surname>
          </string-name>
          .
          <article-title>Formal verification of random forests in safety-critical applications</article-title>
          .
          <source>In Proc. 6th Int. Workshop on Formal Techniques for Safety-Critical Systems (FTSCS 2018)</source>
          , pages
          <fpage>55</fpage>
          -
          <lpage>71</lpage>
          . Springer,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Törnblom</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Nadjm-Tehrani</surname>
          </string-name>
          .
          <article-title>An abstraction-refinement approach to formal verification of tree ensembles</article-title>
          .
          <source>In Proc. 2nd Int. Workshop on Artificial Intelligence Safety Engineering</source>
          , held with SAFECOMP. Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Törnblom</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Nadjm-Tehrani</surname>
          </string-name>
          .
          <article-title>Formal verification of input-output mappings of tree ensembles</article-title>
          .
          <source>arXiv preprint arXiv:1905.04194</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Whitehouse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Jana</surname>
          </string-name>
          .
          <article-title>Efficient formal safety analysis of neural networks</article-title>
          .
          <source>In Proc. Annual Conf. on Neural Information Processing Systems (NeurIPS 2018)</source>
          , pages
          <fpage>6369</fpage>
          -
          <lpage>6379</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Pei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Whitehouse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Jana</surname>
          </string-name>
          .
          <article-title>Formal security analysis of neural networks using symbolic intervals</article-title>
          .
          <source>In Proc. 27th USENIX Security Symposium</source>
          , pages
          <fpage>1599</fpage>
          -
          <lpage>1614</lpage>
          . USENIX Association,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>T.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Daniel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Boning</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Dhillon</surname>
          </string-name>
          .
          <article-title>Towards fast computation of certified robustness for ReLU networks</article-title>
          .
          <source>In Proc. 35th Int. Conf. on Machine Learning (ICML 2018)</source>
          , pages
          <fpage>5273</fpage>
          -
          <lpage>5282</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Weng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hsieh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Daniel</surname>
          </string-name>
          .
          <article-title>Efficient neural network robustness certification with general activation functions</article-title>
          .
          <source>In Proc. Annual Conf. on Neural Information Processing Systems (NeurIPS 2018)</source>
          , pages
          <fpage>4944</fpage>
          -
          <lpage>4953</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>