<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Formal Verification of Tree Ensembles against Real-World Composite Geometric Perturbations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valency Oscar Colaco</string-name>
          <email>valency.colaco@liu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simin Nadjm-Tehrani</string-name>
          <email>simin.nadjm-tehrani@liu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer and Information Science, Linköping University</institution>, <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>3</fpage>
      <lpage>14</lpage>
      <abstract>
        <p>Since machine learning components are now being considered for integration in safety-critical systems, safety stakeholders should be able to provide convincing arguments that the systems are safe for use in realistic deployment settings. In the case of vision-based systems, the use of tree ensembles calls for formal stability verification against a host of composite geometric perturbations that the system may encounter. Such perturbations are a combination of an affine transformation like rotation, scaling, or translation and a pixel-wise transformation like changes in lighting. However, existing verification approaches mostly target small norm-based perturbations, and do not account for composite geometric perturbations. In this work, we present a novel method to precisely define the desired stability regions for these types of perturbations. We propose a feature space modelling process that generates abstract intervals which can be passed to VoTE, an efficient formal verification engine that is specialised for tree ensembles. Our method is implemented as an extension to VoTE by defining a new property checker. The applicability of the method is demonstrated by verifying classifier stability and computing metrics associated with stability and correctness, i.e., robustness, fragility, vulnerability, and breakage, in two case studies. In both case studies, targeted data augmentation pre-processing steps were applied for robust model training. Our results show that even models trained with augmented data are unable to handle these types of perturbations, thereby emphasising the need for certified robust training for tree ensembles.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Formal Verification</kwd>
        <kwd>Tree Ensembles</kwd>
        <kwd>Composite Perturbations</kwd>
        <kwd>Geometric Perturbations</kwd>
        <kwd>Random Forests</kwd>
        <kwd>Gradient Boosting Machines</kwd>
        <kwd>Semantic Perturbations</kwd>
        <kwd>Stability</kwd>
        <kwd>Robustness</kwd>
        <kwd>Trustworthy AI</kwd>
        <kwd>Trustworthy Computing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Recent advancements in machine learning are now being considered for integration in safety-critical systems like medical equipment, critical infrastructure, and transport networks (autonomous vehicles, avionics, spaceflight), where software flaws could be detrimental to humans and the environment. While these systems are subject to strict regulations, a lack of evidence that the machine learning (ML) models deployed in these systems are safe for operation can directly affect their trustworthiness.</p>
      <p>Considering the inherently probabilistic nature of
these models, standard industry practices like software
testing are often unsuitable to provide any guarantees
about operational safety. In this context, to help ensure
trustworthiness, Artificial Intelligence (AI) systems can
benefit from scrutiny through formal methods. Formal
verification can be a tool to analyse the safety properties
of software operating over a multi-dimensional space.</p>
      <p>
        Evidence based on formal verification can be used in a safety case. In a deployed vision-based system, an ML model could misread its inputs and the effects could be disastrous. This calls for formal verification of these models to ensure classifier stability in the presence of composite geometric perturbations that the system may encounter in real-world deployment settings.
      </p>
      <p>
        A recent survey [
        <xref ref-type="bibr" rid="ref19">11</xref>
        ] gives a broad overview of the safety and trustworthiness of deep neural networks, with a specific focus on topics like verification, testing, adversarial defence, and interpretability. Engstrom et al. [12] show that state-of-the-art neural network classifiers are vulnerable to natural classes of perturbations, especially simple translations and rotations, for a significant fraction of their inputs.
      </p>
      <p>Another interesting observation they made is that data
augmentation does not necessarily provide a significant
boost to robustness, which we also confirm for tree
ensembles in this paper.</p>
      <p>
        In this paper, we present a method to precisely capture these perturbations as abstract intervals to verify tree-ensemble based classifiers against specified safety properties. Since our goal is to support safety arguments in assurance cases, the verifiability of the models is of paramount importance. The simple structure of tree-based models makes them easier to systematically analyze in the formal verification process. However, larger models may still prove hard to verify due to the combinatorial explosion. We present a method where composite geometric perturbations are applied to input data and shown to prove a lack of stability even in the presence of data augmentation during training. This is demonstrated by applying our method to digit/character recognition problems. The contributions of this paper are as follows:
- A method to capture composite geometric perturbations based on real-world phenomena (like the combined effects of axial rotations, scaling, or translation together with changes in lighting) in the interval domain through feature space modelling for vision-based systems.
- Realisation of the method, implemented as an extension to the Verifier of Tree Ensembles [
        <xref ref-type="bibr" rid="ref11">3</xref>
        ] (VoTE), a formal verification tool that uses abstract interpretation, along with the application of the method to two case studies commonly found in the literature.
      </p>
      <p>
        Pei et al. [13] and Mohapatra et al. [14] propose verification frameworks called VERIVIS and Semantify-NN for evaluating the robustness of machine learning systems. The VERIVIS framework checks for violations of the parameter space (like the range of angles) when the inputs are subjected to semantic perturbations, while the Semantify-NN framework converts semantic perturbations to ℓp-norm based threat models, thereby enabling the use of any ℓp-norm based verification method for robustness verification of deep neural networks. Balunović et al. [15] study the certification of neural networks against natural geometric transformations like rotation and scaling. They use sampling, interpolation and Lipschitz optimisation to find the bounds for their certification process. The authors implement their certification framework in a system called DEEPG that uses a restricted Polyhedra with custom approximations for the operations used in several geometric transformations. Our approach, however, uses a feature space modelling process in conjunction with an abstraction function to get precise upper and lower bounds for the inputs as they are perturbed by real-world elements.
      </p>
      <p>The rest of the paper is structured as follows. Section 2 discusses related work on the verification of tree ensembles against different classes of perturbations. Section 3 presents the preliminaries on tree ensembles, the VoTE toolsuite, and composite geometric perturbations. Section 4 presents our abstraction function and feature space modelling process along with the implementation in VoTE. Section 5 presents the application of the method to two case studies commonly found in the literature. Section 6 concludes the paper and summarizes the learnings.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Works</title>
      <p>Recent machine learning advances are gradually finding their way into safety-critical systems. However, there is a lack of verification techniques to build formal arguments for operational safety in such systems. There has been extensive research on formal verification of ℓp-norm bounded perturbations (mathematical distances that define the perturbations) in neural networks.</p>
      <p>Wu et al. [16] study the certification of deep neural networks against real-world distribution shifts. They train a generative model to learn perturbations from the data to improve robustness precision. Yang et al. [17] mention that geometric image transformations that arise in the real world, such as scaling and rotation, have been shown to easily deceive deep neural networks. They propose a training formulation known as Certified Geometric Training (CGT) that improves the deterministic certified robustness of neural networks with respect to geometric transformations. Gowal et al. [19] use generative models to systematically transfer image attributes that are likely to be misclassified across image instances to achieve better robustness. They also mention that ℓp-norm bounded perturbations do not necessarily cover plausible real-world variations that preserve the semantics of the input. We agree that composite geometric perturbations based on real-world phenomena (which preserve the semantics of the input) are outside the ℓp-norm reach. Hence, we deploy a feature space modelling process for defining the multidimensional perturbation regions. Also, while using generative models can help with robust training, this in itself cannot be used for building up a safety case in systems that use ML components. Our approach aims to provide formal proof-based evidence to support safety assurances in these systems.</p>
      <p>Outside formal verification and robust training, significant research has been done to tackle the issue of real-world perturbations like axial rotations in images, in the form of rotation-invariant networks [20] and rotation-equivariant networks [21].</p>
      <sec id="sec-3-1">
        <title>3.2. Random Forests</title>
        <p>
          Decision trees are known to suffer from a phenomenon called overfitting, in which the models are fitted so tightly to their training data that they memorize the data itself instead of learning the patterns associated with the data in order to make generalized predictions. To counteract these issues with decision trees concerning bias and variance, Breiman [
          <xref ref-type="bibr" rid="ref9">1</xref>
          ] proposes Random Forests. A random forest r : X^n → ℝ^m is an ensemble of B decision trees that produces outputs by averaging the values emitted by each individual tree, i.e.,
r(x) = (1/B) ∑_{b=1}^{B} t_b(x)   (2)
        </p>
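The averaging described above can be sketched in a few lines. This is our own illustrative code, not the paper's implementation; the function name and the stub trees are hypothetical.

```python
# Illustrative sketch: a random forest averages the m-dimensional output
# vectors emitted by its B trees. Stub trees stand in for trained models.

def forest_predict(trees, x):
    """Average the output tuples of all trees in the ensemble."""
    outputs = [t(x) for t in trees]
    m = len(outputs[0])
    return tuple(sum(o[i] for o in outputs) / len(trees) for i in range(m))

# Three stub trees emitting fixed class-probability vectors.
trees = [lambda x: (1.0, 0.0), lambda x: (0.0, 1.0), lambda x: (1.0, 0.0)]
```

With these stubs, the forest's output is the element-wise mean of the three tuples.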
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Gradient Boosting Machines</title>
        <p>
          Similar to Random Forests, Friedman [
          <xref ref-type="bibr" rid="ref10">2</xref>
          ] proposes gradient boosting machines that employ several decision trees to create a prediction function. Unlike Random Forests, these trees are trained in a sequential manner, with each succeeding tree correcting the errors made by its predecessor. The errors are corrected using gradient descent (hence the name).
        </p>
        <p>Although conceptually different from random forests in a learning context, these two models have several features in common when performing predictions. A gradient boosting machine g : X^n → ℝ^m is an ensemble of B additive decision trees, i.e.,
g(x) = ∑_{b=1}^{B} t_b(x)   (3)</p>
        <p>
          In Scher et al. [
          <xref ref-type="bibr" rid="ref18">10</xref>
          ], the authors formally define real-world perturbations and differentiate between adversarial perturbations and distribution shifts. They model the perturbations around a point as a probability distribution and succeed in the estimation of robustness using Monte Carlo sampling for low- to medium-dimensional data. While their method provides probabilistic guarantees, our approach aims to provide formal deterministic guarantees, which can form the basis for reasoning about safety in safety-critical systems.
        </p>
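The additive prediction of a gradient boosting machine differs from a random forest only in that tree outputs are summed rather than averaged. A minimal sketch (our own illustrative names, not the paper's code):

```python
# Illustrative sketch: a gradient boosting machine sums the outputs of its
# sequentially trained trees instead of averaging them.

def gbm_predict(trees, x):
    """Sum the m-dimensional output tuples of all trees."""
    m = len(trees[0](x))
    return tuple(sum(t(x)[i] for t in trees) for i in range(m))

# Stub trees: the second tree "corrects" part of the first one's residual.
trees = [lambda x: (0.8,), lambda x: (0.15,)]
```

At prediction time both ensemble types reduce to a simple fold over per-tree outputs, which is what makes them amenable to the same verification machinery.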
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Preliminaries</title>
      <p>
        In this section, we present the background knowledge on tree-based ensembles, semantic perturbations, interpolated/geometric and pixel-wise transformations, composite geometric perturbations based on real-world phenomena, and the toolsuite VoTE [
        <xref ref-type="bibr" rid="ref11">3</xref>
        ].
      </p>
      <sec id="sec-4-0">
        <title>3.1. Decision Trees</title>
        <p>
          Decision trees are employed as prediction models in machine learning and can be used in vision-based systems to solve image classification problems (among other applications). A decision tree model contains rules to predict the output class and implements a prediction function t : X^n → ℝ^m that maps disjoint sets of points X_k ⊂ X^n to a single output point y ∈ ℝ^m, i.e.,
t(x) = (y_{k,1}, ..., y_{k,m}) if x ∈ X_k, for k = 1, ..., K   (1)
where K is the number of disjoint sets and X^n = ⋃_k X_k. The n-dimensional input domain X^n includes elements x as tuples in which each input variable x_i captures some feature of interest of the system [
          <xref ref-type="bibr" rid="ref11">3</xref>
          ]. The tree structure is evaluated in a top-down manner, where decision functions determine which path to take towards the leaves. Each internal node in the decision tree is associated with a decision function that recursively splits the input space, thereby separating regions from each other. When a leaf is hit, the output y ∈ ℝ^m associated with the leaf becomes the output associated with the input from which the top-down evaluation started.
        </p>
        <p>
          In this paper, we only consider binary trees with linear decision functions with one input variable, which Irsoy et al. [
          <xref ref-type="bibr" rid="ref12">4</xref>
          ] call univariate hard decision trees. State-of-the-art implementations of tree-based ensembles typically use univariate hard decision trees, for example, scikit-learn [
          <xref ref-type="bibr" rid="ref14">6</xref>
          ] and CatBoost [
          <xref ref-type="bibr" rid="ref15">7</xref>
          ]. Ensembles are typically trained on random subsets of the data (with replacement) using techniques like bagging.
        </p>
      </sec>
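The top-down evaluation of a univariate hard decision tree can be sketched as follows. This is illustrative code only, not the implementation used in the paper; the `Node` class and `predict` function are our own names.

```python
# Illustrative sketch: a univariate hard decision tree evaluated top-down.
# Each internal node tests a single input variable against a threshold;
# each leaf stores an output tuple.

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
        self.feature = feature      # index of the single input variable tested
        self.threshold = threshold  # linear decision function: x[feature] <= threshold
        self.left = left            # branch taken when the test holds
        self.right = right          # branch taken otherwise
        self.value = value          # output tuple stored at a leaf

def predict(tree, x):
    """Top-down evaluation: follow decision functions until a leaf is hit."""
    node = tree
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value

# A two-leaf tree splitting on pixel 0 at threshold 0.5.
tree = Node(feature=0, threshold=0.5,
            left=Node(value=(1.0, 0.0)),
            right=Node(value=(0.0, 1.0)))
print(predict(tree, [0.2]))  # → (1.0, 0.0)
```

The simplicity of this evaluation loop is what makes tree ensembles comparatively tractable for formal analysis.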
      <sec id="sec-4-1">
        <title>3.4. Classifier</title>
        <p>Decision trees and tree-ensemble models can be used as classifiers to categorize samples from an input domain into one or more classes and assign each sample a label unique to its class. In this paper, we only consider functions that map each point from an input domain to exactly one class. Let f(x) = (p_1, ..., p_c) represent a model trained to predict the probability p_i associated with a class i within disjoint regions in the input domain, where c is the number of classes. A classifier F(x) may then be defined as,
F(x) = argmax_{i ∈ {1, ..., c}} p_i   (4)</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.5. Verifier of Tree Ensembles</title>
        <p>
          VoTE [
          <xref ref-type="bibr" rid="ref11">3</xref>
          ] (Verifier of Tree Ensembles) is a toolsuite for
formally verifying that tree ensembles comply with specific
properties that the user can define through implementing
a property checker.
        </p>
        <p>In this work, the property checker defined earlier (see Figure 1) checks for stability. The VoTE Core is instantiated from the ensemble subject to verification, f : X^n → ℝ^m.</p>
        <p>It takes as input a hyperrectangle defining the input region X^n, and the outcome of the formal verification is pass (when the tree ensemble satisfies the concerned property) and fail when it does not.</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.6. Semantic Perturbations</title>
        <p>
          As opposed to perturbations that change the value of each feature by a constant value, semantic perturbations take an input image and parameters to perform a parameterized operation that perturbs the image while preserving its semantic information. Semantics-preserving perturbations can include object-level shifts (like changing the shape or size of an object in a color classification problem), geometric transformations (scaling, rotations) and common image corruptions (fog, blurs) [
          <xref ref-type="bibr" rid="ref16">8</xref>
          ]. Most of these perturbations cannot be naturally represented using the ℓp-norm (‖x − x′‖_p ≤ ε) that most robustness verification methods are designed for. Using a large value of ε to try and capture these types of perturbations is usually not recommended in practice, as the image quality degrades significantly [
          <xref ref-type="bibr" rid="ref16">8</xref>
          ]. Hence, we propose a feature space modelling process to precisely capture these multidimensional perturbations.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>3.7. Geometric Perturbations</title>
        <p>
          Geometric Perturbations are a class of semantic perturbations [
          <xref ref-type="bibr" rid="ref16">8</xref>
          ] that involve an affine transformation on each pixel’s row and column indices, followed by an interpolation operation [17]. In this paper, we consider three types of affine transformations: rotations, scaling and translation.
        </p>
        <p>Let T_φ : ℝ^2 → ℝ^2 (in 2D) be an invertible affine transformation (like rotation) parameterized by φ (like the angle of rotation). Let u(i) and v(j) be functions that transform i, j pixel indices to u, v coordinates with respect to the center of the image. However, as these transformed coordinates may not align exactly with the integer-valued pixel indices, bilinear interpolation (I_b) is necessary. Yang et al. [17] define the general form of a geometric perturbation for an image X as,
X′_{i,j} = I_b(X, T_φ^(-1)(u(i), v(j)))   (5)
where X′ is the perturbed image. For each geometric perturbation, they present the inverse transform function T_φ^(-1) which is used to instantiate equation (5), for example:
- Rotation, parameterized by angle φ ∈ [0, 2π]: T_φ^(-1)(u, v) = (u cos φ + v sin φ, −u sin φ + v cos φ)   (6)
- Scaling, parameterized by a scaling factor λ ∈ ℝ, λ &gt; −1: T_λ^(-1)(u, v) = (u/(1 + λ), v/(1 + λ))   (7)</p>
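The general form in equation (5), an inverse affine transform followed by bilinear interpolation, can be sketched for rotation as follows. This is our own illustrative implementation, not the paper's; the `rotate` function and its internals are assumptions.

```python
import math

# Illustrative sketch of equation (5) for rotation: for each output pixel,
# apply the inverse rotation to its centred coordinates, then bilinearly
# interpolate the source image at the resulting (non-integer) location.

def rotate(img, phi):
    """img: list of rows of pixel values in [0, 1]; returns the rotated image."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # pixel indices -> coordinates relative to the image centre
            v, u = i - cy, j - cx
            # inverse rotation T_phi^(-1)
            us = u * math.cos(phi) + v * math.sin(phi)
            vs = -u * math.sin(phi) + v * math.cos(phi)
            y, x = vs + cy, us + cx
            # bilinear interpolation I_b over the four neighbouring pixels
            y0, x0 = int(math.floor(y)), int(math.floor(x))
            dy, dx = y - y0, x - x0
            acc = 0.0
            for yy, xx, wgt in [(y0, x0, (1 - dy) * (1 - dx)),
                                (y0, x0 + 1, (1 - dy) * dx),
                                (y0 + 1, x0, dy * (1 - dx)),
                                (y0 + 1, x0 + 1, dy * dx)]:
                if 0 <= yy < h and 0 <= xx < w:
                    acc += wgt * img[yy][xx]
            out[i][j] = acc
    return out
```

Scaling and translation follow the same pattern with their respective inverse transforms.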
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Proposed Method</title>
      <p>
        In this section, we present a method to capture composite geometric perturbations based on real-world phenomena (like the combined effects of axial rotations and changes in lighting) in the form of abstract intervals. The basic idea is to pass these abstract intervals to the VoTE property checker [
        <xref ref-type="bibr" rid="ref11">3</xref>
        ] to verify stability and compute the extended stability metrics, which will be explained in the following subsections.
      </p>
      <sec id="sec-5-1">
        <title>3.8. Pixel-wise Transformations</title>
        <p>Pixel-wise transformations are defined by the cumulative effects of brightness and contrast (i.e., simply changes in lighting) acting on a pixel X_{i,j} of an image X, where the respective brightness and contrast parameters are β, α ∈ ℝ. These transformations are defined by Balunović et al. [15] and Mohapatra et al. [14] as,
X′_{i,j} = min(1, max(0, (1 + α) ⋅ X_{i,j} + β))   (8)</p>
      </sec>
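The pixel-wise transformation above is a one-liner; a sketch with an explicit clamp to the [0, 1] pixel range (illustrative, with our own function name):

```python
# Illustrative sketch: brightness beta and contrast alpha applied to a
# normalized pixel value, clamped back into [0, 1].

def lighting(pixel, alpha, beta):
    return min(1.0, max(0.0, (1.0 + alpha) * pixel + beta))
```

The clamp matters for verification: without it, the perturbed intervals could extend beyond the representable pixel range.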
      <sec id="sec-5-2">
        <title>4.1. Abstracting the Perturbations</title>
        <p>For a particular input pixel x, if s is the total number of geometric transformation (e.g. rotation) steps for a given transformation, there exists a function π(x) that projects the effects of the transformation on this pixel at each step,
π(x) = {x_1, ..., x_s}   (9)
such that x_t is the outcome of the transformation at each step t, 1 ≤ t ≤ s. In order to capture these pixel values in the interval domain, we use the abstraction function α, which maps the projected values to their lower and upper bounds,
α(π(x)) = (min(π(x)), max(π(x)))   (12)</p>
        <p>In our work, the pixel-wise transformations are parameterized by darkening and brightening offsets, β_d, β_b ∈ ℝ, and the upper and lower bounds for the perturbations are specified as,
X′_{i,j} = (X_{i,j} − β_d, X_{i,j} + β_b)   (10)
Combining the two, the pixel-level perturbation margins relative to the original pixel value x are
(δ_l, δ_u) = ((min(π(x)) − β_d) − x, (max(π(x)) + β_b) − x)   (13)</p>
      </sec>
      <sec id="sec-5-3">
        <title>3.9. Composite Geometric Perturbations</title>
        <p>Composite geometric perturbations based on real-world phenomena are a combination of one affine transformation (like rotation, translation or scaling) and a pixel-wise transformation (like changes in lighting). These perturbations are visually represented in figure 2.</p>
        <p>As a worked example, consider a pixel with initial value x = 5 whose projected values under a geometric transformation are π(x) = {6, 2, 3, 1, 4}. Applying equation (12), α(π(x)) = (1, 6). If β_d = β_b = 1, applying equation (13), (δ_l, δ_u) = ((1 − 1) − 5, (6 + 1) − 5) = (−5, 2). These pixel-level perturbation margins are then applied to the initial pixel value before the transformations to get the pixel-level perturbation bounds that are representative of composite geometric perturbations (like rotations and changes in lighting): (x + δ_l, x + δ_u) = (5 + (−5), 5 + 2) = (0, 7). From this example, we see that the pixel x can take on any value between (0, 7) during a composite geometric transformation.</p>
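The worked example applying equations (12) and (13) can be reproduced as executable arithmetic (a sketch; `margins` is our own helper name):

```python
# The projection pi(x) lists the pixel's value at each transformation step;
# taking min/max abstracts the trajectory, and the darkening/brightening
# offsets beta_d, beta_b widen the resulting interval.

def margins(x, pi, beta_d, beta_b):
    lo, hi = min(pi), max(pi)                      # abstraction, equation (12)
    return (lo - beta_d) - x, (hi + beta_b) - x    # (delta_l, delta_u), equation (13)

x = 5
delta_l, delta_u = margins(x, [6, 2, 3, 1, 4], beta_d=1, beta_b=1)
bounds = (x + delta_l, x + delta_u)  # pixel-level perturbation bounds → (0, 7)
```

The interval (0, 7) is exactly the range derived by hand in the example above.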
      </sec>
      <sec id="sec-5-4">
        <title>4.2. Feature Space Modelling</title>
        <p>The abstraction process helps determine the upper and lower pixel-level perturbation margins in order to achieve efficient computations during the verification process. Since VoTE operates by abstracting each point with an interval in one dimension, when applying it to a multidimensional space we need to extend the abstraction function through feature space modelling to simulate these perturbations for vision-based systems. Note that while Algorithm 1 uses axial rotations as the affine transformation, it can easily be adapted to the other affine transformations, i.e., scaling or translation, computed by analogous algorithms.</p>
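In the spirit of Algorithm 1 (which we do not reproduce exactly), the feature space modelling step can be sketched as follows: collect each pixel's trajectory across the transformed series, then widen its min/max by the darkening and brightening offsets. All names here are ours, not the paper's.

```python
# Illustrative sketch: build per-pixel intervals over a series of geometric
# transformation steps. `rotate` stands for any transformation routine that
# maps (image, parameter) -> image.

def interval_image(img, params, beta_d, beta_b, rotate):
    """Return, for every pixel, (min over series - beta_d, max over series + beta_b)."""
    h, w = len(img), len(img[0])
    steps = [rotate(img, p) for p in params]   # the transformed series
    return [[(min(s[i][j] for s in steps) - beta_d,
              max(s[i][j] for s in steps) + beta_b)
             for j in range(w)] for i in range(h)]
```

The resulting grid of intervals is the hyperrectangle handed to the verification engine.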
      </sec>
      <sec id="sec-5-5">
        <title>4.3. Verification Workflow</title>
        <p>
          In this section, we present the verification workflow that shows the feature space modelling process incorporated into the VoTE [
          <xref ref-type="bibr" rid="ref11">3</xref>
          ] framework. We will use one of the case studies in section 5.1 to illustrate the flow in Figure 3. If we consider rotations between 0° and 10°, X_l (the image generated by applying the lower margin δ_l associated with each pixel in a particular image) and X_u (the corresponding image generated by applying the upper margin δ_u) represent the first and last images in the verification process. We then ask the VoTE Property Checker to verify all the images in the transformed series. The property checker returns a pass when the classifier makes stable predictions on all the images in the transformed series. If the result of the verification process is fail, the property does not hold (the classifier is not stable). In this case, the VoTE property checker can be configured to return a counterexample that can help with further analysis. The approach is illustrated in more detail with use cases in section 5.
        </p>
      </sec>
      <sec id="sec-5-6">
        <title>4.4. Classifier Correctness</title>
        <p>The correctness of a classifier F, with respect to an input x, denoted by correct(F, x), is proven when F predicts the correct label y for an input sample x according to a ground truth. This notion of correctness computed over a test set T can be used to quantify accuracy as
accuracy = |{x ∈ T | correct(F, x)}| / |T|   (14)</p>
      </sec>
      <sec id="sec-5-7">
        <title>4.5. Safety Property: Classifier Stability</title>
        <p>
          Accuracy on a test set is the most commonly used metric
to evaluate a classification model. However, accuracy
alone is not enough for convincing safety arguments
in models that use machine learning. In this paper, we
consider Stability, a property commonly used in related
works. Robustness (and other metrics consistent with
the literature) are generalised to a notion of classifier
stability and correctness [
          <xref ref-type="bibr" rid="ref17">9</xref>
          ]. Note that compliance with
stability and its associated metrics alone isn’t generally
enough to ensure system safety.
        </p>
        <p>Safety engineers typically define requirements on software functions that are much richer than this property alone. Let F : X^n → Y be the classifier subject to verification, where x (the image) is an n-array vector consisting of individual pixels, and y is the output class or label. δ_l, δ_u ∈ ℝ are lower and upper perturbation margins (δ_l &lt; δ_u) associated with an individual pixel in the image. If n is the number of pixels in the image, Δ = ((δ_l1, δ_u1), ..., (δ_ln, δ_un)) is a sequence containing tuples of pixel-level perturbation margins associated with the image. ε is an n-tuple of perturbations, with ε = (ε_1, ..., ε_n) and δ_li ⩽ ε_i ⩽ δ_ui. If we consider a test sample x and all possible variations of ε, the classifier F is stable on that sample (denoted by stable(F, x)) if and only if
F(x) = F(x′), where x′ = x + ε   (15)
The above expression can easily be extended over a test set T. Note that ε, when added to the base image x, produces a version of that base image that represents the combined effects of a geometric transformation and changes in lighting. These different versions of the base image can then be analyzed by the verification engine.</p>
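The stability property can be illustrated with a naive sampled check. Note that this sketch only tests the corners of the perturbation box and is therefore not a sound verification procedure; VoTE covers the whole hyperrectangle symbolically via abstract interpretation. All names are ours.

```python
import itertools

# Naive illustration: sample perturbed versions of x within the per-pixel
# margins and check that the classifier's label never changes. A real
# engine must reason about the entire box, not a finite sample.

def is_stable_sampled(classify, x, margins):
    """margins: per-pixel (delta_l, delta_u); checks the corners of the box."""
    base = classify(x)
    for corner in itertools.product(*[(lo, hi) for lo, hi in margins]):
        xp = [xi + d for xi, d in zip(x, corner)]
        if classify(xp) != base:
            return False
    return True

# Toy classifier: thresholds the sum of two pixels.
clf = lambda x: int(x[0] + x[1] > 1.0)
```

Widening the margins far enough flips the toy classifier's label at a corner, so the sampled check reports instability.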
      </sec>
      <sec id="sec-5-8">
        <title>4.6. Extended Stability Metrics</title>
        <p>
          Stability and Robustness are standard metrics commonly used in the literature on adversarial machine learning. They are generally used to quantify the security of classifiers at test time (e.g., against evasion attacks). Beyond stability, Ranzato and Zanella [
          <xref ref-type="bibr" rid="ref17">9</xref>
          ] define four additional metrics, namely robustness (R), fragility (F), vulnerability (V) and breakage (B), that combine classifier stability with prediction accuracy in the presence of input perturbations. We recall
R = |{x ∈ T | correct(F, x) ∧ stable(F, x)}| / |T|
with the remaining metrics defined analogously from the other combinations of correctness and stability.
        </p>
        <p>We incorporate these definitions into VoTE’s Property Checker, which formally verifies classifier stability and automatically computes these metrics at the same time.</p>
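Once per-sample correctness and stability flags are available, the four metrics reduce to counting. The F, V and B definitions in this sketch are our reading (the remaining combinations of the two flags), not quoted from the paper.

```python
# Illustrative sketch: derive the four extended stability metrics from
# per-sample (is_correct, is_stable) flags over a test set.

def metrics(flags):
    """flags: list of (is_correct, is_stable) per test sample."""
    n = len(flags)
    R = sum(c and s for c, s in flags) / n          # correct and stable
    F = sum(c and not s for c, s in flags) / n      # correct but unstable (assumed)
    V = sum(s and not c for c, s in flags) / n      # stable but incorrect (assumed)
    B = sum(not c and not s for c, s in flags) / n  # neither (assumed)
    return R, F, V, B
```

By construction the four fractions partition the test set and sum to 1.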
      </sec>
      <sec id="sec-5-9">
        <title>4.7. VoTE Property Checker</title>
        <p>Recall from the preliminaries that the VoTE property checker (VoTE-Pc) returns a pass if the property is satisfied. For a tree-ensemble model F and a test sample x ∈ T, the following equations are used to verify stability and correctness, which can be extended to compute the metrics from the equations in section 4.6:
is_correct ← VoTE-Pc(F, correct(F, x))   (16)
is_stable ← VoTE-Pc(F, stable(F, x))   (17)</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Case Studies</title>
      <p>
        We apply our three affine transformations combined with changes in lighting to two case studies. We verify stability and compute the extended metrics associated with classifier stability and correctness. Due to space restrictions, we will only show the outcomes of composite rotations in one study and composite scaling in the other. We
use scikit-learn [
        <xref ref-type="bibr" rid="ref14">6</xref>
        ] to train random forests and CatBoost
[
        <xref ref-type="bibr" rid="ref15">7</xref>
        ] to train gradient boosting machines. Experiments are
conducted on a Windows 11 Machine running Ubuntu
20.04 in WSL Mode (Windows Subsystem for Linux). The
machine comes equipped with an Intel Core i7-10875H
CPU and 16 GB RAM.
      </p>
      <sec id="sec-6-1">
        <title>5.1. Digit Recognition</title>
        <p>The MNIST dataset [22] is a collection of hand-written digits commonly used to evaluate machine learning algorithms. The dataset contains 70,000 gray scale images with a resolution of 28x28 pixels at 8bpp, encoded as a tuple of 784 pixels. The input regions surrounding each sample in the test set were defined using feature space modelling (section 4.3). The parameters for rotation were chosen as follows: Rotation Range = 0-10°, Rotation Steps = 1001. For the changes in lighting, the parameters were chosen as: β_d = 2 (darkening) and β_b = 3 (brightening). To make this case study even more realistic, the training data was augmented (doubled to 120,000 training samples) by adding samples that were randomly rotated in the range of ±10° using Keras [24]. The verification process was carried out on 10,000 test samples.</p>
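The verification series parameters for this case study can be written out as follows (a sketch of how we read the reported parameters; variable names are ours):

```python
# The MNIST rotation series: 1001 evenly spaced angles covering 0-10 degrees,
# with darkening/brightening offsets beta_d = 2 and beta_b = 3 (8bpp values).

steps = 1001
angles = [10.0 * k / (steps - 1) for k in range(steps)]
beta_d, beta_b = 2, 3
```

Each angle in the series yields one transformed image; the per-pixel min/max over the series, widened by the offsets, gives the hyperrectangle passed to the verifier.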
        <p>In regard to the next case study, if we consider the EMNIST [23] letter a and scaling in the range of ±10%, with β_d = 2 and β_b = 3, X_l and X_u represent the first and last images in the verification process. We then ask the VoTE Property Checker to verify all the images in the transformed series. Figure 5 shows a fragment of the verification flow.</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.2. Handwritten Character Recognition</title>
        <p>In our next experiments, we use the EMNIST-Letters dataset, which is a variant of the EMNIST Dataset [23] that includes upper and lower case handwritten characters. This dataset contains 145,600 samples across 26 balanced classes. Each sample has a resolution of 28x28 pixels at 8bpp, encoded as a tuple of 784 pixels. The input regions surrounding each sample in the test set were defined using feature space modelling. The parameters for scaling were chosen as follows: Scaling Steps = 201, Scale Factor = ±10%, which represents a 10% zoom (in and out). For the changes in lighting, the parameters were chosen as: β_d = 2 (darkening) and β_b = 3 (brightening). To make this case study even more realistic, the training data was augmented (doubled to 249,600 training samples) by adding samples that were randomly scaled in the range of ±10% using Keras [24]. The verification process was carried out on 10,000 test samples.</p>
        <p>Figure 5: Verification Workflow Fragment (EMNIST)</p>
        <p>Discussions</p>
        <p>From the experimental results in Table 1 and Table 2, we note that the robustness of the learned models with respect to composite geometric perturbations is much lower compared to the standard model accuracy (which is somewhat expected). In the first case study, gradient boosting machines performed slightly better than random forests in terms of robustness. Increasing the number of trees improves accuracy but causes a drop in robustness. This indicates that the models were overfitted and that adding compositely perturbed images to the training set may improve robustness. The high numbers for fragility indicated that while these samples were identified correctly, the classifier was unable to make stable predictions on them in the presence of these types of perturbations. We would also like to mention that the verification process would require a domain expert, due to the complex nature of these perturbations. For instance, if you pick a large rotation range (or a large scaling or translation range), there will be a significant number of images in the verification series that won't resemble the ground truth label. In this case, the verification process could still verify these models within a fraction of a second for most of the experiments. For larger models, while the verification time does increase, this issue is resolvable since, due to the memory-efficient structure of the VoTE core algorithm, computations can always be parallelised (not used in our experiments).</p>
        <p>VoTE captures multiple inputs (precisely) in the interval domain, but whenever a prediction model violates the stability property, the property checker returns a counterexample which lies in the violating region. Figure 7 represents a few of the many counterexamples discovered by VoTE during the verification process (MNIST-Digits). Since the perturbations are almost indistinguishable by the naked eye, they are highlighted in pink. These counterexamples can be used for debugging or further analysis to repair or retrofit the models iteratively until provable stability is eventually achieved. For instance, counterexamples used for guided
needs to be done in small ranges. model training can be embedded into a loss function to
achieve better stability.
We repeated the experiments in the second case study by
varying the scale factors as: ±2%, ±4%, ±6%, ±8%, ±10%
for the largest Gradient Boosting Machine. In figure
6, we can see that as the scale factor is increased, the
robustness of the model decreases and the verification
runtime increases. This is expected as the number of
perturbed pixels increases as the scale factor is increased.</p>
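        <p>The accuracy, robustness, and fragility figures discussed above are simple aggregates of per-sample verification outcomes. As a sketch (the function and metric names below are ours, not VoTE's output format), a sample counts towards fragility when it is classified correctly but cannot be verified stable:</p>
        <preformat>
```python
def summarise_verification(outcomes):
    """Aggregate per-sample results into the metrics used above.

    `outcomes` is a list of (correct, stable) boolean pairs:
    `correct` - the classifier predicts the ground-truth label,
    `stable`  - the prediction is verified stable over the whole
                perturbation region (e.g. by a tool like VoTE).
    """
    n = len(outcomes)
    accuracy   = sum(c for c, _ in outcomes) / n
    robustness = sum(s for _, s in outcomes) / n
    # fragile: classified correctly, but not provably stable
    fragility  = sum(c and not s for c, s in outcomes) / n
    return accuracy, robustness, fragility

# toy example: 10 samples, 9 correct, 6 stable, 3 correct-but-fragile
outcomes = [(True, True)] * 6 + [(True, False)] * 3 + [(False, False)]
acc, rob, fra = summarise_verification(outcomes)
```
        </preformat>
        <p>On the toy outcomes above this yields an accuracy of 0.9, a robustness of 0.6, and a fragility of 0.3, illustrating how robustness can sit well below accuracy.</p>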
        <p>Also, while the tree ensemble models chosen for
the experiments could be considered trivial, we would like
to point out that the verification problem was certainly
non-trivial, considering the composite nature of these
perturbations and the fact that the models were trained on
high-dimensional data. What is positive in this context is
that performing such analyses in tractable time is possible
at all, given the enormous search space of these
perturbations.</p>
        <p>Considering the complexity of the verification
problem, it is impressive that VoTE was able to efficiently
verify these models within a fraction of a second in most of
the experiments. For larger models the verification time does
increase, but this is resolvable: owing to the
memory-efficient structure of the VoTE core algorithm, the
computations can always be parallelised (not used in our
experiments).</p>
        <p>VoTE captures multiple inputs (precisely) in the
interval domain, and whenever a prediction model violates the
stability property, the property checker returns a
counterexample that lies in the violating region. Figure 7
shows a few of the many counterexamples discovered by VoTE
during the verification process (MNIST-Digits). Since the
perturbations are almost indistinguishable to the naked eye,
they are highlighted in pink. These counterexamples can be
used for debugging or further analysis, to repair or retrofit
the models iteratively until provable stability is eventually
achieved. For instance, counterexamples used for guided model
training can be embedded into a loss function to achieve
better stability.</p>
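        <p>The interval-domain reasoning described above can be illustrated on a single decision tree: propagating an input box through the tree yields every reachable leaf together with the sub-box that reaches it, and any leaf that disagrees with the expected class yields a counterexample region. The following is a conceptual sketch of the idea only, not VoTE's actual algorithm or data structures:</p>
        <preformat>
```python
# Tree nodes: ("leaf", class) or ("split", feature, threshold, left, right)
# Convention: go left when x[feature] is at most the threshold.

def reachable_leaves(node, box):
    """Collect (class, sub-box) pairs reachable from an input box,
    where a box maps each feature index to an (lo, hi) interval."""
    if node[0] == "leaf":
        return [(node[1], box)]
    _, feat, thr, left, right = node
    lo, hi = box[feat]
    out = []
    if thr >= lo:                       # left branch reachable
        lbox = dict(box); lbox[feat] = (lo, min(hi, thr))
        out += reachable_leaves(left, lbox)
    if hi > thr:                        # right branch reachable
        rbox = dict(box); rbox[feat] = (max(lo, thr), hi)
        out += reachable_leaves(right, rbox)
    return out

def check_stability(tree, box, expected):
    """Return (True, None) if every reachable leaf predicts
    `expected`; otherwise (False, violating sub-region)."""
    for cls, region in reachable_leaves(tree, box):
        if cls != expected:
            return False, region        # counterexample region
    return True, None

tree = ("split", 0, 0.5,
        ("leaf", "a"),
        ("split", 0, 0.8, ("leaf", "a"), ("leaf", "b")))
stable, cex = check_stability(tree, {0: (0.0, 0.6)}, "a")
violated, region = check_stability(tree, {0: (0.0, 1.0)}, "a")
```
        </preformat>
        <p>For the box (0.0, 0.6) every reachable leaf predicts class a, so the check succeeds; widening the box to (0.0, 1.0) exposes the sub-region above 0.8 that reaches class b, which is returned as a counterexample.</p>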
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions and Future Work</title>
      <p>Applying formal verification to show that a system
exhibits its intended behaviour can help increase the
trustworthiness of the decisions made by intelligent
systems. In this work, we show that tree-ensemble based
models with comparatively good prediction accuracies
perform spectacularly poorly when presented with
samples containing composite geometric perturbations
based on real-world phenomena.</p>
      <p>Apart from the profound impacts on safety, these
types of perturbations can also violate the security
properties of classifiers, which could be detrimental
in safety-critical scenarios. To address this issue,
the parameters in our method can be tweaked to
simultaneously check for adversarial perturbations in
addition to composite geometric perturbations. This
represents a step towards safe and secure AI. Finally, our
method can also be used with different types of models
and verification engines depending on the context.
For future work, we plan on exploring different abstract
domains which can be used to tightly capture these
perturbations, to increase the scalability of VoTE. We also
plan to study different tree selection strategies, i.e., a
systematic analysis of the order in which the trees in the
ensemble are analyzed in VoTE's abstraction-refinement
pipeline, to strengthen the verification process against the
large nature of these perturbations.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgements</title>
      <p>This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP), funded by the Knut and Alice Wallenberg Foundation.</p>
    </ack>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Breiman, L. (2001). Random Forests. Machine Learning, 45, 5-32.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Friedman, J.H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29, 1189-1232.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Törnblom, J., &amp; Nadjm-Tehrani, S. (2019). An Abstraction-Refinement Approach to Formal Verification of Tree Ensembles. SAFECOMP Workshops.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Irsoy, O., Yildiz, O.T., &amp; Alpaydin, E. (2012). Soft decision trees. Proceedings of the 21st International Conference on Pattern Recognition (ICPR 2012), 1819-1822.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Wing, J.M. (2021). Trustworthy AI. Communications of the ACM, 64(10), 64-71. Association for Computing Machinery (ACM). https://doi.org/10.1145/3448248</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., &amp; Duchesnay, E. (2011). Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res., 12, 2825-2830.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A.V., &amp; Gulin, A. (2018). CatBoost: unbiased boosting with categorical features. Advances in Neural Information Processing Systems, 31.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Gokhale, T., Anirudh, R., Kailkhura, B., Thiagarajan, J.J., Baral, C., &amp; Yang, Y. (2021). Attribute-Guided Adversarial Training for Robustness to Natural Perturbations. AAAI.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Ranzato, F., &amp; Zanella, M. (2020). Abstract Interpretation of Decision Tree Ensemble Classifiers. AAAI.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Scher, S., &amp; Trügler, A. (2022). Robustness of Machine Learning Models Beyond Adversarial Attacks. ArXiv, abs/2204.10046.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Huang, X., Kroening, D., Ruan, W., Sharp, J., Sun, Y., Thamo, E., Wu, M., &amp; Yi, X. (2020). A survey of safety and trustworthiness of deep neural networks: verification, testing, adversarial attack and defence, and interpretability. Comput. Sci. Rev., 37, 100270.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Engstrom, L., Tran, B., Tsipras, D., Schmidt, L., &amp; Madry, A. (2019). Exploring the Landscape of Spatial Robustness.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Pei, K., Cao, Y., Yang, J., &amp; Jana, S.S. (2019). Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems. ICSE Workshop on Testing for Deep Learning and Deep Learning for Testing.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Mohapatra, J., Weng, T., Chen, P., Liu, S., &amp; Daniel, L. (2020). Towards Verifying Robustness of Neural Networks Against a Family of Semantic Perturbations. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 241-249.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Balunovic, M., Baader, M., Singh, G., Gehr, T., &amp; Vechev, M.T. (2019). Certifying Geometric Robustness of Neural Networks. NeurIPS.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Wu, H., Tagomori, T., Robey, A., Yang, F., Matni, N., Pappas, G., Hassani, H., Pasareanu, C.S., &amp; Barrett, C. (2022). Toward Certified Robustness Against Real-World Distribution Shifts. ArXiv, abs/2206.03669.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Yang, R., Laurel, J., Misailovic, S., &amp; Singh, G. (2022). Provable Defense Against Geometric Transformations. arXiv preprint arXiv:2207.11177.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Katz, G., Barrett, C.W., Dill, D.L., Julian, K.D., &amp; Kochenderfer, M.J. (2017). Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks. International Conference on Computer Aided Verification.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] Gowal, S., Qin, C., Huang, P., Cemgil, T., Dvijotham, K., Mann, T.A., &amp; Kohli, P. (2020). Achieving Robustness in the Wild via Adversarial Mixing With Disentangled Representations. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1208-1217.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Hong, T., Hu, M., Yin, T., &amp; Wang, S. (2022). A Multi-Scale Convolutional Neural Network for Rotation-Invariant Recognition. Electronics.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] Peri, A., Mehta, K., Mishra, A., Milford, M., Garg, S., &amp; Krishna, K.M. (2022). ReF - Rotation Equivariant Features for Local Feature Matching. ArXiv, abs/2203.05206.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] LeCun, Y., Bottou, L., Bengio, Y., &amp; Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. IEEE, 86, 2278-2324.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] Cohen, G., Afshar, S., Tapson, J.C., &amp; van Schaik, A. (2017). EMNIST: an extension of MNIST to handwritten letters. ArXiv, abs/1702.05373.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] Chollet, F. (2018). Keras: The Python Deep Learning library. https://keras.io</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] Cleland, G.M., Sujan, M.A., Habli, I., &amp; Medhurst, J. (2012). Evidence: using safety cases in industry and healthcare. The Health Foundation.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] Törnblom, J., &amp; Nadjm-Tehrani, S. (2021). Scaling up Memory-Efficient Formal Verification Tools for Tree Ensembles. ArXiv, abs/2105.02595.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>