<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Formal Verification of Tree Ensembles against Real-World Composite Geometric Perturbations</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Valency</forename><forename type="middle">Oscar</forename><surname>Colaco</surname></persName>
							<email>valency.colaco@liu.se</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer and Information Science</orgName>
								<orgName type="institution">Linköping University</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Simin</forename><surname>Nadjm-Tehrani</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer and Information Science</orgName>
								<orgName type="institution">Linköping University</orgName>
								<address>
									<country key="SE">Sweden</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Formal Verification of Tree Ensembles against Real-World Composite Geometric Perturbations</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
						<meeting>
							<address>
								<addrLine>Feb 13-14</addrLine>
								<settlement>Washington</settlement>
								<region>D.C.</region>
								<country key="US">US</country>
							</address>
						</meeting>
						<imprint>
							<date type="published" when="2023">2023</date>
						</imprint>
					</monogr>
					<idno type="MD5">BF730CBA9A8BA3AA620A897C71F01A3C</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-04-29T06:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Machine Learning</term>
					<term>Formal Verification</term>
					<term>Tree Ensembles</term>
					<term>Composite Perturbations</term>
					<term>Geometric Perturbations</term>
					<term>Random Forests</term>
					<term>Gradient Boosting Machines</term>
					<term>Semantic Perturbations</term>
					<term>Stability</term>
					<term>Robustness</term>
					<term>Trustworthy AI</term>
					<term>Trustworthy Computing</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Since machine learning components are now being considered for integration in safety-critical systems, safety stakeholders should be able to provide convincing arguments that the systems are safe for use in realistic deployment settings. In the case of vision-based systems, the use of tree ensembles calls for formal stability verification against a host of composite geometric perturbations that the system may encounter. Such perturbations are a combination of an affine transformation like rotation, scaling, or translation and a pixel-wise transformation like changes in lighting. However, existing verification approaches mostly target small norm-based perturbations, and do not account for composite geometric perturbations. In this work, we present a novel method to precisely define the desired stability regions for these types of perturbations. We propose a feature space modelling process that generates abstract intervals which can be passed to VoTE, an efficient formal verification engine that is specialised for tree ensembles. Our method is implemented as an extension to VoTE by defining a new property checker. The applicability of the method is demonstrated by verifying classifier stability and computing metrics associated with stability and correctness, i.e., robustness, fragility, vulnerability, and breakage, in two case studies. In both case studies, targeted data augmentation pre-processing steps were applied for robust model training. Our results show that even models trained with augmented data are unable to handle these types of perturbations, thereby emphasising the need for certified robust training for tree ensembles.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Recent advancements in machine learning are now being considered for integration in safety-critical systems like medical equipment, critical infrastructure, and transport networks (autonomous vehicles, avionics, spaceflights) where software flaws could be detrimental to humans and the environment. While these systems are subject to strict regulations, a lack of evidence that the machine learning (ML) models deployed in these systems are safe for operation can directly affect their trustworthiness.</p><p>Considering the inherently probabilistic nature of these models, standard industry practices like software testing are often unsuitable for providing any guarantees about operational safety. In this context, to help ensure trustworthiness, Artificial Intelligence (AI) systems can benefit from scrutiny through formal methods. Formal verification can be a tool to analyse the safety properties of software operating over a multi-dimensional space. Evidence based on formal verification can be used in a safety assurance case, which is a structured record of reasoning with compelling and comprehensible arguments for a system being acceptably safe in a given context <ref type="bibr" target="#b24">[25]</ref>.</p><p>The formal methods community has recently been exploring the robustness properties of vision-based AI systems used in autonomous vehicles <ref type="bibr" target="#b4">[5]</ref>. An important role of these vision-based systems is sending correct information about road signs to the vehicle's control system. However, the road signs encountered in the real world may appear different from those in the test set. Real-world phenomena like the banking of roads cause changes to the road surface topography in which the outer edges are raised at an angle above the inner edge to provide the necessary centripetal force to the vehicles for safe turns. 
This phenomenon, combined with rapid changes in lighting (time of day, light reflections from vehicles), and the fact that road signs have reflective elements, can produce different versions of the road signs. Along with being rotated, parts of these signs may appear lighter and darker depending on the light interactions (absorption, reflection, and so on). The same principle can be applied to other geometric transformations like scaling, in which the road sign appears zoomed out when the vehicle is at a distance, and translation, in which the road sign gradually moves out of the camera frame as it is approached by the vehicle. While the different versions of these road signs are correctly interpretable by humans, an ML model could misread these signs and the effects could be disastrous. This calls for formal verification of these models to ensure classifier stability in the presence of composite geometric perturbations that the system may encounter in real-world deployment settings.</p><p>In this paper, we present a method to precisely capture these perturbations as abstract intervals to verify tree-ensemble based classifiers against specified safety properties. Since our goal is to support safety arguments in assurance cases, the verifiability of the models is of paramount importance. The simple structure of tree-based models makes them easier to analyse systematically in the formal verification process. However, larger models may still prove hard to verify due to combinatorial explosion. We present a method in which composite geometric perturbations are applied to input data, showing a lack of stability even in the presence of data augmentation during training. This is demonstrated by applying our method to digit/character recognition problems. 
The contributions of this paper are as follows:</p><p>-A method to capture composite geometric perturbations based on real-world phenomena (like the combined effects of axial rotations, scaling, or translation together with changes in lighting) in the interval domain through feature space modelling for vision-based systems.</p><p>-Realisation of the method, implemented as an extension to the Verifier of Tree Ensembles <ref type="bibr" target="#b2">[3]</ref> (VoTE), a formal verification tool that uses abstract interpretation, along with the application of the method to two case studies commonly found in the literature.</p><p>The rest of the paper is structured as follows. Section 2 discusses related work on the verification of tree ensembles against different classes of perturbations. Section 3 presents the preliminaries on tree ensembles, the VoTE toolsuite, and composite geometric perturbations. Section 4 presents our abstraction function and feature space modelling process along with the implementation in VoTE. Section 5 presents the application of the method to two case studies commonly found in the literature. Section 6 concludes the paper and summarises the lessons learned.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>Recent machine learning advances are gradually finding their way into safety-critical systems. However, there is a lack of verification techniques to build formal arguments for operational safety in such systems. There has been extensive research on the formal verification of neural networks against 𝑙 𝑝 -norm bounded perturbations (mathematical distances that define the perturbations). A recent survey <ref type="bibr" target="#b10">[11]</ref> gives a broad overview of the safety and trustworthiness of deep neural networks with a specific focus on topics like verification, testing, adversarial defence, and interpretability.</p><p>Engstrom et al. <ref type="bibr" target="#b11">[12]</ref> show that state-of-the-art neural network classifiers are vulnerable to natural classes of perturbations, especially simple translations and rotations, for a significant fraction of their inputs. Another interesting observation they made is that data augmentation does not necessarily provide a significant boost to robustness, which we also confirm for tree ensembles in this paper.</p><p>Pei et al. <ref type="bibr" target="#b12">[13]</ref> and Mohapatra et al. <ref type="bibr" target="#b13">[14]</ref> propose verification frameworks called VERIVIS and Semantify-NN for evaluating the robustness of machine learning systems. The VERIVIS framework checks for violations of the parameter space (like the range of angles) when the inputs are subjected to semantic perturbations, while the Semantify-NN framework converts semantic perturbations to 𝑙 𝑝 -norm based threat models, thereby enabling the use of any 𝑙 𝑝 -norm based verification method for robustness verification of deep neural networks. Balunović et al. <ref type="bibr" target="#b14">[15]</ref> study the certification of neural networks against natural geometric transformations like rotation and scaling. 
They use sampling, interpolation, and Lipschitz optimisation to find the bounds for their certification process. The authors implement their certification framework in a system called DEEPG that uses a restricted Polyhedra domain with custom approximations for the operations used in several geometric transformations. Our approach, however, uses a feature space modelling process in conjunction with an abstraction function to get precise upper and lower bounds for the inputs as they are perturbed by real-world elements.</p><p>Wu et al. <ref type="bibr" target="#b15">[16]</ref> study the certification of deep neural networks against real-world distribution shifts. They train a generative model to learn perturbations from the data to improve robustness precision. Yang et al. <ref type="bibr" target="#b16">[17]</ref> mention that geometric image transformations that arise in the real world, such as scaling and rotation, have been shown to easily deceive deep neural networks. They propose a training formulation known as Certified Geometric Training (CGT) that improves the deterministic certified robustness of neural networks with respect to geometric transformations. Gowal et al. <ref type="bibr" target="#b18">[19]</ref> use generative models to systematically transfer image attributes that are likely to be misclassified across image instances to achieve better robustness. They also mention that 𝑙 𝑝 -norm bounded perturbations do not necessarily cover plausible real-world variations that preserve the semantics of the input. We agree that composite geometric perturbations based on real-world phenomena (that preserve the semantics of the input) are beyond the reach of the 𝑙 𝑝 -norm. Hence, we deploy a feature space modelling process for defining the multidimensional perturbation regions. Also, while using generative models can help with robust training, this in itself cannot be used for building up a safety case in systems that use ML components. 
Our approach aims to provide formal proof-based evidence to support safety assurances in these systems.</p><p>Outside formal verification and robust training, significant research has been done to tackle the issue of real-world perturbations like axial rotations in images in the form of rotation-invariant networks <ref type="bibr" target="#b19">[20]</ref> and rotation-equivariant networks <ref type="bibr" target="#b20">[21]</ref>. In Scher et al. <ref type="bibr" target="#b9">[10]</ref>, the authors formally define real-world perturbations and differentiate them from adversarial perturbations and distribution shifts. They model the perturbations around a point as a probability distribution and succeed in estimating robustness using Monte Carlo sampling for low- to medium-dimensional data. While their method provides probabilistic guarantees, our approach aims to provide formal deterministic guarantees, which can form the basis for reasoning about safety in safety-critical systems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Preliminaries</head><p>In this section, we present the background knowledge on tree-based ensembles, semantic perturbations, interpolated/geometric and pixel-wise transformations, composite geometric perturbations based on real-world phenomena, and the toolsuite VoTE <ref type="bibr" target="#b2">[3]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Decision Trees</head><p>Decision trees are employed as prediction models in machine learning and can be used in vision-based systems to solve image classification problems (among other applications). A decision tree model contains rules to predict the output class and implements a prediction function 𝑡 ∶ 𝑋 𝑛 → ℝ 𝑚 that maps disjoint sets of points 𝑋 𝑖 ⊂ 𝑋 𝑛 to a single output point 𝑦 𝑖 ∈ ℝ 𝑚 , i.e.,</p><formula xml:id="formula_0">𝑡(𝑥) = ⎧ ⎨ ⎩ (𝑦 1,1 , ..., 𝑦 1,𝑚 ) 𝑥 ∈ 𝑋 1 ⋮ (𝑦 𝑘,1 , ..., 𝑦 𝑘,𝑚 ) 𝑥 ∈ 𝑋 𝑘 (1)</formula><p>Where 𝑘 is the number of disjoint sets and</p><formula xml:id="formula_1">𝑋 𝑛 = 𝑘 ⋃ 𝑖=1 𝑋 𝑖</formula><p>The n-dimensional input domain 𝑋 𝑛 includes elements 𝑥 as tuples in which each input variable 𝑥 𝑖 captures some feature of interest of the system <ref type="bibr" target="#b2">[3]</ref>. The tree structure is evaluated in a top-down manner, where decision functions determine which path to take towards the leaves. Each internal node in the decision tree is associated with a decision function that recursively splits the input space, thereby separating regions from each other. When a leaf is hit, the output 𝑦 ∈ ℝ 𝑚 associated with the leaf becomes the output associated with the input from which the top-down evaluation started.</p><p>In this paper, we only consider binary trees with linear decision functions with one input variable, which Irsoy et al. <ref type="bibr" target="#b3">[4]</ref> call univariate hard decision trees. State-of-the-art implementations of tree-based ensembles typically use univariate hard decision trees, for example, scikit-learn <ref type="bibr" target="#b5">[6]</ref> and CatBoost <ref type="bibr" target="#b6">[7]</ref>.</p></div>
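To make the prediction function of equation (1) concrete, a univariate hard decision tree can be sketched as nested single-variable threshold tests. The tree, features, and thresholds below are hypothetical illustrations, not taken from the paper's models:

```python
# A hypothetical univariate hard decision tree: each internal node tests a
# single input variable against a threshold, and each leaf emits the output
# tuple shared by all points in its region, as in equation (1).

def make_node(feature, threshold, left, right):
    """Internal node: route right when x[feature] exceeds the threshold."""
    def node(x):
        return right(x) if x[feature] > threshold else left(x)
    return node

def make_leaf(output):
    """Leaf: every point routed here maps to the same output tuple."""
    return lambda x: output

# A toy tree over a 2-dimensional input domain with m = 2 outputs.
tree = make_node(0, 0.5,
                 make_leaf((0.9, 0.1)),
                 make_node(1, 0.3,
                           make_leaf((0.2, 0.8)),
                           make_leaf((0.6, 0.4))))

print(tree((0.2, 0.0)))  # region X_1: (0.9, 0.1)
```

Each leaf corresponds to one of the disjoint sets 𝑋 𝑖 produced by recursively splitting the input space.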
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Random Forests</head><p>Decision trees are known to suffer from a phenomenon called overfitting, in which the models are fitted so tightly to their training data that they memorize the data itself instead of learning the patterns associated with the data in order to make generalized predictions. To counteract these issues with decision trees concerning bias and variance, Breiman <ref type="bibr" target="#b0">[1]</ref> proposes Random Forests.</p><p>A random forest 𝑓 ∶ 𝑋 𝑛 → ℝ 𝑚 is an ensemble of 𝐵 decision trees that produces outputs by averaging the values emitted by each individual tree, i.e.,</p><formula xml:id="formula_2">𝑓 (𝑥) = 1 𝐵 𝐵 ∑ 𝑏=1 𝑡 𝑏 (𝑥)<label>(2)</label></formula><p>Where 𝑡 𝑏 is the 𝑏 𝑡ℎ tree in the ensemble. To reduce the correlation between trees, each tree is trained on overlapping random subsets of the training data (drawn with replacement), a technique known as bagging.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Gradient Boosting Machines</head><p>Similar to Random Forests, Friedman <ref type="bibr" target="#b1">[2]</ref> proposes gradient boosting machines that employ several decision trees to create a prediction function. Unlike Random Forests, these trees are trained in a sequential manner, with each succeeding tree correcting the errors made by its predecessor. The errors are corrected using gradient descent (hence the name). Although conceptually different from random forests in a learning context, these two models have several features in common when performing predictions.</p><p>A gradient boosting machine 𝑓 ∶ 𝑋 𝑛 → ℝ 𝑚 is an ensemble of 𝐵 additive decision trees, i.e.,</p><formula xml:id="formula_3">𝑓 (𝑥) = 𝐵 ∑ 𝑏=1 𝑡 𝑏 (𝑥)<label>(3)</label></formula><p>Where 𝑡 𝑏 is the 𝑏 𝑡ℎ tree in the ensemble.</p></div>
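Equations (2) and (3) differ only in the final aggregation step: a random forest averages the per-tree outputs while a gradient boosting machine sums them. A minimal sketch, using hypothetical constant trees as stand-ins for trained ones:

```python
# Aggregation over per-tree outputs t_b(x), elementwise over the m outputs.

def random_forest(trees, x):
    """Equation (2): f(x) = (1/B) * sum over b of t_b(x)."""
    outputs = [t(x) for t in trees]
    return tuple(sum(col) / len(trees) for col in zip(*outputs))

def gradient_boosting(trees, x):
    """Equation (3): f(x) = sum over b of t_b(x)."""
    outputs = [t(x) for t in trees]
    return tuple(sum(col) for col in zip(*outputs))

# Hypothetical trees that ignore x and emit fixed (y_1, y_2) tuples.
trees = [lambda x: (2.0, 4.0), lambda x: (4.0, 8.0), lambda x: (6.0, 0.0)]

print(random_forest(trees, None))      # (4.0, 4.0)
print(gradient_boosting(trees, None))  # (12.0, 12.0)
```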
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Classifier</head><p>Decision trees and tree-ensemble models can be used as classifiers to categorize samples from an input domain into one or more classes and assign each sample a label unique to its class. In this paper, we only consider functions that map each point from an input domain to exactly one class. Let 𝑓 (𝑥) = (𝑦 1 , ..., 𝑦 𝑚 ) represent a model trained to predict the probability 𝑦 𝑖 associated with a class 𝑖 within disjoint regions in the input domain, where 𝑚 is the number of classes. A classifier 𝑓 𝑐 (𝑥) may then be defined as,</p><formula xml:id="formula_4">𝑓 𝑐 (𝑥) = 𝑎𝑟𝑔𝑚𝑎𝑥 𝑖 𝑦 𝑖 (4)</formula></div>
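Equation (4) is a one-line argmax over the model's output tuple; a minimal sketch:

```python
# f_c(x) = argmax_i y_i: return the index of the class with the highest
# predicted probability in the output tuple (y_1, ..., y_m).

def classify(probabilities):
    return max(range(len(probabilities)), key=lambda i: probabilities[i])

print(classify((0.1, 0.7, 0.2)))  # class index 1
```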
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5.">Verifier of Tree Ensembles</head><p>VoTE <ref type="bibr" target="#b2">[3]</ref> (Verifier of Tree Ensembles) is a toolsuite for formally verifying that tree ensembles comply with specific properties that the user can define through implementing a property checker. The tool is based on abstract interpretation, which yields equivalence classes in a tree ensemble, i.e., sets of points in the input domain that yield the same output tuple, through an iterative refinement in the interval domain, such that the lowest level of abstraction (the concrete domain) is equivalent to an exhaustive search down to all leaves in the model. This technique can be used to verify any property of interest defined within VoTE's property checker. For instance, in <ref type="bibr" target="#b25">[26]</ref>, VoTE was used to verify versatile application-specific properties of the aircraft collision avoidance system called ACAS Xu <ref type="bibr" target="#b17">[18]</ref>. In this work, however, the property checker (see Figure <ref type="figure" target="#fig_0">1</ref>) checks for stability. The VoTE Core is instantiated from the ensemble subject to verification, 𝑓 ∶ 𝑋 𝑛 → ℝ 𝑚 . It takes as input a hyperrectangle defining 𝑋 𝑛 , and the outcome of the formal verification is pass (when the tree ensemble satisfies the property concerned) or fail (when it does not).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6.">Semantic Perturbations</head><p>As opposed to perturbations that change the value of each feature by a constant value, semantic perturbations take an input image and 𝑝 parameters to perform a parameterized operation that perturbs the image while preserving its semantic information. Semantics-preserving perturbations can include object-level shifts (like changing the shape or size of an object in a color classification problem), geometric transformations (scaling, rotations) and common image corruptions (fog, blurs) <ref type="bibr" target="#b7">[8]</ref>. Most of these perturbations cannot be naturally represented using the 𝑙 𝑝 -norm (‖ 𝑥 − 𝑥 ′ ‖ 𝑝 ≤ 𝜖) that most robustness verification methods are designed for. Using a large value of 𝜖 to try to capture these types of perturbations is usually not recommended in practice, as the image quality degrades significantly <ref type="bibr" target="#b7">[8]</ref>. Hence, we propose a feature space modelling process to precisely capture these multidimensional perturbations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.7.">Geometric Perturbations</head><p>Geometric Perturbations are a class of semantic perturbations <ref type="bibr" target="#b7">[8]</ref> that involve an affine transformation on each pixel's row and column indices, followed by an interpolation operation <ref type="bibr" target="#b16">[17]</ref>. In this paper, we consider three types of affine transformations -rotations, scaling and translation.</p><p>Let 𝑇 𝜃 ∶ ℝ 2 → ℝ 2 (in 2D) be an invertible affine transformation (like rotation) parameterized by 𝜃 (like the angle of rotation). Let 𝜙 𝑥 (𝑗) and 𝜙 𝑦 (𝑖) be functions that transform 𝑖, 𝑗 pixel indices to 𝑥, 𝑦 co-ordinates with respect to the center of the image. However, as these transformed co-ordinates may not align exactly with the integer-valued pixel indices, bilinear interpolation ( 𝐼 ) is necessary. Yang et al. <ref type="bibr" target="#b16">[17]</ref> define the general form of a geometric perturbation for an image 𝑋 as,</p><formula xml:id="formula_5">𝑋 ′ 𝑖,𝑗 = 𝐼 (𝑇 −1 𝜃 (𝜙 𝑥 (𝑗) , 𝜙 𝑦 (𝑖)))<label>(5)</label></formula><p>Where 𝑋 ′ is the perturbed image. For each geometric perturbation, they present the inverse transform function 𝑇 −1 𝜃 which is used to instantiate equation ( <ref type="formula" target="#formula_5">5</ref>),</p><p>Rotation, parameterized by angle, 𝜃 ∈ [0, 2𝜋]</p><formula xml:id="formula_6">𝑇 −1 𝜃 (𝑥, 𝑦) = [ 𝑐𝑜𝑠𝜃 𝑠𝑖𝑛𝜃 −𝑠𝑖𝑛𝜃 𝑐𝑜𝑠𝜃 ] [ 𝑥 𝑦 ] = [ 𝑥𝑐𝑜𝑠𝜃 + 𝑦𝑠𝑖𝑛𝜃 −𝑥𝑠𝑖𝑛𝜃 + 𝑦𝑐𝑜𝑠𝜃 ]<label>(6)</label></formula><p>Translation, parameterized by horizontal and vertical shifts ℎ ∈ ℝ, 𝑣 ∈ ℝ</p><formula xml:id="formula_7">𝑇 −1 ℎ,𝑣 (𝑥, 𝑦) = [ 𝑥 − ℎ 𝑦 − 𝑣 ]<label>(7)</label></formula><p>Scaling, parameterized by a scaling factor, 𝜙 ∈ ℝ, 𝜙 &gt; −1,</p><formula xml:id="formula_8">𝑇 −1 𝜙 (𝑥, 𝑦) = [ 1 1+𝜙 0 0 1 1+𝜙 ] [ 𝑥 𝑦 ] = [ 𝑥/(1 + 𝜙) 𝑦/(1 + 𝜙) ]<label>(8)</label></formula></div>
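Equations (5) and (6) can be sketched as follows for rotation. Zero padding outside the image boundary is an assumption here; the text does not specify the boundary handling:

```python
import math

def rotate(image, theta):
    """Equation (5) with the inverse rotation of equation (6)."""
    n = len(image)
    c = (n - 1) / 2.0                 # centre used by phi_x / phi_y
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x, y = j - c, i - c       # pixel indices to centred coordinates
            xs = x * math.cos(theta) + y * math.sin(theta)
            ys = -x * math.sin(theta) + y * math.cos(theta)
            js, is_ = xs + c, ys + c  # back to (fractional) pixel indices
            j0, i0 = math.floor(js), math.floor(is_)
            dj, di = js - j0, is_ - i0
            # Bilinear interpolation I(...) over the four neighbours;
            # out-of-range neighbours contribute zero (assumed padding).
            acc = 0.0
            for ii, jj, w in ((i0, j0, (1 - di) * (1 - dj)),
                              (i0, j0 + 1, (1 - di) * dj),
                              (i0 + 1, j0, di * (1 - dj)),
                              (i0 + 1, j0 + 1, di * dj)):
                if ii >= 0 and jj >= 0 and n - 1 >= ii and n - 1 >= jj:
                    acc += image[ii][jj] * w
            out[i][j] = acc
    return out
```

Translation and scaling follow the same pattern, with the inverse transforms of equations (7) and (8) substituted for the inverse rotation.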
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.8.">Pixel-wise Transformations</head><p>Pixel-wise transformations are defined by the cumulative effects of brightness and contrast (i.e., simply changes in lighting) acting on a pixel 𝑋 𝑖,𝑗 of an image 𝑋 where the respective brightness and contrast parameters are 𝑐, 𝑏 ∈ ℝ. These transformations are defined by Balunovic et al. <ref type="bibr" target="#b14">[15]</ref> and Mohapatra et al. <ref type="bibr" target="#b13">[14]</ref> as,</p><formula xml:id="formula_9">𝑋 ′ 𝑖,𝑗 = min(1, max(0, (1 + 𝑐) ⋅ 𝑋 𝑖,𝑗 + 𝑏))<label>(9)</label></formula><p>In our work, the pixel-wise transformations are parameterized by darkening and brightening offsets, 𝜁 𝐿 , 𝜁 𝐻 ∈ ℝ, and the upper and lower bounds for the perturbations are specified as,</p><formula xml:id="formula_10">𝑋 ′ 𝑖,𝑗 = (𝑋 𝑖,𝑗 − 𝜁 𝐿 , 𝑋 𝑖,𝑗 + 𝜁 𝐻 )<label>(10)</label></formula></div>
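Equation (10) maps directly to an interval constructor. The sketch below omits the clamping of equation (9), which would additionally restrict the interval to the valid pixel range:

```python
# Equation (10): the darkening/brightening offsets zeta_L and zeta_H turn a
# concrete pixel value into the interval of values reachable under changes
# in lighting.

def lighting_bounds(pixel, zeta_l, zeta_h):
    return (pixel - zeta_l, pixel + zeta_h)

print(lighting_bounds(5, 2, 3))  # (3, 8)
```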
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.9.">Composite Geometric Perturbations</head><p>Composite geometric perturbations based on real-world phenomena are a combination of one affine transformation (like rotation, translation or scaling) and a pixel-wise transformation (like changes in lighting). These perturbations are visually represented in Figure <ref type="figure" target="#fig_1">2</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Proposed Method</head><p>In this section, we present a method to capture composite geometric perturbations based on real-world phenomena (like the combined effects of axial rotations and changes in lighting) in the form of abstract intervals. The basic idea is to pass these abstract intervals to the VoTE property checker <ref type="bibr" target="#b2">[3]</ref> to verify stability and compute the extended stability metrics, which will be explained in this section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Abstracting the Perturbations</head><p>For a particular input pixel 𝑥, if 𝑠 is the total number of geometric transformation (e.g., rotation) steps for a given transformation, there exists a function 𝜆(𝑥) that projects the effects of the transformation on this pixel at each step</p><formula xml:id="formula_11">𝜆(𝑥) = {𝑥 1 , ..., 𝑥 𝑠 }<label>(11)</label></formula><p>such that 𝑥 𝑖 is the outcome of the transformation at each step 𝑖, 1 ≤ 𝑖 ≤ 𝑠. In order to capture these pixel values in the interval domain, we use the abstraction function 𝛼, which is designed to compute an over-approximation of the problem,</p><formula xml:id="formula_12">𝛼(𝜆(𝑥)) = (𝑚𝑖𝑛 𝜆(𝑥), 𝑚𝑎𝑥 𝜆(𝑥))<label>(12)</label></formula><p>The resulting pixel-level perturbation margins (𝜖 𝐿 , 𝜖 𝐻 ) are then applied to the initial pixel value 𝑝 before the transformations to get the pixel-level perturbation bounds that are representative of composite geometric perturbations (like rotations and changes in lighting). For example, with 𝑝 = 5, 𝜖 𝐿 = −5, and 𝜖 𝐻 = 2,</p><p>(𝑝 + 𝜖 𝐿 , 𝑝 + 𝜖 𝐻 ) = (5 + (−5), 5 + 2) = (0, 7)</p><p>From this example, we see that the pixel 𝑥 can take on any value between (0, 7) during a composite geometric transformation.</p></div>
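Equations (11) and (12) can be sketched directly. The per-step projected values in 𝜆(𝑥) below are hypothetical, chosen so that the computation reproduces the (0, 7) example with 𝑝 = 5:

```python
# alpha over lambda(x): abstract the per-step pixel values with their
# componentwise minimum and maximum, giving an over-approximating interval.

def alpha(projected_values):
    return (min(projected_values), max(projected_values))

p = 5                     # initial pixel value before the transformations
lam = [5, 3, 0, 7, 6, 4]  # lambda(x): hypothetical per-step pixel values
lo, hi = alpha(lam)

# Pixel-level perturbation margins relative to the initial value.
eps_l, eps_h = lo - p, hi - p
print((p + eps_l, p + eps_h))  # (0, 7)
```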
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Feature Space Modelling</head><p>The abstraction process helps determine the upper and lower pixel-level perturbation margins in order to achieve efficient computations during the verification process. Since VoTE abstracts each point with an interval in one dimension, we need to extend the abstraction function through feature space modelling when applying it to a multidimensional space, in order to simulate these perturbations for vision-based systems. </p></div>
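The feature space modelling step can be sketched as sweeping the transformation parameter, recording each pixel's projected values, and abstracting every pixel position with a (min, max) interval widened by the lighting offsets. Nearest-neighbour rotation is used here as a simplification of the bilinear scheme in section 3.7, and the parameter values mirror the MNIST case study (with far fewer steps):

```python
import math

def rotate_nn(image, theta):
    """Nearest-neighbour rotation: a simplified stand-in for equation (5)."""
    n = len(image)
    c = (n - 1) / 2.0
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x, y = j - c, i - c
            js = round(x * math.cos(theta) + y * math.sin(theta) + c)
            is_ = round(-x * math.sin(theta) + y * math.cos(theta) + c)
            if is_ >= 0 and js >= 0 and n - 1 >= is_ and n - 1 >= js:
                out[i][j] = image[is_][js]
    return out

def perturbation_intervals(image, thetas, zeta_l, zeta_h):
    """Per-pixel (min - zeta_L, max + zeta_H) intervals over the sweep."""
    n = len(image)
    versions = [rotate_nn(image, t) for t in thetas]
    intervals = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vals = [v[i][j] for v in versions]
            intervals[i][j] = (min(vals) - zeta_l, max(vals) + zeta_h)
    return intervals

# Rotations between 0 and 10 degrees with lighting offsets 2 and 3,
# mirroring the MNIST case study (11 steps instead of 1001).
thetas = [math.radians(d) for d in range(11)]
demo = [[0, 5, 0], [5, 9, 5], [0, 5, 0]]
intervals = perturbation_intervals(demo, thetas, 2, 3)
print(intervals[1][1])  # centre pixel stays 9 at every step: (7, 12)
```

The resulting grid of intervals forms the hyperrectangle over 𝑋 𝑛 that a verification engine such as VoTE takes as input.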
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">Verification Workflow</head><p>In this section, we present the verification workflow that shows the feature space modelling process incorporated into the VoTE <ref type="bibr" target="#b2">[3]</ref> framework. We will use one of the case studies in Section 5.1 to illustrate the flow in Figure <ref type="figure" target="#fig_4">3</ref>. If we consider rotations between 0° and 10°, 𝐼 𝐿 (which is the image generated by applying 𝜖 𝐿 associated with all pixels in a particular image) and 𝐼 𝐻 (the corresponding image generated by applying 𝜖 𝐻 ) represent the first and last images in the verification process. We then ask the VoTE Property Checker to verify all the images in the transformed series. The property checker returns a pass when the classifier makes stable predictions on all the images in the transformed series. If the result of the verification process is fail, the property does not hold (the classifier is not stable). In this case, the VoTE property checker can be configured to return a counterexample that can help with further analysis. The approach is illustrated in more detail with the case studies in Section 5.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4.">Classifier Correctness</head><p>The correctness of a classifier (𝑓 𝑐 ), with respect to an input 𝑥, denoted by 𝐼 𝑠𝐶𝑜𝑟𝑟𝑒𝑐𝑡(𝑓 𝑐 , 𝑥) is proven when 𝑓 𝑐 predicts the correct label (𝑙) for an input sample (𝑥) according to a ground truth. This notion of correctness computed over a test set (𝑋 𝑡𝑒𝑠𝑡 ) can be used to quantify accuracy as</p><formula xml:id="formula_13">accuracy = |{ 𝑥 ∈ 𝑋 𝑡𝑒𝑠𝑡 | 𝐼 𝑠𝐶𝑜𝑟𝑟𝑒𝑐𝑡(𝑓 𝑐 , 𝑥)}| |𝑋 𝑡𝑒𝑠𝑡 | (14)</formula></div>
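Equation (14) in code, with `predictions` and `labels` as illustrative stand-ins for applying the 𝐼 𝑠𝐶𝑜𝑟𝑟𝑒𝑐𝑡 check over the test set:

```python
# accuracy = |{x in X_test | IsCorrect(f_c, x)}| / |X_test|

def accuracy(predictions, labels):
    correct = sum(1 for p, l in zip(predictions, labels) if p == l)
    return correct / len(labels)

print(accuracy([1, 0, 2, 1], [1, 0, 1, 1]))  # 0.75
```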
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5.">Safety Property: Classifier Stability</head><p>Accuracy on a test set is the most commonly used metric to evaluate a classification model. However, accuracy alone is not enough for convincing safety arguments in models that use machine learning. In this paper, we consider Stability, a property commonly used in related works. Robustness (and other metrics consistent with the literature) are generalised to a notion of classifier stability and correctness <ref type="bibr" target="#b8">[9]</ref>. Note that compliance with stability and its associated metrics alone is not generally enough to ensure system safety. Safety engineers typically define requirements on software functions that are much richer than this property alone.</p><p>Let 𝜔 denote a perturbation drawn from the modelled perturbation region. The stability of a classifier 𝑓 𝑐 with respect to an input 𝑥, denoted 𝐼 𝑠𝑆𝑡𝑎𝑏𝑙𝑒(𝑓 𝑐 , 𝑥), holds when 𝑓 𝑐 (𝑥 + 𝜔) = 𝑓 𝑐 (𝑥) for every such 𝜔.</p><p>The above expression can easily be extended over a test set (𝑋 𝑡𝑒𝑠𝑡 ). Note that 𝜔, when added to the base image (𝑥), produces a version of that base image that represents the combined effects of a geometric transformation and changes in lighting. These different versions of the base image can then be analysed by the verification engine.</p></div>
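VoTE proves stability over the whole interval region; the hypothetical brute-force sketch below only samples perturbations 𝜔, but it illustrates the property being checked:

```python
# Sampled stability check: f_c(x + w) must equal f_c(x) for every sampled
# perturbation w. This is an illustration, not a formal proof.

def is_stable(classifier, x, perturbations):
    base = classifier(x)
    return all(classifier([xi + wi for xi, wi in zip(x, w)]) == base
               for w in perturbations)

def clf(x):
    """Toy stand-in classifier: thresholds the mean pixel value."""
    return 1 if sum(x) / len(x) > 0.5 else 0

x = [0.9, 0.8, 0.7, 0.9]
small = [[0.0, 0.0, 0.0, 0.0], [-0.1, 0.0, -0.1, 0.0]]
large = [[-0.8, -0.8, -0.8, -0.8]]
print(is_stable(clf, x, small))  # True
print(is_stable(clf, x, large))  # False
```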
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.6.">Extended Stability Metrics</head><p>Stability and Robustness are standard metrics commonly used in the literature on adversarial machine learning. They are generally used to quantify the security of classifiers at test time (e.g., against evasion attacks). Beyond stability, Ranzato and Zanella <ref type="bibr" target="#b8">[9]</ref> define four additional metrics -robustness (R), fragility (F), vulnerability (V) and breakage (B) -that combine classifier stability with prediction accuracy in the presence of input perturbations. We recall their definitions,</p><formula xml:id="formula_15">R = |{ 𝑥 ∈ 𝑋 𝑡𝑒𝑠𝑡 | 𝐼 𝑠𝐶𝑜𝑟𝑟𝑒𝑐𝑡(𝑓 𝑐 , 𝑥) ∧ 𝐼 𝑠𝑆𝑡𝑎𝑏𝑙𝑒(𝑓 𝑐 , 𝑥) }| |𝑋 𝑡𝑒𝑠𝑡 | F = |{ 𝑥 ∈ 𝑋 𝑡𝑒𝑠𝑡 | 𝐼 𝑠𝐶𝑜𝑟𝑟𝑒𝑐𝑡(𝑓 𝑐 , 𝑥) ∧ ¬𝐼 𝑠𝑆𝑡𝑎𝑏𝑙𝑒(𝑓 𝑐 , 𝑥) }| |𝑋 𝑡𝑒𝑠𝑡 | V = |{ 𝑥 ∈ 𝑋 𝑡𝑒𝑠𝑡 | ¬𝐼 𝑠𝐶𝑜𝑟𝑟𝑒𝑐𝑡(𝑓 𝑐 , 𝑥) ∧ 𝐼 𝑠𝑆𝑡𝑎𝑏𝑙𝑒(𝑓 𝑐 , 𝑥) }| |𝑋 𝑡𝑒𝑠𝑡 | B = |{ 𝑥 ∈ 𝑋 𝑡𝑒𝑠𝑡 | ¬𝐼 𝑠𝐶𝑜𝑟𝑟𝑒𝑐𝑡(𝑓 𝑐 , 𝑥) ∧ ¬𝐼 𝑠𝑆𝑡𝑎𝑏𝑙𝑒(𝑓 𝑐 , 𝑥) }| |𝑋 𝑡𝑒𝑠𝑡 |</formula><p>We incorporate these definitions into VoTE's Property Checker, which formally verifies classifier stability and automatically computes these metrics at the same time.</p></div>
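Since each test sample is either correct or not and either stable or not, the four metrics partition the test set and sum to one. A sketch computing them from hypothetical per-sample (IsCorrect, IsStable) outcomes:

```python
# R, F, V, B: fractions of the test set in each correctness/stability bucket.

def extended_metrics(outcomes):
    """`outcomes` is a list of (is_correct, is_stable) booleans per sample."""
    n = len(outcomes)
    r = sum(1 for c, s in outcomes if c and s) / n          # robustness
    f = sum(1 for c, s in outcomes if c and not s) / n      # fragility
    v = sum(1 for c, s in outcomes if not c and s) / n      # vulnerability
    b = sum(1 for c, s in outcomes if not c and not s) / n  # breakage
    return r, f, v, b

outcomes = [(True, True), (True, False), (False, True), (False, False)]
print(extended_metrics(outcomes))  # (0.25, 0.25, 0.25, 0.25)
```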
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.7.">VoTE Property Checker</head><p>Recall from the preliminaries that the VoTE property checker (VoTE-Pc) returns a pass if the property is satisfied. For a tree-ensemble model (𝑓 𝑐 ), and a test sample 𝑥 (𝑥 ∈ 𝑋 𝑡𝑒𝑠𝑡 ), the following equations are used to verify stability and correctness, which can be extended to compute the metrics from the equations in Section 4.6.</p><formula xml:id="formula_16">is_correct ← VoTE-Pc (𝑥 , 𝐼 𝑠𝐶𝑜𝑟𝑟𝑒𝑐𝑡(𝑓 𝑐 , 𝑥))<label>(16</label></formula><p>)</p><formula xml:id="formula_17">is_stable ← VoTE-Pc (𝑥 , 𝐼 𝑠𝑆𝑡𝑎𝑏𝑙𝑒(𝑓 𝑐 , 𝑥))<label>(17)</label></formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Case Studies</head><p>We apply our three affine transformations combined with changes in lighting to two case studies. We verify stability and compute the extended metrics associated with classifier stability and correctness. Due to space restrictions, we only show the outcomes of composite rotations in one study and composite scaling in the other. We use scikit-learn <ref type="bibr" target="#b5">[6]</ref> to train random forests and CatBoost <ref type="bibr" target="#b6">[7]</ref> to train gradient boosting machines. Experiments are conducted on a Windows 11 machine running Ubuntu 20.04 in WSL mode (Windows Subsystem for Linux), equipped with an Intel Core i7-10875H CPU and 16 GB RAM.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1.">Digit Recognition</head><p>The MNIST dataset <ref type="bibr" target="#b21">[22]</ref> is a collection of handwritten digits commonly used to evaluate machine learning algorithms. The dataset contains 70,000 grayscale images with a resolution of 28x28 pixels at 8 bpp, encoded as a tuple of 784 pixels. The input regions surrounding each sample in the test set were defined using feature space modelling (section 4.3). The parameters for rotation were chosen as follows: Rotation Range = 0 − 10°, Rotation Steps = 1001. For the changes in lighting, the parameters were chosen as: 𝜁 𝐿 = 2 (darkening) and 𝜁 𝐻 = 3 (brightening). To make this case study even more realistic, the training data was augmented (doubled to 120,000 training samples) by adding samples that were randomly rotated in the range of ±10° using Keras <ref type="bibr" target="#b23">[24]</ref>. The verification process was carried out on 10,000 test samples. Figure <ref type="figure" target="#fig_6">4</ref> shows the construction of the first and last images in the verification series for a particular sample.</p><p>From this figure, it is visible that both the first and the last images resemble the ground truth, and the rest of the images in the series are similar in that sense.</p></div>
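The augmentation step above doubles the training set with randomly rotated copies. A minimal stand-in for that step could look as follows; the paper uses Keras, but here `scipy.ndimage.rotate` is used instead so the sketch stays self-contained, and the function name is our own.

```python
import numpy as np
from scipy.ndimage import rotate

def augment_with_rotations(X, max_deg=10.0, seed=0):
    """Double a training set by appending one randomly rotated copy of
    each image, with angles drawn uniformly from [-max_deg, +max_deg],
    mirroring the +/-10 degree augmentation used in the case study."""
    rng = np.random.default_rng(seed)
    rotated = [
        rotate(img, angle=rng.uniform(-max_deg, max_deg),
               reshape=False, order=1, mode="nearest")  # keep the 28x28 shape
        for img in X
    ]
    return np.concatenate([X, np.asarray(rotated)], axis=0)

X = np.random.default_rng(1).random((3, 28, 28))
X_aug = augment_with_rotations(X)  # originals followed by rotated copies
```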
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.">Handwritten Character Recognition</head><p>In our next experiments, we use the EMNIST-Letters dataset, a variant of the EMNIST dataset <ref type="bibr" target="#b22">[23]</ref> that includes upper and lower case handwritten characters. This dataset contains 145,600 samples across 26 balanced classes. Each sample has a resolution of 28x28 pixels at 8 bpp, encoded as a tuple of 784 pixels. The input regions surrounding each sample in the test set were defined using feature space modelling. The parameters for scaling were chosen as follows: Scaling Steps = 201, Scale Factor = ±10%, which represents a 10% zoom (in and out). For the changes in lighting, the parameters were chosen as: 𝜁 𝐿 = 2 (darkening) and 𝜁 𝐻 = 3 (brightening). To make this case study even more realistic, the training data was augmented (doubled to 249,600 training samples) by adding samples that were randomly scaled in the range of ±10% using Keras <ref type="bibr" target="#b23">[24]</ref>. The verification process was carried out on 10,000 test samples. Table <ref type="table" target="#tab_3">2</ref> lists random forests (RF) and gradient boosting machines (GB) included in the experiments with their maximum tree depth (d), number of trees (B), accuracy (A), the extended stability metrics - robustness (R), fragility (F), and breakage (B) - along with the elapsed time (T) taken for verification (in seconds). Again, the column for vulnerability was omitted as it was zero in all experiments.</p><p>In regard to this case study, if we consider the EMNIST <ref type="bibr" target="#b22">[23]</ref> letter a and scaling in the range of ±10%, with 𝜁 𝐿 = 2 and 𝜁 𝐻 = 3, 𝐼 𝐿 and 𝐼 𝐻 represent the first and last images in the verification process. We then ask the VoTE Property Checker to verify all the images in the transformed series. Figure <ref type="figure" target="#fig_7">5</ref> shows a fragment of the verification flow. </p></div>
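Adapting the feature-space modelling from rotation to scaling requires a scale-image step that keeps every image in the series pixel-aligned with the base image. One way to sketch that step, assuming `scipy.ndimage.zoom`; the centre-crop/zero-pad policy is our own illustrative choice, not necessarily the paper's:

```python
import numpy as np
from scipy.ndimage import zoom

def scale_image(img, factor):
    """Scale an image by `factor` (e.g. 0.9-1.1 for a +/-10% zoom in/out)
    and centre-crop or zero-pad back to the original resolution, so every
    image in the scaled series stays comparable pixel-by-pixel."""
    h, w = img.shape
    z = zoom(img, factor, order=1)
    zh, zw = z.shape
    top, left = abs(zh - h) // 2, abs(zw - w) // 2
    if zh >= h:                                 # zoomed in: crop the centre
        return z[top:top + h, left:left + w]
    out = np.zeros((h, w), dtype=img.dtype)     # zoomed out: pad with zeros
    out[top:top + zh, left:left + zw] = z
    return out

img = np.random.default_rng(0).random((28, 28))
series = [scale_image(img, f) for f in np.linspace(0.9, 1.1, 5)]
```

Per-pixel minima and maxima over such a series, widened by 𝜁 𝐿 and 𝜁 𝐻, then yield the perturbation margins exactly as in the rotation case.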
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>From the experimental results in Table <ref type="table" target="#tab_1">1</ref> and Table <ref type="table" target="#tab_3">2</ref>, we note that the robustness of the learned models with respect to composite geometric perturbations is much lower than the standard model accuracy (which is somewhat expected). In the first case study, gradient boosting machines performed slightly better than random forests in terms of robustness. Increasing the number of trees improves accuracy but causes a drop in robustness. This indicates that the models were overfitted and that adding compositely perturbed images to the training set may improve robustness. The high numbers for fragility indicate that while these samples were classified correctly, the classifier was unable to make stable predictions on them in the presence of these types of perturbations. We would also like to mention that the verification process requires a domain expert, due to the complex nature of these perturbations. For instance, if a large rotation range (or a large scaling or translation range) is chosen, a significant number of images in the verification series will not resemble the ground truth label. In such cases, the verification process needs to be carried out over small ranges.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 6: Comparing Verification Metrics</head><p>We repeated the experiments in the second case study by varying the scale factors as ±2%, ±4%, ±6%, ±8%, and ±10% for the largest gradient boosting machine. In figure <ref type="figure">6</ref>, we can see that as the scale factor is increased, the robustness of the model decreases and the verification runtime increases. This is expected, as the number of perturbed pixels grows with the scale factor.</p><p>Also, while the tree ensemble models chosen for the experiments could be considered trivial, we would like to point out that the verification problem was certainly non-trivial, considering the composite nature of these perturbations and the fact that the models were trained on high-dimensional data. What is positive in this context is that performing such analyses in tractable time is at all possible, given the enormous search space for these perturbations.</p><p>Considering the complexity of the verification problem, it is impressive that VoTE was able to efficiently verify these models within a fraction of a second for most of the experiments. For larger models, while the verification time does increase, this issue is resolvable: due to the memory-efficient structure of the VoTE core algorithm, computations can always be parallelised (not used in our experiments).</p><p>VoTE captures multiple inputs (precisely) in the interval domain, and whenever a prediction model violates the stability property, the property checker returns a counterexample which lies in the violating region. Figure <ref type="figure" target="#fig_8">7</ref> shows a few of the many counterexamples discovered by VoTE during the verification process (MNIST-Digits). Since the perturbations are almost indistinguishable by the naked eye, they are highlighted in pink. These counterexamples can be used for debugging or for further analysis to repair or retrofit the models iteratively until provable stability is achieved. For instance, counterexamples can be embedded into a loss function for guided model training to achieve better stability. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusions and Future Work</head><p>Applying formal verification to show that a system exhibits its intended behaviour can help increase the trustworthiness of the decisions made by intelligent systems. In this work, we show that tree-ensemble based models with comparatively good prediction accuracies perform spectacularly poorly when presented with samples containing composite geometric perturbations based on real-world phenomena.</p><p>Apart from the profound impacts on safety, these types of perturbations can also violate the security properties of classifiers, which could be detrimental in safety-critical scenarios. To address this issue, the parameters in our method can be tweaked to simultaneously check for adversarial perturbations in addition to composite geometric perturbations. This represents a step towards safe and secure AI. Finally, our method can also be used with different types of models and verification engines, depending on the context.</p><p>For future work, we plan to explore different abstract domains which can tightly capture these perturbations, to increase the scalability of VoTE to even higher-dimensional inputs. We also plan to explore different tree selection strategies, i.e., a systematic analysis of the order in which the trees in the ensemble are analyzed in VoTE's abstraction-refinement pipeline, to further enhance the efficiency of the verification process given the large search space of these perturbations.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Verifier of Tree Ensembles [3] (VoTE)</figDesc><graphic coords="4,110.10,394.24,161.74,131.04" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Composite geometric perturbations based on real-world phenomena</figDesc><graphic coords="5,132.51,476.87,116.93,136.22" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>For a composite geometric</head><label></label><figDesc>transformation involving a combination of one affine transformation (rotation, scaling or translation) along with changes in lighting conditions where 𝜁 𝐿 and 𝜁 𝐻 are real-world darkening and brightening margins, if 𝑝 represents the initial pixel value before the transformations, the above (single transformation) abstraction function can be redefined as, 𝛼(𝜆(𝑥)) = ((𝑚𝑖𝑛 𝜆(𝑥)−𝜁 𝐿 )−𝑝, (𝑚𝑎𝑥 𝜆(𝑥)+𝜁 𝐻 )−𝑝) (13) The redefined abstraction function returns a tuple (𝜖 𝐿 , 𝜖 𝐻 ) that represents the perturbation margins of the pixel where, 𝜖 𝐿 = (𝑚𝑖𝑛 𝜆(𝑥) − 𝜁 𝐿 ) − 𝑝 𝜖 𝐻 = (𝑚𝑎𝑥 𝜆(𝑥) + 𝜁 𝐻 ) − 𝑝 The lower and upper perturbation bounds are computed by applying the perturbation margins (𝜖 𝐿 , 𝜖 𝐻 ) to the pixel's intensity as (𝑝 + 𝜖 𝐿 , 𝑝 + 𝜖 𝐻 ). Running Example: If a pixel 𝑥 in an image with an initial intensity 𝑝 = 5 is rotated between 0° and 5° in 5 steps, from equation (11), 𝜆(𝑥) = {6, 2, 3, 1, 4}. Applying equation (12), 𝛼(𝜆(𝑥)) = (1, 6). If 𝜁 𝐿 = 𝜁 𝐻 = 1, applying equation (13), 𝛼(𝜆(𝑥)) = ((1 − 1) − 5, (6 + 1) − 5), so (𝜖 𝐿 , 𝜖 𝐻 ) = (−5, 2).</figDesc></figure>
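The redefined abstraction function and its running example translate directly to code. A minimal sketch (naming is our own) that reproduces the numbers above:

```python
def alpha(lam, p, zeta_L, zeta_H):
    """Equation (13): abstract the set of transformed pixel values `lam`
    into perturbation margins (eps_L, eps_H) relative to the initial
    intensity p, widened by the darkening/brightening offsets."""
    eps_L = (min(lam) - zeta_L) - p
    eps_H = (max(lam) + zeta_H) - p
    return eps_L, eps_H

# Running example: p = 5, lambda(x) = {6, 2, 3, 1, 4}, zeta_L = zeta_H = 1
eps_L, eps_H = alpha({6, 2, 3, 1, 4}, p=5, zeta_L=1, zeta_H=1)
# -> (-5, 2), i.e. perturbation bounds (p + eps_L, p + eps_H) = (0, 7)
```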
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Algorithm 1 :</head><label>1</label><figDesc>Feature Space Modelling: Rotation. Input: Angle 1 (𝑛 1 ), Angle 2 (𝑛 2 ), Rot. Steps (𝑟), Offset-Low (𝜁 𝐿 ), Offset-High (𝜁 𝐻 ), Images (𝐼 ). Output: Perturbation Definitions (𝜏 ). 1: 𝑁 , 𝜏 ← ∅, ( ) ▷ 𝑁 = set of angles to apply 2: 𝑁 ← generate-set-of-angles (𝑛 1 , 𝑛 2 , 𝑟) 3: for each image, 𝑖 ∈ 𝐼 do 4: Δ 𝑖 ← ( ) ▷ image-level perturbations 5: for each pixel, 𝑥 ∈ 𝑖 do 6: 𝜆 ← ∅ ▷ transformed pixel values 7: for each angle, 𝜈 ∈ 𝑁 do 8: 𝑖 𝜈 ← rotate-image (𝑖, 𝜈) 9: 𝜆 ← 𝜆 ∪ {𝑥 𝜈 } 10: (𝜖 𝐿 , 𝜖 𝐻 ) ← 𝛼 (𝜆) ▷ 𝛼 from equation (13) 11: append (𝜖 𝐿 , 𝜖 𝐻 ) to Δ 𝑖 12: append Δ 𝑖 to 𝜏. Note that while Algorithm 1 uses axial rotation as the affine transformation, it can easily be adapted to the other affine transformations, i.e., scaling or translation, by analogous algorithms.</figDesc></figure>
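Algorithm 1 can be sketched in a few lines of Python. The version below vectorises the per-pixel loop with NumPy and uses `scipy.ndimage.rotate` as the rotate-image step; both are our substitutions, not the paper's implementation.

```python
import numpy as np
from scipy.ndimage import rotate

def feature_space_modelling_rotation(images, n1, n2, r, zeta_L, zeta_H):
    """Algorithm 1 sketch: for every pixel of every image, collect its
    values across the rotated series and abstract them into perturbation
    margins (eps_L, eps_H) via equation (13)."""
    angles = np.linspace(n1, n2, r)          # generate-set-of-angles
    tau = []
    for img in images:
        # lines 5-9, vectorised: one rotated copy of the image per angle
        series = np.stack([
            rotate(img, angle=a, reshape=False, order=1, mode="nearest")
            for a in angles
        ])
        lo = series.min(axis=0) - zeta_L     # min over lambda, darkened
        hi = series.max(axis=0) + zeta_H     # max over lambda, brightened
        tau.append(np.stack([lo - img, hi - img], axis=-1))  # (eps_L, eps_H)
    return tau

imgs = [np.random.default_rng(2).random((28, 28))]
tau = feature_space_modelling_rotation(imgs, 0.0, 10.0, 5, zeta_L=2, zeta_H=3)
```

Since the angle 0° is in the series, each pixel's own value lies between the series minimum and maximum, so every 𝜖 𝐿 is non-positive and every 𝜖 𝐻 non-negative.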
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Verification Workflow (MNIST)</figDesc><graphic coords="6,303.03,229.24,202.54,246.66" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head></head><label></label><figDesc>(RF) and gradient boosting machines (GB) included in the experiments with their maximum tree depth (d), number of trees (B), accuracy (A), the extended stability metrics -robustness (R), fragility (F), and breakage (B) along with the elapsed time (T) taken for verification (in seconds). The column for vulnerability was omitted as it was zero in all experiments.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Crafting Perturbations (Composite Rotations)</figDesc><graphic coords="8,93.78,295.89,194.40,151.01" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Verification Workflow Fragment (EMNIST)</figDesc><graphic coords="8,334.71,432.34,139.19,167.70" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>Figure 7 :</head><label>7</label><figDesc>Figure 7: Counterexamples</figDesc><graphic coords="9,95.96,312.44,190.03,112.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>𝑓 𝑐 ∶ 𝑥 → 𝑦 be the classifier subject to verification where 𝑥 (the image) is an n-array vector consisting of individual pixels, and 𝑦 is the output class or label. 𝜖 𝐿 , 𝜖 𝐻 ∈ ℝ are lower and upper perturbation margins (𝜖 𝐿 &lt; 𝜖 𝐻 ) associated with an individual pixel in the image. If 𝑛 is the number of pixels in the image,</figDesc><table /><note>Δ = ((𝜖 𝐿 1 , 𝜖 𝐻 1 ), ..., (𝜖 𝐿 𝑛 , 𝜖 𝐻 𝑛 )) is a sequence containing tuples of pixel-level perturbation margins associated with the image. 𝜔 is an n-tuple of perturbations, with 𝜔 = (𝜔 1 , ..., 𝜔 𝑛 ) and 𝜖 𝐿 𝑖 ⩽ 𝜔 𝑖 ⩽ 𝜖 𝐻 𝑖 . If we consider a test sample 𝑥 and all possible variations of 𝜔, the classifier 𝑓 𝑐 is stable on that sample (denoted by 𝐼 𝑠𝑆𝑡𝑎𝑏𝑙𝑒(𝑓 𝑐 , 𝑥)) if and only if 𝑓 𝑐 (𝑥) = 𝑓 𝑐 (𝑥 ′ ) , where 𝑥 ′ = 𝑥 + 𝜔</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 :</head><label>1</label><figDesc>Results (Combined Rotations and Lighting) Table 1 lists random forests</figDesc><table><row><cell cols="2">Param.</cell><cell cols="2">A (%)</cell><cell cols="2">R (%)</cell><cell cols="2">F (%)</cell><cell cols="2">B (%)</cell><cell></cell><cell>T (s)</cell></row><row><cell>d</cell><cell>B</cell><cell>RF</cell><cell>GB</cell><cell>RF</cell><cell>GB</cell><cell>RF</cell><cell>GB</cell><cell>RF</cell><cell>GB</cell><cell>RF</cell><cell>GB</cell></row><row><cell>3</cell><cell>3</cell><cell>51.26</cell><cell>61.75</cell><cell>14.36</cell><cell>13.61</cell><cell>36.90</cell><cell>48.14</cell><cell>48.74</cell><cell>38.25</cell><cell>0.21</cell><cell>0.40</cell></row><row><cell>3</cell><cell>5</cell><cell>57.26</cell><cell>71.60</cell><cell>7.03</cell><cell>15.81</cell><cell>50.23</cell><cell>55.82</cell><cell>42.74</cell><cell>28.37</cell><cell>0.22</cell><cell>0.26</cell></row><row><cell>5</cell><cell>5</cell><cell>76.59</cell><cell>79.56</cell><cell>2.75</cell><cell>40.89</cell><cell>73.84</cell><cell>38.67</cell><cell>23.41</cell><cell>20.44</cell><cell>0.22</cell><cell>0.26</cell></row><row><cell>5</cell><cell>10</cell><cell>80.23</cell><cell>87.16</cell><cell>0.69</cell><cell>31.00</cell><cell>79.54</cell><cell>56.16</cell><cell>19.77</cell><cell>12.84</cell><cell>0.23</cell><cell>0.56</cell></row><row><cell>5</cell><cell>15</cell><cell>81.95</cell><cell>89.94</cell><cell>0.88</cell><cell>23.74</cell><cell>81.07</cell><cell>66.20</cell><cell>18.05</cell><cell>10.06</cell><cell>0.36</cell><cell>13.33</cell></row><row><cell>5</cell><cell>20</cell><cell>83.20</cell><cell>91.61</cell><cell>1.37</cell><cell>20.21</cell><cell>81.83</cell><cell>71.40</cell><cell>16.80</cell><cell>8.39</cell><cell>1.81</cell><cell>168.12</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 2 :</head><label>2</label><figDesc>Results (Combined Scaling and Lighting)</figDesc><table /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work was partially supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Random Forests</title>
		<author>
			<persName><forename type="first">L</forename><surname>Breiman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="volume">45</biblScope>
			<biblScope unit="page" from="5" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Greedy function approximation: A gradient boosting machine</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Friedman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Annals of Statistics</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="page" from="1189" to="1232" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">An Abstraction-Refinement Approach to Formal Verification of Tree Ensembles</title>
		<author>
			<persName><forename type="first">J</forename><surname>Törnblom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nadjm-Tehrani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">SAFECOMP Workshops</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Soft decision trees</title>
		<author>
			<persName><forename type="first">O</forename><surname>Irsoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">T</forename><surname>Yildiz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Alpaydin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 21st International Conference on Pattern Recognition</title>
				<meeting>the 21st International Conference on Pattern Recognition</meeting>
		<imprint>
			<date type="published" when="2012">2012. ICPR2012</date>
			<biblScope unit="page" from="1819" to="1822" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Trustworthy AI</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Wing</surname></persName>
		</author>
		<idno type="DOI">10.1145/3448248</idno>
		<ptr target="https://doi.org/10.1145/3448248" />
	</analytic>
	<monogr>
		<title level="m">Communications of the ACM</title>
				<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2021">2021</date>
			<biblScope unit="volume">64</biblScope>
			<biblScope unit="page" from="64" to="71" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine Learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Louppe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">CatBoost: unbiased boosting with categorical features</title>
		<author>
			<persName><forename type="first">L</forename><surname>Prokhorenkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gusev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vorobev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Dorogush</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gulin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="volume">31</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Attribute-Guided Adversarial Training for Robustness to Natural Perturbations</title>
		<author>
			<persName><forename type="first">T</forename><surname>Gokhale</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Anirudh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Kailkhura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Thiagarajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Baral</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">AAAI</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Abstract Interpretation of Decision Tree Ensemble Classifiers</title>
		<author>
			<persName><forename type="first">F</forename><surname>Ranzato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zanella</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">AAAI</title>
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Robustness of Machine Learning Models Beyond Adversarial Attacks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Scher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Trügler</surname></persName>
		</author>
		<idno>ArXiv, abs/2204.10046</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability</title>
		<author>
			<persName><forename type="first">X</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kroening</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ruan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Sharp</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Thamo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Yi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Comput. Sci. Rev</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page">100270</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Exploring the Landscape of Spatial Robustness</title>
		<author>
			<persName><forename type="first">L</forename><surname>Engstrom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tsipras</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Schmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Madry</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>ICML</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Towards Practical Verification of Machine Learning: The Case of Computer Vision Systems</title>
		<author>
			<persName><forename type="first">K</forename><surname>Pei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Jana</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ICSE Workshop on Testing for Deep Learning and Deep Learning for Testing</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Towards Verifying Robustness of Neural Networks Against A Family of Semantic Perturbations</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mohapatra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Weng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Daniel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</title>
				<imprint>
			<date type="published" when="2020">2020. 2020</date>
			<biblScope unit="page" from="241" to="249" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Certifying Geometric Robustness of Neural Networks</title>
		<author>
			<persName><forename type="first">M</forename><surname>Balunovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Baader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Singh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gehr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Vechev</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2019">2019</date>
			<publisher>NeurIPS</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Toward Certified Robustness Against Real-World Distribution Shifts</title>
		<author>
			<persName><forename type="first">H</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tagomori</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Robey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Matni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Pappas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hassani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Pasareanu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Barrett</surname></persName>
		</author>
		<idno>ArXiv, abs/2206.03669</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Provable Defense Against Geometric Transformations</title>
		<author>
			<persName><forename type="first">R</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Laurel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Misailovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Singh</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2207.11177</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks</title>
		<author>
			<persName><forename type="first">G</forename><surname>Katz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">W</forename><surname>Barrett</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Dill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">D</forename><surname>Julian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Kochenderfer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Computer Aided Verification</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Achieving Robustness in the Wild via Adversarial Mixing With Disentangled Representations</title>
		<author>
			<persName><forename type="first">S</forename><surname>Gowal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Qin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Cemgil</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Dvijotham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kohli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</title>
		<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="1208" to="1217" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">A Multi-Scale Convolutional Neural Network for Rotation-Invariant Recognition</title>
		<author>
			<persName><forename type="first">T</forename><surname>Hong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Electronics</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">ReF - Rotation Equivariant Features for Local Feature Matching</title>
		<author>
			<persName><forename type="first">A</forename><surname>Peri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Mehta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mishra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Milford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Garg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Krishna</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2203.05206</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Gradient-based learning applied to document recognition</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Lecun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Haffner</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the IEEE</title>
		<imprint>
			<date type="published" when="1998">1998</date>
			<biblScope unit="volume">86</biblScope>
			<biblScope unit="page" from="2278" to="2324" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">EMNIST: an extension of MNIST to handwritten letters</title>
		<author>
			<persName><forename type="first">G</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Afshar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">C</forename><surname>Tapson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Van Schaik</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1702.05373</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">Keras: The Python Deep Learning library</title>
		<author>
			<persName><forename type="first">F</forename><surname>Chollet</surname></persName>
		</author>
		<ptr target="https://keras.io" />
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Evidence: using safety cases in industry and healthcare</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Cleland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Sujan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Habli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Medhurst</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>The Health Foundation</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Scaling up Memory-Efficient Formal Verification Tools for Tree Ensembles</title>
		<author>
			<persName><forename type="first">J</forename><surname>Törnblom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nadjm-Tehrani</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2105.02595</idno>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
