CAISAR: A platform for Characterizing Artificial Intelligence Safety and Robustness

Julien Girard-Satabin1,†, Michele Alberti1,†, François Bobot1, Zakaria Chihani1 and Augustin Lemesle1
1 Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France

Abstract
We present CAISAR, an open-source platform under active development for the characterization of AI systems' robustness and safety. CAISAR provides a unified entry point for defining verification problems by using WhyML, the mature and expressive language of the Why3 verification platform. Moreover, CAISAR orchestrates and composes state-of-the-art machine learning verification tools which, individually, are not able to efficiently handle all problems but, collectively, can cover a growing number of properties. Our aim is to assist, on the one hand, the V&V process by reducing the burden of choosing the methodology tailored to a given verification problem, and on the other hand tool developers, by factorizing useful features – visualization, report generation, property description – in one platform. CAISAR will soon be available at https://git.frama-c.com/pub/caisar.

Keywords
formal verification, platform, formal methods, testing, metamorphic testing, reachability analysis, satisfiability modulo theories, artificial intelligence

1. Introduction

The integration of machine learning programs as components of critical systems is said to be bound to happen; initiatives from various private and governmental actors (e.g., the US' NSF funding for Trustworthy AI1, France's Grand Défi IA de confiance2) are a consequence of that fact. Trusting such programs is thus becoming a crucial issue, on both technical and ethical sides.

A possible approach to trust is formal test and verification, a broad set of techniques and tools that have been applied to software safety for several decades. These formal methods build on sound mathematical foundations to assess the behaviour of programs in a principled way, be it for generating tests or providing proven guarantees. For the last few years, several independent works have started to investigate the possible applications of formal verification to machine learning program verification, and its limitations. This led to what could be characterized as a Cambrian explosion of tools aiming to solve a particular subset of the machine learning verification field. In less than five years, more than 20 tools were produced, upgraded or abandoned. These tools use different techniques and different input formats, handle different ML artefacts and, most importantly, have varying performance depending on the problem. Our goal of orchestrating multiple tools aims at maximizing property coverage in the context of a validation and verification process.

To this end, we present a platform dedicated to Characterizing Artificial Intelligence Safety and Robustness (CAISAR) that aims to unify several formal methods and tools: at the input, through the use of a mature and expressive property description and proof orchestration language; at the output, through the factorization of features such as visualization and report generation; and at the usage level, through shared heuristics and interconnections between tools.

The answer to the question "To what extent can I trust my machine learning program?" has many components, ranging from data analysis to decision explainability. One such important component is verification and validation, and we wish to make CAISAR an important element in the safety toolbelt by covering these applications.

In the following, we first present the design principles of CAISAR and state its main goals. We follow with a description of its most prominent features, as well as its limitations. We then explain the position of CAISAR with regard to other tools for formal verification of machine learning programs, and conclude by presenting some future work and possible research problems.

The IJCAI-ECAI-22 Workshop on Artificial Intelligence Safety (AISafety 2022), July 24-25, 2022, Vienna, Austria
† These authors contributed equally.
julien.girard2@cea.fr (J. Girard-Satabin); michele.alberti@cea.fr (M. Alberti); francois.bobot@cea.fr (F. Bobot); zakaria.chihani@cea.fr (Z. Chihani); augustin.lemesle@cea.fr (A. Lemesle)
ORCID: 0000-0001-6374-3694 (J. Girard-Satabin)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
1 https://www.nsf.gov/pubs/2022/nsf22502/nsf22502.htm
2 https://www.gouvernement.fr/grand-defi-securiser-certifier-et-fiabiliser-les-systemes-fondes-sur-l-intelligence-artificielle

2. Core principles of CAISAR

The aim of CAISAR is to provide a verification environment for Artificial Intelligence (AI) based systems, tailored to different needs. The profusion of tools for AI-program certification offers numerous possibilities, from the choice of technology (formal methods, test generation) to the scope of properties to check (coverage, suitability to a given distribution, robustness). However, with increased possibilities comes the burden of choice. Which method best suits a given use case? Are the results provided by this particular method trustworthy enough?
How to bring trust to the process of selecting, tailoring and computing the results of a given tool? How to evaluate a given tool against others? Those are the questions that we aim to answer with CAISAR.

2.1. Compatibility with existing and future methods

The first principle of CAISAR is to ease this burden of choice by automating parts of it. CAISAR aims to provide a unique interface for vastly different tools, with a single entry point to specify verification goals. Choosing which tool to use is an informed decision that may not be relevant for the user; the goal is to provide an actionable answer on the safety of the system, by using whatever tool is suitable for the problem. Ideally, the user should not be bothered with deciding which tool is suitable for their use case: CAISAR will automatically figure out how to express the given property to suit verifiers. As AI system pipelines become more and more complex, it is crucial for CAISAR to handle this complexity. Currently, CAISAR supports neural networks and support vector machines, and an industrial benchmark of an ensemble model (NN-SVM), which we are unable to further discuss, is being used as a concrete real-world use case.

2.2. Common modelling and answers

Existing verifiers rely on different decision procedures, e.g., Mixed Integer Linear Programming (MILP), abstract interpretation, or Satisfiability Modulo Theories (SMT). Modelling a verification problem using these frameworks requires different skills and is time-consuming; for instance, some modelling choices made for MILP may not apply under SMT. Moreover, even if one succeeds in phrasing a verification problem under multiple decision procedures, the different results may not be immediately comparable. CAISAR aims to provide a common ground for inputs and outputs, which will lead to easier comparison, lower time consumption and an informed decision. Furthermore, collecting and presenting the user with multiple answers from different techniques can provide additional confidence in the studied system. In order for users to trust CAISAR as well, it needs to rely on well-known and approved principles and technologies. It is developed in OCaml, a strongly typed programming language used to develop tools for program verification and validation. Such tools include Coq [1], a proof assistant that was for instance used to develop CompCert [2], a C compiler that is guaranteed to output C-ANSI compliant source code, Frama-C [3], a platform for the static analysis of C code, and Why3 [4], a platform for deductive verification of programs.

2.3. Tools composition

Some works are starting to combine multiple techniques [5] for their analysis, using an exact MILP solver to refine bounds obtained by abstract interpretation. Our goal with CAISAR is to bring tool composition to another level. For instance, metamorphic transformations could generate different input space partitions for formal verifiers. A reachability analysis tool could be called numerous times with tighter bounds until reaching a precise enough answer. Coverage testing objectives could be extracted from reachability analysis tools and fed to test generators. CAISAR will be more than the sum of its parts, allowing communication between vastly different tools to provide faster and more accurate answers.

2.4. Automatic proposal of verification strategies

A long-term goal for CAISAR is to provide a reasoning engine where past verification problems processed by CAISAR can inform the next ones, gradually building a knowledge base that is suitable for the specific needs of the user. CAISAR will also implement its own built-in heuristics to supplement specialized programs that do not implement them.

3. Architecture and features

CAISAR's architecture can be divided into the following functional blocks:
1. A Common specification Language (CL)
2. A Proof Obligation Generator (POG), associated with a Dispatcher (DISP)
3. An Intermediate Representation (IR)
4. A visualization module (VIZ)

See fig. 1 for a visual depiction of the dependencies between blocks.
Figure 1: CAISAR overall architecture

3.1. Specification language and verification predicates

A typical task for program verification involves solving a verification problem. A verification problem consists in checking that a program with a given set of inputs meets certain expectations on its outputs. More formally, let 𝒳 be an input space, 𝒫(·) be a property on the input space, 𝑓 be a program, and 𝒬(·, ·) be a property on the output space. By property, we mean a statement that describes a desirable behaviour of the program. Let 𝒱 = (𝒳, 𝑓, 𝒫, 𝒬) be a verification problem. The goal is to verify the following property:

∀𝑥 ∈ 𝒳, 𝒫(𝑥) ⇒ 𝒬(𝑓(𝑥), 𝑥)

To write 𝒱, one needs to be able to express concisely and without ambiguity each component: the program to verify 𝑓, the properties 𝒫 and 𝒬, and the dataset 𝒳. To this end, CAISAR provides full support for the WhyML specification and programming language [4]. WhyML is a language with static strong typing, pattern matching, type invariants and inductive predicates. This gives WhyML programs a sound semantics as logical propositions. WhyML is at the core of the Why3 verification platform, and has been used as an intermediate language for the verification of Ada programs [6].

This global expressiveness and safety allows writing 𝒱 once and for all, independently of the verifier. For instance, Figure 2 shows the definition of robustness against a perturbation of amplitude 𝜀 using the 𝑙∞ distance within CAISAR's standard library, and fig. 3 the WhyML file the user needs to write in order to verify the robustness of a given TestSVM against a perturbation of amplitude 0.5. Note that all the necessary elements to define 𝒱, namely 𝑓, 𝒳, 𝒫 and 𝒬, are defined in those files: 𝑓 is the function TestSVM, and 𝒬 is the predicate itself.

```whyml
type input_type = int -> t
type output_type = int
type model = {
  app : input_type -> output_type;
  num_input : int;
  num_classes : int
}

predicate dist_linf
  (a: input_type) (b: input_type) (eps: t) (n: int) =
  forall i. 0 <= i < n -> .- eps .< a i .- b i .< eps

predicate robust_to
  (model: model) (a: input_type) (eps: t) =
  forall b. dist_linf a b eps model.num_input ->
    model.app a = model.app b
```

Figure 2: An example of a predicate in CAISAR's standard library: being "robust to" a perturbation of amplitude 𝜀. Here, the predicate defines 𝒬(𝒴).

```whyml
use TestSVM.SVMasArray
use ieee_float.Float64
use caisar.SVM

goal G: forall a: input_type. robust_to svm_apply a (0.5:t)
```

Figure 3: Example verification problem specified to CAISAR. The program to verify is TestSVM, the input space is defined by the elements in a, the output space is the result of the application of the function svm_apply.

Note that WhyML is not limited to the robustness-against-a-given-perturbation property often met in the literature. For instance, asserting that a neural network respects the property of being differentially private [7] or of respecting causal fairness [8] is something that could be phrased as WhyML programs, since those properties have a mathematical characterization. Finally, WhyML does not constrain the form of 𝒫 nor 𝒬. In particular, it is possible to define multiple verification goals in the same 𝒱, opening the way to subdividing it into subproblems, and providing answers at each step.

CAISAR then automatically translates 𝒱 into a format supported by the selected verifiers, through a succession of encoding transformations and simplifications. For instance, some verifiers are best used when trying to falsify the property: instead of checking

∀𝑥 ∈ 𝒳, 𝒫(𝑥) ⇒ 𝒬(𝑓(𝑥), 𝑥)

they instead try to satisfy the negation

∃𝑥 ∈ 𝒳, 𝒫(𝑥) ∧ ¬𝒬(𝑓(𝑥), 𝑥)

This transformation is embedded in CAISAR when calling Marabou: a verification problem 𝒱 can be transformed into an equivalent one 𝒱′ that can be dispatched to Marabou.
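This falsification view can be illustrated independently of any solver. The following Python sketch (illustrative only; `falsify` and its arguments are hypothetical names, and random sampling is of course no substitute for a sound verifier) searches for a witness of the negated property 𝒫(𝑥) ∧ ¬𝒬(𝑓(𝑥), 𝑥):

```python
import random

def falsify(f, p, q, sample, n_trials=10_000, seed=0):
    """Look for a counterexample to  forall x in X, P(x) => Q(f(x), x)
    by sampling candidate inputs and testing the negated property
    P(x) and not Q(f(x), x)."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        x = sample(rng)
        if p(x) and not q(f(x), x):
            return x  # witness of the negation: the property is falsified
    return None       # no witness found (this alone proves nothing)

# Toy problem: f(x) = 2x on X = [0, 1], P(x) = x in [0, 1],
# Q(y, x) = y <= 1.  The property fails for every x > 0.5.
cex = falsify(lambda x: 2 * x,
              lambda x: 0.0 <= x <= 1.0,
              lambda y, x: y <= 1.0,
              lambda rng: rng.uniform(0.0, 1.0))
# cex is a concrete input with 2 * cex > 1
```

A real verifier replaces the sampling loop with an exhaustive decision procedure, which is why a `None` answer from such a test carries no guarantee while a solver's "unsatisfiable" answer does.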
3.2. Proof Obligation Generation for various tools

CAISAR currently supports a variety of tools and techniques: metamorphic testing, reachability analysis based on abstract interpretation, and constraint-based propagation. CAISAR can analyze neural networks and support vector machines. This versatility allows CAISAR to verify system components using different machine learning architectures.

Marabou

Marabou [9] is a complete verifier for deep neural networks. Its core routine relies on a modified simplex algorithm that lazily relaxes constraints on piecewise-linear activation functions. Marabou also makes use of several heuristics that help speed up the verification procedure, such as relying on tight convex overapproximations [10] or sound overapproximations [11]. It can answer reachability queries and conjunctions of linear constraints. Marabou ranked fifth at the VNN-COMP 2021. It is currently in active development.

SAVer

The Support Vector Machine reachability analysis tool SAVer [12] is specialized in the verification of support vector machines (SVM), a popular machine learning algorithm used alongside neural networks for classification or regression tasks. SAVer can answer reachability queries, and supports a variety of SVM configurations. This tool was selected for support as, to the best of our knowledge, it is the first one to deal with the verification of SVMs.

Alt-Ergo and SMTLIB compliant solvers

Existing general purpose SMT solvers for program verification like Alt-Ergo [13] or Z3 [14] all support a standard input language, SMTLIB [15]. CAISAR leverages Why3's existing support for SMT solvers and can translate neural network control flows directly into SMTLIB compliant strings using its intermediate representation, which allows the support of a variety of off-the-shelf solvers. Note that the VNNLIB standard, used in the VNN-COMP, uses a subset of SMTLIB2, which paves the way for the support of future tools in CAISAR.

Python Reachability Assessment Tool

The Python Reachability Assessment Tool (PyRAT) is a static analyzer specifically targeting neural networks. It builds upon the framework of abstract interpretation [16], using abstract domains adapted to the approximation of the reachable space in a neural network. Three main domains are used: intervals with symbolic relations as described in [17, 18], zonotopes [19] and the DeepPoly domain [5]. For low-dimensional inputs, PyRAT uses input partitioning as described in [18], with heuristics tailored to relational domains: the zonotope domain and the DeepPoly domain with backsubstitution. Those heuristics allow the computation of a non-trivial (e.g., not just widest-interval-first) score ranking the inputs by their estimated influence on the outputs. PyRAT has results comparable to state-of-the-art analyzers on the widely used ACAS-Xu [20] benchmark, and outperforms the similar domains of ERAN on S-shaped activation functions such as the sigmoid or hyperbolic tangent, thanks to specific approximations.
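The elementary step underlying such interval-based reachability analyses — pushing an input box through one affine layer followed by a ReLU — can be sketched as follows (a toy illustration of standard interval arithmetic, not PyRAT's implementation):

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate the box [lower, upper] through y = W x + b: for each
    output, pick per weight sign the input bound that minimizes
    (resp. maximizes) the dot product -- plain interval arithmetic."""
    lo, hi = [], []
    for row, b in zip(weights, bias):
        lo.append(b + sum(w * (l if w >= 0 else u)
                          for w, l, u in zip(row, lower, upper)))
        hi.append(b + sum(w * (u if w >= 0 else l)
                          for w, l, u in zip(row, lower, upper)))
    return lo, hi

def relu_bounds(lower, upper):
    """ReLU is monotone, so it maps a box to a box exactly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

# One hidden layer on the box [-1, 1]^2: y = ReLU(W x + b).
W, b = [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]
lo, hi = relu_bounds(*affine_bounds([-1.0, -1.0], [1.0, 1.0], W, b))
# lo == [0.0, 0.0] and hi == [2.0, 0.0]: the second neuron is
# provably inactive on the whole input box.
```

The relational domains mentioned above (zonotopes, DeepPoly) refine exactly this step by tracking linear correlations between neurons, which plain intervals lose.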
AIMOS: a metamorphic testing utility

AI Metamorphism Observing Software (AIMOS) is a software developed at the same time as CAISAR, aiming to provide testing of metamorphic properties or perturbations on a dataset for a given AI model. Metamorphic testing is a testing technique relying on symmetries and invariances of properties on the operating domain; see [21] for a comprehensive survey of this approach. AIMOS offers tools to derive properties from a set of transformations on the inputs: given 𝒫, 𝒬, 𝒳 and a transformation function 𝑡𝜃 : 𝒳 → 𝒳, it generates a set of new properties 𝒬′ that are coherent with the transformation. As an example, a symmetry on the inputs of a classification model could result in a symmetry on the outputs; AIMOS would then automatically modify the property to check against the symmetrical labels.

AIMOS can generate test case scenarios from the most common input transformations (geometrical rotations, noise addition); others can be added if necessary. AIMOS was evaluated on a metamorphic property on the ACAS-Xu benchmark. The aim of the property was to evaluate the ability of neural networks trained on ACAS to generalize to symmetric inputs. Given a symmetry on the inputs, AIMOS generates the expected symmetrical output, and tests models against the base and symmetrical outputs. See table 1 for results. AIMOS was able to show that neural networks trained on ACAS have a low, but noticeable, sensitivity to symmetry on one input.
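The input–output relation such a metamorphic test checks can be sketched in a few lines of Python (an illustrative toy, not AIMOS code; all names are made up):

```python
def metamorphic_consistency(model, transform, relabel, inputs):
    """Fraction of inputs on which the model's answer commutes with
    the transformation: model(transform(x)) == relabel(model(x))."""
    agree = sum(1 for x in inputs
                if model(transform(x)) == relabel(model(x)))
    return agree / len(inputs)

# Toy classifier on 2D points: the label is the side of the y-axis.
# Mirroring a point across that axis should flip the label.
model = lambda p: "left" if p[0] < 0 else "right"
mirror = lambda p: (-p[0], p[1])
flip = {"left": "right", "right": "left"}.get
points = [(-2.0, 1.0), (3.0, 0.5), (1.0, -1.0), (-0.5, 2.0)]
rate = metamorphic_consistency(model, mirror, flip, points)
# rate == 1.0: the toy model is perfectly symmetric
```

A trained network is rarely perfectly symmetric, which is exactly what a consistency rate below 100% — as in the ACAS-Xu evaluation above — reveals.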
Table 1
Average rate of identical answers over all 45 models of the ACAS-Xu benchmark, computed by AIMOS. The first column denotes the output values present in the benchmark; the second, the percentage of identical answers given.

Previous answer    Identical answer
COC                89.7%
WL                 95.9%
WR                 99.6%
L                  95.3%
R                  99.8%

3.3. Supported formats

CAISAR supports all input formats used by its integrated verifiers. Most verifiers require either a framework-specific binary (PyTorch's pth, TensorFlow's tf), a custom description language (NNet), or an Open Neural Network eXchange (ONNX)3 file. CAISAR is able to parse any of these input formats and extract useful metadata for building the verification strategy. It can also output a verification problem into the SMTLIB [15] format, supported by all general purpose solvers, as well as into the ONNX format. The VNN-Lib initiative4 provides a standard format for verification problems that relies on SMTLIB; thus CAISAR also supports VNN-Lib. CAISAR aims for maximum interoperability, and can be used as a hub to write and convert verification queries adapted to different verifiers. Additionally, verifiers sometimes require datasets to verify properties against, especially reachability analysis tools. As such, CAISAR currently supports datasets as flattened features in a CSV file, and RGB images.

3 https://onnx.ai/
4 http://www.vnnlib.org/

3.4. Answer composition

CAISAR currently offers two ways to compose verifiers. First, CAISAR can launch several solvers on the same task and compose their answers: it can then provide a summary stating which solvers succeeded and which failed. Second, CAISAR has the ability to verify pipelines that are composed of several machine learning programs: for instance, a pipeline composed of several neural networks, or a neural network whose outputs are processed by an SVM. CAISAR can be used to state an overall verification goal, and to model that the outputs of one block of the pipeline are the inputs of another block. More advanced methods of composition, such as automatic subgoal generation or refinement by multiple analyses, constitute a promising research venue.
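The first composition mode can be sketched as follows (an illustrative Python toy, not CAISAR's OCaml implementation; solver names are made up, and the summary assumes every individual verdict is sound):

```python
def compose_answers(problem, solvers):
    """Run every verifier on the same problem and summarize: assuming
    sound verdicts, any 'invalid' (counterexample) or 'valid' (proof)
    answer settles the goal; otherwise the summary stays 'unknown'."""
    verdicts = {name: solve(problem) for name, solve in solvers.items()}
    if "invalid" in verdicts.values():
        return "invalid", verdicts
    if "valid" in verdicts.values():
        return "valid", verdicts
    return "unknown", verdicts

# Two hypothetical solvers with different strengths on the same goal.
solvers = {
    "slow_complete_solver": lambda p: "valid",
    "fast_partial_solver": lambda p: "unknown",
}
overall, detail = compose_answers({"goal": "robustness"}, solvers)
# overall == "valid"; detail keeps each solver's individual verdict
```

Keeping the per-solver detail alongside the combined verdict is what allows the summary report mentioned above: the user sees both the answer and which tool produced it.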
4. Background & related works

In less than five years, a profusion of tools and techniques leveraging formal verification to provide trust in neural networks has sprouted [22, 9, 18, 23, 19, 5, 24, 25, 26, 27, 28, 8, 29]. See [30, 31] for more comprehensive surveys on the verification and validation of machine learning programs.

As for general purpose verification platforms, examples include the Why3 deductive verification platform and the Frama-C [3] C static analysis platform. We leverage multiple existing features of Why3, such as the WhyML language support and its transformation and rewriting engine. Why3 and Frama-C both lack the interfaces and tooling to handle neural networks. Experiments we conducted involving the EVA Frama-C plugin, applied to simple reachability analysis properties, showed a lack of scalability that a naive Python reachability analysis tool, specialized in neural networks, was able to overcome quickly. Converting neural networks into C programs that were scalable for EVA presented difficult challenges. The differing structure between C programs and neural networks implies differing verification problems. Thus, it seems more fruitful to investigate a specialized platform for machine learning programs.

The ProDeep platform [32] aims to regroup several verifiers under a single user interface. It provides a single entry point, supports several input formats and offers numerous visualization tools. It does not aim to provide other properties than those natively supported by its embedded verifiers. It also supports a fixed set of datasets. They make use of DeepG [33] to generate constraints for verifiers, effectively combining tools. Their scope seems limited to neural networks, whereas CAISAR currently supports neural networks and support vector machines, and aims to support a wider set of machine learning models.

The most similar work to CAISAR is the DNNV platform [34]. Like CAISAR, DNNV provides support for various state-of-the-art verifiers. It similarly aims to be a hub for neural network verification by supporting a wide range of input and output formats, and by providing a modelling language for property specification and discharge to capable provers. Their Domain Specific Language, DNNP, is built on Python, while CAISAR's specification language, WhyML, is already used in several formal verification platforms and provides additional theoretical guarantees, which is a key component to provide trust. As stated before, WhyML allows specifying multiple verification goals in the same verification problem, which helps modelling more complex use cases.

The main difference between CAISAR and DNNV is that the latter does not combine verifier answers, that is to say there is (at the time of writing) no feature that aims to interoperate verifiers; from the DNNV documentation5: "DNNV standardizes the network and property input formats to enable multiple verification tools to run on a single network and property. This facilitates both verifier comparison, and artifact re-use." As verifiers become more and more sophisticated and specialized, the combination of methods will become even more fruitful, and we expect this to be a key difference with DNNV.

5 https://docs.dnnv.org/en/stable/

5. Conclusion & future works

As the field of machine learning verification is blooming, choosing the right tool for the right verification problem becomes more and more tedious. We presented CAISAR, a platform that aims to alleviate this difficulty by presenting a single, extensible entry point for modelling and solving machine learning verification problems. Plenty of work still needs to be done, however.

Although CAISAR already integrates some state-of-the-art tools, other verifiers that ranked high in the VNN-COMP are on the way to integration. Such verifiers include 𝛼,𝛽-CROWN [23, 35], which scored first in said competition.

Another research venue would be the integration of neural network repair techniques such as [36]. Corrective techniques would contribute to a feedback loop composed of problem specification, verification, fault identification and correction proposal.

Various problem splitting heuristics, based for instance on [37, 38], could be integrated into CAISAR to leverage parallelism for verifiers that do not support it.

Data is the cornerstone of modern machine learning systems, and it is necessary to provide tools to handle its complexity. Support for more varied kinds of data, such as time series, is a first step in this direction. Integration of tools for analyzing data in relation with a program, for instance out-of-distribution detection, is another future work.

Finally, to further help the user select the optimal set of tools for their verification problem, a long-term goal of CAISAR is to provide a verification helper to design optimal queries for verification problems based on previous runs.

References

[1] G. Huet, G. Kahn, C. Paulin-Mohring, The Coq Proof Assistant: A Tutorial: Version 7.2, Research Report RT-0256, INRIA, 2002. URL: https://hal.inria.fr/inria-00069918, projet COQ.
[2] X. Leroy, S. Blazy, D. Kästner, B. Schommer, M. Pister, C. Ferdinand, CompCert – a formally verified optimizing compiler, in: ERTS 2016: Embedded Real Time Software and Systems, 8th European Congress, 2016.
[3] P. Baudin, F. Bobot, D. Bühler, L. Correnson, F. Kirchner, N. Kosmatov, A. Maroneze, V. Perrelle, V. Prevosto, J. Signoles, N. Williams, The dogged pursuit of bug-free C programs: The Frama-C software analysis platform, Communications of the ACM 64 (2021) 56–68. doi:10.1145/3470569.
[4] J.-C. Filliâtre, A. Paskevich, Why3 – Where Programs Meet Provers, in: M. Felleisen, P. Gardner (Eds.), Programming Languages and Systems, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2013, pp. 125–128. doi:10.1007/978-3-642-37036-6_8.
[5] G. Singh, T. Gehr, M. Püschel, M. Vechev, An abstract domain for certifying neural networks, Proceedings of the ACM on Programming Languages 3 (2019) 1–30.
[6] J. Guitton, J. Kanig, Y. Moy, Why hi-lite Ada, Rustan et al. [32] (2011) 27–39.
[7] M. Abadi, A. Chu, I. Goodfellow, B. McMahan, I. Mironov, K. Talwar, L. Zhang, Deep Learning with Differential Privacy, in: 23rd ACM Conference on Computer and Communications Security (ACM CCS), 2016, pp. 308–318.
[8] C. Urban, M. Christakis, V. Wüstholz, F. Zhang, Perfectly Parallel Fairness Certification of Neural Networks, arXiv:1912.02499 [cs] (2019).
[9] G. Katz, D. A. Huang, D. Ibeling, K. Julian, C. Lazarus, R. Lim, P. Shah, S. Thakoor, H. Wu, A. Zeljić, D. L. Dill, M. J. Kochenderfer, C. Barrett, The Marabou Framework for Verification and Analysis of Deep Neural Networks, in: I. Dillig, S. Tasiran (Eds.), Computer Aided Verification, volume 11561, Springer International Publishing, Cham, 2019, pp. 443–452.
[10] H. Wu, A. Zeljić, G. Katz, C. Barrett, Efficient Neural Network Analysis with Sum-of-Infeasibilities, in: D. Fisman, G. Rosu (Eds.), Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2022, pp. 143–163. doi:10.1007/978-3-030-99524-9_8.
[11] M. Ostrovsky, C. W. Barrett, G. Katz, An abstraction-refinement approach to verifying convolutional neural networks, CoRR abs/2201.01978 (2022). URL: https://arxiv.org/abs/2201.01978. arXiv:2201.01978.
[12] F. Ranzato, M. Zanella, Robustness verification of support vector machines, in: International Static Analysis Symposium, Springer, 2019, pp. 271–295.
[13] S. Conchon, A. Coquereau, M. Iguernlala, A. Mebsout, Alt-Ergo 2.2, in: SMT Workshop: International Workshop on Satisfiability Modulo Theories, Oxford, United Kingdom, 2018. URL: https://hal.inria.fr/hal-01960203.
[14] L. de Moura, N. Bjørner, Z3: An Efficient SMT Solver, in: C. R. Ramakrishnan, J. Rehof (Eds.), Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2008, pp. 337–340. doi:10.1007/978-3-540-78800-3_24.
[15] C. Barrett, P. Fontaine, C. Tinelli, The Satisfiability Modulo Theories Library (SMT-LIB), www.SMT-LIB.org, 2016.
[16] P. Cousot, R. Cousot, Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints, in: POPL, 1977, pp. 238–252.
[17] J. Li, J. Liu, P. Yang, L. Chen, X. Huang, L. Zhang, Analyzing deep neural networks with symbolic propagation: Towards higher precision and faster verification, in: B. E. Chang (Ed.), Static Analysis – 26th International Symposium, SAS 2019, Porto, Portugal, October 8-11, 2019, Proceedings, volume 11822 of Lecture Notes in Computer Science, Springer, 2019, pp. 296–319. doi:10.1007/978-3-030-32304-2_15.
[18] S. Wang, K. Pei, J. Whitehouse, J. Yang, S. Jana, Efficient Formal Safety Analysis of Neural Networks, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems 31, Curran Associates, Inc., 2018, pp. 6367–6377. URL: http://papers.nips.cc/paper/7873-efficient-formal-safety-analysis-of-neural-networks.pdf.
[19] G. Singh, T. Gehr, M. Mirman, M. Püschel, M. Vechev, Fast and Effective Robustness Certification, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems 31, Curran Associates, Inc., 2018, pp. 10802–10813. URL: http://papers.nips.cc/paper/8278-fast-and-effective-robustness-certification.pdf.
[20] G. Manfredi, Y. Jestin, An introduction to ACAS Xu and the challenges ahead, in: IEEE/AIAA Digital Avionics Systems Conference (DASC), Sacramento, CA, USA, 2016. doi:10.1109/DASC.2016.7778055.
[21] T. Y. Chen, F.-C. Kuo, H. Liu, P.-L. Poon, D. Towey, T. Tse, Z. Q. Zhou, Metamorphic testing: A review of challenges and opportunities, ACM Computing Surveys (CSUR) 51 (2018) 1–27.
[22] G. Katz, C. Barrett, D. Dill, K. Julian, M. Kochenderfer, Reluplex: An efficient SMT solver for verifying deep neural networks, arXiv preprint arXiv:1702.01135 (2017).
[23] S. Wang, H. Zhang, K. Xu, X. Lin, S. Jana, C.-J. Hsieh, J. Z. Kolter, Beta-CROWN: Efficient Bound Propagation with Per-neuron Split Constraints for Complete and Incomplete Neural Network Robustness Verification. URL: http://arxiv.org/abs/2103.06624. arXiv:2103.06624.
[24] Z. Shi, H. Zhang, K.-W. Chang, M. Huang, C.-J. Hsieh, Robustness Verification for Transformers, in: International Conference on Learning Representations, 2020. URL: https://openreview.net/forum?id=BJxwPJHFwS.
[25] S. Bak, Nnenum: Verification of ReLU Neural Networks with Optimized Abstraction Refinement, in: A. Dutle, M. M. Moscato, L. Titolo, C. A. Muñoz, I. Perez (Eds.), NASA Formal Methods, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2021, pp. 19–36. doi:10.1007/978-3-030-76384-8_2.
[26] P. Henriksen, A. Lomuscio, Efficient Neural Network Verification via Adaptive Refinement and Adversarial Search, in: 24th European Conference on Artificial Intelligence – ECAI 2020, Santiago de Compostela, Spain, 2020, p. 8.
[27] S. Dutta, S. Jha, S. Sanakaranarayanan, A. Tiwari, Output Range Analysis for Deep Neural Networks, arXiv:1709.09130 [cs, stat] (2017).
[28] A. D. Palma, R. Bunel, A. Desmaison, K. Dvijotham, P. Kohli, P. H. S. Torr, M. P. Kumar, Improved branch and bound for neural network verification via lagrangian decomposition. arXiv:2104.06718v1.
[29] R. Ehlers, Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks, arXiv:1705.01320 [cs] (2017).
[30] C. Urban, A. Miné, A Review of Formal Methods applied to Machine Learning, arXiv:2104.02466 [cs] (2021).
[31] C. Liu, T. Arnon, C. Lazarus, C. Barrett, M. J. Kochenderfer, Algorithms for Verifying Deep Neural Networks, arXiv:1903.06758 [cs, stat] (2019).
[32] R. Li, J. Li, C.-C. Huang, P. Yang, X. Huang, L. Zhang, B. Xue, H. Hermanns, PRODeep: a platform for robustness verification of deep neural networks, in: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1630–1634. doi:10.1145/3368089.3417918.
[33] M. Balunovic, M. Baader, G. Singh, T. Gehr, M. Vechev, Certifying Geometric Robustness of Neural Networks, in: Advances in Neural Information Processing Systems, volume 32, Curran Associates, Inc., 2019. URL: https://papers.nips.cc/paper/2019/hash/f7fa6aca028e7ff4ef62d75ed025fe76-Abstract.html.
[34] D. Shriver, S. Elbaum, M. B. Dwyer, DNNV: A Framework for Deep Neural Network Verification, in: A. Silva, K. R. M. Leino (Eds.), Computer Aided Verification, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2021, pp. 137–150. doi:10.1007/978-3-030-81685-8_6.
[35] K. Xu, H. Zhang, S. Wang, Y. Wang, S. Jana, X. Lin, C.-J. Hsieh, Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers, arXiv:2011.13824 [cs] (2021). URL: http://arxiv.org/abs/2011.13824.
[36] B. Goldberger, G. Katz, Y. Adi, J. Keshet, Minimal modifications of deep neural networks using verification, in: E. Albert, L. Kovacs (Eds.), LPAR-23: 23rd International Conference on Logic for Programming, Artificial Intelligence and Reasoning, volume 73 of EPiC Series in Computing, EasyChair, 2020, pp. 260–278. doi:10.29007/699q.
[37] J. Girard-Satabin, A. Varasse, M. Schoenauer, G. Charpiat, Z. Chihani, DISCO: Division of input space into convex polytopes for neural network verification, JFLA (2021).
[38] R. Bunel, J. Lu, I. Turkaslan, P. H. Torr, P. Kohli, M. P. Kumar, Branch and bound for piecewise linear neural network verification, Journal of Machine Learning Research 21 (2020) 1–39.