CAISAR: A platform for Characterizing Artificial Intelligence Safety and Robustness

Julien Girard-Satabin1,†, Michele Alberti1,†, François Bobot1, Zakaria Chihani1 and Augustin Lemesle1
1 Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France

Abstract
We present CAISAR, an open-source platform under active development for the characterization of AI systems' robustness and safety. CAISAR provides a unified entry point for defining verification problems by using WhyML, the mature and expressive language of the Why3 verification platform. Moreover, CAISAR orchestrates and composes state-of-the-art machine learning verification tools which, individually, are not able to efficiently handle all problems but, collectively, can cover a growing number of properties. Our aim is to assist, on the one hand, the V&V process by reducing the burden of choosing the methodology tailored to a given verification problem, and on the other hand tool developers, by factorizing useful features – visualization, report generation, property description – in one platform. CAISAR will soon be available at https://git.frama-c.com/pub/caisar.

Keywords
formal verification, platform, formal methods, testing, metamorphic testing, reachability analysis, satisfiability modulo theories, artificial intelligence

1. Introduction

The integration of machine learning programs as components of critical systems is said to be bound to happen; initiatives from various private and governmental actors (e.g., the US' NSF funding for Trustworthy AI1, France's Grand Défi IA de confiance2) are a consequence of that fact. Trusting such programs is thus becoming a crucial issue, on both technical and ethical sides.

A possible approach to trust is formal test and verification, a broad set of techniques and tools that have been applied to software safety for several decades. These formal methods build on sound mathematical foundations to assess the behaviour of programs in a principled way, be it for generating tests or providing proven guarantees. For the last few years, several independent works have started to investigate the possible applications of formal verification to machine learning program verification, and its limitations. This led to what could be characterized as a Cambrian explosion of tools aiming to solve a particular subset of the machine learning verification field. In less than five years, more than 20 tools were produced, upgraded or abandoned. These tools use different techniques and different input formats, handle different ML artefacts and, most importantly, have varying performance depending on the problem. Our goal of orchestrating multiple tools aims at maximizing property coverage in the context of a validation and verification process.

To this end, we present a platform dedicated to Characterizing Artificial Intelligence Safety and Robustness (CAISAR) that aims to unify several formal methods and tools: at the input, through the use of a mature and expressive property description and proof orchestration language; at the output, through the factorization of features such as visualization and report generation; and at the usage level, through shared heuristics and interconnections between tools.

The answer to the question "To what extent can I trust my machine learning program?" has many components, ranging from data analysis to decision explainability. One such important component is verification and validation, and we wish to make CAISAR an important element in the safety toolbelt by covering these applications.

In the following, we first present the design principles of CAISAR and state its main goals. We follow with a description of its most prominent features, as well as its limitations. We then explain the position of CAISAR with regard to other tools for formal verification of machine learning programs, and conclude by presenting some future work and possible research problems.

The IJCAI-ECAI-22 Workshop on Artificial Intelligence Safety (AISafety 2022), July 24-25, 2022, Vienna, Austria
† These authors contributed equally.
julien.girard2@cea.fr (J. Girard-Satabin); michele.alberti@cea.fr (M. Alberti); francois.bobot@cea.fr (F. Bobot); zakaria.chihani@cea.fr (Z. Chihani); augustin.lemesle@cea.fr (A. Lemesle)
ORCID: 0000-0001-6374-3694 (J. Girard-Satabin)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)
1 https://www.nsf.gov/pubs/2022/nsf22502/nsf22502.htm
2 https://www.gouvernement.fr/grand-defi-securiser-certifier-et-fiabiliser-les-systemes-fondes-sur-l-intelligence-artificielle

2. Core principles of CAISAR

The aim of CAISAR is to provide a verification environment for Artificial Intelligence (AI) based systems, tailored to different needs. The profusion of tools for AI-program certification offers numerous possibilities, from the choice of technology (formal methods, test generation) to the scope of properties to check (coverage, suitability to a given distribution, robustness). However, with increased possibilities comes the burden of choice. Which method best suits a given use case? Are the results provided by this particular method trustworthy enough?
How to bring trust to the process of selecting, tailoring and computing the results of a given tool? How to evaluate a given tool against others? Those are the questions that we aim to answer with CAISAR.

2.1. Compatibility with existing and future methods

The first principle of CAISAR is to ease this burden of choice by automating parts of it. CAISAR aims to provide a unique interface for vastly different tools, with a single entry point to specify verification goals. Choosing which tool to use is an informed decision that may not be relevant for the user; the goal is to provide an actionable answer on the safety of the system, by using whatever tool is suitable for the problem. Ideally, the user should not be bothered with deciding which tool is suitable for their use case: CAISAR will automatically figure out how to express the given property to suit verifiers. As AI system pipelines become more and more complex, it is crucial for CAISAR to handle this complexity. Currently, CAISAR supports neural networks and support vector machines, and an industrial benchmark of an ensemble model (NN-SVM), which we are unable to further discuss, is being used as a concrete real-world use case.

2.2. Common modelling and answers

Existing verifiers rely on different decision procedures, e.g., Mixed Integer Linear Programming (MILP), abstract interpretation, or Satisfiability Modulo Theories (SMT). Modelling a verification problem using these frameworks requires different skills and is time-consuming; for instance, some modelling choices made for MILP may not apply under SMT. Moreover, even if one succeeds in phrasing a verification problem under multiple decision procedures, the different results may not be immediately comparable. CAISAR aims to provide a common ground for inputs and outputs, which will lead to easier comparison, lower time consumption and an informed decision. Furthermore, collecting and presenting the user with multiple answers from different techniques can provide additional confidence in the studied system. In order for users to trust CAISAR as well, it needs to rely on well-known and approved principles and technologies. It is developed in OCaml, a strongly typed programming language used to develop tools for program verification and validation. Such tools include Coq [1], a proof assistant that was for instance used to develop CompCert [2], a C compiler that is guaranteed to output C-ANSI compliant source code, Frama-C [3], a platform for the static analysis of C code, and Why3 [4], a platform for deductive verification of programs.

2.3. Tools composition

Some works are starting to combine multiple techniques [5] for their analysis, using an exact MILP solver to refine bounds obtained by abstract interpretation. Our goal with CAISAR is to bring tool composition to another level. For instance, metamorphic transformations could generate different input space partitions for formal verifiers. A reachability analysis tool could be called numerous times with tighter bounds until reaching a precise enough answer. Coverage testing objectives could be extracted from reachability analysis tools and fed to test generators. CAISAR will be more than the sum of its parts, allowing communication between vastly different tools to provide faster and more accurate answers.

2.4. Automatic proposal of verification strategies

A long-term goal for CAISAR is to provide a reasoning engine where past verification problems processed by CAISAR can inform the next ones, gradually building a knowledge base that is suitable for the specific needs of the user. CAISAR will also implement its own built-in heuristics to supplement specialized programs that do not implement them.

3. Architecture and features

CAISAR's architecture can be divided into the following functional blocks:
1. A Common specification Language (CL)
2. A Proof Obligation Generator (POG), associated with a Dispatcher (DISP)
3. An Intermediate Representation (IR)
4. A visualization module (VIZ)

See fig. 1 for a visual depiction of the dependencies between blocks.
Figure 1: CAISAR overall architecture

3.1. Specification language and verification predicates

A typical task for program verification involves solving a verification problem. A verification problem consists in checking that a program with a given set of inputs meets certain expectations on its outputs. More formally, let 𝒳 be an input space, 𝒫(·) be a property on the input space, 𝑓 be a program, and 𝒬(·, ·) be a property on the output space. By property, we mean a statement that describes a desirable behaviour of the program. Let 𝒱 = (𝒳, 𝑓, 𝒫, 𝒬) be a verification problem. The goal is to verify the following property:

∀𝑥 ∈ 𝒳, 𝒫(𝑥) ⇒ 𝒬(𝑓(𝑥), 𝑥)

To write 𝒱, one needs to be able to express concisely and without ambiguity each component: the program to verify 𝑓, the properties 𝒫 and 𝒬, and the dataset 𝒳. To this end, CAISAR provides full support for the WhyML specification and programming language [4]. WhyML is a language with static strong typing, pattern matching, type invariants and inductive predicates. This gives WhyML programs a sound semantics as logical propositions. WhyML is at the core of the Why3 verification platform, and has been used as an intermediate language for the verification of Ada programs [6].

This global expressiveness and safety allows writing 𝒱 once and for all, independently of the verifier. For instance, Figure 2 shows the definition of robustness against a perturbation of amplitude 𝜀 using the 𝑙∞ distance within CAISAR's standard library, and fig. 3 the WhyML file the user needs to write in order to verify the robustness of a given TestSVM against a perturbation of amplitude 0.5. Note that all the necessary elements to define 𝒱, namely 𝑓, 𝒳, 𝒫 and 𝒬, are defined in those files: 𝑓 is the function TestSVM, and 𝒬 is the predicate itself.

```whyml
type input_type = int -> t
type output_type = int
type model = {
  app : input_type -> output_type;
  num_input : int;
  num_classes : int
}

predicate dist_linf
  (a: input_type) (b: input_type) (eps: t) (n: int) =
  forall i. 0 <= i < n -> .- eps .< a i .- b i .< eps

predicate robust_to
  (model: model) (a: input_type) (eps: t) =
  forall b. dist_linf a b eps model.num_input ->
    model.app a = model.app b
```

Figure 2: An example of a predicate in CAISAR's standard library: being "robust to" a perturbation of amplitude 𝜀. Here, the predicate defines 𝒬(𝒴).

```whyml
use TestSVM.SVMasArray
use ieee_float.Float64
use caisar.SVM

goal G: forall a: input_type. robust_to svm_apply a (0.5:t)
```

Figure 3: Example verification problem specified to CAISAR. The program to verify is TestSVM, the input space is defined by the elements in a, the output space is the result of the application of the function svm_apply.

Note that WhyML is not limited to the robustness-against-a-given-perturbation property often met in the literature. For instance, asserting that a neural network respects the property of being differentially private [7] or of respecting causal fairness [8] is something that could be phrased as WhyML programs, since those properties have a mathematical characterization. Finally, WhyML does not constrain the form of 𝒫 nor 𝒬. In particular, it is possible to define multiple verification goals in the same 𝒱, opening the way to subdividing it into subproblems, and providing answers at each step.

CAISAR then automatically translates 𝒱 into a format supported by the selected verifiers, through a succession of encoding transformations and simplifications. For instance, some verifiers are best used when trying to falsify the property: instead of checking

∀𝑥 ∈ 𝒳, 𝒫(𝑥) ⇒ 𝒬(𝑓(𝑥), 𝑥)

they instead try to satisfy the negation

∃𝑥 ∈ 𝒳, 𝒫(𝑥) ∧ ¬𝒬(𝑓(𝑥), 𝑥)

This transformation is embedded in CAISAR when calling Marabou: a verification problem 𝒱 can be transformed into an equivalent one 𝒱′ that can be dispatched to Marabou.
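This falsification view can be illustrated independently of any solver. The following Python sketch (illustrative only; `falsify` and its arguments are hypothetical names, and random sampling is of course no substitute for a sound verifier) searches for a witness of the negated property 𝒫(𝑥) ∧ ¬𝒬(𝑓(𝑥), 𝑥):

```python
import random

def falsify(f, p, q, sample, n_trials=10_000, seed=0):
    """Look for a counterexample to  forall x in X, P(x) => Q(f(x), x)
    by sampling candidate inputs and testing the negated property
    P(x) and not Q(f(x), x)."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        x = sample(rng)
        if p(x) and not q(f(x), x):
            return x  # witness of the negation: the property is falsified
    return None       # no witness found (this alone proves nothing)

# Toy problem: f(x) = 2x on X = [0, 1], P(x) = x in [0, 1],
# Q(y, x) = y <= 1.  The property fails for every x > 0.5.
cex = falsify(lambda x: 2 * x,
              lambda x: 0.0 <= x <= 1.0,
              lambda y, x: y <= 1.0,
              lambda rng: rng.uniform(0.0, 1.0))
# cex is a concrete input with 2 * cex > 1
```

A real verifier replaces the sampling loop with an exhaustive decision procedure, which is why a `None` answer from such a test carries no guarantee while a solver's "unsatisfiable" answer does.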
3.2. Proof Obligation Generation for various tools

CAISAR currently supports a variety of tools and techniques: metamorphic testing, reachability analysis based on abstract interpretation, and constraint-based propagation. CAISAR can analyze neural networks and support vector machines. This versatility allows CAISAR to verify system components using different machine learning architectures.

Marabou

Marabou [9] is a complete verifier for deep neural networks. Its core routine relies on a modified simplex algorithm that lazily relaxes constraints on piecewise-linear activation functions. Marabou also makes use of several heuristics that help speed up the verification procedure, such as relying on tight convex overapproximations [10] or sound overapproximations [11]. It can answer reachability queries and conjunctions of linear constraints. Marabou ranked fifth at the VNN-COMP 2021. It is currently in active development.

SAVer

The Support Vector Machine reachability analysis tool SAVer [12] is specialized in the verification of support vector machines (SVM), a popular machine learning algorithm used alongside neural networks for classification or regression tasks. SAVer can answer reachability queries, and supports a variety of SVM configurations. This tool was selected for support as, to the best of our knowledge, it is the first one to deal with the verification of SVMs.

Alt-Ergo and SMTLIB compliant solvers

Existing general purpose SMT solvers for program verification like Alt-Ergo [13] or Z3 [14] all support a standard input language, SMTLIB [15]. CAISAR leverages Why3's existing support for SMT solvers and can translate neural network control flows directly into SMTLIB compliant strings using its intermediate representation, which allows the support of a variety of off-the-shelf solvers. Note that the VNNLIB standard, used in the VNN-COMP, uses a subset of SMTLIB2, which paves the way for the support of future tools in CAISAR.

Python Reachability Assessment Tool

The Python Reachability Assessment Tool (PyRAT) is a static analyzer specifically targeting neural networks. It builds upon the framework of abstract interpretation [16], using abstract domains adapted to the approximation of the reachable space in a neural network. Three main domains are used: intervals with symbolic relations as described in [17, 18], zonotopes [19] and the DeepPoly domain [5]. For low-dimensional inputs, PyRAT uses input partitioning as described in [18], with heuristics tailored to relational domains: the zonotope domain and the DeepPoly domain with backsubstitution. Those heuristics allow the computation of a non-trivial (e.g., not just widest-interval-first) score ranking the inputs by their estimated influence on the outputs. PyRAT has results comparable to state-of-the-art analyzers on the widely used ACAS-Xu [20] benchmark, and outperforms the similar domains of ERAN on S-shaped activation functions such as the sigmoid or hyperbolic tangent, thanks to specific approximations.
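The elementary step underlying such interval-based reachability analyses — pushing an input box through one affine layer followed by a ReLU — can be sketched as follows (a toy illustration of standard interval arithmetic, not PyRAT's implementation):

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate the box [lower, upper] through y = W x + b: for each
    output, pick per weight sign the input bound that minimizes
    (resp. maximizes) the dot product -- plain interval arithmetic."""
    lo, hi = [], []
    for row, b in zip(weights, bias):
        lo.append(b + sum(w * (l if w >= 0 else u)
                          for w, l, u in zip(row, lower, upper)))
        hi.append(b + sum(w * (u if w >= 0 else l)
                          for w, l, u in zip(row, lower, upper)))
    return lo, hi

def relu_bounds(lower, upper):
    """ReLU is monotone, so it maps a box to a box exactly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

# One hidden layer on the box [-1, 1]^2: y = ReLU(W x + b).
W, b = [[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]
lo, hi = relu_bounds(*affine_bounds([-1.0, -1.0], [1.0, 1.0], W, b))
# lo == [0.0, 0.0] and hi == [2.0, 0.0]: the second neuron is
# provably inactive on the whole input box.
```

The relational domains mentioned above (zonotopes, DeepPoly) refine exactly this step by tracking linear correlations between neurons, which plain intervals lose.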
AIMOS: a metamorphic testing utility

AI Metamorphism Observing Software (AIMOS) is a software developed at the same time as CAISAR, aiming to provide testing of metamorphic properties or perturbations on a dataset for a given AI model. Metamorphic testing is a testing technique relying on symmetries and invariances of properties on the operating domain; see [21] for a comprehensive survey of this approach. AIMOS offers tools to derive properties from a set of transformations on the inputs: given 𝒫, 𝒬, 𝒳 and a transformation function 𝑡𝜃 : 𝒳 → 𝒳, it generates a set of new properties 𝒬′ that are coherent with the transformation. As an example, a symmetry on the inputs of a classification model could result in a symmetry on the outputs; AIMOS would then automatically modify the property to check against the symmetrical labels.

AIMOS can generate test case scenarios from the most common input transformations (geometrical rotations, noise addition); others can be added if necessary. AIMOS was evaluated on a metamorphic property on the ACAS-Xu benchmark. The aim of the property was to evaluate the ability of neural networks trained on ACAS to generalize to symmetric inputs. Given a symmetry on the inputs, AIMOS generates the expected symmetrical output, and tests models against the base and symmetrical outputs. See table 1 for results. AIMOS was able to show that neural networks trained on ACAS have a low, but noticeable, sensitivity to symmetry on one input.
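The input–output relation such a metamorphic test checks can be sketched in a few lines of Python (an illustrative toy, not AIMOS code; all names are made up):

```python
def metamorphic_consistency(model, transform, relabel, inputs):
    """Fraction of inputs on which the model's answer commutes with
    the transformation: model(transform(x)) == relabel(model(x))."""
    agree = sum(1 for x in inputs
                if model(transform(x)) == relabel(model(x)))
    return agree / len(inputs)

# Toy classifier on 2D points: the label is the side of the y-axis.
# Mirroring a point across that axis should flip the label.
model = lambda p: "left" if p[0] < 0 else "right"
mirror = lambda p: (-p[0], p[1])
flip = {"left": "right", "right": "left"}.get
points = [(-2.0, 1.0), (3.0, 0.5), (1.0, -1.0), (-0.5, 2.0)]
rate = metamorphic_consistency(model, mirror, flip, points)
# rate == 1.0: the toy model is perfectly symmetric
```

A trained network is rarely perfectly symmetric, which is exactly what a consistency rate below 100% — as in the ACAS-Xu evaluation above — reveals.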
Table 1
Average rate of identical answers over all 45 models of the ACAS-Xu benchmark, computed by AIMOS. The first column denotes the output values present in the benchmark; the second, the percentage of identical answers given.

Previous answer    Identical answer
COC                89.7%
WL                 95.9%
WR                 99.6%
L                  95.3%
R                  99.8%

3.3. Supported formats

CAISAR supports all input formats used by its integrated verifiers. Most verifiers require either a framework-specific binary (PyTorch's pth, TensorFlow's tf), a custom description language (NNet), or an Open Neural Network eXchange (ONNX)3 file. CAISAR is able to parse any of these input formats and extract useful metadata for building the verification strategy. It can also output a verification problem into the SMTLIB [15] format, supported by all general purpose solvers, as well as into the ONNX format. The VNN-Lib initiative4 provides a standard format for verification problems that relies on SMTLIB; thus CAISAR also supports VNN-Lib. CAISAR aims for maximum interoperability, and can be used as a hub to write and convert verification queries adapted to different verifiers. Additionally, verifiers sometimes require datasets to verify properties against, especially reachability analysis tools. As such, CAISAR currently supports datasets as flattened features in a CSV file, and RGB images.

3 https://onnx.ai/
4 http://www.vnnlib.org/

3.4. Answer composition

CAISAR currently offers two ways to compose verifiers. First, CAISAR can launch several solvers on the same task and compose their answers: it can then provide a summary stating which solvers succeeded and which failed. Second, CAISAR has the ability to verify pipelines that are composed of several machine learning programs: for instance, a pipeline composed of several neural networks, or a neural network whose outputs are processed by an SVM. CAISAR can be used to state an overall verification goal, and to model that the outputs of one block of the pipeline are the inputs of another block. More advanced methods of composition, such as automatic subgoal generation or refinement by multiple analyses, constitute a promising research venue.
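The first composition mode can be sketched as follows (an illustrative Python toy, not CAISAR's OCaml implementation; solver names are made up, and the summary assumes every individual verdict is sound):

```python
def compose_answers(problem, solvers):
    """Run every verifier on the same problem and summarize: assuming
    sound verdicts, any 'invalid' (counterexample) or 'valid' (proof)
    answer settles the goal; otherwise the summary stays 'unknown'."""
    verdicts = {name: solve(problem) for name, solve in solvers.items()}
    if "invalid" in verdicts.values():
        return "invalid", verdicts
    if "valid" in verdicts.values():
        return "valid", verdicts
    return "unknown", verdicts

# Two hypothetical solvers with different strengths on the same goal.
solvers = {
    "slow_complete_solver": lambda p: "valid",
    "fast_partial_solver": lambda p: "unknown",
}
overall, detail = compose_answers({"goal": "robustness"}, solvers)
# overall == "valid"; detail keeps each solver's individual verdict
```

Keeping the per-solver detail alongside the combined verdict is what allows the summary report mentioned above: the user sees both the answer and which tool produced it.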
4. Background & related works

In less than five years, a profusion of tools and techniques leveraging formal verification to provide trust in neural networks has sprouted [22, 9, 18, 23, 19, 5, 24, 25, 26, 27, 28, 8, 29]. See [30, 31] for more comprehensive surveys on the verification and validation of machine learning programs.

As for general purpose verification platforms, examples include the Why3 deductive verification platform and the Frama-C [3] C static analysis platform. We leverage multiple existing features of Why3, such as the WhyML language support and its transformation and rewriting engine. Why3 and Frama-C both lack the interfaces and tooling to handle neural networks. Experiments we conducted involving the EVA Frama-C plugin, applied to simple reachability analysis properties, showed a lack of scalability that a naive Python reachability analysis tool, specialized in neural networks, was able to overcome quickly. Converting neural networks into C programs that were scalable for EVA presented difficult challenges. The differing structure between C programs and neural networks implies differing verification problems. Thus, it seems more fruitful to investigate a specialized platform for machine learning programs.

The ProDeep platform [32] aims to regroup several verifiers under a single user interface. It provides a single entry point, supports several input formats and offers numerous visualization tools. It does not aim to provide other properties than those natively supported by its embedded verifiers. It also supports a fixed set of datasets. They make use of DeepG [33] to generate constraints for verifiers, effectively combining tools. Their scope seems limited to neural networks, whereas CAISAR currently supports neural networks and support vector machines, and aims to support a wider set of machine learning models.

The most similar work to CAISAR is the DNNV platform [34]. Like CAISAR, DNNV provides support for various state-of-the-art verifiers. It similarly aims to be a hub for neural network verification by supporting a wide range of input and output formats, and by providing a modelling language for property specification and discharge to capable provers. Their Domain Specific Language, DNNP, is built on Python, while CAISAR's specification language, WhyML, is already used in several formal verification platforms and provides additional theoretical guarantees, which is a key component to provide trust. As stated before, WhyML allows specifying multiple verification goals in the same verification problem, which helps modelling more complex use cases.

The main difference between CAISAR and DNNV is that the latter does not combine verifier answers, that is to say there is (at the time of writing) no feature that aims to interoperate verifiers; from the DNNV documentation5: "DNNV standardizes the network and property input formats to enable multiple verification tools to run on a single network and property. This facilitates both verifier comparison, and artifact re-use." As verifiers become more and more sophisticated and specialized, the combination of methods will become even more fruitful, and we expect this to be a key difference with DNNV.

5 https://docs.dnnv.org/en/stable/

5. Conclusion & future works

As the field of machine learning verification is blooming, choosing the right tool for the right verification problem becomes more and more tedious. We presented CAISAR, a platform that aims to alleviate this difficulty by presenting a single, extensible entry point for modelling and solving machine learning verification problems. Plenty of work still needs to be done, however.

Although CAISAR already integrates some state-of-the-art tools, other verifiers that ranked high in the VNN-COMP are on the way to integration. Such verifiers include 𝛼,𝛽-CROWN [23, 35], which scored first in said competition.

Another research venue would be the integration of neural network repair techniques such as [36]. Corrective techniques would contribute to a feedback loop composed of problem specification, verification, fault identification and correction proposal.

Various problem splitting heuristics, based for instance on [37, 38], could be integrated into CAISAR to leverage parallelism for verifiers that do not support it.

Data is the cornerstone of modern machine learning systems, and it is necessary to provide tools to handle its complexity. Support for more varied kinds of data, such as time series, is a first step in this direction. Integration of tools for analyzing data in relation with a program, for instance out-of-distribution detection, is another future work.

Finally, to further help the user select the optimal set of tools for their verification problem, a long-term goal of CAISAR is to provide a verification helper to design optimal queries for verification problems based on previous runs.

References

[1] G. Huet, G. Kahn, C. Paulin-Mohring, The Coq Proof Assistant: A Tutorial: Version 7.2, Research Report RT-0256, INRIA, 2002. URL: https://hal.inria.fr/inria-00069918, projet COQ.
[2] X. Leroy, S. Blazy, D. Kästner, B. Schommer, M. Pister, C. Ferdinand, CompCert – a formally verified optimizing compiler, in: ERTS 2016: Embedded Real Time Software and Systems, 8th European Congress, 2016.
[3] P. Baudin, F. Bobot, D. Bühler, L. Correnson, F. Kirchner, N. Kosmatov, A. Maroneze, V. Perrelle, V. Prevosto, J. Signoles, N. Williams, The dogged pursuit of bug-free C programs: The Frama-C software analysis platform, Communications of the ACM 64 (2021) 56–68. doi:10.1145/3470569.
[4] J.-C. Filliâtre, A. Paskevich, Why3 – Where Programs Meet Provers, in: M. Felleisen, P. Gardner (Eds.), Programming Languages and Systems, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2013, pp. 125–128. doi:10.1007/978-3-642-37036-6_8.
[5] G. Singh, T. Gehr, M. Püschel, M. Vechev, An abstract domain for certifying neural networks, Proceedings of the ACM on Programming Languages 3 (2019) 1–30.
[6] J. Guitton, J. Kanig, Y. Moy, Why hi-lite Ada, Rustan et al. [32] (2011) 27–39.
[7] M. Abadi, A. Chu, I. Goodfellow, B. McMahan, I. Mironov, K. Talwar, L. Zhang, Deep Learning with Differential Privacy, in: 23rd ACM Conference on Computer and Communications Security (ACM CCS), 2016, pp. 308–318.
[8] C. Urban, M. Christakis, V. Wüstholz, F. Zhang, Perfectly Parallel Fairness Certification of Neural Networks, arXiv:1912.02499 [cs] (2019).
[9] G. Katz, D. A. Huang, D. Ibeling, K. Julian, C. Lazarus, R. Lim, P. Shah, S. Thakoor, H. Wu, A. Zeljić, D. L. Dill, M. J. Kochenderfer, C. Barrett, The Marabou Framework for Verification and Analysis of Deep Neural Networks, in: I. Dillig, S. Tasiran (Eds.), Computer Aided Verification, volume 11561, Springer International Publishing, Cham, 2019, pp. 443–452.
[10] H. Wu, A. Zeljić, G. Katz, C. Barrett, Efficient Neural Network Analysis with Sum-of-Infeasibilities, in: D. Fisman, G. Rosu (Eds.), Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2022, pp. 143–163. doi:10.1007/978-3-030-99524-9_8.
[11] M. Ostrovsky, C. W. Barrett, G. Katz, An abstraction-refinement approach to verifying convolutional neural networks, CoRR abs/2201.01978 (2022). URL: https://arxiv.org/abs/2201.01978. arXiv:2201.01978.
[12] F. Ranzato, M. Zanella, Robustness verification of support vector machines, in: International Static Analysis Symposium, Springer, 2019, pp. 271–295.
[13] S. Conchon, A. Coquereau, M. Iguernlala, A. Mebsout, Alt-Ergo 2.2, in: SMT Workshop: International Workshop on Satisfiability Modulo Theories, Oxford, United Kingdom, 2018. URL: https://hal.inria.fr/hal-01960203.
[14] L. de Moura, N. Bjørner, Z3: An Efficient SMT Solver, in: C. R. Ramakrishnan, J. Rehof (Eds.), Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2008, pp. 337–340. doi:10.1007/978-3-540-78800-3_24.
[15] C. Barrett, P. Fontaine, C. Tinelli, The Satisfiability Modulo Theories Library (SMT-LIB), www.SMT-LIB.org, 2016.
[16] P. Cousot, R. Cousot, Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints, in: POPL, 1977, pp. 238–252.
[17] J. Li, J. Liu, P. Yang, L. Chen, X. Huang, L. Zhang, Analyzing deep neural networks with symbolic propagation: Towards higher precision and faster verification, in: B. E. Chang (Ed.), Static Analysis – 26th International Symposium, SAS 2019, Porto, Portugal, October 8-11, 2019, Proceedings, volume 11822 of Lecture Notes in Computer Science, Springer, 2019, pp. 296–319. doi:10.1007/978-3-030-32304-2_15.
[18] S. Wang, K. Pei, J. Whitehouse, J. Yang, S. Jana, Efficient Formal Safety Analysis of Neural Networks, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems 31, Curran Associates, Inc., 2018, pp. 6367–6377. URL: http://papers.nips.cc/paper/7873-efficient-formal-safety-analysis-of-neural-networks.pdf.
[19] G. Singh, T. Gehr, M. Mirman, M. Püschel, M. Vechev, Fast and Effective Robustness Certification, in: S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett (Eds.), Advances in Neural Information Processing Systems 31, Curran Associates, Inc., 2018, pp. 10802–10813. URL: http://papers.nips.cc/paper/8278-fast-and-effective-robustness-certification.pdf.
[20] G. Manfredi, Y. Jestin, An introduction to ACAS Xu and the challenges ahead, in: IEEE/AIAA Digital Avionics Systems Conference (DASC), Sacramento, CA, USA, 2016. doi:10.1109/DASC.2016.7778055.
[21] T. Y. Chen, F.-C. Kuo, H. Liu, P.-L. Poon, D. Towey, T. Tse, Z. Q. Zhou, Metamorphic testing: A review of challenges and opportunities, ACM Computing Surveys (CSUR) 51 (2018) 1–27.
[22] G. Katz, C. Barrett, D. Dill, K. Julian, M. Kochenderfer, Reluplex: An efficient SMT solver for verifying deep neural networks, arXiv preprint arXiv:1702.01135 (2017).
[23] S. Wang, H. Zhang, K. Xu, X. Lin, S. Jana, C.-J. Hsieh, J. Z. Kolter, Beta-CROWN: Efficient Bound Propagation with Per-neuron Split Constraints for Complete and Incomplete Neural Network Robustness Verification. URL: http://arxiv.org/abs/2103.06624. arXiv:2103.06624.
[24] Z. Shi, H. Zhang, K.-W. Chang, M. Huang, C.-J. Hsieh, Robustness Verification for Transformers, in: International Conference on Learning Representations, 2020. URL: https://openreview.net/forum?id=BJxwPJHFwS.
[25] S. Bak, Nnenum: Verification of ReLU Neural Networks with Optimized Abstraction Refinement, in: A. Dutle, M. M. Moscato, L. Titolo, C. A. Muñoz, I. Perez (Eds.), NASA Formal Methods, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2021, pp. 19–36. doi:10.1007/978-3-030-76384-8_2.
[26] P. Henriksen, A. Lomuscio, Efficient Neural Network Verification via Adaptive Refinement and Adversarial Search, in: 24th European Conference on Artificial Intelligence – ECAI 2020, Santiago de Compostela, Spain, 2020, p. 8.
[27] S. Dutta, S. Jha, S. Sanakaranarayanan, A. Tiwari, Output Range Analysis for Deep Neural Networks, arXiv:1709.09130 [cs, stat] (2017).
[28] A. D. Palma, R. Bunel, A. Desmaison, K. Dvijotham, P. Kohli, P. H. S. Torr, M. P. Kumar, Improved branch and bound for neural network verification via lagrangian decomposition. arXiv:2104.06718v1.
[29] R. Ehlers, Formal Verification of Piece-Wise Linear Feed-Forward Neural Networks, arXiv:1705.01320 [cs] (2017).
[30] C. Urban, A. Miné, A Review of Formal Methods applied to Machine Learning, arXiv:2104.02466 [cs] (2021).
[31] C. Liu, T. Arnon, C. Lazarus, C. Barrett, M. J. Kochenderfer, Algorithms for Verifying Deep Neural Networks, arXiv:1903.06758 [cs, stat] (2019).
[32] R. Li, J. Li, C.-C. Huang, P. Yang, X. Huang, L. Zhang, B. Xue, H. Hermanns, PRODeep: a platform for robustness verification of deep neural networks, in: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020, Association for Computing Machinery, New York, NY, USA, 2020, pp. 1630–1634. doi:10.1145/3368089.3417918.
[33] M. Balunovic, M. Baader, G. Singh, T. Gehr, M. Vechev, Certifying Geometric Robustness of Neural Networks, in: Advances in Neural Information Processing Systems, volume 32, Curran Associates, Inc., 2019. URL: https://papers.nips.cc/paper/2019/hash/f7fa6aca028e7ff4ef62d75ed025fe76-Abstract.html.
[34] D. Shriver, S. Elbaum, M. B. Dwyer, DNNV: A Framework for Deep Neural Network Verification, in: A. Silva, K. R. M. Leino (Eds.), Computer Aided Verification, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2021, pp. 137–150. doi:10.1007/978-3-030-81685-8_6.
[35] K. Xu, H. Zhang, S. Wang, Y. Wang, S. Jana, X. Lin, C.-J. Hsieh, Fast and Complete: Enabling Complete Neural Network Verification with Rapid and Massively Parallel Incomplete Verifiers, arXiv:2011.13824 [cs] (2021). URL: http://arxiv.org/abs/2011.13824.
[36] B. Goldberger, G. Katz, Y. Adi, J. Keshet, Minimal modifications of deep neural networks using verification, in: E. Albert, L. Kovacs (Eds.), LPAR-23: 23rd International Conference on Logic for Programming, Artificial Intelligence and Reasoning, volume 73 of EPiC Series in Computing, EasyChair, 2020, pp. 260–278. doi:10.29007/699q.
[37] J. Girard-Satabin, A. Varasse, M. Schoenauer, G. Charpiat, Z. Chihani, DISCO: Division of input space into convex polytopes for neural network verification, JFLA (2021).
[38] R. Bunel, J. Lu, I. Turkaslan, P. H. Torr, P. Kohli, M. P. Kumar, Branch and bound for piecewise linear neural network verification, Journal of Machine Learning Research 21 (2020) 1–39.