<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <article-id pub-id-type="doi">10.1109/ICAA58325.2023</article-id>
      <title-group>
        <article-title>On the Verification of ML Systems and Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Greta Dolcetti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vincenzo Arceri</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Agostino Cortesi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Enea Zaffanella</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ca' Foscari University of Venice</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Parma</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>97</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The role and impact of machine learning systems and models are growing in every economic and social sector. The problem of guaranteeing the reliability and correctness of the underlying software therefore becomes increasingly relevant. In this article, we identify the elements that characterize these systems and that challenge the application of state-of-the-art verification techniques, and we highlight the advantages and limitations of a set of formal techniques that can be combined to achieve this goal. We advocate not only for a deeper adoption of formal methods in machine learning development and deployment, but also for a more systematic and holistic approach.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Verification</kwd>
        <kwd>Formal Methods</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The field of computer science has undergone a significant paradigm shift due to progress in Artificial
Intelligence (AI) and Machine Learning (ML). This evolution has led to the adoption of ML systems in
complex tasks such as natural language processing and image recognition. Their capacity to perform
well in tasks previously deemed impractical to solve has led to their adoption in various contexts,
including autonomous driving and healthcare, where failures could result in serious damage.</p>
      <p>The adoption of these methods carries a risk associated with their safety, particularly in safety-critical
domains. To address this, various verification frameworks have been proposed to provide a systematic
approach to verifying the correctness and reliability of ML models, ensuring they are safe and effective
in practical applications. The verification of these systems would not only increase the confidence
and trustworthiness regarding their adoption but would also lower the risk of failures. However, the
current state of the art in ML systems verification poses many challenges, which will be discussed in
the following sections of this paper.</p>
      <p>This paper aims to highlight the challenges of verifying ML systems and models, covering a wide
range of issues and providing research directions for both the ML and formal methods communities.
We advocate for a broader and more holistic application of verification techniques to offer formal
guarantees about properties related to various aspects of the models and systems, especially those
directly regarding human activities, decisions, and sensitive information. We believe that a greater
effort is required from both academia and industry to achieve higher trustworthiness and reliability of
the adopted models.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>In traditional software design, testing is a major component of the development pipeline and, although
necessary, it is not sufficient to guarantee the correctness of the product. Similarly, in ML, metrics
like accuracy, precision, and recall provide empirical measures of performance but do not offer formal
guarantees or prove safety. Therefore, especially in safety-critical contexts, it is necessary to formally
verify that ML models comply with properties of interest, such as safety [<xref ref-type="bibr" rid="ref1">1</xref>] (i.e., ensuring that predictions
do not violate specifications), robustness [2] (i.e., maintaining performance and correctness despite
variations, perturbations, or adversarial inputs), and fairness [3] (i.e., ensuring that predictions do not
depend on features that are considered sensitive, with obvious ethical implications).</p>
      <sec id="sec-2-1">
        <title>2.1. Verification</title>
        <p>In machine learning verification, the property to verify is usually composed of a set of
preconditions and a set of postconditions, which together represent a relation that should hold between the input
(preconditions) and the output (postconditions) of the model. Collectively, these preconditions and
postconditions are referred to as the specification. These properties usually transcend the architecture of
the model, which is only taken into consideration later, during the verification process, and simply describe
some characteristics that the model should (not) have.</p>
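        <p>For instance, a local-robustness specification pairs a precondition on the input region with a
postcondition on the model output. The following Python sketch makes the shape of such a specification
concrete; the names and signatures are illustrative and not taken from any specific framework:</p>
        <preformat>
# A minimal sketch of a local-robustness specification: a precondition on
# the input and a postcondition on the output. Illustrative names only.
import numpy as np

def precondition(x, x0, eps):
    # the input lies in the L-infinity ball of radius eps around a reference x0
    return np.all(np.abs(x - x0) &lt;= eps)

def postcondition(model, x, x0):
    # the predicted class must not change anywhere inside the ball
    return np.argmax(model(x)) == np.argmax(model(x0))
        </preformat>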
      <p>Some of the properties that are subject to verification are:
• Safety. Given a trained model, the resulting predictions are safe if they do not violate any
previously given specifications, often regarding safety-critical decisions. A case for the verification
of this property is reported in [<xref ref-type="bibr" rid="ref1">1</xref>], on a Deep Neural Network (DNN) trained for the Airborne
Collision Avoidance System (ACAS) [4] task, for which a verification algorithm is adopted to
prove some safety specifications regarding the model predictions.
• Robustness. Robustness refers to the network's ability to maintain its performance and
correctness in the face of variations, perturbations, or adversarial inputs. A robust neural network should
produce reliable and consistent predictions even when exposed to unexpected or challenging
conditions. This property is crucial for ensuring the trustworthiness and resilience of neural
networks in real-world applications and, in some scenarios, also offers interpretability
perspectives [5]. Informally, robustness guarantees that, in classification tasks, small perturbations
in the neighborhood of a given input should not change the output of the network into
a different class. Many different approaches, like [6] and [7], have been proposed, making
robustness one of the most pervasive properties in the verification area.
• Fairness. There are different notions of fairness. In group fairness, a model is fair if different
values of a given feature, considered sensitive, do not result in different predictions, meaning
that some sort of statistical parity exists for members of protected groups (i.e., the demographics
of the set of individuals receiving any classification are the same as the demographics of the
underlying population) [8]; a minimal statistical-parity check is sketched after this list. Other types of fairness, like individual fairness [8], can guarantee that
if two inputs (individuals) are similar, then the two outputs of the model must be similar too.
Both of these concepts are worth investigating, both because they may conflict and
present trade-offs, and because they can be analyzed from different
perspectives, such as politics, justice, and philosophy, as shown in [9], making fairness not only
an interesting property but also an ethical hot-spot.</p>
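      <p>As a concrete illustration of statistical parity, the following sketch compares positive-prediction
rates across the values of a sensitive feature; it is an empirical check, not a formal certificate, and all
names are illustrative placeholders:</p>
      <preformat>
# A minimal sketch of a group-fairness (statistical parity) check: compare
# positive-prediction rates across groups defined by a sensitive feature.
import numpy as np

def parity_gap(y_pred, sensitive):
    rates = [np.mean(y_pred[sensitive == g]) for g in np.unique(sensitive)]
    return max(rates) - min(rates)   # 0 means perfect statistical parity

y_pred = np.array([1, 0, 1, 1, 0, 1])   # binary predictions of the model
group = np.array([0, 0, 0, 1, 1, 1])    # values of the sensitive feature
print(parity_gap(y_pred, group))        # flag the model if above a tolerance
      </preformat>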
        <sec id="sec-2-1-1">
          <title>2.1.1. Methodologies</title>
          <p>The strategies for machine learning verification can be categorized into two main types: complete and
incomplete.</p>
      <p>Complete. Complete methods leverage techniques such as Satisfiability Modulo Theories (SMT)
[<xref ref-type="bibr" rid="ref1">1, 10, 11</xref>] and Mixed Integer Linear Programming (MILP) [12, 13]. In the former, the verification task is
defined as a Constraint Satisfaction Problem (CSP) in which both the network and the specification are
expressed as sets of constraints. The solver is then applied to the network constraints and the negation
of the specification, so that if the CSP is satisfiable, the property does not hold and the solution is
a counter-example that violates the property itself; otherwise, if the CSP is unsatisfiable, then there
is no possible solution that violates the property and, therefore, the property holds. In the latter, the
verification task is expressed as a MILP problem, usually solved via Branch-and-Bound (B&amp;B) strategies [7],
to prove the validity or absence of a given property, providing conclusive evidence when the property
holds and, in case of non-compliance, often providing a counter-example. While these methods excel at
delivering precise results, their drawback lies in their lack of scalability, particularly when dealing with
large systems. Nonetheless, they offer the advantage of avoiding false negatives.</p>
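      <p>To make the negated-specification encoding concrete, the following sketch uses the z3 SMT solver
on a deliberately tiny ReLU network; the weights, bounds, and property are invented for the example.
If the conjunction of the network constraints and the negated postcondition is unsatisfiable, the
property holds:</p>
      <preformat>
# A minimal sketch of SMT-based complete verification with the z3 solver:
# encode a tiny 2-2-1 ReLU network and the NEGATION of the property
# "x in [0,1]^2 implies y at most 4". UNSAT means the property holds.
from z3 import And, If, Real, Solver, unsat

x0, x1 = Real('x0'), Real('x1')
h0_in = 1.0 * x0 - 1.0 * x1       # hidden pre-activations (toy weights)
h1_in = 2.0 * x0 + 1.0 * x1
h0 = If(h0_in &gt; 0, h0_in, 0)      # ReLU encoded as an If-term
h1 = If(h1_in &gt; 0, h1_in, 0)
y = h0 + h1                       # output layer

s = Solver()
s.add(And(0 &lt;= x0, x0 &lt;= 1, 0 &lt;= x1, x1 &lt;= 1))  # precondition
s.add(y &gt; 4)                      # negated postcondition
if s.check() == unsat:
    print("property holds: no counterexample exists")
else:
    print("counterexample:", s.model())
      </preformat>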
      <p>Incomplete. Incomplete methods adopt techniques such as Linear Programming (LP) [14],
Semidefinite Programming (SDP) [15], or Abstract Interpretation (AbsInt) [16, 2, 17] to assess the validity of
desired properties within a given model. These approaches usually adopt abstractions that scale well to
large models and are sound (meaning that if the property is said to hold, it actually holds), but they can
raise false alarms, although they never miss actual violations.</p>
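      <p>A box (interval) abstraction conveys the flavor of these sound but incomplete analyses. The sketch
below, which is illustrative rather than taken from any particular tool, propagates an input box through
one affine layer followed by a ReLU: the resulting bounds are guaranteed to contain every concrete
output, so a property proved on the bounds holds for the real model, while loose bounds may cause
false alarms.</p>
      <preformat>
# A minimal sketch of interval bound propagation, an abstract-interpretation-
# style over-approximation: sound output bounds for y = ReLU(W x + b) when
# the input x ranges over the box [lo, hi].
import numpy as np

def affine_bounds(W, b, lo, hi):
    center, radius = (lo + hi) / 2, (hi - lo) / 2
    c = W @ center + b
    r = np.abs(W) @ radius           # worst-case spread of the box
    return c - r, c + r

def relu_bounds(lo, hi):
    return np.maximum(lo, 0), np.maximum(hi, 0)

W = np.array([[1.0, -1.0], [2.0, 1.0]])   # toy weights
b = np.zeros(2)
lo, hi = relu_bounds(*affine_bounds(W, b, np.zeros(2), np.ones(2)))
print(lo, hi)   # e.g. certify a property by checking it on these bounds
      </preformat>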
      <p>Concisely, as shown in Figure 1, a verification algorithm aims to provide a guarantee that a given
property holds or not, also allowing for the case where, with the adopted technique, the result is
unknown, for example because a timeout is reached. Usually, given a property to verify, the underlying
method is to check whether a violation of the property is possible considering the
specification and the model, thus taking a negative approach with respect to the positive property expressed
in the specification. Practically speaking, the property is verified if violating it via a counterexample
is infeasible; conversely, if a counterexample allowing the violation of the original property is achievable,
possibly after further analysis with more refined models, then the property does not hold.
For example, if we concretize this abstract concept into the verification of the robustness property,
the model is robust if there is no concrete input in the considered neighborhood that could result in a
different output; on the other hand, the model is not robust if there exists at least one adversarial example
for which the model output differs from the original one considered.</p>
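      <p>Schematically, the three possible outcomes can be rendered as follows, where certify stands for a
sound analysis and falsify for a counterexample search, both placeholders for the techniques discussed
above:</p>
      <preformat>
# A schematic sketch of the three-valued verdict of a verifier.
def verify(spec, model, certify, falsify):
    if certify(spec, model):     # a sound over-approximation proves the property
        return "holds"
    cex = falsify(spec, model)   # try to realize a concrete violation
    if cex is not None:
        return ("does not hold", cex)
    return "unknown"             # precision or time budget exhausted
      </preformat>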
      <p>To complicate an already complex topic, the introduction of Foundation Models has further
revolutionized the AI field thanks to their adaptability, enabling the use of pre-trained models that perform
well across a wide range of tasks and can then be fine-tuned for specific issues, without
developing and training a model from scratch for each different task. Continuous improvements have led
to larger, more capable, and more general models. However, their size and complexity make them increasingly
difficult to comprehend and explain without external techniques and tools.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Challenges</title>
      <p>The field of ML verification is filled with difficult challenges to overcome, both internal and external to
the verification task itself. These challenges are discussed in the following paragraphs, highlighting how
they affect the ML community and where research on this topic, in our opinion, should focus.</p>
      <p>Bias in the Data. ML models learn from data, making dataset creation and validation crucial.
Unfortunately, few verification frameworks can prove interesting properties about the data, which can
exhibit unwanted properties too. Data bias is a simple yet powerful example: biased datasets most
likely train biased models. Furthermore, proving fairness and re-training the model requires more
computational effort than training on an unbiased dataset in the first place. The problem lies in formally expressing
the properties of interest to prove on the datasets; moreover, datasets can be obtained from biased sources,
such as sensors or manual annotation, which can be subject to errors and inaccuracies too.</p>
      <p>Verified Models Are Not Real Models. A major issue with verifying ML models is that the
majority of research is conducted on simple architectures and tasks, which inadequately reflect the
complexity of real-world models. For example, benchmarks used in the Verification of Neural Networks
Competition (VNN-COMP) [18] target models with a few hundred to a few million parameters, whereas
newly developed Large Language Models (LLMs) can have hundreds of billions of parameters. Research
often focuses on feed-forward networks with piecewise-linear activation functions, disregarding RNNs,
LSTMs, and transformer-based models, for which few publications exist. There is, indeed, a need to
extend verification to more realistic systems and models, although this will probably result in an
exponential increase in the computational cost of verification.</p>
      <p>When analyzing the time necessary to complete verification tasks on ML models, it is evident that
these approaches struggle to scale, as even simple analyses can take anywhere from a few seconds to days
to complete. However, this should not discourage the adoption of verification steps. Although
computationally expensive, the verification process is a one-time step that does not need to be repeated
unless the model changes. To mitigate this cost, approaches using parallelization and GPU computing
have been adopted [19]. Additionally, incremental [20] or continuous verification could be beneficial
for models and systems that are regularly updated, allowing verification to become part of a continuously
integrated and deployed MLOps lifecycle.</p>
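      <p>The intuition behind incremental verification can be sketched as memoization over the analysis
pipeline; this is a simplification loosely inspired by the ideas in [20], with illustrative names: cache the
result of analyzing each layer, keyed by everything upstream, so that an update to one layer only
re-triggers analysis from that layer onward.</p>
      <preformat>
# A minimal sketch of incremental verification as memoization. analyze_layer
# and the fixed input region are assumptions of the sketch; layers are given
# as numpy weight matrices.
import hashlib

def incremental_analysis(layers, analyze_layer, cache, input_bounds):
    bounds, key = input_bounds, "input"
    for W in layers:
        # the key depends on every layer up to this point
        key = hashlib.sha256((key + W.tobytes().hex()).encode()).hexdigest()
        if key not in cache:       # unchanged prefix: reuse the cached result
            cache[key] = analyze_layer(W, bounds)
        bounds = cache[key]
    return bounds
      </preformat>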
      <p>Supervised vs. Unsupervised Models. Most formal verification is applied to supervised or
semi-supervised models. For unsupervised models, such as k-means [21], few formal verification approaches
exist [22] due to the difficulty of expressing specifications and properties of interest. The main issue
is that unsupervised models often lack a ground truth, making performance evaluation challenging.
For example, clustering task validation is either internal (linked to cluster shape/distance) or external
(linked to existing labels or domain knowledge), relying on techniques like clustering stability [23] or
density-based validation [24]. Despite this, unsupervised models are widely used for clustering and
association tasks, such as market and customer segmentation. Therefore, as stated in [25], "it is crucial
to develop approaches towards fair unsupervised learning". This is closely linked to the issue of biased
data discussed above, especially since the absence of labels can lead to undetected unfairness.
In this scenario, formal methods could be used for cluster distance guarantees or for proving robustness
under a given perturbation.</p>
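      <p>As an example of the kind of guarantee formal methods could provide here, the following sketch
certifies that the cluster assignment of a point cannot change under any perturbation of bounded norm,
via a simple triangle-inequality margin argument; the names are illustrative:</p>
      <preformat>
# A minimal sketch of a robustness certificate for a k-means assignment:
# by the triangle inequality, if the margin between the nearest and the
# second-nearest center exceeds 2*eps, no perturbation of L2 norm at most
# eps can change the assignment.
import numpy as np

def assignment_is_robust(x, centers, eps):
    d = np.linalg.norm(centers - x, axis=1)
    nearest, second = np.sort(d)[:2]
    return second - nearest &gt; 2 * eps   # sound, but may be conservative
      </preformat>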
      <p>Privacy. One of the major concerns surrounding ML and neural networks is the issue of privacy.
Since these systems collect and process vast amounts of data, there is a risk that they may inadvertently
compromise sensitive information or incentivize biases and discrimination. For example, a neural
network trained on a dataset containing personal information may infer sensitive details about individuals,
even if the data is anonymized [26]. The use of ML algorithms to make decisions about individuals
also raises questions about accountability and transparency. To address these concerns, it is essential to
develop methods for protecting privacy and ensuring ML systems are transparent and accountable [27].
This may involve implementing robust data protection policies, developing techniques for anonymous
data collection and processing, and establishing standards for the ethical use of ML and neural networks.
On the other hand, techniques like differential privacy [28] and secure multi-party computation [29]
can provide privacy guarantees, but they often introduce a privacy-utility trade-off. In this context,
the European Union's Artificial Intelligence Act aims to protect citizen privacy by limiting some AI
applications and enforcing obligations for high-risk systems. Similar approaches coming from regulators
should be encouraged and incentivized.</p>
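      <p>To give the flavor of such guarantees, the sketch below applies the Laplace mechanism of
differential privacy [28] to a counting query: a count has sensitivity 1, so noise with scale 1/epsilon
yields epsilon-differential privacy. The data and query are illustrative:</p>
      <preformat>
# A minimal sketch of the Laplace mechanism: an epsilon-differentially
# private release of a counting query (the sensitivity of a count is 1).
import numpy as np

def private_count(data, predicate, epsilon):
    true_count = sum(1 for x in data if predicate(x))
    # scale = sensitivity / epsilon
    noise = np.random.default_rng().laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [34, 41, 29, 52, 47]
print(private_count(ages, lambda a: a &gt; 40, epsilon=0.5))
      </preformat>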
      <p>The Issue with Generative Models and LLMs. Generative models have been widely adopted and
studied in recent times thanks to their ability to generate content: these models are extremely useful and
powerful, making it possible to perform tasks once considered infeasible, such as text-to-image or text-to-video generation.
Since the performed activities differ greatly from classical ML tasks, like classification and regression, it
is more difficult to identify, define, and formalize the properties to verify. Robustness properties can
be reinterpreted so that the aim is to obtain a similar output (intended as generated content and
not a label) for a given input neighborhood, and some approaches have already been implemented
[30]. However, new properties could emerge that focus on new issues: for example, verification of
copyright infringement could be implemented in order to ensure that the model has not been trained on
copyright-protected data. Furthermore, plagiarism verification could be adopted to certify the novelty
or, at least, the non-plagiarized nature of the generated content. It is also important to note the emerging
trend of jailbreaking LLMs, where adversarial prompts are used to bypass safety mechanisms and force
the model to generate restricted or harmful content. Introducing safeguards or defence mechanisms
against jailbreaking would allow for safer interactions.</p>
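      <p>A reinterpreted robustness check of this kind might compare generated outputs in an embedding
space, as in the following sketch; generate, embed, and perturb are placeholders, and the check is an
empirical test rather than a formal certificate:</p>
      <preformat>
# A minimal sketch of a robustness-style check for a generative model: the
# outputs for perturbed inputs should stay close to the output for the
# original input in some embedding space.
import numpy as np

def generation_robustness(generate, embed, perturb, x, tau, trials=100):
    ref = embed(generate(x))
    for _ in range(trials):
        out = embed(generate(perturb(x)))
        # cosine similarity between the two generated outputs
        sim = out @ ref / (np.linalg.norm(out) * np.linalg.norm(ref))
        if sim &lt; tau:
            return False   # a perturbation yields a dissimilar output
    return True            # empirical evidence only, not a proof
      </preformat>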
      <p>LLMs have gained attention for their ability to mimic language and reasoning, offering glimpses of
artificial general intelligence. However, their often closed-source nature and vast number of parameters
make them challenging to verify. While some tests have been conducted to uncover vulnerabilities,
these approaches typically involve empirical experiments on large amounts of data and queries, rather
than relying on provably safe approaches based on formal methods. The Open Worldwide Application
Security Project (OWASP) has identified the top 10 vulnerabilities for LLMs, highlighting the need for
robust verification techniques. The inherent complexity of LLMs, coupled with their lack of transparency,
emphasizes the need for further research. One major weakness affecting LLMs is hallucination, where
outputs may be incorrect, fake, or inappropriate. Researchers have attempted to mitigate this issue
through fine-tuning, knowledge graphs, memory augmentations, and formal methods [31]. However,
these approaches are not automated or widely applied in real-world applications. Breaking down
verification into steps, as in the Chain of Thought procedure [32], could bring benefits to the formal verification
step; similarly, if a model can be decomposed into independent sub-models, the whole verification could
be parallelized and sped up [3].</p>
      <p>Verifying Models, Not Systems. Few verification approaches focus on the entire ML pipeline,
particularly on the data processing step, which is a crucial part of ML development and should be
treated as such. Verification or validation systems capable of extracting properties of the training data
should be developed and applied as a first step of the ML pipeline. By gaining insights into the data, the
model itself could be made more reliable, and developers could address issues like data scarcity, imbalance,
or bias, potentially using data augmentation or cleaning techniques. Cyber-physical systems, often
used in safety-critical applications, consist of various components, including ML models, which can
pose security issues [33]. Therefore, the system should be considered as a whole, verifying not only
the individual components but also their interactions. However, this poses a challenge, as each component
may require a different kind of verification, with a different framework and specifications. Integrated
verification could thus become an expensive and complex task. For example, the introduction of the Model
Context Protocol (MCP), a protocol that allows AI applications to interact with external resources,
dynamic environments, and tools, demands careful consideration: not only because it consists of very
different components, but also because it introduces new potential attack points within the interactions
among these components.</p>
      <p>Nevertheless, a system can only be declared safe if all its components and interactions are safe. It
is essential to highlight that a system is only as strong as its weakest component, and no aspect should be
neglected.</p>
      <p>Regulations - Call for a Standard. Establishing an analogy with classic safety-critical software can
be inspiring when analyzing how that field has evolved to provide further safety guarantees to both
developers and users. Over the years, many standards and guidelines have been published, determining
requirements that products have to meet in order to be adopted (for example, ISO 25000 and ISO 62304,
issued by the International Organization for Standardization). Similarly, regulatory authorities and
consumer organizations should require guarantees from entities that develop and produce ML systems
before they are released for commercial use. If such guarantees became mandatory, ML systems would result
in safer, more secure, and more reliable applications. Additionally, guidelines for research projects in ML,
such as open-source model and dataset adoption, would allow for higher experimental reproducibility,
which is currently not always possible.</p>
      <p>Verifying During Training vs. After Training. Most research on verifying neural networks
(NNs) focuses on already trained models. The sequential execution and repetition of the training and
verification steps can be computationally very expensive, requiring a lot of time. To address this, many
approaches combine the two into guided training, such as adversarial training, which aims to
minimize the worst-case loss over every admissible input. Robust training relies on input perturbations,
improving model robustness through data augmentation. These approaches primarily target formal
robustness and often require white-box access to the models. Fairness, a data-related property, is also
targeted during training, although often empirically and not formally. However, as highlighted in [34],
training fair neural networks poses technical challenges, including the risk of overfitting and a false
sense of fairness during training. Moreover, [35] identified a trade-off between robustness and
accuracy, while [36] showed how "existing certified training approaches improve worst-case certified
accuracy at the cost of drastically reduced standard accuracy on clean inputs", limiting their adoption
in practical applications.</p>
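      <p>A single step of such guided training can be sketched as follows, with an FGSM-style inner
maximization; this is a simplification for illustration, written against PyTorch-style APIs, and the
model, optimizer, and batch are placeholders:</p>
      <preformat>
# A minimal sketch of one adversarial-training step: approximate the
# worst-case point in the L-infinity ball with one FGSM step, then train
# on that point.
import torch
import torch.nn.functional as F

def adversarial_step(model, optimizer, x, y, eps=0.03):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x)
    x_adv = (x + eps * grad.sign()).detach()     # first-order worst case
    optimizer.zero_grad()
    adv_loss = F.cross_entropy(model(x_adv), y)  # train on the adversarial point
    adv_loss.backward()
    optimizer.step()
    return adv_loss.item()
      </preformat>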
      <p>ML for ML Verification. The previous sections considered formal methods and empirical
experiments. However, ML models themselves can profitably be adapted to verify other ML models. Some approaches
leveraging this idea exist: for example, [37] refines and verifies global robustness using generative
models, and [38] uses generative models to discover adversarial examples. Similarly, [39] provides
guarantees over the observation space approximated by a generative model by training a Generative
Adversarial Network (GAN) to map states to plausible input images. While ML models are not yet
widely used in verification tasks due to their black-box behavior, integrating formal methods with
ML could be a promising direction. ML models could generate candidate invariants, properties to
verify, and counterexamples, which could then be fed into formal methods tools to check
that they satisfy the required properties, making the verification process more
automated and efficient.</p>
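      <p>The division of labor could look like the following sketch, where a learned sampler proposes
candidate counterexamples and an exact check confirms each of them before it is reported; all names
are illustrative placeholders:</p>
      <preformat>
# A minimal sketch of "ML proposes, formal methods check": a generative
# sampler proposes candidates near x0, and each candidate is checked
# exactly against the model before being reported.
import numpy as np

def propose_and_check(model, sampler, x0, eps, n=1000):
    y0 = model(x0)
    for _ in range(n):
        cand = np.clip(sampler(x0), x0 - eps, x0 + eps)  # stay inside the ball
        if model(cand) != y0:    # a confirmed, concrete counterexample
            return cand
    return None                  # nothing found (NOT a proof of robustness)
      </preformat>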
      <p>Teaching. Due to AI's wide range of applicability, which has led to a broader range of research perspectives
and many cross-field advancements, it is nowadays common to find ML-related courses outside
computer science degrees as well. We are confident that formal methods can play a crucial role in teaching,
instilling a verified-by-design culture at an early stage of learning and leading to a higher security and
safety awareness of these technologies. We think that this is even more important for training ML
specialists in critical fields, such as healthcare. Furthermore, we believe that enriching these courses
with formal methods, and the guarantees they provide, would present a significant advantage for both
educators and students, for example by making the behavior of models and systems more explainable
and easier to understand [40, 41]. In this sense, it is worth highlighting that, given the heterogeneous
audience to whom these courses are offered, a higher degree of explainability and stronger guarantees could
provide a better comprehension of errors and bugs, which are usually hard to spot and debug in ML
systems, especially if used as off-the-shelf tools.</p>
      <p>Emerging Issues - A Fast-Evolving World. An additional challenge in the verification of ML
systems arises from the rapid evolution of this field, marked by the constant introduction of new
applications and improved techniques. To match this pace, a joint community effort is essential. A
notable example is the initiative by the LVE Project (https://lve-project.org/), where an open-source
repository is maintained to track vulnerabilities concerning privacy, reliability, security, and trust. This
collaborative endeavor aims to enhance safety and awareness within the community by providing a
comprehensive resource for identifying and addressing potential bugs in LLMs.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Constraints, Limitations and Assumptions</title>
      <p>The quest for guaranteed ML models and systems comes with inevitable constraints, limitations, and
assumptions. As stated in [42], good ML software should be robust and provide testing routines to
verify code correctness. Our claim goes beyond testing routines, aiming for formal verification to obtain
provable guarantees. However, accessing source code for verification is a major concern, as many
recent models are closed-source. The verification step is also limited by a trade-off between accuracy
and computational complexity. A careful integration into software production processes is required,
considering both energy consumption and computational complexity.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The verification of ML models and systems should be a primary goal to reach in modern society.
Succeeding in this task would give us guarantees that only formal verification can provide, resulting
in the adoption of these models with higher reliability and trustworthiness, even in safety-critical
applications, and lowering the risks of vulnerabilities, attacks, and malfunctions. However, it is not
sufficient to tackle each of the challenges presented in the previous sections singularly; the
extent of the issue calls for a holistic approach. Indeed, looking at an isolated aspect could lead to
systems that are safe and reliable under that aspect but that can cause major damage in all the others:
for example, a model could be robust but heavily biased. As part of this holistic approach, the safety of
a model should be considered not only in terms of technical metrics and issues but also in the light of ethical,
jurisdictional, and legal considerations. In conclusion, providing users with clear and provable guarantees
about all aspects of the product offered should become the standard.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work has been supported by SERICS (PE00000014 - CUP H73C2200089001) and iNEST (ECS00000043
– CUP H43C22000540006) projects funded by PNRR NextGeneration EU.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT and Grammarly for grammar and
spelling checking and for paraphrasing and rewording. After using these tools/services, the authors reviewed and edited
the content as needed and take full responsibility for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] G. Katz, C. W. Barrett, D. L. Dill, K. Julian, M. J. Kochenderfer, Reluplex: An efficient SMT solver for verifying deep neural networks, in: R. Majumdar, V. Kuncak (Eds.), Computer Aided Verification - 29th International Conference, CAV 2017, Heidelberg, Germany, July 24-28, 2017, Proceedings, Part I, volume 10426 of LNCS, Springer, 2017, pp. 97–117. doi:10.1007/978-3-319-63387-9_5.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, M. T. Vechev, AI2: safety and robustness certification of neural networks with abstract interpretation, in: 2018 IEEE Symposium on Security and Privacy, SP 2018, Proceedings, 21-23 May 2018, San Francisco, California, USA, IEEE Computer Society, 2018, pp. 3–18. doi:10.1109/SP.2018.00058.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] C. Urban, M. Christakis, V. Wüstholz, F. Zhang, Perfectly parallel fairness certification of neural networks, Proc. ACM Program. Lang. 4 (2020) 185:1–185:30. doi:10.1145/3428253.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] K. D. Julian, J. Lopez, J. S. Brush, M. P. Owen, M. J. Kochenderfer, Policy compression for aircraft collision avoidance systems, in: 2016 IEEE/AIAA 35th Digital Avionics Systems Conference (DASC), IEEE, 2016, pp. 1–10. doi:10.1109/DASC.2016.7778091.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] D. Banerjee, A. Singh, G. Singh, Interpreting robustness proofs of deep neural networks, CoRR abs/2301.13845 (2023). doi:10.48550/ARXIV.2301.13845.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] H. Salman, G. Yang, H. Zhang, C. Hsieh, P. Zhang, A convex relaxation barrier to tight robustness verification of neural networks, in: H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. B. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, 2019, pp. 9832–9842. URL: https://proceedings.neurips.cc/paper/2019/hash/246a3c5544feb054f3ea718f61adfa16-Abstract.html.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] S. Wang, H. Zhang, K. Xu, X. Lin, S. Jana, C. Hsieh, J. Z. Kolter, Beta-CROWN: Efficient bound propagation with per-neuron split constraints for neural network robustness verification, in: M. Ranzato, A. Beygelzimer, Y. N. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, 2021, pp. 29909–29921. URL: https://proceedings.neurips.cc/paper/2021/hash/fac7fead96dafceaf80c1dafeae82a4-Abstract.html.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] C. Dwork, M. Hardt, T. Pitassi, O. Reingold, R. S. Zemel, Fairness through awareness, in: S. Goldwasser (Ed.), Innovations in Theoretical Computer Science 2012, Cambridge, MA, USA, January 8-10, 2012, ACM, 2012, pp. 214–226. doi:10.1145/2090236.2090255.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] R. Binns, On the apparent conflict between individual and group fairness, in: M. Hildebrandt, C. Castillo, L. E. Celis, S. Ruggieri, L. Taylor, G. Zanfir-Fortuna (Eds.), FAT* '20: Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, January 27-30, 2020, ACM, 2020, pp. 514–524. doi:10.1145/3351095.3372864.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] G. Amir, H. Wu, C. W. Barrett, G. Katz, An SMT-based approach for verifying binarized neural networks, in: J. F. Groote, K. G. Larsen (Eds.), Tools and Algorithms for the Construction and Analysis of Systems - 27th International Conference, TACAS 2021, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg, March 27 - April 1, 2021, Proceedings, Part II, volume 12652 of Lecture Notes in Computer Science, Springer, 2021, pp. 203–222. doi:10.1007/978-3-030-72013-1_11.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] L. Pulina, A. Tacchella, An abstraction-refinement approach to verification of artificial neural networks, in: T. Touili, B. Cook, P. B. Jackson (Eds.), Computer Aided Verification, 22nd Int. Conf., CAV 2010, Edinburgh, UK, July 15-19, 2010, Proc., volume 6174 of LNCS, Springer, 2010, pp. 243–257. doi:10.1007/978-3-642-14295-6_24.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] M. Fischetti, J. Jo, Deep neural networks and mixed integer linear optimization, Constraints An Int. J. 23 (2018) 296–309. doi:10.1007/S10601-018-9285-6.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] V. Tjeng, K. Y. Xiao, R. Tedrake, Evaluating robustness of neural networks with mixed integer programming, in: 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019, OpenReview.net, 2019. URL: https://openreview.net/forum?id=HyGIdiRqtm.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] W. Lin, Z. Yang, X. Chen, Q. Zhao, X. Li, Z. Liu, J. He, Robustness verification of classification deep neural networks via linear programming, in: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019, Computer Vision Foundation / IEEE, 2019, pp. 11418–11427. doi:10.1109/CVPR.2019.01168.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] M. Newton, A. Papachristodoulou, Exploiting sparsity for neural network verification, in: A. Jadbabaie, J. Lygeros, G. J. Pappas, P. A. Parrilo, B. Recht, C. J. Tomlin, M. N. Zeilinger (Eds.), Proceedings of the 3rd Annual Conference on Learning for Dynamics and Control, L4DC 2021, 7-8 June 2021, Virtual Event, Switzerland, volume 144 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 715–727. URL: http://proceedings.mlr.press/v144/newton21a.html.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] P. Cousot, R. Cousot, Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints, in: R. M. Graham, M. A. Harrison, R. Sethi (Eds.), Conference Record of the Fourth ACM Symposium on Principles of Programming Languages, Los Angeles, California, USA, January 1977, ACM, 1977, pp. 238–252. doi:10.1145/512950.512973.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] G. Singh, T. Gehr, M. Püschel, M. T. Vechev, An abstract domain for certifying neural networks, Proc. ACM Program. Lang. 3 (2019) 41:1–41:30. doi:10.1145/3290354.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] C. Brix, M. N. Müller, S. Bak, T. T. Johnson, C. Liu, First three years of the international verification of neural networks competition (VNN-COMP), Int. J. Softw. Tools Technol. Transf. 25 (2023) 329–339. doi:10.1007/S10009-023-00703-4.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] K. Xu, H. Zhang, S. Wang, Y. Wang, S. Jana, X. Lin, C. Hsieh, Fast and complete: Enabling complete neural network verification with rapid and massively parallel incomplete verifiers, in: 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021, OpenReview.net, 2021. URL: https://openreview.net/forum?id=nVZtXBI6LNn.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] S. Ugare, D. Banerjee, S. Misailovic, G. Singh, Incremental verification of neural networks, Proc. ACM Program. Lang. 7 (2023) 1920–1945. doi:10.1145/3591299.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] S. P. Lloyd, Least squares quantization in PCM, IEEE Trans. Inf. Theory 28 (1982) 129–136. doi:10.1109/TIT.1982.1056489.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] A. Maurer, D. A. Parletta, A. Paudice, M. Pontil, Robust unsupervised learning via L-statistic minimization, in: M. Meila, T. Zhang (Eds.), Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18-24 July 2021, Virtual Event, volume 139 of Proceedings of Machine Learning Research, PMLR, 2021, pp. 7524–7533. URL: http://proceedings.mlr.press/v139/maurer21a.html.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] U. von Luxburg, Clustering stability: An overview, Found. Trends Mach. Learn. 2 (2009) 235–274. doi:10.1561/2200000008.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] D. Moulavi, P. A. Jaskowiak, R. J. G. B. Campello, A. Zimek, J. Sander, Density-based clustering validation, in: M. J. Zaki, Z. Obradovic, P. Tan, A. Banerjee, C. Kamath, S. Parthasarathy (Eds.), Proceedings of the 2014 SIAM International Conference on Data Mining, Philadelphia, Pennsylvania, USA, April 24-26, 2014, SIAM, 2014, pp. 839–847. doi:10.1137/1.9781611973440.96.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] F. Buet-Golfouse, I. Utyagulov, Towards fair unsupervised learning, in: FAccT '22: 2022 ACM Conference on Fairness, Accountability, and Transparency, Seoul, Republic of Korea, June 21-24, 2022, ACM, 2022, pp. 1399–1409. doi:10.1145/3531146.3533197.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] M. Fredrikson, E. Lantz, S. Jha, S. M. Lin, D. Page, T. Ristenpart, Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing, in: K. Fu, J. Jung (Eds.), Proc. of the 23rd USENIX Security Symp., San Diego, CA, USA, August 20-22, 2014, USENIX Assoc., 2014, pp. 17–32. URL: https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/fredrikson_matthew.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] N. Balasubramaniam, M. Kauppinen, A. Rannisto, K. Hiekkanen, S. Kujala, Transparency and explainability of AI systems: From ethical guidelines to requirements, Inf. Softw. Technol. 159 (2023) 107197. doi:10.1016/J.INFSOF.2023.107197.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] C. Dwork, Differential privacy, in: M. Bugliesi, B. Preneel, V. Sassone, I. Wegener (Eds.), Automata, Languages and Programming, 33rd International Colloquium, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II, volume 4052 of Lecture Notes in Computer Science, Springer, 2006, pp. 1–12. doi:10.1007/11787006_1.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] O. Goldreich, Secure multi-party computation, Manuscript. Preliminary version 78 (1998) 1–108.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] M. Mirman, A. Hägele, P. Bielik, T. Gehr, M. T. Vechev, Robustness certification with generative models, in: S. N. Freund, E. Yahav (Eds.), PLDI '21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event, Canada, June 20-25, 2021, ACM, 2021, pp. 1141–1154. doi:10.1145/3453483.3454100.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>