<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Provable Fairness for Neural Network Models Using Formal Verification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgian Borca-Tasciuc</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xingzhi Guo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stanley Bak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven Skiena</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Stony Brook University</institution>
          ,
          <addr-line>NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Machine learning models, extensively deployed in critical decision-making roles, necessitate verification for potential gender or racial biases from training data. Traditional fairness strategies focus on curating training data and subsequent statistical evaluation of model fairness. We, however, propose techniques that ofer formal proof of fairness, leveraging recent advancements in neural network model verification. Our methods ofer robust guarantees and uniquely, they eliminate the need for explicit training or evaluation data, often proprietary, to analyze a trained model. Experimental results from the well-known ADULTS dataset demonstrate that appropriate training can decrease unfairness by an average of 65.4% with less than 1% cost in AUC score.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Fairness Metrics</kwd>
        <kwd>Provable Fairness</kwd>
        <kwd>Neural Network Verification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Machine learning models are increasingly utilized in critical decision-making tasks, and it is
vital to ensure they are devoid of gender or racial biases [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Traditional fairness strategies focus
on curating training data and post-hoc evaluation on testing data. However, these methods
often require proprietary data and do not provide concrete proof of fairness.
      </p>
      <p>We propose techniques to formally prove fairness in neural network models, without needing
explicit training or testing data [2, 3, 4, 5]. We explore fairness specifications and apply formal
methods for classification tasks with explicit input labels that encode sensitive properties such
as race, gender, or age. Fairness in this context implies that model  should behave similarly
across all values of these sensitive fields. Importantly, eliminating these inputs during training
does not ensure fairness, as sensitive properties can be inferred from other variables, such as
zip codes.</p>
      <p>Our approach is primarily applied to fully connected networks with ReLU activation
functions [6]. In these networks, binary decision models divide possible inputs into geometric
polytopes, and all points within a polytope share the same label. Fairness properties involve
demonstrating that these regions are similar for distinct labels or groups (Figure 1). We define
fairness in  using three provable properties: Volumetric symmetric diference ,
Probabilityweighted symmetric diference , and Net Preference, each depending on specific fairness criteria.
Our key contribution is:
• Provable Fairness Guarantees without Training or Evaluation Data – We define a class of
fairness guarantees for neural classification models that require only black box access to
the model. Our metrics either require no training data or knowledge of the class features’
distribution, or only require knowledge of the features’ probability distribution, which
can be estimated using demographic studies.</p>
      <p>A full overview of the contributions and a more in-depth analysis of the results is available
in the full paper.[7]</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Formal verification methods for neural networks : Formal verification methods conduct
set-based analysis of networks, eschewing the execution of individual input samples [8, 2, 9].
Given a neural network  : R → R, an input set  ⊆ R, and an unsafe set of outputs
 ⊆ R, the open-loop NN verification problem aims to ascertain that for all inputs  ∈ ,
the output  () does not belong to the unsafe set  . The problem is typically resolved by
computing the range of the neural network function for an input domain  and checking if the
range intersects the unsafe states  [10, 11]. In this paper, we repurpose this range computation
approach to examine how probability distributions propagate through the networks.</p>
      <p>Two core operations in our work are: (1) computing the range of the neural network as
a union of polytopes, and (2) computing the volume of the output polytopes. Unlike the
conventional usage of formal verification methods for safety and motion planning, we explore
their application for neural network fairness measurement.</p>
      <p>Limited research exists on provable fairness. Some frameworks focus on proving dependency
fairness [12, 13], aiming to ensure outputs are not influenced by certain input features through
forward and backward static analysis and input feature partitioning. Another approach [14]
emphasizes individual fairness, where similar individuals receive similar treatments. Our
provable fairness metrics, however, are defined over geometric properties of the network outputs,
such as the symmetric diference between the ranges with diferent values of sensitive inputs.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Fairness Verification via Formal Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Acceptance Region</title>
        <p>To evaluate model fairness, we scrutinize the acceptance regions, denoted as  (). Given a
neural network assigning the label  (⃗) for input value ⃗,  () represents the input space
where ⃗ ∈  ∧  (⃗) = ℓ. The defining label ℓ is user-determined based on specific neural
network properties of interest.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Fairness Metrics</title>
        <p>Our fairness evaluation relies on metrics embodying legal notions of disparate treatment and
impact. We develop and use three metrics: WSD (weighted symmetric diference), VSD
(volumetric symmetric diference), and NP (net preference). WSD precisely captures the proportion
of individuals afected by class-specific rules in an unfair model. The VSD is an approximation
of the WSD that can be computed without access to the distribution of class features. The NP
solely captures disparities arising from class-agnostic rules. WSD and VSD measure disparate
treatment, and NP measures disparate impact. Details of the metrics are explained in the full
paper.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Verification Approach</title>
        <p>Our verification technique extends set-based neural network execution to encompass probability
distributions. We propagate a set of potential inputs through the network to determine the
possible output range. We use the linear star set representation [15, 16] for state set propagation.
Post-analysis, the input set is divided into polytopes, each mapping to outputs with an identical
label.</p>
        <p>This representation enables a straightforward fairness verification process for neural
networks, assuming the direct encoding of 1 and 2 into the input features by setting one or
more of the features to the appropriate values. The procedure includes:
1. Using neural-network reachability analysis to determine acceptance regions for each class:  (1)
and  (2), then calculating  (1) ∩  (2).</p>
        <p>2. Performing the appropiate integral over the acceptance region depending on the desired metric.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>We evaluate various models’ fairness, trained on the ADULTS dataset [17], using our proposed
metrics (Section 3.2). Our method, distinct from those in Section 2, does not necessitate extensive
data sampling or testing data. The ADULTS dataset, comprising both categorical and continuous
variables, includes at least one protected feature (e.g., Race or Sex). We consider this a potential
source of unfair bias during model training, and use three training strategies (race-permute,
augment data, remove data) designed to mitigate this bias. Training strategy details are described
in the full paper.</p>
      <p>We note the following results. Baseline models exhibit unfairness without fairness training. A
significant WSD for all the baseline models indicate that many individuals are labeled diferently
due to class-specific rules in the model. The VSD serves as a proxy for WSD when input feature
probability distributions are unavailable. In 11/16 trials, the model with the lowest WSD also
had the lowest VSD. Finally, it is possible to significantly improve fairness while maintaining
or even improving - accuracy.</p>
      <p>0.14
0.12
0.10
SD0.08</p>
      <p>WSD vs AUC
Remove Data
Augment Data
Base (Race-Permute)</p>
      <p>Base (Original)
W0.06
0.04
0.02
0.000.810 0.815 0.820 0.825 0.830 0.835 0.840 0.845</p>
      <p>AUC</p>
      <p>VSD vs AUC</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This paper presents a novel technique for formal fairness analysis of neural network models,
circumventing the need for access to the model’s training data. This broadens the scope
for model fairness evaluation. Our evaluation of various models highlights the substantial
unfairness in models trained without intervention. We establish that a probability-free approach
can efectively substitute when input probability distribution is inaccessible. Due to the formal
methods applied, our method provides strong guarantees about the fairness of the model and
its potential impact.
[2] C. Liu, T. Arnon, C. Lazarus, C. Strong, C. Barrett, M. J. Kochenderfer, Algorithms for
verifying deep neural networks, arXiv preprint arXiv:1903.06758 (2019).
[3] H.-D. Tran, W. Xiang, T. T. Johnson, Verification approaches for learning-enabled
autonomous cyber-physical systems, IEEE Design &amp; Test (2020).
[4] G. Katz, C. Barrett, D. L. Dill, K. Julian, M. J. Kochenderfer, Reluplex: An eficient smt
solver for verifying deep neural networks, in: International Conference on Computer
Aided Verification, Springer, 2017.
[5] T. Gehr, M. Mirman, D. Drachsler-Cohen, P. Tsankov, S. Chaudhuri, M. Vechev, Ai2: Safety
and robustness certification of neural networks with abstract interpretation, in: 2018 IEEE
Symposium on Security and Privacy (SP), IEEE, 2018, pp. 3–18.
[6] S. Bak, C. Liu, T. T. Johnson, The second international verification of neural networks
competition (VNN-COMP 2021): Summary and results, CoRR abs/2109.00498 (2021). URL:
https://arxiv.org/abs/2109.00498. arXiv:2109.00498.
[7] G. Borca-Tasciuc, X. Guo, S. Bak, S. Skiena, Provable fairness for neural network models
using formal verification, 2022. arXiv:2212.08578.
[8] W. Xiang, P. Musau, A. A. Wild, D. M. Lopez, N. Hamilton, X. Yang, J. Rosenfeld, T. T.</p>
      <p>Johnson, Verification for machine learning, autonomy, and neural networks survey, arXiv
preprint arXiv:1810.01989 (2018).
[9] A. Albarghouthi, Introduction to Neural Network Verification, verifieddeeplearning.com,
2021. arXiv:2109.10317, http://verifieddeeplearning.com.
[10] S. Bak, H.-D. Tran, K. Hobbs, T. T. Johnson, Improved geometric path enumeration for
verifying ReLU neural networks, in: 32nd International Conference on Computer Aided
Verification, Springer, 2020.
[11] J. A. Vincent, M. Schwager, Reachable polyhedral marching (rpm): A safety verification
algorithm for robotic systems with deep neural network components, in: 2021 IEEE
International Conference on Robotics and Automation (ICRA), IEEE, 2021, pp. 9029–9035.
[12] C. Urban, M. Christakis, V. Wüstholz, F. Zhang, Perfectly parallel fairness certification of
neural networks, Proceedings of the ACM on Programming Languages 4 (2020) 1–30.
[13] S. Galhotra, Y. Brun, A. Meliou, Fairness testing: testing software for discrimination, in:
Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, 2017,
pp. 498–510.
[14] A. Ruoss, M. Balunović, M. Fischer, M. Vechev, Learning certified individually fair
representations, arXiv preprint arXiv:2002.10312 (2020).
[15] P. S. Duggirala, M. Viswanathan, Parsimonious, simulation based verification of linear
systems, in: International Conference on Computer Aided Verification, Springer, 2016, pp.
477–494.
[16] H.-D. Tran, S. Bak, W. Xiang, T. T. Johnson, Verification of deep convolutional neural
networks using imagestars, in: Proceedings of the 32nd International Conference on
Computer Aided Verification, Springer, 2020.
[17] D. Dua, C. Graf, UCI machine learning repository, 2017. URL: http://archive.ics.uci.edu/ml.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Angwin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Larson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mattu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kirchner</surname>
          </string-name>
          ,
          <article-title>Machine bias</article-title>
          ,
          <source>in: Ethics of Data and Analytics</source>
          , Auerbach Publications,
          <year>2016</year>
          , pp.
          <fpage>254</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>