<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>European Workshop on Algorithmic Fairness</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Formally Verified Algorithmic Fairness using Information-Flow Tools (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Samuel Teuber</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernhard Beckert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Karlsruhe Institute of Technology (KIT)</institution>
          ,
          <addr-line>Karlsruhe</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>0</volume>
      <fpage>7</fpage>
      <lpage>09</lpage>
      <abstract>
        <p>This work presents results on the use of Information-Flow tools for the formal verification of algorithmic fairness properties. The problem of enforcing secure information-flow was originally studied in the context of information security: If secret information may “flow” through an algorithm in such a way that it can influence the program's output, we consider that to be insecure information-flow as attackers could potentially observe (parts of) the secret. Due to its widespread use, there exist numerous tools for analyzing secure information-flow properties. Recent work showed that there exists a strong correspondence between secure information-flow and algorithmic fairness: If protected group attributes are treated as secret program inputs, then secure information-flow means that these “secret” attributes cannot influence the result of a program. We demonstrate that off-the-shelf tools for information-flow can be used to formally analyze algorithmic fairness properties including established notions such as (conditional) demographic parity as well as a new quantitative notion named fairness spread.</p>
      </abstract>
      <kwd-group>
        <kwd>Program Analysis</kwd>
        <kwd>Formal Methods</kwd>
        <kwd>Demographic Parity</kwd>
        <kwd>Quantitative Analysis</kwd>
        <kwd>Qualitative Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Information-Flow is a well-studied concept originally introduced for software security analysis
(see, e.g., [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4, 5, 6, 7, 8</xref>
        ]). Consider a program p that checks whether a user entered the
right password. p’s inputs are the correct password (high security status) as well as the
password entered by the user (low security status). A “bad” program that, for example, returns the full secret
password to the user would be said to contain insecure information-flow. In general, we say that
p contains insecure information-flow if there exist two high security status values h, h′ and a
low security value l such that p returns different outputs for the inputs (h, l) and (h′, l). We
build upon previous work which observed similarities between the ideas in algorithmic fairness
and information-flow analysis [9] and show that there is a strong correspondence between
definitions in qualitative as well as quantitative information-flow and algorithmic fairness: By
treating protected attributes such as race, gender, or age as “secret” inputs, we can analyze their
influence on the program’s outcome. We demonstrate that many off-the-shelf qualitative and
quantitative information-flow tools can equally be used to verify algorithmic fairness properties
such as demographic parity or a newly introduced metric which we call Fairness Spread.
      </p>
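      <p>To make this concrete, the following Python sketch (purely illustrative; all names and strings are invented) contrasts a password checker with a “bad” variant that exhibits insecure information-flow:</p>

```python
# Illustrative toy example of (in)secure information-flow: the stored
# secret is the high-security input h, the user's guess is the
# low-security input l. All names and values here are invented.

def check_secure(secret: str, guess: str) -> str:
    # Output depends on the secret only through the equality test.
    return "ok" if guess == secret else "denied"

def check_insecure(secret: str, guess: str) -> str:
    # "Bad" program: the full secret can flow to the output.
    return "ok" if guess == secret else f"denied, hint: {secret}"

# Insecure-flow witness: two high values h1, h2 and one low value l
# for which the outputs differ.
h1, h2, low = "s3cret", "0ther", "wrong"
print(check_secure(h1, low) == check_secure(h2, low))      # True: same output
print(check_insecure(h1, low) == check_insecure(h2, low))  # False: secret leaks
```

      <p>Note that under the strict definition even the legitimate checker leaks one bit of information (whether the guess matched); the “bad” variant leaks the entire secret.</p>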
    </sec>
    <sec id="sec-2">
      <title>2. Information-Flow and Demographic Parity</title>
      <sec id="sec-2-1">
        <title>Qualitative Information-Flow.</title>
        <p>We demonstrate that information-flow provides an
interesting framework for the analysis of disparate treatment. To this end, we consider decision-making
programs p which process a protected group attribute g ∈ G and an independently distributed
unprotected variable x ∈ X which can be comprised of a single value or a tuple of multiple
values. The program has an outcome o ∈ O. By assigning g to have high security status and
x to have low security status, we can use existing tools to check whether a given program
satisfies unconditional noninterference:
Definition 1 (Unconditional Noninterference (see e.g. [7])). A program p satisfies unconditional
noninterference if for all g1, g2 ∈ G and for all x ∈ X it holds that p(g1, x) = p(g2, x).</p>
        <p>It can be shown that if p satisfies unconditional noninterference with respect to g and x, it
also satisfies the fairness notion of demographic parity:
Lemma 2 (Unconditional Noninterference implies Demographic Parity). Let p be a program and
let G, X be distributed arbitrarily, but independently. If p satisfies unconditional noninterference,
then the outcome of p satisfies demographic parity, i.e., for all o ∈ O and g1, g2 ∈ G it holds that
Pr[p(G, X) = o | G = g1] = Pr[p(G, X) = o | G = g2].</p>
        <p>Note that the converse does not hold, which makes unconditional noninterference a strictly
stronger property. More refined flavors of information-flow which allow some level of
information leakage lead to other kinds of, more restricted, fairness guarantees. Information-flow
properties can be verified using off-the-shelf tools such as the deductive program verification
tool KeY [7] or the static analysis tool Joana [10, 11].</p>
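        <p>For finite domains, Definition 1 and Lemma 2 can be checked by exhaustive enumeration. The following Python sketch (toy domains and program invented for illustration) does exactly that:</p>

```python
# Brute-force check of Definition 1 over finite domains, together with an
# exact check of demographic parity (Lemma 2), assuming uniform and
# independent G and X. The domains and the program are invented toys.
from fractions import Fraction
from itertools import product

G = [0, 1]            # protected group attribute values
X = list(range(10))   # unprotected attribute values

def p(g, x):
    # This program ignores the protected attribute entirely.
    return 1 if x >= 5 else 0

def unconditional_noninterference(p, G, X):
    return all(p(g1, x) == p(g2, x) for g1, g2 in product(G, G) for x in X)

def demographic_parity(p, G, X):
    # Exact rational arithmetic avoids floating-point comparison issues.
    def pr(o, g):
        hits = sum(1 for x in X if p(g, x) == o)
        return Fraction(hits, len(X))
    outcomes = {p(g, x) for g in G for x in X}
    return all(pr(o, g1) == pr(o, g2)
               for o in outcomes for g1 in G for g2 in G)

print(unconditional_noninterference(p, G, X))  # True
print(demographic_parity(p, G, X))             # True, as implied by Lemma 2
```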
      </sec>
      <sec id="sec-2-2">
        <title>Quantitative Information-Flow.</title>
        <p>
          Quantitative Information-Flow (see e.g. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]) transforms
the qualitative security analysis of software into a metric for a software’s security. In this
context, one important notion is the idea of Conditional Vulnerability:
Definition 3 (Conditional Vulnerability). For a program p and independent random variables
G, X, we define the Conditional Vulnerability V of G w.r.t. p(G, X) and X as follows:
V(G | p(G, X), X) = ∑_{o∈O} ∑_{x∈X} max_{g∈G} ( Pr[G = g | p(G, X) = o, X = x] · Pr[p(G, X) = o, X = x] ).
        </p>
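        <p>For small finite domains and uniform, independent G and X, Conditional Vulnerability can be computed exactly by enumeration; each summand Pr[G = g | p = o, X = x] · Pr[p = o, X = x] equals the joint probability Pr[G = g, p(G, X) = o, X = x]. The following Python sketch (toy program and domains invented for illustration) uses this joint-probability form:</p>

```python
# Exact computation of Conditional Vulnerability by enumeration, assuming
# uniform and independent G and X over small, invented toy domains.
from fractions import Fraction
from itertools import product

G = [0, 1]
X = [0, 1, 2, 3]

def p(g, x):
    # Toy program that leaks some information about g.
    return 1 if x + g >= 3 else 0

def conditional_vulnerability(p, G, X):
    pg = Fraction(1, len(G))   # Pr[G = g], uniform
    px = Fraction(1, len(X))   # Pr[X = x], uniform
    outcomes = {p(g, x) for g, x in product(G, X)}
    total = Fraction(0)
    for o, x in product(outcomes, X):
        # Pr[G = g | p = o, X = x] * Pr[p = o, X = x] equals the joint
        # probability Pr[G = g, p(G, X) = o, X = x], maximised over g.
        joints = [pg * px if p(g, x) == o else Fraction(0) for g in G]
        total += max(joints)
    return total

print(conditional_vulnerability(p, G, X))  # 5/8
```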
        <p>For binary classifiers (i.e., for |O| = 2), our work shows that Conditional Vulnerability can
equally be used in the fairness context to compute a metric which we call Fairness Spread. For
programs p with a binary outcome and G, X as before, the metric of Fairness Spread quantifies
the weighted disparity in treatment between the most advantaged and most disadvantaged
groups for each possible assignment of unprotected attributes, i.e.:
∑_{x∈X} Pr[X = x] · max_{g1,g2∈G} ( Pr[p(G, X) = 1 | G = g1, X = x] − Pr[p(G, X) = 1 | G = g2, X = x] ).
This metric satisfies three desirable requirements for a metric: 1. The metric returns its minimum
(optimal) value if the program satisfies (un)conditional demographic parity. 2. A less fair program
has a higher value of the metric. 3. It can be measured using off-the-shelf tools (e.g. [12]).</p>
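        <p>The following Python sketch (toy program and domains invented for illustration) computes Fairness Spread exactly for a deterministic binary classifier under uniform, independent G and X; for a deterministic p each conditional probability is 0 or 1, so the maximum over pairs g1, g2 reduces to the largest minus the smallest group-wise rate:</p>

```python
# Exact Fairness Spread by enumeration for a deterministic binary
# classifier, assuming uniform and independent G and X (invented toy
# program, not the Figure 1b algorithm).
from fractions import Fraction

G = [0, 1]
X = list(range(4))

def p(g, x):
    return 1 if x + 2 * g >= 4 else 0

def fairness_spread(p, G, X):
    px = Fraction(1, len(X))   # Pr[X = x], uniform
    total = Fraction(0)
    for x in X:
        # Pr[p(G, X) = 1 | G = g, X = x] is 0 or 1 for deterministic p;
        # the maximum over pairs g1, g2 is max rate minus min rate.
        rates = [Fraction(p(g, x)) for g in G]
        total += px * (max(rates) - min(rates))
    return total

print(fairness_spread(p, G, X))  # 1/2
```

        <p>A program satisfying unconditional noninterference yields a spread of 0, matching requirement 1 above.</p>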
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Exemplary Applications</title>
      <p>
        As an exemplary application for information-flow analysis, consider the credit allocation
algorithm in Figure 1b which contains a threshold parameter T: Assuming a uniform, independent
distribution of gender and amount, this program satisfies demographic parity for the
parameter T set to T = 5, as then both groups are equally likely to obtain a credit. However, the
program could still be considered unfair as one group only gets small credits while the other
group only gets large credits. Indeed, for T ∈ [1, 9] this program does not satisfy unconditional
noninterference, and we can furthermore compute its Fairness Spread for differing values of T
using counterSharp [12] (Figure 1c). This allows us to compare the fairness of different variants
of the algorithm. For T = 0 and T = 10, the program does satisfy unconditional noninterference
and, thus, demographic parity as well. This can be verified using KeY [7].
      </p>
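      <p>Figure 1b is not reproduced here; the following Python sketch is a hypothetical reconstruction that is merely consistent with the behaviour described above (parity at T = 5, small vs. large credits per group, no noninterference for T ∈ [1, 9], constant outcomes at T = 0 and T = 10):</p>

```python
# Hypothetical stand-in for the Figure 1b credit-allocation algorithm;
# the decision rule below is invented to match the described behaviour.
from fractions import Fraction

AMOUNTS = list(range(1, 11))   # requested credit amount, uniform on 1..10

def credit(gender, amount, T):
    if gender == 0:
        return 1 if T >= amount else 0    # group 0: only small credits
    return 1 if amount + T > 10 else 0    # group 1: only large credits

def noninterference(T):
    return all(credit(0, x, T) == credit(1, x, T) for x in AMOUNTS)

def approval_rate(gender, T):
    hits = sum(credit(gender, x, T) for x in AMOUNTS)
    return Fraction(hits, len(AMOUNTS))

# Demographic parity at T = 5, yet the groups receive different credits:
print(approval_rate(0, 5) == approval_rate(1, 5))   # True
print(noninterference(5))                           # False
print(noninterference(0), noninterference(10))      # True True
```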
      <p>One might argue that the program in Figure 1b is an overly simplified version of realistic
decision-making procedures, raising the question of to what degree this analysis technique
scales to larger decision-making programs. To this end, it is worth mentioning that the
qualitative information-flow tool Joana [10, 11] scales to systems with up to 100k lines
of Java code, and the quantitative counterSharp tool has been tested on systems with up to
1000 lines of C code [12]. This is possible due to the use of the approximate model counter
ApproxMC [13, 14], which allows the efficient counting of inputs satisfying given constraints
even for very large input domains.</p>
      <p>As a concrete example, consider the program in Figure 1a: While the previous example
showed an obvious case of disparate treatment, this example is much more subtle: Initially,
the program does make use of the variables gender and race. However, the computation of
the final score is mostly unintelligible, which leaves the influence of gender and
race unclear. Figure 1a is admittedly a fictional example; however, unintelligible and possibly
undocumented code is hardly an uncommon phenomenon in practice. For example, it may be
the case that the algorithm was constructed by an engineer who has since left the company without
documenting the reasoning behind the computation or the choice of the magic constant 360.
We can learn multiple lessons from this example: First, analyzing whether or not a certain
variable has influence on the outcome of a computation is not a trivial problem. In fact, assuming
a value range of 1 to 5 for the inputs college_rank and experience, the function hire
satisfies unconditional noninterference and could thus be considered a fair algorithm under
the definition of demographic parity. While a human will have a hard time seeing this directly
from the program code, interactive theorem proving tools can once again be used to formally
prove this property in KeY. Secondly, while tools for the information-flow-based analysis of
machine-learning systems are currently not available, we can allow for the use of values learned
by machine learning systems, e.g., the score threshold T, which is a constant within the program
hire (line 20). This allows us, for example, to prove a program fair for all possible values of
a machine-learned threshold T and leaves room for a machine learning algorithm to learn
the optimal threshold value. Finally, our approach allows the analysis of both single group
attributes as well as mixtures of group attributes, e.g., the disparate treatment of individuals
with gender==0 &amp;&amp; race == 0.</p>
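      <p>Figure 1a is likewise not reproduced here; the following Python sketch is a hypothetical stand-in for such an unintelligible score computation (only the constant 360 and the input ranges are taken from the text, everything else is invented), together with a brute-force check that gender and race never influence the outcome:</p>

```python
# Hypothetical reconstruction of an unintelligible hiring score that
# syntactically uses gender and race yet satisfies unconditional
# noninterference; the formula is invented for illustration.
from itertools import product

def hire(college_rank, experience, gender, race, T):
    # 360 * (college_rank + experience) is an exact multiple of 360, and
    # gender * race adds at most 1, far below 360, so the integer
    # quotient never depends on gender or race.
    score = (360 * (college_rank + experience) + gender * race) // 360
    return 1 if score >= T else 0

RANKS = range(1, 6)       # college_rank in 1..5
EXPERIENCE = range(1, 6)  # experience in 1..5
GROUPS = [0, 1]

def noninterference(T):
    return all(
        hire(cr, ex, g1, r1, T) == hire(cr, ex, g2, r2, T)
        for cr, ex in product(RANKS, EXPERIENCE)
        for g1, r1, g2, r2 in product(GROUPS, repeat=4)
    )

# Brute force over all inputs and a range of thresholds:
print(all(noninterference(T) for T in range(0, 12)))   # True
```

      <p>A brute-force check like this only works for small input domains; for realistic domains, deductive tools such as KeY prove the same property symbolically.</p>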
    </sec>
    <sec id="sec-4">
      <title>4. Future Work</title>
      <p>As a next step we plan to analyze larger scale systems to prove or disprove fairness properties of
such systems. Our approach currently has two limitations: First, we assume the independence of
the protected attribute G and the unprotected attribute X. To this end, causal models or Bayesian networks may provide
a methodology to model important dependencies between G and X within the analyzed code.
Moreover, we can currently only analyze “classical” software. Therefore, we plan to develop
techniques for analyzing information-flow properties on machine learning models.
</p>
    </sec>
    <sec id="sec-5">
      <title>References</title>
      <p>[5] B. Beckert, D. Bruns, V. Klebanov, C. Scheben, P. H. Schmitt, M. Ulbrich, Information flow in object-oriented software, in: G. Gupta, R. Peña (Eds.), Logic-Based Program Synthesis and Transformation, 23rd International Symposium, LOPSTR 2013, Madrid, Spain, September 18-19, 2013, Revised Selected Papers, volume 8901 of LNCS, Springer, Cham, 2013, pp. 19–37. doi:10.1007/978-3-319-14125-1_2.</p>
      <p>[6] V. Klebanov, Precise quantitative information flow analysis - a symbolic approach, Theor. Comput. Sci. 538 (2014) 124–139. doi:10.1016/j.tcs.2014.04.022.</p>
      <p>[7] W. Ahrendt, B. Beckert, R. Bubel, R. Hähnle, P. H. Schmitt, M. Ulbrich (Eds.), Deductive Software Verification - The KeY Book - From Theory to Practice, volume 10001 of LNCS, Springer, Cham, 2016. doi:10.1007/978-3-319-49812-6.</p>
      <p>[8] F. Biondi, M. A. Enescu, A. Heuser, A. Legay, K. S. Meel, J. Quilbeuf, Scalable approximation of quantitative information flow in programs, in: I. Dillig, J. Palsberg (Eds.), Verification, Model Checking, and Abstract Interpretation - 19th International Conference, VMCAI 2018, Los Angeles, CA, USA, January 7-9, 2018, Proceedings, volume 10747 of LNCS, Springer, Cham, 2018, pp. 71–93. doi:10.1007/978-3-319-73721-8_4.</p>
      <p>[9] B. Beckert, M. Kirsten, M. Schefczyk, Algorithmic Fairness and Secure Information Flow (Extended Abstract), in: C. Heitz, C. Hertweck, E. Viganò, M. Loi (Eds.), European Workshop on Algorithmic Fairness (EWAF ’22), Lightning round track, 2022. URL: https://sites.google.com/view/ewaf22/accepted-papers.</p>
      <p>[10] G. Snelting, D. Giffhorn, J. Graf, C. Hammer, M. Hecker, M. Mohr, D. Wasserrab, Checking probabilistic noninterference using Joana, it - Information Technology 56 (2014) 280–287. doi:10.1515/itit-2014-1051.</p>
      <p>[11] J. Graf, M. Hecker, M. Mohr, Using Joana for information flow control in Java programs - a practical guide, in: Proceedings of the 6th Working Conference on Programming Languages (ATPS’13), Lecture Notes in Informatics (LNI) 215, Springer Berlin / Heidelberg, 2013, pp. 123–138.</p>
      <p>[12] S. Teuber, A. Weigl, Quantifying software reliability via model-counting, in: A. Abate, A. Marin (Eds.), Quantitative Evaluation of Systems - 18th International Conference, QEST 2021, Paris, France, August 23-27, 2021, Proceedings, volume 12846 of LNCS, Springer, Cham, 2021, pp. 59–79. doi:10.1007/978-3-030-85172-9_4.</p>
      <p>[13] S. Chakraborty, K. S. Meel, M. Y. Vardi, Algorithmic improvements in approximate counting for probabilistic inference: From linear to logarithmic SAT calls, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9-15 July 2016, 2016, pp. 3569–3576.</p>
      <p>[14] M. Soos, K. S. Meel, BIRD: engineering an efficient CNF-XOR SAT solver and its applications to approximate model counting, in: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, 2019, pp. 1592–1599. doi:10.1609/aaai.v33i01.33011592.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] B. W. Lampson, A note on the confinement problem, Commun. ACM 16 (1973) 613–615. doi:10.1145/362375.362389.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] D. E. R. Denning, Secure information flow in computer systems, Ph.D. thesis, Purdue University, 1975.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Á. Darvas, R. Hähnle, D. Sands, A theorem proving approach to analysis of secure information flow, in: D. Hutter, M. Ullmann (Eds.), Security in Pervasive Computing, Second International Conference, SPC 2005, Boppard, Germany, April 6-8, 2005, Proceedings, volume 3450 of LNCS, Springer, Cham, 2005, pp. 193–209. doi:10.1007/978-3-540-32004-3_20.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] G. Smith, On the foundations of quantitative information flow, in: L. de Alfaro (Ed.), Foundations of Software Science and Computational Structures, 12th International Conference, FOSSACS 2009, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009, York, UK, March 22-29, 2009, Proceedings, volume 5504 of LNCS, Springer, Cham, 2009, pp. 288–302. doi:10.1007/978-3-642-00596-1_21.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>