<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Proof-checking bias in labeling methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe Primiero</string-name>
          <email>giuseppe.primiero@unimi.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio D'Asaro</string-name>
          <email>fabioaurelio.dasaro@univr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ethos Group, Department of Human Sciences, University of Verona</institution>
          ,
          <addr-line>Via S. Francesco, 22, Verona, 37129</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Logic, Uncertainty, Computation and Information Group, Department of Philosophy, University of Milan</institution>
          ,
          <addr-line>Via Festa del Perdono, 7, Milano, 20122</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
        </aff>
      </contrib-group>
      <abstract>
        <p>We introduce a typed natural deduction system designed to formally verify the presence of bias in automatic labeling methods. The system relies on a "data-as-terms" and "labels-as-types" interpretation of formulae, with derivability contexts encoding probability distributions on training data. Bias is understood as the divergence that the expected probabilistic labeling by a classifier trained on opaque data displays from the fairness constraints set by a transparent dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>Bias</kwd>
        <kwd>Classifiers</kwd>
        <kwd>Formal Verification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The formal verification of AI systems is a growing field of research, and its applications are
essential to the deployment of safe systems in several domains with high societal impact.
This theme is attracting increasing attention from the community at large, not just from
formal verification specialists; see e.g. the recent [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Essential to this task is to reason about
probabilistic processes, in order to accommodate the uncertainty and non-deterministic behavior
characteristic of AI systems, and to model desirable forms of accuracy and trustworthiness.
      </p>
      <p>
        This objective can be formulated in terms of formal methods to prove trustworthy properties
of interest, see e.g. [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ]; or with linear temporal logic properties defined over Markov
decision processes, e.g. with reinforcement learning methods [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or with imprecise probabilities
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. A different approach consists in exploiting a proof-theoretic semantics and the tradition
of lambda-calculi and the Curry-Howard isomorphism to interpret computational processes
as proof terms and output values as their types. Under the general heading of proof-checking
techniques, we can identify untyped λ-calculi [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8, 9</xref>
        ], probabilistic event λ-calculi [10], calculi
for Bayesian learning [11], and calculi with types or natural deduction systems [12, 13, 14]. In
this latter tradition, the calculus TPTND [15, 16] offers a novel interpretation of a probabilistic
typed natural deduction system specifically designed to formally check the post-hoc validity of
a computational process executed under a possibly opaque distribution of outputs against a
transparent and desirable model. Here trustworthiness is interpreted as an admissible distance
in the probability assigned to an observed output when compared to the probability that the
intended behavior should display.</p>
      <p>The threats that AI systems pose to reliable and safe knowledge are varied and, among these,
the potential bias of automated classifiers is a rising phenomenon. Starting from the analysis of bias in deterministic
systems (see e.g. [17]), several recent studies have shown that training and evaluation data for
classification algorithms are often biased. [18] illustrates algorithmic discrimination based
on gender and race; [19] lists different types of bias for data and algorithms; [20] shows that
the most well-known AI datasets are full of labeling errors. From a formal viewpoint, the
trustworthiness of an AI system may also be considered from the point of view of the absence
of bias. Given an automated labeling method, one might ask whether it displays any bias, and
if so, whether its level is below an admissible threshold. Under this
perspective, a new task is the design of systems that make it possible to formally verify the
absence of bias, or its presence within limits considered admissible in the application domain.
Consider as an example the potential of automatic AI-based classification in recruiting, where
the aim is to automatically select the best applicants out of huge pools of résumés. Notoriously
this is not without risks, as shown by the infamous Amazon case, where the selection process
was not gender-neutral due to the training of models on patterns in résumés submitted
over a 10-year period in which most applicants were men, thereby reflecting and consolidating
male dominance in the tech industry [21]. An important task is therefore to devise methods
where gender balance (or balance with respect to similar sensitive attributes) is treated as a safety property to be
automatically checked.</p>
      <p>In this paper we present TPTND-KL, a system re-interpreting TPTND to reason about the labeling
of random variables and datapoints, with the task of inferring the presence of bias in classification
methods. The underlying interpretation of formulas can be dubbed "datapoints-as-terms" and
"labels-as-types". In this context, we understand bias as a measure of the entropy between
a given opaque probabilistic assignment of labels to a datapoint and the results given by a
distribution of labels assumed to be correct, or at least desirable. This interpretation is inspired
by [22]. The rest of the paper is structured as follows. In Section 2 we introduce the language,
stressing the novel interpretation with respect to the previous version of the system and the
introduction of the new bias operator. In Section 3 we reformulate the proof system under
the novel isomorphism between syntactic terms and semantic interpretation. In Section 4 we
extend the proof system with rules for checking the presence of bias, with particular attention
to the novel distance measure chosen for this application. In Section 5 we provide a detailed
example to show the checking method at work. We conclude with limitations of this approach
and further topics of research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Formal Preliminaries</title>
      <p>The logic TPTND-KL is a typed natural deduction system built from the following syntax:</p>
      <sec id="sec-2-1">
        <title>Definition 1 (Syntax).</title>
        <p>VariableSet ∶= x ∣ x_t ∣ x̃_t ∣ x*_t ∣ ⟨X, X⟩ ∣ fst(X) ∣ snd(X) ∣ [X]X ∣ X.X ∣ X.T
DataSet ∶= t ∣ ⟨T, T⟩ ∣ fst(T) ∣ snd(T) ∣ [X]T ∣ T.T′ ∣ B(T)
LabelSet ∶= α ∣ α_a ∣ ⊥ ∣ (α × β)_a ∣ (α + β)_a ∣ (α → β)_{a[a′]}</p>
        <p>Assignments ∶= {} ∣ C, x ∶ α_a ∣ C, X ∶ α_a</p>
        <p>It includes: (random) variables of the form x_t with an associated datapoint t, annotated
with indicators for being the correctly labeled variable x̃_t or the wrongly labeled variable x*_t, and
composed as pairs, first and second projection, abstraction and application (also with terms);
terms t standing for datapoints, composed as pairs, first and second projection, abstraction (with
respect to a variable of interest) and application, and taken as argument of a B operator; a
label set with labels α decorating variables and terms and indexed by a real value a, composed as
conjunction, disjunction and implication; finally, an assignment function which lists a probability
distribution C of labels to variables.</p>
        <p>Example formulas of TPTND-KL have the following form:
• x ∶ α ⊢ y ∶ β_0.1: the random variable y has probability 0.1 of having label β, provided x
has label α, i.e. P(y = β ∣ x = α) = 0.1.
• Γ ⊢ x_100 ∶ α̃_0.35: under distribution Γ (e.g. of a given population), for 100 people one
expects label α to be assigned 35% of the times.
• Γ ⊢ t_100 ∶ α_30: for 100 people in our dataset the label α was assigned 30 times.
• x̃ ∶ α_0.5, x* ∶ α_0.35 ⊢ t ∶ α_0.3: a datapoint t is labeled α with probability 0.3 provided
that t is wrongly labeled α with probability 0.5, and correctly with probability 0.35, i.e.
P(t = α) = 0.3, P(x = α ∣ x̃ ≠ α) = 0.5 and P(x = α ∣ x̃ = α) = 0.35.</p>
        <p>The main novelty of TPTND-KL (and what differentiates it from its ancestor system TPTND)
is the interpretation of its formulae in the style of "datapoints-as-terms" and "labels-as-types",
and the ability to express the probability of label assignment under a given distribution of labels
for random variables, including true and false positive rates of labeling.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Proof system</title>
      <p>Judgements are defined through rules for the associated logical operator.</p>
      <p>The construction rules for probability distributions of the label set mimic those of TPTND
[15], see Figure 1. Contexts are generated by judgements of the form ⊢ x ∶ α_a and ⊢ X ∶ α_a, expressing
assignments of probabilities to labeled variables for given terms. An empty set {} represents
the base case (rule "base") used to define distributions. Random variables with given theoretical
probability values assigned to labels can be added to a distribution as long as they respect the
standard additivity requirement on probabilities ("extend"). Labels are intended as exclusive
(a single label for each datapoint) and categorical (at least one label for each datapoint). To denote
an unknown distribution ("unknown") we use the (transfinite) construction assigning to all
values of interest the interval of all possible probability values, provided the additivity condition
of "extend" is preserved. Such a construction expresses formally the probability distribution
underlying an opaque classifier.</p>
      <p>{} ∶∶ distribution (base)</p>
      <p>If Γ ∶∶ distribution, x ∶ α_a ∉ Γ, and the sum of the probabilities already assigned in Γ plus a does not exceed 1, then Γ, x ∶ α_a ∶∶ distribution (extend)</p>
      <p>{x ∶ α_[0,1], … , x ∶ ω_[0,1]} ∶∶ distribution (unknown)</p>
      <p>Under a distribution of labels which assigns a given true positive rate and a given false positive
rate to a label, the "label assignment" rule states the observed behavior of a classifier under a transparent
confusion matrix for the available label set. In this version of our calculus, we do not include
the joint probability of distinct labels under independent distributions. The introduction rule for + allows one to express
that a single datapoint is assigned exclusively one of two (or more) labels included in the same
distribution; correspondingly, the elimination rule for + extracts the unique label from such a disjoint set, when the
other options have non-positive probability (⊥). Finally, → expresses the dependency of label
assignment on feature satisfaction: if the label assigned to a term depends on another label being
correctly assigned with a certain probability, then the term is assigned the corresponding implicational label (→)
with that probability; from this, provided the antecedent label is established, the term is labeled with the consequent.
Consider, for example, the probability of being assigned a label by a classifier depending on the probability that some feature
be greater than a given positive value.</p>
      <p>The rules can now be generalised to the expected assignment of labels for a given population
of interest, see Figure 3. Again, these rules are adapted from their counterparts in TPTND.
The "expected labeling" rule axiomatically declares that, under a transparent distribution of
labels which assigns a given true positive rate and false positive rate to a label, in
a population of n datapoints the label is expected to be assigned with a corresponding probability (the
expected probability being denoted by the tilde). Note that this rule sets an expected assignment
given a known probability distribution, which can be taken to be the one on which a classifier
is trained: such expected probabilistic classification can be designed to embed specific fairness
constraints, like demographic parity or equal opportunity. In the following, we will use this
expected behavior as a constraint to check the fairness of a given unknown classifier. The
"sampling" rule says that the frequency of a label over n datapoints is given by
the number of cases in which the label is actually assigned in those n cases, by means of as many occurrences
of the "label assignment" rule. As a general case, the distribution under which the labeling takes place
can be taken to be unknown; hence we consider the simple observation of a classifier at work
without knowing its inner structure. This will turn out to be crucial in our bias checking
rule. The "update" rule serves the purpose of considering multiple classifications, so as to render the
change in frequency of any given label across different sets of members of a given population of
interest. Rule I+ introduces the disjunction of possible labels: intuitively, if under a distribution Γ a
population of n elements is assigned one label with a given expected probability, and another label is
assigned with a second expected probability, then the expected probability of the disjunction of the two labels is their sum.
By the two elimination rules E+: if under a distribution Γ a population of n datapoints is assigned
one of two labels with a given expected probability, and the former (respectively, the latter) label has
a known expected probability, then in that population the latter (respectively, the former) label
will be assigned with the difference of the two. This formally expresses the categoricity
of labeling. The rule I→ says that if a label is assigned with a given expected probability in a
population of size n, provided in that population another label is correctly assigned with a certain
probability, then we express such dependency with an abstraction term which is assigned the implicational label (→)
with the given expected probability, conditional on the antecedent. The corresponding elimination E→ allows one to verify the expected
probability of the consequent label: it considers an abstraction term labeled by the implication with a given expected probability,
depending on the probability of the antecedent label being assigned to a given element (e.g. a feature);
when such labeling occurs with a known expected probability, it computes the product of the two expected probabilities for
the n datapoints in the population to be labeled with the consequent. Notice that this rule might be made invalid if
constrained in such a manner that the expected frequency of the antecedent must be higher than a given threshold,
in order for the probability of the consequent to be positive.</p>
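The arithmetic carried by the population-level rules can be sketched as follows. This is an illustrative reading, not the authors' code: the paper only gives the deduction rules, and all function names here are our own.

```python
# Illustrative arithmetic of the population-level rules: I+ sums the
# expected probabilities of two disjoint labels, E+ recovers one disjunct
# by subtraction (categoricity of labeling), and E-> multiplies the
# expected probability of an implicational label by that of its antecedent.

def intro_disjunction(p_alpha: float, p_beta: float) -> float:
    """I+ : expected probability of the disjunction of two labels."""
    return p_alpha + p_beta

def elim_disjunction(p_disj: float, p_known: float) -> float:
    """E+ : expected probability of the remaining disjunct."""
    return p_disj - p_known

def elim_implication(p_impl: float, p_antecedent: float) -> float:
    """E-> : expected probability of the consequent label."""
    return p_impl * p_antecedent

# e.g. two labels expected with probabilities 0.35 and 0.20:
p_or = intro_disjunction(0.35, 0.20)   # 0.55
p_rest = elim_disjunction(p_or, 0.35)  # 0.20
```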
    </sec>
    <sec id="sec-4">
      <title>4. Checking for Bias</title>
      <p>We now provide a set of introduction and elimination rules for the B operator ranging over a
label assignment. Let us consider a binary label set {α, β}, a variable set {x_1, … , x_n} and
a data set {t, u, … , z}. Consider now the expression</p>
      <p>⊢ x̃ ∶ α_f</p>
      <p>saying that α is the wrong label for the data point with probability f. This expresses therefore
the False Positive Rate for x having label α. And the expression</p>
      <p>⊢ x* ∶ α_a</p>
      <p>saying that α is the correct label for the data point with probability a. This expresses therefore
the True Positive Rate for any datapoint with label α. Here a = 1 − f. (The population-level rules
"expected labeling", "sampling", "update", I+, E+, I→ and E→ are displayed in Figure 3.) Now let us assume we
have a probability distribution for x in which such information is available over a population,
i.e.</p>
      <p>Γ, x* ∶ α_a, x̃ ∶ α_f ∶∶ distribution</p>
      <p>Assume the function h implemented by a classifier using this distribution is a desirable one,
meaning that the expected classification by an application of the rule "expected labeling" over
the classes denoted by the available labels will behave following this distribution, which might
reflect a fairness constraint we require on a protected group. For completeness, we denote the
desirable Γ satisfying our fairness constraints as Γ ∶∶ fair. (The introduction and elimination
rules IB and EB for the bias operator are displayed in Figure 4.)</p>
      <p>Consider now the observation for a given data point assigning a label with frequency f,
which we might consider biased if the divergence of f under Δ is significant from what is
dictated by the constraint on the labels encoded in Γ:</p>
      <sec id="sec-4-1">
        <title>Definition 2 (Bias labeling).</title>
        <p>The assignment of a label shown with frequency f for a population
of n datapoints under the distribution generated by the possibly opaque training set Δ is biased if
the divergence of f from the probability generated by the correct label assigned by a transparent
distribution Γ is greater than a given threshold considered appropriate for the labeling task at hand.</p>
        <p>We now use the Kullback–Leibler divergence, which notoriously is a measure of how the
probability distribution Δ is different from Γ, or the expected excess surprise from using Γ as a
model when the actual distribution is Δ. Fixing a constraint that defines the fairness of Γ, and
for each term and label α common to Δ and Γ, we obtain:</p>
        <p>D_KL(Δ ∥ Γ) = ∑_α Δ(α) log ( Δ(α) / Γ(α) )</p>
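Concretely, the divergence and the threshold test of Definition 2 amount to a few lines of code. This is a sketch under our reading of the definitions; the function names are illustrative, not from the paper.

```python
from math import log2

# Sketch of the bias check: base-2 Kullback-Leibler divergence between the
# opaque output distribution Delta and the transparent reference Gamma,
# compared against a task-specific threshold (Definition 2).

def kl_divergence(delta, gamma):
    """D_KL(delta || gamma) over the labels common to both distributions."""
    return sum(delta[l] * log2(delta[l] / gamma[l]) for l in delta)

def is_biased(delta, gamma, threshold):
    """Definition 2: biased when the divergence exceeds the threshold."""
    return kl_divergence(delta, gamma) > threshold
```

With the distributions of the example in Section 5 (Δ(m) = 3/7 against Γ(m) = 1/3) the divergence evaluates to roughly 0.028, below a threshold of 0.2, so the labeling would not be flagged as biased.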
        <p>Using this measure, we want to check whether its value surpasses some reference threshold
which is considered appropriate for the classification task. This means considering sufficiently
low values of the threshold for sensitive classifications, e.g. gender or other socially and culturally relevant
properties. Definition 2 is then expressed by Rule IB in Figure 4. In this rule, the divergence is the sum of the
divergences between Δ and Γ computed for each label common to both. On the other hand, in
the presence of a population of n datapoints where the frequency f of a label under distribution
Δ is considered biased, we infer a family of probability values in the interval [0, 1], excluding an
interval around the reference probability for the corresponding output in Γ, within which the label can be correctly sampled from
Δ. Note that this rule will not allow one to derive a fair value for the label, but rather determine the
conditions of the distribution under which the current frequency can be considered admissible.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Example</title>
      <p>Consider context Γ = {m ∶ 1/3, f ∶ 2/3}, representing the reference distribution of gender in
a population of interest in which 1/3 of the population is male and 2/3 is female.</p>
      <p>Assume now a classifier trained on some data that distributes according to Δ, which is
inaccessible, but which we know is – or can assume to be – similar to the one described by Γ.
The classifier on Δ, when used, displays the following behavior:
out of 140 datapoints, 60 are labelled as m and 80 are labelled as f. For readability we
write Γ(m) = 1/3, Γ(f) = 2/3, Δ(m) = 60/140 = 3/7 and Δ(f) = 80/140 = 4/7. We want to
check whether our classifier using Δ is fair with respect to the constraints set by Γ. We calculate
the Kullback–Leibler divergence as follows:</p>
      <p>D_KL(Δ ∥ Γ) = ∑_{g ∈ {m, f}} Δ(g) ⋅ log2 ( Δ(g) / Γ(g) ) = Δ(m) ⋅ log2 ( Δ(m) / Γ(m) ) + Δ(f) ⋅ log2 ( Δ(f) / Γ(f) )</p>
      <p>Recall that Γ(m) = 1/3 and Γ(f) = 2/3, and consider that since we are dealing with a binary
classification task, Δ(f) = 1 − Δ(m). Then we may rewrite Equation (2) as</p>
      <p>D_KL(Δ ∥ Γ) = (1 − Δ(m)) ⋅ log2 ( 3(1 − Δ(m)) / 2 ) + Δ(m) ⋅ log2 ( 3 ⋅ Δ(m) )</p>
      <p>and if we let Δ(m) vary in the interval [0, 1], we can plot D_KL(Δ ∥ Γ) as in Figure 5, which
provides a more concrete illustration of which values are considered to be biased in our domain.</p>
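The admissibility interval behind the plot can be recovered numerically. The following sketch is our own code (the grid step is an assumed choice): it sweeps Δ(m) over (0, 1) and keeps the values whose divergence from Γ stays below the threshold 0.2.

```python
from math import log2

# Numerical sweep reproducing the admissibility interval of Figure 5:
# for Gamma = {m: 1/3, f: 2/3}, keep every Delta(m) in (0, 1) whose
# base-2 KL divergence from Gamma does not exceed the threshold 0.2.
# The grid step (0.001) is our own choice for the sketch.

def kl_binary(p_m, gamma_m=1/3):
    """Binary KL divergence of (p_m, 1 - p_m) from (gamma_m, 1 - gamma_m)."""
    p_f, gamma_f = 1 - p_m, 1 - gamma_m
    return p_m * log2(p_m / gamma_m) + p_f * log2(p_f / gamma_f)

threshold = 0.2
admissible = [p / 1000 for p in range(1, 1000)
              if kl_binary(p / 1000) <= threshold]
interval = (min(admissible), max(admissible))
# interval is approximately (0.11, 0.59); Delta(m) = 3/7 lies inside it
```

The endpoints agree with the interval [0.11, 0.59] reported for Figure 5, and the example value Δ(m) = 3/7 falls inside it, confirming that the observed labeling is admissible at this threshold.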
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper we have introduced TPTND-KL, a natural deduction system where formulas
express the expected labeling of data, based on probability distributions of labels over training
sets. We have shown how to model proof-theoretically a checking procedure to measure the
distance that a given opaque classifier displays from the constraints set on the expected labeling
by a transparent probabilistic distribution.</p>
      <p>(Figure 5 caption: D_KL(Δ ∥ Γ), plotted for a population of 1/3 males and 2/3
females, and for a classifier that classifies a fraction Δ(m) of the population as male. If we set a
threshold of 0.2 (depicted as the red line in the figure) we get that Δ(m) should belong to the interval
[0.11, 0.59] in order to be considered unbiased (see the thick black line on the x axis). Note that, for the
particular case where Δ(m) = 3/7 as in our previous example, the bias is in fact acceptable (see the teal
dot on the x axis).)</p>
      <p>Our research leaves room for further developments. To start with, the example used
throughout this paper is somewhat simplistic. A more precise characterization of the types of
biases and of the fairness metrics that we can model within our language is required: for instance,
one can take advantage of the implication to build dependent variables and model more
complex examples where multi-attribute biases and intersectionality occur. It is worth stressing
here that currently our framework deals with fairness-of-outcome, as we can only
compare the output distribution (i.e., the outcome of the classification process) with a reference
distribution that we assume to be fair. It remains to be explored whether other definitions of
fairness can be modelled without substantially modifying TPTND-KL. Other problems that we
would like to tackle in a future version of this work are: adapting the framework to regression
tasks, as at the moment it only works for classification, since the set of types is tacitly assumed
to be finite; and investigating how the reference thresholds can be defined or extracted
in less arbitrary ways. This latter point will require studying how the framework behaves
with real data, and to this end an implementation of this work is very much needed. Another
future step is the formulation of meta-theoretical results, like progress and termination for the
syntactic transformations determined by our proof system, to prove under which conditions
a given measure of bias will never exceed a certain threshold; finally, the integration of the
bias checking procedure developed for our system with other non-automatic parameters and
information quality criteria would be desirable.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This research has been partially funded by the Project PRIN2020 BRIO - Bias, Risk and Opacity in
AI (2020SSKZ7R) awarded by the Italian Ministry of University and Research (MUR). The author
Fabio Aurelio D’Asaro acknowledges the funding and support of PON “Ricerca e Innovazione”
2014-2020 (PON R&amp;I FSE-REACT EU), Azione IV.6 “Contratti di ricerca su tematiche Green” in
the context of the project titled “Il miglioramento dell’algorithmic fairness in ambito assicurativo:
Un’analisi concettuale a partire da uno studio di caso”.
</p>
      <p>2016, Nara, Japan, September 18-22, 2016, ACM, 2016, pp. 33–46. URL: https://doi.org/10.1145/2951913.2951942. doi:10.1145/2951913.2951942.
[9] P. H. A. de Amorim, D. Kozen, R. Mardare, P. Panangaden, M. Roberts, Universal semantics for the stochastic λ-calculus, in: 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2021, Rome, Italy, June 29 - July 2, 2021, IEEE, 2021, pp. 1–12. URL: https://doi.org/10.1109/LICS52264.2021.9470747. doi:10.1109/LICS52264.2021.9470747.
[10] M. Antonelli, U. D. Lago, P. Pistone, Curry and Howard meet Borel, in: C. Baier, D. Fisman (Eds.), LICS '22: 37th Annual ACM/IEEE Symposium on Logic in Computer Science, Haifa, Israel, August 2 - 5, 2022, ACM, 2022, pp. 45:1–45:13. URL: https://doi.org/10.1145/3531130.3533361. doi:10.1145/3531130.3533361.
[11] F. Dahlqvist, D. Kozen, Semantics of higher-order probabilistic programs with conditioning, Proc. ACM Program. Lang. 4 (2020) 57:1–57:29. URL: https://doi.org/10.1145/3371125. doi:10.1145/3371125.
[12] A. D. Pierro, A type theory for probabilistic λ-calculus, in: From Lambda Calculus to Cybersecurity Through Program Analysis, 2020.
[13] M. Boričić, Sequent calculus for classical logic probabilized, Archive for Mathematical Logic 58 (2019) 119–136.
[14] S. Ghilezan, J. Ivetić, S. Kašterović, Z. Ognjanović, N. Savić, Probabilistic reasoning about simply typed lambda terms, in: International Symposium on Logical Foundations of Computer Science, Springer, 2018, pp. 170–189.
[15] F. A. D'Asaro, G. Primiero, Probabilistic typed natural deduction for trustworthy computations, in: D. Wang, R. Falcone, J. Zhang (Eds.), Proceedings of the 22nd International Workshop on Trust in Agent Societies (TRUST 2021) Co-located with the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), London, UK, May 3-7, 2021, volume 3022 of CEUR Workshop Proceedings, CEUR-WS.org, 2021. URL: http://ceur-ws.org/Vol-3022/paper3.pdf.
[16] F. A. D'Asaro, G. Primiero, Checking trustworthiness of probabilistic computations in a typed natural deduction system, CoRR abs/2206.12934 (2022). URL: https://doi.org/10.48550/arXiv.2206.12934. doi:10.48550/arXiv.2206.12934. arXiv:2206.12934.
[17] B. Friedman, H. Nissenbaum, Bias in computer systems, ACM Trans. Inf. Syst. 14 (1996) 330–347.
[18] J. Buolamwini, T. Gebru, Gender shades: Intersectional accuracy disparities in commercial gender classification, in: Conference on Fairness, Accountability and Transparency, FAT 2018, 23-24 February 2018, New York, NY, USA, volume 81, PMLR, 2018, pp. 77–91.
[19] N. Mehrabi, F. Morstatter, N. Saxena, K. Lerman, A. Galstyan, A survey on bias and fairness in machine learning, ACM Comput. Surv. 54 (2021) 115:1–115:35. URL: https://doi.org/10.1145/3457607. doi:10.1145/3457607.
[20] C. G. Northcutt, A. Athalye, J. Mueller, Pervasive label errors in test sets destabilize machine learning benchmarks, 2021. Preprint at https://arxiv.org/pdf/2103.14749.pdf.
[21] J. Dastin, Amazon scraps secret AI recruiting tool that showed bias against women, ????
[22] H. Jiang, O. Nachum, Identifying and correcting label bias in machine learning, CoRR abs/1901.04966 (2019). URL: http://arxiv.org/abs/1901.04966. arXiv:1901.04966.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Seshia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sadigh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <article-title>Toward verified artificial intelligence</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>65</volume>
          (
          <year>2022</year>
          )
          <fpage>46</fpage>
          -
          <lpage>55</lpage>
          . URL: https://doi.org/10.1145/3503914. doi:10.1145/3503914.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Alur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Henzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-H.</given-names>
            <surname>Ho</surname>
          </string-name>
          ,
          <article-title>Automatic symbolic verification of embedded systems</article-title>
          ,
          <source>IEEE Trans. Softw. Eng</source>
          .
          <volume>22</volume>
          (
          <year>1996</year>
          )
          <fpage>181</fpage>
          -
          <lpage>201</lpage>
          . URL: https://doi.org/10.1109/32.489079. doi:10.1109/32.489079.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Kwiatkowska</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Norman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Parker</surname>
          </string-name>
          ,
          <article-title>PRISM: Probabilistic symbolic model checker</article-title>
          ,
          <source>in: Proceedings of the 12th International Conference on Computer Performance Evaluation, Modelling Techniques and Tools</source>
          ,
          <source>TOOLS '02</source>
          , Springer-Verlag, Berlin, Heidelberg,
          <year>2002</year>
          , pp.
          <fpage>200</fpage>
          -
          <lpage>204</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Termine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Primiero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. A.</given-names>
            <surname>D'Asaro</surname>
          </string-name>
          ,
          <article-title>Modelling accuracy and trustworthiness of explaining agents</article-title>
          , in: S. Ghosh, T. Icard (Eds.), Logic, Rationality, and Interaction - 8th
          <source>International Workshop</source>
          , LORI 2021,
          <article-title>Xi'ian</article-title>
          , China,
          <source>October 16-18</source>
          ,
          <year>2021</year>
          , Proceedings, volume
          <volume>13039</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>232</fpage>
          -
          <lpage>245</lpage>
          . URL: https://doi.org/10.1007/978-3-030-88708-7_19. doi:10.1007/978-3-030-88708-7_19.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hajinezhad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kantaros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Zavlanos</surname>
          </string-name>
          ,
          <article-title>Reduced variance deep reinforcement learning with temporal logic specifications</article-title>
          ,
          <source>in: Proceedings of the 10th ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS '19</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , pp.
          <fpage>237</fpage>
          -
          <lpage>248</lpage>
          . URL: https://doi.org/10.1145/3302509.3311053. doi:10.1145/3302509.3311053.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Termine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Antonucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Primiero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Facchini</surname>
          </string-name>
          ,
          <article-title>Logic and model checking by imprecise probabilistic interpreted systems</article-title>
          , in: A. Rosenfeld, N. Talmon (Eds.),
          <source>Multi-Agent Systems - 18th European Conference, EUMAS</source>
          <year>2021</year>
          ,
          Virtual Event, June 28-29,
          <year>2021</year>
          , Revised Selected Papers, volume
          <volume>12802</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>211</fpage>
          -
          <lpage>227</lpage>
          . URL: https://doi.org/10.1007/978-3-030-82254-5_13. doi:10.1007/978-3-030-82254-5_13.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bacci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Furber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kozen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mardare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Panangaden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <article-title>Boolean-valued semantics for the stochastic λ-calculus</article-title>
          , in: A. Dawar, E. Grädel (Eds.),
          <source>Proceedings of the 33rd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2018</source>
          , Oxford, UK, July 09-12,
          <year>2018</year>
          , ACM,
          <year>2018</year>
          , pp.
          <fpage>669</fpage>
          -
          <lpage>678</lpage>
          . URL: https://doi.org/10.1145/3209108.3209175. doi:10.1145/3209108.3209175.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Borgström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Dal Lago</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Gordon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Szymczak</surname>
          </string-name>
          ,
          <article-title>A lambda-calculus foundation for universal probabilistic programming</article-title>
          , in: J. Garrigue, G. Keller, E. Sumii (Eds.),
          <source>Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming</source>
          , ICFP
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>