<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GEOFFair: a GEOmetric Framework for Fairness</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alessandro Maggio</string-name>
          <email>alessandro.maggio6@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Giuliani</string-name>
          <email>luca.giuliani13@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberta Calegari</string-name>
          <email>roberta.calegari@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Lombardi</string-name>
          <email>michele.lombardi2@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michela Milano</string-name>
          <email>michela.milano@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alma Mater Studiorum-Università di Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>Fairness has emerged as a critical concern in the field of machine learning, impacting its application in various domains. While there have been successful attempts to tackle fairness, many existing analyses rely on sophisticated mathematical methods that can be hard to grasp intuitively. Drawing inspiration from successful applications in other areas of machine learning, in this study we propose a GEOmetric Framework for Fairness (GEOFFair) that represents distributions, ML models, fairness constraints, and hypothesis spaces as vectors and sets. The geometric framework aims to provide a more intuitive and rigorous understanding of fairness in Artificial Intelligence (AI). It enables visualizing mitigation techniques as movements in the vector space and aids in constructing proofs-by-witness by quickly identifying examples or counter-examples. Furthermore, the geometric framework offers a platform for studying various fairness properties, including geometrical distances between fairness vectors, relative fairness comparisons, and the exploration of symmetries, invariances, and trade-offs between fairness metrics.</p>
      </abstract>
      <kwd-group>
        <kwd>AI fairness</kwd>
        <kwd>geometric framework</kwd>
        <kwd>GEOFFair</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>A benefit of geometric frameworks is that they enable visualization, allowing us to gain insights
into the data or the model operation.</p>
      <p>In terms of motivation, our approach draws from successful attempts in other areas of
ML [4, 5], where mapping models to points in vector spaces (e.g. by concatenating their
parameters) has led to simplifications in their representation and analysis. Through this
lens, distance metrics, projections, similarities, and algorithms can be applied to gain insights
into the models. Papers like [6] demonstrate the effectiveness of vector representations in
natural language processing tasks, where embracing the vector space model allows for a deeper
understanding and comparison of machine learning models. Similarly, we believe our framework
can enable a better understanding and visualization of fairness issues and facilitate the study of
their properties. For example, using this approach it is possible to map mitigation techniques
for addressing fairness concerns as movements in the vector space, or to assist the construction of
proofs-by-witness by quickly finding examples or counter-examples.</p>
      <p>Furthermore, the geometric framework provides a platform for studying various properties
related to fairness. We can investigate the geometrical distances between fairness vectors, which
may provide insights into the relative fairness of different models or interventions. Additionally,
the geometric space allows for exploring symmetries, invariances, or trade-offs between different
fairness metrics, contributing to a deeper understanding of the fairness landscape.</p>
      <p>Accordingly, the paper is organized as follows. Section 2 introduces the formal framework
of GEOFFair, discussing its mathematical foundation and exploring the potential relationships
among its elements. It establishes the groundwork for understanding the subsequent sections.
Then, in Section 3, the paper interprets fairness mitigation techniques within the proposed
GEOFFair framework, examining how these techniques can be applied and understood through
its lens. Conclusions and future work are discussed in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>2. GEOFFair: a GEOmetric Framework for Fairness</title>
      <p>This section introduces the formal framework, focusing on two key points. Firstly, we explore
the vector representation of the main mathematical objects of the framework, defining the core
elements and their main properties (Subsection 2.1, Subsection 2.2). This vector representation
allows us to formalize fairness concepts and measures in a clear and precise mathematical language.
Secondly, we discuss how these vector representations exist within the same space, providing a
common basis for comparing and contrasting different fairness measures (Subsection 2.3).</p>
      <sec id="sec-2-1">
        <title>2.1. From Distribution to Vectors in the Space</title>
        <p>Classical formulations for both ML models and fairness metrics typically rely on probability
theory and statistics: for example, the ground truth is viewed as a probability distribution, the
ML model as a parameterized function, the training loss as a likelihood measure, and fairness
metrics as functions over conditional expectations. The first challenge in the definition of our
framework is therefore mapping such concepts into a vector representation, with no significant
loss of generality.</p>
        <sec id="sec-2-1-1">
          <title>Probability Distributions and Functions</title>
          <p>We focus on a supervised learning setting, and
we start by defining a representation for (joint) probability distributions, which we approximate
to arbitrary precision via an infinite sample. Formally:</p>
          <p>Notation 1 (Probability Distributions). Let $X$, $Y$ be random variables with support in
$\mathcal{X}$ and $\mathcal{Y}$ and joint distribution $P(X, Y)$. Then we encode the distribution as a vector
$(x, y) = \{x_i, y_i\}_{i=1}^{n}$, with $x_i, y_i \sim P(X, Y)$ and $n \to \infty$.</p>
          <p>Intuitively, $X$ represents an observable that may serve as the input for an ML model, while $Y$
represents the quantity (or class) to be estimated. We make no assumption about the support of
the random variables, i.e. the range of their possible values. The same representation can be
applied for the individual distributions of $X$ and $Y$, which are therefore denoted as $x$ and $y$. Our
approach makes it particularly easy to represent functions over random variables (e.g. Machine
Learning models evaluated over their input). Formally:</p>
          <p>Notation 2 (Functions). A deterministic function $f$ over $X$ and $Y$ can then be naturally viewed as
a vector $f(x, y) = \{f(x_i, y_i)\}_{i=1}^{n}$ with $n \to \infty$, i.e. just the vector with the function evaluation over
all the samples.</p>
          <p>Functions that depend only on $X$ or only on $Y$ are sub-cases of the above definition and are
respectively denoted as $f(x)$ and $f(y)$.</p>
          <p>There are a few observations worth making. First, while we use the term “vector” for
simplicity, our definitions are closer to functions that map an index $i$ to an object such as $x_i$
or $y_i$. In other words, $x$, $y$, $f(x)$, etc. can be thought of as points in a Hilbert space. Second,
our representations are not exact, but they are sufficient to approximate key statistical
properties with arbitrarily high probability. Exact representations for distributions exist and
are well known, e.g. the Probability Mass Function or Probability Density Function; however,
they do not enable constructing a simple 1-1 mapping between components in the vector (e.g.
$x_i$) and function evaluations (e.g. $f(x_i)$), which is instead trivial with our approach.</p>
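          <p>As a minimal sketch of Notation 1 and Notation 2, the snippet below uses a finite sample as a stand-in for $n \to \infty$. The Gaussian joint distribution, the function <monospace>f</monospace>, and all variable names are hypothetical choices made for illustration only; the framework itself does not prescribe them.</p>
          <preformat><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000  # finite stand-in for n -> infinity (assumption)

# Hypothetical joint distribution P(X, Y): X ~ N(0, 1), Y = 2X + noise
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)


def f(x_i, y_i):
    """Hypothetical deterministic function of a sample: squared error of the guess 2 * x."""
    d = y_i - 2.0 * x_i
    return d * d


# Notation 2: the function is represented by its evaluations over all samples.
f_vec = f(x, y)

# Each component of the vector corresponds to one function evaluation.
i = 42
component_matches = bool(np.isclose(f_vec[i], f(x[i], y[i])))
```
]]></preformat>
          <p>The point of the sketch is the one-to-one correspondence between vector components and function evaluations, which a density-based representation would not provide directly.</p>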
        </sec>
        <sec id="sec-2-1-2">
          <title>Equivalence of Expectation Predicates</title>
          <p>Many of the existing fairness metrics are expressed
in terms of (conditional) expectations, i.e. averages, or can be reduced to such a form. For
example, assuming $Z$ is a binary protected attribute, the DIDI metric from [7] is defined in
terms of the discrepancy between the global average outcome and the average outcome for
each protected group, i.e. $|\mathbb{E}[Y \mid Z = 0] - \mathbb{E}[Y]| + |\mathbb{E}[Y \mid Z = 1] - \mathbb{E}[Y]|$.
Statistical parity in
classification, which advocates for similar probabilities of a positive outcome across all groups,
can be defined as $|\mathbb{E}[Y \mid Z = 0] - \mathbb{E}[Y \mid Z = 1]|$, and so on. Intuitively, this means that many
fairness constraints can be viewed as predicates over (conditional) expectations.</p>
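          <p>The two metrics mentioned above can be sketched over sample vectors as follows. This is an illustrative implementation that follows the formulas in the text for a binary protected attribute; the function names and the toy data are assumptions, not part of [7].</p>
          <preformat><![CDATA[
```python
import numpy as np


def didi(y, z):
    """Sample DIDI for a binary attribute z: sum over groups of |mean(y | z = g) - mean(y)|."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    return float(sum(abs(y[z == g].mean() - y.mean()) for g in (0, 1)))


def statistical_parity_gap(y, z):
    """Sample statistical parity gap: |mean(y | z = 0) - mean(y | z = 1)|."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    return float(abs(y[z == 0].mean() - y[z == 1].mean()))


# Toy sample (hypothetical): outcomes y and protected attribute z
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])

didi_value = didi(y, z)                      # group means 0.75 and 0.25, global mean 0.5
parity_value = statistical_parity_gap(y, z)
```
]]></preformat>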
          <p>The sample expectation function, represented by $\mu(\cdot)$, converges to the true
expectation $\mathbb{E}[\cdot]$ as the sample size grows: we use this result to establish a form of equivalence
between predicates expressed over a distribution and those expressed over a sample.</p>
          <p>Theorem 1. Let $\Pi(X, Y)$ be a predicate over (conditional) expectations for $X$ and $Y$ and let
$\pi(\{x_i\}_{i=1}^{n}, \{y_i\}_{i=1}^{n})$ be its sample counterpart. Then we have that:
<disp-formula><tex-math>P\left( \Pi(X, Y) \Leftrightarrow \lim_{n \to \infty} \pi(\{x_i\}_{i=1}^{n}, \{y_i\}_{i=1}^{n}) \right) = 1 \tag{1}</tex-math></disp-formula>
i.e. the two predicates are equivalent almost surely as the sample size grows, provided the involved
expectations are finite.</p>
          <p>Proof. The two predicates are identical except for the use of the true and sample expectations.
For the sake of simplicity and without loss of generality, let us assume the involved expectations
are respectively $\mathbb{E}[X]$ and $\mu(\{x_i\}_{i=1}^{n})$. Since the samples are drawn independently from the same
distribution, due to the strong law of large numbers we have that:
<disp-formula><tex-math>P\left( \lim_{n \to \infty} \mu(\{x_i\}_{i=1}^{n}) = \mathbb{E}[X] \right) = 1 \tag{2}</tex-math></disp-formula>
Equivalence of the sample and true expectations then implies equivalence of $\Pi$ and $\pi$.</p>
          <p>Notation 1 and Notation 2 give us the ability to transition from the conventional distribution
paradigm of ML to the realm of vector spaces. Theorem 1 enables reasoning over the vector
representation and transferring any result to the original distribution almost surely, at least as
far as fairness metrics are concerned. Together, these tools allow us to leverage the power and
interpretability of vector space representations in the context of fairness metrics, expanding the
scope of analysis and decision-making.</p>
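          <p>The convergence behind Theorem 1 can be observed numerically. The sketch below assumes a hypothetical binary outcome whose success probability depends on a binary protected attribute; the probabilities, the threshold, and the sample sizes are illustrative choices only.</p>
          <preformat><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical conditional probabilities P(Y = 1 | Z = z)
p_group = {0: 0.3, 1: 0.6}
true_gap = abs(p_group[0] - p_group[1])  # predicate evaluated on the distribution


def sample_gap(n):
    """Statistical parity gap computed on a sample of size n."""
    z = rng.integers(0, 2, size=n)
    y = rng.random(n) < np.where(z == 0, p_group[0], p_group[1])
    return abs(y[z == 0].mean() - y[z == 1].mean())


# The estimation error is expected to shrink as the sample grows.
errors = {n: abs(sample_gap(n) - true_gap) for n in (100, 10_000, 1_000_000)}

# A predicate over expectations and its sample counterpart (threshold is an assumption)
tau = 0.2
predicate_true = bool(true_gap <= tau)
predicate_sample = bool(sample_gap(1_000_000) <= tau)
```
]]></preformat>
          <p>For large samples the two predicates are expected to agree, which is the sense in which results obtained on the vector representation carry over to the distribution.</p>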
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The Formal Model</title>
        <p>As mentioned in Section 2.1, we focus on a supervised learning setting where the goal is to
learn a model that maps inputs (always observable) to outputs (observable at training time and
to be estimated at inference time). In this context, we introduce four key mathematical objects
that play a major role in the analysis of fairness issues in AI.</p>
        <p>We represent the input distribution by means of an input vector $x \in \mathcal{X}^n$, with $n \to \infty$, according
to Notation 1. Concerning the output, we make a distinction between the distribution that can
actually be observed and the one that we ideally wish to estimate. We start by introducing the
following concept:</p>
        <p>Definition 1 (Ground Vector). The ground vector $y^+ \in \mathcal{Y}^n$ represents data that can be observed
and used as ground truth to learn machine learning models. It is paired with the input vector $x$.</p>
        <p>Inspired by [3], we model the fact that the ground truth might be subject to systemic
social biases, but with a key difference: we directly define an “unbiased” output vector
rather than an unbiased input matrix, as our framework allows us to reason in terms of vector
components within the output space.</p>
        <p>Definition 2 (Gold Vector). The gold vector $y^* \in \mathcal{Y}^n$ represents the “unbiased” data that
corresponds to the output distribution before it is corrupted by social biases; accordingly, we can derive
the ground vector $y^+$ by considering the application of a biased mapping over the gold vector, i.e.
$y^+ = B(y^*)$, where $B : \mathcal{Y}^n \to \mathcal{Y}^n$ is called the biased mapping.</p>
        <p>Note that in practical applications, the gold vector is typically unobservable and therefore
not accessible at training time. Still, explicitly modeling the unbiased distributions allows us to
study in deeper detail the interplay between bias and fairness constraints.</p>
        <p>In our framework, an ML model can be viewed as a function that maps input to output data.
In supervised learning, the training process is typically viewed as that of selecting one model
out of a pool of candidates, so as to minimize a loss metric. Formally, training amounts to
solving in an exact or approximate fashion:
<disp-formula><tex-math>\arg\min_{f \in \mathcal{F}} \mathcal{L}(f(x), y^+) \tag{3}</tex-math></disp-formula>
where $f$ is the ML model, $\mathcal{L}$ is the chosen loss metric and $\mathcal{F}$ represents the set of possible models,
usually defined by specifying an architecture (e.g. the number and size of layers in a feed-forward
neural network, or the number of estimators and maximum depth in a random forest).</p>
        <p>In our framework, however, the input vector $x$ is by construction fixed, thus making the
model output the only relevant factor. In other words, two models are equivalent as long as
they have the same output. This observation allows us to introduce a simplified representation
of the classical notion of hypothesis space.</p>
        <p>Definition 3 (Hypothesis Space). The hypothesis space $\hat{Y}$ is the set of possible outputs for the
chosen class of ML models, i.e.
<disp-formula><tex-math>\hat{Y} = \{ y \in \mathcal{Y}^n \mid \exists f \in \mathcal{F} : f(x) = y \}</tex-math></disp-formula></p>
        <p>Intuitively, the hypothesis space can be viewed as the set of possible model outputs for the
considered sample. A linear regression model will have a limited hypothesis space due to its
ability to represent linear relationships only, while more complex models such as random forests
and neural networks will have a much larger hypothesis space.</p>
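        <p>For a linear regression model on a fixed input vector, the hypothesis space is the span of the input and a constant vector, so membership can be checked with a least-squares fit. The following is a minimal sketch under that assumption; the helper name, the tolerance, and the toy data are hypothetical.</p>
        <preformat><![CDATA[
```python
import numpy as np


def in_linear_hypothesis_space(x, y, tol=1e-8):
    """True if y = a * x + b for some (a, b), i.e. y lies in the span of [x, 1]."""
    design = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residual = np.linalg.norm(design @ coef - y)
    return bool(residual <= tol)


x = np.array([0.0, 1.0, 2.0, 3.0])   # fixed input vector
y_linear = 2.0 * x + 1.0              # output of some linear model
y_quadratic = x ** 2                  # not representable by a line

linear_is_member = in_linear_hypothesis_space(x, y_linear)
quadratic_is_member = in_linear_hypothesis_space(x, y_quadratic)
```
]]></preformat>
        <p>A more expressive model class would accept both vectors, which is the sense in which its hypothesis space is larger.</p>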
        <p>Finally, as we are considering a fairness scenario, we need to model a final mathematical
object in order to guarantee a proper analysis of the phenomenon, namely the region in the
output space that is considered fair.</p>
        <p>Definition 4 (Fair Space). Let $F \subseteq \mathcal{Y}^n$ be the set containing all the output vectors that are aligned
with the fairness requirements.</p>
        <p>We make no assumption on the mathematical definition of the fair space. Nonetheless, it
is worth noting that in many practical cases, this set is defined by means of a threshold $\tau$ on a
fairness metric $K$, i.e. $F = \{ y \in \mathcal{Y}^n \mid K(y) \le \tau \}$.</p>
        <p>Once all the elements are defined, we can examine how they interact with each other. In
the most general setup, we cannot make any assumption about the relationships between $y^*$,
$y^+$, $\hat{Y}$, and $F$. Without specific contextual information on data, models, and constraints, the
relationships between these entities can vary significantly.</p>
        <p>It is worth mentioning that, in the defined framework, all vectors and sets we introduced
exist in the same space, which facilitates easy visualization (see Figures in Section 3). This
visual representation can assist with proof-by-witness, allowing us to analyze and demonstrate
relationships between these vectors more effectively.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Relationships Between Elements</title>
        <p>Let us now explore the four elements we discussed earlier. A summary of their potential
relationships is presented in Table 1. Firstly, thanks to the problem’s symmetry, we can acquire
all potential relationships by examining the Cartesian product of the six entries found in the
triangular section of the table. This yields a total of 128 configurations, calculated as
$4 \times 2^5$. Furthermore, we can decrease the configuration space by making a few straightforward
observations.</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>Possible one-to-one relationships. When comparing the two sets, we use $\not\cap$ and $\cap$ as aliases for $\hat{Y} \cap F = \emptyset$ and $\hat{Y} \cap F \neq \emptyset$ (with neither set containing the other), respectively.</p>
          </caption>
          <table>
            <thead>
              <tr><th/><th>$F$</th><th>$\hat{Y}$</th><th>$y^+$</th></tr>
            </thead>
            <tbody>
              <tr><th>$\hat{Y}$</th><td>$\not\cap, \subseteq, \supseteq, \cap$</td><td/><td/></tr>
              <tr><th>$y^+$</th><td>$\ni, \not\ni$</td><td>$\ni, \not\ni$</td><td/></tr>
              <tr><th>$y^*$</th><td>$\ni, \not\ni$</td><td>$\ni, \not\ni$</td><td>$\equiv, \not\equiv$</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <list list-type="bullet">
          <list-item><p>$y^* \equiv y^+$ can hold only when the two vectors belong to the same sets;</p></list-item>
          <list-item><p>if $y^* \in \hat{Y}$ and $y^* \in F$ hold simultaneously then $\hat{Y} \cap F \neq \emptyset$, and the same applies to $y^+$;</p></list-item>
          <list-item><p>$y^* \in \hat{Y}$ and $y^* \notin F$ are incompatible if $\hat{Y} \subset F$, and the same applies to $y^+$;</p></list-item>
          <list-item><p>$y^* \notin \hat{Y}$ and $y^* \in F$ are incompatible if $\hat{Y} \supset F$, and the same applies to $y^+$.</p></list-item>
        </list>
        <p>By taking these logical constraints into account, we identified 56 distinct legal combinations,
whose listing is provided in Appendix A. While the number of possible combinations is not
small, it is nevertheless finite, which can help with proofs of universally quantified statements
(i.e. $\forall$ and $\nexists$).</p>
        <p>Now, let us examine each one-to-one relationship between these elements. With respect to
the Hypothesis and Fair Space, the possible outcomes are as follows:</p>
        <list list-type="bullet">
          <list-item><p>$\hat{Y} \cap F = \emptyset$. This scenario implies that it is not possible to learn a model that satisfies the fairness
criteria. Although this is a rare occurrence, as most fairness metrics evaluate to zero on
constant vectors (which can generally be represented by any machine learning model), it
might still happen in certain situations. For example, this could be due to an excessively
strict threshold imposed on the fairness constraint.</p></list-item>
          <list-item><p>$\hat{Y} \subseteq F$. In this case, the machine learning model is said to be fair-by-design [8]. While achieving
this is challenging in many practical cases, it can be attained by incorporating explicit
rules into the model, ensuring that certain deontological fair principles are always upheld.</p></list-item>
          <list-item><p>$\hat{Y} \supseteq F$. Here, the machine learning models can cover all existing fair outputs. This can be the
case when employing powerful models like large neural networks.</p></list-item>
          <list-item><p>$\hat{Y} \cap F \neq \emptyset$. This is the most common scenario encountered in practice. In this case, the goal
of learning a fair model is to find an appropriate parameter configuration such that the
output vector $y$ belongs to the non-trivial intersection between the two sets, $\hat{Y}$ and $F$.</p></list-item>
        </list>
        <p>Similarly, when considering the relationships among vectors and sets, the following
considerations can be made:</p>
        <list list-type="order">
          <list-item><p>If $y^+$ and $y^*$ coincide, the mapping $B : \mathcal{Y}^n \to \mathcal{Y}^n$ introduces no bias.
However, this is an extremely rare scenario that leads to trivial solutions for any fairness
task. In most real-world cases, the two vectors are not aligned, indicating a discrepancy
between the information conveyed by $y^+$ and $y^*$. This misalignment suggests that the
ground vector has been pushed away from the unbiased distribution to some extent.</p></list-item>
          <list-item><p>If $y^+ \in \hat{Y}$, it can be perfectly represented by the machine learning model, although this
representation is not guaranteed to be fair unless $y^+$ is already in the Fair Space (as
mentioned in Point 4 below). Conversely, when the model lacks the capacity to represent
$y^+$ adequately, it will be trained to minimize the loss $\mathcal{L}$ between the labels and the model
outputs.</p></list-item>
          <list-item><p>The same considerations as in Point 2 apply to the relationship between $y^*$ and $\hat{Y}$. The
only difference is that, in this case, the analysis is purely theoretical since no model can
be trained on $y^*$, which is not observable in real-world scenarios.</p></list-item>
          <list-item><p>If $y^+ \in F$, the ground vector aligns with the fairness criteria. This alignment
can be due to various reasons, such as a weakly biased mapping that does not significantly
deviate the ground vector from the unbiased distribution, or the fairness criteria being
too permissive and allowing for a higher degree of fairness violation. Conversely, when
$y^+$ is outside the Fair Space, learning a fair model becomes more challenging as it must
explicitly account for the fairness constraints. This is the most common scenario in
real-world settings.</p></list-item>
          <list-item><p>Similar to Point 4, the gold vector can belong to the Fair Space or not. In most practical
use cases, it does belong, indicating that the chosen fairness metric $K$ aligns with the
unbiased distribution and its threshold is well-tuned. However, if a misaligned metric is
chosen or the threshold is too restrictive, it is possible that $y^* \notin F$.</p></list-item>
        </list>
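        <p>The count of legal combinations reported in this subsection can be reproduced by brute-force enumeration. The sketch below assumes that the four relationships between the Hypothesis Space and the Fair Space are treated as mutually exclusive options and that only the four constraints listed above are applied; the labels and function names are hypothetical.</p>
        <preformat><![CDATA[
```python
from itertools import product

# Relationship between the Hypothesis Space (H) and the Fair Space (F)
SET_RELATIONS = ("disjoint", "subset", "superset", "overlap")


def is_legal(rel, gold_in_h, gold_in_f, ground_in_h, ground_in_f, same):
    """Apply the four logical constraints to one candidate configuration."""
    # Identical vectors must belong to the same sets.
    if same and (gold_in_h, gold_in_f) != (ground_in_h, ground_in_f):
        return False
    for in_h, in_f in ((gold_in_h, gold_in_f), (ground_in_h, ground_in_f)):
        if in_h and in_f and rel == "disjoint":
            return False  # a vector in both sets forces a non-empty intersection
        if in_h and not in_f and rel == "subset":
            return False  # H inside F: every representable vector is fair
        if not in_h and in_f and rel == "superset":
            return False  # H contains F: every fair vector is representable
    return True


flags = [(False, True)] * 5
configs = list(product(SET_RELATIONS, *flags))
legal = [c for c in configs if is_legal(*c)]
print(len(configs), len(legal))  # 128 56
```
]]></preformat>
        <p>Because the configuration space is finite and small, a universally quantified claim can be checked by iterating over <monospace>legal</monospace>.</p>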
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Fairness Mitigation Through the Lens of GEOFFair</title>
      <p>In this section, we use the GEOFFair framework to analyze fairness mitigation techniques.
In a previous work, Dutta et al. [3] demonstrated that maximizing accuracy solely based
on the observed labels vector may not always be the optimal choice. They employed statistical
distributions and mathematical tools from probability theory to establish this result. Rather
than extending their findings, our objective is to employ our proposed geometric framework to
support and validate them. By leveraging the GEOFFair framework, we aim to present similar
conclusions in a more accessible and interpretable way and to bridge the gap between complex
mathematical concepts and practical implications. This allows for a clearer comprehension of
the challenges associated with fairness and the potential solutions that can be pursued.</p>
      <sec id="sec-3-1">
        <title>3.1. Mitigation as Projection</title>
        <p>Mitigation, in the AI fairness context, refers to the process of reducing unfairness by either
transforming the biased distribution or by ensuring that the ML model behaviour is compatible
with the fairness constraints. From a geometric point of view, such techniques can be viewed
as projecting either the ground vector or the ML output onto the Fair Space. Analogously,
training an ML model can be viewed as the problem of finding a vector in the Hypothesis
Space that is closest to the ground vector in terms of the loss function, i.e. as projecting the
ground vector onto the Hypothesis Space. Therefore, in the context of GEOFFair, projections
provide a convenient lens through which we can study mitigation at pre-processing, training,
and post-processing time in a uniform fashion.</p>
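        <p>To make the projection view concrete, the sketch below projects an output vector onto a Fair Space defined by a threshold on the statistical parity gap. It assumes real-valued outputs and a squared Euclidean loss, in which case the projection has a closed form; the function names and the data are hypothetical, and other losses or metrics would require a different procedure.</p>
        <preformat><![CDATA[
```python
import numpy as np


def parity_direction(z):
    """Vector a such that a @ y = mean(y | z = 0) - mean(y | z = 1)."""
    z = np.asarray(z)
    return np.where(z == 0, 1.0 / np.sum(z == 0), -1.0 / np.sum(z == 1))


def project_onto_fair_space(y, z, tau):
    """Euclidean projection of y onto the set {v : |a @ v| <= tau}."""
    a = parity_direction(z)
    gap = a @ y
    if abs(gap) <= tau:
        return y.copy()  # already fair: the vector is projected onto itself
    excess = gap - np.sign(gap) * tau
    return y - excess * a / (a @ a)  # move along a until the gap equals tau


z = np.array([0, 0, 0, 1, 1, 1])
y_ground = np.array([0.9, 0.8, 0.7, 0.3, 0.2, 0.1])  # parity gap of 0.6
y_fair = project_onto_fair_space(y_ground, z, tau=0.2)
```
]]></preformat>
        <p>In this toy setting a vector that already satisfies the threshold is left unchanged, while an unfair one lands exactly on the boundary of the Fair Space.</p>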
        <p>We will focus our analysis on the more widespread case where learning a fair ML model is
possible (i.e. $\hat{Y} \cap F \neq \emptyset$). We start by introducing two additional vectors, namely the projections of
the ground and gold vectors onto the intersection between the Hypothesis Space and the
Fair Space.</p>
        <p>Definition 5 (Ground and Gold Fair Projections). The optimal fair predictions $p^+$ and $p^*$ are obtained
from the ground ($y^+$) and gold ($y^*$) vectors, i.e.:
<disp-formula><tex-math>p^+ = \arg\min \{ \mathcal{L}(y, y^+) \mid y \in \hat{Y} \cap F \} \tag{4}</tex-math></disp-formula>
<disp-formula><tex-math>p^* = \arg\min \{ \mathcal{L}(y, y^*) \mid y \in \hat{Y} \cap F \} \tag{5}</tex-math></disp-formula></p>
        <p>Intuitively, $p^+$ represents the outcome of training an ML model under fairness constraints, or
equivalently of training an ML model over a ground distribution transformed so as to enforce
the fairness restrictions. The $p^*$ vector represents the best fair model that we could learn for the
(typically unobservable) “unbiased” distribution.</p>
        <p>It is worth noting that $p^+$ and $p^*$ might not be unique, as equally accurate outputs that are both
fair and representable by the model can exist. Furthermore, for the purpose of our theoretical
analysis, we will assume that $p^+$ and $p^*$ are obtained from exact and globally optimal algorithms.
However, it is important to acknowledge that many machine learning models, especially larger
ones, do not guarantee this optimality property in practice. Additionally, to avoid trivial cases,
we assume that the biased mapping function $B : \mathcal{Y}^n \to \mathcal{Y}^n$ applies a modification to its input
vector, i.e. that $y^* \not\equiv y^+$. This assumption narrows down our analysis to even fewer cases than
those defined in Subsection 2.3, and lets us draw the following conclusion:
<disp-formula><tex-math>\mathcal{L}(y^+, y^*) &gt; 0 \tag{6}</tex-math></disp-formula>
where $\mathcal{L}$ is any non-negative loss function such that $\mathcal{L}(y^+, y^*) = 0$ if and only if $y^+ \equiv y^*$.</p>
        <p>Basic Properties of Fair Projections. Let us consider the optimization problems defined
in Equations (4)-(5) and examine the behaviour of $p^+$ and $p^*$ in terms of fairness based on the
position of $y^+$ and $y^*$, respectively. We will rely on the formulation of the Fair Space based on
a fairness metric $K(\cdot)$ that we introduced in Section 2, i.e.:
<disp-formula><tex-math>F = \{ y \in \mathcal{Y}^n \mid K(y) \le \tau \} \tag{7}</tex-math></disp-formula></p>
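        <p>The ground and gold fair projections of Definition 5 can be illustrated on a toy problem. The sketch below assumes a finite, hand-picked hypothesis space, a squared loss, and a Fair Space defined by a threshold on the statistical parity gap; all of these are illustrative assumptions, and in practice the gold vector is not observable.</p>
        <preformat><![CDATA[
```python
import numpy as np


def parity_gap(y, z):
    """Fairness metric used in this sketch: |mean(y | z = 0) - mean(y | z = 1)|."""
    return abs(y[z == 0].mean() - y[z == 1].mean())


def fair_projection(target, hypothesis_space, z, tau):
    """arg min of the squared loss over candidates that are representable and fair."""
    feasible = [y for y in hypothesis_space if parity_gap(y, z) <= tau]
    return min(feasible, key=lambda y: float(np.sum((y - target) ** 2)))


z = np.array([0, 0, 1, 1])

# Hypothetical finite hypothesis space: three constant outputs and one group-dependent output
hypothesis_space = [np.full(4, c) for c in (0.0, 0.5, 1.0)]
hypothesis_space.append(np.array([1.0, 1.0, 0.0, 0.0]))

y_gold = np.array([1.0, 0.0, 1.0, 0.0])    # unbiased labels (normally unobservable)
y_ground = np.array([1.0, 1.0, 0.0, 0.0])  # labels after a biased mapping

p_ground = fair_projection(y_ground, hypothesis_space, z, tau=0.1)
p_gold = fair_projection(y_gold, hypothesis_space, z, tau=0.1)
```
]]></preformat>
        <p>Here the ground vector is itself representable, but it violates the threshold and is therefore excluded from the feasible set before the loss is minimized.</p>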
        <p>Before establishing a fundamental property of fair projections, let us introduce some notation
to describe the concept of Fair Space Frontier:</p>
        <p>Notation 3 (Fair Frontier).
<disp-formula><tex-math>\partial F = \{ y \in \mathcal{Y}^n \mid K(y) = \tau \} \tag{8}</tex-math></disp-formula></p>
        <p>The Fair Space Frontier represents the boundary of the Fair Space, namely the region of the
space containing all the vectors exhibiting a threshold-level fairness. Likewise, we can introduce
the concept of Internal Fair Set, which encompasses the vectors within the Fair Space but not on
the Fair Frontier:</p>
        <p>Notation 4 (Internal Fair Set).
<disp-formula><tex-math>\Delta = F \setminus \partial F = \{ y \in \mathcal{Y}^n \mid K(y) &lt; \tau \} \tag{9}</tex-math></disp-formula></p>
        <p>Property 1 (Fair Projections). Given a vector $y$ and its projection $y'$ onto the Fair Space as defined
in Equation (7), we know that:
<disp-formula><tex-math>y \in F \implies y' \equiv y \implies K(y') = K(y) \tag{10}</tex-math></disp-formula>
<disp-formula><tex-math>y \notin F \implies y' \in \partial F \implies K(y') = \tau \tag{11}</tex-math></disp-formula>
meaning that any vector lying within the Fair Space will be projected onto itself (thus exhibiting
the same fairness level); conversely, if the vector is outside the Fair Space, its projection will be on
the boundary of the Fair Space, resulting in threshold-level fairness.</p>
        <p>This is a well-known property in both convex and non-convex optimization, whose proof can
be found in [9]. Now, if we take into account the capabilities of the ML model, we can extend
Property 1 as follows:</p>
        <p>Property 2 (Representable Fair Projections). Given a vector $y$ and its projection $y'$ onto the
intersection between the Fair and Hypothesis Space, we know that:
<disp-formula><tex-math>y \in F \lor \hat{Y} \subseteq F \implies K(y') \le \tau \tag{12}</tex-math></disp-formula>
<disp-formula><tex-math>y \notin F \land \hat{Y} \supseteq F \implies K(y') = \tau \tag{13}</tex-math></disp-formula></p>
        <p>It is important to note that when the Fair Space and the Hypothesis Space have a non-trivial
intersection (i.e. neither space is a subset of the other), we cannot draw conclusions about
$K(y')$, since points on the boundary of the intersection can exhibit different fairness levels.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Possible Cases Configuration</title>
        <p>Based on the properties and assumptions discussed in the previous subsection, we can now
outline five distinct cases that summarize the different combinations arising from the positions
of the input vectors ($y^+$ and $y^*$) and their projections ($p^+$ and $p^*$). Each case description is
accompanied by illustrative figures, where blue stripes represent the region where the ground
projection $p^+$ can fall, and green stripes indicate the region where the gold projection $p^*$ can
fall; additionally, the figures depict two possible ground and gold vectors, along with their
projections. On a final note, we underline that these cases are mutually exclusive, meaning that
the conditions of each subsequent case implicitly exclude the conditions of the previous ones.</p>
        <p>Case 1: $y^+, y^* \in \hat{Y} \cap F$. This is a trivial scenario in which both vectors are already representable
and satisfy the fairness condition. As shown in Figure 1, in this case the projections coincide with
the original vectors, therefore we know that:
<disp-formula><tex-math>\mathcal{L}(p^+, p^*) = \mathcal{L}(y^+, y^*) &gt; 0 \tag{14}</tex-math></disp-formula>
where the strict inequality follows from having assumed a non-trivial biased mapping, i.e.
from Equation (6). The existence of a gap even in this simple scenario shows that maximizing
accuracy based on the biased data (under fairness constraints) may not always yield the best
solution. Furthermore, we have that:
<disp-formula><tex-math>K(p^+) \lesseqgtr K(p^*) \tag{15}</tex-math></disp-formula>
In other words, there is also no guarantee that the gold projection is fairer than the ground
projection. Such a variety of possible outcomes is due to the fact that this case captures situations
where fairness constraints are not particularly restrictive, so applying mitigation techniques is
not really meaningful.</p>
        <p>Case 2: $y^+, y^* \in \Delta \lor \hat{Y} \subseteq F$. We can identify two sub-cases within this scenario. In the first
sub-case, both the gold vector and the ground vector satisfy the fairness criteria, even if at
least one of them does not belong to the Hypothesis Space (this distinction is necessary to
avoid falling back into Case 1). In the second one, the ML model is fair-by-design, meaning that
any possible output is guaranteed to be within the Fair Space. Intuitively, the former sub-case
reflects another situation where fairness constraints are not particularly restrictive; in the latter,
fairness issues have already been addressed by acting on the model architecture. Although these
two sub-cases might look very different, the resulting projections exhibit the same behaviour.
In fact, similar to Case 1, $p^+$ and $p^*$ may be arbitrarily close or far, depending on the positions of
the two original vectors, and no mutual information on $K(p^+)$ and $K(p^*)$ can be obtained. This
observation stresses the potential impact of the chosen class of models (the Hypothesis Space)
on the outcome of mitigation approaches.</p>
        <p>[Figure illustrating Case 2. Panel (a): $y^+, y^* \in \Delta$. Panel (b): $\hat{Y} \subseteq F$.]</p>
        <p>Case 3: $y^+ \notin \Delta \land y^* \in \Delta$. In this case, the ground target $y^+$ is either outside or at the
frontier of $F$, meaning that there is a non-null gap in terms of the fairness metric between itself
and the gold vector (which is in the Fair Space).</p>
        <p>Again, we can identify two main sub-cases, depending on the position of the Hypothesis
Space with respect to the Fair Space. Since we are assuming that the intersection between the
two sets is not empty, and the scenario in which $\hat{Y}$ is a subset of $F$ has already been covered in
Case 2, the possible outcomes are: (a) $F \subseteq \hat{Y}$, or (b) $F \nsubseteq \hat{Y}$. In the former (Figure 3a), mitigation
has a beneficial effect in terms of fairness, but a guaranteed gap will remain with respect to the best possible
fair projection, since Property 2 guarantees that $K(p^+) = \tau &gt; K(p^*)$. In this situation, as long as the
fairness metric is sufficiently aligned with the unbiased distribution (i.e. $y^* \in F$), better results
can be obtained by simply making the fairness constraints more restrictive.</p>
        <p>On the contrary, in the latter sub-case (Figure 3b), nothing can be said on the fairness level
of  as Property 2 does not cover the case of non-trivial intersection. This suggests that using
sufficiently expressive model classes (e.g. larger neural networks) in mitigation approaches may
lead to more consistent outcomes, at least as long as overfitting is successfully prevented.</p>
        <p>As a final consideration, we underline that Case 3 is the most common real-world scenario,
as it assumes that a non-trivial, strong enough biased mapping is applied to  ∗, and that the
fairness metric is both aligned and correctly calibrated on it.</p>
        <p>Case 4:  + ∈ Δ ∧  ∗ ∉ Δ . In contrast to the previous case, here the positions of the ground
and gold vectors are swapped. This is a very unlikely scenario, which arises from adopting
a fairness metric that is aligned with the biased data but not with the unbiased one.
As in Case 3, we can distinguish between two different sub-cases. When  ⊆  ̂ (Figure 4a), we
are guaranteed that there could be a better solution with respect to the gold vector, although
in this case this solution would exhibit a higher degree of unfairness due to Property 2. In the
other sub-case (Figure 4b), nothing can be proven, but the same considerations as in Case 3 apply.</p>
        <p>(a)  ⊄  ̂ .
(b)  ⊂  ̂ .</p>
        <p>Case 5:  +,  ∗ ∉ Δ . Finally, we consider the case in which the fairness metric is either
misaligned or wrongly calibrated for both vectors. The usual sub-cases (Figure 5a:  ⊆  ̂ ,
and Figure 5b:  ⊈  ̂ ) can be identified, with similar conclusions as for Case 3 and Case 4. In
fact, in the former one, we are guaranteed that the level of fairness of  and  will coincide
independently of the position of the original vectors, since they will both fall on the Fair
Space Frontier. On the contrary, no property can be proven for the latter sub-case.</p>
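        <p>The conclusions for sub-case (a) of Cases 3 to 5 can be collected in one place. The notation below is an assumption introduced only for this sketch and is not necessarily the paper's own: f denotes the fairness metric, τ the threshold, y+ and y∗ the ground and gold vectors, and π(·) the projection produced by mitigation. Property 2 is read here, following the text, as guaranteeing that a vector lying outside the Fair Space is projected onto its frontier whenever the Fair Space is contained in the Hypothesis Space.</p>
        <disp-formula>
          <tex-math><![CDATA[
\begin{aligned}
\text{Case 3a:} &\quad f(\pi(y^+)) = \tau > f(\pi(y^*)), \\
\text{Case 4a:} &\quad f(\pi(y^*)) = \tau > f(\pi(y^+)), \\
\text{Case 5a:} &\quad f(\pi(y^+)) = f(\pi(y^*)) = \tau.
\end{aligned}
]]></tex-math>
        </disp-formula>
        <p>Under this reading, a fairness gap between the two projections persists in Cases 3a and 4a and vanishes in Case 5a. No analogous statement is available for sub-case (b).</p>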
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Fairness Threshold Analysis</title>
        <p>The threshold  plays a significant role in distinguishing different cases among those mentioned
earlier. For example, by analyzing the effect of decreasing  in Case 1 we observe that when  is
more aligned to  ∗ than  +, it results in Case 3; conversely, if it is more aligned with  + than
 ∗, it may result in Case 4. Continuously lowering  eventually leads to Case 5, and while we
can intuitively observe that decreasing the threshold results in closer projections  and  , we
do not yet have a formal proof of the relationship between  and the loss ℒ (,  ∗).</p>
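        <p>As a hedged illustration of these transitions, the case boundaries can be written directly in terms of the threshold. The notation is an assumption made only for this sketch: f denotes the fairness metric, τ the threshold, and y+ and y∗ the ground and gold vectors. Membership in the Fair Space is read as f(y) &lt; τ, so that a vector on the frontier counts as not belonging to it (as in Case 3), and "more aligned with y∗" is read as f(y∗) &lt; f(y+).</p>
        <disp-formula>
          <tex-math><![CDATA[
\text{case}(\tau) =
\begin{cases}
\text{Case 1 or 2} & \text{if } \max\{f(y^+), f(y^*)\} < \tau, \\
\text{Case 3} & \text{if } f(y^*) < \tau \le f(y^+), \\
\text{Case 4} & \text{if } f(y^+) < \tau \le f(y^*), \\
\text{Case 5} & \text{if } \tau \le \min\{f(y^+), f(y^*)\}.
\end{cases}
]]></tex-math>
        </disp-formula>
        <p>Under these assumptions, lowering τ from Case 1 first crosses the larger of the two metric values, giving Case 3 when f(y∗) &lt; f(y+) and Case 4 when f(y+) &lt; f(y∗), and then crosses the smaller one, giving Case 5. This only restates the qualitative argument and says nothing about how the loss varies with the threshold.</p>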
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>This study introduces GEOFFair – a GEOmetric Framework for Fairness – which leverages
geometrical concepts to provide a rigorous and intuitive understanding of fairness in AI. By
representing distributions, ML models, fairness constraints, and hypothesis spaces as vectors
and sets, GEOFFair allows for visualizing mitigation techniques and constructing
proofs-by-witness. The framework facilitates the exploration of various fairness properties, including
geometrical distances between fairness vectors, relative fairness comparisons, and the analysis
of symmetries, invariances, and trade-offs between fairness metrics.</p>
      <p>Through the lens of GEOFFair, we conducted a theoretical analysis of mitigation techniques,
leading to the identification of five distinct cases that are essential for analyzing different fairness
scenarios. These cases provide valuable insights into the relationship between input vectors,
their projections, and the fairness level achieved.</p>
      <p>Future work will focus on applying GEOFFair to analyze well-known fairness problems.
Geometrical reasoning and projection might also prove very effective for understanding how
properties of the loss function and fairness metrics (e.g. convexity, triangle inequality) affect
the effectiveness of mitigation techniques.</p>
      <p>Finally, exploring the generation of biased data to assess the fairness of AI applications
through the lens of GEOFFair will be an important avenue for future research. Overall, the
adoption of the GEOFFair framework holds promise for advancing the understanding and
development of fair AI systems.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The work has been partially supported by the AEQUITAS project funded by the European
Union’s Horizon Europe Programme (Grant Agreement No. 101070363), by the EU ICT-48 2020
project TAILOR (No. 952215) and by PNRR - M4C2 - Investimento 1.3, Partenariato Esteso
PE00000013 - “FAIR - Future Artificial Intelligence Research” - Spoke 8 “Pervasive AI”, funded
by the European Commission under the NextGeneration EU programme.</p>
    </sec>
    <sec id="sec-6">
      <title>A. Possible Scenarios</title>
      <p>Let ℤ be  ⧵ (  ∪̂ ) .</p>
      <p>Scenario  ⊂  ̂
Cases in which ∗ ∈  :
13)  ∗,  + ∈  and  ∗ ≡  +.
14)  ∗,  + ∈  and  ∗ ≢  +.
15)  ∗ ∈  and  + ∈  ⧵̂  .
16)  ∗ ∈  and  + ∈ ℤ.</p>
      <p>Cases in which ∗ ∉  :
17)  ∗,  + ∈  ⧵̂  and  ∗ ≡  +.
18)  ∗,  + ∈  ⧵̂  and  ∗ ≢  +.</p>
      <p>Scenario  ⊂̂ 
Cases in which ∗ ∈  :
25)  ∗,  + ∈  ̂ and  ∗ ≡  +.
26)  ∗,  + ∈  ̂ and  ∗ ≢  +.
27)  ∗ ∈  ̂ and  + ∈  ⧵  ̂ .
28)  ∗ ∈  ̂ and  + ∈ ℤ.
29)  ∗,  + ∈  ⧵  ̂ and  ∗ ≡  +.
30)  ∗,  + ∈  ⧵  ̂ and  ∗ ≢  +.
31)  ∗ ∈  ⧵  ̂ and  + ∈  ̂ .
32)  ∗ ∈  ⧵  ̂ and  + ∈ ℤ.
Cases in which ∗ ∉  :
33)  ∗,  + ∈ ℤ and  ∗ ≡  +.
34)  ∗,  + ∈ ℤ and  ∗ ≢  +.
35)  ∗ ∈ ℤ and  + ∈  ⧵  ̂ .
36)  ∗ ∈ ℤ and  + ∈  ̂ .</p>
    </sec>
    <sec id="sec-7">
      <title>Scenario  ∩̂  ≠ ∅ and not a previous case</title>
      <p>Cases in which ∗ ∈  :
37)  ∗,  + ∈  ⧵  ̂ and  ∗ ≡  +.
38)  ∗,  + ∈  ⧵  ̂ and  ∗ ≢  +.
39)  ∗ ∈  ⧵  ̂ and  + ∈  ∩  ̂ .</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mehrabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Morstatter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Saxena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Galstyan</surname>
          </string-name>
          ,
          <article-title>A survey on bias and fairness in machine learning</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>54</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Heidari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krause</surname>
          </string-name>
          ,
          <article-title>Mathematical notions vs. human perception of fairness: A descriptive approach to fairness for machine learning</article-title>
          ,
          <source>in: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery &amp; data mining</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2459</fpage>
          -
          <lpage>2468</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dutta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yueksel</surname>
          </string-name>
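          ,
          <string-name>
            <given-names>P.-Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,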
          <string-name>
            <given-names>K.</given-names>
            <surname>Varshney</surname>
          </string-name>
          ,
          <article-title>Is there a trade-off between fairness and accuracy? A perspective using mismatched hypothesis testing</article-title>
          ,
          <source>in: International Conference on Machine Learning, PMLR</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>2803</fpage>
          -
          <lpage>2813</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Kansizoglou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bampis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gasteratos</surname>
          </string-name>
          ,
          <article-title>Deep feature space: A geometrical perspective</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>44</volume>
          (
          <year>2021</year>
          )
          <fpage>6823</fpage>
          -
          <lpage>6838</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Bronstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bruna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Szlam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vandergheynst</surname>
          </string-name>
          ,
          <article-title>Geometric deep learning: going beyond euclidean data</article-title>
          ,
          <source>IEEE Signal Processing Magazine</source>
          <volume>34</volume>
          (
          <year>2017</year>
          )
          <fpage>18</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>O.</given-names>
            <surname>Shahmirzadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lugowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Younge</surname>
          </string-name>
          ,
          <article-title>Text similarity in vector space models: a comparative study</article-title>
          ,
          <source>in: 2019 18th IEEE international conference on machine learning and applications (ICMLA)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>659</fpage>
          -
          <lpage>666</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Aghaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Azizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vayanos</surname>
          </string-name>
          ,
          <article-title>Learning optimal and fair decision trees for nondiscriminative decision-making</article-title>
          ,
          <source>in: The Thirty-Third AAAI Conference on Artificial Intelligence</source>
          ,
          <source>AAAI</source>
          <year>2019</year>
          ,
          <source>The Thirty-First Innovative Applications of Artificial Intelligence Conference</source>
          ,
          <source>IAAI</source>
          <year>2019</year>
          ,
          <source>The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI</source>
          <year>2019</year>
          , Honolulu, Hawaii, USA, January 27 - February 1,
          <year>2019</year>
          , AAAI Press,
          <year>2019</year>
          , pp.
          <fpage>1418</fpage>
          -
          <lpage>1426</lpage>
          . URL: https://doi.org/10.1609/aaai.v33i01.33011418. doi:10.1609/aaai.v33i01.33011418.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Nurock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chatila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-H.</given-names>
            <surname>Parizeau</surname>
          </string-name>
          ,
          <article-title>What does “ethical by design” mean?</article-title>
          ,
          <source>Reflections on Artificial Intelligence for Humanity</source>
          (
          <year>2021</year>
          )
          <fpage>171</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Kar</surname>
          </string-name>
          ,
          <source>Non-convex Optimization for Machine Learning</source>
          ,
          <year>2017</year>
          .
          doi:10.1561/9781680833690.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>