<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>FAiRDAS: Fairness-Aware Ranking as Dynamic Abstract System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eleonora Misino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberta Calegari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Lombardi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michela Milano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alma Mater Studiorum-Università di Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We propose an abstract dynamic system as a solution to design and ensure long-term fairness in ranking systems. This approach provides valuable insights into system behaviour, metric interactions, and overall dynamics. By considering the ranking system as a dynamic system, we can model the evolution and interaction of fairness metrics over time. Our proposed approach enables the analysis of system properties, trade-offs, and tensions that arise when optimizing multiple fairness metrics. To validate its effectiveness, we apply this approach to real-world use case scenarios, demonstrating its practical applicability.</p>
      </abstract>
      <kwd-group>
<kwd>fair ranking</kwd>
        <kwd>fair matchmaking</kwd>
        <kwd>fairness in AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>In recent years, artificial intelligence (AI) has gained significant prominence in the area of online
matchmaking and ranking systems [1], which play a key role in diverse applications such as
Airbnb or dating platforms like Tinder and Bumble [2]. The objective of these systems is to
pair individuals or resources based on their respective characteristics, skills, or preferences,
expressed by a user query. These AI models can effectively learn from historical data, user
preferences, and other relevant features to generate personalized recommendations and
optimize the matching of individuals or items. While these systems strive to provide relevant
and personalized recommendations, long-term fairness has emerged as a crucial aspect to
consider. Long-term fairness refers to the equitable treatment of different groups of users over
extended periods, aiming to avoid systemic biases or disadvantages [3, 4]. Long-term fairness
is an important feature to consider and enforce when a recommendation or ranking system
serves multiple queries: given a sensitive attribute, even if no single query exhibits any
polarization, multiple repeated queries can be biased toward a specific value of that attribute.</p>
      <p>The state of the art in addressing long-term fairness in online ranking systems involves the
development of fairness-aware algorithms that explicitly incorporate fairness constraints into
the ranking process [5, 6]. These algorithms leverage fairness metrics and mathematical models
to ensure equitable treatment across user groups.</p>
      <p>Fairness metrics in ranking models encompass diverse dimensions like demographic parity,
equalized odds, or equal opportunity [7]. Each metric targets a specific fairness aspect, offering
distinct insights. Considering and integrating multiple fairness metrics is common practice to
attain a comprehensive understanding of fairness in rankings [8]. Thus, designing a theoretical
framework that accommodates multiple metrics, possibly defined through parameters, becomes
crucial.</p>
      <p>Concerning algorithms, by formulating fairness constraints or objectives, the models ensure
that the algorithm not only maximizes relevance or accuracy but also adheres to the desired
fairness principles. These models often involve techniques from optimization, such as
constrained optimization or multi-objective optimization, to balance fairness considerations with
other objectives. One common approach is to introduce fairness constraints that enforce equal
treatment or equal opportunity across different groups [4]. These constraints aim to ensure
that the ranking algorithm does not favour or discriminate against specific demographic groups
based on protected attributes such as gender, race, or age. By incorporating these constraints,
the algorithm is encouraged to produce rankings that are fair and unbiased. Regularization
techniques, such as fairness regularization, can also be employed to balance fairness
considerations with other objectives [9], such as relevance or accuracy. Fairness regularization involves
adding penalty terms to the optimization objective that explicitly encourage fair behaviour. The
regularization terms act as a form of control, nudging the algorithm towards producing fair
rankings while still optimizing for other desired properties.</p>
      <p>Considering the factors mentioned above and given the complexities associated with studying
the evolution and convergence of ranking systems, particularly when dealing with complex
algorithms in existing state-of-the-art approaches, this paper proposes a novel solution. The
main contribution of this paper is the formalization of long-term fairness as an abstract
dynamic system, to model the evolution and interaction of fairness metrics over time.</p>
      <p>This formalization enables a systematic analysis of system properties like stability,
equilibrium, and convergence. It also facilitates the identification of critical points or attractors.
Furthermore, it allows for the examination of how system changes, such as algorithm updates or
user preference shifts, influence long-term fairness. Additionally, this formalization enables the
exploration of trade-offs and tensions when optimizing multiple fairness metrics concurrently.
It provides insights into how improvements in one metric may impact others and reveal the
complex relationships between them. Through this approach, a deeper understanding of the
dynamics between contrasting fairness metrics can be achieved, helping to identify potential
synergies or conflicts among them.</p>
      <p>It is important to note that the abstract dynamic system for long-term fairness is intentionally
left abstract to accommodate customization based on specific case characteristics. Once the
dynamic system parameters are selected, the abstract dynamic system can be tailored to the
specific case under investigation. Experiments can then be conducted to observe system
behaviour and evaluate the effectiveness of different approaches in achieving fairness goals.
This formalization allows for better control and fine-tuning of each component of the system.</p>
<p>The paper is organized as follows. Section 2 introduces FAiRDAS – Fairness-Aware Ranking as
Dynamic Abstract System – and presents its mathematical foundation. The dynamic framework
is explained in detail, along with the process of grounding it. Next, an empirical evaluation
is conducted in Section 3, grounding the framework in the motivating example described above. The
instantiation of the framework is evaluated, demonstrating the effectiveness of the proposed
approach. Finally, in Section 4, concluding remarks are provided, along with a discussion on future
works.</p>
    </sec>
    <sec id="sec-2">
      <title>2. FAiRDAS</title>
      <p>In this section, we present the problem formulation for the framework (Subsection 2.1). The
main objective is to develop a comprehensive understanding of the problem at hand and establish
a clear foundation for our proposed solution. Then, the FAiRDAS abstract dynamic system is
introduced in Subsection 2.2. The main idea behind the work revolves around conceptualising
long-term fairness as a dynamic system, allowing us to define the ideal behaviour and potentially
conduct system property analysis. By mapping fairness to this abstract level, we establish a
framework for understanding the desired dynamics of the system. After the definition of the
dynamic system, we transition from the abstract level to the practical level by implementing
concrete actions within the real system (Subsection 2.3). These actions are specifically designed
to approximate the ideal behaviour defined at the abstract level. By taking tangible steps within
the real system, we aim to align its behaviour as closely as possible with the envisioned fairness
dynamics.</p>
      <sec id="sec-2-1">
        <title>2.1. Problem formulation: ranking problem definition</title>
<p>We approach the problem of ranking a set of n resources ℛ in response to a sequence of
incoming queries {q_t}, t = 1, …, ∞. Our focus lies on the ranking algorithm, which can be manipulated
by adjusting a vector of parameters θ_t, referred to as the action vector (e.g. penalizing an
over-exposed resource). Formally, we can represent the ranking algorithm as a function
r : Q × Θ → {ρ_i}, i = 1, …, n,  with ρ_t = r(q_t, θ_t), (<xref ref-type="bibr" rid="ref1">1</xref>)
where q_t and θ_t are the query and the action vector at time t, respectively; r(·, ·) is the
ranking function performed on ℛ, based on the given query and action vector; Q is the set of
possible queries; Θ is the set of possible action vectors; and {ρ_i} is the resources rank for
query q_t.</p>
        <p>At each time step t, a vector of m metrics M_t ∈ R^m is defined as
M_t = μ(ρ_t) = μ(r(q_t, θ_t)). (<xref ref-type="bibr" rid="ref2">2</xref>)</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The FAiRDAS Approach</title>
<p>
          Evolution of Metrics as a Dynamic System Equation (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) defines how to evaluate metrics
across all resources, but it has two significant limitations. First, since it considers a single query
at a time, the equation requires an aggregation mechanism in order to be used for measuring
fairness across multiple queries. Second, since our eventual goal is enforcing fairness, we are
interested in assessing how much the current action vector θ_t affects all possible incoming
queries.
        </p>
<p>We can address both issues by taking the expectation of the metrics over the query distribution
D_t at time t. This allows us to capture the average behaviour across all queries and account
for the variations in query characteristics and preferences, as well as the dependency of such
elements on time. Hence, we formalize the metric value as its expectation over the query
distribution:</p>
<p>
          M̄_t = E_{q∼D_t}[μ(r(q, θ_t))] (<xref ref-type="bibr" rid="ref3">3</xref>)
Note that D_t is not directly observable, but it can be approximated, for example by checking
the last k queries (Equation (
          <xref ref-type="bibr" rid="ref8">8</xref>
          )).
        </p>
<p>
          In Equation (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ), the action vector θ_t is chosen to approach the ideal behaviour, and it depends
on the expected behaviour of the queries from the previous step:
        </p>
<p>θ_{t+1} = a(M̄_t), (<xref ref-type="bibr" rid="ref11 ref4">4</xref>)
where the function a(·) represents the selection mechanism of the set of actions. At each time
step, we have direct and explicit control over the selection of the actions, so that we can actively
influence the ranking algorithm to adapt and improve its fairness performance over time.</p>
<p>
          By focusing just on the evolution of the metrics, we combine Equation (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) and Equation (
          <xref ref-type="bibr" rid="ref11 ref4">4</xref>
          )
to obtain:
M̄_{t+1} = f_a(M̄_t), (
          <xref ref-type="bibr" rid="ref5">5</xref>
          )
where f_a(·) represents the evolution function depending on the selection mechanism a.
Defining an Ideal System Behavior The ultimate objective is to evolve the system in such
a way that it ensures the pre-defined metrics remain below a vector of user-defined thresholds
ε ∈ R^m.
        </p>
<p>The metrics of interest are application dependent and may include fairness indicators as well
as other metrics. For example, the application domain may require fairness metrics as well as
performance metrics, to guarantee both the system’s fairness and its predictive performance within
an acceptable range.</p>
<p>
          By setting a threshold for each metric, the user establishes boundaries that the system should
not surpass. For fairness metrics, the threshold represents the maximum limit beyond which
the system would be considered unfair. On the other hand, for accuracy metrics, the threshold
represents the maximum acceptable level of error in the prediction.
        </p>
<p>It is worth mentioning that metrics can be contrasting or even conflicting – like in the
case of fairness and accuracy. In such cases, it may not be possible to satisfy all of the
thresholds simultaneously. For example, improving fairness metrics might result in a decrease in
accuracy, or vice versa. In such scenarios, it becomes essential to strike a balance and determine
a satisfactory compromise that aligns with the desired objectives and priorities.</p>
<p>Finding this trade-off involves carefully considering the relative importance of each metric
and making informed decisions based on the context and requirements of the system. The
most immediate outcome of this process is the definition of the thresholds, but in some cases it
might be useful to rescale different metrics to reflect their relative importance. At this stage,
we model the whole process in an abstract fashion, by treating the thresholds as parameters and
by viewing any rescaling operation as part of the definition of the metrics themselves.</p>
<p>
          According to these considerations, a desirable discrete dynamical system that ensures the
user objectives can be described as follows:
M̄_{t+1} = A min(0, ε − M̄_t) + M̄_t, (<xref ref-type="bibr" rid="ref6">6</xref>)
where the minimum is taken component-wise.
        </p>
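As a sanity check, the ideal dynamics above can be simulated numerically. The following sketch uses illustrative metric values and thresholds (not taken from the paper) and shows each metric converging towards its threshold from above, while metrics already below their threshold stay fixed:

```python
import numpy as np

def ideal_step(m_bar, A_diag, eps):
    """One step of the ideal dynamics: M_{t+1} = A*min(0, eps - M_t) + M_t.

    A is diagonal and positive definite, so only its eigenvalues are stored.
    Metrics above their threshold are pushed down towards it; metrics at or
    below the threshold are fixed points of the map.
    """
    m = np.asarray(m_bar, dtype=float)
    return A_diag * np.minimum(0.0, eps - m) + m

# Illustrative values: two metrics starting above their thresholds eps,
# with eigenvalues 0.5 as in the case study.
A_diag = np.array([0.5, 0.5])
eps = np.array([0.4, 0.2])
m = np.array([0.9, 0.6])
for _ in range(30):
    m = ideal_step(m, A_diag, eps)
# With eigenvalue a, the gap above the threshold shrinks by a factor (1 - a)
# per step, so any eigenvalue in (0, 2) yields a stable trajectory.
```

After 30 steps, `m` is numerically indistinguishable from `eps`, matching the role of the thresholds as the upper bound of the fixed-point set.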
<p>
          Formally, we want to approximate the evolution of the system in Equation (
          <xref ref-type="bibr" rid="ref6">6</xref>
          ) by finding
the set of actions θ_t to perform at time step t in order to minimize the distance to the ideal
behaviour; the resulting optimization problem is formalized below. Here A is an m × m diagonal and
positive definite matrix and, due to results from the theory of discrete, linear, dynamic systems,
ε represents the upper bound for the set of fixed points of the system. By defining A and ε,
we establish a framework that guides the evolution of the system towards achieving fairness
objectives.
        </p>
<p>
          Based on the choice of A, the expected behaviour of the system defined in Equation (
          <xref ref-type="bibr" rid="ref6">6</xref>
          ) can
be schematized via the following three situations: (a) with eigenvalues close to zero, the system
converges slowly, making softer and less drastic choices; (b) with eigenvalues close to 1,
convergence is rapid, indicating more aggressive choices; (c) with eigenvalues greater than
1, the theoretical dynamic system oscillates; however, since the minimum limits the oscillations,
a stable behaviour is anticipated. We report the theoretical results in Appendix 4.1.
Approximating the Ideal Behavior The behaviour of the system, i.e., the metrics evolution
over time, described in Equation (
          <xref ref-type="bibr" rid="ref5">5</xref>
          ) may undergo changes over time due to two main factors:
i) the drift in the query distribution (a change or shift in the characteristics, preferences, or
distribution of the incoming queries over time), and ii) the set of actions taken to enforce
fairness. While the first factor is beyond our control, this is not true for the second one. We
can leverage this control to shape the system’s evolution and approximate the desired behaviour
outlined in Equation (
          <xref ref-type="bibr" rid="ref6">6</xref>
          ), which represents the ideal dynamics of the system.
        </p>
<p>Let us define the distance between the approximate and the ideal system evolution as a loss
ℒ(M̄_t) (<xref ref-type="bibr" rid="ref7">7</xref>) – in our experiments, a 2-norm – and select
the action vector by solving:
θ_t = arg min_θ ℒ(M̄_t). (<xref ref-type="bibr" rid="ref8">8</xref>)</p>
        <p>
          The cost function of the system, represented by Equation (
          <xref ref-type="bibr" rid="ref8">8</xref>
          ), can be tailored to the specific
scenario being addressed.
        </p>
<p>We can then expand M̄_t according to its definition, and rely on a sample approximation
for the expectation. For example, we can assume that the distribution drift over time is limited
and use as a sample the queries from the last k time steps; when k = 0, our approximation is
not impacted by drift effects, but it is more affected by sampling noise (i.e. we treat a single
query as representative of the full distribution); increasing k allows us to adjust this trade-off.</p>
<p>
          With this strategy, Equation (
          <xref ref-type="bibr" rid="ref8">8</xref>
          ) can be customised as follows:
θ_t = arg min_θ { ℒ(M_t) + ∑_{j=1}^{k} γ ℒ(M_{t−j}) },  with M_{t−j} = μ(r(q_{t−j}, θ_t)). (
          <xref ref-type="bibr" rid="ref9">9</xref>
          )
        </p>
<p>
          The rationale behind the semantics of Equation (
          <xref ref-type="bibr" rid="ref9">9</xref>
          ) can be summarized with the following
intuition: to limit the impact of drift effects, we consider only one query per time step, and we
approximate the expectation in Equation (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) by considering the k queries preceding q_t
(the current query, arrived at time t) and treating them as if they arrived at time t. Therefore, during
the optimization phase, the cost function is computed by considering the impact of the actions
not only on q_t but also on the k previous queries (Equation (
          <xref ref-type="bibr" rid="ref9">9</xref>
          )). However, since the previous
queries are past, their contribution is weighted by a parameter γ ∈ [0, 1] to lower their
importance. It is worth mentioning that by changing k we can also control the computational
efficiency of the system.
        </p>
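The drift-aware objective can be sketched as follows; the loss function, metric vectors, and the values of k and γ are all placeholders, and only the weighting structure mirrors the formulation above:

```python
import numpy as np

def weighted_cost(loss, current_metrics, past_metrics, gamma):
    """Drift-aware cost: L(M_t) + sum over the k past queries of gamma * L(M_{t-j}).

    Each past metric vector is assumed to be re-evaluated under the current
    action vector, as in the formulation above. `loss`, the metric vectors,
    and `gamma` are placeholders to be grounded per application.
    """
    return loss(current_metrics) + sum(gamma * loss(m) for m in past_metrics)

# Hypothetical grounding: 2-norm of the threshold violations (threshold 0.4
# for every metric), k = 2 past queries, gamma = 0.2.
loss = lambda m: float(np.linalg.norm(np.maximum(0.0, np.asarray(m) - 0.4)))
cost = weighted_cost(loss, [0.9, 0.1], [[0.6, 0.1], [0.5, 0.3]], gamma=0.2)
```

An optimizer would evaluate `weighted_cost` for each candidate action vector, since the metric vectors depend on the actions applied to the rankings.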
      </sec>
      <sec id="sec-2-3">
        <title>2.3. FAiRDAS grounding</title>
<p>
          The proposed approach is applicable to a wide range of scenarios where the objective is to
promote fairness in the outcomes of user queries over time. To tailor the abstract framework
FAiRDAS to a specific application, all the system parameters must be selected. Specifically, the
abstract system requires grounding the following parameters and actions:
• Definition of the specific metrics of interest, along with their thresholds and rescaling factors. This
involves the selection of metrics relevant to assessing fairness, as well as other metrics of
interest, including those related to performance. Appropriate rescaling factors are applied
to ensure comparability among these metrics. It is essential to include weights in the
metric definition to indicate the importance of each metric within the specific scenario.
• Definition of the ideal dynamical system. The matrix of the dynamic system and the threshold
vector are defined to represent the desired behaviour of the system. This involves defining
the matrix A and the threshold vector ε as depicted in Equation (
          <xref ref-type="bibr" rid="ref6">6</xref>
          ).
• Definition of the actions for achieving fairness. Potential actions or interventions are identified
to promote fairness within the system.
• Definition of the cost function. A cost function is defined to quantify the trade-offs or
penalties associated with different outcomes or actions. This is the function that guides the
optimization process, taking into account the approximation for the expectation of the
fairness metrics as outlined in Equation (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ).
• Definition of the optimization method. The appropriate method to solve the optimization
problem is selected, considering factors such as computational efficiency, accuracy, and scalability.
        </p>
        <p>
          Note that the cost function in Equation (
          <xref ref-type="bibr" rid="ref9">9</xref>
          ) represents one possible approximation of the query expectation; for example,
other distance metrics can be defined depending on the application, and the factor γ can be time-dependent instead
of constant, to weight each past contribution M_{t−j} differently.
        </p>
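One possible way to organize these grounding choices is a single configuration object. The sketch below is purely illustrative and every name in it is hypothetical, not part of the FAiRDAS framework itself:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class FairdasConfig:
    """Hypothetical container for the parameters that ground FAiRDAS."""
    metric_fns: Sequence[Callable]   # metrics of interest (fairness, accuracy, ...)
    thresholds: Sequence[float]      # eps: one user-defined threshold per metric
    eigenvalues: Sequence[float]     # diagonal of A in the ideal dynamics
    cost_fn: Callable                # loss guiding the optimization
    optimizer: Callable              # e.g. a random-walk search over actions

    def validate(self):
        # Every metric needs a threshold and an eigenvalue, and A must stay
        # in the stable range (0, 2) discussed above.
        assert len(self.metric_fns) == len(self.thresholds) == len(self.eigenvalues)
        assert all(0.0 < a < 2.0 for a in self.eigenvalues)
```

Grouping the choices this way makes the grounding step explicit and checkable before running any experiment.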
<p>By following these steps, we can customize FAiRDAS to a specific application or domain,
allowing for the effective enforcement of fairness or other metrics over time in the outcomes of
user queries. The flexibility and adaptability of our approach enable its application in various
contexts, providing a framework for addressing fairness concerns in query-based systems. The
parameterization of the system in FAiRDAS empowers us to navigate the complexities and
intricacies of rankings while ensuring fairness. It offers a versatile framework that accommodates
the inherent non-smoothness of rankings, enabling effective and customizable handling of
fairness considerations.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Empirical Evaluation</title>
      <sec id="sec-3-1">
        <title>3.1. Case Study</title>
<p>In this section, we present a real-world case study, drawn from the StairwAI Project, that focuses on a
web platform providing a matchmaking service for AI experts. The platform aims to bridge the gap between
customers (such as companies or other entities) seeking AI expertise and the experts able to solve
their specific problems. Users of the platform describe the issues they need to address, along with
any desired expertise. The platform then generates a ranked list of AI experts that best align with
the user’s query, providing them with a tailored response. The ranking process takes into consideration
the relevance and suitability of the experts’ expertise to the user’s specific scenario. This ensures
an optimal match between the user’s requirements and the available AI experts.</p>
        <p>Within this context, fairness is a crucial aspect that is given due consideration. Specifically, the
sensitive attribute of country is acknowledged, prompting the development of fair algorithms to
address this concern. In generating the ranked list of AI experts, our goal is to ensure fairness
with respect to the country attribute. This entails considering the expertise and qualifications of
the AI experts while giving proper weight to their skills in relevant AI subfields. Additionally,
measures are implemented to mitigate any potential biases associated with the country attribute.
These measures ensure that experts from all countries have an equal opportunity to be included
in the ranked list. Formally, let each resource be characterized by two score vectors: s ∈ R^l
and c ∈ R^g. The vector s captures the expertise of AI experts across l subfields of AI. Each
component of s represents the skill level of the expert in a specific subfield, and its value is
confined within the range [−1, 1]. On the other hand, c represents the country of the AI
expert, encoded in a one-hot format, and serves as a protected attribute within our use case to
guarantee a fair nationality distribution. The queries are described by the same score vectors,
which reflect the user’s requirements. It is important to note that, due to the specific nature of
the case at hand, the query includes both AI subfields (s) and a preferred country (c).
In this scenario, when selecting AI experts for a particular use case, it might be necessary to
specify the language in which we can communicate with the expert; in typical cases, however,
the sensitive attribute cannot be directly provided by the user. It is crucial to understand that
this particularity does not impact the mathematical foundation of the model; on the contrary, it
shows its flexibility.</p>
<p>Dataset Generation For the experimentation phase, synthetic data was generated to facilitate
better control over the level of imbalance concerning the country attribute, allowing for the
manipulation of bias levels. To achieve this, the synthetic data was designed to represent various
countries in a controlled manner. By adjusting the parameters of the data generation process,
the desired level of imbalance or bias in terms of country representation could be defined.</p>
<p>We generate three datasets for the experimental evaluation, each consisting of 40 resources
and 100 queries. Each data sample is defined by 9 AI subfields and belongs to one of 5 countries.
The score vectors of the AI subfields are sampled from a uniform distribution, while the country
vectors are sampled from a categorical distribution over the classes 1, . . . , 5, whose event
probabilities were normalized. The synthetic datasets differ in the event probabilities of the
resources’ country distribution, so as to represent three levels of bias. In Figure 1, we show the
score distributions of the generated data. Figure 1(a) represents the distribution of the query scores,
which is the same for all three datasets. The resources of the first dataset are uniformly distributed
among the 5 classes, i.e., p_i = 0.2 ∀ i (Figure 1(b)); we refer to this dataset as Balanced. In the
second dataset, hereinafter the Mild Unbalanced dataset, we introduce a bias towards the attribute
Country3 by setting p_3 = 0.4 (Figure 1(c)), and in the third dataset, named Strong Unbalanced,
we increase the bias with p_3 = 0.8 (Figure 1(d)).</p>
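The generation procedure can be sketched as follows; this is a simplified stand-in for the released code, with hypothetical function names, using the Balanced and Mild Unbalanced probability settings described above:

```python
import random

def sample_dataset(n_samples, n_subfields=9, country_probs=(0.2,) * 5, seed=0):
    """Sketch of the synthetic data generation: subfield scores are sampled
    uniformly in [-1, 1], countries from a categorical distribution whose
    event probabilities control the bias level."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        skills = [rng.uniform(-1.0, 1.0) for _ in range(n_subfields)]
        country = rng.choices(range(len(country_probs)), weights=country_probs)[0]
        data.append((skills, country))
    return data

balanced = sample_dataset(40)  # p_i = 0.2 for every country
# Mild Unbalanced: p_3 = 0.4, remaining mass split evenly (normalized).
mild = sample_dataset(40, country_probs=(0.15, 0.15, 0.4, 0.15, 0.15))
```

Raising the probability of one class (as in the Strong Unbalanced setting, p_3 = 0.8) skews the country marginal while leaving the skill scores untouched.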
<p>Ranking Algorithm Given an incoming query q, the ranking algorithm computes
the cosine similarity between q and each resource vector; the resources are ranked based on the
resulting scores.</p>
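A minimal sketch of this base ranker, assuming plain Python lists for the score vectors:

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (||u|| ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(query, resources):
    """Rank resource vectors by decreasing cosine similarity to the query.

    Returns resource indices, best match first (a sketch of the base ranker;
    zero-norm vectors are assumed not to occur)."""
    scores = [cosine_similarity(query, r) for r in resources]
    return sorted(range(len(resources)), key=lambda i: -scores[i])

# Toy vectors: the second resource is almost parallel to the query.
order = rank([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]])  # → [1, 2, 0]
```

The action vector then perturbs this base ordering, as described in the Set of Actions paragraph below.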
<p>Metrics of Interest In this case study, we are interested in enforcing fairness over the
protected attribute while preserving the ranking accuracy. As fairness metric we use the
Disparate Impact Discrimination Index [10]. Given a sample {y_i, p_i}, i = 1, …, n, including values for
a protected attribute p and a continuous target value y, the Disparate Impact Discrimination
Index is defined as:
DIDI(y, p) = ∑_{v ∈ D_p} | ∑_{i=1}^{n} I(p_i = v) y_i / ∑_{i=1}^{n} I(p_i = v) − (1/n) ∑_{i=1}^{n} y_i |, (<xref ref-type="bibr" rid="ref10">10</xref>)
where D_p is the domain of p and I(·) is the indicator function of the logical formula given as
its argument.</p>
        <p>To quantify the ranking accuracy, we measure the cosine distance
dcos(ρ, ρ̂) = 1 − (ρ · ρ̂) / (‖ρ‖ ‖ρ̂‖), (11)
between the true rank vector ρ, resulting from applying the ranking algorithm with no action
vector, and the rank vector ρ̂, computed by the ranking algorithm subject to an action vector θ_t.</p>
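For intuition, the group-mean form of the index can be sketched as follows; this is a simplified reading of the definition above, and the exact conventions of [10] may differ:

```python
def didi(targets, groups):
    """Disparate Impact Discrimination Index for a continuous target:
    sum over protected groups of |group mean - overall mean|.

    `targets` holds the continuous values y_i, `groups` the protected
    attribute p_i for each sample (a sketch of the metric described above)."""
    n = len(targets)
    overall = sum(targets) / n
    total = 0.0
    for v in set(groups):
        members = [t for t, g in zip(targets, groups) if g == v]
        total += abs(sum(members) / len(members) - overall)
    return total

# Equal group means -> no disparate impact, DIDI = 0.
balanced_didi = didi([1.0, 0.0, 1.0, 0.0], ["IT", "IT", "FR", "FR"])  # → 0.0
```

When one group concentrates all the exposure (e.g. targets 1.0 for one country and 0.0 for another), the index grows accordingly.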
<sec id="sec-3-1-2">
<p>
            Ideal Dynamical System We adopt the dynamical system in Equation (
            <xref ref-type="bibr" rid="ref6">6</xref>
            ) with A a 2 × 2
diagonal matrix with eigenvalues equal to 0.5 (these eigenvalues fall within the stable range [0, 2]
discussed in the previous section; further studies will be dedicated to finding the optimal eigenvalues
for the specific scenario at hand), and we vary the threshold components ε_i in
the interval [0, 1] to force different behaviours.
          </p>
<p>
            Cost Function As cost function, we adopt the 2-norm defined in Equation (
            <xref ref-type="bibr" rid="ref7">7</xref>
            ), and we
approximate the metrics expectation by considering k = 5 previous queries and weighting
their contribution with γ = 0.2.
          </p>
<p>Set of Actions Our set of actions applies directly to the rank returned by the ranking algorithm:
each resource in ℛ can be either put at the bottom of the rank or kept in its current position,
based on the value of a binary variable. In particular, we associate a binary variable b_i with each
resource, indicating whether the i-th resource should (b_i = 1) or should not (b_i = 0) be placed
at the bottom of the ranking list. The optimizer seeks to determine which resources to place
at the bottom by exploring the space {0, 1}^n, where n represents the number of resources.
In other words, the algorithm searches for the optimal assignment of the n binary variables
representing the action “place at the bottom”.</p>
          <p>Optimization Method To solve the optimization problem, we use a random walk to search
the hyperspace Ω ⊆ {0, 1}^n. Starting from a randomly generated point, the algorithm recursively
searches for a better solution among a set of candidates. New candidate points are generated
by flipping each component of the current best solution with a decreasing probability p.</p>
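A minimal sketch of such a random-walk search, with a toy cost function standing in for the actual FAiRDAS objective and an illustrative flip-probability schedule:

```python
import random

def random_walk(cost, n, iters=200, seed=0):
    """Random-walk search over {0, 1}^n, sketching the optimizer described
    above: flip each bit of the incumbent with a decreasing probability p
    and keep the candidate whenever it improves the cost."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in range(n)]
    best_cost = cost(best)
    for it in range(1, iters + 1):
        p = max(1.0 / it, 1.0 / n)   # illustrative decay of the flip probability
        cand = [b ^ (rng.random() < p) for b in best]
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost

# Toy cost (hypothetical): prefer demoting exactly the first two resources,
# measured as the Hamming distance to that target assignment.
target = [1, 1, 0, 0, 0, 0]
best, best_cost = random_walk(lambda b: sum(x != y for x, y in zip(b, target)), 6)
```

In the actual system, `cost` would evaluate Equation (9) on the rankings obtained after demoting the selected resources.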
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experiments</title>
<p>The presented case study was applied to the three datasets, and we varied the threshold components
ε_i to enforce different behaviours. The source code to generate the datasets and reproduce the
experiments is available at https://github.com/EleMisi/FAiRDAS under MIT license. To establish a
reference benchmark, we computed the DIDI and dcos metrics for each query in the three datasets
without taking any further action. Figure 2(a) illustrates the resulting curves, while Table 1
summarizes the mean values of the two metrics across all queries. As expected, the Strong Unbalanced
dataset exhibited a higher average DIDI compared to the Balanced and Mild Unbalanced datasets.
Furthermore, the fairness metrics displayed a more unstable behaviour in the Strong Unbalanced
dataset when compared to the other datasets.</p>
<p>As an initial step, we aim to determine the minimum average DIDI achievable by employing
FAiRDAS. For this purpose, we set the threshold component for DIDI to 0, while keeping the
component for ranking accuracy at 1. By doing so, we allow FAiRDAS to prioritize the reduction
of DIDI without imposing any constraint on the rank accuracy metric. Notice that the opposite
setting, where we want to maximize the rank accuracy without any requirement on the fairness
metrics, is equivalent to not taking any action, i.e., to the reference benchmark
(Figure 2(a)). Figure 2(b) visualizes the resulting curves. As anticipated, applying FAiRDAS with
ε = (0, 1) leads to an increase in dcos for all datasets, while the DIDI decreases and exhibits a
more stable behaviour.</p>
<p>In Figure 3, we present a comparison of the impact of FAiRDAS when applying three different
thresholds. Specifically, in Figure 3(a), we depict the scenario where we set the thresholds for
DIDI and dcos as 0.5 and 0.2, respectively (these thresholds represent the maximum allowed value
for identifying unfair behaviour and the maximum acceptable error in accuracy, respectively).
Since these thresholds exceed the mean values of DIDI and dcos for the Balanced and Mild
Unbalanced datasets, FAiRDAS does not take any action on incoming queries, and the rank quality
remains unaffected. Conversely, in the case of the Strong Unbalanced dataset, the rank quality
declines as FAiRDAS must implement measures to reduce DIDI.</p>
        <p>In Figure 3(b), we maintain a constant threshold for dcos while reducing the threshold for
DIDI to 0.4. This new threshold is now lower than the average value of DIDI achieved by
applying FAiRDAS in the initial experiment on the Strong Unbalanced dataset (Figure 2(b),
red line). As a result, the system reaches an equilibrium above the threshold for the Strong
Unbalanced dataset. Furthermore, FAiRDAS needs to take actions on the incoming queries of
the Balanced and Mild Unbalanced datasets to ensure that the resulting rankings do not violate
the fairness threshold. Consequently, the value of dcos is afected in all the datasets.</p>
        <p>In our final experiment, we examine the behavior of the system when faced with unattainable
thresholds—both threshold components set to 0.1. As expected, the system is unable to meet
6It is important to note that these thresholds represent the maximum allowed values for identifying unfair behaviour
and the maximum acceptable error in accuracy, respectively.
the requirements but instead settles into an equilibrium state near the desired thresholds. The
average DIDI values exhibited by the system align with the initial bias present in the three
datasets: Strong Unbalanced being the most biased, while Balanced and Mild Unbalanced are
closer in terms of DIDI. However, as FAiRDAS takes actions on the incoming queries to
approach the desired fairness threshold, the quality of rankings deteriorates for all the datasets.</p>
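        <p>The equilibrium behaviour under unattainable thresholds can be illustrated with a toy version of the abstract dynamics. This sketch assumes a state update of the form s(t + 1) = (I − Λ)s(t) + Λm(t), consistent with the Appendix (where A = I − Λ), together with a saturating action model in which no query-level action can push the measured metrics below some best reachable value; all names and numbers are illustrative, not the paper's exact equations.</p>

```python
import numpy as np

# Toy abstract dynamics: s holds the long-term metric state (DIDI, dcos),
# L is diagonal positive definite, and m(t) is the metric vector measured
# on the current query after the mitigation action.
L = np.diag([0.3, 0.3])                 # eigenvalues in (0, 2): stable regime
threshold = np.array([0.1, 0.1])        # unattainable targets
best_reachable = np.array([0.25, 0.2])  # best metrics any action can yield

s = np.array([0.6, 0.4])  # initial DIDI / dcos state
for _ in range(200):
    # Actions saturate: metrics cannot be driven below best_reachable.
    m = np.maximum(threshold, best_reachable)
    s = s - L.dot(s) + L.dot(m)   # s(t+1) = (I - L) s(t) + L m(t)

print(s)  # settles at the best reachable values, above the thresholds
```

<p>The fixed point of this update is s* = m, so the state settles near, but never below, what the actions can actually achieve, which is the equilibrium-above-threshold behaviour observed in the experiment.</p>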
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this study, we proposed a novel approach for addressing long-term fairness in matchmaking and ranking systems by formalizing it as an abstract dynamic system. By modelling the evolution and interaction of fairness metrics over time, valuable insights into system behaviour, metric interactions, and overall dynamics can be gained. This formalization enables a systematic analysis of system properties, such as stability, equilibrium, and convergence, and facilitates the exploration of trade-offs and tensions when optimizing multiple fairness metrics simultaneously.</p>
      <p>One of the advantages of the proposed abstract dynamic system is its flexibility and
customizability. It can be tailored to specific cases, allowing for targeted analysis and experimentation.
Different metrics – including both fairness and performance metrics – can be considered
simultaneously and easily parameterized within the system. Furthermore, the study includes a
discussion and evaluation of the proposed approach in a real-world use case. By grounding the
framework in a practical scenario, the effectiveness and applicability of the approach can be
demonstrated, providing valuable insights and practical guidance for implementing fair ranking
systems.</p>
      <p>Overall, this study has the potential to advance the field of fairness in matchmaking and
ranking systems.</p>
      <p>Future Research Directions. The present work represents a preliminary exploration of the topic, and there are several avenues for further research. One key area of focus will be the optimal selection of the eigenvalues of the system matrix. It is essential to conduct in-depth studies to determine the most effective approach for selecting these eigenvalues and their impact on the overall system dynamics.</p>
      <p>Additionally, a more comprehensive investigation of the system is necessary in terms of
balancing conflicting fairness metrics and identifying any unreachable points. It will be interesting
to explore methods for a priori identification of unreachable thresholds and visually analyze the
system’s behavior using phase diagrams. This analysis can provide insights into the dynamics
and trade-offs between different fairness metrics.</p>
      <p>Furthermore, an important aspect to study is how to choose the thresholds for different
metrics, select their weights, and perform rescaling to make them comparable. Investigating
methods for threshold determination, weight selection, and normalization techniques will
contribute to refining the fairness-aware ranking system and enhancing its effectiveness.</p>
      <p>Another important direction for future research is the integration of contextual fairness into ranking systems. User-centric fairness is also an area that warrants further investigation.
Involving users in shaping the fairness objectives and constraints of ranking systems can lead
to more personalized and user-centric fairness approaches. Future studies should explore
methods for actively engaging users in the fairness design process, allowing them to define their
fairness preferences and customize the ranking system accordingly. Specifically, we will work
on adapting the system in real-time based on user feedback and interactions, going beyond
basic fairness constraints.</p>
      <p>Transparency and explainability are crucial aspects of fairness in ranking systems. Future
research efforts should be dedicated to enhancing the transparency and interpretability of
ranking algorithms. Developing methods to explain the ranking decisions to users and providing
them with insights into the fairness criteria employed can foster trust and understanding.
Transparency and explainability also enable users to hold the ranking system accountable for
its fairness outcomes.</p>
      <p>Long-term impact assessment is another important research direction. Evaluating the effectiveness of fairness-aware ranking systems over extended periods is critical to ensure equitable outcomes and minimize unintended consequences. Future studies should assess how these systems affect various stakeholders, including users, content providers, and platform operators, and examine any long-term biases or disparities that may arise.</p>
      <p>Furthermore, since we are operating within an experimental framework, a potential area
for future investigation would involve identifying the most equitable ML algorithm based on
the defined constraints (boundaries). Taking it a step further, examining the boundaries using
diverse, polarized datasets would provide insights into the threshold at which a particular
algorithm becomes ineffective (algorithm breaking point).</p>
      <p>By addressing these future research directions, we can advance the field of fairness in
ranking systems, mitigate biases and discrimination, and ensure more equitable and transparent
outcomes for users and stakeholders alike.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The work has been partially supported by the AEQUITAS project funded by the European
Union’s Horizon Europe Programme (Grant Agreement No. 101070363), by the European
Horizon 2020 Project StairwAI (GA 101017142), and by PNRR - M4C2 - Investimento 1.3,
Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research" - Spoke 8
"Pervasive AI", funded by the European Commission under the NextGeneration EU programme.</p>
    </sec>
    <sec id="sec-6">
      <title>Appendix</title>
      <p>Let us consider a discrete-time linear system with state equation:
x(t + 1) = A x(t) + B u(t),
(12)
with A and B real-valued n × n matrices. Given λi ∈ ℝ the eigenvalues of A, it is known that:
if |λi| &gt; 1 for some i =⇒ instability.</p>
      <p>In the dynamic system defined in Equation (6), we have A = I − Λ; thus, assuming Λ to be diagonal and positive definite, and denoting εi ∈ ℝ+ the eigenvalues of Λ (so that the eigenvalues of A are λi = 1 − εi), we have that:
if εi &gt; 2 for some i =⇒ instability.</p>
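      <p>The condition above can be checked numerically. The sketch below builds A = I − Λ for a diagonal positive definite Λ and confirms that an eigenvalue of Λ larger than 2 pushes the spectral radius of A above 1, the standard discrete-time instability condition; the helper name is illustrative.</p>

```python
import numpy as np

def spectral_radius(eigs_of_lambda):
    """Spectral radius of A = I - Lambda for a diagonal Lambda whose
    diagonal entries (eigenvalues) are given in eigs_of_lambda."""
    A = np.eye(len(eigs_of_lambda)) - np.diag(eigs_of_lambda)
    return np.abs(np.linalg.eigvals(A)).max()

# All eigenvalues of Lambda in (0, 2): |1 - eps| stays below 1 (stable).
print(spectral_radius([0.5, 1.9]))
# One eigenvalue of Lambda above 2: |1 - eps| exceeds 1 (unstable).
print(spectral_radius([0.5, 2.5]))
```

<p>Since Λ is diagonal with positive entries εi, the eigenvalues of A are exactly 1 − εi, and |1 − εi| &gt; 1 holds precisely when εi &gt; 2.</p>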
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Akerkar</surname>
          </string-name>
          ,
          <source>Artificial intelligence for business</source>
          , Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Who gets what-and why: the new economics of matchmaking and market design</article-title>
          ,
          <source>Houghton Mifflin Harcourt</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Pitoura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stefanidis</surname>
          </string-name>
          , G. Koutrika,
          <article-title>Fairness in rankings and recommenders: Models, methods and research directions</article-title>
          ,
          <source>in: 2021 IEEE 37th International Conference on Data Engineering (ICDE)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>2358</fpage>
          -
          <lpage>2361</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          ,
          <article-title>Fairness of exposure in rankings</article-title>
          ,
          <source>in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2219</fpage>
          -
          <lpage>2228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zehlike</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          ,
          <article-title>Fairness in ranking: A survey</article-title>
          ,
          <source>arXiv preprint arXiv:2103.14000</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zehlike</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          ,
          <article-title>Fairness in ranking, part i: Score-based ranking</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hutchinson</surname>
          </string-name>
          , M. Mitchell,
          <article-title>50 years of test (un) fairness: Lessons for machine learning</article-title>
          ,
          <source>in: Proceedings of the conference on fairness, accountability, and transparency</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          ,
          <article-title>Measuring fairness in ranked results: An analytical and empirical comparison</article-title>
          ,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>726</fpage>
          -
          <lpage>736</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdollahpouri</surname>
          </string-name>
          , G. Adomavicius,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Guy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kamishima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krasnodebski</surname>
          </string-name>
          , L. Pizzato,
          <article-title>Multistakeholder recommendation: Survey and research directions</article-title>
          ,
          <source>User Modeling and User-Adapted Interaction</source>
          <volume>30</volume>
          (
          <year>2020</year>
          )
          <fpage>127</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Aghaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Azizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vayanos</surname>
          </string-name>
          ,
          <article-title>Learning optimal and fair decision trees for nondiscriminative decision-making</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>33</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>1418</fpage>
          -
          <lpage>1426</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>