<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>FAiRDAS: Fairness-Aware Ranking as Dynamic Abstract System</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eleonora Misino</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberta Calegari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Lombardi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michela Milano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Alma Mater Studiorum-Università di Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>We propose an abstract dynamic system as a solution to design and ensure long-term fairness in ranking systems. This approach provides valuable insights into system behaviour, metric interactions, and overall dynamics. By considering the ranking system as a dynamic system, we can model the evolution and interaction of fairness metrics over time. Our proposed approach enables the analysis of system properties, trade-offs, and tensions that arise when optimizing multiple fairness metrics. To validate its effectiveness, we apply this approach to real-world use case scenarios, demonstrating its practical applicability.</p>
      </abstract>
      <kwd-group>
<kwd>fair ranking</kwd>
        <kwd>fair matchmaking</kwd>
        <kwd>fairness in AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>In recent years, artificial intelligence (AI) has gained significant prominence in the area of online
matchmaking and ranking systems [1], which play a key role in diverse applications such as
Airbnb or dating platforms like Tinder and Bumble [2]. The objective of these systems is to
pair individuals or resources based on their respective characteristics, skills, or preferences,
expressed by a user query. These AI models can effectively learn from historical data, user
preferences, and other relevant features to generate personalized recommendations and
optimize the matching of individuals or items. While these systems strive to provide relevant
and personalized recommendations, long-term fairness has emerged as a crucial aspect to
consider. Long-term fairness refers to the equitable treatment of different groups of users over
extended periods, aiming to avoid systemic biases or disadvantages [3, 4]. Long-term fairness
is an important feature to consider and enforce when a recommendation or ranking system
serves multiple queries: given a sensitive attribute, even if no single query exhibits any
polarization, multiple repeated queries can be biased toward a specific value of that attribute.</p>
      <p>The state of the art in addressing long-term fairness in online ranking systems involves the
development of fairness-aware algorithms that explicitly incorporate fairness constraints into
the ranking process [5, 6]. These algorithms leverage fairness metrics and mathematical models
to ensure equitable treatment across user groups.</p>
      <p>Fairness metrics in ranking models encompass diverse dimensions like demographic parity,
equalized odds, or equal opportunity [7]. Each metric targets a specific fairness aspect, offering
distinct insights. Considering and integrating multiple fairness metrics is common practice to
attain a comprehensive understanding of fairness in rankings [8]. Thus, designing a theoretical
framework that accommodates multiple metrics, possibly defined through parameters, becomes
crucial.</p>
      <p>Concerning algorithms, by formulating fairness constraints or objectives, the models ensure
that the algorithm not only maximizes relevance or accuracy but also adheres to the desired
fairness principles. These models often involve techniques from optimization, such as
constrained optimization or multi-objective optimization, to balance fairness considerations with
other objectives. One common approach is to introduce fairness constraints that enforce equal
treatment or equal opportunity across different groups [4]. These constraints aim to ensure
that the ranking algorithm does not favour or discriminate against specific demographic groups
based on protected attributes such as gender, race, or age. By incorporating these constraints,
the algorithm is encouraged to produce rankings that are fair and unbiased. Regularization
techniques, such as fairness regularization, can also be employed to balance fairness
considerations with other objectives [9], such as relevance or accuracy. Fairness regularization involves
adding penalty terms to the optimization objective that explicitly encourage fair behaviour. The
regularization terms act as a form of control, nudging the algorithm towards producing fair
rankings while still optimizing for other desired properties.</p>
      <p>Considering the factors mentioned above and given the complexities associated with studying
the evolution and convergence of ranking systems, particularly when dealing with complex
algorithms in existing state-of-the-art approaches, this paper proposes a novel solution. The
main contribution of this paper is the formalization of long-term fairness as an abstract
dynamic system, to model the evolution and interaction of fairness metrics over time.</p>
      <p>This formalization enables a systematic analysis of system properties like stability,
equilibrium, and convergence. It also facilitates the identification of critical points or attractors.
Furthermore, it allows for the examination of how system changes, such as algorithm updates or
user preference shifts, influence long-term fairness. Additionally, this formalization enables the
exploration of trade-offs and tensions when optimizing multiple fairness metrics concurrently.
It provides insights into how improvements in one metric may impact others and reveal the
complex relationships between them. Through this approach, a deeper understanding of the
dynamics between contrasting fairness metrics can be achieved, helping to identify potential
synergies or conflicts among them.</p>
      <p>It is important to note that the abstract dynamic system for long-term fairness is intentionally
left abstract to accommodate customization based on specific case characteristics. Once the
dynamic system parameters are selected, the abstract dynamic system can be tailored to the
specific case under investigation. Experiments can then be conducted to observe system
behaviour and evaluate the effectiveness of different approaches in achieving fairness goals.
This formalization allows for better control and fine-tuning of each component of the system.</p>
<p>The paper is organized as follows. Section 2 introduces FAiRDAS – Fairness-Aware Ranking as
Dynamic Abstract System – and presents its mathematical foundation. The dynamic framework
is explained in detail, along with the process of grounding it. Next, an empirical evaluation
is conducted in Section 3, grounding the framework in the motivating example described above. The
instantiation of the framework is evaluated, demonstrating the effectiveness of the proposed
approach. Finally, in Section 4, concluding remarks are provided, along with a discussion on future
works.</p>
    </sec>
    <sec id="sec-2">
      <title>2. FAiRDAS</title>
      <p>In this section, we present the problem formulation for the framework (Subsection 2.1). The
main objective is to develop a comprehensive understanding of the problem at hand and establish
a clear foundation for our proposed solution. Then, the FAiRDAS abstract dynamic system is
introduced in Subsection 2.2. The main idea behind the work revolves around conceptualising
long-term fairness as a dynamic system, allowing us to define the ideal behaviour and potentially
conduct system property analysis. By mapping fairness to this abstract level, we establish a
framework for understanding the desired dynamics of the system. After the definition of the
dynamic system, we transition from the abstract level to the practical level by implementing
concrete actions within the real system (Subsection 2.3). These actions are specifically designed
to approximate the ideal behaviour defined at the abstract level. By taking tangible steps within
the real system, we aim to align its behaviour as closely as possible with the envisioned fairness
dynamics.</p>
      <sec id="sec-2-1">
        <title>2.1. Problem formulation: ranking problem definition</title>
<p>We approach the problem of ranking a set of n resources ℛ in response to a sequence of
incoming queries {q_t}, t = 1, …, ∞. Our focus lies on the ranking algorithm, which can be manipulated
by adjusting a vector of parameters θ_t, referred to as the action vector (e.g. penalizing an
over-exposed resource). Formally, we can represent the ranking algorithm as a function
r : Q × Θ → {ρ_i}, i = 1, …, n,  with ρ_t = r(q_t, θ_t), (<xref ref-type="bibr" rid="ref1">1</xref>)
where q_t and θ_t are the query and the action vector at time t, respectively; r(·, ·) is the
ranking function performed on ℛ, based on the given query and action vector; Q is the set of
possible queries; Θ is the set of possible action vectors; and {ρ_i} is the resources rank for
query q_t.</p>
        <p>At each time step t, a vector of m metrics M_t ∈ R^m is defined as
M_t = μ(ρ_t) = μ(r(q_t, θ_t)). (<xref ref-type="bibr" rid="ref2">2</xref>)</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. The FAiRDAS Approach</title>
<p>
          Evolution of Metrics as a Dynamic System Equation (
          <xref ref-type="bibr" rid="ref2">2</xref>
          ) defines how to evaluate metrics
across all resources, but it has two significant limitations. First, since it considers a single query
at a time, the equation requires an aggregation mechanism in order to be used for measuring
fairness across multiple queries. Second, since our eventual goal is enforcing fairness, we are
interested in assessing how much the current action vector θ_t affects all possible incoming
queries.
        </p>
<p>We can address both issues by taking the expectation of the metrics over the query distribution
D_t at time t. This allows us to capture the average behaviour across all queries and account
for the variations in query characteristics and preferences, as well as the dependency of such
elements on time. Hence, we formalize the metric value as its expectation over the query
distribution:</p>
<p>
          M̄_t = E_{q∼D_t}[μ(r(q, θ_t))] (<xref ref-type="bibr" rid="ref3">3</xref>)
Note that D_t is not directly observable, but it can be approximated, for example by checking
the last k queries (Equation (
          <xref ref-type="bibr" rid="ref8">8</xref>
          )).
        </p>
<p>
          In Equation (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ), the action vector θ_t is chosen to approach the ideal behaviour, and it depends
on the expected behaviour of the queries from the previous step:
        </p>
<p>θ_{t+1} = a(M̄_t), (<xref ref-type="bibr" rid="ref11 ref4">4</xref>)
where the function a(·) represents the selection mechanism of the set of actions. At each time
step, we have direct and explicit control over the selection of the actions, so that we can actively
influence the ranking algorithm to adapt and improve its fairness performance over time.</p>
<p>
          By focusing just on the evolution of the metrics, we combine Equation (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) and Equation (
          <xref ref-type="bibr" rid="ref11 ref4">4</xref>
          )
to obtain:
M̄_{t+1} = f_a(M̄_t), (
          <xref ref-type="bibr" rid="ref5">5</xref>
          )
where f_a(·) represents the evolution function depending on the selection mechanism a.
Defining an Ideal System Behavior The ultimate objective is to evolve the system in such
a way that it ensures the pre-defined metrics remain below a vector of user-defined thresholds
ε ∈ R^m.
        </p>
<p>The metrics of interest are application dependent and may include fairness indicators as well
as other metrics. For example, the application domain may require fairness metrics as well as
performance metrics, to guarantee both the system’s fairness and its predictive performance within
an acceptable range.</p>
<p>
          By setting a threshold for each metric, the user establishes boundaries that the system should
not surpass. For fairness metrics, the threshold represents the maximum limit beyond which
the system would be considered unfair. On the other hand, for accuracy metrics, the threshold
represents the maximum acceptable level of error in the prediction.
        </p>
<p>It is worth mentioning that metrics can be contrasting or even conflicting – like in the
case of fairness and accuracy. In such cases, it may not be possible to satisfy all of the
thresholds simultaneously. For example, improving fairness metrics might result in a decrease in
accuracy, or vice versa. In such scenarios, it becomes essential to strike a balance and determine
a satisfactory compromise that aligns with the desired objectives and priorities.</p>
<p>Finding this trade-off involves carefully considering the relative importance of each metric
and making informed decisions based on the context and requirements of the system. The
most immediate outcome of this process is the definition of the thresholds, but in some cases it
might be useful to rescale different metrics to reflect their relative importance. At this stage,
we model the whole process in an abstract fashion, by treating the thresholds as parameters and
by viewing any rescaling operation as part of the definition of the metrics themselves.</p>
<p>
          According to these considerations, a desirable discrete dynamical system that ensures the
user objectives can be described as follows:
M̄_{t+1} = A min(0, ε − M̄_t) + M̄_t, (<xref ref-type="bibr" rid="ref6">6</xref>)
where the minimum is taken component-wise.
        </p>
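As a sanity check, the ideal dynamics above can be simulated numerically. The following sketch uses illustrative metric values and thresholds (not taken from the paper) and shows each metric converging towards its threshold from above, while metrics already below their threshold stay fixed:

```python
import numpy as np

def ideal_step(m_bar, A_diag, eps):
    """One step of the ideal dynamics: M_{t+1} = A*min(0, eps - M_t) + M_t.

    A is diagonal and positive definite, so only its eigenvalues are stored.
    Metrics above their threshold are pushed down towards it; metrics at or
    below the threshold are fixed points of the map.
    """
    m = np.asarray(m_bar, dtype=float)
    return A_diag * np.minimum(0.0, eps - m) + m

# Illustrative values: two metrics starting above their thresholds eps,
# with eigenvalues 0.5 as in the case study.
A_diag = np.array([0.5, 0.5])
eps = np.array([0.4, 0.2])
m = np.array([0.9, 0.6])
for _ in range(30):
    m = ideal_step(m, A_diag, eps)
# With eigenvalue a, the gap above the threshold shrinks by a factor (1 - a)
# per step, so any eigenvalue in (0, 2) yields a stable trajectory.
```

After 30 steps, `m` is numerically indistinguishable from `eps`, matching the role of the thresholds as the upper bound of the fixed-point set.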
<p>
          Formally, we want to approximate the evolution of the system in Equation (
          <xref ref-type="bibr" rid="ref6">6</xref>
          ) by finding
the set of actions θ_t to perform at time step t in order to minimize the distance to the ideal
behaviour; the resulting optimization problem is formalized below. Here A is an m × m diagonal and
positive definite matrix and, due to results from the theory of discrete, linear, dynamic systems,
ε represents the upper bound for the set of fixed points of the system. By defining A and ε,
we establish a framework that guides the evolution of the system towards achieving fairness
objectives.
        </p>
<p>
          Based on the choice of A, the expected behaviour of the system defined in Equation (
          <xref ref-type="bibr" rid="ref6">6</xref>
          ) can
be schematized via the following three situations: (a) with eigenvalues close to zero, the system
converges slowly, making softer and less drastic choices; (b) with eigenvalues close to 1,
convergence is rapid, indicating more aggressive choices; (c) with eigenvalues greater than
1, the theoretical dynamic system oscillates; however, since the minimum limits the oscillations,
a stable behaviour is anticipated. We report the theoretical results in Appendix 4.1.
Approximating the Ideal Behavior The behaviour of the system, i.e., the metrics evolution
over time, described in Equation (
          <xref ref-type="bibr" rid="ref5">5</xref>
          ) may undergo changes over time due to two main factors:
i) the drift in the query distribution (a change or shift in the characteristics, preferences, or
distribution of the incoming queries over time), and ii) the set of actions taken to enforce
fairness. While the first factor is beyond our control, this is not true for the second one. We
can leverage this control to shape the system’s evolution and approximate the desired behaviour
outlined in Equation (
          <xref ref-type="bibr" rid="ref6">6</xref>
          ), which represents the ideal dynamics of the system.
        </p>
<p>Let us define the distance between the approximate and the ideal system evolution as a loss
ℒ(M̄_t) (<xref ref-type="bibr" rid="ref7">7</xref>) – in our experiments, a 2-norm – and select
the action vector by solving:
θ_t = arg min_θ ℒ(M̄_t). (<xref ref-type="bibr" rid="ref8">8</xref>)</p>
        <p>
          The cost function of the system, represented by Equation (
          <xref ref-type="bibr" rid="ref8">8</xref>
          ), can be tailored to the specific
scenario being addressed.
        </p>
<p>We can then expand M̄_t according to its definition, and rely on a sample approximation
for the expectation. For example, we can assume that the distribution drift over time is limited
and use as a sample the queries from the last k time steps; when k = 0, our approximation is
not impacted by drift effects, but it is more affected by sampling noise (i.e. we treat a single
query as representative of the full distribution); increasing k allows us to adjust this trade-off.</p>
<p>
          With this strategy, Equation (
          <xref ref-type="bibr" rid="ref8">8</xref>
          ) can be customised as follows:
θ_t = arg min_θ { ℒ(M_t) + ∑_{j=1}^{k} γ ℒ(M_{t−j}) },  with M_{t−j} = μ(r(q_{t−j}, θ_t)). (
          <xref ref-type="bibr" rid="ref9">9</xref>
          )
        </p>
<p>
          The rationale behind the semantics of Equation (
          <xref ref-type="bibr" rid="ref9">9</xref>
          ) can be summarized with the following
intuition: to limit the impact of drift effects, we consider only one query per time step, and we
approximate the expectation in Equation (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ) by considering the k queries preceding q_t
(the current query, arrived at time t) and treating them as if they arrived at time t. Therefore, during
the optimization phase, the cost function is computed by considering the impact of the actions
not only on q_t but also on the k previous queries (Equation (
          <xref ref-type="bibr" rid="ref9">9</xref>
          )). However, since the previous
queries are past, their contribution is weighted by a parameter γ ∈ [0, 1] to lower their
importance. It is worth mentioning that by changing k we can also control the computational
efficiency of the system.
        </p>
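The drift-aware objective can be sketched as follows; the loss function, metric vectors, and the values of k and γ are all placeholders, and only the weighting structure mirrors the formulation above:

```python
import numpy as np

def weighted_cost(loss, current_metrics, past_metrics, gamma):
    """Drift-aware cost: L(M_t) + sum over the k past queries of gamma * L(M_{t-j}).

    Each past metric vector is assumed to be re-evaluated under the current
    action vector, as in the formulation above. `loss`, the metric vectors,
    and `gamma` are placeholders to be grounded per application.
    """
    return loss(current_metrics) + sum(gamma * loss(m) for m in past_metrics)

# Hypothetical grounding: 2-norm of the threshold violations (threshold 0.4
# for every metric), k = 2 past queries, gamma = 0.2.
loss = lambda m: float(np.linalg.norm(np.maximum(0.0, np.asarray(m) - 0.4)))
cost = weighted_cost(loss, [0.9, 0.1], [[0.6, 0.1], [0.5, 0.3]], gamma=0.2)
```

An optimizer would evaluate `weighted_cost` for each candidate action vector, since the metric vectors depend on the actions applied to the rankings.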
      </sec>
      <sec id="sec-2-3">
        <title>2.3. FAiRDAS grounding</title>
<p>
          The proposed approach is applicable to a wide range of scenarios where the objective is to
promote fairness in the outcomes of user queries over time. To tailor the abstract framework
FAiRDAS to a specific application, all the system parameters must be selected. Specifically, the
abstract system requires grounding the following parameters and actions:
• Definition of the specific metrics of interest, along with their thresholds and rescaling factors. This
involves the selection of metrics relevant to assessing fairness, as well as other metrics of
interest, including those related to performance. Appropriate rescaling factors are applied
to ensure comparability among these metrics. It is essential to include weights in the
metric definition to indicate the importance of each metric within the specific scenario.
• Definition of the ideal dynamical system. The matrix of the dynamic system and the threshold
vector are defined to represent the desired behaviour of the system. This involves defining
the matrix A and the threshold vector ε as depicted in Equation (
          <xref ref-type="bibr" rid="ref6">6</xref>
          ).
• Definition of the actions for achieving fairness. Potential actions or interventions are identified
to promote fairness within the system.
• Definition of the cost function. A cost function is defined to quantify the trade-offs or
penalties associated with different outcomes or actions. This is the function that guides the
optimization process, taking into account the approximation for the expectation of the
fairness metrics as outlined in Equation (
          <xref ref-type="bibr" rid="ref3">3</xref>
          ).
• Definition of the optimization method. The appropriate method to solve the optimization
problem is selected, considering factors such as computational efficiency, accuracy, and scalability.
        </p>
        <p>
          Note that the cost function in Equation (
          <xref ref-type="bibr" rid="ref9">9</xref>
          ) represents one possible approximation of the query expectation; for example,
other distance metrics can be defined depending on the application, and the factor γ can be time-dependent instead
of constant, to weight each past contribution M_{t−j} differently.
        </p>
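One possible way to organize these grounding choices is a single configuration object. The sketch below is purely illustrative and every name in it is hypothetical, not part of the FAiRDAS framework itself:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class FairdasConfig:
    """Hypothetical container for the parameters that ground FAiRDAS."""
    metric_fns: Sequence[Callable]   # metrics of interest (fairness, accuracy, ...)
    thresholds: Sequence[float]      # eps: one user-defined threshold per metric
    eigenvalues: Sequence[float]     # diagonal of A in the ideal dynamics
    cost_fn: Callable                # loss guiding the optimization
    optimizer: Callable              # e.g. a random-walk search over actions

    def validate(self):
        # Every metric needs a threshold and an eigenvalue, and A must stay
        # in the stable range (0, 2) discussed above.
        assert len(self.metric_fns) == len(self.thresholds) == len(self.eigenvalues)
        assert all(0.0 < a < 2.0 for a in self.eigenvalues)
```

Grouping the choices this way makes the grounding step explicit and checkable before running any experiment.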
<p>By following these steps, we can customize FAiRDAS to a specific application or domain,
allowing for the effective enforcement of fairness or other metrics over time in the outcomes of
user queries. The flexibility and adaptability of our approach enable its application in various
contexts, providing a framework for addressing fairness concerns in query-based systems. The
parameterization of the system in FAiRDAS empowers us to navigate the complexities and
intricacies of rankings while ensuring fairness. It offers a versatile framework that accommodates
the inherent non-smoothness of rankings, enabling effective and customizable handling of
fairness considerations.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Empirical Evaluation</title>
      <sec id="sec-3-1">
        <title>3.1. Case Study</title>
<p>In this section, we present a real-world case study, drawn from the StairwAI Project, that focuses on a
web platform providing a matchmaking service for AI experts. The platform aims to bridge the gap between
customers (such as companies or other entities) seeking AI expertise and the experts able to solve
their specific problems. Users of the platform describe the issues they need to address, along with
any desired expertise. The platform then generates a ranked list of AI experts that best align with
the user’s query, providing them with a tailored response. The ranking process takes into consideration
the relevance and suitability of the experts’ expertise to the user’s specific scenario. This ensures
an optimal match between the user’s requirements and the available AI experts.</p>
        <p>Within this context, fairness is a crucial aspect that is given due consideration. Specifically, the
sensitive attribute of country is acknowledged, prompting the development of fair algorithms to
address this concern. In generating the ranked list of AI experts, our goal is to ensure fairness
with respect to the country attribute. This entails considering the expertise and qualifications of
the AI experts while giving proper weight to their skills in relevant AI subfields. Additionally,
measures are implemented to mitigate any potential biases associated with the country attribute.
These measures ensure that experts from all countries have an equal opportunity to be included
in the ranked list. Formally, let each resource be characterized by two score vectors: s ∈ R^l
and c ∈ R^g. The vector s captures the expertise of AI experts across l subfields of AI. Each
component of s represents the skill level of the expert in a specific subfield, and its value is
confined within the range [−1, 1]. On the other hand, c represents the country of the AI
expert, encoded in a one-hot format, and serves as a protected attribute within our use case to
guarantee a fair nationality distribution. The queries are described by the same score vectors,
which reflect the user’s requirements. It is important to note that, due to the specific nature of
the case at hand, the query includes both AI subfields (s) and a preferred country (c).
In this scenario, when selecting AI experts for a particular use case, it might be necessary to
specify the language in which we can communicate with the expert; in typical cases, however,
the sensitive attribute cannot be directly provided by the user. It is crucial to understand that
this particularity does not impact the mathematical foundation of the model; on the contrary, it
shows its flexibility.</p>
<p>Dataset Generation For the experimentation phase, synthetic data was generated to facilitate
better control over the level of imbalance concerning the country attribute, allowing for the
manipulation of bias levels. To achieve this, the synthetic data was designed to represent various
countries in a controlled manner. By adjusting the parameters of the data generation process,
the desired level of imbalance or bias in terms of country representation could be defined.</p>
<p>We generate three datasets for the experimental evaluation, each consisting of 40 resources
and 100 queries. Each data sample is defined by 9 AI subfields and belongs to one of 5 countries.
The score vectors of the AI subfields are sampled from a uniform distribution, while the country
vectors are sampled from a categorical distribution over the classes 1, . . . , 5, whose event
probabilities were normalized. The synthetic datasets differ in the event probabilities of the
resources’ country distribution, so as to represent three levels of bias. In Figure 1, we show the
score distributions of the generated data. Figure 1(a) represents the distribution of the query scores,
which is the same for all three datasets. The resources of the first dataset are uniformly distributed
among the 5 classes, i.e., p_i = 0.2 ∀ i (Figure 1(b)); we refer to this dataset as Balanced. In the
second dataset, hereinafter the Mild Unbalanced dataset, we introduce a bias towards the attribute
Country3 by setting p_3 = 0.4 (Figure 1(c)), and in the third dataset, named Strong Unbalanced,
we increase the bias with p_3 = 0.8 (Figure 1(d)).</p>
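The generation procedure can be sketched as follows; this is a simplified stand-in for the released code, with hypothetical function names, using the Balanced and Mild Unbalanced probability settings described above:

```python
import random

def sample_dataset(n_samples, n_subfields=9, country_probs=(0.2,) * 5, seed=0):
    """Sketch of the synthetic data generation: subfield scores are sampled
    uniformly in [-1, 1], countries from a categorical distribution whose
    event probabilities control the bias level."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        skills = [rng.uniform(-1.0, 1.0) for _ in range(n_subfields)]
        country = rng.choices(range(len(country_probs)), weights=country_probs)[0]
        data.append((skills, country))
    return data

balanced = sample_dataset(40)  # p_i = 0.2 for every country
# Mild Unbalanced: p_3 = 0.4, remaining mass split evenly (normalized).
mild = sample_dataset(40, country_probs=(0.15, 0.15, 0.4, 0.15, 0.15))
```

Raising the probability of one class (as in the Strong Unbalanced setting, p_3 = 0.8) skews the country marginal while leaving the skill scores untouched.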
<p>Ranking Algorithm Given an incoming query q, the ranking algorithm computes
the cosine similarity between q and each resource vector; the resources are ranked based on the
resulting scores.</p>
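A minimal sketch of this base ranker, assuming plain Python lists for the score vectors:

```python
import math

def cosine_similarity(u, v):
    # cos(u, v) = (u . v) / (||u|| ||v||)
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank(query, resources):
    """Rank resource vectors by decreasing cosine similarity to the query.

    Returns resource indices, best match first (a sketch of the base ranker;
    zero-norm vectors are assumed not to occur)."""
    scores = [cosine_similarity(query, r) for r in resources]
    return sorted(range(len(resources)), key=lambda i: -scores[i])

# Toy vectors: the second resource is almost parallel to the query.
order = rank([1.0, 0.0], [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]])  # → [1, 2, 0]
```

The action vector then perturbs this base ordering, as described in the Set of Actions paragraph below.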
<p>Metrics of Interest In this case study, we are interested in enforcing fairness over the
protected attribute while preserving the ranking accuracy. As fairness metric we use the
Disparate Impact Discrimination Index [10]. Given a sample {y_i, p_i}, i = 1, …, n, including values for
a protected attribute p and a continuous target value y, the Disparate Impact Discrimination
Index is defined as:
DIDI(y, p) = ∑_{v ∈ D_p} | ∑_{i=1}^{n} I(p_i = v) y_i / ∑_{i=1}^{n} I(p_i = v) − (1/n) ∑_{i=1}^{n} y_i |, (<xref ref-type="bibr" rid="ref10">10</xref>)
where D_p is the domain of p and I(·) is the indicator function of the logical formula given as
its argument.</p>
        <p>To quantify the ranking accuracy, we measure the cosine distance
dcos(ρ, ρ̂) = 1 − (ρ · ρ̂) / (‖ρ‖ ‖ρ̂‖), (11)
between the true rank vector ρ, resulting from applying the ranking algorithm with no action
vector, and the rank vector ρ̂, computed by the ranking algorithm subject to an action vector θ_t.</p>
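For intuition, the group-mean form of the index can be sketched as follows; this is a simplified reading of the definition above, and the exact conventions of [10] may differ:

```python
def didi(targets, groups):
    """Disparate Impact Discrimination Index for a continuous target:
    sum over protected groups of |group mean - overall mean|.

    `targets` holds the continuous values y_i, `groups` the protected
    attribute p_i for each sample (a sketch of the metric described above)."""
    n = len(targets)
    overall = sum(targets) / n
    total = 0.0
    for v in set(groups):
        members = [t for t, g in zip(targets, groups) if g == v]
        total += abs(sum(members) / len(members) - overall)
    return total

# Equal group means -> no disparate impact, DIDI = 0.
balanced_didi = didi([1.0, 0.0, 1.0, 0.0], ["IT", "IT", "FR", "FR"])  # → 0.0
```

When one group concentrates all the exposure (e.g. targets 1.0 for one country and 0.0 for another), the index grows accordingly.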
<sec id="sec-3-1-2">
<p>
            Ideal Dynamical System We adopt the dynamical system in Equation (
            <xref ref-type="bibr" rid="ref6">6</xref>
            ) with A a 2 × 2
diagonal matrix with eigenvalues equal to 0.5 (these eigenvalues fall within the stable range [0, 2]
discussed in the previous section; further studies will be dedicated to finding the optimal eigenvalues
for the specific scenario at hand), and we vary the threshold components ε_i in
the interval [0, 1] to force different behaviours.
          </p>
<p>
            Cost Function As cost function, we adopt the 2-norm defined in Equation (
            <xref ref-type="bibr" rid="ref7">7</xref>
            ), and we
approximate the metrics expectation by considering k = 5 previous queries and weighting
their contribution with γ = 0.2.
          </p>
<p>Set of Actions Our set of actions applies directly to the rank returned by the ranking algorithm:
each resource in ℛ can be either put at the bottom of the rank or kept in its current position,
based on the value of a binary variable. In particular, we associate a binary variable b_i with each
resource, indicating whether the i-th resource should (b_i = 1) or should not (b_i = 0) be placed
at the bottom of the ranking list. The optimizer seeks to determine which resources to place
at the bottom by exploring the space {0, 1}^n, where n represents the number of resources.
In other words, the algorithm searches for the optimal assignment of the n binary variables
representing the action “place at the bottom”.</p>
          <p>Optimization Method To solve the optimization problem, we use a random walk to search
the hyperspace Ω ⊆ {0, 1}^n. Starting from a randomly generated point, the algorithm recursively
searches for a better solution among a set of candidates. New candidate points are generated
by flipping each component of the current best solution with a decreasing probability p.</p>
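A minimal sketch of such a random-walk search, with a toy cost function standing in for the actual FAiRDAS objective and an illustrative flip-probability schedule:

```python
import random

def random_walk(cost, n, iters=200, seed=0):
    """Random-walk search over {0, 1}^n, sketching the optimizer described
    above: flip each bit of the incumbent with a decreasing probability p
    and keep the candidate whenever it improves the cost."""
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in range(n)]
    best_cost = cost(best)
    for it in range(1, iters + 1):
        p = max(1.0 / it, 1.0 / n)   # illustrative decay of the flip probability
        cand = [b ^ (rng.random() < p) for b in best]
        c = cost(cand)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost

# Toy cost (hypothetical): prefer demoting exactly the first two resources,
# measured as the Hamming distance to that target assignment.
target = [1, 1, 0, 0, 0, 0]
best, best_cost = random_walk(lambda b: sum(x != y for x, y in zip(b, target)), 6)
```

In the actual system, `cost` would evaluate Equation (9) on the rankings obtained after demoting the selected resources.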
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experiments</title>
<p>The presented case study was applied to the three datasets, and we varied the threshold components
ε_i to enforce different behaviours. The source code to generate the datasets and reproduce the
experiments is available at https://github.com/EleMisi/FAiRDAS under MIT license. To establish a
reference benchmark, we computed the DIDI and dcos metrics for each query in the three datasets
without taking any further action. Figure 2(a) illustrates the resulting curves, while Table 1
summarizes the mean values of the two metrics across all queries. As expected, the Strong Unbalanced
dataset exhibited a higher average DIDI compared to the Balanced and Mild Unbalanced datasets.
Furthermore, the fairness metrics displayed a more unstable behaviour in the Strong Unbalanced
dataset when compared to the other datasets.</p>
<p>As an initial step, we aim to determine the minimum average DIDI achievable by employing
FAiRDAS. For this purpose, we set the threshold component for DIDI to 0, while keeping the
component for ranking accuracy at 1. By doing so, we allow FAiRDAS to prioritize the reduction
of DIDI without imposing any constraint on the rank accuracy metric. Notice that the opposite
setting, where we want to maximize the rank accuracy without any requirement on the fairness
metrics, is equivalent to not taking any action, i.e., to the reference benchmark
(Figure 2(a)). Figure 2(b) visualizes the resulting curves. As anticipated, applying FAiRDAS with
ε = (0, 1) leads to an increase in dcos for all datasets, while the DIDI decreases and exhibits a
more stable behaviour.</p>
<p>In Figure 3, we present a comparison of the impact of FAiRDAS when applying three different
thresholds. Specifically, in Figure 3(a), we depict the scenario where we set the thresholds for
DIDI and dcos as 0.5 and 0.2, respectively (these thresholds represent the maximum allowed value
for identifying unfair behaviour and the maximum acceptable error in accuracy, respectively).
Since these thresholds exceed the mean values of DIDI and dcos for the Balanced and Mild
Unbalanced datasets, FAiRDAS does not take any action on incoming queries, and the rank quality
remains unaffected. Conversely, in the case of the Strong Unbalanced dataset, the rank quality
declines as FAiRDAS must implement measures to reduce DIDI.</p>
        <p>In Figure 3(b), we maintain a constant threshold for dcos while reducing the threshold for
DIDI to 0.4. This new threshold is now lower than the average value of DIDI achieved by
applying FAiRDAS in the initial experiment on the Strong Unbalanced dataset (Figure 2(b),
red line). As a result, the system reaches an equilibrium above the threshold for the Strong
Unbalanced dataset. Furthermore, FAiRDAS needs to take actions on the incoming queries of
the Balanced and Mild Unbalanced datasets to ensure that the resulting rankings do not violate
the fairness threshold. Consequently, the value of dcos is afected in all the datasets.</p>
        <p>In our final experiment, we examine the behavior of the system when faced with unattainable
thresholds—both threshold components set to 0.1. As expected, the system is unable to meet
6It is important to note that these thresholds represent the maximum allowed values for identifying unfair behaviour
and the maximum acceptable error in accuracy, respectively.
the requirements but instead settles into an equilibrium state near the desired thresholds. The
average DIDI values exhibited by the system align with the initial bias present in the three
datasets: Strong Unbalanced being the most biased, while Balanced and Mild Unbalanced are
closer in terms of DIDI. However, as FAiRDAS takes actions on the incoming queries to
approach the desired fairness threshold, the quality of rankings deteriorates for all the datasets.</p>
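        <p>The equilibrium behaviour under unattainable thresholds can be illustrated with a toy version of the abstract dynamics. This sketch assumes a state update of the form s(t + 1) = (I − Λ)s(t) + Λm(t), consistent with the Appendix (where A = I − Λ), together with a saturating action model in which no query-level action can push the measured metrics below some best reachable value; all names and numbers are illustrative, not the paper's exact equations.</p>

```python
import numpy as np

# Toy abstract dynamics: s holds the long-term metric state (DIDI, dcos),
# L is diagonal positive definite, and m(t) is the metric vector measured
# on the current query after the mitigation action.
L = np.diag([0.3, 0.3])                 # eigenvalues in (0, 2): stable regime
threshold = np.array([0.1, 0.1])        # unattainable targets
best_reachable = np.array([0.25, 0.2])  # best metrics any action can yield

s = np.array([0.6, 0.4])  # initial DIDI / dcos state
for _ in range(200):
    # Actions saturate: metrics cannot be driven below best_reachable.
    m = np.maximum(threshold, best_reachable)
    s = s - L.dot(s) + L.dot(m)   # s(t+1) = (I - L) s(t) + L m(t)

print(s)  # settles at the best reachable values, above the thresholds
```

<p>The fixed point of this update is s* = m, so the state settles near, but never below, what the actions can actually achieve, which is the equilibrium-above-threshold behaviour observed in the experiment.</p>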
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>In this study, we proposed a novel approach for addressing long-term fairness in matchmaking and ranking systems by formalizing it as an abstract dynamic system. By modelling the evolution and interaction of fairness metrics over time, valuable insights into system behaviour, metric interactions, and overall dynamics can be gained. This formalization enables a systematic analysis of system properties, such as stability, equilibrium, and convergence, and facilitates the exploration of trade-offs and tensions when optimizing multiple fairness metrics simultaneously.</p>
      <p>One of the advantages of the proposed abstract dynamic system is its flexibility and
customizability. It can be tailored to specific cases, allowing for targeted analysis and experimentation.
Different metrics – including both fairness and performance metrics – can be considered
simultaneously and easily parameterized within the system. Furthermore, the study includes a
discussion and evaluation of the proposed approach in a real-world use case. By grounding the
framework in a practical scenario, the effectiveness and applicability of the approach can be
demonstrated, providing valuable insights and practical guidance for implementing fair ranking
systems.</p>
      <p>Overall, this study has the potential to advance the field of fairness in matchmaking and
ranking systems.</p>
      <p>Future Research Directions. The present work represents a preliminary exploration of the topic, and there are several avenues for further research. One key area of focus will be the optimal selection of the eigenvalues of the system matrix. It is essential to conduct in-depth studies to determine the most effective approach for selecting these eigenvalues and their impact on the overall system dynamics.</p>
      <p>Additionally, a more comprehensive investigation of the system is necessary in terms of
balancing conflicting fairness metrics and identifying any unreachable points. It will be interesting
to explore methods for a priori identification of unreachable thresholds and visually analyze the
system’s behavior using phase diagrams. This analysis can provide insights into the dynamics
and trade-offs between different fairness metrics.</p>
      <p>Furthermore, an important aspect to study is how to choose the thresholds for different
metrics, select their weights, and perform rescaling to make them comparable. Investigating
methods for threshold determination, weight selection, and normalization techniques will
contribute to refining the fairness-aware ranking system and enhancing its effectiveness.</p>
      <p>Another important direction for future research is the integration of contextual fairness into ranking systems. User-centric fairness is also an area that warrants further investigation.
Involving users in shaping the fairness objectives and constraints of ranking systems can lead
to more personalized and user-centric fairness approaches. Future studies should explore
methods for actively engaging users in the fairness design process, allowing them to define their
fairness preferences and customize the ranking system accordingly. Specifically, we will work
on adapting the system in real-time based on user feedback and interactions, going beyond
basic fairness constraints.</p>
      <p>Transparency and explainability are crucial aspects of fairness in ranking systems. Future
research efforts should be dedicated to enhancing the transparency and interpretability of
ranking algorithms. Developing methods to explain the ranking decisions to users and providing
them with insights into the fairness criteria employed can foster trust and understanding.
Transparency and explainability also enable users to hold the ranking system accountable for
its fairness outcomes.</p>
      <p>Long-term impact assessment is another important research direction. Evaluating the effectiveness of fairness-aware ranking systems over extended periods is critical to ensure equitable outcomes and minimize unintended consequences. Future studies should assess how these systems affect various stakeholders, including users, content providers, and platform operators, and examine any long-term biases or disparities that may arise.</p>
      <p>Furthermore, since we are operating within an experimental framework, a potential area
for future investigation would involve identifying the most equitable ML algorithm based on
the defined constraints (boundaries). Taking it a step further, examining the boundaries using
diverse, polarized datasets would provide insights into the threshold at which a particular
algorithm becomes ineffective (algorithm breaking point).</p>
      <p>By addressing these future research directions, we can advance the field of fairness in
ranking systems, mitigate biases and discrimination, and ensure more equitable and transparent
outcomes for users and stakeholders alike.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The work has been partially supported by the AEQUITAS project funded by the European
Union’s Horizon Europe Programme (Grant Agreement No. 101070363), by the European
Horizon 2020 Project StairwAI (GA 101017142), and by PNRR - M4C2 - Investimento 1.3,
Partenariato Esteso PE00000013 - "FAIR - Future Artificial Intelligence Research" - Spoke 8
"Pervasive AI", funded by the European Commission under the NextGeneration EU programme.</p>
    </sec>
    <sec id="sec-6">
      <title>Appendix</title>
      <p>Let us consider a discrete-time linear system with state equation:
x(t + 1) = A x(t) + B u(t),
(12)
with A and B real-valued n × n matrices. Given λi ∈ ℝ the eigenvalues of A, it is known that:
if |λi| &gt; 1 for some i =⇒ instability.</p>
      <p>In the dynamic system defined in Equation (6), we have A = I − Λ; thus, assuming Λ to be diagonal and positive definite, and denoting εi ∈ ℝ+ the eigenvalues of Λ (so that the eigenvalues of A are λi = 1 − εi), we have that:
if εi &gt; 2 for some i =⇒ instability.</p>
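      <p>The condition above can be checked numerically. The sketch below builds A = I − Λ for a diagonal positive definite Λ and confirms that an eigenvalue of Λ larger than 2 pushes the spectral radius of A above 1, the standard discrete-time instability condition; the helper name is illustrative.</p>

```python
import numpy as np

def spectral_radius(eigs_of_lambda):
    """Spectral radius of A = I - Lambda for a diagonal Lambda whose
    diagonal entries (eigenvalues) are given in eigs_of_lambda."""
    A = np.eye(len(eigs_of_lambda)) - np.diag(eigs_of_lambda)
    return np.abs(np.linalg.eigvals(A)).max()

# All eigenvalues of Lambda in (0, 2): |1 - eps| stays below 1 (stable).
print(spectral_radius([0.5, 1.9]))
# One eigenvalue of Lambda above 2: |1 - eps| exceeds 1 (unstable).
print(spectral_radius([0.5, 2.5]))
```

<p>Since Λ is diagonal with positive entries εi, the eigenvalues of A are exactly 1 − εi, and |1 − εi| &gt; 1 holds precisely when εi &gt; 2.</p>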
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Akerkar</surname>
          </string-name>
          ,
          <source>Artificial intelligence for business</source>
          , Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Roth</surname>
          </string-name>
          ,
          <article-title>Who gets what-and why: the new economics of matchmaking and market design</article-title>
          ,
          <source>Houghton Mifflin Harcourt</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Pitoura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stefanidis</surname>
          </string-name>
          , G. Koutrika,
          <article-title>Fairness in rankings and recommenders: Models, methods and research directions</article-title>
          ,
          <source>in: 2021 IEEE 37th International Conference on Data Engineering (ICDE)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>2358</fpage>
          -
          <lpage>2361</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Joachims</surname>
          </string-name>
          ,
          <article-title>Fairness of exposure in rankings</article-title>
          ,
          <source>in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>2219</fpage>
          -
          <lpage>2228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zehlike</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          ,
          <article-title>Fairness in ranking: A survey</article-title>
          ,
          <source>arXiv preprint arXiv:2103.14000</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zehlike</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Stoyanovich</surname>
          </string-name>
          ,
          <article-title>Fairness in ranking, part i: Score-based ranking</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>55</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hutchinson</surname>
          </string-name>
          , M. Mitchell,
          <article-title>50 years of test (un) fairness: Lessons for machine learning</article-title>
          ,
          <source>in: Proceedings of the conference on fairness, accountability, and transparency</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Raj</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          ,
          <article-title>Measuring fairness in ranked results: An analytical and empirical comparison</article-title>
          ,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>726</fpage>
          -
          <lpage>736</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdollahpouri</surname>
          </string-name>
          , G. Adomavicius,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Guy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kamishima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krasnodebski</surname>
          </string-name>
          , L. Pizzato,
          <article-title>Multistakeholder recommendation: Survey and research directions</article-title>
          ,
          <source>User Modeling and User-Adapted Interaction</source>
          <volume>30</volume>
          (
          <year>2020</year>
          )
          <fpage>127</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Aghaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Azizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vayanos</surname>
          </string-name>
          ,
          <article-title>Learning optimal and fair decision trees for nondiscriminative decision-making</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>33</volume>
          ,
          <year>2019</year>
          , pp.
          <fpage>1418</fpage>
          -
          <lpage>1426</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>