Optimal Fair Ranking: Real Performances and Critical Parameters

Daniela A. Parletta^{1,*}, Fabio Napoli^{1}
^{1} Akkodis, Technology Innovation Hub, Artificial Intelligence Unit

Abstract
Ranking is the problem of retrieving the most relevant $m$ items for a query in a pool of $n$ items. Since ranking algorithms are extensively used in socially sensitive applications (e.g., candidate selection and search engines), it becomes critical to take into account an adequate notion of fairness. In this paper, we consider a flexible optimization-based fair ranking framework and analyze optimal state-of-the-art algorithms on real-world data. We identify the merits as well as the bottlenecks of existing methods and point out directions for future research.

Keywords
Ranking, Fairness, Optimization

1. Introduction

In this paper, we are concerned with the following fundamental problem. We are given a pool of $n$ items and, upon receiving a query, we have to retrieve the best $m \le n$ of them. This problem has numerous applications in search engines (e.g., Google), recommendation systems (YouTube and Netflix), targeted advertisement (Amazon and eBay), social networks (Facebook, Instagram, or Twitter), matching systems (e.g., Tinder), and so on. Given a query, each item has a notion of relevance or utility that depends on that query. For example, if the query is to present the top 10 movies on Netflix based on Bob's tastes, then each movie's utility will be an aggregation (based on Bob's preferences) of the genre components of that movie. In this work, we assume that a query is given and all the utilities are already determined. Ranking is also employed in settings where social, demographic, or sexual discrimination may happen. This is the case of HR departments performing recruiting campaigns and ranking the best candidates for the open positions. A second example is ranking the best students to be admitted to college.
In all these settings, there may be bias toward some groups of individuals (e.g., Black-African people or female individuals). This bias may be unwillingly injected into an automated ranking procedure, for example, by relying on sensitive attributes when computing the utility of an item. The approach taken in this work is to directly account for fairness in the formalization of the ranking problem, leading to a notion of fairness by design.

AIMMES '24: Workshop on AI bias: Measurements, Mitigation, Explanation Strategies, March 20, 2024, Amsterdam, NL
* Corresponding author.
daniela-angela.parletta@akkodis.com (D. A. Parletta); fabio.napoli@akkodis.com (F. Napoli)
Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

In this paper, we focus on the fair ranking framework proposed in [1] and provide a critical evaluation of state-of-the-art methods. Our main goal is to evaluate the maturity of existing methods for concrete applications. As a byproduct of our evaluation, we also point toward future research directions. In particular, we provide the following contributions:

β€’ An explicit description of two state-of-the-art methods that appeared in past literature only implicitly, within the proofs of runtime bounds.
β€’ A discussion of the benefits of these methods as well as their drawbacks.
β€’ A numerical evaluation on real-world datasets aimed at assessing the maturity of these solutions for concrete deployment into real-world applications. To the best of our knowledge, this is the first numerical evaluation of these methods.

Related work. Ranking with fairness constraints has recently received much attention from the research community, and a complete account is given in the recent surveys [2, 3].
One can distinguish at least two different approaches to enforcing fairness constraints in a ranking: post-hoc adjustments (e.g., [4, 5, 6, 7]) and fairness by design (e.g., [1, 8, 9]). The former take as input a (possibly unfair) ranking produced by an algorithm and apply the minimum amount of modifications needed to satisfy the fairness constraints. The latter, instead, are methods designed to output rankings that already satisfy the fairness constraints. Post-processing approaches can be applied to any ranking algorithm, even one that is already developed and deployed, to enforce fairness; fairness by design, on the other hand, typically allows one to obtain rankings of superior quality. In [1] the authors propose an optimization framework for fair ranking problems and develop several polynomial-time algorithms for it. This model has been generalized in [8] to account for noisy sensitive attributes. In [9] the authors further extend the framework to optimally aggregate multiple rankings. We notice that the specific problem formulation described in this paper may in principle be approached with the algorithms proposed in [10, 11]. However, the method proposed in [10] features an exponential dependence on the number of constraints, which in the case of fair ranking may be very large. On the other hand, the algorithms in [11] are only guaranteed to satisfy an $m$-additive perturbation of the constraints, which is unsatisfactory for even moderately large $m$.

Structure of the paper. The remainder of this paper is structured as follows. Section 2 introduces the problem setting. Section 3 details the considered methods, and Section 4 presents the numerical evaluation. Finally, Section 5 draws some conclusions and sketches future research directions.

2. Problem Setting

Notation. $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{R}$ denote the sets of natural, integer, and real numbers. We define $\mathbb{N}_0 \coloneqq \mathbb{N} \cup \{0\}$, $\mathbb{R}_+ \coloneqq \mathbb{R} \cap [0, \infty)$ and $\mathbb{Z}_+ \coloneqq \mathbb{Z} \cap \mathbb{N}_0$.
For every $n \in \mathbb{N}$, $[n]$ will denote the set $\{1, 2, \dots, n\}$. Given a set $A$ we denote its cardinality by $|A|$. Given $m, n \in \mathbb{N}$ and a set $A$, $A^{m \times n}$ denotes the set of $m$ by $n$ matrices with entries in $A$. We will denote vectors with lowercase bold letters and matrices with uppercase bold letters. Given a vector $\mathbf{a}$ we denote its $i$-th entry by $a_i$. The zero norm of a vector $\mathbf{a}$ is defined as $\|\mathbf{a}\|_0 \coloneqq |\{i : a_i \neq 0\}|$. Given a matrix $\mathbf{A}$ we denote its $(i,j)$-th entry by $A_{ij}$. Given a logical predicate $p$, we denote its indicator function by $\mathbb{I}(p)$; notice that $\mathbb{I}(p) = 1$ when $p$ is true and $0$ otherwise.

Setting. Given $m, n \in \mathbb{N}$, the ranking problem consists, informally, of retrieving (and sorting) the best $m$ items in a pool of $n$ upon receiving a query. The utility of an item depends on the specific query and may be, for example, the level of a required skill in a job offer, or the h-index in a Google Scholar search. In this work, we assume that the query is given and fixed, so that we collect the utility of placing item $i \in [n]$ in position $j \in [m]$ in a utility matrix $\mathbf{U} \in \mathbb{R}_+^{m \times n}$. We stress the importance of accounting for the position of an item in a given ranking: in a real-world setting, the perceived utility is inversely proportional to the item's position in the ranking, a phenomenon known as position bias. A ranking is a vector $\mathbf{r} \in [n]^m$ such that $r_j = k$ if item $k$ is placed in position $j$, and $r_i \neq r_j$ for each $i \neq j$. For $k \in [m]$, the $k$-prefix of a ranking $\mathbf{r}$ is the vector $\mathbf{r}_k = (r_1, \dots, r_k)$. We introduce the assignment matrix of a ranking, $\mathbf{A} \in \{0,1\}^{m \times n}$, where i) $A_{ji} = 1$ if item $i$ is assigned to position $j$ in the ranking, ii) $\sum_{j=1}^{m} A_{ji} \le 1$ for each $i \in [n]$ and $\sum_{i=1}^{n} A_{ji} = 1$ for each $j \in [m]$. We denote the set of all such matrices by $\mathcal{A}$.
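As a small illustration of these definitions, the correspondence between a ranking and its assignment matrix, and the total utility that the ranking collects, can be sketched in Python as follows (the function names and the 0-indexed storage are our own conventions; items are 1-indexed as in $[n]$):

```python
def assignment_matrix(r, n):
    """m x n assignment matrix of a ranking r over items 1..n:
    A[j][i] = 1 iff item i+1 occupies position j+1 (0-indexed storage)."""
    A = [[0] * n for _ in range(len(r))]
    for j, item in enumerate(r):
        A[j][item - 1] = 1
    return A

def ranking_utility(A, U):
    """Total utility of the ranking encoded by A: sum over positions j
    and items i of A[j][i] * U[j][i]."""
    m, n = len(A), len(A[0])
    return sum(A[j][i] * U[j][i] for j in range(m) for i in range(n))
```

Note that each row of the matrix built this way has exactly one nonzero entry, and each column at most one, matching conditions i) and ii) above.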
Notice that there is a bijection between the set of rankings and the set of assignment matrices; thus we can speak about a ranking or its assignment matrix interchangeably. The optimal ranking problem is defined as

$$\max_{\mathbf{A} \in \mathcal{A}} \; \sum_{j=1}^{m} \sum_{i=1}^{n} A_{ji} U_{ji}. \tag{1}$$

As the focus of this paper is on fair rankings, we consider a more constrained version of Equation (1). In particular, we assume that there is a set of $p \in \mathbb{N}$ properties with respect to which we wish to achieve fairness, and denote by $P_\ell \subseteq [n]$ the set of items having property $\ell \in [p]$: examples of properties are gender, ethnicity, or age. Fairness may be expressed via lower and upper bounds on the number of items featuring a given property in each prefix of the ranking. For example, we may require that each even prefix of the ranking contain an equal number of men and women. Formally, the matrices $\mathbf{F}^{(l)}, \mathbf{F}^{(u)} \in \mathbb{Z}_+^{m \times p}$ specify the minimum number $F^{(l)}_{k\ell}$ and the maximum number $F^{(u)}_{k\ell}$ of items having property $\ell$ in the $k$-prefix of a feasible ranking, that is

$$F^{(l)}_{k\ell} \;\le\; \sum_{j=1}^{k} \sum_{i \in P_\ell} A_{ji} \;\le\; F^{(u)}_{k\ell}, \qquad \forall k \in [m], \; \forall \ell \in [p]. \tag{2}$$

These matrices are called the lower fairness constraint matrix and the upper fairness constraint matrix, respectively. The Optimal Fair Ranking (OFR) problem is thus defined as

$$\begin{aligned} \max_{\mathbf{A} \in \mathcal{A}} \;& \sum_{j=1}^{m} \sum_{i=1}^{n} A_{ji} U_{ji} \\ \text{s.t. } \;& F^{(l)}_{k\ell} \le \sum_{j=1}^{k} \sum_{i \in P_\ell} A_{ji} \le F^{(u)}_{k\ell}, \qquad \forall k \in [m], \; \forall \ell \in [p]. \end{aligned} \tag{3}$$

We notice that this problem has $m \cdot n$ binary decision variables and $m \cdot (p + n)$ constraints. The optimal value of the above problem will be denoted by OPT. To set a bottom line, notice that even just checking the feasibility of Equation (3) is NP-Hard in the general case (see Theorems 10.1, 10.3 and 10.4 in [1]).
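To make the constraints of Equation (2) concrete, here is a minimal Python sketch that checks them for a given ranking (the function and variable names are our own; the constraint matrices are stored as 0-indexed lists of lists):

```python
def satisfies_fairness(r, properties, F_l, F_u):
    """Check the prefix constraints of Eq. (2) for a ranking r.

    r          : ranking, r[j] = item placed in position j+1
    properties : properties[l] = set P_l of items having property l
    F_l, F_u   : m x p lower / upper fairness constraint matrices
    """
    m, p = len(r), len(properties)
    counts = [0] * p                       # property counts in current prefix
    for k in range(m):                     # prefix (r_1, ..., r_{k+1})
        for l in range(p):
            if r[k] in properties[l]:
                counts[l] += 1
        for l in range(p):
            if not (F_l[k][l] <= counts[l] <= F_u[k][l]):
                return False
    return True
```

The check visits each prefix once while maintaining running counts, so it costs $O(m \cdot p)$ membership tests overall.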
Moreover, in the general case Equation (3) is APX-Hard, and it thus does not admit a polynomial-time approximation scheme (see Theorem 10.2 in [1]).

Utility and constraints models. In what follows we list a few popular models for the position bias. We denote by $u_i$ the absolute utility of item $i$. We consider the following models for the utility of placing $i \in [n]$ in position $j \in [m]$:

$$\underbrace{u_i \, \log_2(1+j)^{-1}}_{\text{logarithmic}}, \qquad \underbrace{u_i \, p (1-p)^{j-1}}_{\text{geometric}}, \qquad \underbrace{u_i \, \mathbb{I}(j=1)}_{\text{singular}}. \tag{4}$$

Notice that while this specific bias is uniform across the items in the pool, the general utility model presented above need not be. Further, notice that the geometric model is parametrized by a discount factor $p \in [0, 1]$ controlling how quickly the utility of an item decays with its position in the ranking. We will assume the following conditions on the utility $\mathbf{U}$, for every $i_1 \le i_2$ and $j_1 \le j_2$:

$$U_{j_1 i_1} \ge U_{j_1 i_2}, \qquad U_{j_1 i_1} \ge U_{j_2 i_1}, \qquad U_{j_1 i_1} + U_{j_2 i_2} \ge U_{j_1 i_2} + U_{j_2 i_1}, \tag{5}$$

where the two leftmost inequalities are called monotonicity and the rightmost is known as the Monge condition. We notice that if the items are sorted in descending order of absolute utility, then any of the above models for the position bias satisfies Equation (5). As for the constraints, they are usually specified in terms of a maximum number of items belonging to a certain category that can appear in a given prefix. This can be done by setting $\mathbf{F}^{(l)}$ to the zero matrix and only specifying $\mathbf{F}^{(u)}$. We name this problem Optimal Fair Ranking with Upper constraints (OFRU).

3. Algorithms

In this section, we describe two algorithms that solve the OFR problem. Each method features optimality and runtime guarantees that depend on certain parameters, making them better suited under different circumstances.
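Before detailing the algorithms, the position-bias models of Equation (4) introduced in Section 2 can be sketched as follows (a minimal illustration; the function name and the default discount factor are our own):

```python
import math

def position_bias_utility(u, j, model="logarithmic", p=0.5):
    """Utility of placing an item of absolute utility u in position j >= 1,
    under the position-bias models of Eq. (4)."""
    if model == "logarithmic":
        return u / math.log2(1 + j)        # DCG-style logarithmic discount
    if model == "geometric":
        return u * p * (1 - p) ** (j - 1)  # discount factor p in [0, 1]
    if model == "singular":
        return u if j == 1 else 0.0        # only the top position has value
    raise ValueError(f"unknown model: {model}")
```

Since all three discounts decrease in $j$ and are uniform across items, applying any of them to items sorted by descending absolute utility yields a matrix satisfying Equation (5).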
We notice that these methods first appeared in [1], but only implicitly, within the proofs of the main theorems. To the best of our knowledge, this is the first explicit description of such algorithms.

Dynamic programming. We define the type of an item $i$ as a binary vector $\mathbf{t}(i) \in \{0,1\}^p$ such that $t_\ell(i) = 1$ if and only if $i$ has property $\ell \in [p]$. Let $T \coloneqq \{\mathbf{t}(i) : i \in [n]\}$ be the set of distinct types in the data, and let $d \coloneqq |T|$. We let the elements of $T$ be $\mathbf{v}^{(1)}, \dots, \mathbf{v}^{(d)}$ and define $T_\ell \coloneqq \{i : \mathbf{t}(i) = \mathbf{v}^{(\ell)}\}$ - the set of the first $m$ items of type $\ell$ - for each $\ell \in [d]$. We denote by $t_\ell$ the cardinality of $T_\ell$ and by $i^{(\ell)}_1, \dots, i^{(\ell)}_{t_\ell}$ its elements. It is convenient to perform a pre-processing step that only retains the first (at most) $m$ items (i.e., the items with the smallest indices) of each type. In light of Equation (3), among these (at most) $m \cdot d$ items there will be the optimal ranking. Notice that this step can be done in $O(m \cdot d)$ time and produces the sets $T_1, \dots, T_d$.

Algorithm 1 Dynamic programming for OFR
Input: the sets $T_1, \dots, T_d$, the utility matrix $\mathbf{U}$, the constraints $\mathbf{F}^{(l)}, \mathbf{F}^{(u)}$, and the set of properties $\{1, \dots, p\}$.
  Initialize $U[0, \dots, 0] \leftarrow 0$; $s_1, \dots, s_d \leftarrow 0$; $r \leftarrow$ empty list
  while $s_1 + \cdots + s_d < m$ do
    $\mathbf{u} \leftarrow (0, \dots, 0) \in \mathbb{N}_0^d$   (the next position to fill is $j = s_1 + \cdots + s_d + 1$)
    for $\ell = 1, \dots, d$ do
      if Feasible$(s_1, \dots, s_\ell + 1, \dots, s_d)$ then
        $u_\ell \leftarrow U[s_1, \dots, s_d] + U_{j\, i^{(\ell)}_{s_\ell + 1}}$
      else
        $u_\ell \leftarrow -\infty$
      end if
    end for
    Get the index $\ell^\star$ and the value $u^\star$ of $\max_{\ell \in [d]} u_\ell$
    $s_{\ell^\star} \leftarrow s_{\ell^\star} + 1$;  $U[s_1, \dots, s_d] \leftarrow u^\star$;  $r$.append$(i^{(\ell^\star)}_{s_{\ell^\star}})$
  end while
Output: the optimal ranking $r$ and the optimal utility $U[s_1, \dots, s_d]$.

The idea of the dynamic programming approach is the following. We can partition the set of all possible rankings of a certain length $k \le m$ according to the number of items of each type they contain. That is, for $s_1, \dots, s_d \in \mathbb{N}_0$ s.t.
π‘˜ β€œ 𝑠1 ` β‹― ` 𝑠𝑑 we denote with p𝑠1 , … , 𝑠𝑑 q the set of all rankings of length π‘˜ made of 𝑠𝑗 items of type vp𝑗q for each 𝑗 ∈ r𝑑s. Furthermore, with π‘ˆr𝑠1 , … , 𝑠𝑑 s we denote the value of the feasible rankings of the highest value in p𝑠1 , … , 𝑠𝑑 q. Thus, finding the optimal solution to an OFR problem corresponds to identifying a feasible ranking of utility π‘ˆr𝑠1 , … , 𝑠𝑑 s β€œ OPT, when 𝑠1 ` β‹― ` 𝑠𝑑 β€œ π‘š. Checking feasibility of the rankings in p𝑠1 , … , 𝑠𝑑 q is simple: it is enough to check that 𝑑 p𝑙q pβ„Žq p𝑒 q πΉπ‘˜β„“ ≀ βˆ‘ π‘ β„Ž 𝑣ℓ ≀ πΉπ‘˜β„“ , βˆ€β„“ ∈ r𝑝s, β„Žβ€œ1 which takes order of 𝑑 β‹… 𝑝 time. We start initializing π‘ˆr0, … , 0s β€œ 0 and notice that by Equation (5), it either holds that π‘ˆr𝑠1 , … , 𝑠𝑑 s β€œ max π‘ˆr𝑠1 , … , 𝑠ℓ Β΄ 1, … , 𝑠𝑑 s ` π‘ˆ pβ„“q , β„“βˆˆr𝑑s 𝑖𝑠ℓ Β΄1 𝑗 or that p𝑠1 , … , 𝑠𝑑 q is infeasible in which case we set π‘ˆr𝑠1 , … , 𝑠𝑑 s β€œ ´∞. As a result, we can build the optimal solution with a bottom-up approach: we start from p0, … , 0q and we build the ranking by adding an item of maximal utility that keeps the constraints satisfied at each step. As we have seen, checking the feasibility takes the order of 𝑑 β‹… 𝑝 and there are at most order of π‘šπ‘‘ rankings that needed to be considered (all the ways to chose 𝑑 natural numbers so that they sum to π‘š). The overall algorithm is described in Algorithm 1. Below we report the theorem establishing its theoretical performances. Theorem 1 (Theorem 3.1 in [1]). Algorithm 1 finds an optimal solution to the OFR problem, if there is one, in order of π‘π‘‘π‘šπ‘‘ time. Otherwise, it return π‘ˆr𝑠1 , … , 𝑠𝑑 s β€œ ´∞. The pre-processing runs in order of π‘š β‹… 𝑑 time. Remark 1 (On the complexity of Algorithm 1). We make the following comments: β€’ We notice that an exhaustive search among all possible rankings of length π‘š in a pool of π‘š β‹… 𝑑 takes the order of 𝑑 π‘š time. 
On the other hand, the proposed method takes $O(m^d)$ time, which is better: in real-world applications, $d$ should be thought of as a constant determined by the fairness constraints, while $m$ is a parameter that can be as large as $n$.
β€’ The practicality of Algorithm 1 depends on the value of $d$. We notice that $d \ge p$, which implies that when fairness is defined using more than 2-3 properties, the algorithm becomes impractical.

Greedy algorithm. To overcome the limitations discussed in Remark 1, we consider an alternative greedy approach. The algorithm is simple. Similarly to Algorithm 1, it keeps the items grouped into the sets $T_1, \dots, T_d$. At each step $j$, the algorithm greedily takes the item with the highest utility - let $\mathbf{v}^{(k_j)}$ be its type - and adds it to the ranking if this addition does not violate the constraints. Otherwise, it seeks the next item among the types $\{\mathbf{v}^{(1)}, \dots, \mathbf{v}^{(d)}\} \setminus \{\mathbf{v}^{(k_j)}\}$. If no feasible item can be found, the problem is declared infeasible. Since there are $m$ positions to fill and each item can be checked for feasibility in $O(p)$ time, the runtime of this algorithm is $O(m \cdot p)$. The algorithm is also described in Algorithm 2. Despite its simplicity, this algorithm provably finds an optimal solution in the following class of problems. Let us define

$$\Delta \;\coloneqq\; \max_{\mathbf{v} \in T} \|\mathbf{v}\|_0, \tag{6}$$

that is, $\Delta$ is the maximum number of properties each type of item can exhibit. Then the following holds.

Theorem 2 (Theorem 3.2 in [1]). Algorithm 2 finds an optimal solution to any OFRU problem as long as $\Delta = 1$. For those problems, if the algorithm returns infeasible, then the problem does not admit a solution. It runs in $O(m \cdot p)$ time. The pre-processing runs in $O(m \cdot d)$ time.

Remark 2 (On the complexity of Algorithm 2). We make the following comments:
β€’ Theorem 2 guarantees optimality of Algorithm 2 only when the problem is an instance of the OFRU problem.
This assumption is in contrast to Algorithm 1, which can instead solve the more general OFR problem. Nevertheless, OFRU models many natural settings, since fairness is usually expressed only by limiting the number of items in the unprotected classes (thus only specifying the matrix $\mathbf{F}^{(u)}$).
β€’ The assumption $\Delta = 1$ is satisfied in many practical settings, including the important case of mutually exclusive properties (e.g., male vs. female, White vs. Black vs. Asian, protected worker categories). For the sake of comparison, notice that in this setting $d = p$, and when $p$ is even moderately large (e.g., $d \in \{4, 5\}$) Algorithm 1 is impractical.
β€’ The runtime of this greedy method is exponentially (in $d$) smaller than that of Theorem 1.
β€’ When $\Delta \ge 3$, Theorem 10.2 in [1] establishes that it is NP-Hard to find a solution within a (multiplicative) factor larger than $\Delta / \log(\Delta)$ of the optimum.

Algorithm 2 Greedy Algorithm for OFR
Input: the sets $T_1, \dots, T_d$, the utility matrix $\mathbf{U}$, the constraints $\mathbf{F}^{(l)}, \mathbf{F}^{(u)}$, and the set of properties $\{1, \dots, p\}$.
  Initialize $U \leftarrow 0$; $s_1, \dots, s_d \leftarrow 0$; $r \leftarrow$ empty list
  for $j = 1, \dots, m$ do
    Sort $i^{(1)}_{s_1+1}, \dots, i^{(d)}_{s_d+1}$ by decreasing utility   (we denote by $\pi$ the sorted type indices)
    Flag $\leftarrow$ false
    for $\ell = \pi_1, \dots, \pi_d$ do
      if Feasible$(s_1, \dots, s_\ell + 1, \dots, s_d)$ then
        $s_\ell \leftarrow s_\ell + 1$;  $U \leftarrow U + U_{j\, i^{(\ell)}_{s_\ell}}$;  $r$.append$(i^{(\ell)}_{s_\ell})$;  Flag $\leftarrow$ true;  break
      end if
    end for
    if Flag = false then
      return Infeasible
    end if
  end for
Output: a fair ranking $r$ and its utility $U$.

Practical considerations. Both algorithms employ a pre-processing step that prunes the item pool to retain only the top $m \cdot d$ items. This step is justified by Equation (5). In practice, even when the utilities are obtained by applying a position-bias model to the absolute utilities, it is often not the case that the items are already sorted in a way that $\mathbf{U}$ satisfies Equation (5).
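A minimal Python sketch of this greedy procedure, under our own conventions (items within each $T_\ell$ are pre-sorted by decreasing absolute utility, and candidates are compared by absolute utility, which matches the uniform position-bias models of Equation (4)), is:

```python
def greedy_fair_ranking(T, u, types, F_l, F_u, m):
    """Greedy sketch in the spirit of Algorithm 2.

    T[l]  : items of type l, sorted by decreasing absolute utility
    u[i]  : absolute utility of item i
    types : binary property vectors v^(l); F_l, F_u as in Eq. (2)
    Returns the ranking as a list of items, or None if declared infeasible.
    """
    d, p = len(T), len(types[0])
    counts = [0] * d                     # items already taken from each type
    r = []
    for k in range(1, m + 1):            # position to fill
        # next unused item of each type, best absolute utility first
        cands = [(u[T[l][counts[l]]], l) for l in range(d)
                 if counts[l] < len(T[l])]
        cands.sort(reverse=True)
        placed = False
        for _, l in cands:
            counts[l] += 1
            if all(F_l[k - 1][q] <= sum(counts[h] * types[h][q]
                                        for h in range(d)) <= F_u[k - 1][q]
                   for q in range(p)):
                r.append(T[l][counts[l] - 1])
                placed = True
                break
            counts[l] -= 1               # undo: constraint violated
        if not placed:
            return None                  # declared infeasible
    return r
```

Feasibility is checked on the type counts exactly as in the dynamic programming approach; on OFRU instances with $\Delta = 1$, Theorem 2 guarantees the returned ranking is optimal.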
To overcome this limitation, it is necessary to sort the items by their absolute utilities, a step that requires $O(n \log n)$ time. Overall, the pre-processing runs in $O(n \log n + m \cdot d)$ time. In the case of Algorithm 2, this pre-processing time dominates the ranking time. Furthermore, both methods require storing the constraint matrices, which occupy $O(m \cdot p)$ space. In addition, Algorithm 1 also requires storing the array with the $U[s_1, \dots, s_d]$ values, which can require up to $O(m^d)$ space. This is a strong limitation: even with unlimited time, when $d$ is moderately large it is not possible to run this method. Finally, we notice that the fairness constraints are typically phrased in natural language. To work within this framework, it is necessary to translate them into the matrices $\mathbf{F}^{(l)}$ and $\mathbf{F}^{(u)}$. This is not straightforward in general, but we will show an example in Section 4.

Table 1: Dataset parameters.

Dataset         | n    | d  | Ξ”
----------------|------|----|---
Chess Ranking   | 3251 | 5  | 2
Chess Ranking S | 3251 | 4  | 2
Chess Ranking R | 3251 | 3  | 1
Occupation G    | 4494 | 2  | 1
Occupation W    | 4494 | 94 | 1

4. Numerical Experiments

In this section, we perform some experiments aimed at addressing the following points:
β€’ Exemplify some typical values that $d$ and $\Delta$ may take in real-world applications.
β€’ What is the actual runtime of Algorithm 1? Does it follow the theoretical complexity?
β€’ How does Algorithm 2 compare with Algorithm 1, even when the hypotheses of Theorem 2 are violated?

For all datasets, we consider the logarithmic position-bias model discussed in Section 2. For the sake of comparison, when setting the fairness constraints we always put $\mathbf{F}^{(l)} = 0$, thus restricting the problems to be instances of the OFRU problem. For $\mathbf{F}^{(u)}$ we consider the notion of fairness known as proportional representation, where the entry $(k, \ell)$ of the upper fairness constraint matrix is $k \, |P_\ell| / n$, where $P_\ell$ is the subset of items having property $\ell$.
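As an illustration, the proportional-representation matrix $\mathbf{F}^{(u)}$ can be built as follows (a sketch; since the constraint matrices are integer-valued, we round the fractional bound $k \, |P_\ell| / n$ up to the nearest integer, a choice of ours that the text leaves open):

```python
def proportional_representation(group_sizes, n, m):
    """Upper fairness constraint matrix F^(u) for proportional representation:
    entry (k, l) bounds the items with property l in the k-prefix by
    k * |P_l| / n, rounded up (the rounding choice is an assumption here)."""
    return [[(k * size + n - 1) // n      # integer ceiling of k * size / n
             for size in group_sizes]
            for k in range(1, m + 1)]
```

For example, with two groups of 2 items each in a pool of $n = 4$, every prefix of a length-2 ranking may contain at most one item per group.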
Notice that this constraint imposes that, in the first $k$ positions, each property cannot be represented by more items than in the overall population, fractionally. For each algorithm, we measure the runtime excluding the pre-processing, which is common to both methods. For Algorithm 2 we also report the competitive ratio of the computed ranking $\hat{\mathbf{r}}$; this is defined as the ratio between the utility of $\hat{\mathbf{r}}$ and the optimal value OPT. This ratio takes values in $[0, 1]$, and higher is better. Notice that, due to its optimality, for Algorithm 1 this ratio always evaluates to 1. In what follows, we refer to Algorithm 1 as DP and to Algorithm 2 as Greedy. All experiments are performed on an Intel i9 at 2.4 GHz with 8 cores and 16 GB of 2666 MHz DDR4 RAM.

Parameters. We consider the following real-world datasets. Chess Ranking [12] consists of 3251 entries, one per player. For each player, we have the absolute utility as given by the FIDE rating, the gender (male or female), and the race (Asian, Hispanic-Latin, White). Notice that in this dataset there are no women of Hispanic-Latin ethnicity, so that $d = 5$ and $\Delta = 2$. We derive two more datasets from Chess Ranking. We consider a simplified version, which we name Chess Ranking S(implified), where each player has a gender and is either White or non-White, leading to $d = 4$ and $\Delta = 2$. Instead, in Chess Ranking R(ace) we drop the gender attribute and keep the ethnicity, leading to $d = 3$ and $\Delta = 1$. Occupation [8] consists of 4494 images of workers. These data are the results of Google queries, each one reporting the top workers in a given professional category, among 94 options. The absolute utility is the position of a worker in the corresponding ranking. For each image, we also have the gender of the worker. We generate two datasets from Occupation.
In Occupation G(ender) we drop the working category and only consider the gender, leading to $d = 2$ types and $\Delta = 1$. In Occupation W(ork) we drop the gender and keep only the working category, leading to $d = 94$ and $\Delta = 1$. The relevant parameters of the datasets are reported in Table 1.

Figure 1: Runtime of Algorithm 1 across the considered datasets. [Four panels: (a) Chess Ranking, (b) Chess Ranking S, (c) Chess Ranking R, (d) Occupation G. Each panel plots time [sec] against the ranking length, comparing DP with the scaled leading term $m^5$, $m^4$, $m^3$, and $m^2$, respectively.]

Runtime. To evaluate the runtime of Algorithm 1, we vary the ranking length $m$ in $\{10, 20, 30, 40, 50\}$ and measure it on each dataset (excluding Occupation W). For comparison, we also report the leading term $m^d$ of the worst-case bound of Theorem 1. This latter term is multiplied by a suitably chosen small constant to make the quantities comparable. Results are shown in Figure 1. First, notice the qualitative differences among the plots: the actual runtime scales with a different law depending on the dataset. This is in line with the theory, as the leading term in the bound takes on the expressions $m^5$, $m^4$, $m^3$, $m^2$ respectively on the considered datasets. Second, we notice that on at least 3 out of 4 datasets, the behavior of DP closely follows the worst-case behavior predicted by the theory. In the case of Occupation G, instead, it performs significantly better.

Optimization. We now fix $m = 100$ and compare DP to Greedy. Results are shown in Table 2. First, notice that Greedy is always at least 2 orders of magnitude faster than DP.
In the case of Chess Ranking, it is more than 5 orders of magnitude faster. Second, even when $\Delta > 1$, Greedy features a competitive ratio very close to 1, meaning that it essentially finds an optimal solution. Third, in the case of Occupation W, Greedy finds an optimal solution in about 0.2 seconds, while it is not even possible to run DP due to the tremendous space required to store the table of $U[s_1, \dots, s_d]$ values: $100^{94}$ elements for this dataset.

Table 2: Experimental comparison between Algorithm 1 (DP) and Algorithm 2 (Greedy).

Dataset         | Algorithm | Runtime [sec] | Ratio
----------------|-----------|---------------|--------
Chess Ranking   | DP        | 6343.4        | 1
Chess Ranking   | Greedy    | 0.0287        | 0.9998
Chess Ranking S | DP        | 247.76        | 1
Chess Ranking S | Greedy    | 0.01593       | 0.99995
Chess Ranking R | DP        | 7.24895       | 1
Chess Ranking R | Greedy    | 0.01078       | 1
Occupation G    | DP        | 0.14752       | 1
Occupation G    | Greedy    | 0.00627       | 1
Occupation W    | Greedy    | 0.21901       | 1

5. Conclusion and Future Directions

Ranking is a foundational problem in algorithms, with numerous applications where fairness constraints must be accounted for. In this paper, we present and discuss an important optimization-based framework for ranking with fairness constraints. We provide, for the first time, an explicit description of state-of-the-art algorithms and evaluate them critically on real-world datasets. By considering the critical parameters controlling the complexity of the fair ranking problem, we shed light on the limitations of the exact dynamic programming method. While it appears to be feasible in some circumstances, its runtime explodes as soon as more than 3 properties are used to define fairness. On the other hand, the greedy approach seems to offer a viable alternative even when the hypothesis that ensures its optimality is violated. Both these methods appear to feature a level of maturity that may already suffice for certain real-world applications, such as candidate selection in job matching.
Future directions in this area may attempt to: design optimal linear-time algorithms for the case $\Delta = 2$; re-parameterize the problem and design novel algorithms with better dependencies on the parameters than Algorithm 1; identify additional assumptions under which the NP-hardness results can be escaped.

6. Acknowledgments

This paper was supported by the European Union's Horizon Europe research and innovation program under grant number 101070363 - AEQUITAS. Funded by the European Union. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

References

[1] L. E. Celis, D. Straszak, N. K. Vishnoi, Ranking with fairness constraints, in: I. Chatzigiannakis, C. Kaklamanis, D. Marx, D. Sannella (Eds.), 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, volume 107 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum fΓΌr Informatik, 2018, pp. 28:1-28:15. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.28.
[2] M. Zehlike, K. Yang, J. Stoyanovich, Fairness in ranking, part I: score-based ranking, ACM Comput. Surv. 55 (2023) 118:1-118:36. URL: https://doi.org/10.1145/3533379.
[3] M. Zehlike, K. Yang, J. Stoyanovich, Fairness in ranking, part II: learning-to-rank and recommender systems, ACM Comput. Surv. 55 (2023) 117:1-117:41. URL: https://doi.org/10.1145/3533380.
[4] M. Zehlike, F. Bonchi, C. Castillo, S. Hajian, M. Megahed, R. Baeza-Yates, FA*IR: A fair top-k ranking algorithm, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 1569-1578.
[5] A. Singh, T. Joachims, Fairness of exposure in rankings, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2219-2228.
[6] M. Zehlike, C. Castillo, Reducing disparate exposure in ranking: A learning to rank approach, in: Proceedings of The Web Conference 2020, 2020, pp. 2849-2855.
[7] M. Zehlike, P. Hacker, E. Wiedemann, Matching code and law: achieving algorithmic fairness with optimal transport, Data Mining and Knowledge Discovery 34 (2020) 163-200.
[8] L. E. Celis, V. Keswani, Implicit diversity in image summarization, Proceedings of the ACM on Human-Computer Interaction 4 (2020) 1-28.
[9] N. Boehmer, L. E. Celis, L. Huang, A. Mehrotra, N. K. Vishnoi, Subset selection based on multiple rankings in the presence of bias: Effectiveness of fairness constraints for multiwinner voting score functions, in: Proceedings of the 40th International Conference on Machine Learning, 2023, pp. 2641-2688.
[10] F. Grandoni, R. Ravi, M. Singh, R. Zenklusen, New approaches to multi-objective optimization, Mathematical Programming 146 (2014) 525-554.
[11] A. Srinivasan, Improved approximations of packing and covering problems, in: Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, 1995, pp. 268-276.
[12] A. Ghosh, R. Dutt, C. Wilson, When fair ranking meets uncertain inference, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1033-1043.