Optimal Fair Ranking: Real Performances and Critical Parameters

Daniela A. Parletta^{1,*}, Fabio Napoli^{1}
^{1} Akkodis, Technology Innovation Hub, Artificial Intelligence Unit

Abstract
Ranking is the problem of retrieving the most relevant $m$ items for a query in a pool of $n$ items. Since ranking algorithms are extensively used in socially sensitive applications (e.g., candidate selection and search engines), it becomes critical to take into account an adequate notion of fairness. In this paper, we consider a flexible optimization-based fair ranking framework and analyze optimal state-of-the-art algorithms on real-world data. We identify the merits as well as the bottlenecks of existing methods and point out directions for future research.

Keywords
Ranking, Fairness, Optimization

1. Introduction

In this paper, we are concerned with the following fundamental problem. We are given a pool of $n$ items and, upon receiving a query, we have to retrieve the best $m \le n$ of them. This problem has numerous applications in search engines (e.g., Google), recommendation systems (YouTube and Netflix), targeted advertisement (Amazon and eBay), social networks (Facebook, Instagram, or Twitter), matching systems (e.g., Tinder), and so on. Given a query, each item has a notion of relevance or utility that depends on that query. For example, if the query is to present the top 10 movies on Netflix based on Bob's tastes, then each movie's utility will be an aggregation (based on Bob's preferences) of the genre components of that movie. In this work, we assume that a query is given and all the utilities are already determined. Ranking is also employed in settings where social, demographic, or sexual discrimination may happen. This is the case of HR departments performing recruiting campaigns and ranking the best candidates for the open positions. A second example is ranking the best students to be admitted to college.
In all these settings, there may be bias toward some groups of individuals (e.g., Black-African people or female individuals). This bias may be unwillingly injected into an automated ranking procedure, for example, by relying on sensitive attributes when computing the utility of an item. The approach taken in this work is to directly account for fairness in the formalization of the ranking problem, leading to a notion of fairness by design.

AIMMES '24: Workshop on AI bias: Measurements, Mitigation, Explanation Strategies, March 20, 2024, Amsterdam, NL
* Corresponding author.
daniela-angela.parletta@akkodis.com (D. A. Parletta); fabio.napoli@akkodis.com (F. Napoli)
Β© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

In this paper, we focus on the fair ranking framework proposed in [1] and provide a critical evaluation of state-of-the-art methods. Our main goal is to evaluate the maturity of existing methods for concrete applications. As a byproduct of our evaluation, we also point toward future research directions. In particular, we provide the following contributions:

β€’ An explicit description of two state-of-the-art methods that appeared in past literature only implicitly, within the proofs of runtime bounds.
β€’ A discussion of the benefits of these methods as well as their drawbacks.
β€’ A numerical evaluation on real-world datasets aimed at assessing the maturity of these solutions for concrete deployment into real-world applications. To the best of our knowledge, this is the first numerical evaluation of these methods.

Related work. Ranking with fairness constraints has recently received much attention from the research community, and a complete account is given in the recent surveys [2, 3].
One can distinguish at least two different approaches to enforcing fairness constraints in a ranking: post-hoc adjustments (e.g., [4, 5, 6, 7]) and fairness by design (e.g., [1, 8, 9]). The former take as input a (possibly unfair) ranking produced by an algorithm and apply the minimum amount of modifications needed to satisfy the fairness constraints. The latter, instead, are methods designed to output rankings that already satisfy the fairness constraints. Post-processing approaches can be applied to any ranking algorithm, even one that is already developed and deployed, to enforce fairness; fairness by design, on the other hand, typically allows one to obtain rankings of superior quality. In [1] the authors propose an optimization framework for fair ranking problems and develop several polynomial-time algorithms for it. This model has been generalized in [8] to account for noisy sensitive attributes. In [9] the authors further extend the framework to optimally aggregate multiple rankings. We notice that the specific problem formulation described in this paper may in principle be approached with the algorithms proposed in [10, 11]. However, the method proposed in [10] features an exponential dependence on the number of constraints, which in the case of fair ranking may be very large. On the other hand, the algorithms in [11] are only guaranteed to satisfy an $m$-additive perturbation of the constraints, which is unsatisfactory for even moderately large $m$.

Structure of the paper. The remainder of this paper is structured as follows. Section 2 introduces the problem setting. Section 3 details the considered methods, and Section 4 presents the numerical evaluation. Finally, Section 5 draws some conclusions and sketches future research directions.

2. Problem Setting

Notation. $\mathbb{N}$, $\mathbb{Z}$, $\mathbb{R}$ denote the sets of natural, integer, and real numbers. We define $\mathbb{N}_0 \coloneqq \mathbb{N} \cup \{0\}$, $\mathbb{R}_+ \coloneqq \mathbb{R} \cap [0, \infty)$ and $\mathbb{Z}_+ \coloneqq \mathbb{Z} \cap \mathbb{N}_0$.
For every $n \in \mathbb{N}$, $[n]$ will denote the set $\{1, 2, \dots, n\}$. Given a set $A$ we denote its cardinality by $|A|$. Given $m, n \in \mathbb{N}$ and a set $A$, $A^{m \times n}$ denotes the set of $m$ by $n$ matrices with entries in $A$. We will denote vectors with lowercase bold letters and matrices with uppercase bold letters. Given a vector $\mathbf{a}$ we denote its $i$-th entry by $a_i$. The zero norm of a vector $\mathbf{a}$ is defined as $\|\mathbf{a}\|_0 \coloneqq |\{i : a_i \neq 0\}|$. Given a matrix $\mathbf{A}$ we denote its $(i,j)$-th entry by $A_{ij}$. Given a logical predicate $p$, we denote its indicator function by $\mathbb{I}(p)$; notice that $\mathbb{I}(p) = 1$ when $p$ is true and $0$ otherwise.

Setting. Given $m, n \in \mathbb{N}$, the ranking problem consists, informally, of retrieving (and sorting) the best $m$ items in a pool of $n$ upon receiving a query. The utility of an item depends on the specific query and may be, for example, the level of a required skill in a job offer, or the h-index in a Google Scholar search. In this work, we assume that the query is given and fixed, so that we collect the utility of placing item $i \in [n]$ in position $j \in [m]$ in a utility matrix $\mathbf{U} \in \mathbb{R}_+^{m \times n}$. We stress the importance of accounting for the position of an item in a given ranking: in a real-world setting, the perceived utility is inversely proportional to the item's position in the ranking, a phenomenon known as position bias. A ranking is a vector $\mathbf{r} \in [n]^m$ such that $r_j = k$ if item $k$ is placed in position $j$, and $r_i \neq r_j$ for each $i \neq j$. For $k \in [m]$, the $k$-prefix of a ranking $\mathbf{r}$ is the vector $\mathbf{r}_k = (r_1, \dots, r_k)$. We introduce the assignment matrix of a ranking, $\mathbf{A} \in \{0,1\}^{m \times n}$, where i) $A_{ji} = 1$ if item $i$ is assigned to position $j$ in the ranking, ii) $\sum_{j=1}^{m} A_{ji} \le 1$ for each $i \in [n]$ and $\sum_{i=1}^{n} A_{ji} = 1$ for each $j \in [m]$. We denote the set of all such matrices by $\mathcal{A}$.
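As a small illustration of these definitions, the correspondence between a ranking and its assignment matrix, and the total utility that the ranking collects, can be sketched in Python as follows (the function names and the 0-indexed storage are our own conventions; items are 1-indexed as in $[n]$):

```python
def assignment_matrix(r, n):
    """m x n assignment matrix of a ranking r over items 1..n:
    A[j][i] = 1 iff item i+1 occupies position j+1 (0-indexed storage)."""
    A = [[0] * n for _ in range(len(r))]
    for j, item in enumerate(r):
        A[j][item - 1] = 1
    return A

def ranking_utility(A, U):
    """Total utility of the ranking encoded by A: sum over positions j
    and items i of A[j][i] * U[j][i]."""
    m, n = len(A), len(A[0])
    return sum(A[j][i] * U[j][i] for j in range(m) for i in range(n))
```

Note that each row of the matrix built this way has exactly one nonzero entry, and each column at most one, matching conditions i) and ii) above.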
Notice that there is a bijection between the set of rankings and the set of assignment matrices; thus we can speak about a ranking or its assignment matrix interchangeably. The optimal ranking problem is defined as

$$\max_{\mathbf{A} \in \mathcal{A}} \; \sum_{j=1}^{m} \sum_{i=1}^{n} A_{ji} U_{ji}. \tag{1}$$

As the focus of this paper is on fair rankings, we consider a more constrained version of Equation (1). In particular, we assume that there is a set of $p \in \mathbb{N}$ properties with respect to which we wish to achieve fairness, and denote by $P_\ell \subseteq [n]$ the set of items having property $\ell \in [p]$: examples of properties are gender, ethnicity, or age. Fairness may be expressed via lower and upper bounds on the number of items featuring a given property in each prefix of the ranking. For example, we may require that each even prefix of the ranking contain an equal number of men and women. Formally, the matrices $\mathbf{F}^{(l)}, \mathbf{F}^{(u)} \in \mathbb{Z}_+^{m \times p}$ specify the minimum number $F^{(l)}_{k\ell}$ and the maximum number $F^{(u)}_{k\ell}$ of items having property $\ell$ in the $k$-prefix of a feasible ranking, that is

$$F^{(l)}_{k\ell} \;\le\; \sum_{j=1}^{k} \sum_{i \in P_\ell} A_{ji} \;\le\; F^{(u)}_{k\ell}, \qquad \forall k \in [m], \; \forall \ell \in [p]. \tag{2}$$

These matrices are called the lower fairness constraint matrix and the upper fairness constraint matrix, respectively. The Optimal Fair Ranking (OFR) problem is thus defined as

$$\begin{aligned} \max_{\mathbf{A} \in \mathcal{A}} \;& \sum_{j=1}^{m} \sum_{i=1}^{n} A_{ji} U_{ji} \\ \text{s.t. } \;& F^{(l)}_{k\ell} \le \sum_{j=1}^{k} \sum_{i \in P_\ell} A_{ji} \le F^{(u)}_{k\ell}, \qquad \forall k \in [m], \; \forall \ell \in [p]. \end{aligned} \tag{3}$$

We notice that this problem has $m \cdot n$ binary decision variables and $m \cdot (p + n)$ constraints. The optimal value of the above problem will be denoted by OPT. To set a bottom line, notice that even just checking the feasibility of Equation (3) is NP-Hard in the general case (see Theorems 10.1, 10.3 and 10.4 in [1]).
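To make the constraints of Equation (2) concrete, here is a minimal Python sketch that checks them for a given ranking (the function and variable names are our own; the constraint matrices are stored as 0-indexed lists of lists):

```python
def satisfies_fairness(r, properties, F_l, F_u):
    """Check the prefix constraints of Eq. (2) for a ranking r.

    r          : ranking, r[j] = item placed in position j+1
    properties : properties[l] = set P_l of items having property l
    F_l, F_u   : m x p lower / upper fairness constraint matrices
    """
    m, p = len(r), len(properties)
    counts = [0] * p                       # property counts in current prefix
    for k in range(m):                     # prefix (r_1, ..., r_{k+1})
        for l in range(p):
            if r[k] in properties[l]:
                counts[l] += 1
        for l in range(p):
            if not (F_l[k][l] <= counts[l] <= F_u[k][l]):
                return False
    return True
```

The check visits each prefix once while maintaining running counts, so it costs $O(m \cdot p)$ membership tests overall.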
Moreover, in the general case Equation (3) is APX-Hard, and it thus does not admit a polynomial-time approximation scheme (see Theorem 10.2 in [1]).

Utility and constraints models. In what follows we list a few popular models for the position bias. We denote by $u_i$ the absolute utility of item $i$. We consider the following models for the utility of placing $i \in [n]$ in position $j \in [m]$:

$$\underbrace{u_i \, \log_2(1+j)^{-1}}_{\text{logarithmic}}, \qquad \underbrace{u_i \, p (1-p)^{j-1}}_{\text{geometric}}, \qquad \underbrace{u_i \, \mathbb{I}(j=1)}_{\text{singular}}. \tag{4}$$

Notice that while this specific bias is uniform across the items in the pool, the general utility model presented above need not be. Further, notice that the geometric model is parametrized by a discount factor $p \in [0, 1]$ controlling how quickly the utility of an item decays with its position in the ranking. We will assume the following conditions on the utility $\mathbf{U}$, for every $i_1 \le i_2$ and $j_1 \le j_2$:

$$U_{j_1 i_1} \ge U_{j_1 i_2}, \qquad U_{j_1 i_1} \ge U_{j_2 i_1}, \qquad U_{j_1 i_1} + U_{j_2 i_2} \ge U_{j_1 i_2} + U_{j_2 i_1}, \tag{5}$$

where the two leftmost inequalities are called monotonicity and the rightmost is known as the Monge condition. We notice that if the items are sorted in descending order of absolute utility, then any of the above models for the position bias satisfies Equation (5). As for the constraints, they are usually specified in terms of a maximum number of items belonging to a certain category that can appear in a given prefix. This can be done by setting $\mathbf{F}^{(l)}$ to the zero matrix and only specifying $\mathbf{F}^{(u)}$. We name this problem Optimal Fair Ranking with Upper constraints (OFRU).

3. Algorithms

In this section, we describe two algorithms that solve the OFR problem. Each method features optimality and runtime guarantees that depend on certain parameters, making them better suited under different circumstances.
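Before detailing the algorithms, the position-bias models of Equation (4) introduced in Section 2 can be sketched as follows (a minimal illustration; the function name and the default discount factor are our own):

```python
import math

def position_bias_utility(u, j, model="logarithmic", p=0.5):
    """Utility of placing an item of absolute utility u in position j >= 1,
    under the position-bias models of Eq. (4)."""
    if model == "logarithmic":
        return u / math.log2(1 + j)        # DCG-style logarithmic discount
    if model == "geometric":
        return u * p * (1 - p) ** (j - 1)  # discount factor p in [0, 1]
    if model == "singular":
        return u if j == 1 else 0.0        # only the top position has value
    raise ValueError(f"unknown model: {model}")
```

Since all three discounts decrease in $j$ and are uniform across items, applying any of them to items sorted by descending absolute utility yields a matrix satisfying Equation (5).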
We notice that these methods first appeared in [1], but only implicitly, within the proofs of the main theorems. To the best of our knowledge, this is the first explicit description of such algorithms.

Dynamic programming. We define the type of an item $i$ as a binary vector $\mathbf{t}(i) \in \{0,1\}^p$ such that $t_\ell(i) = 1$ if and only if $i$ has property $\ell \in [p]$. Let $T \coloneqq \{\mathbf{t}(i) : i \in [n]\}$ be the set of distinct types in the data, and let $d \coloneqq |T|$. We let the elements of $T$ be $\mathbf{v}^{(1)}, \dots, \mathbf{v}^{(d)}$ and define $T_\ell \coloneqq \{i : \mathbf{t}(i) = \mathbf{v}^{(\ell)}\}$ - the set of the first $m$ items of type $\ell$ - for each $\ell \in [d]$. We denote by $t_\ell$ the cardinality of $T_\ell$ and by $i^{(\ell)}_1, \dots, i^{(\ell)}_{t_\ell}$ its elements. It is convenient to perform a pre-processing step that only retains the first (at most) $m$ items (i.e., the items with the smallest indices) of each type. In light of Equation (3), among these (at most) $m \cdot d$ items there will be the optimal ranking. Notice that this step can be done in $O(m \cdot d)$ time and produces the sets $T_1, \dots, T_d$.

Algorithm 1 Dynamic programming for OFR
Input: the sets $T_1, \dots, T_d$, the utility matrix $\mathbf{U}$, the constraints $\mathbf{F}^{(l)}, \mathbf{F}^{(u)}$, and the set of properties $\{1, \dots, p\}$.
  Initialize $U[0, \dots, 0] \leftarrow 0$; $s_1, \dots, s_d \leftarrow 0$; $r \leftarrow$ empty list
  while $s_1 + \cdots + s_d < m$ do
    $\mathbf{u} \leftarrow (0, \dots, 0) \in \mathbb{N}_0^d$   (the next position to fill is $j = s_1 + \cdots + s_d + 1$)
    for $\ell = 1, \dots, d$ do
      if Feasible$(s_1, \dots, s_\ell + 1, \dots, s_d)$ then
        $u_\ell \leftarrow U[s_1, \dots, s_d] + U_{j\, i^{(\ell)}_{s_\ell + 1}}$
      else
        $u_\ell \leftarrow -\infty$
      end if
    end for
    Get the index $\ell^\star$ and the value $u^\star$ of $\max_{\ell \in [d]} u_\ell$
    $s_{\ell^\star} \leftarrow s_{\ell^\star} + 1$;  $U[s_1, \dots, s_d] \leftarrow u^\star$;  $r$.append$(i^{(\ell^\star)}_{s_{\ell^\star}})$
  end while
Output: the optimal ranking $r$ and the optimal utility $U[s_1, \dots, s_d]$.

The idea of the dynamic programming approach is the following. We can partition the set of all possible rankings of a certain length $k \le m$ according to the number of items of each type they contain. That is, for $s_1, \dots, s_d \in \mathbb{N}_0$ s.t.
π‘˜ β€œ 𝑠1 ` β‹― ` 𝑠𝑑 we denote with p𝑠1 , … , 𝑠𝑑 q the set of all rankings of length π‘˜ made of 𝑠𝑗 items of type vp𝑗q for each 𝑗 ∈ r𝑑s. Furthermore, with π‘ˆr𝑠1 , … , 𝑠𝑑 s we denote the value of the feasible rankings of the highest value in p𝑠1 , … , 𝑠𝑑 q. Thus, finding the optimal solution to an OFR problem corresponds to identifying a feasible ranking of utility π‘ˆr𝑠1 , … , 𝑠𝑑 s β€œ OPT, when 𝑠1 ` β‹― ` 𝑠𝑑 β€œ π‘š. Checking feasibility of the rankings in p𝑠1 , … , 𝑠𝑑 q is simple: it is enough to check that 𝑑 p𝑙q pβ„Žq p𝑒 q πΉπ‘˜β„“ ≀ βˆ‘ π‘ β„Ž 𝑣ℓ ≀ πΉπ‘˜β„“ , βˆ€β„“ ∈ r𝑝s, β„Žβ€œ1 which takes order of 𝑑 β‹… 𝑝 time. We start initializing π‘ˆr0, … , 0s β€œ 0 and notice that by Equation (5), it either holds that π‘ˆr𝑠1 , … , 𝑠𝑑 s β€œ max π‘ˆr𝑠1 , … , 𝑠ℓ Β΄ 1, … , 𝑠𝑑 s ` π‘ˆ pβ„“q , β„“βˆˆr𝑑s 𝑖𝑠ℓ Β΄1 𝑗 or that p𝑠1 , … , 𝑠𝑑 q is infeasible in which case we set π‘ˆr𝑠1 , … , 𝑠𝑑 s β€œ ´∞. As a result, we can build the optimal solution with a bottom-up approach: we start from p0, … , 0q and we build the ranking by adding an item of maximal utility that keeps the constraints satisfied at each step. As we have seen, checking the feasibility takes the order of 𝑑 β‹… 𝑝 and there are at most order of π‘šπ‘‘ rankings that needed to be considered (all the ways to chose 𝑑 natural numbers so that they sum to π‘š). The overall algorithm is described in Algorithm 1. Below we report the theorem establishing its theoretical performances. Theorem 1 (Theorem 3.1 in [1]). Algorithm 1 finds an optimal solution to the OFR problem, if there is one, in order of π‘π‘‘π‘šπ‘‘ time. Otherwise, it return π‘ˆr𝑠1 , … , 𝑠𝑑 s β€œ ´∞. The pre-processing runs in order of π‘š β‹… 𝑑 time. Remark 1 (On the complexity of Algorithm 1). We make the following comments: β€’ We notice that an exhaustive search among all possible rankings of length π‘š in a pool of π‘š β‹… 𝑑 takes the order of 𝑑 π‘š time. 
On the other hand, the proposed method takes $O(m^d)$ time, which is better: in real-world applications, $d$ should be thought of as a constant determined by the fairness constraints, while $m$ is a parameter that can be as large as $n$.
β€’ The practicality of Algorithm 1 depends on the value of $d$. We notice that $d \ge p$, which implies that when fairness is defined using more than 2-3 properties, the algorithm becomes impractical.

Greedy algorithm. To overcome the limitations discussed in Remark 1, we consider an alternative greedy approach. The algorithm is simple. Similarly to Algorithm 1, it keeps the items grouped into the sets $T_1, \dots, T_d$. At each step $j$, the algorithm greedily takes the item with the highest utility - let $\mathbf{v}^{(k_j)}$ be its type - and adds it to the ranking if this addition does not violate the constraints. Otherwise, it seeks the next item among the types $\{\mathbf{v}^{(1)}, \dots, \mathbf{v}^{(d)}\} \setminus \{\mathbf{v}^{(k_j)}\}$. If no feasible item can be found, the problem is declared infeasible. Since there are $m$ positions to fill and each item can be checked for feasibility in $O(p)$ time, the runtime of this algorithm is $O(m \cdot p)$. The algorithm is also described in Algorithm 2. Despite its simplicity, this algorithm provably finds an optimal solution in the following class of problems. Let us define

$$\Delta \;\coloneqq\; \max_{\mathbf{v} \in T} \|\mathbf{v}\|_0, \tag{6}$$

that is, $\Delta$ is the maximum number of properties each type of item can exhibit. Then the following holds.

Theorem 2 (Theorem 3.2 in [1]). Algorithm 2 finds an optimal solution to any OFRU problem as long as $\Delta = 1$. For those problems, if the algorithm returns infeasible, then the problem does not admit a solution. It runs in $O(m \cdot p)$ time. The pre-processing runs in $O(m \cdot d)$ time.

Remark 2 (On the complexity of Algorithm 2). We make the following comments:
β€’ Theorem 2 guarantees optimality of Algorithm 2 only when the problem is an instance of the OFRU problem.
This assumption is in contrast to Algorithm 1, which can instead solve the more general OFR problem. Nevertheless, OFRU models many natural settings, since fairness is usually expressed only by limiting the number of items in the unprotected classes (thus only specifying the matrix $\mathbf{F}^{(u)}$).
β€’ The assumption $\Delta = 1$ is satisfied in many practical settings, including the important case of mutually exclusive properties (e.g., male vs. female, White vs. Black vs. Asian, protected worker categories). For the sake of comparison, notice that in this setting $d = p$, and when $p$ is even moderately large (e.g., $d \in \{4, 5\}$) Algorithm 1 is impractical.
β€’ The runtime of this greedy method is exponentially (in $d$) smaller than that of Theorem 1.
β€’ When $\Delta \ge 3$, Theorem 10.2 in [1] establishes that it is NP-Hard to find a solution within a (multiplicative) factor larger than $\Delta / \log(\Delta)$ of the optimum.

Algorithm 2 Greedy Algorithm for OFR
Input: the sets $T_1, \dots, T_d$, the utility matrix $\mathbf{U}$, the constraints $\mathbf{F}^{(l)}, \mathbf{F}^{(u)}$, and the set of properties $\{1, \dots, p\}$.
  Initialize $U \leftarrow 0$; $s_1, \dots, s_d \leftarrow 0$; $r \leftarrow$ empty list
  for $j = 1, \dots, m$ do
    Sort $i^{(1)}_{s_1+1}, \dots, i^{(d)}_{s_d+1}$ by decreasing utility   (we denote by $\pi$ the sorted type indices)
    Flag $\leftarrow$ false
    for $\ell = \pi_1, \dots, \pi_d$ do
      if Feasible$(s_1, \dots, s_\ell + 1, \dots, s_d)$ then
        $s_\ell \leftarrow s_\ell + 1$;  $U \leftarrow U + U_{j\, i^{(\ell)}_{s_\ell}}$;  $r$.append$(i^{(\ell)}_{s_\ell})$;  Flag $\leftarrow$ true;  break
      end if
    end for
    if Flag = false then
      return Infeasible
    end if
  end for
Output: a fair ranking $r$ and its utility $U$.

Practical considerations. Both algorithms employ a pre-processing step that prunes the item pool to retain only the top $m \cdot d$ items. This step is justified by Equation (5). In practice, even when the utilities are obtained by applying a position-bias model to the absolute utilities, it is often not the case that the items are already sorted in a way that $\mathbf{U}$ satisfies Equation (5).
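A minimal Python sketch of this greedy procedure, under our own conventions (items within each $T_\ell$ are pre-sorted by decreasing absolute utility, and candidates are compared by absolute utility, which matches the uniform position-bias models of Equation (4)), is:

```python
def greedy_fair_ranking(T, u, types, F_l, F_u, m):
    """Greedy sketch in the spirit of Algorithm 2.

    T[l]  : items of type l, sorted by decreasing absolute utility
    u[i]  : absolute utility of item i
    types : binary property vectors v^(l); F_l, F_u as in Eq. (2)
    Returns the ranking as a list of items, or None if declared infeasible.
    """
    d, p = len(T), len(types[0])
    counts = [0] * d                     # items already taken from each type
    r = []
    for k in range(1, m + 1):            # position to fill
        # next unused item of each type, best absolute utility first
        cands = [(u[T[l][counts[l]]], l) for l in range(d)
                 if counts[l] < len(T[l])]
        cands.sort(reverse=True)
        placed = False
        for _, l in cands:
            counts[l] += 1
            if all(F_l[k - 1][q] <= sum(counts[h] * types[h][q]
                                        for h in range(d)) <= F_u[k - 1][q]
                   for q in range(p)):
                r.append(T[l][counts[l] - 1])
                placed = True
                break
            counts[l] -= 1               # undo: constraint violated
        if not placed:
            return None                  # declared infeasible
    return r
```

Feasibility is checked on the type counts exactly as in the dynamic programming approach; on OFRU instances with $\Delta = 1$, Theorem 2 guarantees the returned ranking is optimal.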
To overcome this limitation, it is necessary to sort the items by their absolute utilities, a step that requires $O(n \log n)$ time. Overall, the pre-processing runs in $O(n \log n + m \cdot d)$ time. In the case of Algorithm 2, this pre-processing time dominates the ranking time. Furthermore, both methods require storing the constraint matrices, which occupy $O(m \cdot p)$ space. In addition, Algorithm 1 also requires storing the array with the $U[s_1, \dots, s_d]$ values, which can require up to $O(m^d)$ space. This is a strong limitation: even with unlimited time, when $d$ is moderately large it is not possible to run this method. Finally, we notice that the fairness constraints are typically phrased in natural language. To work within this framework, it is necessary to translate them into the matrices $\mathbf{F}^{(l)}$ and $\mathbf{F}^{(u)}$. This is not straightforward in general, but we will show an example in Section 4.

Table 1: Dataset parameters.

Dataset         | n    | d  | Ξ”
----------------|------|----|---
Chess Ranking   | 3251 | 5  | 2
Chess Ranking S | 3251 | 4  | 2
Chess Ranking R | 3251 | 3  | 1
Occupation G    | 4494 | 2  | 1
Occupation W    | 4494 | 94 | 1

4. Numerical Experiments

In this section, we perform some experiments aimed at addressing the following points:
β€’ Exemplify some typical values that $d$ and $\Delta$ may take in real-world applications.
β€’ What is the actual runtime of Algorithm 1? Does it follow the theoretical complexity?
β€’ How does Algorithm 2 compare with Algorithm 1, even when the hypotheses of Theorem 2 are violated?

For all datasets, we consider the logarithmic position-bias model discussed in Section 2. For the sake of comparison, when setting the fairness constraints we always put $\mathbf{F}^{(l)} = 0$, thus restricting the problems to be instances of the OFRU problem. For $\mathbf{F}^{(u)}$ we consider the notion of fairness known as proportional representation, where the entry $(k, \ell)$ of the upper fairness constraint matrix is $k \, |P_\ell| / n$, where $P_\ell$ is the subset of items having property $\ell$.
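As an illustration, the proportional-representation matrix $\mathbf{F}^{(u)}$ can be built as follows (a sketch; since the constraint matrices are integer-valued, we round the fractional bound $k \, |P_\ell| / n$ up to the nearest integer, a choice of ours that the text leaves open):

```python
def proportional_representation(group_sizes, n, m):
    """Upper fairness constraint matrix F^(u) for proportional representation:
    entry (k, l) bounds the items with property l in the k-prefix by
    k * |P_l| / n, rounded up (the rounding choice is an assumption here)."""
    return [[(k * size + n - 1) // n      # integer ceiling of k * size / n
             for size in group_sizes]
            for k in range(1, m + 1)]
```

For example, with two groups of 2 items each in a pool of $n = 4$, every prefix of a length-2 ranking may contain at most one item per group.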
Notice that this constraint imposes that, in the first $k$ positions, each property cannot be represented by more items than in the overall population, fractionally. For each algorithm, we measure the runtime excluding the pre-processing, which is common to both methods. For Algorithm 2 we also report the competitive ratio of the computed ranking $\hat{\mathbf{r}}$; this is defined as the ratio between the utility of $\hat{\mathbf{r}}$ and the optimal value OPT. This ratio takes values in $[0, 1]$, and higher is better. Notice that, due to its optimality, for Algorithm 1 this ratio always evaluates to 1. In what follows, we refer to Algorithm 1 as DP and to Algorithm 2 as Greedy. All experiments are performed on an Intel i9 at 2.4 GHz with 8 cores and 16 GB of 2666 MHz DDR4 RAM.

Parameters. We consider the following real-world datasets. Chess Ranking [12] consists of 3251 entries, one per player. For each player, we have the absolute utility as given by the FIDE rating, the gender (male or female), and the race (Asian, Hispanic-Latin, White). Notice that in this dataset there are no women of Hispanic-Latin ethnicity, so that $d = 5$ and $\Delta = 2$. We derive two more datasets from Chess Ranking. We consider a simplified version, which we name Chess Ranking S(implified), where each player has a gender and is either White or non-White, leading to $d = 4$ and $\Delta = 2$. Instead, in Chess Ranking R(ace) we drop the gender attribute and keep the ethnicity, leading to $d = 3$ and $\Delta = 1$. Occupation [8] consists of 4494 images of workers. These data are the results of Google queries, each one reporting the top workers in a given professional category, among 94 options. The absolute utility is the position of a worker in the corresponding ranking. For each image, we also have the gender of the worker. We generate two datasets from Occupation.
In Occupation G(ender) we drop the working category and only consider the gender, leading to $d = 2$ types and $\Delta = 1$. In Occupation W(ork) we drop the gender and keep only the working category, leading to $d = 94$ and $\Delta = 1$. The relevant parameters of the datasets are reported in Table 1.

Figure 1: Runtime of Algorithm 1 across the considered datasets. [Four panels: (a) Chess Ranking, (b) Chess Ranking S, (c) Chess Ranking R, (d) Occupation G. Each panel plots time [sec] against the ranking length, comparing DP with the scaled leading term $m^5$, $m^4$, $m^3$, and $m^2$, respectively.]

Runtime. To evaluate the runtime of Algorithm 1, we vary the ranking length $m$ in $\{10, 20, 30, 40, 50\}$ and measure it on each dataset (excluding Occupation W). For comparison, we also report the leading term $m^d$ of the worst-case bound of Theorem 1. This latter term is multiplied by a suitably chosen small constant to make the quantities comparable. Results are shown in Figure 1. First, notice the qualitative differences among the plots: the actual runtime scales with a different law depending on the dataset. This is in line with the theory, as the leading term in the bound takes on the expressions $m^5$, $m^4$, $m^3$, $m^2$ respectively on the considered datasets. Second, we notice that on at least 3 out of 4 datasets, the behavior of DP closely follows the worst-case behavior predicted by the theory. In the case of Occupation G, instead, it performs significantly better.

Optimization. We now fix $m = 100$ and compare DP to Greedy. Results are shown in Table 2. First, notice that Greedy is always at least 2 orders of magnitude faster than DP.
In the case of Chess Ranking, it is more than 5 orders of magnitude faster. Second, even when $\Delta > 1$, Greedy features a competitive ratio very close to 1, meaning that it essentially finds an optimal solution. Third, in the case of Occupation W, Greedy finds an optimal solution in about 0.2 seconds, while it is not even possible to run DP due to the tremendous space required to store the table of $U[s_1, \dots, s_d]$ values: $100^{94}$ elements for this dataset.

Table 2: Experimental comparison between Algorithm 1 (DP) and Algorithm 2 (Greedy).

Dataset         | Algorithm | Runtime [sec] | Ratio
----------------|-----------|---------------|--------
Chess Ranking   | DP        | 6343.4        | 1
Chess Ranking   | Greedy    | 0.0287        | 0.9998
Chess Ranking S | DP        | 247.76        | 1
Chess Ranking S | Greedy    | 0.01593       | 0.99995
Chess Ranking R | DP        | 7.24895       | 1
Chess Ranking R | Greedy    | 0.01078       | 1
Occupation G    | DP        | 0.14752       | 1
Occupation G    | Greedy    | 0.00627       | 1
Occupation W    | Greedy    | 0.21901       | 1

5. Conclusion and Future Directions

Ranking is a foundational problem in algorithms, with numerous applications where fairness constraints must be accounted for. In this paper, we present and discuss an important optimization-based framework for ranking with fairness constraints. We provide, for the first time, an explicit description of state-of-the-art algorithms and evaluate them critically on real-world datasets. By considering the critical parameters controlling the complexity of the fair ranking problem, we shed light on the limitations of the exact dynamic programming method. While it appears to be feasible in some circumstances, its runtime explodes as soon as more than 3 properties are used to define fairness. On the other hand, the greedy approach seems to offer a viable alternative even when the hypothesis that ensures its optimality is violated. Both these methods appear to feature a level of maturity that may already suffice for certain real-world applications, such as candidate selection in job matching.
Future directions in this area may attempt to: design optimal linear-time algorithms for the case $\Delta = 2$; re-parameterize the problem and design novel algorithms with better dependencies on the parameters than Algorithm 1; identify additional assumptions under which the NP-hardness results can be escaped.

6. Acknowledgments

This paper was supported by the European Union's Horizon Europe research and innovation program under grant number 101070363 - AEQUITAS. Funded by the European Union. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

References

[1] L. E. Celis, D. Straszak, N. K. Vishnoi, Ranking with fairness constraints, in: I. Chatzigiannakis, C. Kaklamanis, D. Marx, D. Sannella (Eds.), 45th International Colloquium on Automata, Languages, and Programming, ICALP 2018, July 9-13, 2018, Prague, Czech Republic, volume 107 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum fΓΌr Informatik, 2018, pp. 28:1-28:15. URL: https://doi.org/10.4230/LIPIcs.ICALP.2018.28.
[2] M. Zehlike, K. Yang, J. Stoyanovich, Fairness in ranking, part I: score-based ranking, ACM Comput. Surv. 55 (2023) 118:1-118:36. URL: https://doi.org/10.1145/3533379.
[3] M. Zehlike, K. Yang, J. Stoyanovich, Fairness in ranking, part II: learning-to-rank and recommender systems, ACM Comput. Surv. 55 (2023) 117:1-117:41. URL: https://doi.org/10.1145/3533380.
[4] M. Zehlike, F. Bonchi, C. Castillo, S. Hajian, M. Megahed, R. Baeza-Yates, FA*IR: A fair top-k ranking algorithm, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 1569-1578.
[5] A. Singh, T. Joachims, Fairness of exposure in rankings, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 2219-2228.
[6] M. Zehlike, C. Castillo, Reducing disparate exposure in ranking: A learning to rank approach, in: Proceedings of The Web Conference 2020, 2020, pp. 2849-2855.
[7] M. Zehlike, P. Hacker, E. Wiedemann, Matching code and law: achieving algorithmic fairness with optimal transport, Data Mining and Knowledge Discovery 34 (2020) 163-200.
[8] L. E. Celis, V. Keswani, Implicit diversity in image summarization, Proceedings of the ACM on Human-Computer Interaction 4 (2020) 1-28.
[9] N. Boehmer, L. E. Celis, L. Huang, A. Mehrotra, N. K. Vishnoi, Subset selection based on multiple rankings in the presence of bias: Effectiveness of fairness constraints for multiwinner voting score functions, in: Proceedings of the 40th International Conference on Machine Learning, 2023, pp. 2641-2688.
[10] F. Grandoni, R. Ravi, M. Singh, R. Zenklusen, New approaches to multi-objective optimization, Mathematical Programming 146 (2014) 525-554.
[11] A. Srinivasan, Improved approximations of packing and covering problems, in: Proceedings of the Twenty-Seventh Annual ACM Symposium on Theory of Computing, 1995, pp. 268-276.
[12] A. Ghosh, R. Dutt, C. Wilson, When fair ranking meets uncertain inference, in: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021, pp. 1033-1043.