A Common Approach for Consumer and Provider Fairness in Recommendations

Dimitris Sacharidis, E-Commerce Research Division, TU Wien, Austria, dimitris@ec.tuwien.ac.at
Kyriakos Mouratidis, School of Information Systems, Singapore Management University, Singapore, kyriakos@smu.edu.sg
Dimitrios Kleftogiannis, Genome Institute of Singapore, A*STAR Research, Singapore, Dimitrios_Kleftogiannis@gis.a-star.edu.sg

ACM RecSys 2019 Late-breaking Results, 16th-20th September 2019, Copenhagen, Denmark. Copyright ©2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
We present a common approach for handling consumer and provider fairness in recommendations. Our solution requires defining two key components, a classification of items and a target distribution, which together define the case of perfect fairness. This formulation allows distinct fairness concepts to be specified in a common framework. We further propose a novel reranking algorithm that optimizes for a desired trade-off between the utility and the fairness of a recommendation list.

KEYWORDS
fairness; recommender systems

INTRODUCTION
Fairness in algorithmic decisions means that the system should not discriminate against individuals. For recommender systems [2], fairness may concern either the consumers that receive recommendations (C-fairness), as in [5, 8, 12], or the producers/providers of the items being recommended (P-fairness), as in [3, 6, 7, 10]. In C-fairness, the concern is that users could receive different recommendations based on their protected/sensitive attributes, e.g., age, gender, race. In P-fairness, the concern is that different classes of ranked items (such as male vs. female job applicants) could receive unequal attention, or attention disproportional to their overall size in the pool of alternatives.

In this paper, we present a simple unified viewpoint of these fairness concepts, where one only needs to specify two components: (1) an item classification, and (2) a target distribution over item classes. The exact definition of these components depends on the specific fairness goal one aims for:

• Consider the C-fairness concept where each user should receive calibrated recommendations that match her prior preferences, as in [10]; e.g., a user who interacts with items from a particular class 70% of the time should also receive recommendations from that class 70% of the time. Here, the item classification is explicitly given, e.g., movie genres in [10]. The target distribution is specific to each user and represents the proportion of each item class in the user's feedback history.
• In another C-fairness concept, the goal is to provide fair recommendations to a group of users, as in [5, 8]. Here, we introduce an item class per group member: an item belongs to the i-th class if it is within the top-N items of the i-th group member. The notion of top-N envy-freeness [8] is then captured by setting the target distribution to be uniform over these item classes. Intuitively, we seek to construct a group recommendation list that contains items equally desirable among group members, thus ensuring that no member is envious of the others.
• As an example of P-fairness, consider the scenario where we want to make recommendations that equally represent all item providers, as in [6]. Here, the item classes are the providers. The target distribution is again the uniform distribution over these item classes.
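As an illustration of how the two components could be instantiated for the three examples above, consider the following Python sketch. It is purely illustrative: the function names (target_from_history, uniform_target, group_member_classes) and data layouts are ours and are not taken from the cited methods.

```python
from collections import Counter

def target_from_history(history, item_classes):
    """Calibration-style target (first example): the proportion of each
    item class in the user's feedback history."""
    counts = Counter()
    for item in history:
        for cls, weight in item_classes[item].items():
            counts[cls] += weight
    total = sum(counts.values())
    return {cls: c / total for cls, c in counts.items()}

def uniform_target(classes):
    """Uniform target (second and third examples): equal representation of
    group members' classes, or of item providers."""
    return {cls: 1.0 / len(classes) for cls in classes}

def group_member_classes(top_n_per_member):
    """Group-recommendation classification (second example): item t belongs
    to class i if it appears in member i's individual top-N list; memberships
    are normalized so that each item's class weights sum to 1."""
    memberships = {}
    for member, top_n in top_n_per_member.items():
        for item in top_n:
            memberships.setdefault(item, {})[member] = 1.0
    return {item: {m: w / sum(ms.values()) for m, w in ms.items()}
            for item, ms in memberships.items()}
```

Here, item_classes maps each item to its class-membership distribution, e.g., {"movie_1": {"Comedy": 0.5, "Drama": 0.5}}.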
These examples illustrate the flexibility of our approach. Distinct notions of fairness can be described in terms of a simple framework, consisting of an item classification and a target distribution. Once the two components are defined, the fairness of the recommender is measured by the statistical divergence between the target and the realized distribution, the latter being the distribution over item classes induced by a recommendation list. The divergence essentially quantifies how far from being perfectly fair the recommendations are. As the goal of a fairness-aware recommender system is to strike a balance between utility and fairness, similar to prior work [3, 6, 10], we aim to maximize a linear combination of the two measures in a post-processing step. For this purpose, we introduce a novel reranking algorithm that approximately solves this optimization problem. Overall, our approach allows us to frame various fairness objectives under a common framework, making it possible to relate and compare them. Moreover, our unified algorithm enables the direct optimization of the desired utility-fairness trade-off. This is particularly attractive, as several existing fairness proposals optimize only a proxy of fairness.

APPROACH
Item Classification. We assume that items belong to a set K of appropriately defined classes; in the case of P-fairness, these are typically the item producers/providers, while in C-fairness, classes are specified according to consumer/user properties. We denote by p(κ|t) the membership of item t in class κ ∈ K. As an item may belong to multiple classes, we require Σ_{κ∈K} p(κ|t) = 1.

Target Distribution. A target distribution Q is a discrete probability distribution over K, representing the desired class representation a recommendation list should exhibit. For example, we may set Q to the uniform distribution, directing the recommender to cover all classes, or to the class distribution of the user's consumed items, asking the recommender to respect past preferences.

Realized Distribution. A recommended list τ presents items from possibly different classes to the target user. The realized distribution P(τ) of a list τ is a discrete probability distribution over the set of classes K, capturing the level at which each class is presented to the user via τ. There are various ways to concretely define P(τ). The simplest is to sum up the class memberships of the items in the list. Another approach is to account for the position bias in a ranked list and compute the exposure/attention [1, 9] that each class receives in list τ. Briefly, the intuition is that items ranked at the top of the list are more likely to be selected by the user. Specifically, if rank_τ(t) denotes the position of item t in τ and d(i) denotes the discount for the bias of position i, then a possible definition of the probability of class κ ∈ K being represented in list τ is:

    P(τ)[κ] = γ_τ · Σ_{t∈τ} d(rank_τ(t)) · p(κ|t),    (1)

where γ_τ = ( Σ_{κ∈K} Σ_{t∈τ} d(rank_τ(t)) · p(κ|t) )⁻¹ ensures that P(τ) is a probability distribution.

Fairness. A recommended list τ is perfectly fair if its realized distribution is equal to the target. As perfect fairness may be unattainable, we would like to quantify deviations from the ideal state. The fairness of a list τ is captured by the similarity of its representation P(τ) to some given target Q, and is denoted as F(τ; Q). When the target Q is understood from the context, we simply write F(τ). Dissimilarity between two distributions can be measured in terms of their statistical divergence.
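As a concrete illustration of Equation (1), the following sketch computes the realized distribution of a ranked list under a DCG-style discount d(i) = 1/log2(i + 1). The data layout (item ids and per-item class-membership dictionaries) is an assumption of the sketch, not a description of our experimental implementation.

```python
import math

def realized_distribution(ranked_items, item_classes):
    """Realized distribution P(tau) of Equation (1): position-discounted
    class exposure, normalized to a probability distribution.

    ranked_items: item ids, best first.
    item_classes: item -> {class: p(class | item)}, each summing to 1.
    """
    exposure = {}
    for pos, item in enumerate(ranked_items, start=1):
        discount = 1.0 / math.log2(pos + 1)  # d(rank_tau(t))
        for cls, weight in item_classes[item].items():
            exposure[cls] = exposure.get(cls, 0.0) + discount * weight
    total = sum(exposure.values())  # equals 1 / gamma_tau
    return {cls: value / total for cls, value in exposure.items()}
```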
Therefore, we express the fairness of τ by a value F(τ), normalized to [0, 1], as follows:

    F(τ) = 1 − D(P(τ) ∥ Q) / D⁺(· ∥ Q),    (2)

where D(· ∥ ·) is a divergence measure of the representation P(τ) from the target Q, and D⁺(· ∥ Q) is an upper bound on the divergence of any possible representation from Q. When the representation P(τ) equals the target distribution Q, fairness takes its maximum value F(τ) = 1. In this work, we consider the Kullback–Leibler (KL) divergence, as in [10, 11]:

    D(P ∥ Q) = Σ_{κ∈K} Q[κ] · log( Q[κ] / P̃[κ] ),

where P̃ = (1 − α) · P + α · Q is the smoothed distribution of the class representation for some small constant α (set to 0.01 to match [10]), and log(1/α) is an upper bound.

Utility. A list has a (predicted) utility to the target user, captured by a value U(τ), normalized to [0, 1]. As a concrete example, assume that the recommender estimates the relevance r(t) of an item t to the target user. Let rank_τ(t) denote the position of item t in list τ, and let d(i) denote the discount associated with the i-th position in a list, e.g., d(i) = 1/log(i + 1) as in the discounted cumulative gain (DCG) metric. Then a possible definition of utility is:

    U(τ) = (1/U⁺) · Σ_{t∈τ} r(t) · d(rank_τ(t)),    (3)

where U⁺ is the maximum possible utility of any ranked list of N items (similar to the ideal DCG).

Optimization Objective. The goal is to return a recommendation list τ that maximizes a linear combination of its utility U(τ) and its fairness F(τ), for a given value of λ ∈ [0, 1], i.e.,

    arg max_τ (1 − λ) · U(τ) + λ · F(τ).

Algorithm. The unified algorithm, termed PREFIX, reranks a recommendation list τ_N of length N that maximizes utility alone. Let T denote the set of items to be reranked. The goal is to create a fairness-aware reranked list that approximately solves the aforementioned optimization problem. For any list τ, let S(τ) = (1 − λ) · U(τ) + λ · F(τ) denote its score. The algorithm examines each prefix τ_k of the initial list, i.e., a list that contains the first k items of τ_N, for k ∈ [0, N]. For each prefix τ_k, the algorithm incrementally appends the remaining N − k items in a greedy manner. Let τ_k(i) denote the list at the i-th step; the final reranked list created from the k-prefix is obtained at the (N − k)-th step. At the i-th step, the algorithm appends the item that maximizes the score of the resulting list, i.e., arg max_{t ∈ T∖τ_k(i−1)} S(τ_k(i−1) ⊔ {t}), where ⊔ denotes the append operation. After considering all prefixes, the algorithm has constructed N, possibly distinct, rerankings of the initial list τ_N, and among them it returns the one with the highest score.
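The following sketch puts Equations (2) and (3) and the PREFIX-style reranking together. It reuses the realized_distribution helper sketched earlier; the function names and data structures are illustrative assumptions rather than a description of our experimental code.

```python
import math

def kl_fairness(P, Q, alpha=0.01):
    """Fairness of Equation (2): 1 - D(P || Q) / log(1/alpha),
    with P smoothed toward Q as in the text."""
    div = 0.0
    for cls, q in Q.items():
        if q <= 0.0:
            continue
        p_smooth = (1.0 - alpha) * P.get(cls, 0.0) + alpha * q
        div += q * math.log(q / p_smooth)
    return 1.0 - div / math.log(1.0 / alpha)

def dcg_utility(ranked_items, relevance):
    """Utility of Equation (3): discounted predicted relevance,
    normalized by the ideal ranking of the same length (U+)."""
    dcg = sum(relevance[t] / math.log2(pos + 1)
              for pos, t in enumerate(ranked_items, start=1))
    best = sorted(relevance.values(), reverse=True)[:len(ranked_items)]
    idcg = sum(r / math.log2(pos + 1) for pos, r in enumerate(best, start=1))
    return dcg / idcg if idcg > 0 else 0.0

def prefix_rerank(top_n, relevance, item_classes, Q, lam):
    """PREFIX-style reranking: keep each prefix of the utility-ranked list,
    complete it greedily by the combined score S, and return the best list."""
    def score(items):
        P = realized_distribution(items, item_classes)
        return (1.0 - lam) * dcg_utility(items, relevance) + lam * kl_fairness(P, Q)

    best_list, best_score = list(top_n), -math.inf
    for k in range(len(top_n) + 1):
        current, remaining = list(top_n[:k]), list(top_n[k:])
        while remaining:
            # append the item that maximizes the score of the resulting list
            nxt = max(remaining, key=lambda t: score(current + [t]))
            current.append(nxt)
            remaining.remove(nxt)
        s = score(current)
        if s > best_score:
            best_list, best_score = current, s
    return best_list
```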
EVALUATION
Data. We use the MovieLens 20M dataset [4]. We create an item class for each movie genre, and for each movie we assign equal membership probabilities to its classes, as in [10]. As is typical, we use a basic matrix factorization model with 20 dimensions to create recommendations.

Fairness Objectives and Methods. We consider two fairness scenarios. Uniform mandates that the realized distribution is uniform, similar to the P-fairness goal of the FAR method [6]. User requires that the realized distribution follows each user's target distribution derived from her feedback history, as in the C-fairness goal of the CALIB method [10].

Metrics. We ask each method to return a ranked list of 20 items. For each list, we measure: U, the utility according to Equation 3; R, recall at position 20 (recall@20); N, normalized DCG at position 20 (nDCG@20); F, fairness according to Equation 2; C, the class coverage ratio for Uniform, as in [6]; and D, the KL divergence for User, as in [10]. For all measures except the last, higher is better.

Table 1: Some utility-fairness trade-offs

Uniform Fairness Scenario
Method   λ     U      R      N      F      C
FAR      0     1.000  0.110  0.110  0.671  0.675
FAR      0.5   0.990  0.091  0.087  0.837  0.883
FAR      1     0.979  0.091  0.083  0.852  0.898
CALIB    0     1.000  0.110  0.110  0.671  0.675
CALIB    0.5   0.933  0.089  0.072  0.673  0.680
CALIB    1     0.926  0.090  0.075  0.674  0.683
PREFIX   0     1.000  0.110  0.110  0.671  0.675
PREFIX   0.5   0.980  0.047  0.073  0.891  0.898
PREFIX   1     0.929  0.047  0.061  0.904  0.898

User Fairness Scenario
Method   λ     U      R      N      F      D
FAR      0     1.000  0.110  0.110  0.799  1.335
FAR      0.5   0.988  0.111  0.109  0.888  0.746
FAR      1     0.966  0.113  0.082  0.901  0.657
CALIB    0     1.000  0.110  0.110  0.799  1.335
CALIB    0.5   0.933  0.089  0.072  0.968  0.214
CALIB    1     0.926  0.090  0.075  0.968  0.213
PREFIX   0     1.000  0.110  0.110  0.799  1.335
PREFIX   0.5   0.982  0.092  0.100  0.957  0.283
PREFIX   1     0.939  0.093  0.080  0.968  0.210

Results. All three methods use a parameter λ to control the trade-off between utility and fairness. Table 1 shows all metrics as we increase λ from 0 to 0.5 and 1. For both scenarios and λ = 0, all methods return the same list. For Uniform and λ = 1, CALIB performs poorly, while PREFIX matches the effectiveness of FAR in terms of coverage (C) and does better in fairness (F), achieving more class-balanced recommendations. For User and λ = 1, FAR performs poorly, while PREFIX exceeds the effectiveness of CALIB in terms of both fairness (F) and KL divergence (D).

We next set various values of λ within [0, 1] and plot the utility-fairness curve of each method, shown in Figure 1. A recommender whose curve goes closer to the upper-right corner (of perfect utility and fairness) is more desirable, as it better trades off utility against fairness. In the Uniform scenario, it is clear that PREFIX is more effective than FAR and CALIB. In the User scenario, where the highest possible fairness values for PREFIX and CALIB are comparable (rightmost points), the curve of PREFIX again goes closer to the perfect utility-fairness corner.

[Figure 1: Utility-fairness curves (utility vs. fairness) for the (a) Uniform and (b) User scenarios, one curve per method (FAR, CALIB, PREFIX).]
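The utility-fairness curves can be traced by sweeping λ and scoring each reranked list; a minimal sketch, assuming the prefix_rerank, realized_distribution, kl_fairness, and dcg_utility helpers sketched earlier:

```python
def tradeoff_curve(top_n, relevance, item_classes, Q,
                   lambdas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return (fairness, utility) points of the reranked lists, one per
    lambda value, i.e., the coordinates plotted in Figure 1."""
    points = []
    for lam in lambdas:
        reranked = prefix_rerank(top_n, relevance, item_classes, Q, lam)
        P = realized_distribution(reranked, item_classes)
        points.append((kl_fairness(P, Q), dcg_utility(reranked, relevance)))
    return points
```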
REFERENCES
[1] Asia J. Biega, Krishna P. Gummadi, and Gerhard Weikum. 2018. Equity of Attention: Amortizing Individual Fairness in Rankings. In ACM SIGIR. ACM, 405–414. https://doi.org/10.1145/3209978.3210063
[2] Robin Burke. 2017. Multisided Fairness for Recommendation. CoRR abs/1707.00093 (2017). arXiv:1707.00093
[3] Robin Burke, Nasim Sonboli, and Aldo Ordonez-Gauger. 2018. Balanced Neighborhoods for Multi-sided Fairness in Recommendation. In FAT (Proceedings of Machine Learning Research), Vol. 81. PMLR, 202–214.
[4] F. Maxwell Harper and Joseph A. Konstan. 2016. The MovieLens Datasets: History and Context. TiiS 5, 4 (2016), 19:1–19:19. https://doi.org/10.1145/2827872
[5] Xiao Lin, Min Zhang, Yongfeng Zhang, Zhaoquan Gu, Yiqun Liu, and Shaoping Ma. 2017. Fairness-Aware Group Recommendation with Pareto-Efficiency. In ACM RecSys. ACM, 107–115. https://doi.org/10.1145/3109859.3109887
[6] Weiwen Liu and Robin Burke. 2018. Personalizing Fairness-aware Re-ranking. CoRR abs/1809.02921 (2018). arXiv:1809.02921
[7] Tien T. Nguyen, Pik-Mai Hui, F. Maxwell Harper, Loren G. Terveen, and Joseph A. Konstan. 2014. Exploring the filter bubble: the effect of using recommender systems on content diversity. In WWW. ACM. https://doi.org/10.1145/2566486.2568012
[8] Dimitris Serbos, Shuyao Qi, Nikos Mamoulis, Evaggelia Pitoura, and Panayiotis Tsaparas. 2017. Fairness in Package-to-Group Recommendations. In WWW. ACM, 371–379. https://doi.org/10.1145/3038912.3052612
[9] Ashudeep Singh and Thorsten Joachims. 2018. Fairness of Exposure in Rankings. In ACM KDD. ACM, 2219–2228. https://doi.org/10.1145/3219819.3220088
[10] Harald Steck. 2018. Calibrated recommendations. In ACM RecSys. ACM, 154–162. https://doi.org/10.1145/3240323.3240372
[11] Ke Yang and Julia Stoyanovich. 2017. Measuring Fairness in Ranked Outputs. In SSDBM. https://doi.org/10.1145/3085504.3085526
[12] Sirui Yao and Bert Huang. 2017. Beyond Parity: Fairness Objectives for Collaborative Filtering. In NIPS. 2925–2934.