Examining the Impact of Multi-Objective Recommender Systems on Providers Bias

Reza Shafiloo¹,*, Kostas Stefanidis¹
¹ Tampere University, Finland

Abstract
Recommender systems are designed to help customers in finding their personalized content. However, biases in recommender systems can potentially exacerbate over time. Multi-objective recommender system (MORS) algorithms aim to alleviate bias while maintaining the accuracy of recommendation lists. While these algorithms effectively address item-side fairness, provider-side fairness often remains neglected. This study investigates the impact of MORS algorithms, leveraging evolutionary techniques to mitigate popularity bias on the item side, on providers' fairness. Our findings reveal that baseline algorithms can adversely affect providers' fairness. Moreover, it is demonstrated that evolutionary algorithms, specifically those introducing less popular items to the initial population of their algorithms, exhibit superior performance compared to other MORS algorithms in enhancing providers' fairness. This research sheds light on the crucial role MORS algorithms, particularly those employing evolutionary approaches, can play in mitigating bias and promoting fairness for both users and providers in recommender systems.

Keywords
Recommender systems, Item-side fairness, Provider-side fairness

Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy.
* Corresponding author.
shafiloo.reza@tuni.fi (R. Shafiloo); konstantinos.stefanidis@tuni.fi (K. Stefanidis)
https://www.tuni.fi/fi/konstantinos-stefanidis (K. Stefanidis)
ORCID: 0000-0003-1317-8062 (K. Stefanidis)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

These days, with the increasing amount of information on the web, content providers need systems to personalize content for end-users. As a result, users can efficiently access their favorite content, leading to user satisfaction [1]. Recommender systems (RS) provide personalized content for users based on their historical interactions with systems, such as ratings or clicks on items. Despite being a crucial and valuable tool for users, RS has been identified as amplifying various biases. These biases can significantly impact the outcomes of RS, particularly concerning factors such as gender, age, race, and other characteristics. One such bias is popularity bias, where certain items typically receive a substantial number of ratings, leading to them being recommended more frequently than others.

Fairness-aware recommender systems aim to address algorithmic bias in various ways, ensuring the system's recommendations are unbiased [2]. Fairness-aware recommender systems can take into account various attributes to offer equitable recommendations. The concept involves evaluating how a recommender system treats or affects individuals or groups based on the values of specific attributes. Methods for ensuring fairness in RS can be categorized into pre-processing, which involves modifying input data [3]; in-processing, which constrains learning algorithms for fair recommendations [4]; and post-processing, which modifies the output of the baseline algorithm [5].

In RS, various stakeholders play crucial roles, with two primary groups being consumers of items and providers of items [6]. However, numerous fairness-aware RS focus on addressing consumer- or producer-sided fairness, often neglecting comprehensive, all-sided multi-stakeholder fairness. While numerous studies concentrate on one-sided fairness in RS, it is essential to explore how addressing fairness for one group might impact the fairness of other stakeholders. Using Multi-objective Recommender Systems (MORS) as a post-processing approach offers a potential solution for achieving fairness in RS outputs [7]. Some existing MORS specifically address fairness for the item side. These approaches aim to maintain the accuracy of RS for consumer satisfaction while also creating opportunities for recommending less popular items, thereby mitigating popularity bias [8, 9, 10]. While preserving accuracy and enhancing fairness among items is valuable, it is crucial to investigate fairness among providers of items.

In this study, our objective is to investigate the behavior of MORS algorithms in mitigating item popularity bias and its impact on providers' fairness. While existing research has shown the trade-off between mitigating popularity bias and maintaining recommendation accuracy on the item side, it is crucial to delve deeper into how the objectives of existing work can influence providers' fairness. Prior research has yet to be conducted in this area, and our study aims to address this gap [2].

Furthermore, we aim to explore which specific objectives may have a trade-off with providers' fairness, to provide a more comprehensive understanding of the issue. To achieve this, we have chosen MORS algorithms that employ evolutionary algorithms to solve a multi-objective optimization. While evolutionary algorithms may not be the swiftest, their superiority in addressing multi-objective problems arises from their capability to tackle complex and non-linear optimization problems [8].

Our work shows that MORS algorithms perform better in ensuring providers' fairness than baseline algorithms. MORS algorithms enable providers to showcase their items more effectively than baseline algorithms. Although there is no significant difference among all MORS algorithms in covering providers' fairness, those algorithms that add less popular items to the initial population of their evolutionary algorithms show better performance than other MORS algorithms.

The remainder of this paper is structured as follows. Section 2 reviews related work on fairness in recommender systems. Section 3 describes the algorithms we use in our study and the measures we utilize to compare the algorithms' performance. Section 4 presents the results of our framework on the MovieLens and IMDB datasets. Section 5 concludes this work.

2. Related Work

The fundamental RS aims to forecast ratings for unknown items among users, employing diverse algorithms for this task. Approaches like user-based and item-based collaborative filtering algorithms, as explored by Adomavicius et al. [11] and Yue et al. [12], entail the identification of similar users or items to predict item ratings. CF algorithms can be used in many post-processing algorithms as baseline algorithms, from neural networks [13] to multi-objective evolutionary algorithms [9, 8, 10].

The search for an optimal balance between accuracy and bias mitigation has garnered significant attention in RS. Malekzadeh and Kaedi propose a strategy that simultaneously personalizes recommended items to maintain accuracy, as discussed in their work [8].
Similarly, Wang et al. [10] address the long-tail problem by employing multi-objective evolutionary optimization algorithms, focusing on improving recommendation list accuracy and reducing the dominance of popular items. Shafiloo et al. [9] present a framework to alleviate popularity bias in recommender systems by incorporating users' dynamic preferences. Cai et al. [14] proposed a framework based on multi-objective algorithms designed to concurrently optimize accuracy, diversity, and coverage within recommendation lists. Utilizing multi-objective algorithms reflects their commitment to addressing multiple dimensions of recommendation quality, aiming to enhance the overall user experience. Jain et al. introduced a novel similarity metric tailored for baseline algorithms [15]. Their approach involved modifying fundamental functions of genetic algorithms, precisely the crossover operation, to effectively manage the trade-off between accuracy and diversity of recommended items. Pang et al. introduced a framework based on genetic algorithms, where accuracy and coverage serve as objective functions [16]. This approach is designed to tackle popularity bias in recommendation lists, emphasizing a dual focus on improving accuracy and coverage for a more comprehensive and unbiased recommendation system.

Fairness-aware recommender systems try to tackle the algorithmic bias issue in different ways and ensure that the recommendations made by the system are unbiased [17]. However, many approaches consider tackling only one-sided fairness issues but abandon all-sided multi-stakeholder fairness [18]. In the realm of multi-stakeholder recommender systems (MS-RS), where numerous users participate in the recommendation process from multiple perspectives, as noted by Cornacchia et al. [19], there should be studies on how item-side fairness can affect other sides of fairness.

3. Methods

In this section, our initial focus is to introduce the algorithms employed in our study. Subsequently, we will delve into the evaluation metrics utilized for comparing results. Our objective is to comprehensively explore the impact of item bias mitigation on producer-side fairness and understand how it influences the outcomes of the recommendation systems.

3.1. Baseline algorithms

We have selected two baseline algorithms, item-based and user-based collaborative filtering, where no post-processing has been applied to the output. These algorithms serve as our baseline models for evaluating bias mitigation strategies and their impact on the producers' side in subsequent analyses.

For computing the similarity between two users, we have:

sim(u, v) = \frac{\sum_{k \in I_u \cap I_v} (r_{uk} - \mu_u)(r_{vk} - \mu_v)}{\sqrt{\sum_{k \in I_u \cap I_v} (r_{uk} - \mu_u)^2} \cdot \sqrt{\sum_{k \in I_u \cap I_v} (r_{vk} - \mu_v)^2}}    (1)

Equation 1 defines the similarity measure between two users, u and v, calculated based on the items they have both rated. Here, I_u represents the subset of items rated by user u, r_{uk} denotes the rating given by user u to item k, and \mu_u is the average rating provided by user u.

\hat{r}_{uj} = \mu_u + \frac{\sum_{v \in P_u(j)} sim(u, v)(r_{vj} - \mu_v)}{\sum_{v \in P_u(j)} |sim(u, v)|}    (2)

Equation 2 outlines the predicted rating \hat{r}_{uj} of user u for item j. It incorporates the average rating \mu_u and calculates the predicted rating by considering the similarity between user u and other users v who have rated the same item j. The set P_u(j) represents the group of nearest users to u who have provided ratings for item j. Item-based collaborative filtering is similar to user-based collaborative filtering.

3.2. Multi-objective algorithms

In this section, we introduce algorithms that leverage the outputs of baseline algorithms, implementing reranking strategies to achieve specific objectives. Each algorithm is characterized by an objective function to mitigate item popularity bias.

Malekzadeh and Kaedi [8] employ the simulated annealing algorithm to address the long-tail problem in recommender systems. Their approach begins with applying a collaborative filtering algorithm to generate initial recommendation lists. Subsequently, an evolutionary algorithm is employed to optimize the combination of items in these lists, focusing on satisfying three defined objective functions. These functions encompass considerations for personalized diversification, accuracy, and increased participation of long-tail items, aiming to enhance recommendations' overall quality. The objective functions are:

1. Diversity: The Shannon entropy is used for diversity, where the entropy H_a(u) for attribute a of user u is defined using the formula:

H_a(u) = -\sum_{i=1}^{k} p_i \cdot \log_k p_i    (3)

In this Equation, H_a(u) is the entropy for attribute a of user u, and k is the number of possible values for attribute a. p_i represents the ratio of the number of ratings given by user u to items with attribute a having the value i, divided by the total number of the user's ratings. Essentially, this formula calculates the entropy of the distribution of ratings given by a user u across different values of attribute a.

The attribute-based diversity measurement in this study is determined using an equation to assess a recommendation list's diversity. The formula is expressed as:

Diversity_a(c_1, \ldots, c_n) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \left(1 - similarity_a(c_i, c_j)\right)    (4)

In this context, n signifies the number of items within the recommendation list, and c_1, \ldots, c_n represents the items recommended. The term similarity_a(c_i, c_j) denotes the measure of similarity between two items c_i and c_j based on the attribute a.
Equation 3 illustrates the ideal diversity for a specific user, capturing an optimal scenario. Subsequently, the deviation between this ideal diversity and the actual diversity computed from Equation 4 for a recommendation list is measured. The disparity for each item attribute is quantified through Equation 5:

Personalized\;Diversity = |H_a - Diversity_a|    (5)

In this expression, Diversity_a denotes the diversity of the recommendation list based on attribute a, while H_a signifies the entropy of user preferences related to attribute a. This metric, termed Personalized Diversity, serves to quantify the difference between the ideal and actual diversity in the recommendation list for a given user, explicitly considering the preferences associated with a particular attribute.

2. The participation of long-tail items: The long tail metric is computed using the formula:

Long\;Tail = \sum_{item=1}^{k} Popularity(item)    (6)

In this Equation, k signifies the size of the recommendation list, representing the total number of items included in the recommendation. A lower value obtained from this calculation indicates a higher likelihood of incorporating less popular items in the recommendation list. This suggests a greater emphasis on the inclusion of long-tail items in the recommendations, reflecting a preference for diversity and coverage beyond just popular items.

3. Accuracy: The Accuracy metric is evaluated using the following Equation:

Accuracy = \frac{1}{\sum_{item=1}^{k} PredictedRate(item)}    (7)

In this Equation, PredictedRate(item) denotes the predicted rating assigned to the item. The formula computes the inverse of the sum of the predicted ratings for all recommended items, offering a metric to assess the accuracy of the recommendation system. A lower value in the Accuracy metric suggests a higher overall accuracy in the predicted ratings for the recommended items.

Wang et al. [10] address the long-tail problem by defining two objective functions. The first function assesses the accuracy of recommendation lists, while the second aims to reduce the dominance of popular items. The objectives can be expressed as follows:

1. Accuracy: The primary objective function for assessing accuracy, labeled F1, is formulated as:

F1 = \sum_{i=1}^{k} \hat{r}_{u,i}    (8)

In this expression, k denotes the length of the recommendation list. A higher F1 value indicates higher predicted ratings for the items within the list.

2. Long-tail recommendation: Items with higher ratings might be prioritized higher on the ranking list for all users, and popular items often receive similar ratings, resulting in low variance. To measure unpopularity in terms of the mean and variance of item ratings, Tamas et al. proposed a value for an item i:

m_i = \frac{1}{\mu_i (\sigma_i + 1)^2}    (9)

Here, \mu_i and \sigma_i represent the mean and variance of ratings for item i across all users. To prevent division by zero, a value of one is added to the variance. The reciprocal of this mean-variance combination yields the value m_i, where a smaller value indicates a more popular item. Motivated by this concept, an objective function F2 is introduced to calculate the unpopularity of the recommendation result:

F2 = \sum_{i=1}^{k} \frac{1}{\mu_i (\sigma_i + 1)^2}    (10)

This function quantifies the unpopularity of the recommended items, with lower values indicating more popular items in the list.

They employ a genetic algorithm to achieve these objectives, seeking optimal combinations of items within recommendation lists that satisfy the defined criteria. This approach aims to enhance accuracy and mitigate popularity bias for more balanced and practical recommendations.

Shafiloo et al. [9] introduced a framework to alleviate popularity bias in recommender systems by incorporating users' dynamic preferences. Their approach employs a memetic algorithm, creating opportunities to include unpopular items in recommendation lists. They define two objective functions within their framework, aiming to simultaneously preserve accuracy and mitigate popularity bias. This solution focuses on providing more diverse and unbiased recommendations, catering to the dynamic preferences of users. The objectives to be achieved are:

1. Accuracy: In their research, they employ accuracy as expressed in Equation 7.
2. Long-tail participation: They utilize long-tail participation as described in Equation 6.
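The objectives above are simple aggregations over a candidate list, so an evolutionary re-ranker can evaluate them cheaply for every candidate. A minimal sketch of Equations 6, 7, and 10 follows; the function names and input formats are illustrative assumptions, not the authors' code.

```python
def accuracy_objective(predicted_rates):
    """Equation 7: inverse of the summed predicted ratings of the list.
    Lower values correspond to higher predicted ratings overall."""
    return 1.0 / sum(predicted_rates)

def long_tail_objective(popularities):
    """Equation 6: summed popularity of the recommended items.
    Lower values indicate stronger long-tail participation."""
    return sum(popularities)

def unpopularity_objective(mean_var_pairs):
    """Wang et al.'s F2 (Equation 10): sum of 1 / (mu_i * (sigma_i + 1)^2)
    over the list; adding 1 to the variance avoids division by zero."""
    return sum(1.0 / (mu * (sigma + 1) ** 2) for mu, sigma in mean_var_pairs)
```

A multi-objective optimizer would score each candidate list with these functions and keep the non-dominated combinations.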
Additionally, in their research, they modified the memetic algorithm. Instead of randomly adding items to the initial population, as is common in other genetic algorithms, they introduced a higher possibility of including items from the long tail and a lower possibility of including popular items in the initial population.

4. Experimental Evaluation

This section presents the datasets employed for evaluating the proposed method. Subsequently, we outline the evaluation criteria and comprehensively present the comparison results.

4.1. Dataset

In our experimental evaluation, we use 2 real-world datasets, namely MovieLens and IMDB. The MovieLens dataset is a commonly employed dataset for evaluating methods addressing long-tail problems in various studies. Specifically, we utilize the MovieLens 1M dataset that features 6040 users and 1 million ratings for 3883 items. The IMDB dataset is also employed to enhance information about movie providers, and director information is extracted. In this study, movie directors are considered providers, and the dataset includes information on 2208 movie directors.

4.2. Evaluation metric

The study evaluates methods addressing the long-tail problem using three criteria for comparison. The first criterion is accuracy, measured through the precision metric defined as:

Precision = \frac{N_{rs}}{N_s}    (11)

Here, N_s represents the total number of items recommended to the user, and N_{rs} denotes the relevant items suggested to the user. Relevant items are those with ratings higher than the user's average ratings, as outlined by Wang et al. [10].

The second criterion, aggregate diversity (AG) (Equation 12), counts the number of distinct items offered to users, particularly focusing on long-tail items in recommendation lists [8]:

Aggregate\;diversity = \left| \bigcup_{u \in U} L_n(u) \right|    (12)

Equation 12 introduces the aggregate diversity criterion, where u represents a specific user from the set of users U, and L_n(u) is the list of items recommended to the user u. Equation 12 is normalized by the number of items. This equation is used to measure popularity bias on the item and provider sides.

The third criterion is Novelty, calculated as:

Novelty = \frac{1}{\sum_{all\;recommended\;items} Popularity(item)}    (13)

This Equation indicates that the novelty of the recommendation list decreases as the popularity of items increases, emphasizing a preference for less popular items. The study employs these criteria to compare and evaluate the results of different methods addressing the long-tail problem in recommender systems.

Equation 14 introduces a measurement for intra-user diversity proposed by Zou et al. [20]. This measurement, denoted as D_u(k), is defined for a specific user u and is calculated as follows:

D_u(k) = \frac{1}{k(k-1)} \sum_{p \neq q} Sim(i_p, i_q)    (14)

Here, k represents the length of the recommendation lists for user u, and Sim(i_p, i_q) calculates the similarity between two items i_p and i_q based on the similarity metric defined in Equation 1. The purpose of D_u(k) is to quantify the similarity of items within user u's recommendation list. The intra-user diversity for all users is then defined as:

D_{all\;users}(k) = \frac{1}{m} \sum_{u \in U} D_u(k)    (15)

Here, m denotes the number of users in the set U. This Equation provides a measure of intra-user diversity considering all users in the study.

Equation 16 introduces the Normalized Discounted Cumulative Gain (NDCG) measurement, a widely used metric for evaluating the quality of recommendations. This measurement is defined as:

NDCG@k(u) = \frac{DCG@k(u)}{IDCG@k(u)}    (16)

Here, IDCG@k(u) represents the ideal DCG@k(u) for user u, where the ideal scenario assumes that all relevant items in the user's recommendation list appear at the top ranks, resulting in the maximum possible DCG@k(u). The discounted cumulative gain at position k for user u, denoted as DCG@k(u), is calculated using the formula:

DCG@k(u) = \sum_{i=1}^{k} \frac{rel(i)}{\log_2(i + 1)}    (17)

In this Equation, rel(i) is an indicator function that determines if item i is relevant to user u. A value of 1 indicates that item i is relevant, while a 0 indicates that item i is irrelevant. NDCG provides a normalized measure of the effectiveness of a recommendation list by considering both relevance and the position of items within the list.
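For concreteness, the precision and NDCG@k measures can be sketched as follows, with binary relevance and hypothetical helper names; this is not the evaluation code used in the paper.

```python
import math

def precision(recommended, relevant):
    """Equation 11: N_rs / N_s, the share of recommended items that are
    relevant to the user."""
    return len(set(recommended) & set(relevant)) / len(recommended)

def ndcg_at_k(recommended, relevant, k):
    """Equations 16-17 with binary relevance rel(i)."""
    def dcg(rels):
        # position i (1-based) is discounted by log2(i + 1)
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    rels = [1 if item in relevant else 0 for item in recommended]
    ideal = sorted(rels, reverse=True)  # all relevant items ranked first
    return dcg(rels) / dcg(ideal) if dcg(ideal) > 0 else 0.0
```

With the relevant items already at the top of the list, NDCG@k is exactly 1.0; pushing a relevant item down the ranking lowers it.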
4.3. Results and discussion

In this section, the study compares and analyzes the results obtained from various methods using the criteria introduced in the section above. For comparison, we use a real-life scenario where the length of the recommendation lists of all algorithms is set to 10.

The results in Table 1 indicate that MORS algorithms outperform baseline algorithms in addressing the long-tail problem. These algorithms demonstrate superior performance in diversifying items in recommendation lists, effectively increasing the participation of unpopular items. Notably, the study highlights that MORS algorithms achieve this diversification without compromising the accuracy of the recommendation lists. Therefore, the MORS algorithms are successful in preserving accuracy while simultaneously enhancing the inclusion of less popular items in the recommendations, addressing the long-tail problem in recommender systems.

Table 1
Comparison of results: evaluation measures and outcomes for the various algorithms. Accuracy measures include Precision and NDCG; diversity and fairness measures include Novelty, AG-Items, Diversity, and AG-Providers.

Algorithm            | Precision | Novelty  | AG-Items | Diversity | NDCG   | AG-Providers
CF-Item              | 0.7163    | 3.23E-05 | 0.5318   | 0.3873    | 0.5906 | 0.5067
CF-Users             | 0.6622    | 3.26E-05 | 0.5449   | 0.4051    | 0.7938 | 0.5230
Malekzadeh [8] (MORS)| 0.8338    | 3.72E-05 | 0.6651   | 0.6221    | 0.9381 | 0.5711
Wang [10] (MORS)     | 0.6968    | 3.61E-05 | 0.6422   | 0.6200    | 0.9173 | 0.5697
Shafiloo [9] (MORS)  | 0.7989    | 4.02E-05 | 0.6930   | 0.7396    | 0.9599 | 0.6059

The comparison table suggests that while MORS algorithms effectively mitigate popularity bias in recommendation lists, there is not a significant difference in the diversity of providers between baseline algorithms and MORS algorithms. For example, CF-Users has a value of 0.5230 in AG-Providers, while Malekzadeh and Wang show 0.5711 and 0.5697, respectively. Although MORS algorithms, aided by item-diversifying objectives, offer providers a better chance to present their items, the disparity in aggregate diversity is more noticeable on the item side than on the provider side when comparing MORS algorithms with baseline algorithms. Moreover, the comparison indicates that the baseline algorithm with the higher accuracy exhibits the poorer performance in aggregate provider diversity, suggesting that recommendation-list accuracy can negatively impact provider-side fairness. Specifically, CF-Item achieves a precision of 0.7163, whereas CF-Users attains 0.6622; however, their AG-Providers values are 0.5067 and 0.5230, respectively.

Also, Table 1 indicates that among MORS algorithms, Malekzadeh's work outperforms Shafiloo's and Wang's in terms of the precision metric. However, this superiority adversely impacts aggregate diversity on both the provider and item sides. Specifically, Shafiloo's work exhibits a precision of 0.7989, with aggregate provider diversity at 0.6059 and aggregate item diversity at 0.6930. In contrast, Malekzadeh's work achieves a precision of 0.8338, but the aggregate provider diversity decreases to 0.5711, and the aggregate item diversity is 0.6651.
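The AG-Providers comparison can be examined at a finer grain by bucketing providers according to how many of their distinct items surface across all recommendation lists. The following is a hypothetical sketch of such a bucketing, not the authors' code; the input formats are our own assumptions.

```python
from collections import Counter

def provider_buckets(rec_lists, item_provider):
    """Assign each provider to a bucket equal to the number of its distinct
    items appearing anywhere in the recommendation lists; returns a mapping
    bucket -> number of providers in that bucket."""
    shown = {item for rec in rec_lists for item in rec}
    items_per_provider = Counter(item_provider[i] for i in shown)
    return Counter(items_per_provider.values())
```

Plotting these counts per algorithm reproduces a Figure-1-style view: a heavy first bucket means many providers get only a single item exposed.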
Furthermore, in Figure 1, we present the provider frequency using a bucketing technique. Specifically, in this figure, providers are assigned to a bucket based on the number of items belonging to that specific provider that are represented in all recommendation lists generated by the algorithm. For instance, a provider is placed in bucket one if only one item from all items associated with that provider is present in all recommendation lists.

This figure shows that baseline algorithms exhibit a weakness in recommending items from providers who lack popularity. This is illustrated in the initial buckets, where baseline algorithms struggle to include more items from less famous providers. Conversely, the first part of the buckets shows that MORS algorithms provide a more significant opportunity for less-known providers to showcase their items in the recommendation lists, offering more visibility.

Figure 1: Provider frequency distribution using a bucketing technique. Providers are categorized based on the number of items they contribute to all recommendation lists. This visualization offers insights into provider diversity and prevalence within recommendation systems.

5. Conclusions

In conclusion, our study highlights the significance of MORS algorithms in addressing the issue of bias in recommender systems and promoting fairness for both items and providers. Our findings reveal that while baseline algorithms can negatively impact the provider's fairness, MORS algorithms, particularly those leveraging evolutionary techniques and introducing less popular items to the initial population of their algorithms, can effectively mitigate popularity bias and enhance the provider's fairness. This emphasizes the importance of considering provider-side fairness in the development of recommender systems, as it is often neglected in current research.

Overall, our research contributes to the growing body of work on fairness and bias in recommender systems and emphasizes the crucial role of MORS algorithms, particularly those employing evolutionary approaches, in mitigating bias and promoting fairness for both items and providers. Our study provides insights into how existing work objectives can influence provider fairness. It highlights the need for future research to delve deeper into this issue to provide a more comprehensive understanding of the problem. The effectiveness of MORS algorithms for providers could be further enhanced if a specific objective function were dedicated to mitigating provider bias. The absence of such an objective function might limit the algorithms' ability to address biases related to the popularity of providers in the recommendation process.

References

[1] M. Stratigi, E. Pitoura, K. Stefanidis, Squirrel: A framework for sequential group recommendations through reinforcement learning, Information Systems 112 (2023) 102128. doi:10.1016/j.is.2022.102128.
[2] E. Pitoura, K. Stefanidis, G. Koutrika, Fairness in rankings and recommendations: an overview, The VLDB Journal (2022) 1-28.
[3] B. Salimi, L. Rodriguez, B. Howe, D. Suciu, Interventional fairness: Causal database repair for algorithmic fairness, in: Proceedings of the 2019 International Conference on Management of Data, 2019, pp. 793-810.
[4] Z. Zhu, X. Hu, J. Caverlee, Fairness-aware tensor-based recommendation, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 1153-1162.
[5] T. Kamishima, S. Akaho, H. Asoh, J. Sakuma, Recommendation independence, in: Conference on Fairness, Accountability and Transparency, PMLR, 2018, pp. 187-201.
[6] G. Giannopoulos, G. Papastefanatos, D. Sacharidis, K. Stefanidis, Interactivity, fairness and explanations in recommendations, in: ACM UMAP, 2021, pp. 157-161.
[7] L. Xiao, Z. Min, Z. Yongfeng, G. Zhaoquan, L. Yiqun, M. Shaoping, Fairness-aware group recommendation with pareto-efficiency, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, 2017, pp. 107-115.
[8] E. M. Hamedani, M. Kaedi, Recommending the long tail items through personalized diversification, Knowledge-Based Systems 164 (2019) 348-357. doi:10.1016/j.knosys.2018.11.004.
[9] R. Shafiloo, M. Kaedi, A. Pourmiri, Considering user dynamic preferences for mitigating negative effects of long tail in recommender systems, arXiv preprint arXiv:2112.02406 (2021).
[10] S. Wang, M. Gong, H. Li, J. Yang, Multi-objective optimization for long-tail recommendation, Knowledge-Based Systems 104 (2016) 145-155. doi:10.1016/j.knosys.2016.04.018.
[11] G. Adomavicius, Y. Kwon, Improving aggregate recommendation diversity using ranking-based techniques, IEEE Transactions on Knowledge and Data Engineering 24 (2011) 896-911. doi:10.1109/TKDE.2011.15.
[12] W. Yue, Z. Wang, W. Liu, B. Tian, S. Lauria, X. Liu, An optimally weighted user- and item-based collaborative filtering approach to predicting baseline data for Friedreich's ataxia patients, Neurocomputing 419 (2021) 287-294.
[13] R. Borges, K. Stefanidis, Feature-blind fairness in collaborative filtering recommender systems, Knowledge and Information Systems 64 (2022) 943-962.
[14] X. Cai, Z. Hu, P. Zhao, W. Zhang, J. Chen, A hybrid recommendation system with many-objective evolutionary algorithm, Expert Systems with Applications 159 (2020) 113648.
[15] A. Jain, P. K. Singh, J. Dhar, Multi-objective item evaluation for diverse as well as novel item recommendations, Expert Systems with Applications 139 (2020) 112857.
[16] J. Pang, J. Guo, W. Zhang, Using multi-objective optimization to solve the long tail problem in recommender system, in: Advances in Knowledge Discovery and Data Mining: 23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, April 14-17, 2019, Proceedings, Part III, Springer, 2019, pp. 302-313.
[17] C.-T. Li, C. Hsu, Y. Zhang, FairSR: Fairness-aware sequential recommendation through multi-task learning with preference graph embeddings, ACM Transactions on Intelligent Systems and Technology (TIST) 13 (2022) 1-21.
[18] H. Wu, C. Ma, B. Mitra, F. Diaz, X. Liu, A multi-objective optimization framework for multi-stakeholder fairness-aware recommendation, ACM Transactions on Information Systems 41 (2022) 1-29.
[19] G. Cornacchia, F. M. Donini, F. Narducci, C. Pomo, A. Ragone, Explanation in multi-stakeholder recommendation for enterprise decision support systems, in: International Conference on Advanced Information Systems Engineering, Springer, 2021, pp. 39-47.
[20] F. Zou, D. Chen, Q. Xu, Z. Jiang, J. Kang, A two-stage personalized recommendation based on multi-objective teaching-learning-based optimization with decomposition, Neurocomputing 452 (2021) 716-727.