Examining the Impact of Multi-Objective Recommender Systems on Providers Bias

Reza Shafiloo¹,*, Kostas Stefanidis¹
¹ Tampere University, Finland

Abstract
Recommender systems are designed to help customers in finding their personalized content. However, biases in recommender systems can potentially exacerbate over time. Multi-objective recommender system (MORS) algorithms aim to alleviate bias while maintaining the accuracy of recommendation lists. While these algorithms effectively address item-side fairness, provider-side fairness often remains neglected. This study investigates the impact of MORS algorithms, leveraging evolutionary techniques to mitigate popularity bias on the item side, on providers' fairness. Our findings reveal that baseline algorithms can adversely affect providers' fairness. Moreover, it is demonstrated that evolutionary algorithms, specifically those introducing less popular items to the initial population of their algorithms, exhibit superior performance compared to other MORS algorithms in enhancing providers' fairness. This research sheds light on the crucial role MORS algorithms, particularly those employing evolutionary approaches, can play in mitigating bias and promoting fairness for both users and providers in recommender systems.

Keywords
Recommender systems, Item-side fairness, Provider-side fairness

Published in the Proceedings of the Workshops of the EDBT/ICDT 2024 Joint Conference (March 25-28, 2024), Paestum, Italy.
* Corresponding author.
shafiloo.reza@tuni.fi (R. Shafiloo); konstantinos.stefanidis@tuni.fi (K. Stefanidis)
https://www.tuni.fi/fi/konstantinos-stefanidis (K. Stefanidis)
ORCID: 0000-0003-1317-8062 (K. Stefanidis)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

These days, with the increasing amount of information on the web, content providers need systems to personalize content for end-users. As a result, users can efficiently access their favorite content, leading to user satisfaction [1]. Recommender systems (RS) provide personalized content for users based on their historical interactions with systems, such as ratings or clicks on items. Despite being a crucial and valuable tool for users, RS has been identified as amplifying various biases. These biases can significantly impact the outcomes of RS, particularly concerning factors such as gender, age, race, and other characteristics. One such bias is popularity bias, where certain items typically receive a substantial number of ratings, leading to them being recommended more frequently than others.

Fairness-aware recommender systems aim to address algorithmic bias in various ways, ensuring the system's recommendations are unbiased [2]. Fairness-aware recommender systems can take into account various attributes to offer equitable recommendations. The concept involves evaluating how a recommender system treats or affects individuals or groups based on the values of specific attributes. Methods for ensuring fairness in RS can be categorized into pre-processing, which involves modifying input data [3]; in-processing, which constrains learning algorithms for fair recommendations [4]; and post-processing, which modifies the output of the baseline algorithm [5].

In RS, various stakeholders play crucial roles, with two primary groups being consumers of items and providers of items [6]. However, numerous fairness-aware RS focus on addressing consumer- or producer-sided fairness, often neglecting comprehensive, all-sided multi-stakeholder fairness. While numerous studies concentrate on one-sided fairness in RS, it is essential to explore how addressing fairness for one group might impact the fairness of other stakeholders. Using Multi-objective Recommender Systems (MORS) as a post-processing approach offers a potential solution for achieving fairness in RS outputs [7]. Some existing MORS specifically address fairness for the item side. These approaches aim to maintain the accuracy of RS for consumer satisfaction while also creating opportunities for recommending less popular items, thereby mitigating popularity bias [8, 9, 10]. While preserving accuracy and enhancing fairness among items is valuable, it is crucial to investigate fairness among providers of items.

In this study, our objective is to investigate the behavior of MORS algorithms in mitigating item popularity bias and its impact on providers' fairness. While existing research has shown the trade-off between mitigating popularity bias and maintaining recommendation accuracy on the item side, it is crucial to delve deeper into how the objectives of existing work can influence providers' fairness. Prior research has yet to be conducted in this area, and our study aims to address this gap [2].

Furthermore, we aim to explore which specific objectives may have a trade-off with providers' fairness, to provide a more comprehensive understanding of the issue. To achieve this, we have chosen MORS algorithms that employ evolutionary algorithms to solve a multi-objective optimization. While evolutionary algorithms may not be the swiftest, their superiority in addressing multi-objective problems arises from their capability to tackle complex and non-linear optimization problems [8].

Our work shows that MORS algorithms perform better in ensuring providers' fairness than baseline algorithms. MORS algorithms enable providers to showcase their items more effectively than baseline algorithms. Although there is no significant difference among all MORS algorithms in covering providers' fairness, those algorithms that add less popular items to the initial population of their evolutionary algorithms show better performance than other MORS algorithms.

The remainder of this paper is structured as follows. Section 2 reviews related work on fairness in recommender systems. Section 3 describes the algorithms we use in our study and the measures we utilize to compare the algorithms' performance. Section 4 presents the results of our framework on the MovieLens and IMDB datasets. Section 5 concludes this work.

2. Related Work

The fundamental RS aims to forecast ratings for unknown items among users, employing diverse algorithms for this task. Approaches like user-based and item-based collaborative filtering algorithms, as explored by Adomavicius et al. [11] and Yue et al. [12], entail the identification of similar users or items to predict item ratings. CF algorithms can be used in many post-processing algorithms as baseline algorithms, from neural networks [13] to multi-objective evolutionary algorithms [9, 8, 10].

The search for an optimal balance between accuracy and bias mitigation has garnered significant attention in RS. Malekzadeh and Kaedi propose a strategy that simultaneously personalizes recommended items to maintain accuracy, as discussed in their work [8].
Similarly, Wang et al. [10] address the long-tail problem by employing multi-objective evolutionary optimization algorithms, focusing on improving recommendation list accuracy and reducing the dominance of popular items. Shafiloo et al. [9] present a framework to alleviate popularity bias in recommender systems by incorporating users' dynamic preferences. Cai et al. [14] proposed a framework based on multi-objective algorithms designed to concurrently optimize accuracy, diversity, and coverage within recommendation lists. Utilizing multi-objective algorithms reflects their commitment to addressing multiple dimensions of recommendation quality, aiming to enhance the overall user experience. Jain et al. introduced a novel similarity metric tailored for baseline algorithms [15]. Their approach involved modifying fundamental functions of genetic algorithms, precisely the crossover operation, to effectively manage the trade-off between accuracy and diversity of recommended items. Pang et al. introduced a framework based on genetic algorithms, where accuracy and coverage serve as objective functions [16]. This approach is designed to tackle popularity bias in recommendation lists, emphasizing a dual focus on improving accuracy and coverage for a more comprehensive and unbiased recommendation system.

Fairness-aware recommender systems try to tackle the algorithmic bias issue in different ways and ensure that the recommendations made by the system are unbiased [17]. However, many approaches consider tackling only one-sided fairness issues but abandon all-sided multi-stakeholder fairness [18]. In the realm of multi-stakeholder recommender systems (MS-RS), where numerous users participate in the recommendation process from multiple perspectives, as noted by Cornacchia et al. [19], there should be studies on how item-side fairness can affect other sides of fairness.

3. Methods

In this section, our initial focus is to introduce the algorithms employed in our study. Subsequently, we will delve into the evaluation metrics utilized for comparing results. Our objective is to comprehensively explore the impact of item bias mitigation on producer-side fairness and understand how it influences the outcomes of the recommendation systems.

3.1. Baseline algorithms

We have selected two baseline algorithms, item-based and user-based collaborative filtering, where no post-processing has been applied to the output. These algorithms serve as our baseline models for evaluating bias mitigation strategies and their impact on the producers' side in subsequent analyses.

For computing the similarity between two users, we have:

sim(u, v) = \frac{\sum_{k \in I_u \cap I_v} (r_{uk} - \mu_u)(r_{vk} - \mu_v)}{\sqrt{\sum_{k \in I_u \cap I_v} (r_{uk} - \mu_u)^2} \cdot \sqrt{\sum_{k \in I_u \cap I_v} (r_{vk} - \mu_v)^2}}    (1)

Equation 1 defines the similarity measure between two users, u and v, calculated based on the items they have both rated. Here, I_u represents the subset of items rated by user u, r_{uk} denotes the rating given by user u to item k, and \mu_u is the average rating provided by user u.

\hat{r}_{uj} = \mu_u + \frac{\sum_{v \in P_u(j)} sim(u, v)(r_{vj} - \mu_v)}{\sum_{v \in P_u(j)} |sim(u, v)|}    (2)

Equation 2 outlines the predicted rating \hat{r}_{uj} of user u for item j. It incorporates the average rating \mu_u and calculates the predicted rating by considering the similarity between user u and other users v who have rated the same item j. The set P_u(j) represents the group of nearest users to u who have provided ratings for item j. Item-based collaborative filtering is similar to user-based collaborative filtering.

3.2. Multi-objective algorithms

In this section, we introduce algorithms that leverage the outputs of baseline algorithms, implementing reranking strategies to achieve specific objectives. Each algorithm is characterized by an objective function to mitigate item popularity bias.

Malekzadeh and Kaedi [8] employ the simulated annealing algorithm to address the long-tail problem in recommender systems. Their approach begins with applying a collaborative filtering algorithm to generate initial recommendation lists. Subsequently, an evolutionary algorithm is employed to optimize the combination of items in these lists, focusing on satisfying three defined objective functions. These functions encompass considerations for personalized diversification, accuracy, and increased participation of long-tail items, aiming to enhance recommendations' overall quality. The objective functions are:

1. Diversity: The Shannon entropy is used for diversity, where the entropy H_a(u) for attribute a of user u is defined using the formula:

H_a(u) = -\sum_{i=1}^{k} p_i \cdot \log_k p_i    (3)

In this Equation, H_a(u) is the entropy for attribute a of user u, and k is the number of possible values for attribute a. p_i represents the ratio of the number of ratings given by user u to items with attribute a having the value i, divided by the total number of the user's ratings. Essentially, this formula calculates the entropy of the distribution of ratings given by a user u across different values of attribute a.

The attribute-based diversity measurement in this study is determined using an equation to assess a recommendation list's diversity. The formula is expressed as:

Diversity_a(c_1, \ldots, c_n) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \left(1 - similarity_a(c_i, c_j)\right)    (4)

In this context, n signifies the number of items within the recommendation list, and c_1, \ldots, c_n represents the items recommended. The term similarity_a(c_i, c_j) denotes the measure of similarity between two items c_i and c_j based on the attribute a.
Equation 3 illustrates the ideal diversity for a specific user, capturing an optimal scenario. Subsequently, the deviation between this ideal diversity and the actual diversity computed from Equation 4 for a recommendation list is measured. The disparity for each item attribute is quantified through Equation 5:

Personalized\;Diversity = |H_a - Diversity_a|    (5)

In this expression, Diversity_a denotes the diversity of the recommendation list based on attribute a, while H_a signifies the entropy of user preferences related to attribute a. This metric, termed Personalized Diversity, serves to quantify the difference between the ideal and actual diversity in the recommendation list for a given user, explicitly considering the preferences associated with a particular attribute.

2. The participation of long-tail items: The long tail metric is computed using the formula:

Long\;Tail = \sum_{item=1}^{k} Popularity(item)    (6)

In this Equation, k signifies the size of the recommendation list, representing the total number of items included in the recommendation. A lower value obtained from this calculation indicates a higher likelihood of incorporating less popular items in the recommendation list. This suggests a greater emphasis on the inclusion of long-tail items in the recommendations, reflecting a preference for diversity and coverage beyond just popular items.

3. Accuracy: The Accuracy metric is evaluated using the following Equation:

Accuracy = \frac{1}{\sum_{item=1}^{k} PredictedRate(item)}    (7)

In this Equation, PredictedRate(item) denotes the predicted rating assigned to the item. The formula computes the inverse of the sum of the predicted ratings for all recommended items, offering a metric to assess the accuracy of the recommendation system. A lower value in the Accuracy metric suggests a higher overall accuracy in the predicted ratings for the recommended items.

Wang et al. [10] address the long-tail problem by defining two objective functions. The first function assesses the accuracy of recommendation lists, while the second aims to reduce the dominance of popular items. The objectives can be expressed as follows:

1. Accuracy: The primary objective function for assessing accuracy, labeled F1, is formulated as:

F1 = \sum_{i=1}^{k} \hat{r}_{u,i}    (8)

In this expression, k denotes the length of the recommendation list. A higher F1 value indicates higher predicted ratings for the items within the list.

2. Long-tail recommendation: Items with higher ratings might be prioritized higher on the ranking list for all users, and popular items often receive similar ratings, resulting in low variance. To measure unpopularity in terms of the mean and variance of item ratings, Tamas et al. proposed a value for an item i:

m_i = \frac{1}{\mu_i (\sigma_i + 1)^2}    (9)

Here, \mu_i and \sigma_i represent the mean and variance of ratings for item i across all users. To prevent division by zero, a value of one is added to the variance. The reciprocal of this mean-variance combination yields the value m_i, where a smaller value indicates a more popular item. Motivated by this concept, an objective function F2 is introduced to calculate the unpopularity of the recommendation result:

F2 = \sum_{i=1}^{k} \frac{1}{\mu_i (\sigma_i + 1)^2}    (10)

This function quantifies the unpopularity of the recommended items, with lower values indicating more popular items in the list.

They employ a genetic algorithm to achieve these objectives, seeking optimal combinations of items within recommendation lists that satisfy the defined criteria. This approach aims to enhance accuracy and mitigate popularity bias for more balanced and practical recommendations.

Shafiloo et al. [9] introduced a framework to alleviate popularity bias in recommender systems by incorporating users' dynamic preferences. Their approach employs a memetic algorithm, creating opportunities to include unpopular items in recommendation lists. They define two objective functions within their framework, aiming to simultaneously preserve accuracy and mitigate popularity bias. This solution focuses on providing more diverse and unbiased recommendations, catering to the dynamic preferences of users. The objectives to be achieved are:

1. Accuracy: In their research, they employ accuracy as expressed in Equation 7.
2. Long-tail participation: They utilize long-tail participation as described in Equation 6.
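The objectives above are simple aggregations over a candidate list, so an evolutionary re-ranker can evaluate them cheaply for every candidate. A minimal sketch of Equations 6, 7, and 10 follows; the function names and input formats are illustrative assumptions, not the authors' code.

```python
def accuracy_objective(predicted_rates):
    """Equation 7: inverse of the summed predicted ratings of the list.
    Lower values correspond to higher predicted ratings overall."""
    return 1.0 / sum(predicted_rates)

def long_tail_objective(popularities):
    """Equation 6: summed popularity of the recommended items.
    Lower values indicate stronger long-tail participation."""
    return sum(popularities)

def unpopularity_objective(mean_var_pairs):
    """Wang et al.'s F2 (Equation 10): sum of 1 / (mu_i * (sigma_i + 1)^2)
    over the list; adding 1 to the variance avoids division by zero."""
    return sum(1.0 / (mu * (sigma + 1) ** 2) for mu, sigma in mean_var_pairs)
```

A multi-objective optimizer would score each candidate list with these functions and keep the non-dominated combinations.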
Additionally, in their research, they modified the memetic algorithm. Instead of randomly adding items to the initial population, as is common in other genetic algorithms, they introduced a higher possibility of including items from the long tail and a lower possibility of including popular items in the initial population.

4. Experimental Evaluation

This section presents the datasets employed for evaluating the proposed method. Subsequently, we outline the evaluation criteria and comprehensively present the comparison results.

4.1. Dataset

In our experimental evaluation, we use 2 real-world datasets, namely MovieLens and IMDB. The MovieLens dataset is a commonly employed dataset for evaluating methods addressing long-tail problems in various studies. Specifically, we utilize the MovieLens 1M dataset that features 6040 users and 1 million ratings for 3883 items. The IMDB dataset is also employed to enhance information about movie providers, and director information is extracted. In this study, movie directors are considered providers, and the dataset includes information on 2208 movie directors.

4.2. Evaluation metric

The study evaluates methods addressing the long-tail problem using three criteria for comparison. The first criterion is accuracy, measured through the precision metric defined as:

Precision = \frac{N_{rs}}{N_s}    (11)

Here, N_s represents the total number of items recommended to the user, and N_{rs} denotes the relevant items suggested to the user. Relevant items are those with ratings higher than the user's average ratings, as outlined by Wang et al. [10].

The second criterion, aggregate diversity (AG) (Equation 12), counts the number of distinct items offered to users, particularly focusing on long-tail items in recommendation lists [8]:

Aggregate\;diversity = \left| \bigcup_{u \in U} L_n(u) \right|    (12)

Equation 12 introduces the aggregate diversity criterion, where u represents a specific user from the set of users U, and L_n(u) is the list of items recommended to the user u. Equation 12 is normalized by the number of items. This equation is used to measure popularity bias on the item and provider sides.

The third criterion is Novelty, calculated as:

Novelty = \frac{1}{\sum_{all\;recommended\;items} Popularity(item)}    (13)

This Equation indicates that the novelty of the recommendation list decreases as the popularity of items increases, emphasizing a preference for less popular items. The study employs these criteria to compare and evaluate the results of different methods addressing the long-tail problem in recommender systems.

Equation 14 introduces a measurement for intra-user diversity proposed by Zou et al. [20]. This measurement, denoted as D_u(k), is defined for a specific user u and is calculated as follows:

D_u(k) = \frac{1}{k(k-1)} \sum_{p \neq q} Sim(i_p, i_q)    (14)

Here, k represents the length of the recommendation lists for user u, and Sim(i_p, i_q) calculates the similarity between two items i_p and i_q based on the similarity metric defined in Equation 1. The purpose of D_u(k) is to quantify the similarity of items within user u's recommendation list. The intra-user diversity for all users is then defined as:

D_{all\;users}(k) = \frac{1}{m} \sum_{u \in U} D_u(k)    (15)

Here, m denotes the number of users in the set U. This Equation provides a measure of intra-user diversity considering all users in the study.

Equation 16 introduces the Normalized Discounted Cumulative Gain (NDCG) measurement, a widely used metric for evaluating the quality of recommendations. This measurement is defined as:

NDCG@k(u) = \frac{DCG@k(u)}{IDCG@k(u)}    (16)

Here, IDCG@k(u) represents the ideal DCG@k(u) for user u, where the ideal scenario assumes that all relevant items in the user's recommendation list appear at the top ranks, resulting in the maximum possible DCG@k(u). The discounted cumulative gain at position k for user u, denoted as DCG@k(u), is calculated using the formula:

DCG@k(u) = \sum_{i=1}^{k} \frac{rel(i)}{\log_2(i + 1)}    (17)

In this Equation, rel(i) is an indicator function that determines if item i is relevant to user u. A value of 1 indicates that item i is relevant, while a 0 indicates that item i is irrelevant. NDCG provides a normalized measure of the effectiveness of a recommendation list by considering both relevance and the position of items within the list.
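For concreteness, the precision and NDCG@k measures can be sketched as follows, with binary relevance and hypothetical helper names; this is not the evaluation code used in the paper.

```python
import math

def precision(recommended, relevant):
    """Equation 11: N_rs / N_s, the share of recommended items that are
    relevant to the user."""
    return len(set(recommended) & set(relevant)) / len(recommended)

def ndcg_at_k(recommended, relevant, k):
    """Equations 16-17 with binary relevance rel(i)."""
    def dcg(rels):
        # position i (1-based) is discounted by log2(i + 1)
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    rels = [1 if item in relevant else 0 for item in recommended]
    ideal = sorted(rels, reverse=True)  # all relevant items ranked first
    return dcg(rels) / dcg(ideal) if dcg(ideal) > 0 else 0.0
```

With the relevant items already at the top of the list, NDCG@k is exactly 1.0; pushing a relevant item down the ranking lowers it.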
4.3. Results and discussion

In this section, the study compares and analyzes the results obtained from various methods using the criteria introduced in the section above. For comparison, we use a real-life scenario where the length of the recommendation lists of all algorithms is set to 10.

The results in Table 1 indicate that MORS algorithms outperform baseline algorithms in addressing the long-tail problem. These algorithms demonstrate superior performance in diversifying items in recommendation lists, effectively increasing the participation of unpopular items. Notably, the study highlights that MORS algorithms achieve this diversification without compromising the accuracy of the recommendation lists. Therefore, the MORS algorithms are successful in preserving accuracy while simultaneously enhancing the inclusion of less popular items in the recommendations, addressing the long-tail problem in recommender systems.

Table 1
Comparison of results: evaluation measures and outcomes for the various algorithms. Accuracy measures include Precision and NDCG; diversity and fairness measures include Novelty, AG-Items, Diversity, and AG-Providers.

Algorithm            | Precision | Novelty  | AG-Items | Diversity | NDCG   | AG-Providers
CF-Item              | 0.7163    | 3.23E-05 | 0.5318   | 0.3873    | 0.5906 | 0.5067
CF-Users             | 0.6622    | 3.26E-05 | 0.5449   | 0.4051    | 0.7938 | 0.5230
Malekzadeh [8] (MORS)| 0.8338    | 3.72E-05 | 0.6651   | 0.6221    | 0.9381 | 0.5711
Wang [10] (MORS)     | 0.6968    | 3.61E-05 | 0.6422   | 0.6200    | 0.9173 | 0.5697
Shafiloo [9] (MORS)  | 0.7989    | 4.02E-05 | 0.6930   | 0.7396    | 0.9599 | 0.6059

The comparison table suggests that while MORS algorithms effectively mitigate popularity bias in recommendation lists, there is not a significant difference in the diversity of providers between baseline algorithms and MORS algorithms. For example, CF-Users has a value of 0.5230 in AG-Providers, while Malekzadeh and Wang show 0.5711 and 0.5697, respectively. Although MORS algorithms, aided by item-diversifying objectives, offer providers a better chance to present their items, the disparity in aggregate diversity is more noticeable on the item side than on the provider side when comparing MORS algorithms with baseline algorithms. Moreover, the comparison indicates that the baseline algorithm with the higher accuracy exhibits the poorer performance in aggregate provider diversity, suggesting that recommendation-list accuracy can negatively impact provider-side fairness. Specifically, CF-Item achieves a precision of 0.7163, whereas CF-Users attains 0.6622; however, their AG-Providers values are 0.5067 and 0.5230, respectively.

Also, Table 1 indicates that among MORS algorithms, Malekzadeh's work outperforms Shafiloo's and Wang's in terms of the precision metric. However, this superiority adversely impacts aggregate diversity on both the provider and item sides. Specifically, Shafiloo's work exhibits a precision of 0.7989, with aggregate provider diversity at 0.6059 and aggregate item diversity at 0.6930. In contrast, Malekzadeh's work achieves a precision of 0.8338, but the aggregate provider diversity decreases to 0.5711, and the aggregate item diversity is 0.6651.
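The AG-Providers comparison can be examined at a finer grain by bucketing providers according to how many of their distinct items surface across all recommendation lists. The following is a hypothetical sketch of such a bucketing, not the authors' code; the input formats are our own assumptions.

```python
from collections import Counter

def provider_buckets(rec_lists, item_provider):
    """Assign each provider to a bucket equal to the number of its distinct
    items appearing anywhere in the recommendation lists; returns a mapping
    bucket -> number of providers in that bucket."""
    shown = {item for rec in rec_lists for item in rec}
    items_per_provider = Counter(item_provider[i] for i in shown)
    return Counter(items_per_provider.values())
```

Plotting these counts per algorithm reproduces a Figure-1-style view: a heavy first bucket means many providers get only a single item exposed.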
Furthermore, in Figure 1, we present the provider frequency using a bucketing technique. Specifically, in this figure, providers are assigned to a bucket based on the number of items belonging to that specific provider that are represented in all recommendation lists generated by the algorithm. For instance, a provider is placed in bucket one if only one item from all items associated with that provider is present in all recommendation lists.

This figure shows that baseline algorithms exhibit a weakness in recommending items from providers who lack popularity. This is illustrated in the initial buckets, where baseline algorithms struggle to include more items from less famous providers. Conversely, the first part of the buckets shows that MORS algorithms provide a more significant opportunity for less-known providers to showcase their items in the recommendation lists, offering more visibility.

Figure 1: Provider frequency distribution using a bucketing technique. Providers are categorized based on the number of items they contribute to all recommendation lists. This visualization offers insights into provider diversity and prevalence within recommendation systems.

5. Conclusions

In conclusion, our study highlights the significance of MORS algorithms in addressing the issue of bias in recommender systems and promoting fairness for both items and providers. Our findings reveal that while baseline algorithms can negatively impact the provider's fairness, MORS algorithms, particularly those leveraging evolutionary techniques and introducing less popular items to the initial population of their algorithms, can effectively mitigate popularity bias and enhance the provider's fairness. This emphasizes the importance of considering provider-side fairness in the development of recommender systems, as it is often neglected in current research.

Overall, our research contributes to the growing body of work on fairness and bias in recommender systems and emphasizes the crucial role of MORS algorithms, particularly those employing evolutionary approaches, in mitigating bias and promoting fairness for both items and providers. Our study provides insights into how existing work objectives can influence provider fairness. It highlights the need for future research to delve deeper into this issue to provide a more comprehensive understanding of the problem. The effectiveness of MORS algorithms for providers could be further enhanced if a specific objective function were dedicated to mitigating provider bias. The absence of such an objective function might limit the algorithms' ability to address biases related to the popularity of providers in the recommendation process.

References

[1] M. Stratigi, E. Pitoura, K. Stefanidis, Squirrel: A framework for sequential group recommendations through reinforcement learning, Information Systems 112 (2023) 102128. doi:10.1016/j.is.2022.102128.
[2] E. Pitoura, K. Stefanidis, G. Koutrika, Fairness in rankings and recommendations: an overview, The VLDB Journal (2022) 1-28.
[3] B. Salimi, L. Rodriguez, B. Howe, D. Suciu, Interventional fairness: Causal database repair for algorithmic fairness, in: Proceedings of the 2019 International Conference on Management of Data, 2019, pp. 793-810.
[4] Z. Zhu, X. Hu, J. Caverlee, Fairness-aware tensor-based recommendation, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 1153-1162.
[5] T. Kamishima, S. Akaho, H. Asoh, J. Sakuma, Recommendation independence, in: Conference on Fairness, Accountability and Transparency, PMLR, 2018, pp. 187-201.
[6] G. Giannopoulos, G. Papastefanatos, D. Sacharidis, K. Stefanidis, Interactivity, fairness and explanations in recommendations, in: ACM UMAP, 2021, pp. 157-161.
[7] L. Xiao, Z. Min, Z. Yongfeng, G. Zhaoquan, L. Yiqun, M. Shaoping, Fairness-aware group recommendation with pareto-efficiency, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, 2017, pp. 107-115.
[8] E. M. Hamedani, M. Kaedi, Recommending the long tail items through personalized diversification, Knowledge-Based Systems 164 (2019) 348-357. doi:10.1016/j.knosys.2018.11.004.
[9] R. Shafiloo, M. Kaedi, A. Pourmiri, Considering user dynamic preferences for mitigating negative effects of long tail in recommender systems, arXiv preprint arXiv:2112.02406 (2021).
[10] S. Wang, M. Gong, H. Li, J. Yang, Multi-objective optimization for long-tail recommendation, Knowledge-Based Systems 104 (2016) 145-155. doi:10.1016/j.knosys.2016.04.018.
[11] G. Adomavicius, Y. Kwon, Improving aggregate recommendation diversity using ranking-based techniques, IEEE Transactions on Knowledge and Data Engineering 24 (2011) 896-911. doi:10.1109/TKDE.2011.15.
[12] W. Yue, Z. Wang, W. Liu, B. Tian, S. Lauria, X. Liu, An optimally weighted user- and item-based collaborative filtering approach to predicting baseline data for Friedreich's ataxia patients, Neurocomputing 419 (2021) 287-294.
[13] R. Borges, K. Stefanidis, Feature-blind fairness in collaborative filtering recommender systems, Knowledge and Information Systems 64 (2022) 943-962.
[14] X. Cai, Z. Hu, P. Zhao, W. Zhang, J. Chen, A hybrid recommendation system with many-objective evolutionary algorithm, Expert Systems with Applications 159 (2020) 113648.
[15] A. Jain, P. K. Singh, J. Dhar, Multi-objective item evaluation for diverse as well as novel item recommendations, Expert Systems with Applications 139 (2020) 112857.
[16] J. Pang, J. Guo, W. Zhang, Using multi-objective optimization to solve the long tail problem in recommender system, in: Advances in Knowledge Discovery and Data Mining: 23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, April 14-17, 2019, Proceedings, Part III, Springer, 2019, pp. 302-313.
[17] C.-T. Li, C. Hsu, Y. Zhang, FairSR: Fairness-aware sequential recommendation through multi-task learning with preference graph embeddings, ACM Transactions on Intelligent Systems and Technology (TIST) 13 (2022) 1-21.
[18] H. Wu, C. Ma, B. Mitra, F. Diaz, X. Liu, A multi-objective optimization framework for multi-stakeholder fairness-aware recommendation, ACM Transactions on Information Systems 41 (2022) 1-29.
[19] G. Cornacchia, F. M. Donini, F. Narducci, C. Pomo, A. Ragone, Explanation in multi-stakeholder recommendation for enterprise decision support systems, in: International Conference on Advanced Information Systems Engineering, Springer, 2021, pp. 39-47.
[20] F. Zou, D. Chen, Q. Xu, Z. Jiang, J. Kang, A two-stage personalized recommendation based on multi-objective teaching-learning-based optimization with decomposition, Neurocomputing 452 (2021) 716-727.