<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.15439/2021F117</article-id>
      <title-group>
        <article-title>Consumer Fairness Benchmark in Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ludovico Boratto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianni Fenu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mirko Marras</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giacomo Medda</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, University of Cagliari</institution>
          ,
          <addr-line>Cagliari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>20</volume>
      <issue>2019</issue>
      <fpage>2</fpage>
      <lpage>5</lpage>
      <abstract>
        <p>Several mitigation procedures have emerged to address consumer unfairness in personalized rankings. However, evaluating their performance is difficult due to variations in experimental protocols, such as differing fairness definitions, data sets, evaluation metrics, and sensitive attributes. This makes it challenging for scientists to choose a suitable procedure for their practical setting. In this paper, we summarize our previous work on investigating the properties a given mitigation procedure against consumer unfairness should be evaluated on. To this end, we defined eight technical properties and leveraged two public datasets to evaluate the extent to which existing mitigation procedures against consumer unfairness met these properties. Source code and data: https://github.com/jackmedda/Perspective-C-Fairness-RecSys.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommender Systems</kwd>
        <kwd>Consumer Fairness</kwd>
        <kwd>Mitigation Procedure</kwd>
        <kwd>Reproducibility</kwd>
        <kwd>Evaluation Protocol</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the large adoption of decision-support systems, governments are establishing regulations
to account for their trustworthiness. Indeed, it is fundamental to highlight and administer the
harmful impacts of artificial intelligence (AI) systems. Recommender systems denote a notable
example of systems where trustworthiness and safety are key aspects to be concerned about. In
such systems, people are provided with personalized suggestions generated by a certain model
[
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. Prior studies have, however, shown that recommender systems often lead to discriminatory
outcomes [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ], affecting the entity being ranked or the users the recommendations are
targeted to (consumers) [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ]. Despite the growing interest in providing fair recommendations
to consumers, diverging definitions of consumer fairness have led to unfairness mitigation
procedures built on top of heterogeneous evaluation protocols. It is then crucial to discuss
which properties a mitigation procedure against consumer unfairness should be evaluated on.
      </p>
      <p>
        In this paper, we summarize our prior work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] on building a common ground that can act as
a basis for the evaluation of consumer unfairness mitigation procedures. To this end, we defined
eight technical properties a given mitigation procedure against consumer unfairness should meet
to be effective in practice. We then benchmarked the extent to which existing mitigation
procedures meet the defined properties, qualitatively and quantitatively (when possible), on two
public data sets. Finally, we gathered the evaluation performance of the mitigation procedures
under each property and highlighted the extent to which each procedure meets these properties.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Perception of the State of the Art</title>
      <p>
        In this section, we describe the process followed for collecting papers about consumer
unfairness mitigation procedures and then categorizing them based on how unfairness was defined,
mitigated, and assessed (Table 1). Please refer to the original work [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] for detailed information.
Paper Collection Process. Mitigation procedures against consumer unfairness proposed so far
in the literature were collected by scanning the proceedings of Information Retrieval conferences
and workshops, as well as high-impact journals. Relying on the framework shared by [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] when
possible, we reproduced the mitigation procedures proposed in the collected papers.
Fairness Definition Perception. There is no consensus on how to perceive unfairness from
a consumer perspective in recommendation. Studies often consider different viewpoints to
analyze, mitigate, and evaluate unfairness. Generally, these studies explored fairness notions
that mainly address two principles: equity of certain metric scores between demographic groups
(EQ); independence of a certain outcome from the sensitive attribute (IND).
      </p>
      <p>
        Unfairness Mitigation Perception. Studies focusing on fairness from an EQ perspective
usually perform a mitigation by balancing the representation of groups in the training set (e.g.,
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]), reducing the error across groups (e.g., [
        <xref ref-type="bibr" rid="ref6">6, 16</xref>
        ]) or re-ranking items (e.g., [
        <xref ref-type="bibr" rid="ref12">12, 14</xref>
        ]). From
an IND perspective, unfairness is usually countered by decoupling the user and item latent
representations from sensitive attribute information (e.g., [15, 17]) or introducing independence
guarantees between the sensitive attribute and the predicted relevance score (e.g., [13]).
      </p>
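To make the IND principle concrete, the following is a minimal, hypothetical sketch (not taken from any of the surveyed papers) of how dependence between predicted relevance scores and a binary sensitive attribute could be quantified; the function name `independence_penalty` and the correlation-based criterion are illustrative assumptions, not the method of [13] or [15, 17].

```python
import numpy as np

def independence_penalty(scores, sensitive):
    """Toy IND-style measure: absolute Pearson correlation between predicted
    relevance scores and a binary sensitive attribute. A value near 0 means
    the scores carry no linear signal about group membership."""
    scores = np.asarray(scores, dtype=float)
    sensitive = np.asarray(sensitive, dtype=float)
    if scores.std() == 0 or sensitive.std() == 0:
        return 0.0  # constant input: no dependence to measure
    return float(abs(np.corrcoef(scores, sensitive)[0, 1]))

# Identical score distributions across the two groups -> no measured dependence.
print(round(independence_penalty([0.9, 0.1, 0.9, 0.1], [0, 0, 1, 1]), 12))  # 0.0
```

A penalty of this kind could be minimized alongside the recommendation loss; actual IND procedures use stronger (distribution-level) criteria than linear correlation.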
    </sec>
    <sec id="sec-3">
      <title>3. Research Methodology</title>
      <p>
        In this section, we describe the data sets, sensitive attributes, recommendation models, and the
unified evaluation protocol [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] used to benchmark the collected mitigation procedures.
Experimental Data Sets. We selected the two public data sets reported in Table 2, namely
ML1M (movies) and LFM1K (music). We considered binarized protected attribute labels (if not
already binary, the groups were binarized to have the most similar representation possible).
Recommendation Models. The range of recommendation models evaluated in prior work
in terms of consumer unfairness was heterogeneous, since no common protocol existed. Our
study in this paper focuses on recommendation models considered in at least one prior work.
Evaluation Protocol. For each setup, we obtained the predicted relevance scores and monitored
the utility of top-n recommendations through NDCG. Unfairness between consumer groups was
monitored from an equity (EQ) perspective in terms of NDCG Demographic Parity (DP) [19],
computed as the difference in NDCG between the majority group and the minority group (w.r.t.
their representation in the data set), and from an independence (IND) perspective by means of a
Kolmogorov-Smirnov test (KS) on the predicted relevance scores, as also proposed by [20].
      </p>
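The evaluation protocol above can be sketched in a few lines. The following is an illustrative implementation under the assumption of binary relevance and binary groups; the function names (`ndcg_at_k`, `ndcg_dp`, `ks_statistic`) are ours, not those of the benchmarked framework.

```python
import numpy as np

def ndcg_at_k(ranked_rels, k):
    """NDCG@k with binary relevance for a single user's ranked list."""
    rels = np.asarray(ranked_rels[:k], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, rels.size + 2))
    dcg = float(np.sum(rels * discounts))
    ideal = np.sort(np.asarray(ranked_rels, dtype=float))[::-1][:k]
    idcg = float(np.sum(ideal * discounts[: ideal.size]))
    return dcg / idcg if idcg > 0 else 0.0

def ndcg_dp(user_ndcg, groups, majority_label):
    """EQ perspective: NDCG Demographic Parity, i.e., the difference in
    average NDCG between the majority and the minority group."""
    user_ndcg, groups = np.asarray(user_ndcg), np.asarray(groups)
    majority = user_ndcg[groups == majority_label].mean()
    minority = user_ndcg[groups != majority_label].mean()
    return float(majority - minority)

def ks_statistic(scores_a, scores_b):
    """IND perspective: two-sample Kolmogorov-Smirnov statistic, i.e., the
    maximum vertical gap between the groups' empirical CDFs of scores."""
    a, b = np.sort(scores_a), np.sort(scores_b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))
```

A DP value near zero indicates equity between groups, while a KS statistic near zero indicates that the score distributions of the two groups are indistinguishable.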
    </sec>
    <sec id="sec-4">
      <title>4. Mitigation Procedures Benchmark</title>
      <p>
        In this section, we propose eight key properties to consider while evaluating a mitigation
procedure offline, before moving it into practice. Table 3 reports the performance of the
recommender systems, before and after unfairness was mitigated, in terms of recommendation
utility and fairness between gender groups. Other results can be found in our original study [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Applicability. Indicates the extent to which a mitigation procedure can be technically run on
a wide range of different recommendation models without requiring any substantial change to
the fundamental steps it is based on. Pre-processing approaches potentially have a very high
applicability, while the applicability of in-processing and post-processing approaches could be
affected by aspects related to the implementation or to the adopted fairness notion.
Coherence. Indicates the extent to which a mitigation procedure tends to reduce the biased
outcomes for the originally disadvantaged group, without reversing the disparate outcome towards
the other group(s). In Table 3, low coherence was reported by SLIM-U since applying the mitigation
of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] led to male users being advantaged instead of female users.
      </p>
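The coherence property above lends itself to a simple operational check. Below is a hypothetical criterion (our own illustration, not the paper's exact test) comparing the NDCG gap (DP) before and after mitigation:

```python
def is_coherent(dp_before, dp_after):
    """Hypothetical check for the coherence property: the mitigation must
    shrink the NDCG gap (DP) without flipping its sign, i.e., without
    turning the originally disadvantaged group into the advantaged one."""
    same_direction = dp_before * dp_after >= 0
    return same_direction and abs(dp_after) <= abs(dp_before)

print(is_coherent(0.05, 0.01))   # True: gap reduced, same group still ahead
print(is_coherent(0.05, -0.03))  # False: advantage reversed to the other group
```

The second call mirrors the SLIM-U case in Table 3, where the mitigation left male users advantaged instead of female users.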
      <p>
        Consistency. Indicates the ability of a mitigation procedure to substantially reduce the model’s
unfairness according to the pursued fairness notion, given any data set and any consumer grouping
method. Overall, Li et al. [14] was the only consistent mitigation procedure across data sets and
sensitive attributes under our unified evaluation protocol. Instead, under the papers’ original
evaluation protocols, no procedure was consistent according to our definition.
Data Robustness. Indicates the ability of a mitigation procedure to reduce unfairness also in
challenging cases related to data distribution (e.g., imbalances) and relationships between unfairness
and other features. Our analysis uncovered that leveraging data characteristics causally related
to unfairness, e.g., popularity bias [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], to reduce it could provide better insights into the problem.
Reproducibility. Indicates the ability to take the original source code that implements a
mitigation procedure and execute it under the same or a different evaluation protocol with
respect to the one used in the original paper. Our analysis showed that 2 out of 8 papers were
not reproducible, which limited our work and underscores the need to share source code.
Scalability. Indicates the ability of a mitigation procedure to scale well as the number of
interactions, users, items, sensitive attributes, and other relevant features increases substantially. On
data sets with a higher number of entities (e.g., users, interactions), some mitigation procedures
(Li et al. and Burke et al. [
        <xref ref-type="bibr" rid="ref6">6, 14</xref>
        ]) would lead to unmanageable time and memory requirements.
Trade-off Management. Indicates the ability of a mitigation procedure to preserve the
performance estimate originally achieved by the target recommendation model (before the mitigation
was applied). Overall, Ekstrand et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] reported the best trade-off across all the data sets and
sensitive attributes. It reduced unfairness while minimally affecting utility.
      </p>
      <p>
        Transferability. Indicates the ability of a mitigation procedure to be effective (and not only
applicable) on a wide range of recommendation models, even those it was not originally designed
for or tested on. We applied the mitigation procedures of Ekstrand et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and Li et al. [14]
on the models used by the other papers. Neither method showed a good level of transferability.
Discussion. As a summary, for each property and mitigation procedure, we assigned one of
two labels: Higher when the corresponding work was better than the others on average for the
selected property, Lower otherwise. The mitigation procedures proposed by
[
        <xref ref-type="bibr" rid="ref11">14, 11</xref>
        ] reported the highest number of above-average properties.
      </p>
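The labeling rule above can be expressed compactly. The sketch below is purely illustrative: the procedure names, property names, and numeric scores are hypothetical, and it assumes every procedure is scored on every property.

```python
def label_procedures(scores):
    """Assign 'Higher' to a procedure on a property when its score beats the
    average score of all procedures on that property, else 'Lower'.
    `scores` maps procedure name -> {property: score}."""
    props = {p for s in scores.values() for p in s}
    avg = {p: sum(s[p] for s in scores.values()) / len(scores) for p in props}
    return {proc: {p: "Higher" if s[p] > avg[p] else "Lower" for p in s}
            for proc, s in scores.items()}

# Hypothetical property scores for two procedures "A" and "B".
labels = label_procedures({
    "A": {"coherence": 0.9, "scalability": 0.2},
    "B": {"coherence": 0.4, "scalability": 0.8},
})
print(labels["A"])  # {'coherence': 'Higher', 'scalability': 'Lower'}
```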
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>In this work, we collected and reproduced relevant papers addressing consumer unfairness
mitigation and categorized them according to the definition, mitigation, and assessment
strategy. Then, we defined a unified experimental protocol, including eight technical properties a
mitigation procedure should meet, and evaluated the reproduced mitigation procedures on two
public data sets on the basis of the defined evaluation properties. Our work enables a
better understanding of the aspects that could increase mitigation effectiveness and of what
can be done to avoid the phenomena outlined by our experiments. Future work will consider
novel mitigation procedures able to satisfy all the properties introduced in our paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Armentano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monteserin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Berdun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bongiorno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Coussirat</surname>
          </string-name>
          ,
          <article-title>User recommendation in low degree networks with a learning-based approach</article-title>
          ,
          <source>in: Mexican International Conference on Artificial Intelligence</source>
          , Springer,
          <year>2018</year>
          , pp.
          <fpage>286</fpage>
          -
          <lpage>298</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Mauro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ardissono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cocomazzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cena</surname>
          </string-name>
          ,
          <article-title>Using consumer feedback from location-based services in POI recommender systems for people with autism</article-title>
          ,
          <source>Expert Systems with Applications</source>
          <volume>199</volume>
          (
          <year>2022</year>
          )
          <fpage>116972</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Dinnissen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <article-title>Fairness in music recommender systems: A stakeholder-centered mini review</article-title>
          ,
          <source>Frontiers in Big Data</source>
          (????) 63.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Lesota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Melchiorre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rekabsaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kowald</surname>
          </string-name>
          , E. Lex,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <article-title>Analyzing item popularity bias of music recommender systems: Are different genders equally affected?</article-title>
          ,
          <source>in: Fifteenth ACM Conference on Recommender Systems</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>601</fpage>
          -
          <lpage>606</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Neidhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sertkan</surname>
          </string-name>
          ,
          <article-title>Towards an approach for analyzing dynamic aspects of bias and beyond-accuracy measures</article-title>
          ,
          <source>in: International Workshop on Algorithmic Bias in Search and Recommendation</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sonboli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ordonez-Gauger</surname>
          </string-name>
          ,
          <article-title>Balanced neighborhoods for multi-sided fairness in recommendation</article-title>
          , in: Conference on Fairness, Accountability and Transparency, FAT 2018, 23-24 February 2018, New York, NY, USA, volume 81 of Proceedings of Machine Learning Research, PMLR,
          <year>2018</year>
          , pp.
          <fpage>202</fpage>
          -
          <lpage>214</lpage>
          . URL: http://proceedings.mlr.press/v81/burke18a.html.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>Bias and debias in recommender system: A survey and future directions</article-title>
          , CoRR abs/2010.03240 (
          <year>2020</year>
          ). URL: https://arxiv.org/abs/2010.03240. arXiv:2010.03240.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdollahpouri</surname>
          </string-name>
          , G. Adomavicius,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Guy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kamishima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Krasnodebski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Pizzato</surname>
          </string-name>
          ,
          <article-title>Multistakeholder recommendation: Survey and research directions</article-title>
          ,
          <source>User Model. User Adapt. Interact.</source>
          <volume>30</volume>
          (
          <year>2020</year>
          )
          <fpage>127</fpage>
          -
          <lpage>158</lpage>
          . URL: https://doi.org/10.1007/s11257-019-09256-1. doi:10.1007/s11257-019-09256-1.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Boratto</surname>
          </string-name>
          , G. Fenu,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marras</surname>
          </string-name>
          , G. Medda,
          <article-title>Practical perspectives of consumer fairness in recommendation</article-title>
          ,
          <source>Inf. Process. Manag</source>
          .
          <volume>60</volume>
          (
          <year>2023</year>
          )
          <fpage>103208</fpage>
          . URL: https://doi.org/10.1016/j.ipm.2022.103208. doi:10.1016/j.ipm.2022.103208.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Boratto</surname>
          </string-name>
          , G. Fenu,
          <string-name>
            <given-names>M.</given-names>
            <surname>Marras</surname>
          </string-name>
          , G. Medda,
          <article-title>Consumer fairness in recommender systems: Contextualizing definitions and mitigations</article-title>
          , in: M. Hagen, S. Verberne, C. Macdonald, C. Seifert, K. Balog, K. Nørvåg, V. Setty (Eds.),
          <source>Advances in Information Retrieval</source>
          , Springer International Publishing, Cham,
          <year>2022</year>
          , pp.
          <fpage>552</fpage>
          -
          <lpage>566</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          M. D. Ekstrand, M. Tian, I. M. Azpiazu, J. D. Ekstrand, O. Anuyah, D. McNeill, M. S. Pera,
          <article-title>All the cool kids, how do they fit in?: Popularity and demographic biases in recommender evaluation and effectiveness</article-title>
          , in: Conference on Fairness, Accountability and Transparency, FAT 2018, volume 81, PMLR, 2018
          , pp.
          <fpage>172</fpage>
          -
          <lpage>186</lpage>
          . URL: http://proceedings.mlr.press/v81/ekstrand18b.html.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Tsintzou</surname>
          </string-name>
          , E. Pitoura,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tsaparas</surname>
          </string-name>
          ,
          <article-title>Bias disparity in recommendation systems</article-title>
          , in: R. Burke,
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdollahpouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. C.</given-names>
            <surname>Malthouse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Thai</surname>
          </string-name>
          , Y. Zhang (Eds.),
          <source>Proceedings of the Workshop on Recommendation in Multi-stakeholder Environments co-located with the 13th ACM Conference on Recommender Systems (RecSys</source>
          <year>2019</year>
          ), Copenhagen, Denmark,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>