<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>MuRS</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Hard Negative Sampling for Music Recommendation: Real-World Impacts on Accuracy and Diversity</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>M. Jefrey Mei</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oliver Bembom</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas F. Ehmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SiriusXM Radio Inc., 1221 Avenue of the Americas, New York</institution>
          ,
          <addr-line>New York 10020</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
<p>Using real user negative feedback as hard negative samples is known to improve recommendation accuracy and diversity. We evaluate the benefits of implicit and explicit negative feedback and their impacts on model accuracy and recommendation diversity for Pandora and Spotify datasets, including an online test on Pandora users. We find that both implicit and explicit negative feedback can increase both accuracy and diversity, and confirm this in the online test. Moreover, mixing in some randomly-sampled negatives as well gives the highest diversity, without hurting accuracy. We evaluate the prediction accuracy for users with different proportions of positive and negative feedback. Lastly, we explore possible connections between prediction accuracy and song diversity that may arise from introducing hard negative samples.</p>
      </abstract>
      <kwd-group>
<kwd>music recommendation</kwd>
        <kwd>sequential recommendation</kwd>
        <kwd>negative sampling</kwd>
        <kwd>diversity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Modern music streaming platforms have vast catalogs comprising millions (if not tens or hundreds of
millions) of songs, of which a user is likely only interested in hearing a small proportion. Recommender
systems are crucial to effectively filter and personalize these tracks for each user. These recommender
systems may be powered by user feedback. Although positive feedback (e.g. clicks, views, or a ‘like’
button) is generally easier to collect than negative feedback, where available, negative feedback can be
incorporated into recommender systems to improve their accuracy [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1, 2, 3, 4</xref>
        ]. In the music domain,
negative feedback may be explicit ‘down-thumb’/‘dislike’ negative feedback, or implicit feedback (e.g.
song skips) which may share traits of both positive and negative feedback [
        <xref ref-type="bibr" rid="ref4 ref5">5, 4</xref>
        ].
      </p>
      <p>
        Sequential and temporal information have been found to improve music recommendation accuracy
[e.g. 6, 7, 4]. Negative feedback can also be included as additional inputs to improve recommendation
accuracy [e.g. 3, 1, 4], as well as recommendation diversity [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Poor recommendation diversity may
also enhance already-extant popularity biases in music recommender systems [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>In addition to using negative feedback as inputs, negative feedback can also be used as negative
targets for prediction during training. In cases where no negative feedback exists, often random negative
samples are used: this can lead to model overconfidence as random negatives are often irrelevant and
unrelated to the positive target and are therefore unrealistic [10]. A variety of ways to improve on
random negative sampling have been explored, such as picking the highest-scoring random negative
out of a batch [11], frameworks to avoid false hard negatives [12], customized loss functions [10] and
augmenting negative samples to better contrast against positive samples [13].</p>
      <p>In this paper, we explore how the recommendation accuracy and diversity of a transformer model
are affected by incorporating hard negative samples during training, culminating in an online A/B test.
More specifically, we find that:
• The accuracy and diversity of recommendations generally both increase when using some hard
negatives; however, too many hard negatives cause diversity to drop.
• The higher accuracy generally results from both positive and negative targets being ranked
lower, which also affects the recommendation diversity. Negative targets are lowered by a greater
amount, yielding a gain in overall accuracy.
• The recommendation accuracy is fairly robust to differing proportions of positive/negative
feedback in user data, although there is a slight accuracy increase for users with a higher
proportion of positive feedback.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Data</title>
      <p>We use two datasets for our analysis. One is Spotify’s open-source Sequential Skip Prediction dataset
[14] and the other is a proprietary dataset from Pandora. Both are filtered for ‘radio stations’ only.
Summary statistics are shown in Table 1. The collected feedback types in the two datasets are not
necessarily equivalent, which is discussed later in Sect. 6.1. Positive feedback (‘+’) for Pandora is
explicit up-thumbs given by the user, whereas for Spotify it is song plays (i.e. implicit positive feedback).
The negative (‘− ’) feedback for Pandora includes both explicit down-thumb and implicit skip feedback,
whereas for Spotify it includes only implicit skip feedback.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Model</title>
      <p>
        The model is based on SASRec [15], with more details given in [16] and [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The benefits of using
a transformer model vs. matrix factorization have already been quantified in [16] and are not the
focus of this work. The benefits in accuracy and coverage of including negative feedback as inputs
are also covered in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This work aims to quantify the benefits of using negative feedback as targets
(real hard negative samples) within the same transformer model described above and its impacts on
recommendation accuracy and diversity. We define a hyperparameter phard, which is the proportion of
negative samples each epoch that use real negative feedback instead of a randomly-sampled negative.
Negative samples are re-sampled each epoch, so for 0 &lt; phard &lt; 1, the exact positions for which hard
negative samples are used may differ each epoch.
      </p>
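      <p>As an illustrative sketch (not the authors' implementation), the per-epoch sampling controlled by phard could look like the following, where sample_negatives, user_negatives, and catalog are hypothetical names:</p>

```python
import random

def sample_negatives(positive_targets, user_negatives, catalog, p_hard):
    """Draw one negative sample per positive target for this epoch.

    With probability p_hard, use a real negative-feedback item (a hard
    negative); otherwise fall back to a randomly-sampled catalog item.
    Because samples are re-drawn each epoch, for 0 < p_hard < 1 the
    positions that receive hard negatives differ between epochs.
    """
    negatives = []
    for _ in positive_targets:
        if user_negatives and random.random() < p_hard:
            negatives.append(random.choice(user_negatives))  # hard negative
        else:
            negatives.append(random.choice(catalog))  # random negative
    return negatives
```

      <p>For p_hard = 1 every negative comes from real feedback; for p_hard = 0 the scheme reduces to standard random negative sampling.</p>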
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>The model is evaluated against users who have given both positive and negative feedback in the
subsequent month (‘test period’) after the training time window (∼ 3 years of sessions for Spotify
and ∼ 1 year of user feedback for Pandora). The test periods for the Pandora/Spotify dataset start on
2024-02-01 and 2018-08-15 respectively. For each user who has given both ‘+’ and ‘− ’ feedback, we pick
a random (‘+’,‘− ’) pair (from the same radio station). A successful prediction is when the ‘+’ song is
correctly scored higher than the ‘− ’ song. This test accuracy is equivalent to the averaged per-user area
under the receiver operating characteristic curve (AUC) for all users in the test set [17]. A model that
randomly guesses would have an accuracy of 0.5. We use explicit negative feedback for testing because
if we test against randomly-sampled negatives, the accuracy ≫ 0.99, which is not very useful, nor
realistic, as our model is used to distinguish and rank similar songs which are on the same user-selected
station.</p>
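      <p>The test metric reduces to a simple pairwise comparison; the following is a minimal sketch (pairwise_accuracy and score are hypothetical names, not the authors' code):</p>

```python
def pairwise_accuracy(test_pairs, score):
    """Fraction of (user, '+' song, '-' song) pairs where the model scores
    the '+' song above the '-' song.

    With one random pair per user drawn from the same station, this equals
    the per-user AUC averaged over the test set; a random scorer converges
    to 0.5.
    """
    wins = sum(score(u, pos) > score(u, neg) for u, pos, neg in test_pairs)
    return wins / len(test_pairs)
```

      <p>For example, a scorer that always ranks the positive song higher achieves an accuracy of 1.0 on any such test set.</p>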
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>We find that using hard negatives generally increases the prediction accuracy, though this lift quickly
plateaus after phard ≈ 0.3 (Fig. 1a). This is likely because for any phard &gt; 0, all positive targets for
prediction will eventually be paired with a hard negative (as each target’s negative sample is re-sampled
each epoch with probability phard of being a hard negative). Higher phard simply means that these hard
negatives are used more frequently, which may help the model converge faster (Fig. 1d).</p>
      <p>However, the diversity (measured in terms of the Gini coefficient, where a lower Gini means higher
diversity) seems to have a local minimum around 0.2 &lt; phard &lt; 0.6 (depending on the dataset), and
then increases again for phard ≫ 0. The optimal phard for highest diversity for a given dataset may
depend on dataset statistics such as average sequence length (Table 1) or others. Given the accuracy is
relatively static for phard &gt; 0, we are able to optimize for higher diversity without having to sacrifice
accuracy.</p>
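      <p>For reference, the Gini coefficient over per-song exposure counts can be computed with the standard sorted-values identity; this is an illustrative formula, not necessarily the exact procedure used in the paper:</p>

```python
def gini(counts):
    """Gini coefficient of a list of per-song recommendation counts.

    0 means all songs are recommended equally often (maximum diversity);
    values approaching 1 mean exposure concentrates on a few songs.
    """
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Identity: G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n,
    # with x_i sorted ascending and i starting at 1.
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```

      <p>For example, gini([10, 10, 10, 10]) returns 0.0 (perfectly even exposure), while gini([0, 0, 0, 40]) returns 0.75, the maximum for four songs.</p>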
      <p>Although the accuracy is relatively static for phard &gt; 0 (Fig. 1a), the mean reciprocal ranks (MRR)
are not (Fig. 1c). Both the positive and negative targets are ranked lower as phard increases. The
accuracy gain compared to phard = 0 comes from the negative target generally being ranked/scored
lower comparatively more than the positive target (the test accuracy is loosely related to the difference
between the positive and negative target song ranks).</p>
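      <p>As an illustrative sketch (mean_reciprocal_rank is a hypothetical helper, not the authors' code), the MRR of a set of targets is the average inverse rank of each target within the model's ranked candidate list:</p>

```python
def mean_reciprocal_rank(ranked_lists, targets):
    """Average of 1/rank of each target within its ranked candidate list
    (rank 1 = top). A lower MRR for a target type means the model ranks
    those songs further down the list.
    """
    rrs = [1.0 / (ranked.index(t) + 1) for ranked, t in zip(ranked_lists, targets)]
    return sum(rrs) / len(rrs)
```

      <p>Computing this separately for positive and negative targets shows whether an accuracy gain comes from the negative targets dropping faster than the positive ones, as observed here.</p>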
      <p>Given the relatively flat accuracy beyond phard &gt; 0.3 and the optimal diversity for phard between
0.4 and 0.6, the model chosen for the Pandora online test was the phard = 0.4 model. This model also trains
in 30% fewer epochs, an additional benefit that reduces training costs.</p>
      <sec id="sec-5-1">
        <title>5.1. Online test</title>
        <p>8M users were served either the above model or the control experience over a one-week period. The control
experience is a proprietary baseline model using collaborative filtering without individual negative
user feedback. The tested phard = 0.4 model increased the key business metric (completed plays) by
0.8%. The recommendation accuracy, which is what the model is trained to optimize, increased by
2.4%. There was also an increase in recommendation diversity, as shown in Table 2.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <sec id="sec-6-1">
        <title>6.1. Positive feedback proportion</title>
        <p>As the model is trained to predict future positive feedback, it is possible that the model accuracy
varies depending on the composition of the input user's feedback. For both the Pandora and Spotify
datasets, there is a slight accuracy increase for users with more positive feedback (Fig. 2). This manifests
in slightly different ways: the Pandora dataset shows a relatively unchanged accuracy for phard &gt; 0,
and then a slight drop in accuracy for phard = 0; the Spotify dataset shows a relatively unchanged
accuracy for phard &lt; 1, and then a slight rise in accuracy for phard = 1.</p>
        <p>This may reflect the different feedback types that comprise the data, as the Spotify dataset uses
implicit positive feedback (plays) and implicit negative feedback (skips), whereas the Pandora dataset
contains explicit positive and negative feedback. Users in the Spotify dataset who are almost all-play
(i.e. virtually no skips) have scant negative feedback to use as negative targets, and may generally not
want to (or not know how to) use the ‘skip’ button, so using their subsequent plays as the positive
sample may overestimate their true preference for that song.</p>
        <p>
          We note that the bimodal distribution of positive feedback proportions in the Spotify dataset (Fig. 2)
aligns with prior research clustering user types into ‘mostly-play’ and ‘mostly-skip’ [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. In contrast, the
Pandora dataset (with explicit feedback only) shows that almost all users use some skips, and that most
users’ feedback is dominated by skips (in line with Table 1). Using implicit feedback like skips and/or
plays allows most users to be covered, though there may be some contexts in which users may not
give skips as readily (such as while driving, or when casting music at a party). In such contexts the
recommendation accuracy may be expected to be lower while also being hard to quantify due to the
lack of feedback.
        </p>
        <p>Overall, for both datasets, the accuracy with respect to positive feedback proportion is fairly robust,
which is promising for real-world deployment of this model without fear of hurting the experience
for certain users. Further work should be done to better understand different user segments and to
distinguish the differences between explicit and implicit positive feedback.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. phard and positive/negative song ranks</title>
        <p>Both positive and negative targets are scored/ranked lower as phard increases (Fig. 1c). This may be
connected to the increase in diversity, as popular songs generally attract more feedback (both positive
and negative) and are therefore more likely to occur as negative samples for nonzero phard. This would
in turn lead the model to give these popular songs lower scores/ranks and thus recommend other, less
popular songs. The model thus learns to recommend less popular songs in a personalized way for each
user, leading to an increase in both diversity and accuracy (Table 2).</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>Incorporating real user feedback as negative targets allows for higher prediction accuracy as well
as increased song diversity. This was evaluated via an offline simulation and then confirmed in an
online A/B test. The model, which is trained on users with varying proportions of positive and
negative feedback, is also able to cover users with only positive feedback, only negative feedback, and
everything in between, with fairly consistent accuracy. The model diversity is particularly sensitive to
the proportion of hard negatives used as negative targets, and more research should follow on how this
optimal proportion may depend on dataset attributes.</p>
      <p>[10] A. V. Petrov, C. Macdonald, gSASRec: Reducing Overconfidence in Sequential Recommendation Trained with Negative Sampling, in: Proceedings of the 17th ACM Conference on Recommender Systems, RecSys ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 116–128. URL: https://doi.org/10.1145/3604915.3608783. doi:10.1145/3604915.3608783.
[11] T. Wilm, P. Normann, S. Baumeister, P.-V. Kobow, Scaling session-based transformer recommendations using optimized negative sampling and loss functions, in: Proceedings of the 17th ACM Conference on Recommender Systems, RecSys ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 1023–1026. URL: https://doi.org/10.1145/3604915.3610236. doi:10.1145/3604915.3610236.
[12] H. Ma, R. Xie, L. Meng, X. Chen, X. Zhang, L. Lin, J. Zhou, Exploring False Hard Negative Sample in Cross-Domain Recommendation, in: Proceedings of the 17th ACM Conference on Recommender Systems, RecSys ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 502–514. URL: https://doi.org/10.1145/3604915.3608791. doi:10.1145/3604915.3608791.
[13] Y. Zhao, R. Chen, R. Lai, Q. Han, H. Song, L. Chen, Augmented Negative Sampling for Collaborative Filtering, in: Proceedings of the 17th ACM Conference on Recommender Systems, RecSys ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 256–266. URL: https://doi.org/10.1145/3604915.3608811. doi:10.1145/3604915.3608811.
[14] B. Brost, R. Mehrotra, T. Jehan, The Music Streaming Sessions Dataset, in: Proceedings of the 2019 Web Conference, ACM, 2019.
[15] W.-C. Kang, J. McAuley, Self-attentive sequential recommendation, in: 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 2018, pp. 197–206. doi:10.1109/ICDM.2018.00035.
[16] M. J. Mei, O. Bembom, A. Ehmann, Station and Track Attribute-Aware Music Personalization, in: Proceedings of the 17th ACM Conference on Recommender Systems, RecSys ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 1031–1035. URL: https://doi.org/10.1145/3604915.3610239. doi:10.1145/3604915.3610239.
[17] J. A. Hanley, B. J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology 143 (1982) 29–36.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Seshadri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Knees</surname>
          </string-name>
          ,
          <article-title>Leveraging Negative Signals with Self-Attention for Sequential Music Recommendation</article-title>
          ,
          <source>arXiv preprint arXiv:2309.11623</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Halpern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. Y.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-C.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beutel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <article-title>Learning from Negative User Feedback and Measuring Responsiveness for Sequential Recommenders</article-title>
          ,
          <source>in: Proceedings of the 17th ACM Conference on Recommender Systems</source>
          , RecSys '23,
          Association for Computing Machinery, New York, NY, USA,
          <year>2023</year>
          , pp.
          <fpage>1049</fpage>
          –
          <lpage>1053</lpage>
          . URL: https://doi.org/10.1145/3604915.3610244. doi:10.1145/3604915.3610244.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>SIGformer: Sign-aware Graph Transformer for Recommendation</article-title>
          ,
          <source>in: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '24,
          Association for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          , pp.
          <fpage>1274</fpage>
          –
          <lpage>1284</lpage>
          . URL: https://doi.org/10.1145/3626772.3657747. doi:10.1145/3626772.3657747.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Bembom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ehmann</surname>
          </string-name>
          ,
          <article-title>Negative Feedback for Music Recommendation</article-title>
          ,
          <source>in: Proceedings of the 32nd ACM Conference on User Modeling, Adaptation and Personalization</source>
          , UMAP '24, Association for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Meggetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Revie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Moshfeghi</surname>
          </string-name>
          ,
          <article-title>On Skipping Behaviour Types in Music Streaming Sessions</article-title>
          ,
          <source>in: Proceedings of the 30th ACM International Conference on Information &amp; Knowledge Management, CIKM '21</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , pp.
          <fpage>3333</fpage>
          –
          <lpage>3337</lpage>
          . URL: https://doi.org/10.1145/3459637.3482123. doi:10.1145/3459637.3482123.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B. L.</given-names>
            <surname>Pereira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ueda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Penha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. L. T.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ziviani</surname>
          </string-name>
          ,
          <article-title>Online learning to rank for sequential music recommendation</article-title>
          ,
          <source>in: Proceedings of the 13th ACM Conference on Recommender Systems</source>
          , RecSys '19,
          Association for Computing Machinery, New York, NY, USA,
          <year>2019</year>
          , pp.
          <fpage>237</fpage>
          –
          <lpage>245</lpage>
          . URL: https://doi.org/10.1145/3298689.3347019. doi:10.1145/3298689.3347019.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Maystre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Brost</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tomasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lalmas</surname>
          </string-name>
          ,
          <article-title>Contextual and Sequential User Embeddings for Large-Scale Music Recommendation</article-title>
          ,
          <source>in: Proceedings of the 14th ACM Conference on Recommender Systems</source>
          , RecSys '20,
          Association for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , pp.
          <fpage>53</fpage>
          –
          <lpage>62</lpage>
          . URL: https://doi.org/10.1145/3383313.3412248. doi:10.1145/3383313.3412248.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Mena-Maldonado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cañamares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sanderson</surname>
          </string-name>
          ,
          <article-title>Agreement and Disagreement between True and False-Positive Metrics in Recommender Systems Evaluation</article-title>
          , in:
          <source>Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '20, Association for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , pp.
          <fpage>841</fpage>
          –
          <lpage>850</lpage>
          . URL: https://doi.org/10.1145/3397271.3401096. doi:10.1145/3397271.3401096.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mansoury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Abdollahpouri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pechenizkiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          ,
          <article-title>Feedback Loop and Bias Amplification in Recommender Systems</article-title>
          , in:
          <source>Proceedings of the 29th ACM International Conference on Information &amp; Knowledge Management</source>
          , CIKM '20, Association for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , pp.
          <fpage>2145</fpage>
          –
          <lpage>2148</lpage>
          . URL: https://doi.org/10.1145/3340531.3412152. doi:10.1145/3340531.3412152.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>