<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring Artist Gender Bias in Music Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dougal Shakespeare</string-name>
          <email>dougalian.shakespeare01@estudiant.upf.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenzo Porcaro</string-name>
          <email>lorenzo.porcaro@upf.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emilia Gómez</string-name>
          <email>emilia.gomez@upf.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Castillo</string-name>
          <email>carlos.castillo@upf.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Joint Research Centre, European Commission</institution>
          ,
          <addr-line>Seville</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Music Technology Group, Universitat Pompeu Fabra</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Web Science and Social Computing Group, Universitat Pompeu Fabra</institution>
          ,
          <addr-line>Barcelona</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Music Recommender Systems (mRS) are designed to give personalised and meaningful recommendations of items (i.e. songs, playlists or artists) to a user base, thereby reflecting and further complementing individual users' specific music preferences. Whilst accuracy metrics have been widely applied to evaluate recommendations in mRS literature, evaluating a user's item utility from other impact-oriented perspectives, including their potential for discrimination, is still a novel evaluation practice in the music domain. In this work, we center our attention on a specific phenomenon for which we want to estimate whether mRS may exacerbate its impact: gender bias. Our work presents an exploratory study, analyzing the extent to which commonly deployed state-of-the-art Collaborative Filtering (CF) algorithms may act to further increase or decrease artist gender bias. To assess group biases introduced by CF, we deploy a recently proposed metric of bias disparity on two listening event datasets: the LFM-1b dataset and the earlier Celma dataset. Our work traces the causes of disparity to variations in input gender distributions and user-item preferences, highlighting the effect such configurations can have on users' gender bias after recommendation generation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
      <p>• Social and professional topics → Socio-technical systems; Gender; • Information systems → Collaborative filtering; Recommender systems.</p>
      <p>KEYWORDS: gender bias, bias disparity, music recommendation</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        Impact-oriented Recommender System (RS) research is gaining
attention as a novel paradigm for understanding not only how
users interact with recommendations, but also for shedding light
on how these interactions can influence users’ behaviours in the
short- and the long-term [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. An outstanding issue when
studying the possible impact of RS is the heterogeneity of evaluation
procedures described in the literature. Evaluating recommender
systems is a non-trivial task because of the multiple facets that
a good recommendation can have, and the multiple players
influencing these aspects [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Even if the need for going beyond the
evaluation in terms of accuracy metrics has been well-recognized by
the RS community [
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], shared practices for evaluating the impact
of recommendations are still missing.
      </p>
      <p>
        Notwithstanding, recent years have seen a rise in awareness in
the scientific community about the implications of socio-technical
systems’ design and implementation responsible for reinforcing bias
and discrimination [
        <xref ref-type="bibr" rid="ref4 ref42">4, 42</xref>
        ]. Music Information Retrieval (MIR)
research is still in its early-stage with regards to the analysis of the
ethical dimensions and impact of music technology [
        <xref ref-type="bibr" rid="ref19 ref22 ref37 ref39">19, 22, 37, 39</xref>
        ],
and several challenges still need to be tackled when approaching
MIR research from a socio-technical perspective. A common issue
is the availability of data, often limited in terms of size, user
information or musical information, and as in many other fields, a
chronic shortage of gender-disaggregated data [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]. The difficulties
we faced in retrieving the artists’ gender are just one example
of this limitation, as presented in Sections 3 and 4.
      </p>
      <p>
        We center our attention on a specific phenomenon that
recommender systems may exacerbate: gender bias. In its broader sense,
gender discrimination is a disadvantage for a group of people based
on their gender. Far from being an emerging problem, gender
discrimination has its roots in cultural practices historically related
with socio-political power differentials [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Nonetheless, the
modern-day prevalence of gender discrimination is not to be understated:
recent reports find disproportionate treatment of female artists
to be prevalent in the Western music industry to this day1. Whilst
the cause of such treatment is multifaceted, our work traces the
influence of one factor evidenced to be present in the works of
Millar [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]: that is, the pre-existing gender bias of a music listener.
      </p>
      <p>
        In this exploratory study, we assess the extent to which
Collaborative Filtering (CF) algorithms commonly deployed in mRS may
exacerbate pre-existing users’ gender biases, thereby affecting
artist gender groups’ exposure and proportional representation. We focus
on the measurement of bias disparity in recommender systems,
defined as "[...] the case where the recommender system introduces bias
in the data, by amplifying existing biases and reinforcing stereotypes."
[
        <xref ref-type="bibr" rid="ref41">41</xref>
        ]. Building on existing literature [
        <xref ref-type="bibr" rid="ref29 ref31 ref41 ref43">29, 31, 41, 43</xref>
        ], we first
reproduce the study presented by Lin et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], in which preference bias
amplification in collaborative recommendation is analyzed using
the MovieLens dataset[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], a dataset of user activity with a movie
recommendation system. In our work, we focus on the music
domain making use of two Last.fm2 listening event datasets publicly
available: 1) Celma’s LFM-360k dataset [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]; 2) Schedl’s LFM-1b
dataset [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. Our goal is twofold: on one hand, reproducing and
verifying whether previous results [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] hold across different datasets.
On the other hand, we aim to highlight which aspects specific
to the music domain can be extracted from this analysis, connecting
with existing literature on gender bias in music preferences [
        <xref ref-type="bibr" rid="ref3 ref33">3, 33</xref>
        ].
1http://assets.uscannenberg.org/docs/aii-inclusion-recording-studio-2019.pdf
2https://www.last.fm
      </p>
      <p>The paper is structured as follows. Section 2 provides an overview
of previous works related to bias in Information Technology,
focusing on gender bias, but also how this bias has been approached
in music-related fields. We then introduce the considered datasets,
LFM-1b and LFM-360K respectively in Section 3 and 4. In Section 5,
the recommendation models used and the experimental settings are
presented, followed by Section 6 which details the results obtained.
Lastly, in Section 7, conclusions and future work are discussed.</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
      <p>
        The notion of bias has been extensively explored in the Information
Retrieval domain [
        <xref ref-type="bibr" rid="ref11 ref24 ref4 ref5 ref7">4, 5, 7, 11, 24</xref>
        ]. Typically, metrics aim to capture
relative bias (i.e. bias pre-existing in data, for example in user
listening histories in LFM-1b), and algorithmic bias (i.e. how filtering
algorithms can result in unfair item and user treatment) to measure
disproportionate and unfair treatment of a protected group.
      </p>
      <p>
        One of the most well-studied biases in RS literature is popularity
bias, with the music domain being no exception to this phenomenon
[
        <xref ref-type="bibr" rid="ref10 ref28 ref6">6, 10, 28</xref>
        ]. This describes the scenario in which a few popular items
are recommended frequently, while the majority of items in the
long-tail do not get proportional attention. Highlighted in literature
as a prominent issue for CF algorithms [
        <xref ref-type="bibr" rid="ref1 ref10 ref34">1, 10, 34</xref>
        ], Kowald et al.
in [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] find that from a user’s perspective the groups who do not
favor popular items may receive worsened recommendations in
terms of accuracy and calibration. Moreover, Ferraro et al. in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
study the effect of musical styles with respect to popularity bias,
showing that CF approaches increase users’ exposure to popular
musical styles.
      </p>
      <p>
        Bias Disparity is a metric deployed to assess bias propagation
across user and item groups, measuring the deviation of the
recommender output from the input preference, as detailed in Section
5.1. A first application to the RS domain was described by Tsintzou
et al. [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ], but the metric has recently gained more traction in its
application to diferent domains. In Lin et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], bias disparity is
applied to measure the extent to which state-of-the-art CF algorithms
can exacerbate pre-existing biases in the MovieLens dataset. Their
findings show significant differences in bias propagation across
memory- and model-based CF algorithms.
      </p>
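      <p>The bias disparity metric referenced above, formulated in detail in Section 5.1, can be illustrated with a short sketch; the function names and toy values below are ours, not taken from the cited works:</p>
```python
def preference_ratio(interactions, category):
    """Fraction of a group's interactions falling in the given item
    category (e.g. artists of one gender)."""
    if not interactions:
        return 0.0
    hits = sum(1 for _, c in interactions if c == category)
    return hits / len(interactions)

def bias_disparity(pr_input, pr_output):
    """Relative deviation of the output (recommendation) preference
    ratio from the input preference ratio:
    BD = (PR_out - PR_in) / PR_in."""
    return (pr_output - pr_input) / pr_input

# Toy example: a user group listens 80% to male artists but receives
# 90% male-artist recommendations, giving a positive disparity.
listening = [("a1", "male")] * 8 + [("a2", "female")] * 2
recs = [("a1", "male")] * 9 + [("a3", "female")]
pr_in = preference_ratio(listening, "male")
pr_out = preference_ratio(recs, "male")
bd = bias_disparity(pr_in, pr_out)  # (0.9 - 0.8) / 0.8 = 0.125
```
      <p>A positive value indicates that the recommender amplifies the group's input preference; a negative value indicates that it dampens it.</p>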
      <p>
        Gender treatment and issues of proportional treatment in RS
have been considered in a range of literature, for which we highlight
some examples. Ekstrand et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] examined gender distribution
of item recommendations in the book RS domain. Results show that
commonly deployed CF models differ in the gender distributions of
generated item recommendation lists: neighbour-based
approaches proportionally reflect the user-item
preferences in users’ reading histories, whereas model-based matrix
factorisation favors books by male authors.
Furthermore, Ekstrand et al. in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] study the effect of recommendation
algorithms on the utility for users of different gender groups,
finding differences in effectiveness across gender groups. Such work
highlights that the effect on utility does not exclusively benefit large
groups, implying that there may be other underlying latent factors
that influence recommendation accuracy. To address such issues
of disproportionate gender treatment in recommendations, Edizel
et al. in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] have recently proposed a novel means of mitigating
the derivation of sensitive features (such as gender) in the latent
space, using fairness constraints based on the predictability of such
features. A similar approach proposing fairness-aware tensor-based
recommendation is also presented by Zhu et al. in [
        <xref ref-type="bibr" rid="ref44">44</xref>
        ].
      </p>
      <p>
        In the music domain, Aguiar et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] propose a methodology
to assess the extent to which artists ranked in Spotify playlists
are affected by gender after accounting for plausible determinants
of inclusion on playlists such as country, song characteristics (e.g.
bpm, key signature), and past streaming success. The authors find
that there is some evidence consistent with the presence of bias
(both for and against female artists); however, they do not draw
a relation between this and the disproportionately low
streaming share of female artists on the platform. In the work by
Anglada-Tort et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], through the analysis of UK top 5 music charts
between the years 1960-1995, the authors show how popular music is
affected by large gender inequality, revealing an
existing bias in listening preferences towards male artists.
Similarly, Millar in [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ], surveying a population of Australian young
adults, shows how music preferences are affected by gender bias,
evidencing differences between male and female listeners. In contrast,
in our work we apply an auditing strategy for bias propagation
showing under which conditions input preferences are reflected
in RS output, inferring music preferences from the users’ listening
history grouped with respect to the artists’ gender.
      </p>
    </sec>
    <sec id="sec-4">
      <title>THE LFM-1B DATASET</title>
      <p>
        The LFM-1b dataset consists of more than one billion listening
events created by over 120,000 users of the music streaming
platform Last.fm [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. In our analysis, we consider user-artist
playcounts formed by aggregating user-song listening events by
common artists. We then scale logarithmically the number of listens,
as done in [
        <xref ref-type="bibr" rid="ref13 ref26">13, 26</xref>
        ]. We work with a filtered version of the dataset
in which: a) we remove users who listened to less than 10 unique
artists, and artists listened to by less than 10 users; b) we discard
users whose listening history contains more than 25% of artists
with unknown gender, to mitigate the impact of artists with missing
gender in the dataset.
      </p>
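      <p>The aggregation, filtering, and scaling steps above might be sketched as follows (a minimal illustration; the helper names and the use of log1p for the logarithmic scaling are our assumptions, not the authors' exact code):</p>
```python
from collections import defaultdict
import math

def filter_playcounts(events, min_artists=10, min_users=10,
                      max_unknown=0.25, artist_gender=None):
    """Aggregate (user, artist, playcount) events into user-artist
    playcounts, apply filters a) and b) above, and log-scale counts.
    artist_gender maps artist to gender; missing entries are unknown."""
    artist_gender = artist_gender or {}
    by_user, listeners = defaultdict(dict), defaultdict(set)
    for user, artist, count in events:
        by_user[user][artist] = by_user[user].get(artist, 0) + count
        listeners[artist].add(user)
    # a) keep artists listened to by at least `min_users` users
    kept = {a for a, us in listeners.items() if len(us) >= min_users}
    filtered = {}
    for user, artists in by_user.items():
        arts = {a: c for a, c in artists.items() if a in kept}
        # a) ... and users who listened to at least `min_artists` artists
        if len(arts) >= min_artists:
            # b) discard users whose history has over 25% unknown gender
            unknown = sum(1 for a in arts if a not in artist_gender)
            if max_unknown >= unknown / len(arts):
                filtered[user] = {a: math.log1p(c)
                                  for a, c in arts.items()}
    return filtered
```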
      <p>User gender is represented in the dataset with three categories:
male, female and N/A. We choose to focus only on users with
self-declared gender, working with two final categories of user gender:
male and female. As shown in Table 1, distributions are highly
imbalanced towards men – 72% of the users are men.</p>
      <p>
        Artist gender is not represented in the LFM-1b dataset,
consequently we retrieve this information from the open music
encyclopedia MusicBrainz3 (MB) [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ]. Code repositories implementing
the following approach are made openly available4, alongside the
results of the data wrangling5, to support reproducibility.
      </p>
      <p>We identify five discrete categories of gender defined in the MB
database: male, female, other, N/A and undef. Artists of gender N/A and
undef are those for which gender is not applicable and not identifiable,
respectively. For bands, we compute the gender counts of all members and
assign an overall classification based on whichever count holds the
majority. Artists with gender ties (e.g. a band consisting of 2 males
and 2 females) are discarded from our final analysis, as gender is in
this instance deemed ambiguous. After applying this methodology, we are
able to identify 27% of artists with a known gender. Distributions are
observed to be highly imbalanced, such that artists of male gender
constitute the majority (82%) of artists for which gender can be
identified, as shown in Table 1.
3https://musicbrainz.org/
4https://github.com/dshakes90/LFM-1b-MusicBrainz-Gender-Wrangler
5https://zenodo.org/record/3964506#.XyE5N0FKg5n</p>
      <p>[Table 1: user and artist gender distributions in the filtered
datasets (male/female columns for LFM-1b and LFM-360k); recoverable
LFM-1b male values: Users 31.4K, 71.67%; Artists 127K, 82.30%; Top-head
25.7K, 85.21%; Long-tail 100K, 81.87%]</p>
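      <p>The majority-vote rule for bands described above can be sketched as (the function name is ours):</p>
```python
from collections import Counter

def band_gender(member_genders):
    """Classify a band by the majority gender of its members; gender
    ties (and line-ups with no usable gender) yield None, and the
    band is discarded from the analysis."""
    counts = Counter(g for g in member_genders if g in ("male", "female"))
    if not counts:
        return None
    (top, n), *rest = counts.most_common(2)
    if rest and rest[0][1] == n:  # tie, e.g. 2 males and 2 females
        return None
    return top
```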
      <p>In our final analysis, we further filter artists not identified as
male or female according to the procedure described above. Artists
of gender other are discarded as we deem such data to be too sparse
to be informative in the analysis of users’ listening preferences. We
note this group merits further future evaluation, perhaps relying
on qualitative methods, and limitations of this binary approach
are discussed in Section 7. Table 2 presents the top 5 artists based
on the total sum of play counts in the filtered LFM-1b dataset. We
observe a trend in male artists’ popularity, with top male artists having approximately
twice as many play counts as top-rated female artists/bands. We
also observe a trend for the top male artists on the platform to be
more commonly composed of bands in comparison to the top-rated
female artists.
</p>
    </sec>
    <sec id="sec-5">
      <title>THE LFM-360K DATASET</title>
      <p>
        The LFM-360k dataset [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] consists of approximately 360,000 users’
listening histories from Last.fm collected during Fall 2008,
presenting a snapshot of listening activity for an earlier period in
comparison to the LFM-1b dataset. With respect to user gender distributions,
the proportion of users with a self-declared gender rises to 91%,
whereas, as in the LFM-1b dataset, artist gender is not defined.
To resolve this, we implement the same pre-processing
methodology with the MB database as described for the LFM-1b dataset.
After further applying the filtering criteria previously detailed, we
are able to identify 31% of artists with a known gender, a proportion
notably higher than that of what we were able to identify for the
LFM-1b dataset. As presented in Table 1, artist gender distributions
in the filtered dataset are once again highly imbalanced towards
artists classified as men. For users with identified gender, we again
observe a high imbalance towards male users (75%) comparable to
rates observed in the LFM-1b dataset. When comparing the two
datasets, we observe several additional differences and similarities
which may impact the propagation of gender bias in artist
recommendations. First, the number of users is significantly larger than
that of the LFM-1b, whilst the number of artists is much smaller.
Second, sparsity is higher in the LFM-360k dataset in comparison
to the LFM-1b. Third, with regard to the top 5 artists of male and
female gender in the dataset we observe significantly higher
playcounts for artists classified as male in comparison to the LFM-1b
dataset, as shown in Table 1. With regard to similarities across
the two datasets, we observe that top 5 popular male artists are
more commonly bands in comparison to the top 5 female artists.
In addition, we observe that the long-tail of both datasets contains a
significantly higher proportion of female artists in comparison
to the top-head, reinforcing the conclusion that female artists are
significantly more likely to be less popular on the Last.fm platform
and hence more likely to be under-recommended as a result of this
popularity bias.
      </p>
      <p>In our analysis, we generate a set of r ranked items, Ru which
have the highest predicted ratings for a given user u, limiting the
value of r to 5.</p>
      <p>
        Accuracy and beyond-accuracy metrics. To evaluate the RS
performance, we additionally deploy two accuracy metrics,
Precision and nDCG, and three beyond-accuracy metrics: coverage, spread
and long-tail percentage. We refer to the metrics formulation as
detailed in the work by Noia et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Precision (p@n) captures the
proportion of relevant items in top-N recommendations, such that
relevance is a binary function that represents the relevance of item
i for a user u. In our work, we consider relevant a recommendation
which is greater or equal to the average scaled listening count for
a user, after discarding outliers in the data computed using the
interquartile range. Although p@n is useful for analysing generated
item recommendations, it does not capture accuracy aspects
relating to the rank of a recommendation. Hence, in our work we also
deploy the metric nDCG, a rank sensitive metric used to evaluate
the accuracy of a RS. With respect to metrics beyond accuracy, we
utilise both spread and coverage to capture a recommender
system’s ability to recommend a broad range of unique items. Such
approaches are important to consider in our work to potentially
reason and explain bias propagation across artist genders. The
metric long-tail percentage is used to capture the proportion of item
recommendations which exist in the long tail. In our work, we
define the long tail as the 80% least popular items in the system. We
use the metric to capture a filtering algorithm’s propensity to exhibit
popularity bias.
      </p>
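      <p>The relevance criterion and two of the metrics above can be sketched as follows (a simplified illustration; the exact quantile and outlier-removal conventions used in our experiments may differ):</p>
```python
import statistics

def relevance_threshold(ratings):
    """A user's relevance cut-off: the mean scaled listening count
    after discarding outliers via the interquartile range."""
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    kept = [r for r in ratings if hi >= r >= lo]
    return statistics.mean(kept)

def precision_at_n(recommended, user_ratings, threshold):
    """p@n: the share of recommended items whose rating for the user
    meets the relevance threshold."""
    hits = sum(1 for i in recommended
               if user_ratings.get(i, 0) >= threshold)
    return hits / len(recommended)

def long_tail_pct(recommended, long_tail_items):
    """Share of recommendations drawn from the 80% least popular
    items in the system."""
    inside = sum(1 for i in recommended if i in long_tail_items)
    return inside / len(recommended)
```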
    </sec>
    <sec id="sec-6">
      <title>Recommendation Algorithms</title>
      <p>
        We test several commonly deployed memory- and model-based CF
algorithms, following a similar approach to previous work [
        <xref ref-type="bibr" rid="ref28 ref29">28, 29</xref>
        ].
Using Surprise [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], a Python library for recommender systems,
we formulate our music recommendations as a rating prediction
problem where we predict the preference of a target user u for a
target artist a. We then evaluate the RS by recommending the top-5 artists
with the highest predicted preferences.
      </p>
      <p>
        We consider two types of CF algorithms: (1) KNN-based
approach: UserKNNAvg [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], and (2) factorisation-based approach:
Non-Negative Matrix Factorization (NMF ) [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]. Hyperparameters
of UserKNNAvg and NMF are tuned to give the best performance
we can achieve with respect to the rank aware metric, nDCG. In
addition, we consider the MostPopular and UserItemAvg algorithms,
which respectively recommend the most popular and highest-rated
artists; we use these two algorithms as baselines for comparison.
      </p>
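      <p>As an illustration of the neighbour-based prediction rule underlying UserKNNAvg, a pure-Python sketch follows (not the Surprise implementation; the use of cosine similarity and the helper names are our assumptions):</p>
```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    num = sum(u[a] * v[a] for a in set(u).intersection(v))
    den = math.sqrt(sum(x * x for x in u.values()))
    den *= math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def predict_knn_avg(target, all_ratings, artist, k=40):
    """KNN-with-means style prediction: the target user's mean rating
    plus the similarity-weighted rating deviations of the k most
    similar users who rated the artist."""
    mine = all_ratings[target]
    mu = sum(mine.values()) / len(mine)
    neigh = [(cosine(mine, r), r) for u, r in all_ratings.items()
             if u != target and artist in r]
    neigh.sort(key=lambda t: t[0], reverse=True)
    num = den = 0.0
    for sim, r in neigh[:k]:
        num += sim * (r[artist] - sum(r.values()) / len(r))
        den += abs(sim)
    return mu + (num / den if den else 0.0)
```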
      <p>
        A variation of the leave-l-out evaluation detailed in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is
performed whereby we translate the approach to evaluate a top-n RS.
Drawing influence from the methodology of Said et al. [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] we
define 3 parameters: (1) n, the size of the recommendation list
generated, (2) N , the number of items selected for each user to appear in
the test set. N is constrained to be &gt; n to allow for variance in item
recommendations across tested algorithms. (3) M, the minimum
number of unique artists listened to by a user. M is constrained to
be &gt; N to ensure a non-empty test set is able to be formed for each
user. We construct three folds, randomly selecting for each user,
N items in their listening history to belong to the fold’s test set
and then subsequently removing these listening events from the
fold’s training set. For each of the algorithms tested, we compute all
evaluation metrics and preference ratios over each fold and then
subsequently report average performance. In our work we set N
= 10, M = 20 and n = 5, thereby generating top-5 recommendation
lists. We consider a user’s test set of size N as the sample space for
recommendations to be formed.
      </p>
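      <p>The fold construction described above can be sketched as follows (a minimal illustration with our own helper names):</p>
```python
import random

def make_folds(histories, n_folds=3, N=10, M=20, seed=0):
    """For each user with at least M unique artists, hold out N
    randomly chosen listening events as the fold's test set and keep
    the rest for training; M is greater than N, so the test set is
    always non-empty."""
    rng = random.Random(seed)
    folds = []
    for _ in range(n_folds):
        train, test = {}, {}
        for user, artists in histories.items():
            if len(artists) >= M:
                held = set(rng.sample(sorted(artists), N))
                test[user] = {a: artists[a] for a in held}
                train[user] = {a: c for a, c in artists.items()
                               if a not in held}
        folds.append((train, test))
    return folds
```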
    </sec>
    <sec id="sec-7">
      <title>Experimental Design</title>
      <p>We set up two experimental designs to evaluate variations in gender
bias disparity across recommended artists and user groups for the
two datasets. For all experiments detailed, code repositories are
made openly available6. Experiment 1 is a real-world scenario in
which male and female gender distributions are representative of
those in both datasets. Experiment 2 is an extreme scenario in which
all users have high levels of preference ratio, representing extreme
listening preferences towards artists of a specific gender.</p>
      <p>Experiment 1. We generate recommendations for a sample of
all users for which gender can be identified. In the LFM-1b dataset,
we limit this sample to a randomly chosen 30% of all
male and female users in the whole dataset (approx. 12,000 users),
due to computational constraints. The size of the user sample for
the LFM-360k dataset was also constrained to be approximately the
same size as samples for the LFM-1b dataset. User and artist gender
distributions in both samples are representative of overall gender
distributions in the entirety of both datasets. We therefore use this
experiment to consider the case of gender bias propagation under
a real world scenario, assessing the extent to which gender bias
disparity may difer across datasets.</p>
      <p>Experiment 2. We generate recommendations only for a
sample of male and female users which have high preference ratios in
the dataset, thereby simulating an extreme scenario under which
all users are highly biased towards one artist gender group in their
listening preferences. For the LFM-1b dataset, we select the top
30% of both male and female user groups with the highest
maximum input preference ratios, maintaining both the proportions
of male and female users in the datasets, and the sample size of
experiment 1. For the LFM-360k dataset, we sample users from both
male and female user groups maintaining the distribution of male
and female users in the original dataset. The final user sample has
approximately the same sample size as that of the LFM-1b user
sample.</p>
      <p>Figure 1 represents the distributions of users’ input preference
ratio towards male and female artist groups. For both datasets
considered in this study, it shows that only around 20% of users have a
preference ratio towards male artists lower than 0.8. On the
contrary, 80% of users have a preference ratio lower than 0.2 towards
female artists. Due to the disproportionate number of users with
extreme preferences for male artists across both datasets, the random
sampling methodology proposed above does little to assess extreme
preferences towards female artists, resulting in a situation very similar
to experiment 1. To resolve this, we further limit our sample space
to only users who have extreme preference for female artists, with
input preference ratio towards female artists &gt; 0.6. This results in
a sample size reduction to 100 users for the LFM-1b dataset, and
400 users for the LFM-360k dataset. Although reduced in size in
comparison to experiment 1, we believe such experimental designs
to be fundamental for measuring the extent to which the treatment
of users with extreme preferences differs across artist genders.
6https://github.com/dshakes90/Last-fm-Gender-Bias-Analysis
Experiment 2 represents a situation opposite to the one proposed in
experiment 1, through which we can assess whether bias propagation is
not tied to a specific gender per se, but is rather a result of pre-existing
bias.
</p>
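      <p>The input preference ratios and the extreme-preference sampling criterion used in experiment 2 can be sketched as (function names and the per-playcount weighting are ours):</p>
```python
def gender_preference_ratios(history, artist_gender):
    """Share of a user's listening events going to artists of each
    gender; history maps artist to playcount."""
    total = sum(history.values())
    ratios = {"male": 0.0, "female": 0.0}
    for artist, count in history.items():
        g = artist_gender.get(artist)
        if g in ratios:
            ratios[g] += count / total
    return ratios

def extreme_female_sample(histories, artist_gender, threshold=0.6):
    """Users whose input preference ratio towards female artists
    exceeds the threshold (0.6 in experiment 2)."""
    return [u for u, h in histories.items()
            if gender_preference_ratios(h, artist_gender)["female"]
            > threshold]
```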
    </sec>
    <sec id="sec-8">
      <title>RESULTS</title>
    </sec>
    <sec id="sec-9">
      <title>Experiment 1 - Whole population</title>
      <p>
        We report in Figure 2 preference ratio, and in Figure 3 bias disparity
results obtained with the LFM-1b dataset. Figure 4 and Figure 5
present preference ratio and bias disparity results respectively for
the LFM-360K dataset. The dotted lines in Figure 2 and Figure 4
represent input preference ratios whereas the plot’s bars display
output preference ratios computed from generated
recommendation lists. With regard to pre-existing bias, users in both datasets
display high and low input preference ratios for male and female
artists respectively, in line with the findings of Millar [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ].
In addition, for both artist genders, input preference ratios can be
seen to be higher for users who share the same gender as the artist.
With regard to bias propagation after recommendation, all
recommendation models tested result in a positive bias disparity for male
artists, for which there is minimal variance in treatment across user
genders. The popularity-based algorithm results in the highest
levels of bias disparity for both male and female users, whilst the NMF
and UserKNNAvg algorithms tested result in the lowest absolute
levels of bias disparity with marginal diference in bias propagation
across the two algorithms. What is more, our findings show male
users to be more affected by bias propagation in the LFM-1b dataset,
whilst for LFM-360K we observe bias propagation to be greater for
female users, in line with the findings of Lin et al. [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. With
regard to bias disparity for female artists, negative levels are
observed for all algorithms tested. The MostPopular algorithm results
in the lowest levels of bias disparity due to female artists having
significantly lower popularity for both datasets tested, as shown in
Table 1. We observe bias propagation to be greater for
recommendations generated using the LFM-1b dataset, reflected in the lower
long-tail percentage attained. This suggests that users in the LFM-1b
dataset may be more subject to a popularity bias in comparison to
LFM-360k which may translate to increased levels of gender bias
disparity due to female artists proportionally residing less in the
top-head. Together, our findings suggest that differences in bias
propagation across the two datasets may be traced to pre-existing
bias entering the system in the form of listening events.
      </p>
    </sec>
    <sec id="sec-10">
      <title>Experiment 2 - Extreme preferences</title>
      <p>Considering users with extreme preferences for female artists, we
observe the inverse scenario of experiment 1, such that bias
disparity is positive for female artists and negative towards male artists,
as shown in Figure 3 and Figure 5. For both datasets, we comment
that one cause of such disparity is a dramatic imbalance in users’
listening preference, which then subsequently propagates through
to other users’ recommendations. Our findings show that such bias
propagation is not reserved for male artists on the platform and can,
under extreme scenarios emerge in the opposite manner. For both
memory- and model-based approaches tested we observe significant
diferences in bias disparity: NMF results in the smallest absolute
bias disparity increase thereby reflecting a users’ input preference,
whereas the neighbour-based UserKNNAvg increases absolute bias
disparity levels towards whichever user-artist preference is in the
majority. The tendency of NMF to propagate less bias, positively
or negatively speaking, in comparison to the other models is also
reflected in the results obtained from the beyond-accuracy metrics
evaluation. Indeed, for experiment 2 NMF achieves the high
levels of coverage, recommending wider subsets of artists, and at the
results suggest that the model-based algorithm considered in this
study is capable of achieving a higher level of diversification in the
outcomes in comparison to the memory-based model. Translated to
our scenario, it means that NMF is the algorithm that focuses less
on recommending a specific gender group, avoiding the
exacerbation of pre-existing bias in the dataset that other recommendation
algorithms exhibit. Again, the efect of bias propagation is seen to
be more amplified in the case of the LFM-1b dataset.
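The coverage figure referred to above is the usual beyond-accuracy catalog-coverage metric: the share of all recommendable artists that appear in at least one user's recommendation list. A brief sketch, with names of our own choosing rather than the paper's code:

```python
# Illustrative sketch of catalog coverage, the beyond-accuracy metric
# discussed above; higher values mean the recommender exposes a wider
# subset of the artist catalog. Names are assumptions, not the paper's code.

def catalog_coverage(recommendations, catalog):
    """`recommendations` maps each user to their recommended artist list;
    `catalog` is the set of all recommendable artists."""
    recommended = set()
    for rec_list in recommendations.values():
        recommended.update(rec_list)
    return len(recommended & set(catalog)) / len(catalog)

# Toy example: a 4-artist catalog where two users' lists jointly
# cover 3 distinct artists.
recs = {
    "u1": ["a1", "a2"],
    "u2": ["a1", "a3"],
}
print(catalog_coverage(recs, {"a1", "a2", "a3", "a4"}))  # 0.75
```

Under this metric, an algorithm like NMF that recommends wider subsets of artists attains a value closer to 1.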
</p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>Studies of gender bias in music preferences, conducted in fields
such as Music Psychology and Gender Studies, have already
evidenced how socio-cultural factors are responsible for the disparate
treatment of not-male artists. In the field of MIR, relatively little
research has analyzed the role existing technology can play
in mitigating or amplifying this bias. In line with the studies on
bias disparity in the RS literature, and focusing on the musical domain,
we show how recommendation outcomes can actually impact
gender bias in music preferences. Using a binary gender classification,
where users and artists are classified as male or female, we have
shown how, at different levels, recommender systems can propagate
a pre-existing bias. In addition, when simulating an “upside down” world
where users have a much higher preference for female artists,
we still find evidence of an exacerbation of that bias. Our results
show that gender bias can be propagated by CF-based
recommendations, according to the bias present in the data. Hence, RS can play
a role in propagating bias, but at least in our exploratory study, we
have not found evidence of whether they cause the emergence of new
forms of bias.</p>
      <p>
        The limitations of our work are several. First, it is important to
remark that the binary classification of gender is an
oversimplification of gender representation. The state-of-the-art perspective on
gender, from both the natural and social science domains, is often
non-binary, where male and female are just two of the many genders
with which an individual may choose to identify. Binary definitions
of gender have been widely critiqued as socially constructed
through routine gendered performances [
        <xref ref-type="bibr" rid="ref12 ref8">8, 12</xref>
        ]; considering
gender to be only binary in this work is therefore both limiting and, to some
degree, reinforcing of such binary logic. Second, the evaluation of
RS is computed such that the impact of the outcome can be understood
in the short but not the long term. Using longitudinal data or
simulation frameworks, we believe that a better comprehension
of the phenomenon can be achieved, complementing the results
we have presented. Lastly, Last.fm users tend to come mostly from
Western countries; consequently, our results cannot be generalized
to represent a global scenario. This issue is well known in the MIR
domain [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ], and we do believe that considering a multicultural
perspective is undoubtedly a necessary step to give robustness to MIR
studies dealing with socio-cultural and socio-technical phenomena.
      </p>
    </sec>
    <sec id="sec-12">
      <title>ACKNOWLEDGMENTS</title>
      <p>This work is partially supported by the European Commission
under the TROMPA project (H2020 770376).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Himan</given-names>
            <surname>Abdollahpouri</surname>
          </string-name>
          , Masoud Mansoury, Robin Burke, and
          <string-name>
            <given-names>Bamshad</given-names>
            <surname>Mobasher</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The unfairness of popularity bias in recommendation</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          <volume>2440</volume>
          (
          <year>2019</year>
          ). arXiv:1907.13286
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Luis</given-names>
            <surname>Aguiar</surname>
          </string-name>
          , Joel Waldfogel, and
          <string-name>
            <given-names>Sarah</given-names>
            <surname>Waldfogel</surname>
          </string-name>
          .
          <year>2018</year>
          . Playlisting Favorites: Is Spotify Gender-Biased?
          <source>Technical Report November</source>
          . https://ec.europa.eu/jrc/sites/jrcsh/files/jrc113503.pdf
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Manuel</given-names>
            <surname>Anglada-Tort</surname>
          </string-name>
          , Amanda E Krause, and Adrian C North.
          <year>2019</year>
          .
          <article-title>Popular music lyrics and musicians' gender over time: A computational approach</article-title>
          . Psychology of Music (
          <year>2019</year>
          ). https://doi.org/10.1177/0305735619871602
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Bias on the web</article-title>
          .
          <source>Commun. ACM 61</source>
          ,
          <issue>6</issue>
          (
          <year>2018</year>
          ),
          <fpage>54</fpage>
          -
          <lpage>61</lpage>
          . https://doi.org/10.1145/3209581
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Solon</given-names>
            <surname>Barocas</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew D.</given-names>
            <surname>Selbst</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Big Data's Disparate Impact</article-title>
          .
          <source>California Law Review</source>
          <volume>671</volume>
          (
          <year>2014</year>
          ),
          <fpage>671</fpage>
          -
          <lpage>732</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Christine</given-names>
            <surname>Bauer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Markus</given-names>
            <surname>Schedl</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Global and country-specific mainstreaminess measures: Definitions, analysis, and usage for improving personalized music recommendation systems</article-title>
          .
          <source>PLOS ONE i</source>
          (
          <year>2019</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Engin</given-names>
            <surname>Bozdag</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Bias in algorithmic filtering and personalization</article-title>
          .
          <source>Ethics and Information Technology</source>
          <volume>15</volume>
          ,
          <issue>3</issue>
          (
          <year>2013</year>
          ),
          <fpage>209</fpage>
          -
          <lpage>227</lpage>
          . https://doi.org/10.1007/s10676-013-9321-6
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Judith</given-names>
            <surname>Butler</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <source>Gender Trouble</source>
          . Taylor and Francis.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Rocío</given-names>
            <surname>Cañamares</surname>
          </string-name>
          , Pablo Castells, and
          <string-name>
            <given-names>Alistair</given-names>
            <surname>Moffat</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Offline evaluation options for recommender systems</article-title>
          .
          <source>Information Retrieval Journal</source>
          <volume>23</volume>
          (03
          <year>2020</year>
          ). https://doi.org/10.1007/s10791-020-09371-3
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Òscar</given-names>
            <surname>Celma</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Music Recommendation and Discovery: The Long Tail, Long Fail, and Long Play in the Digital Music Space</article-title>
          . Springer-Verlag Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Henriette</surname>
            <given-names>Cramer</given-names>
          </string-name>
          , Jean Garcia-Gathright, Aaron Springer, and
          <string-name>
            <given-names>Sravana</given-names>
            <surname>Reddy</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Assessing and addressing algorithmic bias in practice</article-title>
          .
          <source>Interactions</source>
          <volume>25</volume>
          ,
          <issue>6</issue>
          (
          <year>2018</year>
          ),
          <fpage>58</fpage>
          -
          <lpage>63</lpage>
          . https://doi.org/10.1145/3278156
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Simone</given-names>
            <surname>de Beauvoir</surname>
          </string-name>
          .
          <year>1949</year>
          .
          <article-title>The Second Sex</article-title>
          . Vintage Classics.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Sarah</given-names>
            <surname>Dean</surname>
          </string-name>
          , Sarah Rich, and
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Recht</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Recommendations and User Agency: The Reachability of Collaboratively-Filtered Information</article-title>
          .
          <source>In Proceedings of the 3rd ACM Conference on Fairness, Accountability and Transparency (ACM FAccT</source>
          <year>2020</year>
          ). Barcelona, Spain,
          <fpage>436</fpage>
          -
          <lpage>445</lpage>
          . https://doi.org/10.1145/3351095.3372866
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Tommaso</given-names>
            <surname>Di Noia</surname>
          </string-name>
          , Jessica Rosati, Paolo Tomeo, and Eugenio Di Sciascio.
          <year>2017</year>
          .
          <article-title>Adaptive multi-attribute diversity for recommender systems</article-title>
          .
          <source>Information Sciences 382-383</source>
          (
          <year>2017</year>
          ),
          <fpage>234</fpage>
          -
          <lpage>253</lpage>
          . https://doi.org/10.1016/j.ins.2016.11.015
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Bora</surname>
            <given-names>Edizel</given-names>
          </string-name>
          , Francesco Bonchi, Sara Hajian, André Panisson, and
          <string-name>
            <given-names>Tamir</given-names>
            <surname>Tassa</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>FaiRecSys: mitigating algorithmic bias in recommender systems</article-title>
          .
          <source>International Journal of Data Science and Analytics 9</source>
          ,
          <issue>2</issue>
          (
          <year>2019</year>
          ),
          <fpage>197</fpage>
          -
          <lpage>213</lpage>
          . https://doi.org/10.1007/s41060-019-00181-5
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Michael D.</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          , Mucun Tian,
          <string-name>
            <given-names>Jennifer D.</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          , Oghenemaro Anuyah,
          <string-name>
            <given-names>David</given-names>
            <surname>McNeill</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Maria Soledad</given-names>
            <surname>Pera</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>All The Cool Kids, How Do They Fit In? Popularity and Demographic Biases in Recommender Evaluation and Effectiveness</article-title>
          .
          <source>In Proceedings of the 1st ACM Conference on Fairness, Accountability and Transparency (ACM FAccT</source>
          <year>2018</year>
          ), Vol.
          <volume>81</volume>
          .
          <fpage>172</fpage>
          -
          <lpage>186</lpage>
          . https://doi.org/10.18122/B2GM6F
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Michael D.</given-names>
            <surname>Ekstrand</surname>
          </string-name>
          , Mucun Tian,
          <string-name>
            <given-names>Mohammed R. Imran</given-names>
            <surname>Kazi</surname>
          </string-name>
          , Hoda Mehrpouyan, and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Kluver</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Exploring Author Gender in Book Rating and Recommendation</article-title>
          .
          <source>In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys '18)</source>
          .
          <fpage>242</fpage>
          -
          <lpage>250</lpage>
          . http://dl.acm.org/citation.cfm?doid=3240323.3240373
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Andres</given-names>
            <surname>Ferraro</surname>
          </string-name>
          , Dmitry Bogdanov, Xavier Serra, and
          <string-name>
            <given-names>Jason</given-names>
            <surname>Yoon</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Artist and style exposure bias in collaborative filtering based music recommendations</article-title>
          .
          <source>In 1st Workshop on Designing Human-Centric MIR Systems (wsHCMIR19)</source>
          ,
          <source>co-located at 20th Conference of the International Society for Music Information Retrieval (ISMIR</source>
          <year>2019</year>
          ). arXiv:1911.04827 http://arxiv.org/abs/1911.04827
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Emilia</given-names>
            <surname>Gomez</surname>
          </string-name>
          , Andre Holzapfel, Marius Miron, and
          <string-name>
            <given-names>Bob L.</given-names>
            <surname>Sturm</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Fairness, Accountability and Transparency in Music Information Research (FAT-MIR)</article-title>
          . https://doi.org/10.5281/zenodo.3546227
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Asela</given-names>
            <surname>Gunawardana</surname>
          </string-name>
          and
          <string-name>
            <given-names>Guy</given-names>
            <surname>Shani</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Evaluating Recommender Systems</article-title>
          . Springer US, Boston, MA,
          <fpage>265</fpage>
          -
          <lpage>308</lpage>
          . https://doi.org/10.1007/978-1-4899-7637-6_8
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>F. Maxwell</given-names>
            <surname>Harper</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joseph A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>The movielens datasets: History and context</article-title>
          .
          <source>ACM Transactions on Interactive Intelligent Systems</source>
          <volume>5</volume>
          ,
          <issue>4</issue>
          (
          <year>2015</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>19</lpage>
          . https://doi.org/10.1145/2827872
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Andre</given-names>
            <surname>Holzapfel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bob L.</given-names>
            <surname>Sturm</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Coeckelbergh</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Ethical Dimensions of Music Information Retrieval Technology</article-title>
          .
          <source>Transactions of the International Society for Music Information Retrieval</source>
          <volume>1</volume>
          (
          <year>2018</year>
          ),
          <fpage>44</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Nicolas</given-names>
            <surname>Hug</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Surprise, a Python library for recommender systems</article-title>
          . http://surpriselib.com.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Dietmar</given-names>
            <surname>Jannach</surname>
          </string-name>
          , Lukas Lerche, Iman Kamehkhosh, and
          <string-name>
            <given-names>Michael</given-names>
            <surname>Jugovac</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>What recommenders recommend: an analysis of recommendation biases and possible countermeasures</article-title>
          .
          <source>User Modeling and User-Adapted Interaction 25</source>
          ,
          <issue>5</issue>
          (
          <year>2015</year>
          ),
          <fpage>427</fpage>
          -
          <lpage>491</lpage>
          . https://doi.org/10.1007/s11257-015-9165-3
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>Dietmar</given-names>
            <surname>Jannach</surname>
          </string-name>
          , Oren Sar Shalom, and Joseph A. Konstan
          .
          <year>2019</year>
          .
          <article-title>Towards More Impactful Recommender Systems Research</article-title>
          .
          <source>In Proceedings of the ImpactRS Workshop, 13th ACM Conference on Recommender Systems (RecSys</source>
          <year>2019</year>
          ).
          <fpage>15</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Gawesh</given-names>
            <surname>Jawaheer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Martin</given-names>
            <surname>Szomszor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Patty</given-names>
            <surname>Kostkova</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Comparison of implicit and explicit feedback from an online music recommendation service</article-title>
          .
          <source>In Proceedings of the 1st International Workshop on Information Heterogeneity and Fusion in Recommender Systems, HetRec 2010, Held at the 4th ACM Conference on Recommender Systems (RecSys</source>
          <year>2010</year>
          ).
          <fpage>47</fpage>
          -
          <lpage>51</lpage>
          . https://doi.org/10.1145/1869446.1869453
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Yehuda</given-names>
            <surname>Koren</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Factor in the Neighbors: Scalable and Accurate Collaborative Filtering</article-title>
          .
          <source>ACM Trans. Knowl. Discov. Data 4</source>
          ,
          <issue>1</issue>
          , Article 1 (Jan. 2010), 24 pages. https://doi.org/10.1145/1644873.1644874
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Dominik</given-names>
            <surname>Kowald</surname>
          </string-name>
          , Markus Schedl, and
          <string-name>
            <given-names>Elisabeth</given-names>
            <surname>Lex</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>The Unfairness of Popularity Bias in Music Recommendation: A Reproducibility Study</article-title>
          .
          <source>In Advances in Information Retrieval</source>
          , Joemon M Jose, Emine Yilmaz, João Magalhães, Pablo Castells, Nicola Ferro,
          <string-name>
            <given-names>Mário J</given-names>
            <surname>Silva</surname>
          </string-name>
          , and Flávio Martins (Eds.). Springer International Publishing, Cham,
          <fpage>35</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Kun</given-names>
            <surname>Lin</surname>
          </string-name>
          , Nasim Sonboli, Bamshad Mobasher, and
          <string-name>
            <given-names>Robin</given-names>
            <surname>Burke</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Crank up the volume: Preference bias amplification in collaborative recommendation</article-title>
          .
          <source>In CEUR Workshop Proceedings</source>
          , Vol.
          <volume>2440</volume>
          . arXiv:1909.06362
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Xin</given-names>
            <surname>Luo</surname>
          </string-name>
          , Mengchu Zhou, Yunni Xia, and
          <string-name>
            <given-names>Qingsheng</given-names>
            <surname>Zhu</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>An Efficient Non-Negative Matrix-Factorization-Based Approach to Collaborative Filtering for Recommender Systems</article-title>
          .
          <source>IEEE Transactions on Industrial Informatics</source>
          <volume>10</volume>
          ,
          <issue>2</issue>
          (
          <year>2014</year>
          ),
          <fpage>1273</fpage>
          -
          <lpage>1284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Masoud</given-names>
            <surname>Mansoury</surname>
          </string-name>
          , Bamshad Mobasher, Robin Burke, and
          <string-name>
            <given-names>Mykola</given-names>
            <surname>Pechenizkiy</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Bias disparity in collaborative recommendation: Algorithmic evaluation and comparison</article-title>
          .
          <source>In CEUR Workshop Proceedings</source>
          , Vol.
          <volume>2440</volume>
          . arXiv:1908.00831
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Sean M.</given-names>
            <surname>McNee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>John</given-names>
            <surname>Riedl</surname>
          </string-name>
          , and Joseph A Konstan.
          <year>2006</year>
          .
          <article-title>Being Accurate is Not Enough: How Accuracy Metrics Have Hurt Recommender Systems</article-title>
          .
          <source>In CHI '06 Extended Abstracts on Human Factors in Computing Systems (CHI EA '06)</source>
          .
          Association for Computing Machinery
          , New York, NY, USA,
          <fpage>1097</fpage>
          -
          <lpage>1101</lpage>
          . https://doi.org/10.1145/1125451.1125659
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>Brett</given-names>
            <surname>Millar</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Selective hearing: Gender bias in the music preferences of young adults</article-title>
          .
          <source>Psychology of Music 36</source>
          ,
          <issue>4</issue>
          (
          <year>2008</year>
          ),
          <fpage>429</fpage>
          -
          <lpage>445</lpage>
          . https://doi.org/10.1177/0305735607086043
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>Yoon-Joo</given-names>
            <surname>Park</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Tuzhilin</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>The Long Tail of Recommender Systems and How to Leverage It</article-title>
          .
          <source>Proceedings of the 2008 ACM Conference on Recommender Systems (RecSys '08)</source>
          (
          <year>2008</year>
          ),
          <fpage>11</fpage>
          -
          <lpage>18</lpage>
          . https://doi.org/10.1145/1454008.1454012
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>Caroline</given-names>
            <surname>Criado Perez</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Invisible Women: Exposing data bias in a world designed for men</article-title>
          .
          <source>Random House.</source>
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>Alan</given-names>
            <surname>Said</surname>
          </string-name>
          , Alejandro Bellogín Kouki, and
          <string-name>
            <given-names>A. P.</given-names>
            <surname>de Vries</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A Top-N Recommender System Evaluation Protocol Inspired by Deployed Systems</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Justin</given-names>
            <surname>Salamon</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>What's Broken in Music Informatics Research? Three Uncomfortable Statements</article-title>
          .
          <source>In Proceedings of the 36th International Conference on Machine Learning</source>
          .
          <fpage>2012</fpage>
          -
          <lpage>2014</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>Markus</given-names>
            <surname>Schedl</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The LFM-1b Dataset for Music Retrieval and Recommendation</article-title>
          .
          <source>In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval (New York, New York, USA) (ICMR '16)</source>
          . Association for Computing Machinery, New York, NY, USA,
          <fpage>103</fpage>
          -
          <lpage>110</lpage>
          . https://doi.org/10.1145/2911996.2912004
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <surname>Xavier</surname>
            <given-names>Serra</given-names>
          </string-name>
          , Michela Magas, Emmanouil Benetos, Magdalena Chudy, Simon Dixon, Arthur Flexer, Emilia Gómez, Fabien Gouyon, Perfecto Herrera, Sergi Jorda, Oscar Paytuvi, Geofroy Peeters, Jan Schlüter, Hugues Vinet, and
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Widmer</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Roadmap for Music Information ReSearch</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Swartz</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>MusicBrainz: A Semantic Web Service</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          <volume>17</volume>
          ,
          <issue>1</issue>
          (Jan.
          <year>2002</year>
          ),
          <fpage>76</fpage>
          -
          <lpage>77</lpage>
          . https://doi.org/10.1109/5254.988466
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>Virginia</given-names>
            <surname>Tsintzou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Evaggelia</given-names>
            <surname>Pitoura</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Panayiotis</given-names>
            <surname>Tsaparas</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Bias Disparity in Recommendation Systems</article-title>
          . CoRR abs/1811.01461 (
          <year>2018</year>
          ). arXiv:1811.01461 http://arxiv.org/abs/1811.01461
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>Sarah</given-names>
            <surname>Myers West</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Meredith</given-names>
            <surname>Whittaker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kate</given-names>
            <surname>Crawford</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Discriminating Systems: Gender, Race and Power in AI</article-title>
          .
          <source>AI Now Institute</source>
          . https://ainowinstitute.org/discriminatingsystems.html
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>Jieyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tianlu</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mark</given-names>
            <surname>Yatskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Vicente</given-names>
            <surname>Ordonez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kai-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Men also like shopping: Reducing gender bias amplification using corpus-level constraints</article-title>
          .
          <source>EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings</source>
          (
          <year>2017</year>
          ),
          <fpage>2979</fpage>
          -
          <lpage>2989</lpage>
          . https://doi.org/10.18653/v1/d17-1323
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>Ziwei</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xia</given-names>
            <surname>Hu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>James</given-names>
            <surname>Caverlee</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Fairness-Aware Tensor-Based Recommendation</article-title>
          .
          <source>In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM '18)</source>
          . Association for Computing Machinery, New York, NY, USA,
          <fpage>1153</fpage>
          -
          <lpage>1162</lpage>
          . https://doi.org/10.1145/3269206.3271795
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>