<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>How playlist evaluation compares to track evaluations in music recommender systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sophia Hadash</string-name>
          <email>s.hadash@tue.nl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yu Liang</string-name>
          <email>y.liang1@tue.nl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martijn C. Willemsen</string-name>
          <email>m.c.willemsen@tue.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <addr-line>5600 MB Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
          ,
          <institution>Jheronimus Academy of Data Science</institution>
          ,
          <addr-line>5211 DA 's-Hertogenbosch, The</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Jheronimus Academy of Data Science</institution>
          ,
          <addr-line>5211 DA 's-Hertogenbosch, The</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Jheronimus Academy of Data Science</institution>
          ,
          <addr-line>5211 DA 's-Hertogenbosch, The</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>19</volume>
      <issue>2019</issue>
      <abstract>
<p>Most recommendation evaluations in the music domain focus on algorithmic performance: how well a recommendation algorithm can predict a user's liking of an individual track. However, individual track ratings might not fully reflect the user's liking of the whole recommendation list. Previous work has shown that subjective measures such as perceived diversity and familiarity of the recommendations, as well as the peak-end effect, can influence the user's overall (holistic) evaluation of the list. In this study, we investigate how individual track evaluation compares to holistic playlist evaluation in music recommender systems, especially how playlist attractiveness is related to individual track ratings and other subjective measures (perceived diversity) or objective measures (objective familiarity, the peak-end effect, and the occurrence of good recommendations in the list). We explore this relation using a within-subjects online user experiment, in which recommendations for each condition are generated by different algorithms. We found that individual track ratings cannot fully predict playlist evaluations, as other factors such as perceived diversity and recommendation approaches can influence playlist attractiveness to a larger extent. In addition, including only the highest and last track rating (peak-end) predicts playlist attractiveness as well as including all track evaluations. Our results imply that it is important to consider which evaluation metric to use when evaluating recommendation approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>CCS CONCEPTS</title>
<p>• Human-centered computing → Empirical studies in HCI; Heuristic evaluations; • Information systems → Recommender systems; Relevance assessment; Personalization.</p>
      <p>Keywords: User-centric evaluation, recommender systems, playlist and track evaluation</p>
    </sec>
    <sec id="sec-2">
      <title>INTRODUCTION</title>
      <p>
        In user-centric evaluation of personalized music recommendation,
users are usually asked to indicate their degree of liking of
individual tracks [
        <xref ref-type="bibr" rid="ref14 ref2 ref4 ref5">2, 4, 5, 14</xref>
        ] or by providing a holistic assessment of the
entire playlist (e.g. playlist satisfaction or playlist attractiveness)
[
        <xref ref-type="bibr" rid="ref12 ref16 ref17 ref6 ref9">6, 9, 12, 16, 17</xref>
        ] generated by the recommendation approaches.
Most recommender evaluations are focused on the first type of
evaluation to test algorithmic performance: can we accurately
predict the liking of an individual track? Many user-centric studies in
the field [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] however focus on the second metric: does the list of
recommendations provide a satisfactory experience? Often these
studies find playlist satisfaction is not just about the objective or
subjective accuracy of the playlist, but also depends on the difficulty
of choosing from the playlist or playlist diversity [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. For example,
in the music domain perceived diversity of the playlist has been
shown to have a negative effect on overall playlist attractiveness [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
Bollen et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] showed people were just as satisfied with a list of
20 movie recommendations that included the top-5 list and a set
of lower-ranked items (the twentieth item being ranked 1500th)
as with a list of the 20 best recommendations (top-20 ranked).
      </p>
      <p>
        Research in psychology also shows that people’s memory of
overall experience is influenced by the largest peak and end of
the experience rather than the average of the moment to moment
experience [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Similar effects might occur when we ask users to
evaluate holistically a list on attractiveness: they might be triggered
more by particular items in the list (i.e. ones that they recognize
as great (or bad), ones that are familiar rather than ones that are
unknown, cf. the mere exposure effect [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]) and therefore their overall
impression might not simply be the mean of the individual ratings.
      </p>
      <p>These results from earlier recommender research and from
psychological research suggest that overall (holistic) playlist evaluation
is not just reflected by the average of liking or rating of the
individual items. However, to the best of our knowledge, no previous work has
explored the relation between users’ evaluation of individual tracks
and overall playlist evaluation. To some extent this is because it is
not common that both types of data are collected in the same study.
Therefore, in this work, we would like to investigate how individual
item evaluations relate to holistic evaluations in sequential music
recommender systems.</p>
      <p>
        We explore these relations using a within-subjects online
experiment, in which users are asked to give individual ratings as well as
overall perception of playlist attractiveness and diversity in three
conditions: (1) track and artist similarity algorithm (base), (2) track
and artist similarity algorithm combined with genre similarity
algorithm (genre) and (3) track and artist similarity algorithm combined
with audio feature algorithm (gmm). The track and artist similarity
algorithm can be regarded as a low-spread strategy since
recommendations are generated from a small subset of the total pool
of tracks relatively close to the user’s tastes [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Both the genre
approach and the gmm approach are high-spread strategies, which
generate user-track ratings for a large proportion of the total pool of
tracks.
      </p>
      <p>In this study, we are interested in how perceived attractiveness of
the playlist is related to perceived playlist diversity and individual
track ratings across the three conditions. In addition, we also include
a set of objective features of the playlist in the analysis. We test
whether users' perceived attractiveness of the playlist is
also affected by (1) the peak-end effect: the track they like most
and the end track, (2) their familiarity with the recommendations in
the playlist and (3) the occurrence of good recommendations in the
playlist: people might be satisfied with a playlist as long as at least
some recommendations are good.</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
    </sec>
    <sec id="sec-4">
      <title>User-centric evaluation in music recommendation</title>
      <p>
        User-centric evaluation for recommendation approaches is
necessary in order to understand users’ perception of the given
recommendations [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], such as acceptance or satisfaction [
        <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
        ].
      </p>
      <p>
        User-centric evaluation in music recommendation can be done at the
individual track level or the whole playlist level. Users' perceptions of
whole playlists are often measured in the context of
automatic playlist generation [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], smooth track transition [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] or when
the goal is to evaluate the whole recommender system [
        <xref ref-type="bibr" rid="ref12 ref17">12, 17</xref>
        ]. For
example, users were asked to indicate their perception of
the recommended playlists to investigate how different settings of
control in the recommender system influence their cognitive load
as well as their acceptance of the recommendations [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. However,
when it comes to the evaluation of the recommendation algorithms,
users are often asked to indicate their ratings [
        <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
        ] for each individual track rather than the playlist as a whole,
neglecting the fact that tracks are often listened to in succession
or within a playlist.
      </p>
      <p>
        Individual item ratings cannot fully reflect users' degree of
liking of the recommendation list. Perceived diversity is a factor
that can only be measured at the list level. Willemsen et al. [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] have shown that perceived diversity of a movie recommendation list
has a positive effect on perceived list attractiveness and that a higher
perceived diversity makes it easier for users to make a choice
from the recommendations. Ekstrand et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] also show that
perceived diversity has a positive effect on user satisfaction. In the
music domain, however, Ferwerda et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] found that perceived diversity has
a negative effect on the perceived attractiveness of the recommendation
list; this effect turns positive, however, when the recommendation
list helps users discover new music and enrich their music
tastes.
      </p>
      <p>The novel contribution of this work is that we include both
measurements in the study for personalized music recommendations,
aiming to uncover the relation between individual track evaluation
and holistic evaluation of music playlists.</p>
    </sec>
    <sec id="sec-5">
<title>Peak-end effect</title>
      <p>
        Research in psychology has looked into the diferences between
the ‘remembering self’ and the ‘experiencing self’ [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], as reflected
in the peak-end rule: the memory of the overall experience of a
painful medical procedure is not simply the sum or average of the
moment to moment experience, but the average of the largest peak
and the end of the experience.
      </p>
      <p>
        In the music domain, several studies have found that the
remembered intensity of the music listening experience is highly
correlated with peak, peak-end and average moment-to-moment
experience [
        <xref ref-type="bibr" rid="ref21 ref22">21, 22</xref>
        ]. However, it is argued by Wiechert [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] that these
studies fail to consider users’ personal musical preferences and
that the peak-end value and the average value measured in the
studies might be correlated with each other. Rather than giving
participants the same stimuli, Wiechert gave participants a list of
songs based on their current musical preference and came up with a
new metric: the pure peak-end value (the difference between peak-end
and average). He found that while the average experience could
explain a significant part of playlist experience variance, the pure
peak-end value could explain a part of variance that would not be
explained by the average.
      </p>
    </sec>
    <sec id="sec-6">
      <title>METHOD</title>
      <p>
        In this study three algorithms are used for generating playlists.
These algorithms are designed to use user preferences in the form
of (ordered) lists of tracks, artists, or genres a user is known to
like. The advantage of using such an input form is that these
algorithms can be used with user preferences obtained from commercial
platforms. In this study Spotify 1 user profiles are used. These
preferences are in the form of ordered lists of top tracks and artists. The
first algorithm is based on track and artist similarity. The second
algorithm uses a genre similarity metric based on genre co-occurrence
among artists. The third algorithm recommends tracks based on
a Gaussian mixture model on track features derived from audio
analyses (see [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] for details). All algorithms are described in detail
in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>Track and artist similarity algorithm</title>
      <p>The track and artist similarity algorithm is a combination of the
same sub-algorithm applied to both a list of tracks and a list of artists the user
is known to like. The input to this sub-algorithm is a list of items,
potentially ordered by user likeability. This sub-algorithm uses
Spotify's seed recommendation system to explore items that are
similar to the input. Based on the occurrence of items in the results,
an output list is generated with user likeability prediction scores.
The algorithm is formulated in Algorithm 1, with an illustration by
example in Figure 1.</p>
      <p>1https://developer.spotify.com/documentation/web-api/</p>
      <p>Notation: si: score of item i; x: temporary score of item i; recs:
recommendation set; N: number of sibling nodes; posnode:
position of the node among its parent's children; smin, smax: scores
assigned to the last and first sibling node at the current tree depth.</p>
      <p>1: for each item i in recs do
2:   si = 0
3:   for each node j representing item i, as current_node, do
4:     x = current_node.score
5:     while current_node.parent not null do
6:       current_node = current_node.parent
7:       x = x * current_node.score
8:     end while
9:     si = si + x
10:  end for
11: end for
12: return (recs, s) ordered by s descending
13:
14: def node.score:
15:   return ((N − posnode)/N)(smax − smin) + smin</p>
    </sec>
    <sec id="sec-8">
      <title>Genre similarity algorithm</title>
      <p>The genre similarity algorithm uses an ordered list of genres the
user likes, S′u (a column vector giving the user's degree of
liking of all genres, built from the user's top artists), and a similarity
metric D to generate genre likeability scores for other genres. The
resulting extrapolated list Su is then used to favor recommendations
from genres with high likeability scores.</p>
      <p>There are 1757 different genres available in our dataset;
therefore both S′u and Su are column vectors of dimension 1757
and matrix D is of dimension 1757 × 1757.</p>
      <p>
        The similarity metric is based on co-occurrence analysis of artists,
similar to the methodology used in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. The co-occurrence analysis
used a database of n ≈ 80,000 artists. For each artist it
was known in which genres they produced music. The data were
extracted from Spotify's developer API. The co-occurrence analysis
generated a normalized symmetric similarity matrix D. The
user's likeability scores for the genres are then computed
as follows, where I is the identity matrix.
      </p>
      <p>Su = (D + I)S′u (1)</p>
    </sec>
    <sec id="sec-9">
      <title>Audio feature algorithm</title>
      <p>
        The audio feature algorithm clusters tracks with similar audio
features using a Gaussian mixture model (GMM). A database of
n ≈ 500,000 tracks containing 11 audio analysis features was used
to train the model. The audio features consisted of measures for
danceability, energy, key, loudness, mode, speechiness, acousticness,
instrumentalness, liveness, valence, and tempo. Multiple GMMs
were fitted using the expectation-maximization (EM) algorithm for
varying numbers of components. The model with 21 components had
the lowest BIC and was therefore selected. Then, cluster
likeability was computed as follows (see [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]):
      </p>
      <p>p(user likes cluster i) = (1/Ntop) ∑ j=1..Ntop p(track j belongs to cluster i) (2)</p>
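      <p>Given the per-track posterior membership probabilities from the fitted GMM, Eq. (2) reduces to a column mean. The membership matrix below is a made-up stand-in for the posteriors of a user's top tracks.</p>
      <p>
```python
import numpy as np

# rows: a user's N_top tracks; columns: p(track j belongs to cluster i)
memberships = np.array([[0.9, 0.1, 0.0],
                        [0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1]])

# Eq. (2): p(user likes cluster i) = (1/N_top) * sum_j p(j in cluster i)
cluster_likeability = memberships.mean(axis=0)
best_cluster = int(np.argmax(cluster_likeability))
# recommendations then favor tracks from the highest-probability cluster
```
</p>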
      <p>Finally, the output recommendations favored tracks
corresponding to clusters with high user likeability probabilities.</p>
    </sec>
    <sec id="sec-10">
      <title>Familiarity of the recommendations to the users</title>
      <p>Both the track and artist similarity and the genre similarity
algorithms generate recommendations close to the users’ known
preferences. Recommendations are based on artists and genres that
are familiar to the user. The audio feature algorithm on the other
hand recommends tracks based on audio feature similarity. As a
result, recommended tracks are more likely to have genres and
artists that are less familiar to the users.
</p>
    </sec>
    <sec id="sec-11">
      <title>EXPERIMENTAL DESIGN</title>
      <p>To evaluate the relation between track evaluations and playlist
evaluations, a within-subjects online experiment was conducted.
The study included three conditions in randomized order: track and
artist algorithm (base), track and artist algorithm combined with
the genre similarity algorithm (genre), and track and artist
algorithm combined with the audio feature algorithm (gmm). In each
condition participants were presented with a playlist containing 10
tracks generated by the corresponding algorithm and evaluated the
individual tracks on likeability and personalization and the playlist
as a whole on attractiveness and diversity. The playlist included
the top 3 recommendations and the 20th, 40th, 60th, 80th, 100th,
200th, and 300th recommendation in random order. Lower ranked
recommendations were included such that algorithm performance
could be evaluated more easily, as lower ranked recommendations
should result in lower user evaluations.
</p>
    </sec>
    <sec id="sec-12">
      <title>Participant Recruitment</title>
      <p>Participants were primarily recruited using the JF Schouten
participant database of Eindhoven University of Technology. Some
participants were recruited by invitation. Participants were required
to have a Spotify account (free or Premium) and to have used this
account prior to taking part in the study.
</p>
    </sec>
    <sec id="sec-13">
      <title>Materials</title>
      <p>The track evaluations included likeability and personalization
measures. One question was used for each of the tracks. This was
decided based on the repetitive nature of individual track
evaluations. The question measuring track likeability was: "Rate how
much you like the song". For measuring perceived track
personalization we used the following item: "Rate how well the song fits your
personal music preferences". Both questions were answered on a
5-point visual scale with halves (thus 10 actual options) containing
stars and heart icons as shown in Figure 2.</p>
      <p>The playlist evaluation included playlist attractiveness and playlist
diversity and is presented in Table 1.</p>
      <p>
        Additional scales used in the study were a demographics scale
and the Goldsmith Music Sophistication Index (MSI) [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The
demographics scale measured gender, age, and Spotify usage. Spotify
usage was measured using a single item: "I listen to Spotify for __
hours a week" with 7 range options.
      </p>
    </sec>
    <sec id="sec-14">
      <title>Study Procedure</title>
      <p>After consenting, participants were prompted with a login screen
where they could connect their Spotify account with the study.
Participants who did not have a Spotify account or who had a Spotify
account containing no user preference data could not continue with
the study. After Spotify login, participants completed a background
survey. In the survey they reported their Spotify usage and music
sophistication.</p>
      <p>Following the background survey, the user entered the track
evaluation phase in which a playlist was presented to the user
generated by one of the algorithms. The interface (see Figure 2)
contained an interactive panel showing the tracks of the playlist, a
survey panel in which they had to rate the tracks, and a music
control bar. Participants could freely browse through the playlist while
providing the ratings. After all ratings were provided, participants
entered the playlist evaluation phase in which they answered the
playlist evaluation questions (Table 1). The track evaluation phase
and playlist evaluation phase were then repeated for the remaining
conditions.</p>
      <p>Finally, participants were thanked for their time and were
entered into a reward raffle. For every 5 participants, one
participant received a 15 euro compensation. In total the study lasted
approximately 15 minutes.</p>
      <p>Participants in this study included 59 people, of which 54 were
recruited through the JF Schouten database. The sample consisted
of 31 males and 28 females. The age of the participants ranged from
19 to 64 (M = 25.6, SD = 8.8). On average participants listened to
Spotify for 7 to 10 hours per week. MSI scores ranged between 0
and 5 (M = 2.18, SD = 1.0). The study took place between 9th of
January and 1st of February of 2019.</p>
      <p>We found that there was no effect of personalization rating on
perceived attractiveness, while likeability ratings can partially predict
perceived attractiveness. Furthermore, playlist attractiveness was
more strongly related to the recommendation algorithm. Playlists
in the gmm condition were less positively evaluated compared to
playlists in the other conditions even though the track evaluations
were similar on average. In other words, while participants
evaluated tracks across conditions similarly, the playlist evaluations
difered substantially (see Figure 3).
The results are analyzed using three methodologies. The first
methodology concerns the performance of the recommendation algorithms.
This was analyzed using descriptive statistics concerning the
relation between recommendation scores predicted by the algorithms
and the user ratings.</p>
      <p>In the second methodology, the relation between playlist
evaluations and track ratings was analyzed at the playlist level (i.e. 3
observations per user), using three aggregation measures for the
track evaluations: the mean (Model 1), the peak-end value (Model 2),
and the occurrence of
at least a 3-star rating (Model 3). Using these aggregates, a linear
mixed-effects model was used such that variation in participants'
answering styles could be included as a random effect. Playlist
diversity and the recommendation approaches were included as
fixed effects in Models 1a, 2a and 3a, and
interaction effects were added in Models 1b, 2b and 3b.</p>
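      <p>The three aggregation measures can be sketched as follows; the function names and example ratings are ours, with ratings on the study's 5-point scale.</p>
      <p>
```python
def mean_rating(ratings):
    # Model 1: mean of all track ratings in the playlist
    return sum(ratings) / len(ratings)

def peak_end(ratings):
    # Model 2: average of the highest and the last track rating
    return (max(ratings) + ratings[-1]) / 2

def has_positive(ratings, threshold=3):
    # Model 3: occurrence of at least one rating of `threshold` stars
    return any(r >= threshold for r in ratings)

playlist = [2.5, 4.0, 1.5, 3.0, 2.0]   # hypothetical track ratings
# mean = 2.6, peak-end = (4.0 + 2.0) / 2 = 3.0, positive = True
```
</p>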
      <p>Finally, the last methodology explores how variations within the
track-level may explain playlist attractiveness. This analysis used
a linear mixed-effects model at the track level (i.e. 3 × 10
observations per user; see Table 3, Model 4) with participants modelled
as a random effect, similar to the playlist-level analysis.
For the track-level variables, four types of indicators were included
in addition to the rating, condition, and diversity. The first indicator
indicates whether the track was high-ranked (top 3
recommendation) or low-ranked (top 20 to 300). The second indicates for each
track whether it was the highest rating of the playlist. Thus, if a
user gave two 4-star ratings and 8 lower ratings, the variable would
indicate those two tracks with a 1, otherwise 0. The third indicator
is the familiarity which shows whether a track was predicted to be
familiar to the user based on their top tracks and artists. Finally,
the last indicator contains the playlist order. This variable indicates
whether the track belonged to the list that the user evaluated
first, second, or third.</p>
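      <p>The four track-level indicators can be sketched as below; all names and inputs are hypothetical stand-ins for the study's data.</p>
      <p>
```python
def track_indicators(ratings, ranks, track_artists, top_artists, order):
    """Build the four per-track indicator variables used in Model 4."""
    peak = max(ratings)
    rows = []
    for rating, rank, artists in zip(ratings, ranks, track_artists):
        rows.append({
            "high_ranked": rank <= 3,                      # top-3 recommendation
            "highest_rating": rating == peak,              # peak track(s) of the list
            "familiar": bool(set(artists) & top_artists),  # overlap with user's profile
            "playlist_order": order,                       # 1st, 2nd, or 3rd list evaluated
        })
    return rows

rows = track_indicators(
    ratings=[4.0, 2.0, 4.0],
    ranks=[1, 20, 300],
    track_artists=[["artistA"], ["artistB"], ["artistC"]],
    top_artists={"artistA"},
    order=1,
)
```
</p>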
    </sec>
    <sec id="sec-15">
      <title>Model results</title>
      <p>5.2.1 Algorithm performance. The relation between
recommendation scores and user evaluations of tracks is depicted in Figure 4.</p>
      <p>The illustration indicates that diferences exist between algorithms
in their performance on track evaluations. This is supported by an
analysis of variance (ANOVA), F (2, 171) = 36.8, p &lt; .001. The graph
shows that for all algorithms, higher recommendation scores result
in higher user ratings, showing that indeed tracks that are predicted
to be liked better also get higher ratings. However, consistent with
Figure 3, the scores for the base condition are consistently higher
than for the other two algorithms. For the genre condition the slope
seems to be steeper than for the other two conditions, showing that
in this condition, user ratings are more sensitive to the predicted
recommendation scores.
5.2.2 Playlist-level relation between track evaluations and playlist
evaluations. In this analysis, the effect of track evaluations on
playlist evaluations is explored on a playlist-level, using three
different aggregation measures (Models 1-3).</p>
      <p>The effect of track evaluations on playlist attractiveness is
illustrated in Figure 5. All three aggregation measures are very similar
in predicting playlist attractiveness (see Table 2). We see a positive
effect of the aggregation measure, indicating that if a user scores
higher on that measure, she also finds the playlist more attractive,
together with negative effects of the genre and gmm conditions,
consistent with the effect in Figure 3 that gmm and genre score
lower than the base condition. The aggregate indicating the occurrence
of at least a 3-star rating (Model 3) is a slightly worse predictor
of playlist attractiveness than the mean and peak-end measures.</p>
      <p>Note. SD = standard deviation. The models are grouped by the method used for aggregating track evaluations.
'Mean' = mean value, 'peak-end' = average of the highest rating and the last rating, 'positive' = indicator for the
occurrence of at least a 3-star evaluation. ∗∗∗p &lt; .001; ∗∗p &lt; .01; ∗p &lt; .05.</p>
      <p>When the interaction effects are included, the main effect of
ratings is no longer significant (Models 1b, 2b and 3b), but we find
several interactions of rating with condition and of condition with
diversity. The interaction effects of condition with perceived
diversity and track evaluations are visualized in Figure 6 by separating
the resulting effects by condition; we discuss each condition
and its interactions separately.</p>
      <p>The track evaluations had no effect on playlist evaluation in the
base condition (they do for the other two conditions, as we will
see below). Moreover, in the base condition, perceived diversity
has a negative effect, indicating that playlists with high perceived
diversity were less attractive than playlists with low
perceived diversity. One potential explanation could be that since these
playlists were constructed using a low-spread approach the
recommendations were closely related to the users’ known preferences
(i.e. their top tracks that feed our algorithms). Therefore, the
diversity in these users’ preferences may have influenced the diversity
of the recommended playlist. For instance, a person may listen to
different genres during different activities, such as working and doing sports.</p>
      <p>The recommendations could then include music based on all these
genres. While all recommendations are then closely related to the
users’ preferences and could receive potentially high evaluations,
the playlist may not be very attractive due to the diversity in the
genres.</p>
      <p>In the genre condition, perceived diversity had no effect on
playlist attractiveness. In this condition track evaluations strongly
predicted playlist attractiveness regardless of diversity. The results
show that although the genre playlists on average receive a lower
attractiveness score than the base playlists, this effect is reduced when the
aggregate ratings of the list are higher: in other words, only if users
like the genre tracks do they like the playlist as much as the base one,
which contains more low-spread, familiar tracks.</p>
      <p>The gmm condition had similar results to the genre condition.
Perceived diversity predicted attractiveness only marginally.
However, while track evaluations strongly predicted attractiveness in
the genre condition, the aggregate rating is only a weak predictor in the
gmm condition, whether it is the mean rating, the peak-end value, or
the fact that at least one track is highly rated. In other words, high
aggregate ratings cannot really make up for the fact that the gmm
list in general is evaluated worse than the base list. As in the genre
condition, this recommendation algorithm uses a high-spread approach
and includes novel track recommendations. However, the gmm algorithm
recommended tracks based on audio feature similarity, in contrast to
genre similarity. Regardless of diversity or individual track evaluations,
playlists generated with this approach were less attractive to participants.</p>
      <p>We see that some conditions are more sensitive to these aggregate
ratings (genre) than others. We also see an important (negative) role
of diversity for the base condition in predicting overall attractiveness,
but no effect in the other two conditions. In other words, different
aspects affect playlist evaluation as recognized in the literature, but
this highly depends on the nature of the underlying algorithm
generating the recommendations.</p>
      <p>Overall we find that overall attractiveness of a playlist is not
always directly related to the liking of the individual tracks, as
reflected by the aggregate ratings of the tracks, whether this is
Note. SD = standard deviation, ’High-ranked’ indicates the track
was one of the top-3 recommendations, ’highest rating’
indicates the track received the highest rating within that playlist
for the participant, ’familiar’ indicates whether the track was
known to be familiar to the participant, ’playlist order’ indicates
whether the playlist was the first (=1), second (=2), or third (=3)
list that the participant evaluated. Interaction terms as in
Models 1-3 were omitted due to similarity to these models. ∗∗∗p &lt;
.001; ∗∗p &lt; .01; ∗p &lt; .05.
5.2.3 Track-level relation between track evaluations and playlist
evaluations. In this analysis, the efect of track evaluations on playlist
evaluations is explored at track-level, trying to predict the overall
attractiveness of each list with the individual track ratings, rather
than the aggregate ratings. The results are shown in Table 3. Four
types of track-level variables are included in the analysis as
described in Section 5.1.</p>
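The three aggregate measures referred to above (mean rating, peak-end value, and whether at least one track is highly rated) can be sketched as follows. This is an illustrative reconstruction, not the study's actual code; the function names are our own, and peak-end is formalized here as the average of the peak and final ratings, one common reading of the peak-end rule [26].

```python
# Illustrative sketch of three aggregate rating measures for a playlist
# (assumed names; ratings are on a 1-5 scale as an example).

def mean_rating(ratings):
    """Standard mean aggregation of the track ratings in a playlist."""
    return sum(ratings) / len(ratings)

def peak_end(ratings):
    """Peak-end aggregation: average of the highest and the last rating."""
    return (max(ratings) + ratings[-1]) / 2

def has_highly_rated_track(ratings, threshold=5):
    """Whether at least one track reached the top of the rating scale."""
    return any(r >= threshold for r in ratings)

ratings = [3, 5, 2, 4]  # one participant's ratings for a 4-track playlist
print(mean_rating(ratings))             # 3.5
print(peak_end(ratings))                # 4.5
print(has_highly_rated_track(ratings))  # True
```

Each measure collapses the same track-level ratings into a single playlist-level predictor, which is what Models 1-3 compare.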
      <p>
        Whether a track is high-ranked or received the highest
rating shows no significant effect on the perceived attractiveness of the
playlist. The track-level objective familiarity measures whether the user
is familiar with the artists of a track: the user is considered familiar
with a track if at least one artist of the track also appears among the
related artists of the user’s top listened tracks. Although we expected
a positive effect of familiarity on playlist attractiveness (as also
shown in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), no significant effect was observed in model 4. A
possible reason could be that the objective familiarity measure did not
cover all tracks the user is familiar with, since it is only
computed from the user’s top tracks (at most 50 per user). In
future work, we plan to directly ask for (self-reported)
familiarity rather than calculating it from the data. We also
calculated a familiarity score for each track (how familiar the
user is with the track). We found a positive correlation between
objective familiarity and track ratings
(rs(1770) = 0.326, p &lt; .001): users give higher ratings to tracks
they are more familiar with, which is in line with previous work
on the mere exposure effect [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
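The objective familiarity measure described above could be implemented along these lines. This is a minimal sketch under assumptions (data structures and names are ours, not the study's code): familiarity is a set intersection between a track's artists and the artists derived from the user's top listened tracks.

```python
# Hypothetical sketch of the objective familiarity measure: a user is considered
# familiar with a track if at least one of its artists appears among the artists
# related to the user's top listened tracks (at most 50 top tracks per user).

def is_familiar(track_artists, known_artists):
    """True if any artist of the track is among the user's known artists."""
    return not set(track_artists).isdisjoint(known_artists)

def familiarity_score(track_artists, known_artists):
    """Fraction of the track's artists the user knows (0.0 to 1.0)."""
    unique_artists = set(track_artists)
    if not unique_artists:
        return 0.0
    return len(unique_artists & set(known_artists)) / len(unique_artists)

known = {"Radiohead", "Thom Yorke", "Portishead"}   # illustrative data
print(is_familiar(["Radiohead", "Massive Attack"], known))       # True
print(familiarity_score(["Radiohead", "Massive Attack"], known)) # 0.5
```

Because the known-artist set is bounded by the user's top tracks, a familiar track can still be scored as unfamiliar, which is the coverage limitation noted above.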
      <p>Playlist order is also a weak predictor of playlist attractiveness.</p>
      <p>Participants perceived the last playlist as the most attractive and the
first as the least attractive. However, when interaction terms as in
models 1-3 are included, the effect is no longer significant. We also
checked the condition orders generated by the random generator
and found that each condition order occurred approximately equally
often. In other words, the effect of condition order cannot explain
differences across conditions.</p>
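The order-balance check described above amounts to counting how often each permutation of the three conditions occurred across participants. A hypothetical sketch (the condition labels and data are illustrative, not the study's records):

```python
from collections import Counter
from itertools import permutations

# Illustrative per-participant condition orders, as a uniform random
# generator might produce them (not the actual experimental data).
orders = [
    ("base", "genre", "gmm"), ("gmm", "base", "genre"),
    ("base", "genre", "gmm"), ("genre", "gmm", "base"),
    ("gmm", "genre", "base"), ("genre", "base", "gmm"),
]

counts = Counter(orders)
# With a uniform generator, all 3! = 6 permutations should occur about equally often.
for perm in permutations(("base", "genre", "gmm")):
    print(perm, counts[perm])
```

A roughly flat distribution of counts is what rules out condition order as an explanation for differences between conditions.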
    </sec>
    <sec id="sec-16">
      <title>DISCUSSION OF RESULTS</title>
      <p>We found that participants evaluate playlists on more aspects than
merely the likeability of their tracks. Even though the tracks in
recommended playlists may be accurate and receive positive user
evaluations, playlists can still be evaluated negatively. In
particular, the recommendation approach itself plays a role in the overall
perceived playlist attractiveness.</p>
      <p>One explanation may be that users have multiple distinct
musical styles. Playlists that contain music from more than one of a
user’s styles may be less attractive to that user even though the
track recommendations are accurate. Playlists in the base condition
are the most attractive, but suffer most from diversity. Users with
multiple musical styles may have received playlists with music from
multiple styles, which could have been reflected in the perceived
diversity of the playlist. Playlists from the genre condition were
also based on genre similarity, in addition to the track and artist
similarity. Therefore, if multiple musical styles are present in the
user preferences, it is more likely in the genre condition that the
musical style with the highest overall contribution overrules the
music from the other musical styles. Furthermore, the gmm
condition is the least attractive. The recommendation algorithm used in
this condition is based on audio feature similarity. Although tracks
recommended in this condition were similar to the user preferences
based on the audio features, they could be dissimilar on more
comprehensible attributes like genre and artists. It is likely that
music from multiple musical styles was present in these playlists.</p>
      <p>
        Another explanation may be the methodology of evaluation.
While tracks are evaluated at the moment they are experienced,
playlist evaluation occurs only after all tracks have been experienced.
Therefore, playlist evaluations are based on what users remember
from the list. This difference may lead to differences in user
evaluation styles. Although this may explain why differences occur
between track and playlist evaluations, it cannot explain why the
different recommendation approaches lead to different playlist
attractiveness evaluations. Furthermore, under this explanation we
would have expected a model improvement from the inclusion of
the peak-end measure. The peak-end measure specifically models
how users remember different moments in their overall experience
while listening to a playlist [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. However, peak-end resulted in
effects similar to those of a standard mean-aggregated rating.
      </p>
      <p>Regardless of the explanation, the results show that playlist
attractiveness is not primarily related to the likeability of its tracks,
but that other factors such as diversity can play a role.</p>
    </sec>
    <sec id="sec-17">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>While playlist evaluations can be partly predicted by evaluations of
their tracks, other factors of the playlist are more predictive. People
seem to evaluate playlists on aspects other than merely their tracks.
Even when individual tracks were rated positively, the playlist
attractiveness could be low.</p>
      <p>We found that both diversity and recommendation approach
affected playlist attractiveness. Diversity had a negative effect on
playlist attractiveness in recommenders using a low-spread
methodology. The track ratings were most predictive of playlist
attractiveness in the recommendation approach based on genre
similarity. Furthermore, inclusion of the highest and last track
evaluation scores (peak-end) was sufficient to predict playlist
attractiveness, performing just as well as the mean of the ratings.</p>
      <p>When evaluating recommendation approaches in music
recommenders, it is important to consider which evaluation metric to
use. Music is often consumed in succession leading to many factors
other than track likeability that may influence whether people have
satisfactory experiences. Although individual track evaluations are
often used in recommender evaluation, track evaluations do not
seem to predict playlist attractiveness very consistently.</p>
      <p>
        While we showed that playlist attractiveness is not primarily
related to track evaluations, we were unable to effectively measure
why certain algorithms generated more attractive playlists than
others. This question will be addressed in future work.
We intend to include a subjective measure of track familiarity.
Furthermore, we will identify and attempt to separate distinct musical
styles within user preferences. For example, we could give users
control over which top artists or top tracks they would like to use
to generate recommendations, as in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] to separate the tracks and
artists they like in different contexts.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Luke Barrington, Reid Oda, and Gert RG Lanckriet. <year>2009</year>. <article-title>Smarter than Genius? Human Evaluation of Music Recommender Systems</article-title>. In <source>ISMIR</source>, Vol. <volume>9</volume>. Citeseer, <fpage>357</fpage>-<lpage>362</lpage>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Dmitry Bogdanov, Martín Haro, Ferdinand Fuhrmann, Anna Xambó, Emilia Gómez, and Perfecto Herrera. <year>2013</year>. <article-title>Semantic audio content-based music recommendation and visualization based on user preference examples</article-title>. <source>Information Processing &amp; Management</source> <volume>49</volume>, <issue>1</issue> (2013), <fpage>13</fpage>-<lpage>33</lpage>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Dirk Bollen, Bart P Knijnenburg, Martijn C Willemsen, and Mark Graus. <year>2010</year>. <article-title>Understanding choice overload in recommender systems</article-title>. In <source>Proceedings of the fourth ACM conference on Recommender systems</source>. ACM, <fpage>63</fpage>-<lpage>70</lpage>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Òscar Celma and Perfecto Herrera. <year>2008</year>. <article-title>A new approach to evaluating novel recommendations</article-title>. In <source>Proceedings of the 2008 ACM conference on Recommender systems</source>. ACM, <fpage>179</fpage>-<lpage>186</lpage>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Zhiyong Cheng. <year>2011</year>. <article-title>Just-for-Me: An Adaptive Personalization System for Location-Aware Social Music Recommendation</article-title>. (2011).</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Michael D. Ekstrand, F. Maxwell Harper, Martijn C. Willemsen, and Joseph A. Konstan. <year>2014</year>. <article-title>User perception of differences in recommender algorithms</article-title>. <source>Proceedings of the 8th ACM Conference on Recommender systems - RecSys '14</source> (2014), <fpage>161</fpage>-<lpage>168</lpage>. https://doi.org/10.1145/2645710.2645737</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Bruce Ferwerda, Mark P Graus, Andreu Vall, Marko Tkalcic, and Markus Schedl. <year>2017</year>. <article-title>How item discovery enabled by diversity leads to increased recommendation list attractiveness</article-title>. In <source>Proceedings of the Symposium on Applied Computing</source>. ACM, <fpage>1693</fpage>-<lpage>1696</lpage>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Sophia Hadash. <year>2019</year>. <article-title>Evaluating a framework for sequential group music recommendations: A Modular Framework for Dynamic Fairness and Coherence control</article-title>. Master's thesis. Eindhoven University of Technology. https://pure.tue.nl/ws/portalfiles/portal/122439578/Master_thesis_shadash_v1.0.1_1_.pdf</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] Shobu Ikeda, Kenta Oku, and Kyoji Kawagoe. <year>2018</year>. <article-title>Music Playlist Recommendation Using Acoustic-Feature Transition Inside the Songs</article-title>. (2018), <fpage>216</fpage>-<lpage>219</lpage>. https://doi.org/10.1145/3151848.3151880</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Tristan Jehan and David Desroches. <year>2004</year>. <source>Analyzer Documentation [version 3.2]</source>. Technical Report. The Echo Nest Corporation, Somerville, MA. http://docs.echonest.com.s3-website-us-east-1.amazonaws.com/_static/AnalyzeDocumentation.pdf</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] Yucheng Jin, Bruno Cardoso, and Katrien Verbert. <year>2017</year>. <article-title>How do different levels of user control affect cognitive load and acceptance of recommendations?</article-title> In <source>CEUR Workshop Proceedings</source>, Vol. <volume>1884</volume>. <fpage>35</fpage>-<lpage>42</lpage>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] Yucheng Jin, Nava Tintarev, and Katrien Verbert. <year>2018</year>. <article-title>Effects of personal characteristics on music recommender systems with different levels of controllability</article-title>. (2018), <fpage>13</fpage>-<lpage>21</lpage>. https://doi.org/10.1145/3240323.3240358</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] Daniel Kahneman. <year>2011</year>. <source>Thinking, fast and slow</source>. Macmillan.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] Iman Kamehkhosh and Dietmar Jannach. <year>2017</year>. <article-title>User Perception of Next-Track Music Recommendations</article-title>. (2017), <fpage>113</fpage>-<lpage>121</lpage>. https://doi.org/10.1145/3079628.3079668</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] Bart P Knijnenburg, Martijn C Willemsen, Zeno Gantner, Hakan Soncu, and Chris Newell. <year>2012</year>. <article-title>Explaining the user experience of recommender systems</article-title>. <source>User Modeling and User-Adapted Interaction</source> <volume>22</volume>, <issue>4-5</issue> (2012), <fpage>441</fpage>-<lpage>504</lpage>.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Arto Lehtiniemi and Jukka Holm. <year>2011</year>. <article-title>Easy Access to Recommendation Playlists: Selecting Music by Exploring Preview Clips in Album Cover Space</article-title>. <source>Proceedings of the 10th International Conference on Mobile and Ubiquitous Multimedia</source> (2011), <fpage>94</fpage>-<lpage>99</lpage>. https://doi.org/10.1145/2107596.2107607</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] Martijn Millecamp, Nyi Nyi Htun, Yucheng Jin, and Katrien Verbert. <year>2018</year>. <article-title>Controlling Spotify Recommendations</article-title>. (2018), <fpage>101</fpage>-<lpage>109</lpage>. https://doi.org/10.1145/3209219.3209223</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] Daniel Müllensiefen, Bruno Gingras, Lauren Stewart, and Jason Ji. <year>2013</year>. <source>Goldsmiths Musical Sophistication Index (Gold-MSI) v1.0: Technical Report and Documentation Revision 0.3</source>. Technical Report. Goldsmiths University of London, London. https://www.gold.ac.uk/music-mind-brain/gold-msi/</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] F. Pachet, G. Westermann, and D. Laigre. <year>2001</year>. <article-title>Musical data mining for electronic music distribution</article-title>. <source>Proceedings - 1st International Conference on WEB Delivering of Music, WEDELMUSIC 2001</source> (2001), <fpage>101</fpage>-<lpage>106</lpage>. https://doi.org/10.1109/WDM.2001.990164</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] Steffen Pauws and Berry Eggen. <year>2003</year>. <article-title>Realization and user evaluation of an automatic playlist generator</article-title>. <source>Journal of new music research</source> <volume>32</volume>, <issue>2</issue> (2003), <fpage>179</fpage>-<lpage>192</lpage>.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] Alexander Rozin, Paul Rozin, and Emily Goldberg. <year>2004</year>. <article-title>The feeling of music past: How listeners remember musical affect</article-title>. <source>Music Perception: An Interdisciplinary Journal</source> <volume>22</volume>, <issue>1</issue> (2004), <fpage>15</fpage>-<lpage>39</lpage>.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] Thomas Schäfer, Doreen Zimmermann, and Peter Sedlmeier. <year>2014</year>. <article-title>How we remember the emotional intensity of past musical experiences</article-title>. <source>Frontiers in Psychology</source> <volume>5</volume> (2014), <fpage>911</fpage>.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] Markus Schedl, Arthur Flexer, and Julián Urbano. <year>2013</year>. <article-title>The neglected user in music information retrieval research</article-title>. <source>Journal of Intelligent Information Systems</source> <volume>41</volume>, <issue>3</issue> (2013), <fpage>523</fpage>-<lpage>539</lpage>.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] Markus Schedl, Hamed Zamani, Ching-Wei Chen, Yashar Deldjoo, and Mehdi Elahi. <year>2018</year>. <article-title>Current challenges and visions in music recommender systems research</article-title>. <source>International Journal of Multimedia Information Retrieval</source> <volume>7</volume>, <issue>2</issue> (2018), <fpage>95</fpage>-<lpage>116</lpage>.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] Morgan K Ward, Joseph K Goodman, and Julie R Irwin. <year>2014</year>. <article-title>The same old song: The power of familiarity in music choice</article-title>. <source>Marketing Letters</source> <volume>25</volume>, <issue>1</issue> (2014), <fpage>1</fpage>-<lpage>11</lpage>.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] Eelco C. E. J. Wiechert. <year>2018</year>. <article-title>The peak-end effect in musical playlist experiences</article-title>. Master's thesis. Eindhoven University of Technology.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] Martijn C. Willemsen, Mark P. Graus, and Bart P. Knijnenburg. <year>2016</year>. <article-title>Understanding the role of latent feature diversification on choice difficulty and satisfaction</article-title>. <source>User Modeling and User-Adapted Interaction</source> <volume>26</volume>, <issue>4</issue> (2016), <fpage>347</fpage>-<lpage>389</lpage>. https://doi.org/10.1007/s11257-016-9178-6</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>