How Automated Recommendations Affect the Playlist Creation Behavior of Users

Iman Kamehkhosh, TU Dortmund, Germany (iman.kamehkhosh@tu-dortmund.de)
Dietmar Jannach, AAU Klagenfurt, Klagenfurt, Austria (dietmar.jannach@aau.at)
Geoffray Bonnin, LORIA, Nancy, France (bonnin@loria.fr)


ABSTRACT
Modern music platforms like Spotify support users in creating new playlists through interactive tools. Given an empty or initial playlist, these tools often recommend additional songs that could be included in the playlist based, e.g., on the title of the playlist or on the set of tracks that are already in the playlist. In this work, we analyze in which ways the recommendations of such playlist construction support tools influence the behavior of users and the characteristics of the resulting playlists. We report the results of a between-subjects user study involving 123 subjects. Our analysis shows that users provided with recommendation support were more engaged and explored more alternatives than the control group. Presumably influenced by the recommender, they also picked significantly less popular items, which leads to a higher potential for discovery. The effort required to browse the additional alternatives, however, increased the users’ perceived difficulty of the process.

ACM Classification Keywords
Information Retrieval: Recommender Systems; Human-Computer Interaction (HCI): User Studies

Author Keywords
Music Recommendation; Playlist Creation; User Study

INTRODUCTION
Creating and sharing playlists is a common feature of most of today’s music platforms. The manual construction of playlists by users can, however, be a comparably complicated and time-consuming task [6]. One way to help playlist creators is to provide them with automated suggestions for additional tracks while they are creating a playlist. Such functionality can be found on some of today’s music platforms, like Spotify and Pandora. In the literature, a number of algorithms have been proposed to determine a set of suitable tracks given, e.g., a partial playlist [5]. The evaluation of such algorithms is mostly based on offline experiments with a focus on prediction accuracy, i.e., on the algorithms’ capability of predicting the next tracks that were picked by users.

While accurate track relevance predictions are important, such experiments cannot inform us about whether users will actually adopt the recommendation functionality and how the recommendations influence their behavior. With this work, we aim to obtain a better understanding of these aspects, which, to our knowledge, have not been studied in this form in the literature before. We conducted a between-subjects user study involving 123 subjects, in which the participants’ task was to create playlists for a given topic using a web application that was developed for the study. One group of participants was provided with additional recommendation functionality, whereas the control group could only rely on the provided search functionality. A post-task questionnaire was used to assess additional aspects like the perceived difficulty of the task.

Our analyses revealed that the recommendation support was well accepted by the participants and that recommendations therefore represent a valuable tool for users. Almost half of the users who were provided with recommendations picked at least one of the recommended tracks, which is an unusually high proportion for the domain of recommender systems. Furthermore, we could observe that the selection of tracks was seemingly influenced by the recommendations even if no recommended track was actually included in the playlists, i.e., the recommendations served as inspiration for the participants. Participants with recommendation support also explored significantly more tracks within the same period of time, and they more often chose less popular tracks based on the suggestions of the recommender. As users are more engaged and explore more options from the long tail, the recommender therefore measurably increases the potential of discovering new tracks or artists. This increased user engagement can also help providers to collect more information about the users’ preferences.

Finally, the post-task questionnaire, to some surprise, revealed that the participants found the playlist creation task to be more difficult when recommendations were provided. This was the case even when they did not actually pick any of the recommended tracks. This observation suggests that the user interface (UI) has to be carefully designed, since part of this effect might be caused by the increased complexity of the web application that included a recommendation component.

PREVIOUS USER STUDIES
Compared to the number of papers that report results of offline experiments, the number of user studies is limited. Many of these previous user studies in the music domain focus on how
users search for music and on the social or contextual aspects
of listening to music [6, 7, 8, 19, 20]. In [6], for example,
interviews and web forum posts related to the construction
of playlists were analyzed. The authors found that playlists
(mixes) are often created with a theme or topic in mind, e.g., a
genre, an event or a mood. In our study, we therefore also ask
participants to create a playlist for one of several given topics.
According to the suggestion in [6], our online application
does not automatically include additional tracks, but presents
recommendations as side information.
Questions of UI design were also the focus of [3] and [24]. For instance, [3] proposed an interactive track recommendation service called rush, optimized for touchscreen devices, and, among other aspects, analyzed its usability for left-handed and right-handed users. In the application we used for our study, recommendations were positioned as a horizontal list at the bottom of the screen, which is also common for e-commerce sites like Amazon.com.

A few studies explore the users’ interactions with music recommender systems [1, 16] and the quality perception of music recommendations [2, 17]. The recent work of [17] provided evidence that users prefer recommendations that are coherent with the recently played tracks in different dimensions. Their results also indicated that the participants tend to evaluate recommendations better when they know the track or the artist. In contrast to [17], we do not compare different recommendation algorithms but study the effects of the presence of a recommender.

Finally, questions related to the factors that influence which tracks users choose for inclusion in a playlist in different situations were analyzed in [15, 23] and [25]. The study in [15], for example, showed that mood, genre, and artists are the most important factors for users when selecting tracks, which is in line with the outcomes of the study of [6]. Similar to their work, we explicitly asked users about their decision factors after the task and report the results in the section on results and observations.

STUDY DESIGN
A general limitation of laboratory studies is that users might behave differently in a “simulated” situation than when they normally listen to music, e.g., at home. To alleviate this problem, we provided an online application that enabled users to participate in the study when and where they wanted to.

All participants were asked to create a playlist – using the developed application – for one of the following pre-defined and randomly ordered themes: rock night, road trip, chill out, dance party, hip-hop club. (Note that such themes actually also convey an intended use or purpose. The best example is the “road trip” theme, whose corresponding playlists are supposed to be played when driving a car for a prolonged amount of time.) The participants were randomly assigned to one of two groups. One group (called Rec) received additional recommendations, as shown at the bottom of Figure 1. The control group (NoRec) was shown the same interface but without the recommendation bar at the bottom.

[Figure 1. Web application used in the study.]

All participants could use the provided search functionality. When a user of the Rec group added an item to the playlist, the 20 provided recommendations were immediately updated (recommendations were displayed after the first track was added). When the playlist contained at least six tracks, the participants could proceed to the post-task questionnaire.

The first part of the questionnaire was presented to all participants and contained a list of quality factors for playlists mentioned in the literature, which were either related to individual tracks (e.g., popularity or freshness) or to the list as a whole (e.g., artist homogeneity). The participants were asked to rank these quality factors or mark them as irrelevant.

In the next step, participants of the Rec group were asked if they had looked at the recommendations during the task and, if so, how they assessed their quality in the following dimensions: relevance, novelty, accuracy, diversity (in terms of genre and artist), familiarity, popularity, and freshness. Participants could express their agreement with our provided statements, e.g., “The recommendations were novel”, on a 7-point Likert scale item or state that they could not tell.

In the final step, all participants were asked (a) how often they create playlists, (b) about their musical expertise, and (c) how difficult they found the playlist creation task, again using 7-point Likert scale items. Free-text form fields were provided for users to specify which part of the process they considered the most difficult one and for general comments and feedback. Finally, the participants could specify their age group.

SPOTIFY’S RECOMMENDATION ALGORITHMS
Both the search and the recommendation functionality in the study were implemented using the public Web API of Spotify (https://developer.spotify.com/web-api/). This allowed us to rely on industry-strength search and recommendation technology. As of 2017, Spotify is a leader across most streaming-service markets (http://www.midiaresearch.com/downloads/midia-streaming-music-metrics-bundle/), and the recommendations produced by the service result from several years of A/B testing [21].

Companies do not usually reveal the details of the algorithms that they use. According to the documentation of the Web API of Spotify, the recommendations aim to create a playlist-style listening experience based on seed artists, tracks, and genres. In this context, the available information for a given seed item is used to find similar artists and tracks.
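For illustration, the two functions of our web application can be realized with a few calls against these documented endpoints. The following Python sketch is a simplified reconstruction, not our actual implementation; the function names and the token handling are ours, while the endpoints and parameters are those documented by Spotify.

    # Minimal sketch: track search (both study groups) and seed-based
    # recommendations (Rec group only). Authentication and error handling
    # are simplified for illustration.
    import requests

    API = "https://api.spotify.com/v1"

    def search_tracks(token, query, limit=20):
        """Free-text track search available to both study groups."""
        r = requests.get(API + "/search",
                         params={"q": query, "type": "track", "limit": limit},
                         headers={"Authorization": "Bearer " + token})
        r.raise_for_status()
        return r.json()["tracks"]["items"]

    def recommend_tracks(token, seed_track_ids, limit=20):
        """Recommendations seeded with the tracks already in the playlist;
        the endpoint accepts at most five seed values."""
        r = requests.get(API + "/recommendations",
                         params={"seed_tracks": ",".join(seed_track_ids[:5]),
                                 "limit": limit},
                         headers={"Authorization": "Bearer " + token})
        r.raise_for_status()
        return r.json()["tracks"]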
Some additional information can be obtained from what the company presents in public presentations and industry reports. Public presentations around the year 2014, such as [13] and [14], indicate that Spotify relied at that time mainly on collaborative filtering techniques for generating recommendations. The latter presentation was made right after Spotify acquired The Echo Nest, a music intelligence platform that focused on the analysis of audio content, and Spotify announced that they were also going to utilize content-based techniques. In more recent presentations, such as [22], the authors report that Spotify uses an ensemble of different techniques, including NLP models and Recurrent Neural Networks, as well as explicit feedback signals (e.g., thumbs-up / thumbs-down) and also audio features for certain recommendation tasks, but it is unclear from the presentations which techniques are used for which types of recommendations (radios, weekly recommendations, playlists, etc.).

Table 1. Description of the collected information for the tracks, as provided by Spotify (https://developer.spotify.com/web-api/get-audio-features/).

  Information        Description
  Acousticness       Absence of electrical modifications in a track.
  Danceability       Suitability of a track for dancing, based on various
                     information including the beat strength, tempo, and
                     the stability of the rhythm.
  Energy             Intensity released throughout a track, based on various
                     information including the loudness and segment durations.
  Instrumentalness   Absence of vocal content in a track.
  Liveness           Presence of an audience in the recording.
  Loudness           Overall loudness of a track in decibels (dB).
  Popularity         Popularity of a track, based on its total number of
                     plays and the recency of those plays.
  Release year       Year of release of a track.
  Speechiness        Presence of spoken words in a track.
  Tempo              Speed of a track estimated in beats per minute (BPM).
  Valence            Musical positiveness conveyed by a track.

RESULTS AND OBSERVATIONS
General Statistics
Participants. Most of the 123 participants who completed the study were students of universities in Germany and Brazil; a smaller part was recruited via invitations on social network sites. Most (84%) of the participants were aged between 20 and 40. On a scale between 1 and 7, the median of the self-reported experience with music was 5, i.e., the majority of the participants considered themselves experienced or interested in music. Most of the participants, however, do not create playlists regularly; about 25% answered the question with a 5 or greater (median=3). The median of the perceived difficulty of the playlist creation task was 4, i.e., the majority of the subjects found the task comparably difficult. (The collected data is ordinal, i.e., a ranking of the response levels is possible. However, we cannot assume equidistance between the response levels, and reporting mean and standard deviation values is often considered questionable in the literature.)

Topics and playlist length. Rock night and road trip were the most often selected themes. Each of them was selected in about 30% of all trials. Chill out (20%), dance party (15%), and hip-hop club (5%) were less frequently chosen. The average time for the participants to create one playlist (with at least 6 tracks) was 7.29 minutes, and the created playlists contained, on average, 8.44 tracks.

Recommendation use. 57% of the participants were assigned to the Rec group (with recommendations). Almost half of these participants (49%) drag-and-dropped at least one of the recommended tracks to their playlists. We denote this group as RecUsed. The other half will be denoted as RecNotUsed.

Study Outcomes
Impact of Recommendations on Users and their Behavior
Adoption of recommendations. When users actively used the recommendations, they relied on them to a significant extent. In the end, about 38% of the tracks of the playlists that were created by users of the RecUsed group were taken from the recommendations (mean=3.2 recommendations per playlist). This is a strong indicator of the general usefulness of a recommendation component in this domain, considering that in general e-commerce settings sometimes only every 100th click of a user is on a recommendation list and recommendations are used in only about 8% of the shopping sessions [12].

Increased exploration. The participants who received recommendations played significantly more tracks when creating their playlist than the participants of the NoRec group (mean values of 14.4 and 9.8, respectively). (To test for statistical significance, we use the Mann-Whitney U test for the ordinal data and Student’s t-test for the interval data, both with p < 0.05.) This value is even higher for the RecUsed subgroup, i.e., those who actively used the recommendations, with an average of 20.3. However, the participants of the Rec group, on average, only needed 30 seconds more to create the playlists (a difference that was not statistically significant), i.e., the recommender helped them explore, and potentially discover, many more options in about the same time.

Such a high level of exploration is interesting, as it not only increases the chances of music discovery but also allows the service provider to gather more information about users’ preferences. This can in turn lead to better recommendations and a better listening experience for users, as well as higher customer retention and business value for the provider [10].

Difficulty of playlist creation. To some surprise, the recommendation component did not make the playlist creation task easier for users but slightly added to the perceived complexity, with a median of 4 (higher complexity) for the Rec group and 3 for the NoRec group. This slight but not statistically significant difference could be caused by the more complex recommendation UI, as well as by the fact that users explored more options in the Rec condition, as mentioned above (see, for example, [4] for a discussion of “choice overload” in recommender systems).
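For illustration, the following sketch shows how the two tests named above can be run with scipy.stats in Python; the response lists are invented toy values, not our study data.

    from scipy.stats import mannwhitneyu, ttest_ind

    # Ordinal data (7-point Likert responses): Mann-Whitney U test.
    difficulty_rec = [4, 5, 4, 6, 3, 5]      # hypothetical Rec ratings
    difficulty_norec = [3, 3, 4, 2, 4, 3]    # hypothetical NoRec ratings
    _, p_ordinal = mannwhitneyu(difficulty_rec, difficulty_norec,
                                alternative="two-sided")

    # Interval data (number of played tracks): Student's t-test.
    played_rec = [14, 18, 12, 20, 9, 15]     # hypothetical Rec counts
    played_norec = [10, 8, 11, 9, 12, 10]    # hypothetical NoRec counts
    _, p_interval = ttest_ind(played_rec, played_norec)

    print(p_ordinal < 0.05, p_interval < 0.05)  # significance at p < 0.05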
Impact of Recommendations on the Resulting Playlists
To analyze the impact of the provided recommendations on the resulting playlists, we queried the musical features of the tracks contained in the created playlists through Spotify’s API. Table 1 shows a list of these features. The average values and standard deviations of the features for each of the study groups are shown in Table 2. Several differences can be observed. We limit the discussion here to the most pronounced and statistically significant effects, which can be found with respect to the popularity and the freshness of the tracks that were used by the participants.

Table 2. Average (Avg) and standard deviation (Std) of the musical features of the resulting playlists in different groups. * indicates statistical significance in comparison with the RecUsed group.

                          RecUsed             RecNotUsed           NoRec
                      (34 participants)   (36 participants)   (53 participants)
  Feature               Avg      Std        Avg      Std        Avg      Std
  Acousticness          0.22     0.28      0.17*     0.23      0.17*    0.26
  Danceability          0.56     0.18      0.59*     0.17      0.54     0.17
  Energy                0.68     0.24      0.70      0.19      0.73*    0.23
  Instrumentalness      0.16     0.32      0.12      0.28      0.12     0.27
  Liveness              0.20     0.17      0.21      0.18      0.21     0.17
  Loudness (dB)        -7.68     4.59     -7.60      3.67     -7.52     4.72
  Popularity            50.7     21.9      55.7*     17.1      54.3*    21.3
  Release year          2005    12.47      2002*    15.73      2003*   13.08
  Speechiness           0.07     0.07      0.08      0.08      0.08     0.08
  Tempo (BPM)          123.0     28.7     122.5      27.9     122.9     28.6
  Valence               0.50     0.26      0.53      0.25      0.49     0.24
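As an illustration of how such per-playlist averages can be computed, the following sketch queries the documented audio-features endpoint; the helper name and the token handling are our own, and popularity and release year, which are part of the regular track objects, are not fetched here.

    # Sketch of the per-playlist feature aggregation, assuming an OAuth
    # token and the Spotify IDs of a playlist's tracks; the endpoint
    # accepts up to 100 comma-separated track IDs per request.
    import statistics
    import requests

    FEATURES = ["acousticness", "danceability", "energy", "instrumentalness",
                "liveness", "loudness", "speechiness", "tempo", "valence"]

    def playlist_feature_means(token, track_ids):
        r = requests.get("https://api.spotify.com/v1/audio-features",
                         params={"ids": ",".join(track_ids[:100])},
                         headers={"Authorization": "Bearer " + token})
        r.raise_for_status()
        rows = [row for row in r.json()["audio_features"] if row]  # skip nulls
        return {f: statistics.mean(row[f] for row in rows) for f in FEATURES}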
Popularity effect – promoting less popular items. Users of the recommendation service added significantly less popular tracks to their playlists. (The popularity of the tracks was also determined using Spotify’s API, with values between 0 and 100, from lowest to highest popularity.) This is in line with the observations from [11], where the recommendations by a commercial service were less popular (in terms of play counts) than the tracks that users selected manually.

Recency effect – promoting newer tracks. Using recommendations also slightly but statistically significantly increased the freshness (release year) of the selected tracks. About 50% of the tracks of the playlists created by the participants who used the recommendations (RecUsed) were released in the last five years. This value is 40% for the RecNotUsed group and 34% for the NoRec group.

“Mere-Presence” Effect of Recommendations
When we compare the average musical features of the tracks recommended to the Rec group and those that were manually selected by the control group (NoRec), we can observe several significant differences regarding, e.g., danceability, energy, popularity, freshness (release year), speechiness, or tempo, i.e., the recommender often picks quite different tracks than users would (see [11]). On the other hand, when comparing the recommendations to what the participants in the RecUsed group selected, the differences are no longer statistically significant, except for the popularity aspect. This is not surprising, as these participants often accepted the recommendations. The somewhat surprising aspect, however, is that the differences between the recommendations and the tracks of the RecNotUsed participants are also no longer significant. It is thus possible that the subjects in this group were biased (or inspired) by the presence of the recommendations. One indicator in favor of that possibility is that an overlap of 34% could be observed in terms of the artists that appeared in the recommendations and that were selected manually by the subjects. This means that the recommendations presumably influenced what users selected. This effect was previously investigated in a user study [18], where the participants exhibited a tendency to select items that were content-wise similar to a (random) recommendation.

A possible explanation why participants still selected tracks that are on average more popular than the recommended ones might lie in the intended use or context of the playlist to be created. If participants, for example, assumed that the playlist is designed to be played for a group of listeners (e.g., at a dance party), they might prefer to pick tracks that are presumably known by many people. In fact, the tracks selected for “dance” playlists were significantly more popular than those for “chill out” playlists.

Investigating Quality Criteria for Playlists
In order to better understand which quality characteristics one should consider when designing algorithms for playlist construction support, we analyzed the rankings that were provided by the participants in the post-task questionnaire (see the section on the study design). To determine the overall ranking, we used the Modified Borda Count method [9], which can be applied when some rankings are only partial, i.e., when not all items are ranked. The results are shown in Table 3.

Table 3. Modified Borda Count: Ranking of Playlist Quality Criteria.

  Criteria                                        All   RecUsed   RecNotUsed   NoRec
  Homogeneity of musical features, e.g., tempo    250      68         79        103
  Artist diversity                                195      55         62         78
  Transition                                      122      30         46         46
  Popularity                                      106      39         34         33
  Lyrics                                           95      32         34         29
  Order                                            74      12         33         29
  Freshness                                        32      12         11          9
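A minimal sketch of this scoring scheme follows: a participant who ranks only m of the criteria awards m points to the first one, m-1 to the second, and so on, while unranked criteria receive no points [9]. The example rankings below are invented for illustration.

    from collections import Counter

    def modified_borda_count(partial_rankings):
        """Each ranking is ordered best-first and may cover only a
        subset of the criteria (partial voting, cf. [9])."""
        scores = Counter()
        for ranking in partial_rankings:
            m = len(ranking)
            for position, criterion in enumerate(ranking):
                scores[criterion] += m - position  # m, m-1, ..., 1 points
        return scores.most_common()

    rankings = [["homogeneity", "artist diversity", "transition"],
                ["artist diversity", "homogeneity"],  # a partial ranking
                ["popularity", "homogeneity", "lyrics", "order"]]
    print(modified_borda_count(rankings))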
The results indicate that, overall, the participants consider the homogeneity of musical features, e.g., tempo, energy, or loudness, along with the artist diversity of the resulting playlist, as the most relevant quality criteria for playlists. On the other end of the spectrum, the order of the tracks in a playlist and their freshness were the least relevant aspects for the participants.

When looking at the different study groups, some smaller and not statistically significant differences in the rankings can be observed. Participants who used the recommendations considered track transitions less relevant than participants of the other groups. One potential explanation could be that the recommendations by the system (and, likewise, the created playlists) were perceived by users to be comparably coherent, e.g., in terms of the tempo, and the participants of the Rec group therefore paid less attention to the transitions. Another
explanation is that the RecNotUsed participants are in general more demanding and did not use some of the recommendations because these did not allow for satisfying transitions.

Other differences with respect to the quality criteria rankings were observed for the “lyrics” aspect, which was considered more important for road-trip and hip-hop playlists. Finally, popularity was considered a very important criterion almost exclusively for dance playlists, for the reasons discussed above.

General Quality Perception of the Recommendations
About 75% of the participants in the Rec group stated that they had looked at the recommendations when constructing the playlists. In the post-task questionnaire, these users were asked about their quality perception regarding the recommendations (as provided by Spotify’s service). Different quality dimensions were assessed, and users could express their agreement with related statements using 7-point Likert scale items. In our analysis, we considered answers that were greater than 4 as positive responses.

Specifically, we asked users to what extent the recommendations (1) matched the given topic; (2) helped them discover novel tracks or artists; (3) matched their general interests; (4) were already known to them; (5) were diverse in terms of genre and artist; (6) were generally popular or mainstream; and (7) were from trending music.

The results show that 62% of the respondents perceived the recommendations as topic-related (i.e., as matching the playlist topic). More than half of the respondents also found the recommendations to match their interests and to be diverse in terms of genre and artist. With respect to novelty and familiarity, the results were mixed. In both cases, many participants provided feedback with values above 4 (e.g., 45% in the case of novelty); a substantial share of the participants, however, also found the recommendations to be of very limited novelty and familiarity.

We further compared the quality perception of the recommendations for the two groups RecUsed and RecNotUsed. The only statistically significant difference we observed was in terms of novelty. In other words, those participants who used the recommendations in their playlists found the recommendations to be significantly more novel (median=5) than the participants who looked at the recommendations but did not accept them (median=3). One plausible explanation of this result could be that the participants who did not use the recommendations had a higher music expertise and were more demanding. However, this explanation is not in line with what the participants claimed in their answers, as the RecUsed group had a generally higher music expertise score than the RecNotUsed group (although the difference was not statistically significant). Another explanation is that participants were generally willing to discover novel music, and the recommendations were mainly adopted by those participants who found them to be novel.

Overall, the results indicate that Spotify’s algorithms were quite successful in determining suitable tracks for recommendation. At the same time, the mixed results for some other factors are not necessarily negative either, because the importance of factors like the appropriate popularity level of the tracks – according to our analysis – can depend on the topic and intended purpose of the playlist.

Qualitative Analysis
We finally asked the participants to specify what they found to be the most difficult aspect of creating a playlist. To remember the right tracks was most often mentioned as a difficult task by the participants (e.g., “Keeping up with the options. I often can’t remember all the songs and performers I know”). To find good music for everyone was another difficult task indicated by many (e.g., “To meet the taste of all, as playlists are often created for occasions in which several people listen to them”). Some participants also emphasized the smoothness of the track transitions (e.g., “The most difficult thing is to think of tracks that are connected on several dimensions at the same time such as their energy, the transitions, their level of sophistication, etc.”).

Recommendation tools have the potential to help users in many of these dimensions, e.g., to remind them of tracks or artists they know and liked in the past, to find tracks that are popular in a certain community, or to find tracks that are coherent with the current playlist with respect to musical features. In fact, several participants who had no recommendation support wrote about the potential advantages of having such a system: “Sometimes I do not know what to search for and I expect the system suggest something. I prefer to choose just some songs and then listen/receive recommendations related to my initial choices”.

CONCLUSIONS
With our study, we aim to shed more light on the perception and adoption of recommender systems that are designed to support users in the playlist construction process. The results of the study not only showed a high general adoption rate – almost half of the participants picked at least one track from the provided recommendations, while the proportion in the domain of recommender systems is often lower than 10% – but also that the presentation of recommendations can indirectly influence the choices of the users. Our study therefore provides additional evidence that recommenders can be a valuable tool to steer consumer behavior.

Generally, participants who received recommendations also explored more alternatives during the process (without needing much more time), which can be interpreted as higher engagement in the task. As a result, a recommender system has the potential to increase the involvement of the user with the service, which from a business perspective in the best case leads to higher customer retention rates.

Finally, the study also showed that including a recommendation component can lead to a higher perceived difficulty of the task for users, which emphasizes the importance of focusing on intuitive UI designs.

ACKNOWLEDGEMENTS
We would like to thank Adrian Skirzynski, who contributed to this work with the results of his master’s thesis.
REFERENCES
1. Ivana Andjelkovic, Denis Parra, and John O’Donovan. 2016. Moodplay: Interactive Mood-based Music Discovery and Recommendation. In UMAP ’16. 275–279.
2. Luke Barrington, Reid Oda, and Gert R. G. Lanckriet. 2009. Smarter than Genius? Human Evaluation of Music Recommender Systems. In ISMIR ’09. 357–362.
3. Dominikus Baur, Sebastian Boring, and Andreas Butz. 2010. Rush: Repeated Recommendations on Mobile Devices. In IUI ’10. 91–100.
4. Dirk Bollen, Bart P. Knijnenburg, Martijn C. Willemsen, and Mark Graus. 2010. Understanding Choice Overload in Recommender Systems. In RecSys ’10. 63–70.
5. Geoffray Bonnin and Dietmar Jannach. 2014. Automated Generation of Music Playlists: Survey and Experiments. ACM Computing Surveys 47, 2 (2014), 26:1–26:35.
6. Sally Jo Cunningham, David Bainbridge, and Annette Falconer. 2006. ‘More of an Art than a Science’: Supporting the Creation of Playlists and Mixes. In ISMIR ’06. 240–245.
7. Sally Jo Cunningham, David Bainbridge, and Dana McKay. 2007. Finding New Music: A Diary Study of Everyday Encounters with Novel Songs. In ISMIR ’07. 83–88.
8. J. Stephen Downie. 2003. Music Information Retrieval. Annual Review of Information Science and Technology 37, 1 (2003), 295–340.
9. Peter Emerson. 2013. The Original Borda Count and Partial Voting. Social Choice and Welfare 40, 2 (2013), 353–358.
10. Dietmar Jannach and Gedas Adomavicius. 2016. Recommendations with a Purpose. In RecSys ’16. 7–10.
11. Dietmar Jannach, Iman Kamehkhosh, and Geoffray Bonnin. 2016. Biases in Automated Music Playlist Generation: A Comparison of Next-Track Recommending Techniques. In UMAP ’16. 281–285.
12. Dietmar Jannach, Malte Ludewig, and Lukas Lerche. 2017. Session-based Item Recommendation in E-Commerce: On Short-Term Intents, Reminders, Trends, and Discounts. User Modeling and User-Adapted Interaction (2017).
13. Chris Johnson. 2014. Algorithmic Music Discovery at Spotify. Online. (2014). https://de.slideshare.net/MrChrisJohnson/algorithmic-music-recommendations-at-spotify
14. Chris Johnson and Edward Newett. 2014. From Idea to Execution: Spotify’s Discover Weekly. Online. (2014). https://de.slideshare.net/MrChrisJohnson/from-idea-to-execution-spotifys-discover-weekly/12-Insight_users_spending_more_time
15. Mohsen Kamalzadeh, Dominikus Baur, and Torsten Möller. 2012. A Survey on Music Listening and Management Behaviours. In ISMIR ’12. 373–378.
16. Mohsen Kamalzadeh, Christoph Kralj, Torsten Möller, and Michael Sedlmair. 2016. TagFlip: Active Mobile Music Discovery with Social Tags. In IUI ’16. 19–30.
17. Iman Kamehkhosh and Dietmar Jannach. 2017. User Perception of Next-Track Music Recommendations. In UMAP ’17. 113–121.
18. Sören Köcher, Dietmar Jannach, Michael Jugovac, and Hartmut H. Holzmüller. 2016. Investigating Mere-Presence Effects of Recommendations on the Consumer Choice Process. In IntRS Workshop at RecSys ’16. 2–5.
19. Alexandra Lamont and Rebecca Webb. 2011. Short- and Long-term Musical Preferences: What Makes a Favourite Piece of Music? Psychology of Music 38, 2 (2011), 222–241.
20. Jin Ha Lee and J. Stephen Downie. 2004. Survey of Music Information Needs, Uses, and Seeking Behaviours: Preliminary Findings. In ISMIR ’04. 441–446.
21. Ali Sarrafi and Evan Shrubsole. 2015. A/B Testing at Spotify. Online. (2015). https://de.slideshare.net/alisarrafi3/ab-testing-at-spotify
22. Harald Steck, Roelof van Zwol, and Chris Johnson. 2015. Interactive Recommender Systems with Netflix and Spotify. Online. (2015). http://de.slideshare.net/MrChrisJohnson/interactive-recommender-systems-with-netflix-and-spotify
23. S. Stumpf and S. Muscroft. 2011. When Users Generate Music Playlists: When Words Leave Off, Music Begins? In ICME ’11. 1–6.
24. Kirsten Swearingen and Rashmi Sinha. 2002. Interaction Design for Recommender Systems. In Designing Interactive Systems. ACM.
25. Nava Tintarev, Christoph Lofi, and Cynthia C. S. Liem. 2017. Sequences of Diverse Song Recommendations: An Exploratory Study in a Commercial System. In UMAP ’17. 391–392.