How Automated Recommendations Affect the Playlist Creation Behavior of Users

Iman Kamehkhosh, TU Dortmund, Germany, iman.kamehkhosh@tu-dortmund.de
Dietmar Jannach, AAU Klagenfurt, Klagenfurt, Austria, dietmar.jannach@aau.at
Geoffray Bonnin, LORIA, Nancy, France, bonnin@loria.fr

ABSTRACT
Modern music platforms like Spotify support users in creating new playlists through interactive tools. Given an empty or initial playlist, these tools often recommend additional songs that could be included in the playlist, based, e.g., on the title of the playlist or the set of tracks that are already in it. In this work, we analyze in which ways the recommendations of such playlist construction support tools influence the behavior of users and the characteristics of the resulting playlists. We report the results of a between-subjects user study involving 123 subjects. Our analysis shows that users provided with recommendation support were more engaged and explored more alternatives than the control group. Presumably influenced by the recommender, they also picked significantly less popular items, which leads to a higher potential for discovery. The effort required to browse the additional alternatives, however, increased the users' perceived difficulty of the process.

ACM Classification Keywords
Information Retrieval: Recommender Systems; Human Computer Interaction (HCI): User Studies

Author Keywords
Music Recommendation; Playlist Creation; User Study

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. MILC '18, March 11, 2018, Tokyo, Japan.

INTRODUCTION
Creating and sharing playlists is a common feature of most of today's music platforms. The manual construction of playlists by users can, however, be a comparably complicated and time-consuming task [6].
One way to help playlist creators is to provide them with automated suggestions for additional tracks while they are creating a playlist. Such a functionality can be found on some of today's music platforms, like Spotify and Pandora. In the literature, a number of algorithms have been proposed to determine a set of suitable tracks given, e.g., a partial playlist [5]. The evaluation of such algorithms is mostly based on offline experiments with a focus on prediction accuracy, i.e., on the algorithms' capability of predicting the next tracks that were picked by users.

While accurate track relevance predictions are important, such experiments cannot inform us about whether users will actually adopt the recommendation functionality and how the recommendations influence their behavior. With this work, we aim to obtain a better understanding of these aspects, which, to our knowledge, have not been studied in this form in the literature before. We conducted a between-subjects user study involving 123 subjects, where the participants' task was to create playlists for a given topic using a web application that was developed for the study. One group of participants was provided with additional recommendation functionality, whereas the control group could only rely on the provided search functionality. A post-task questionnaire was used to assess additional aspects like the perceived difficulty of the task.

Our analyses revealed that the recommendation support was well accepted by the participants and that recommendations therefore represent a valuable tool for users. Almost half of the users who were provided with recommendations picked at least one of the recommended tracks, which is an unusually high proportion for the domain of recommender systems. Furthermore, we could observe that the selection of tracks was seemingly influenced by the recommendations even if no recommended track was actually included in the playlists, i.e., the recommendations served as inspiration for the participants. Participants with recommendation support also explored significantly more tracks within the same period of time, and they more often chose less popular tracks based on the suggestions of the recommender. As users are more engaged and explore more options from the long tail, the recommender therefore measurably increases the potential of discovering new tracks or artists. Also, this increased user engagement can help providers collect more information about the users' preferences.

Finally, the post-task questionnaire, to some surprise, revealed that the participants found the playlist creation task to be more difficult when recommendations were provided. This was the case even when they actually did not pick any of the recommended tracks. This observation suggests that the user interface (UI) has to be carefully designed, since part of this effect might be caused by the increased complexity of the web application that included a recommendation component.

PREVIOUS USER STUDIES
Compared to the number of papers that report results of offline experiments, the number of user studies is limited. Many of these previous user studies in the music domain focus on how users search for music and on the social or contextual aspects of listening to music [6, 7, 8, 19, 20]. In [6], for example, interviews and web forum posts related to the construction of playlists were analyzed. The authors found that playlists (mixes) are often created with a theme or topic in mind, e.g., a genre, an event, or a mood. In our study, we therefore also ask participants to create a playlist for one of several given topics. Following a suggestion in [6], our online application does not automatically include additional tracks, but presents recommendations as side information.

Questions of UI design were also the focus of [3] and [24]. For instance, [3] proposed an interactive track recommendation service called rush, optimized for touchscreen devices, and, among other aspects, analyzed its usability for left-handed and right-handed users. In the application we used for our study, recommendations were positioned as a horizontal list at the bottom of the screen (Figure 1), which is also common for e-commerce sites like Amazon.com.

[Figure 1. Web application used in the study.]

A few studies explore the users' interactions with music recommender systems [1, 16] and the quality perception of music recommendations [2, 17]. The recent work of [17] provided evidence that users prefer recommendations that are coherent with the recently played tracks in different dimensions.
Their results also indicated that the participants tend to evaluate recommendations better when they know the track or the artist. In contrast to [17], we do not compare different recommendation algorithms but study the effects of the existence of a recommender.

Finally, questions related to the factors that influence which tracks users choose for inclusion in a playlist in different situations were analyzed in [15, 23] and [25]. The study in [15], for example, showed that mood, genre, and artists are the most important factors for users when selecting tracks, which is in line with the outcomes of the study of [6]. Similar to their work, we explicitly asked users about their decision factors after the task and report the results in the section on results and observations.

STUDY DESIGN
A general limitation of laboratory studies is that users might behave differently in a "simulated" situation than when they normally listen to music, e.g., at home. To alleviate this problem, we provided an online application to enable users to participate in the study when and where they wanted to.

All participants were asked to create a playlist – using the developed application – for one of the following pre-defined and randomly ordered themes: rock night, road trip, chill out, dance party, hip-hop club. Note that such themes actually also convey an intended use or purpose; the best example is the "road trip" theme, whose corresponding playlists are supposed to be played when driving a car for a prolonged amount of time. The participants were randomly assigned to one of two groups. One group (called Rec) received additional recommendations, as shown at the bottom of Figure 1. The control group (NoRec) was shown the same interface but without the recommendation bar at the bottom. All participants could use the provided search functionality. When a user of the Rec group added an item to the playlist, the 20 provided recommendations were immediately updated (recommendations were displayed after the first track was added). When the playlist contained at least six tracks, the participants could proceed to the post-task questionnaire.

The first part of the questionnaire was presented to all participants and contained a list of quality factors for playlists mentioned in the literature, which were either related to individual tracks (e.g., popularity or freshness) or to the list as a whole (e.g., artist homogeneity). The participants were asked to rank these quality factors or mark them as irrelevant.

In the next step, participants of the Rec group were asked if they had looked at the recommendations during the task and, if so, how they assessed their quality in the following dimensions: relevance, novelty, accuracy, diversity (in terms of genre and artist), familiarity, popularity, and freshness. Participants could express their agreement with our provided statements, e.g., "The recommendations were novel", on a 7-point Likert scale item or state that they could not tell.

In the final step, all participants were asked (a) how often they create playlists, (b) about their musical expertise, and (c) how difficult they found the playlist creation task, again using 7-point Likert scale items. Free text form fields were provided for users to specify which part of the process they considered the most difficult and to leave general comments and feedback. Finally, the participants could specify their age group.

SPOTIFY'S RECOMMENDATION ALGORITHMS
Both the search and the recommendation functionality in the study were implemented using the public Web API of Spotify (https://developer.spotify.com/web-api/). This allowed us to rely on industry-strength search and recommendation technology. As of 2017, Spotify is a leader across most streaming-service markets (http://www.midiaresearch.com/downloads/midia-streaming-music-metrics-bundle/) and the recommendations produced by the service result from several years of A/B testing [21].
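The study application itself is not published with the paper, but the two Spotify Web API endpoints it relies on are public. The following Python sketch illustrates how such a search-plus-recommendation backend could be wired together; the function names, the seeding strategy, and the assumption that an OAuth access token is already available are ours for illustration, not details of the actual study implementation.

    # Minimal sketch (not the study's actual code) of the two Spotify Web API
    # calls the application relies on. An OAuth access token is assumed to
    # have been obtained beforehand, e.g., via the client-credentials flow.
    import requests

    API = "https://api.spotify.com/v1"

    def search_tracks(token, query, limit=10):
        # Full-text track search, available to both study groups.
        response = requests.get(
            f"{API}/search",
            headers={"Authorization": f"Bearer {token}"},
            params={"q": query, "type": "track", "limit": limit},
        )
        response.raise_for_status()
        return response.json()["tracks"]["items"]

    def recommend_tracks(token, playlist_track_ids, limit=20):
        # Recommendations for the Rec group, refreshed after each added track.
        # The endpoint accepts at most five seed values, so we seed it with
        # the most recently added tracks (an assumption on our part).
        seeds = ",".join(playlist_track_ids[-5:])
        response = requests.get(
            f"{API}/recommendations",
            headers={"Authorization": f"Bearer {token}"},
            params={"seed_tracks": seeds, "limit": limit},
        )
        response.raise_for_status()
        return response.json()["tracks"]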
Companies do not usually reveal the details of the algorithms that they use. According to the documentation of Spotify's Web API, the recommendations aim to create a playlist-style listening experience based on seed artists, tracks, and genres. In this context, the available information for a given seed item is used to find similar artists and tracks.

Some additional information can be obtained from what the company presents in public presentations and industry reports. Public presentations around the year 2014, such as [13] and [14], indicate that Spotify at that time relied mainly on collaborative filtering techniques for generating recommendations. The latest of these presentations was made right after Spotify acquired The Echo Nest, a music intelligence platform that focused on the analysis of audio content, and Spotify announced that they were also going to utilize content-based techniques. In more recent presentations, such as [22], the authors report that Spotify uses an ensemble of different techniques, including NLP models and Recurrent Neural Networks, as well as explicit feedback signals (e.g., thumbs-up / thumbs-down) and audio features for certain recommendation tasks. It is, however, unclear from the presentations which techniques are used for which types of recommendations (radios, weekly recommendations, playlists, etc.).

Table 1. Description of the collected information for the tracks, as provided by Spotify (https://developer.spotify.com/web-api/get-audio-features/).

Information        Description
Acousticness       Absence of electrical modifications in a track.
Danceability       Suitability of a track for dancing, based on various information including the beat strength, tempo, and the stability of the rhythm.
Energy             Intensity released throughout a track, based on various information including the loudness and segment durations.
Instrumentalness   Absence of vocal content in a track.
Liveness           Presence of an audience in the recording.
Loudness           Overall loudness of a track in decibels (dB).
Popularity         Popularity of a track, based on its total number of plays and the recency of those plays.
Release year       Year of release of a track.
Speechiness        Presence of spoken words in a track.
Tempo              Speed of a track estimated in beats per minute (BPM).
Valence            Musical positiveness conveyed by a track.
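As an illustration of how the information in Table 1 can be collected, the following sketch queries Spotify's audio-features endpoint for the acoustic attributes and the regular tracks endpoint for popularity and release year. The helper function and its simplifications (no pagination, no retry handling) are our assumptions, not the authors' analysis code.

    # Sketch of collecting the Table 1 information for a finished playlist.
    # The acoustic attributes come from the audio-features endpoint; the
    # popularity value and the release year are part of the regular track
    # objects and are therefore queried separately.
    import requests

    API = "https://api.spotify.com/v1"

    def playlist_features(token, track_ids):
        headers = {"Authorization": f"Bearer {token}"}
        ids = ",".join(track_ids[:50])  # the tracks endpoint allows up to 50 ids
        audio = requests.get(f"{API}/audio-features", headers=headers,
                             params={"ids": ids}).json()["audio_features"]
        tracks = requests.get(f"{API}/tracks", headers=headers,
                              params={"ids": ids}).json()["tracks"]
        rows = []
        for features, track in zip(audio, tracks):  # both preserve input order
            if features is None:  # some tracks have no audio analysis
                continue
            row = {name: features[name] for name in (
                "acousticness", "danceability", "energy", "instrumentalness",
                "liveness", "loudness", "speechiness", "tempo", "valence")}
            row["popularity"] = track["popularity"]  # 0 (lowest) to 100 (highest)
            row["release_year"] = int(track["album"]["release_date"][:4])
            rows.append(row)
        return rows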
RESULTS AND OBSERVATIONS

General Statistics
Participants. Most of the 123 participants who completed the study were students of universities in Germany and Brazil; a smaller part was recruited via invitations on social network sites. Most (84%) of the participants were aged between 20 and 40. On a scale between 1 and 7, the median of the self-reported experience with music was 5, i.e., the majority of the participants considered themselves experienced or interested in music. Most of the participants, however, do not create playlists regularly; about 25% answered the question with a 5 or greater (median=3). The median of the perceived difficulty of the playlist creation task was 4, i.e., the majority of the subjects found the task comparably difficult. (The collected data is ordinal, i.e., a ranking of the response levels is possible. However, we cannot assume equidistance between the response levels, and reporting mean and standard deviation values is often considered questionable in the literature.)

Topics and playlist length. Rock night and road trip were the most often selected themes; each of them was selected in about 30% of all trials. Chill out (20%), dance party (15%), and hip-hop club (5%) were less frequently chosen. The average time for the participants to create one playlist (with at least 6 tracks) was 7.29 minutes, and a created playlist contained, on average, 8.44 tracks.

Recommendation use. 57% of the participants were assigned to the Rec group (with recommendations). Almost half of these participants (49%) dragged and dropped at least one of the recommended tracks into their playlists. We denote this group as RecUsed; the other half will be denoted as RecNotUsed.

Study Outcomes

Impact of Recommendations on Users and their Behavior
Adoption of recommendations. When users actively used the recommendations, they relied on them to a significant extent. In the end, about 38% of the tracks of the playlists that were created by users of the RecUsed group were taken from the recommendations (mean=3.2 recommendations per playlist). This is a strong indicator of the general usefulness of a recommendation component in this domain, considering that in general e-commerce settings sometimes only every 100th click of a user is on a recommendation list and recommendations are used in only about 8% of the shopping sessions [12].

Increased exploration. The participants who received recommendations played significantly more tracks when creating their playlist than the participants of the NoRec group (mean values of 14.4 and 9.8, respectively). (To test for statistical significance for the ordinal data, we use the Mann-Whitney U test, and for the interval data we use Student's t-test, both with p < 0.05.) This value is even higher for the RecUsed subgroup, i.e., those who actively used the recommendations, with an average of 20.3. However, the participants of the Rec group, on average, only needed 30 seconds more to create their playlists – a difference that was not statistically significant – i.e., the recommender helped them explore, and potentially discover, many more options in about the same time.

Such a high level of exploration is interesting, as it not only increases the chances of music discovery but also allows the service provider to gather more information about users' preferences. This can in turn lead to better recommendations and a better listening experience for users, as well as higher customer retention and business value for the provider [10].

Difficulty of playlist creation. To some surprise, the recommendation component did not make the playlist creation task easier for users but slightly added to the perceived complexity, with a median of 4 (higher complexity) for the Rec group and 3 for the NoRec group. This slight but not statistically significant difference could be caused by the more complex recommendation UI, as well as by the fact that users explored more options in the Rec condition, as mentioned above (see, for example, [4] for a discussion of "choice overload" in recommender systems).
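For readers who want to reproduce this kind of analysis, the following sketch shows how the two significance tests named above can be run with SciPy. The input numbers are made up for illustration and are not the study data.

    # Sketch of the two significance tests used in this section: the
    # Mann-Whitney U test for ordinal data and Student's t-test for interval
    # data, both at p < 0.05. The numbers below are made up for illustration.
    from scipy import stats

    def compare_groups(group_a, group_b, ordinal, alpha=0.05):
        if ordinal:
            statistic, p_value = stats.mannwhitneyu(group_a, group_b,
                                                    alternative="two-sided")
        else:
            statistic, p_value = stats.ttest_ind(group_a, group_b)
        return statistic, p_value, p_value < alpha

    # Hypothetical numbers of played tracks (interval data):
    print(compare_groups([14, 18, 12, 21, 9, 16], [8, 11, 10, 7, 12, 9],
                         ordinal=False))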
Impact of Recommendations on the Resulting Playlists
To analyze the impact of the provided recommendations on the resulting playlists, we queried the musical features of the tracks contained in the created playlists through Spotify's API. Table 1 shows a list of these features. The average values and standard deviations of the features for each of the study groups are shown in Table 2. Several differences can be observed. We limit the discussion here to the most pronounced and statistically significant effects, which can be found with respect to the popularity and the freshness of the tracks that were used by the participants.

Table 2. Average (Avg) and standard deviation (Std) of the musical features of the resulting playlists in different groups. * indicates statistical significance in comparison with the RecUsed group.

                     RecUsed          RecNotUsed       NoRec
Feature              Avg     Std      Avg     Std      Avg     Std
Acousticness         0.22    0.28     0.17*   0.23     0.17*   0.26
Danceability         0.56    0.18     0.59*   0.17     0.54    0.17
Energy               0.68    0.24     0.70    0.19     0.73*   0.23
Instrumentalness     0.16    0.32     0.12    0.28     0.12    0.27
Liveness             0.20    0.17     0.21    0.18     0.21    0.17
Loudness (dB)        -7.68   4.59     -7.60   3.67     -7.52   4.72
Popularity           50.7    21.9     55.7*   17.1     54.3*   21.3
Release year         2005    12.47    2002*   15.73    2003*   13.08
Speechiness          0.07    0.07     0.08    0.08     0.08    0.08
Tempo (BPM)          123.0   28.7     122.5   27.9     122.9   28.6
Valence              0.50    0.26     0.53    0.25     0.49    0.24

Popularity effect – promoting less popular items. Users of the recommendation service added significantly less popular tracks to their playlists. (The popularity of the tracks was also determined using Spotify's API, with values between 0 and 100, from lowest to highest popularity.) This is in line with the observations from [11], where the recommendations of a commercial service were less popular (in terms of play counts) than the tracks that users selected manually.

Recency effect – promoting newer tracks. Using recommendations also slightly but statistically significantly increased the freshness (release year) of the selected tracks.
About 50% of the tracks of the playlists created by the participants who used the recommendations (RecUsed) were released in the last five years. This value is 40% for the RecNotUsed group and 34% for the NoRec group.

"Mere-Presence" Effect of Recommendations
When we compare the average musical features of the tracks recommended to the Rec group and those that were manually selected by the control group (NoRec), we can observe several significant differences regarding, e.g., danceability, energy, popularity, freshness (release year), speechiness, and tempo, i.e., the recommender often picks quite different tracks than users would, see [11]. On the other hand, when comparing the recommendations to what the participants in the RecUsed group selected, the differences are no longer statistically significant, except for the popularity aspect. This is not surprising, as these participants often accepted the recommendations. The somewhat surprising aspect, however, is that the differences between the recommendations and the tracks of the RecNotUsed participants are also no longer significant. It is thus possible that the subjects in this group were biased (or inspired) by the presence of the recommendations. One indicator in favor of that possibility is that an overlap of 34% could be observed between the artists that appeared in the recommendations and those that were selected manually by the subjects. This means that the recommendations presumably influenced what users selected. This effect was previously investigated in a user study [18], where the participants exhibited a tendency to select items that were content-wise similar to a (random) recommendation.

A possible explanation why participants still selected tracks that are on average more popular than the recommended ones might lie in the intended use or context of the playlist to be created. If participants, for example, assumed that the playlist is designed to be played for a group of listeners (e.g., at a dance party), they might prefer to pick tracks that are presumably known by many people. In fact, the tracks selected for "dance" playlists were significantly more popular than those for "chill out" playlists.

Investigating Quality Criteria for Playlists
In order to better understand which quality characteristics one should consider when designing algorithms for playlist construction support, we analyzed the rankings that were provided by the participants in the post-task questionnaire (see the section on the study design). To determine the overall ranking, we used the Modified Borda Count method [9], which can be applied when some rankings are only partial, i.e., when not all items are ranked. The results are shown in Table 3.

Table 3. Modified Borda Count: ranking of playlist quality criteria.

Criteria                                        All    RecUsed            RecNotUsed         NoRec
                                                       (34 participants)  (36 participants)  (53 participants)
Homogeneity of musical features, e.g., tempo    250    68                 79                 103
Artist diversity                                195    55                 62                 78
Transition                                      122    30                 46                 46
Popularity                                      106    39                 34                 33
Lyrics                                          95     32                 34                 29
Order                                           74     12                 33                 29
Freshness                                       32     12                 11                 9
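As an illustration, the following sketch implements one common formulation of the Modified Borda Count for partial rankings: a participant who ranks m criteria awards m points to the first, m-1 to the second, and so on, while criteria marked as irrelevant receive no points. The example input is hypothetical and not taken from the study data; the exact point scheme used by the authors may differ in detail.

    # Sketch of a Modified Borda Count over partial rankings: a participant
    # who ranks m criteria awards m points to the first, m-1 to the second,
    # and so on; criteria marked as irrelevant receive no points.
    from collections import Counter

    def modified_borda(partial_rankings):
        scores = Counter()
        for ranking in partial_rankings:        # best criterion first
            m = len(ranking)
            for position, criterion in enumerate(ranking):
                scores[criterion] += m - position
        return scores.most_common()

    # Hypothetical rankings from three participants:
    print(modified_borda([
        ["homogeneity", "artist diversity", "transition"],
        ["artist diversity", "homogeneity"],
        ["homogeneity", "popularity", "lyrics", "order"],
    ]))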
The results indicate that, overall, the participants consider the homogeneity of musical features (e.g., tempo, energy, or loudness), along with the artist diversity of the resulting playlist, as the most relevant quality criteria for playlists. At the other end of the spectrum, the order of the tracks in a playlist and their freshness were the least relevant aspects for the participants.

When looking at the different study groups, some smaller and not statistically significant differences in the rankings can be observed. Participants who used the recommendations considered track transitions less relevant than participants of the other groups. One potential explanation could be that the recommendations by the system (and, likewise, the created playlists) were perceived by users as comparably coherent, e.g., in terms of tempo, and the participants of the Rec group therefore paid less attention to the transitions. Another explanation is that the RecNotUsed participants are in general more demanding and did not use some of the recommendations because these did not allow for satisfying transitions. Other differences with respect to the quality criteria rankings were observed for the "lyrics" aspect, which was considered more important for road-trip and hip-hop playlists. Finally, popularity was considered a very important criterion almost exclusively for dance playlists, for the reasons discussed above.

General Quality Perception of the Recommendations
About 75% of the participants in the Rec group stated that they had looked at the recommendations when constructing the playlists. In the post-task questionnaire, these users were asked about their quality perception regarding the recommendations (as provided by Spotify's service). Different quality dimensions were assessed, and users could express their agreement with related statements using 7-point Likert scale items. In our analysis, we considered answers that were greater than 4 as positive responses.

Specifically, we asked users to what extent the recommendations (1) matched the given topic; (2) helped them discover novel tracks or artists; (3) matched their general interests; (4) were already known to them; (5) were diverse in terms of genre and artist; (6) were generally popular or mainstream; and (7) were from trending music.

The results show that 62% of the respondents perceived the recommendations as topic-related (e.g., matching the playlist topic). More than half of the respondents also found the recommendations to match their interests and to be diverse in terms of genre and artist. With respect to novelty and familiarity, the results were mixed. In both cases, many participants provided feedback with values above 4 (e.g., 45% in the case of novelty); a substantial share of the participants, however, also found the recommendations to be of very limited novelty and familiarity.
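The aggregation step described above is straightforward; the following small sketch illustrates it with hypothetical response data, counting answers above the scale midpoint of 4 as positive and excluding "cannot tell" answers.

    # Sketch of the positive-response aggregation: 7-point Likert answers
    # above the midpoint of 4 count as positive; "cannot tell" answers
    # (None) are excluded. The responses below are hypothetical.
    def positive_rate(answers):
        valid = [a for a in answers if a is not None]
        return sum(a > 4 for a in valid) / len(valid) if valid else float("nan")

    responses = {
        "topic match": [6, 5, 7, 3, 5, None, 6],
        "novelty": [5, 2, 4, 6, None, 3, 5],
    }
    for dimension, answers in responses.items():
        print(f"{dimension}: {positive_rate(answers):.0%} positive")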
We further compared the quality perception of the recommendations for the two groups RecUsed and RecNotUsed. The only statistically significant difference we observed was in terms of novelty. In other words, the participants who used the recommendations in their playlists found the recommendations significantly more novel (median=5) than the other participants, who looked at the recommendations but did not accept them (median=3). One plausible explanation of this result could be that the participants who did not use the recommendations had a higher music expertise and were more demanding. However, this explanation is not in line with what the participants claimed in their answers, as the RecUsed group had a generally higher music expertise score than the RecNotUsed group (although the difference was not statistically significant). Another explanation is that participants were generally willing to discover novel music and the recommendations were mainly adopted by those participants who found them to be novel.

Overall, the results indicate that Spotify's algorithms were quite successful in determining suitable tracks for recommendation. At the same time, the mixed results for some of the other factors are not necessarily negative either, because the importance of factors like the appropriate popularity level of the tracks – according to our analysis – can depend on the topic and intended purpose of the playlist.

Qualitative Analysis
We finally asked the participants to specify what they found to be the most difficult aspect of creating a playlist. Remembering the right tracks was most often mentioned as difficult (e.g., "Keeping up with the options. I often can't remember all the songs and performers I know"). Finding good music for everyone was another difficult task indicated by many (e.g., "To meet the taste of all, as playlists are often created for occasions in which several people listen to them"). Some participants also emphasized the smoothness of the track transitions (e.g., "The most difficult thing is to think of tracks that are connected on several dimensions at the same time such as their energy, the transitions, their level of sophistication, etc.").

Recommendation tools have the potential to help users in many of these dimensions, e.g., to remind them of tracks or artists they know and liked in the past, to find tracks that are popular in a certain community, or to find tracks that are coherent with the current playlist with respect to musical features. In fact, several participants who had no recommendation support wrote about the potential advantages of having such a system: "Sometimes I do not know what to search for and I expect the system suggest something. I prefer to choose just some songs and then listen/receive recommendations related to my initial choices".

CONCLUSIONS
With our study, we aim to shed more light on the perception and adoption of recommender systems that are designed to support users in the playlist construction process. The results of the study not only showed a high general adoption rate – almost half of the participants picked at least one track from the provided recommendations, while the proportion in the domain of recommender systems is often lower than 10% – but also that the presentation of recommendations can indirectly influence the choices of the users. Our study therefore provides additional evidence that recommenders can be a valuable tool to steer consumer behavior.

Generally, participants who received recommendations also explored more alternatives during the process (without needing much more time), which can be interpreted as higher engagement in the task. As a result, a recommender system has the potential to increase the involvement of the user with the service, which from a business perspective in the best case leads to higher customer retention rates.

Finally, the study also showed that including a recommendation component can lead to a higher perceived difficulty of the task for users, which emphasizes the importance of focusing on intuitive UI designs.

ACKNOWLEDGEMENTS
We would like to thank Adrian Skirzynski, who contributed to this work with the results of his master's thesis.

REFERENCES
1. Ivana Andjelkovic, Denis Parra, and John O'Donovan. 2016. Moodplay: Interactive Mood-based Music Discovery and Recommendation. In UMAP '16. 275–279.
2. Luke Barrington, Reid Oda, and Gert R. G. Lanckriet. 2009. Smarter than Genius? Human Evaluation of Music Recommender Systems. In ISMIR '09. 357–362.
3. Dominikus Baur, Sebastian Boring, and Andreas Butz. 2010. Rush: Repeated Recommendations on Mobile Devices. In IUI '10. 91–100.
4. Dirk Bollen, Bart P. Knijnenburg, Martijn C. Willemsen, and Mark Graus. 2010. Understanding Choice Overload in Recommender Systems. In RecSys '10. 63–70.
5. Geoffray Bonnin and Dietmar Jannach. 2014. Automated Generation of Music Playlists: Survey and Experiments. ACM Computing Surveys 47, 2 (2014), 26:1–26:35.
6. Sally Jo Cunningham, David Bainbridge, and Annette Falconer. 2006. "More of an Art than a Science": Supporting the Creation of Playlists and Mixes. In ISMIR '06. 240–245.
7. Sally Jo Cunningham, David Bainbridge, and Dana McKay. 2007. Finding New Music: A Diary Study of Everyday Encounters with Novel Songs. In ISMIR '07. 83–88.
8. J. Stephen Downie. 2003. Music Information Retrieval. Annual Review of Information Science and Technology 37, 1 (2003), 295–340.
9. Peter Emerson. 2013. The Original Borda Count and Partial Voting. Social Choice and Welfare 40, 2 (2013), 353–358.
10. Dietmar Jannach and Gedas Adomavicius. 2016. Recommendations with a Purpose. In RecSys '16. 7–10.
11. Dietmar Jannach, Iman Kamehkhosh, and Geoffray Bonnin. 2016. Biases in Automated Music Playlist Generation: A Comparison of Next-Track Recommending Techniques. In UMAP '16. 281–285.
12. Dietmar Jannach, Malte Ludewig, and Lukas Lerche. 2017. Session-based Item Recommendation in E-Commerce: On Short-Term Intents, Reminders, Trends, and Discounts. User Modeling and User-Adapted Interaction (2017).
13. Chris Johnson. 2014. Algorithmic Music Discovery at Spotify. Online. (2014). https://de.slideshare.net/MrChrisJohnson/algorithmic-music-recommendations-at-spotify
14. Chris Johnson and Edward Newett. 2014. From Idea to Execution: Spotify's Discover Weekly. Online. (2014). https://de.slideshare.net/MrChrisJohnson/from-idea-to-execution-spotifys-discover-weekly/12-Insight_users_spending_more_time
15. Mohsen Kamalzadeh, Dominikus Baur, and Torsten Möller. 2012. A Survey on Music Listening and Management Behaviours. In ISMIR '12. 373–378.
16. Mohsen Kamalzadeh, Christoph Kralj, Torsten Möller, and Michael Sedlmair. 2016. TagFlip: Active Mobile Music Discovery with Social Tags. In IUI '16. 19–30.
17. Iman Kamehkhosh and Dietmar Jannach. 2017. User Perception of Next-Track Music Recommendations. In UMAP '17. 113–121.
18. Sören Köcher, Dietmar Jannach, Michael Jugovac, and Hartmut H. Holzmüller. 2016. Investigating Mere-Presence Effects of Recommendations on the Consumer Choice Process. In IntRS Workshop at RecSys '16. 2–5.
19. Alexandra Lamont and Rebecca Webb. 2011. Short- and Long-term Musical Preferences: What Makes a Favourite Piece of Music? Psychology of Music 38, 2 (2011), 222–241.
20. Jin Ha Lee and J. Stephen Downie. 2004. Survey of Music Information Needs, Uses, and Seeking Behaviours: Preliminary Findings. In ISMIR '04. 441–446.
21. Ali Sarrafi and Evan Shrubsole. 2015. A/B Testing at Spotify. Online. (2015). https://de.slideshare.net/alisarrafi3/ab-testing-at-spotify
22. Harald Steck, Roelof van Zwol, and Chris Johnson. 2015. Interactive Recommender Systems with Netflix and Spotify. Online. (2015). http://de.slideshare.net/MrChrisJohnson/interactive-recommender-systems-with-netflix-and-spotify
23. S. Stumpf and S. Muscroft. 2011. When Users Generate Music Playlists: When Words Leave Off, Music Begins?. In ICME '11. 1–6.
24. Kirsten Swearingen and Rashmi Sinha. 2002. Interaction Design for Recommender Systems. In Designing Interactive Systems. ACM.
25. Nava Tintarev, Christoph Lofi, and Cynthia C. S. Liem. 2017. Sequences of Diverse Song Recommendations: An Exploratory Study in a Commercial System. In UMAP '17. 391–392.