Spotivibes: Tagging Playlist Vibes With Colors

Hiba Abderrazik∗ (h.abderrazik@student.tudelft.nl), Giovan Angela∗ (g.j.a.angela@student.tudelft.nl), Hans Brouwer∗ (j.c.brouwer@student.tudelft.nl), Henky Janse∗ (h.a.b.janse@student.tudelft.nl), Sterre Lutz∗ (s.lutz@student.tudelft.nl), Gwennan Smitskamp∗ (g.m.smitskamp@student.tudelft.nl), Sandy Manolios (s.manolios@tudelft.nl), Cynthia C. S. Liem (c.c.s.liem@tudelft.nl)
Delft University of Technology, Delft, The Netherlands

∗ All marked authors contributed equally to this research.

ABSTRACT
Music is often both personally and affectively meaningful to human listeners. However, little work has been done to create music recommender systems that take this into account. In this demo proposal, we present Spotivibes: a first prototype for a new color-based tagging and music recommender system. This innovative tagging system is designed to take the users' personal experience of music into account and allows them to tag their favorite songs in a non-intrusive way, which can be generalized to their entire library. The goal of Spotivibes is twofold: to help users better tag their playlists to get better playlists, and to provide research data on implicit grouping mechanisms in personal music collections. The system was tested with a user study on 34 Spotify users.

KEYWORDS
Recommender systems; Personal experience of music; Emotion-based recommendations; Color-based tags

ACM Reference Format:
Hiba Abderrazik, Giovan Angela, Hans Brouwer, Henky Janse, Sterre Lutz, Gwennan Smitskamp, Sandy Manolios, and Cynthia C. S. Liem. 2019. Spotivibes: Tagging Playlist Vibes With Colors. In Proceedings of Joint Workshop on Interfaces and Human Decision Making for Recommender Systems (IntRS '19). CEUR-WS.org, 5 pages.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IntRS '19, September 2019, Copenhagen, DK.

1 INTRODUCTION
Many people love to listen to music and share their music tastes with others. With music consumption largely having moved to the digital realm, music organization and discovery have moved to the digital space accordingly, opening up great opportunities for digital music services to support these experiences.

However, many popular present-day music services are very much framed as catalogues, in which users have to perform directed, linguistic searches on existing song metadata to find what they are looking for. In the Music Information Retrieval research domain, considerable work has been performed to automatically describe music objects beyond catalogued metadata. However, much of the research in this area still has focused on fairly "objective" descriptors of aspects of the music object (e.g. chords, tempo), but did not explicitly consider corresponding end user experiences [3, 6, 11].

Frequently, music is seen as a moderator of mood and emotion. A considerable body of work on automatic music emotion recognition from audio content exists [2, 9, 17]. However, generally, it is hard to get good labeled data (for which humans need to give the initial input) at scale. In order to make labeling engaging, several proposals have been made for crowdsourced tagging games [1, 8, 10]. While these are more engaging to users than traditional tagging interfaces, they explicitly ask for users to concentrate on the annotation within imposed linguistic frames (e.g. describing songs with a tag, or mapping songs in valence-arousal space), which may take away the "natural" affective experience of music consumption. Furthermore, these tagging interfaces generally reward consensus across human annotators. While this allows for labels that are more stable and generalizable across a music population, this takes away any notion of very personal and subjective perception.

Also with regard to automatic music recommendation, in which user consumption patterns are taken into account to foster automatic music discovery, it was pointed out that true user feedback is not yet optimally integrated [14]. While many algorithms are evaluated with user studies or trained on hand-labeled genre tags, not many approaches holistically incorporate user responses.

While algorithms have focused on describing musical objects, when humans listen to music in everyday life, they actually may not have their full focus on the musical object for active listening. Instead, multiple studies have shown that music is often consumed passively, e.g. in the background while performing another activity [5, 7, 12, 16]. This again gives useful dimensions of (personal) music categorization that presently are still understudied.

To study these open challenges in the literature and music consumption in general, we propose the Spotivibes system. This system is designed to capture user reactions and associations to music in both a personal and an abstract way, integrated with the user's existing listening preferences in the Spotify music service. Taking a user's existing playlists as the basis, users are asked to tag the "vibe" of songs (with "vibe" intentionally chosen to be more abstract than "mood" or "purpose") with one or more colors. This restricts the tag vocabulary in the backend, while at the same time allowing for more abstract associations on the user side than would be the case when imposing a certain vocabulary.
In the backend, the system learns associations from colors to content features in the user's consumption history. Consequently, the system can generate tailored playlists for users based on colors. In this way, Spotivibes serves a twofold goal: on the one hand, it can serve as an annotation system that is both more abstracted than existing tagging tools and more integrated with the actual everyday listening behavior of a user. On the other hand, it can also directly serve users in gaining insight into their music preferences and associations, and in setting more personal recommendations. This makes Spotivibes an interesting research tool to study the impact and usefulness of abstract color tagging of personal perception of music in recommender systems. In the current paper, we present a first functional prototype of Spotivibes that is intended to provide a framework for conducting deeper research on tagging mechanisms in the future.

Figure 1: Mosaic playlist generation

2 OVERVIEW OF THE APPLICATION
Spotivibes is a web application and as such only requires a device with Internet access and a browser, as well as a Spotify account. Upon their first login, users have to tag a certain number of songs (10 or 30) from their Spotify saved tracks, using as many colors as they want. The available colors are depicted in the upper part of Figure 1. Then, they can get personalized playlists based on a single color, a progression between two colors, or a mosaic of colors. Those playlists can then be exported to their Spotify account.

Spotivibes relies on user feedback to further improve its recommendations: users can always modify existing tags or tag more songs. Users also have access to various statistics regarding their tags, to give them more information about their tagging behavior and to motivate them to tag more. A detailed overview of the application can be found at https://youtu.be/x2KZ2z0s4Uk.

2.1 Initialization
The initial set a user is asked to label is based on k-means clustering of Spotify's audio features. The user is asked to label either 10 or 30 of their own tracks, so k is set to 10 or 30. This theoretically gives tracks which represent the major clusters of songs in Spotify's audio feature space, and so should cover the majority of associations a user can have to songs in their library. A "reset labels" button on the home page additionally allows the user to clear all the label data they have provided. This way, the initialization process can be repeated for a fresh start.
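The cluster-based selection described above can be sketched as follows — a minimal illustration, not the actual Spotivibes code. We assume each track's Spotify audio features are already available as a numeric vector; the function name `pick_initial_tracks` and the synthetic feature matrix are our own hypothetical choices. For each of the k clusters, the track nearest to the cluster centroid is offered for labeling:

```python
import numpy as np
from sklearn.cluster import KMeans

def pick_initial_tracks(features, k=10):
    """Cluster tracks by audio features and return, for each of the
    k clusters, the index of the track closest to the cluster centroid."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    picks = []
    for centroid in km.cluster_centers_:
        # index of the track nearest to this centroid
        picks.append(int(np.argmin(np.linalg.norm(features - centroid, axis=1))))
    return picks

# Toy example: 100 tracks with 5 audio features (e.g. tempo, valence, energy, ...)
rng = np.random.default_rng(0)
features = rng.random((100, 5))
print(pick_initial_tracks(features, k=10))
```

The centroid-nearest track is used rather than the centroid itself, since the centroid is generally not an existing song.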
2.2 Bulk labeling
Once the initialization process has been completed, a user who wants to label more songs can select multiple songs and labels, and tag that selected group of songs with a color in one go. The user can also label their own saved Spotify playlists in one go, labeling each song with the chosen color.

2.3 Playlist generation
Spotivibes allows users to create their vibe-based playlists in three different ways: a gradient playlist, a single color playlist and a mosaic playlist.

One color. The single color playlist is self-explanatory: the user selects a single color and receives a playlist containing songs with a high label value for the selected color.

Gradient. The gradient playlist generation works by selecting two different colors. A new playlist is generated with a gradual change in song vibe from start to finish. For example, the user selects the colors yellow and blue as the first and second color respectively. The first songs in the playlist will have a higher "yellow" label value, gradually changing to songs that are more "blue".

Mosaic. The mosaic pattern works by selecting multiple colors; the user can also select the same color multiple times. As shown in Figure 1, if a user selects two blue and one yellow, the generated playlist will contain songs with more blue than yellow, but will also contain yellow.

Editing and Exporting Playlists. Once a playlist has been generated, the user can give feedback on each song by updating its color labels. Songs can also be removed, which gives negative feedback for future playlist generation. After creating and editing a playlist, a user can choose to export the playlist to their Spotify library. They can give it any custom name and later listen to it on their own Spotify account.
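The gradient ordering described in Sec. 2.3 can be sketched as follows — a hypothetical simplification, not the actual Spotivibes implementation. We assume each song already carries per-color label scores in [0, 1]; `gradient_playlist` and the example data are illustrative names of our own. Songs are ordered by how strongly they lean toward the second color relative to the first:

```python
def gradient_playlist(songs, color_a, color_b, length=10):
    """Order songs from 'mostly color_a' to 'mostly color_b'.

    `songs` maps song title -> dict of per-color label scores in [0, 1].
    Songs leaning most toward color_a come first; toward color_b, last.
    """
    ranked = sorted(
        songs,
        key=lambda s: songs[s].get(color_b, 0.0) - songs[s].get(color_a, 0.0),
    )
    return ranked[:length]

songs = {
    "Sunrise":  {"yellow": 0.9, "blue": 0.1},
    "Midday":   {"yellow": 0.6, "blue": 0.4},
    "Dusk":     {"yellow": 0.4, "blue": 0.7},
    "Midnight": {"yellow": 0.1, "blue": 0.9},
}
print(gradient_playlist(songs, "yellow", "blue"))
# → ['Sunrise', 'Midday', 'Dusk', 'Midnight']
```

A mosaic playlist could use the same per-color scores, weighting each requested color by how often it was selected.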
2.4 Statistics
As part of their Spotivibes experience, users can get insight into their labeling behavior on a "Statistics" page. This page provides some basic information about the user, including the number of songs the user has labeled and the number of tracks in their library. The more detailed statistics listed in the subsections below can be viewed by selecting a color from the color picker pop-up window on the left side. For the different statistics plots shown in Figure 2, the data is calculated by the classifiers or retrieved from the Spotify API.

Figure 2: The different statistic representations of the user's labelling behavior and playlists.

Landscape. The "Landscape" statistic is a detailed 3D plot providing information about the songs labeled with the selected color. The x, y, and z-axes of the plot indicate tempo, valence, and loudness respectively. Each song labeled with the selected color is displayed as a dot, its size corresponding to the certainty with which we have classified it to be that color, as shown in the upper left part of Figure 2. For example, if a user associates yellow songs with high-tempo numbers, a cluster of larger dots will appear on the higher end of the tempo axis. The plot is interactive: it can be dragged to be viewed from different angles, and when the user hovers over a dot, they can see the title and artist of the song it represents.

Influencers. The "Influencers" section, displayed in the upper right part of Figure 2, is a bar plot showing the 3 most influential artists within a color. The metric used to measure "influence" is simply the sum of the likelihoods of all the artist's songs within that color. In this way, influence indicates the likelihood of an artist being associated with the currently selected color, depending on how many of this artist's songs are classified as that color.

Decades. The "Decades" tile, displayed in the lower left part of Figure 2, shows a histogram of the number of tracks per decade that belong to the selected color, weighted by their likelihood to be correctly classified.

Genres. The "Genre" tile, displayed in the lower right part of Figure 2, shows a radial histogram of genres classified within the selected color.
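The "influence" metric described above can be sketched as follows — a minimal illustration under the assumption that each song carries a predicted likelihood for the selected color; the function name `top_influencers` and the data layout are our own hypothetical choices, not the actual Spotivibes code:

```python
from collections import defaultdict

def top_influencers(songs, n=3):
    """Sum per-song color likelihoods per artist and return the n
    artists with the highest total, i.e. the most 'influential' ones.

    `songs` is a list of (artist, likelihood) pairs for one color.
    """
    totals = defaultdict(float)
    for artist, likelihood in songs:
        totals[artist] += likelihood
    return sorted(totals, key=totals.get, reverse=True)[:n]

songs = [("A", 0.9), ("A", 0.8), ("B", 0.95), ("C", 0.4), ("C", 0.3), ("D", 0.2)]
print(top_influencers(songs))
# → ['A', 'B', 'C']
```

Note that summing (rather than averaging) likelihoods deliberately favors artists with many confidently classified songs, matching the description above.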
2.5 Associating songs with colors
The algorithm that learns correspondences between songs and color tags is the heart of Spotivibes' functionality, yet it is almost invisible to users. Since color labels are so personal, we do not make use of any inter-user information. This means that classifiers need to be trained for each individual user, yielding user-dependent correspondences between audio features and (categorically modeled) color tags.

Our color label predictor consists of an ensemble of classifiers and regressors from the scikit-(multi)learn [13] and XGBoost [4] Python packages. The label predictor must find underlying audio features that are strongly correlated with the labels that users give to songs. This, of course, means that the predictor is strongly influenced by how a user labels tracks. If a user chooses to use a color as a label for something completely uncorrelated with the audio features, no meaningful connections will be found, but this will also show accordingly in the "Statistics" overviews.

[15] found that, in the context of multi-label classification of music by emotion, the random k-label sets (RAKEL) ensembling method worked best on datasets of a similar size to most users' Spotify libraries (less than 15,000 songs). The RAKEL algorithm works by training multiple naive predictors on subsets of the total label set and then deciding on the final output by voting among the predictors. Here we used scikit-multilearn's RAkELo module. Based on a set of training features and training labels, the algorithm outputs a list of features that are most descriptive for a color. To allow for some tolerance in the predictions, RAKEL's binary classification was combined with a regression of the label values, for which we made use of scikit-learn's MultiOutputRegressor module. This means a song can e.g. be 30% green and 70% blue. Depending on need, the fractional scores can be thresholded in different post-processing steps. For example, labels with a score higher than 0.5 are currently shown to users in the front-end (this gives a nice balance between showing many labels and not just returning all the colors), and the calculation of "influencer" artists for the statistics page only incorporates the most certain predictions.
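The RAKEL scheme described above can be illustrated with a self-contained sketch. This is our own simplified re-implementation of the idea (train one predictor per random label subset, then vote), not the actual scikit-multilearn RAkELo call used by Spotivibes; the function name, base classifier, and toy data are all hypothetical. The averaged votes act as the fractional scores mentioned above, which can then be thresholded (e.g. > 0.5):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rakel_fit_predict(X_train, Y_train, X_test, subset_size=2, n_models=5, seed=0):
    """Simplified RAKEL: train one multi-label classifier per random
    label subset, then average ('vote') the per-label predictions.

    Y_train is a binary matrix (n_samples, n_labels); returns fractional
    scores in [0, 1] per label.
    """
    rng = np.random.default_rng(seed)
    n_labels = Y_train.shape[1]
    votes = np.zeros((X_test.shape[0], n_labels))
    counts = np.zeros(n_labels)
    for _ in range(n_models):
        subset = rng.choice(n_labels, size=subset_size, replace=False)
        clf = DecisionTreeClassifier(random_state=0).fit(X_train, Y_train[:, subset])
        votes[:, subset] += clf.predict(X_test)
        counts[subset] += 1
    return votes / np.maximum(counts, 1)  # fraction of votes per label

# Toy data: one audio feature separating "yellow" (label 0) from "blue" (label 1)
X = np.array([[0.0], [0.1], [0.9], [1.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
scores = rakel_fit_predict(X, Y, X)
print((scores > 0.5).astype(int))  # thresholded labels, as shown in the front-end
```

In the actual system, the binary voting would additionally be combined with a regression over the label values to obtain graded scores such as "30% green, 70% blue".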
3 EVALUATION: USER STUDY
A user study was conducted to assess the usability of the system and the quality of its recommendations (measured by user satisfaction). The study was conducted with 34 participants recruited via personal connections and among computer science students of our university. They all had to freely explore the application on their own, and fill in a questionnaire afterwards. All users had to go through the longer setup, which made them tag 30 of their Spotify favorite songs. The experiment lasted around 20 minutes per participant.

The questionnaire was composed of 17 questions. The answers to the main questions are shown in Figure 3. They were designed to measure the tediousness of the initial set-up process, user satisfaction with the recommendations, the perceived usefulness of the color-based tagging system, and the usability of the interface. Other notable questions were about overall user satisfaction with Spotivibes.

Figure 3: Bar charts with answers to six of the main questions about the users' satisfaction in Spotivibes, its functionalities and perceived usefulness.

The user study shows that overall, the participants had a good/satisfactory experience with the application (3.74 average on a 5-point scale), but were less satisfied with the services provided by Spotivibes (3.41 average on a 5-point scale), as shown in Figure 3a and Figure 3b.

Initialization Process. One thing that emerged during this study is that the song labeling process was on the edge of tediousness, as shown in Figure 3c. The results show an even split of 1/3rd of users agreeing the process was tedious, 1/3rd being neutral, and 1/3rd disagreeing. We might consider going back to the short labeling process in further user experiments, but this can result in a decrease in playlist satisfaction. Perhaps a quick initialization process with better bulk labeling features could improve user-friendliness as well as the data for the classifier.

Playlist Generation. The user study was sadly inconclusive on the value of Spotivibes' color-based playlist generation, as shown by Figure 3e. Users were asked to rate their satisfaction with Spotify and Spotivibes on a 1 to 10 scale in terms of keeping to a given vibe or emotion. Disregarding low scores for Spotivibes from a couple of users for whom the initialization process failed due to bugs in the data management model, the minimum, lower quartile, median, upper quartile, and maximum were identical, and the mean score for Spotivibes was 0.2 lower (not statistically significant). This might have been affected by our choice to split the rating of Spotify and Spotivibes into different parts of the user study. In-person feedback from a couple of users indicated that they did not realize they were rating the two services against each other. Perhaps placing those two questions next to each other in the survey would have given a better view of how users actually felt about the recommendations.

Colour Associations. Users were, however, generally satisfied with the use of colors as labels for emotions, as shown by Figure 3d. Half agreed that colors made it easy to represent emotions, a quarter were neutral, and a quarter disagreed. When asked whether using multiple colors helped express complex or multi-faceted feelings, 65% agreed and only 10% disagreed. This does point towards the usefulness of colors as abstract labels for emotions in music. An interesting point to note is that users who gave negative feedback on the intuitiveness of the color labeling process (regarding their difficulty relating colors to songs, or not knowing multiple labels could be used) also had lower satisfaction with the quality of playlist generation. This suggests that our classifier does actually pick up on patterns in the user's color labels and functions better when users label meaningfully.

4 CONCLUSION
Spotivibes is an innovative color-based tagging system that allows users to tag songs in a personal, intuitive and abstract way, in order to get personalized playlists that support their unique experience and needs of music. The current version of Spotivibes is still an early, but functional, prototype, on which initial user studies have been performed. In future work, deeper research into the merit of the color-based tagging is planned, including larger-scale user studies.

ACKNOWLEDGMENTS
We would like to thank Bernd Kreynen for his valuable feedback throughout the project.
REFERENCES
[1] Anna Aljanaki, Frans Wiering, and Remco C. Veltkamp. 2016. Studying emotion induced by music through a crowdsourcing game. Information Processing & Management 52, 1 (2016), 115–128.
[2] Anna Aljanaki, Yi-Hsuan Yang, and Mohammad Soleymani. 2017. Developing a benchmark for emotional analysis of music. PloS one 12, 3 (2017), e0173392.
[3] Michael A. Casey, Remco C. Veltkamp, Masataka Goto, Marc Leman, Christophe Rhodes, and Malcolm Slaney. 2008. Content-Based Music Information Retrieval: Current Directions and Future Challenges. Proc. IEEE 96, 4 (April 2008), 668–696. https://doi.org/10.1109/JPROC.2008.916370
[4] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 785–794.
[5] Andrew Demetriou, Martha Larson, and Cynthia C. S. Liem. 2016. Go with the flow: When listeners use music as technology. (2016).
[6] J. Stephen Downie, Donald Byrd, and Tim Crawford. 2009. Ten years of ISMIR: Reflections on challenges and opportunities. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009). 13–18.
[7] Mohsen Kamalzadeh, Dominikus Baur, and Torsten Möller. 2012. A survey on music listening and management behaviours. (2012).
[8] Youngmoo E. Kim, Erik M. Schmidt, and Lloyd Emelle. 2008. MoodSwings: A collaborative game for music mood label collection. In ISMIR, Vol. 2008. 231–236.
[9] Youngmoo E. Kim, Erik M. Schmidt, Raymond Migneco, Brandon G. Morton, Patrick Richardson, Jeffrey Scott, Jacquelin A. Speck, and Douglas Turnbull. 2010. Music emotion recognition: A state of the art review. In Proc. ISMIR, Vol. 86. 937–952.
[10] Edith Law and Luis von Ahn. 2009. Input-agreement: A New Mechanism for Collecting Data Using Human Computation Games. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '09). ACM, New York, NY, USA, 1197–1206. https://doi.org/10.1145/1518701.1518881
[11] Cynthia C. S. Liem, Andreas Rauber, Thomas Lidy, Richard Lewis, Christopher Raphael, Joshua D. Reiss, Tim Crawford, and Alan Hanjalic. 2012. Music information technology and professional stakeholder audiences: Mind the adoption gap. In Dagstuhl Follow-Ups, Vol. 3. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[12] Adrian C. North, David J. Hargreaves, and Jon J. Hargreaves. 2004. Uses of music in everyday life. Music Perception: An Interdisciplinary Journal 22, 1 (2004), 41–77.
[13] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, Oct (2011), 2825–2830.
[14] Markus Schedl, Arthur Flexer, and Julián Urbano. 2013. The Neglected User in Music Information Retrieval Research. Journal of Intelligent Information Systems 41, 3 (2013), 523–539.
[15] Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris, and Ioannis P. Vlahavas. 2008. Multi-label classification of music into emotions. In ISMIR, Vol. 8. 325–330.
[16] Karthik Yadati, Cynthia C. S. Liem, Martha Larson, and Alan Hanjalic. 2017. On the Automatic Identification of Music for Common Activities. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. ACM, 192–200.
[17] Yi-Hsuan Yang and Homer H. Chen. 2012. Machine Recognition of Music Emotion: A Review. ACM Trans. Intell. Syst. Technol. 3, 3, Article 40 (May 2012), 30 pages. https://doi.org/10.1145/2168752.2168754