Spotivibes: Tagging Playlist Vibes With Colors

Hiba Abderrazik∗ (h.abderrazik@student.tudelft.nl), Giovan Angela∗ (g.j.a.angela@student.tudelft.nl), Hans Brouwer∗ (j.c.brouwer@student.tudelft.nl), Henky Janse∗ (h.a.b.janse@student.tudelft.nl), Sterre Lutz∗ (s.lutz@student.tudelft.nl), Gwennan Smitskamp∗ (g.m.smitskamp@student.tudelft.nl), Sandy Manolios (s.manolios@tudelft.nl), Cynthia C. S. Liem (c.c.s.liem@tudelft.nl)
Delft University of Technology, Delft, The Netherlands

∗ All marked authors contributed equally to this research.

ABSTRACT
Music is often both personally and affectively meaningful to human listeners. However, little work has been done to create music recommender systems that take this into account. In this demo proposal, we present Spotivibes: a first prototype for a new color-based tagging and music recommender system. This innovative tagging system is designed to take the users' personal experience of music into account and allows them to tag their favorite songs in a non-intrusive way, which can be generalized to their entire library. The goal of Spotivibes is twofold: to help users better tag their playlists to get better playlists, and to provide research data on implicit grouping mechanisms in personal music collections. The system was tested with a user study on 34 Spotify users.

KEYWORDS
Recommender systems; Personal experience of music; Emotion-based recommendations; Color-based tags

ACM Reference Format:
Hiba Abderrazik, Giovan Angela, Hans Brouwer, Henky Janse, Sterre Lutz, Gwennan Smitskamp, Sandy Manolios, and Cynthia C. S. Liem. 2019. Spotivibes: Tagging Playlist Vibes With Colors. In Proceedings of Joint Workshop on Interfaces and Human Decision Making for Recommender Systems (IntRS '19). CEUR-WS.org, 5 pages.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IntRS '19, September 2019, Copenhagen, DK.

1 INTRODUCTION
Many people love to listen to music and share their music tastes with others. With music consumption largely having moved to the digital realm, music organization and discovery have moved to the digital space accordingly, opening up great opportunities for digital music services to support these experiences.

However, many popular present-day music services are very much framed as catalogues, in which users have to perform directed, linguistic searches on existing song metadata to find what they are looking for. In the Music Information Retrieval research domain, considerable work has been performed to automatically describe music objects beyond catalogued metadata. However, much of the research in this area still has focused on fairly "objective" descriptors of aspects of the music object (e.g. chords, tempo), but did not explicitly consider corresponding end user experiences [3, 6, 11].

Frequently, music is seen as a moderator of mood and emotion. A considerable body of work on automatic music emotion recognition from audio content exists [2, 9, 17]. However, generally, it is hard to get good labeled data (for which humans need to give the initial input) at scale. In order to make labeling engaging, several proposals have been made for crowdsourced tagging games [1, 8, 10]. While these are more engaging to users than traditional tagging interfaces, they explicitly ask for users to concentrate on the annotation within imposed linguistic frames (e.g. describing songs with a tag, or mapping songs in valence-arousal space), which may take away the "natural" affective experience of music consumption. Furthermore, these tagging interfaces generally reward consensus across human annotators. While this allows for labels that are more stable and generalizable across a music population, this takes away any notion of very personal and subjective perception.

Also with regard to automatic music recommendation, in which user consumption patterns are taken into account to foster automatic music discovery, it was pointed out that true user feedback is not yet optimally integrated [14]. While many algorithms are evaluated with user studies or trained on hand-labeled genre tags, not many approaches holistically incorporate user responses.

While algorithms have focused on describing musical objects, when humans listen to music in everyday life, they actually may not have their full focus on the musical object for active listening. Instead, multiple studies have shown that music is often consumed passively, e.g. in the background while performing another activity [5, 7, 12, 16]. This again gives useful dimensions of (personal) music categorization that presently are still understudied.

To study these open challenges in the literature and music consumption in general, we propose the Spotivibes system. This system is designed to capture user reactions and associations to music in both a personal and an abstract way, integrated with the user's existing listening preferences in the Spotify music service. Taking a user's existing playlists as the basis, users are asked to tag the "vibe" of songs (with "vibe" intentionally chosen to be more abstract than "mood" or "purpose") with one or more colors. This restricts the tag vocabulary in the backend, while at the same time allowing for more abstract associations on the user side than would be the case when imposing a certain vocabulary.
In the backend, the system learns associations from colors to content features in the user's consumption history. Consequently, the system can generate tailored playlists for users based on colors. In this way, Spotivibes serves a twofold goal: on the one hand, it can serve as an annotation system that is both more abstracted than existing tagging tools and more integrated with the actual everyday listening behavior of a user. On the other hand, it can also directly serve users in gaining insight into their music preferences and associations, and in setting more personal recommendations. This makes Spotivibes an interesting research tool to study the impact and usefulness of abstract color tagging of personal perception of music in recommender systems. In the current paper, we present a first functional prototype of Spotivibes that is intended to provide a framework for conducting deeper research on tagging mechanisms in the future.

Figure 1: Mosaic playlist generation

2 OVERVIEW OF THE APPLICATION
Spotivibes is a web application and as such only requires a device with Internet access and a browser, as well as a Spotify account. Upon their first login, users have to tag a certain number of songs (10 or 30) from their Spotify saved tracks, using as many colors as they want. The available colors are depicted in the upper part of Figure 1. Then, they can get personalized playlists based on a single color, a progression between two colors, or a mosaic of colors. Those playlists can then be exported to their Spotify account.

Spotivibes relies on user feedback to further improve its recommendations: users can always modify existing tags or tag more songs. Users also have access to various statistics regarding their tags, to give them more information about their tagging behavior and to motivate them to tag more. A detailed overview of the application can be found at https://youtu.be/x2KZ2z0s4Uk.

2.1 Initialization
The initial set a user is asked to label is based on k-means clustering of Spotify's audio features. The user is asked to label either 10 or 30 of their own tracks, so k is set to 10 or 30. This theoretically gives tracks which represent the major clusters of songs in Spotify's audio feature space, and so should cover the majority of associations a user can have to songs in their library. A "reset labels" button on the home page additionally allows the user to clear all the label data they have provided. This way, the initialization process can be repeated for a fresh start.
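The cluster-based selection described above can be sketched as follows — a minimal illustration, not the actual Spotivibes code. We assume each track's Spotify audio features are already available as a numeric vector; the function name `pick_initial_tracks` and the synthetic feature matrix are our own hypothetical choices. For each of the k clusters, the track nearest to the cluster centroid is offered for labeling:

```python
import numpy as np
from sklearn.cluster import KMeans

def pick_initial_tracks(features, k=10):
    """Cluster tracks by audio features and return, for each of the
    k clusters, the index of the track closest to the cluster centroid."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    picks = []
    for centroid in km.cluster_centers_:
        # index of the track nearest to this centroid
        picks.append(int(np.argmin(np.linalg.norm(features - centroid, axis=1))))
    return picks

# Toy example: 100 tracks with 5 audio features (e.g. tempo, valence, energy, ...)
rng = np.random.default_rng(0)
features = rng.random((100, 5))
print(pick_initial_tracks(features, k=10))
```

The centroid-nearest track is used rather than the centroid itself, since the centroid is generally not an existing song.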
2.2 Bulk labeling
Once the initialization process has been completed, a user who wants to label more songs can select multiple songs and labels, and tag that selected group of songs with a color in one go. The user can also label their own saved Spotify playlists in one go, labeling each song with the chosen color.

2.3 Playlist generation
Spotivibes allows users to create their vibe-based playlists in three different ways: a gradient playlist, a single color playlist and a mosaic playlist.

One color. The single color playlist is self-explanatory: the user selects a single color and receives a playlist containing songs with a high label value for the selected color.

Gradient. The gradient playlist generation works by selecting two different colors. A new playlist is generated with a gradual change in song vibe from start to finish. For example, the user selects the colors yellow and blue as the first and second color respectively. The first songs in the playlist will have a higher "yellow" label value, gradually changing to songs that are more "blue".

Mosaic. The mosaic pattern works by selecting multiple colors; the user can also select the same color multiple times. As shown in Figure 1, if a user selects two blue and one yellow, the generated playlist will contain songs with more blue than yellow, but will also contain yellow.

Editing and Exporting Playlists. Once a playlist has been generated, the user can give feedback on each song by updating its color labels. Songs can also be removed, which gives negative feedback for future playlist generation. After creating and editing a playlist, a user can choose to export the playlist to their Spotify library. They can give it any custom name and later listen to it on their own Spotify account.
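The gradient ordering described in Sec. 2.3 can be sketched as follows — a hypothetical simplification, not the actual Spotivibes implementation. We assume each song already carries per-color label scores in [0, 1]; `gradient_playlist` and the example data are illustrative names of our own. Songs are ordered by how strongly they lean toward the second color relative to the first:

```python
def gradient_playlist(songs, color_a, color_b, length=10):
    """Order songs from 'mostly color_a' to 'mostly color_b'.

    `songs` maps song title -> dict of per-color label scores in [0, 1].
    Songs leaning most toward color_a come first; toward color_b, last.
    """
    ranked = sorted(
        songs,
        key=lambda s: songs[s].get(color_b, 0.0) - songs[s].get(color_a, 0.0),
    )
    return ranked[:length]

songs = {
    "Sunrise":  {"yellow": 0.9, "blue": 0.1},
    "Midday":   {"yellow": 0.6, "blue": 0.4},
    "Dusk":     {"yellow": 0.4, "blue": 0.7},
    "Midnight": {"yellow": 0.1, "blue": 0.9},
}
print(gradient_playlist(songs, "yellow", "blue"))
# → ['Sunrise', 'Midday', 'Dusk', 'Midnight']
```

A mosaic playlist could use the same per-color scores, weighting each requested color by how often it was selected.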
2.4 Statistics
As part of their Spotivibes experience, users can get insight into their labeling behavior on a "Statistics" page. This page provides some basic information about the user, including the number of songs the user has labeled and the number of tracks in their library. The more detailed statistics listed in the subsections below can be viewed by selecting a color from the color picker pop-up window on the left side. For the different statistics plots shown in Figure 2, the data is calculated by the classifiers or retrieved from the Spotify API.

Figure 2: The different statistic representations of the user's labelling behavior and playlists.

Landscape. The "Landscape" statistic is a detailed 3D plot providing information about the songs labeled with the selected color. The x, y, and z-axes of the plot indicate tempo, valence, and loudness respectively. Each song labeled with the selected color is displayed as a dot, its size corresponding to the certainty with which we have classified it to be that color, as shown in the upper left part of Figure 2. For example, if a user associates yellow songs with high-tempo numbers, a cluster of larger dots will appear on the higher end of the tempo axis. The plot is interactive: it can be dragged to be viewed from different angles, and when the user hovers over a dot, they can see the title and artist of the song it represents.

Influencers. The "Influencers" section, displayed in the upper right part of Figure 2, is a bar plot showing the 3 most influential artists within a color. The metric used to measure "influence" is simply the sum of the likelihoods of all the artist's songs within that color. In this way, influence indicates the likelihood of an artist being associated with the currently selected color, depending on how many of this artist's songs are classified as that color.

Decades. The "Decades" tile, displayed in the lower left part of Figure 2, shows a histogram of the number of tracks per decade that belong to the selected color, weighted by their likelihood to be correctly classified.

Genres. The "Genre" tile, displayed in the lower right part of Figure 2, shows a radial histogram of genres classified within the selected color.
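The "influence" metric described above can be sketched as follows — a minimal illustration under the assumption that each song carries a predicted likelihood for the selected color; the function name `top_influencers` and the data layout are our own hypothetical choices, not the actual Spotivibes code:

```python
from collections import defaultdict

def top_influencers(songs, n=3):
    """Sum per-song color likelihoods per artist and return the n
    artists with the highest total, i.e. the most 'influential' ones.

    `songs` is a list of (artist, likelihood) pairs for one color.
    """
    totals = defaultdict(float)
    for artist, likelihood in songs:
        totals[artist] += likelihood
    return sorted(totals, key=totals.get, reverse=True)[:n]

songs = [("A", 0.9), ("A", 0.8), ("B", 0.95), ("C", 0.4), ("C", 0.3), ("D", 0.2)]
print(top_influencers(songs))
# → ['A', 'B', 'C']
```

Note that summing (rather than averaging) likelihoods deliberately favors artists with many confidently classified songs, matching the description above.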
2.5 Associating songs with colors
The algorithm that learns correspondences between songs and color tags is the heart of Spotivibes' functionality, yet it is almost invisible to users. Since color labels are so personal, we do not make use of any inter-user information. This means that classifiers need to be trained for each individual user, yielding user-dependent correspondences between audio features and (categorically modeled) color tags.

Our color label predictor consists of an ensemble of classifiers and regressors from the scikit-(multi)learn [13] and XGBoost [4] Python packages. The label predictor must find underlying audio features that are strongly correlated with the labels that users give to songs. This, of course, means that the predictor is strongly influenced by how a user labels tracks. If a user chooses to use a color as a label for something completely uncorrelated with the audio features, no meaningful connections will be found, but this will also show accordingly in the "Statistics" overviews.

[15] found that, in the context of multi-label classification of music by emotion, the random k-label sets (RAKEL) ensembling method worked best on datasets of a similar size to most users' Spotify libraries (less than 15,000 songs). The RAKEL algorithm works by training multiple naive predictors on subsets of the total label set and then deciding on the final output by voting among the predictors. Here we used scikit-multilearn's RAkELo module. Based on a set of training features and training labels, the algorithm outputs a list of features that are most descriptive for a color. To allow for some tolerance in the predictions, RAKEL's binary classification was combined with a regression of the label values, for which we made use of scikit-learn's MultiOutputRegressor module. This means a song can e.g. be 30% green and 70% blue. Depending on need, the fractional scores can be thresholded in different post-processing steps. For example, labels with a score higher than 0.5 are currently shown to users in the front-end (this gives a nice balance between showing many labels and not just returning all the colors), and the calculation of "influencer" artists for the statistics page only incorporates the most certain predictions.
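The RAKEL scheme described above can be illustrated with a self-contained sketch. This is our own simplified re-implementation of the idea (train one predictor per random label subset, then vote), not the actual scikit-multilearn RAkELo call used by Spotivibes; the function name, base classifier, and toy data are all hypothetical. The averaged votes act as the fractional scores mentioned above, which can then be thresholded (e.g. > 0.5):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def rakel_fit_predict(X_train, Y_train, X_test, subset_size=2, n_models=5, seed=0):
    """Simplified RAKEL: train one multi-label classifier per random
    label subset, then average ('vote') the per-label predictions.

    Y_train is a binary matrix (n_samples, n_labels); returns fractional
    scores in [0, 1] per label.
    """
    rng = np.random.default_rng(seed)
    n_labels = Y_train.shape[1]
    votes = np.zeros((X_test.shape[0], n_labels))
    counts = np.zeros(n_labels)
    for _ in range(n_models):
        subset = rng.choice(n_labels, size=subset_size, replace=False)
        clf = DecisionTreeClassifier(random_state=0).fit(X_train, Y_train[:, subset])
        votes[:, subset] += clf.predict(X_test)
        counts[subset] += 1
    return votes / np.maximum(counts, 1)  # fraction of votes per label

# Toy data: one audio feature separating "yellow" (label 0) from "blue" (label 1)
X = np.array([[0.0], [0.1], [0.9], [1.0]])
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
scores = rakel_fit_predict(X, Y, X)
print((scores > 0.5).astype(int))  # thresholded labels, as shown in the front-end
```

In the actual system, the binary voting would additionally be combined with a regression over the label values to obtain graded scores such as "30% green, 70% blue".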
3 EVALUATION: USER STUDY
A user study was conducted to assess the usability of the system and the quality of its recommendations (measured by user satisfaction). The study was conducted with 34 participants recruited via personal connections and among computer science students of our university. They all had to freely explore the application on their own, and fill in a questionnaire afterwards. All users had to go through the longer setup, which made them tag 30 of their Spotify favorite songs. The experiment lasted around 20 minutes per participant.

The questionnaire was composed of 17 questions. The answers to the main questions are shown in Figure 3. They were designed to measure the tediousness of the initial set-up process, user satisfaction with the recommendations, the perceived usefulness of the color-based tagging system, and the usability of the interface. Other notable questions were about overall user satisfaction with Spotivibes.

Figure 3: Bar charts with answers to six of the main questions about the users' satisfaction in Spotivibes, its functionalities and perceived usefulness.

The user study shows that overall, the participants had a good/satisfactory experience with the application (3.74 average on a 5-point scale), but were less satisfied with the services provided by Spotivibes (3.41 average on a 5-point scale), as shown in Figure 3a and Figure 3b.

Initialization Process. One thing that emerged during this study is that the song labeling process was on the edge of tediousness, as shown in Figure 3c. The results show an even split of 1/3rd of users agreeing the process was tedious, 1/3rd being neutral, and 1/3rd disagreeing. We might consider going back to the short labeling process in further user experiments, but this can result in a decrease in playlist satisfaction. Perhaps a quick initialization process with better bulk labeling features could improve user-friendliness as well as the data for the classifier.

Playlist Generation. The user study was sadly inconclusive on the value of Spotivibes' color-based playlist generation, as shown by Figure 3e. Users were asked to rate their satisfaction with Spotify and Spotivibes on a 1 to 10 scale in terms of keeping to a given vibe or emotion. Disregarding low scores for Spotivibes from a couple of users for whom the initialization process failed due to bugs in the data management model, the minimum, lower quartile, median, upper quartile, and maximum were identical, and the mean score for Spotivibes was 0.2 lower (not statistically significant). This might have been affected by our choice to split the rating of Spotify and Spotivibes into different parts of the user study. In-person feedback from a couple of users indicated that they did not realize they were rating the two services against each other. Perhaps placing those two questions next to each other in the survey would have given a better view of how users actually felt about the recommendations.

Colour Associations. Users were, however, generally satisfied with the use of colors as labels for emotions, as shown by Figure 3d. Half agreed that colors made it easy to represent emotions, a quarter were neutral, and a quarter disagreed. When asked whether using multiple colors helped express complex or multi-faceted feelings, 65% agreed and only 10% disagreed. This does point towards the usefulness of colors as abstract labels for emotions in music. An interesting point to note is that users who gave negative feedback on the intuitiveness of the color labeling process (regarding their difficulty relating colors to songs, or not knowing multiple labels could be used) also had lower satisfaction with the quality of playlist generation. This suggests that our classifier does actually pick up on patterns in the user's color labels and functions better when users label meaningfully.

4 CONCLUSION
Spotivibes is an innovative color-based tagging system that allows users to tag songs in a personal, intuitive and abstract way, in order to get personalized playlists that support their unique experience and needs of music. The current version of Spotivibes is still an early, but functional, prototype, on which initial user studies have been performed. In future work, deeper research into the merit of the color-based tagging is planned, including larger-scale user studies.

ACKNOWLEDGMENTS
We would like to thank Bernd Kreynen for his valuable feedback throughout the project.
REFERENCES
[1] Anna Aljanaki, Frans Wiering, and Remco C. Veltkamp. 2016. Studying emotion induced by music through a crowdsourcing game. Information Processing & Management 52, 1 (2016), 115–128.
[2] Anna Aljanaki, Yi-Hsuan Yang, and Mohammad Soleymani. 2017. Developing a benchmark for emotional analysis of music. PloS one 12, 3 (2017), e0173392.
[3] Michael A. Casey, Remco C. Veltkamp, Masataka Goto, Marc Leman, Christophe Rhodes, and Malcolm Slaney. 2008. Content-Based Music Information Retrieval: Current Directions and Future Challenges. Proc. IEEE 96, 4 (April 2008), 668–696. https://doi.org/10.1109/JPROC.2008.916370
[4] Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 785–794.
[5] Andrew Demetriou, Martha Larson, and Cynthia C. S. Liem. 2016. Go with the flow: When listeners use music as technology. (2016).
[6] J. Stephen Downie, Donald Byrd, and Tim Crawford. 2009. Ten years of ISMIR: Reflections on challenges and opportunities. In Proceedings of the 10th International Society for Music Information Retrieval Conference (ISMIR 2009). 13–18.
[7] Mohsen Kamalzadeh, Dominikus Baur, and Torsten Möller. 2012. A survey on music listening and management behaviours. (2012).
[8] Youngmoo E. Kim, Erik M. Schmidt, and Lloyd Emelle. 2008. MoodSwings: A collaborative game for music mood label collection. In ISMIR, Vol. 2008. 231–236.
[9] Youngmoo E. Kim, Erik M. Schmidt, Raymond Migneco, Brandon G. Morton, Patrick Richardson, Jeffrey Scott, Jacquelin A. Speck, and Douglas Turnbull. 2010. Music emotion recognition: A state of the art review. In Proc. ISMIR, Vol. 86. 937–952.
[10] Edith Law and Luis von Ahn. 2009. Input-agreement: A New Mechanism for Collecting Data Using Human Computation Games. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '09). ACM, New York, NY, USA, 1197–1206. https://doi.org/10.1145/1518701.1518881
[11] Cynthia C. S. Liem, Andreas Rauber, Thomas Lidy, Richard Lewis, Christopher Raphael, Joshua D. Reiss, Tim Crawford, and Alan Hanjalic. 2012. Music information technology and professional stakeholder audiences: Mind the adoption gap. In Dagstuhl Follow-Ups, Vol. 3. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
[12] Adrian C. North, David J. Hargreaves, and Jon J. Hargreaves. 2004. Uses of music in everyday life. Music Perception: An Interdisciplinary Journal 22, 1 (2004), 41–77.
[13] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, Oct (2011), 2825–2830.
[14] Markus Schedl, Arthur Flexer, and Julián Urbano. 2013. The Neglected User in Music Information Retrieval Research. Journal of Intelligent Information Systems 41, 3 (2013), 523–539.
[15] Konstantinos Trohidis, Grigorios Tsoumakas, George Kalliris, and Ioannis P. Vlahavas. 2008. Multi-label classification of music into emotions. In ISMIR, Vol. 8. 325–330.
[16] Karthik Yadati, Cynthia C. S. Liem, Martha Larson, and Alan Hanjalic. 2017. On the Automatic Identification of Music for Common Activities. In Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. ACM, 192–200.
[17] Yi-Hsuan Yang and Homer H. Chen. 2012. Machine Recognition of Music Emotion: A Review. ACM Trans. Intell. Syst. Technol. 3, 3, Article 40 (May 2012), 30 pages. https://doi.org/10.1145/2168752.2168754