geMsearch: Personalized Explorative Music Search

Christian Esswein, University of Innsbruck, Austria, christian.esswein@student.uibk.ac.at
Markus Schedl, Johannes Kepler University Linz, Austria, markus.schedl@jku.at
Eva Zangerle, University of Innsbruck, Austria, eva.zangerle@uibk.ac.at

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. MILC '18, March 11, 2018, Tokyo, Japan

ABSTRACT
Due to the rise of music streaming platforms, huge collections of music are now available to users on various devices. Within these collections, users aim to find and explore songs based on certain criteria reflecting their current and context-specific preferences. Currently, users are limited to either using search facilities or relying on recommender systems that suggest suitable tracks or artists. Using search facilities requires the user to have some idea about the targeted music and to formulate a query that accurately describes this music, whereas recommender systems are traditionally geared towards long-term shifts of user preferences in contrast to ad-hoc and interactive preference elicitation. To bridge this gap, we propose geMsearch, an approach for personalized, explorative music search based on graph embedding techniques. As the ecosystem of a music collection can be represented as a heterogeneous graph containing nodes describing, e.g., tracks, artists, genres or users, we employ graph embedding techniques to learn low-dimensional vector representations for all nodes within the graph. This allows for efficient approximate querying of the collection and, more importantly, for employing visualization strategies that allow the user to explore the music collection in a 3D-space.

ACM Classification Keywords
H.3.3. Information Search and Retrieval: Information filtering; H.4.2. Information Systems Applications: Types of Systems: Decision Support

Author Keywords
music information retrieval, search, recommender systems, visualization, graph embedding

INTRODUCTION
In recent years, music streaming platforms have become a central means for listening to music as these allow users to access huge collections of music. This evolution has also influenced the way users search and explore music. For instance, the streaming platform Spotify currently serves 140 million active users and provides a collection of more than 30 million songs¹ (as of June 2017). Consequently, the primary objective for users has shifted from retrieving specific songs to finding and ultimately exploring songs that match certain criteria reflecting the user's current preferences and context [8, 5].

¹ http://press.spotify.com/us/about

Currently, two paradigms allow users to explore large music collections: search and recommender systems. Utilizing naive search approaches based on simple attribute matching requires the collection data to be fully annotated with metadata. When relying on keyword search facilities, the user is required to have some idea of his/her current preferences and has to be able to formulate a query that actually describes these preferences well. More advanced search facilities are based on content similarities of items (aka "find similar artists or songs") and are rarely personalized. In particular, data sparsity and the lacking ability to compare heterogeneous items (tracks, artists, albums, etc.) make it hard for such systems to succeed. In contrast, recommender systems propose items that might be suitable for the user (based on some collaborative filtering approach or more complex models). While recommender systems do not require the user to be able to formulate his/her current preferences, the user is also not able to directly influence recommendations by stating, e.g., a starting point for his/her explorative search for music matching his/her current preferences (except for feedback mechanisms like relevance feedback and explicit ratings that influence the user model in the long term).

Only very few approaches like the one proposed by Chen et al. [1] allow the user to specify his/her current needs and preferences in an abstract manner, where the returned results are jointly based on the query (the user's current information need) and the user's personal music preferences. However, there is still a substantial lack of user interfaces that provide dynamic, exploration-driven visualization strategies for large collections of music.

Therefore, we propose the geMsearch system to bridge this gap in explorative music search. In particular, we propose to use graph embedding techniques for computing latent representations of items contained in the graph, such as tracks, users, artists, genres or acoustic features of tracks. Using such graph embedding techniques [14], a low-dimensional latent vector representation is learned for every node. First, these representations allow advanced search facilities to be created, as search queries can be encoded in the same vector space. As a result, not only exact matches but also similar items can be retrieved, exploiting previously unknown similarities between heterogeneous items to retrieve diverse search results. Second, the obtained vector representations can be exploited for advanced visualization paradigms enabling explorative music search.

This work presents a preliminary study and visualization prototype based on latent representations obtained by graph embedding techniques. In contrast to traditional list-based aggregations of search results that provide a one-dimensional view of the retrieved items, we exploit the low-dimensional vector representation to generate 3D representations of the suggested items, allowing users to visually explore the music collection in a 3D-space. The user is able to specify a starting point for his/her exploration of the musical 3D-space; by browsing through this space, the query is implicitly refined and the user is provided with further suitable tracks and artists.

The remainder of this paper is structured as follows. In Section 2, we describe related work. Section 3 proposes a visualization for explorative music search based on graph embeddings and presents the proposed prototype. Section 4 sums up key aspects and details future work.

RELATED WORK
For the task of building visualizations for music exploration, there are a number of relevant approaches, mostly based on proximity-preserving dimension reduction techniques.

The Islands of Music interface [10] incorporates rhythm descriptors and employs self-organizing maps for visualizing music collections based on the metaphor of geographic maps in two-dimensional space. One highly relevant extension of these maps is a browsable 3D landscape by Knees et al. [6], where tracks are clustered based on content features. Hamasaki and Goto [4] propose Songrium, a collection of visualization and exploration approaches. These include the "Music Star Map", a visualization of songs in a graph, where placement of songs is based on audio similarity. Also, Lamere and Eck [7] presented a 3D interface (Search Inside the Music) based on Multidimensional Scaling (MDS) techniques to visualize similarities between tracks, where each item is represented as a single colored item in the 3D space. Similarly, the Music Box visualization approach relies on Principal Component Analysis to visualize tracks, where song similarity is used to distribute tracks on a plane. Stober et al. [13] also rely on MDS but utilize bisociative lens distortions to support serendipitous music discovery in the MusicGalaxy UI. The visualization proposed in this work differs from these approaches in that we base the visualization on latent representations of items within a heterogeneous graph that includes tracks, artists, albums, genres, etc. Due to the applied graph embedding techniques, proximities within the graph visualization are not restricted to similarities between items of the same type (e.g., tracks) or similarities based on a single set of features (e.g., audio features), but rather capture the similarity of items of any type in the latent feature space.

Recently, graph embedding techniques have also been introduced to the field of music information retrieval. Chen et al. [1] utilize graph embeddings for realizing a query-based music recommender approach that is similar to the approach presented in this paper. A similar approach has also been utilized for playlist recommendation [2] or text-based music retrieval based on playlists [3]. However, these approaches do not provide a user interface for the exploration of new music.

GEMSEARCH: EMBEDDING-BASED VISUALIZATION
In the following section, we present the geMsearch system, a first prototype for personalized explorative music search based on latent representations of nodes of the musical ecosystem². geMsearch stands for graph embedding based music search and consists of two main components, which we will detail in this section: the graph embedding and retrieval engine that computes latent representations of items and query results, and the client providing a search and visualization interface.

² The prototype can be accessed at http://dbis-graphembeddings.uibk.ac.at

Graph Embedding and Retrieval Engine
For the creation of the graph underlying our approach, we rely on the Spotify playlist dataset by Pichl et al. [12], containing 852,293 tracks crawled from public Spotify playlists. To enrich the available item descriptors for improved query performance, we also add Last.fm tags³ for the contained tracks. The resulting dataset is represented as a graph containing undirected edges between the following item types: user–track, track–tag, track–album, album–artist and artist–genre. For the computation of latent representations of nodes via graph embedding, we rely on the popular Deepwalk algorithm [11], where we learn representations for all nodes in a 128-dimensional vector space. The resulting latent representations provide a means for flexibly computing similarities between heterogeneous items such as tracks, users or artists.

³ https://www.last.fm/api/show/track.getTags

geMsearch allows users to interactively explore the music space to find new music. Therefore, a starting position for browsing through the items has to be determined by eliciting the user's current musical preferences. As can be seen in the top left corner of Figure 1, a text input field (with autocompletion support) allows the user to select multiple items from the dataset to construct a query that reflects the user's current preferences. Here, the search query for artist "Jimi Hendrix" may return similar and suitable artists, tracks or tags. In addition, the search result can be refined by adding further search terms. In Figure 1, the tag "guitar" is entered and combined with the first term. To create a search vector which is evaluated to retrieve nearest neighbors as search results, the mean item representation of these query terms is computed. The user's latent representation is scaled and finally added to this vector; hence, long-term preferences partly influence the outcome. The resulting vector is then used to retrieve the most similar items from the graph as search results.

Figure 1. geMsearch query bar with autocomplete and list results.

Visualization
The most common visualization for both recommendation and search results is to display a list of items ordered by the predicted relevance of the individual items for the user. This limits users to only observing the sequential order of items and hence, a one-dimensional view agnostic to distances between consecutive items. With a latent feature space underlying the system (obtained through, e.g., graph embedding techniques), similarities between arbitrary items can be expressed, which permits developing more advanced interfaces. Through recent advances in browser technology, like the availability of native WebGL, just-in-time visualizations of 3D scenes can be created directly on websites without complex precomputations or add-ons. Using dimension reduction methods, the computed high-dimensional latent representations can be reduced to three dimensions, allowing items to be visualized directly while preserving proximity. Here, we utilize principal component analysis to reduce the 128-dimensional representation of items to a three-dimensional space. Instead of displaying a list of items, the recommended items can now be visualized in a 3D scene. Each track, artist or album can be positioned using its three-dimensional representation and can hence be displayed as an interactive 3D object. The positions and resulting distances reflect the relationships and proximities between items within the music collection.

Besides the traditional list view for search results, the geMsearch client visualizes the surrounding items in a 3D WebGL scene as depicted in Figure 2. Such an interface not only expresses distances between items but, more importantly, allows the user to explore and browse through the result space interactively. Mouse gestures allow for exploring the virtual space and, while navigating, additional items are lazy-loaded into the scene. The user may first use a keyword search to express his/her current preferences (cf. section on Graph Embedding and Retrieval Engine). Based on these criteria, the first search results are retrieved and displayed in a 3D space, where the user should feel like navigating through a virtual result space instead of jumping to unconnected items. In the underlying latent vector space, any of the proposed items can be used to further extend the query and hence, refine the search to match current preferences more precisely. Besides this active manipulation, the 3D scene provides an even more effective process of implicit refinement. The most relevant search results are positioned around the center of the screen. When exploring additional items further away, the user has to opt for a direction in which to continue exploring. After inspecting items at the new position, the navigation direction can be refined. If the user detects suitable items, the direction is correct; otherwise the user will navigate in a different direction. This choice of directions and moving within the virtual result space directly translates to (implicit) query refinement.

It is crucial to simplify the inspection of single items such that huge collections of music are explorable in reasonable time. We use album covers as textures for 3D objects describing track and album items and hence, also allow for visually inspecting node textures, as this has been shown to be an efficient means for judging the relevance of albums and tracks [9]. To provide detailed information about selected items (e.g., artists of a given track, genres, etc.), information from the underlying graph is retrieved and displayed. Also, we provide music previews for each track that allow users to inspect and immediately consume newly discovered tracks.

As similar items are located in close proximity to one another in the resulting space, distance-based clustering techniques can be applied to represent accumulations of items as annotated clusters. This allows users to decide whether a set of items might be of interest by looking at the characteristics of the cluster and not having to inspect the individual items contained in the cluster. However, zooming into a cluster to inspect the individual contained items is still possible. Figure 2 shows how clusters of similar items are represented as single orange circles. On click, the contained items are shown while all other elements are faded with transparency to enhance the contrast. As items within a cluster are positioned nearby, the scene is zoomed in without scaling the circle sizes to avoid overlapping elements.

Figure 2. Web client 3D view and player bar.

To alleviate the cold start problem for user profiles, users can connect with their Spotify account. The official Spotify API supports the OAuth protocol with different scopes, allowing access to, e.g., personal playlists, playing history or saved tracks. To create a personal preference profile, geMsearch retrieves the user's saved tracks as we argue that saved tracks may serve as a strong indicator for preference. After a user has connected with his/her account, the music library is loaded and compared with the current contents of the underlying graph. For tracks, artists, etc. that are not yet contained in the underlying graph, we gather the missing metadata from Spotify and user-curated tags describing these items from Last.fm. After the data is collected, the graph is extended with this new information. In a next step, latent representations have to be computed in case of new items or updated in case of items that are affected by the newly added information. Deepwalk uses short random walks to model the graph structure with a uniform distribution over nodes. Therefore, neither the complete graph structure nor all nodes have to be known to the algorithm initially. This implies that additional nodes and edges can be added on-the-fly to continue learning and extending existing embeddings without the need to relearn the complete model from scratch when adding new users or items.

CONCLUSION AND FUTURE WORK
In this work, we presented geMsearch, a preliminary prototype for personalized exploration and search of music collections. We exploit graph embedding techniques to compute a low-dimensional vector space representation of the music collection and the contained items. This allows for query-based, personalized exploration of music collections. Particularly, our approach provides users with a 3D representation for visual exploration of new music, allowing the user to browse through search results and the full collection, where the distance of items (tracks, artists, genres) in the displayed graph corresponds to item similarity. Please note that browsing through the 3D-space is not restricted to search results: the user's query merely defines a starting point for browsing the full collection graph and hence, for query refinement. We believe that the proposed method is not necessarily limited to music and may also be used in different domains, where data can be represented as a graph and metadata for single items is sparse.

As for future work, we aim to further extend the prototype by improving the visualization performance and updating user preferences on-the-fly. For computing the user profiles, we aim to look into incorporating listening histories and create more comprehensive user profiles. Also, we aim to lay a particular emphasis on interaction aspects in the prototype by, e.g., allowing the user to up- or downvote certain tracks explicitly. We further aim to perform a user-centric evaluation of the system.

REFERENCES
1. Chih-Ming Chen, Ming-Feng Tsai, Yu-Ching Lin, and Yi-Hsuan Yang. 2016a. Query-based Music Recommendations via Preference Embedding. In Proc. of the 10th ACM Conf. on Recommender Systems (RecSys '16). 79–82.
2. Chih-Ming Chen, Chun-Yao Yang, Chih-Chun Hsia, Yian Chen, and Ming-Feng Tsai. 2016b. Music Playlist Recommendation via Preference Embedding. In Proc. of the Poster Track of the 10th ACM Conf. on Recommender Systems.
3. Chia-Hao Chung, Yian Chen, and Homer Chen. 2017. Exploiting Playlists for Representation of Songs and Words for Text-Based Music Retrieval. In Proc. of the 18th Intl. Society for Music Information Retrieval Conf.
4. Masahiro Hamasaki and Masataka Goto. Songrium: A Music Browsing Assistance Service Based on Visualization of Massive Open Collaboration Within Music Content Creation Community. In Proc. of the 9th Intl. Symposium on Open Collaboration.
5. Mohsen Kamalzadeh, Dominikus Baur, and Torsten Möller. A survey on music listening and management behaviours. In Proc. of the 12th Intl. Society for Music Information Retrieval Conf.
6. Peter Knees, Markus Schedl, Tim Pohle, and Gerhard Widmer. 2006. An innovative three-dimensional user interface for exploring music collections enriched. In Proc. of the 14th ACM Intl. Conf. on Multimedia. ACM, 17–24.
7. Paul Lamere and Douglas Eck. 2007. Using 3D Visualizations to Explore and Discover Music. In Proc. of the 7th Intl. Society for Music Information Retrieval Conf. 173–174.
8. Jin Ha Lee, Yea-Seul Kim, and Chris Hubbles. 2016. A Look at the Cloud from Both Sides Now: An Analysis of Cloud Music Service Usage. In Proc. of the 17th Intl. Society for Music Information Retrieval Conf. 299–305.
9. Janis Libeks and Douglas Turnbull. 2011. You can judge an artist by an album cover: Using images for music annotation. IEEE MultiMedia 18, 4 (2011), 30–37.
10. Elias Pampalk. 2001. Islands of music: Analysis, organization, and visualization of music archives. Master's thesis. Technical University Vienna.
11. Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. Deepwalk: Online learning of social representations. In Proc. of the 20th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining. ACM, 701–710.
12. Martin Pichl, Eva Zangerle, and Günther Specht. 2017. Improving Context-Aware Music Recommender Systems: Beyond the Pre-filtering Approach. In Proc. of the 7th Conf. on Multimedia Retrieval (ICMR). ACM, 201–208.
13. Sebastian Stober, Stefan Haun, and Andreas Nürnberger. 2012. Bisociative Music Discovery and Recommendation. Springer Berlin Heidelberg, 472–483.
14. Shuicheng Yan, Dong Xu, Benyu Zhang, Hong-Jiang Zhang, Qiang Yang, and Stephen Lin. 2007. Graph embedding and extensions: A general framework for dimensionality reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 1 (2007), 40–51.
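
The query construction described in the section on the Graph Embedding and Retrieval Engine (the mean of the query items' vectors plus a scaled user vector, followed by nearest-neighbor retrieval) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_query_vector` and `nearest_neighbors`, the scaling factor `user_weight`, the choice of cosine similarity, the brute-force search, and the toy 3-dimensional vectors are all assumptions made for illustration. The paper states only that 128-dimensional Deepwalk embeddings are used and does not specify the similarity measure or the scaling.

```python
import math


def build_query_vector(term_vectors, user_vector, user_weight=0.2):
    """Mean of the query-term embeddings plus a scaled user embedding.

    user_weight is a hypothetical scaling factor; the paper does not give one.
    """
    if not term_vectors:
        raise ValueError("at least one query term is required")
    dim = len(term_vectors[0])
    mean = [sum(vec[i] for vec in term_vectors) / len(term_vectors)
            for i in range(dim)]
    return [m + user_weight * u for m, u in zip(mean, user_vector)]


def cosine_similarity(a, b):
    """Cosine similarity; an assumed choice of similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k=3, exclude=()):
    """Return the k node ids most similar to the query, best first."""
    scored = [
        (cosine_similarity(query, vec), node_id)
        for node_id, vec in embeddings.items()
        if node_id not in exclude
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [node_id for _, node_id in scored[:k]]


# Toy embeddings (3 dimensions instead of 128); all values are made up.
EMBEDDINGS = {
    "artist:Jimi Hendrix": [0.9, 0.1, 0.0],
    "tag:guitar": [0.7, 0.3, 0.0],
    "track:guitar_rock_song": [0.8, 0.2, 0.1],
    "track:piano_ballad": [0.0, 0.2, 0.9],
    "user:alice": [0.1, 0.9, 0.1],
}

query_terms = ["artist:Jimi Hendrix", "tag:guitar"]
query = build_query_vector(
    [EMBEDDINGS[term] for term in query_terms], EMBEDDINGS["user:alice"]
)
results = nearest_neighbors(
    query, EMBEDDINGS, k=2, exclude=set(query_terms) | {"user:alice"}
)
print(results)
```

Because heterogeneous nodes (artists, tags, tracks, users) share one vector space, the same two functions serve any mix of query terms. A deployment over hundreds of thousands of nodes would likely replace the brute-force scan with an approximate nearest-neighbor index, which is a design choice the paper leaves open.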