Distributional models vs. Linked Data: exploiting crowdsourcing to personalize music playlists

Cataldo Musto¹, Fedelucio Narducci², Giovanni Semeraro¹, Pasquale Lops¹, and Marco de Gemmis¹

¹ Department of Computer Science, University of Bari Aldo Moro, Italy
name.surname@uniba.it
² Department of Information Science, Systems Theory, and Communication, University of Milano-Bicocca, Italy
narducci@disco.unimib.it

Abstract. This paper presents Play.me, a system that exploits social media to generate personalized music playlists. First, we extract user preferences in music by mining Facebook profiles. Next, given this preliminary playlist based on explicit preferences, we enrich it by adding new artists related to those the user already likes. Two enrichment techniques are compared: the first relies on knowledge stored in DBpedia, while the second is based on similarity calculations between semantic descriptions of the artists. A prototype of the tool was made available online in order to carry out a preliminary user study to identify the best enrichment strategy. This paper summarizes the results presented at EC-Web 2012 [3].

1 Introduction and Related Work

According to a recent study³, 31,000 hours of music (and 28 million songs) are currently available on the iTunes Store. As a consequence, the problem of information overload now affects online music libraries and multimedia content as well. However, the recent spread of social networks provides researchers with a rich source to draw on to overcome the typical bottleneck of eliciting user preferences. Given this insight, in this work we propose Play.me, a system that leverages social media to personalize music playlists. The filtering model is based on the assumption that information about music preferences can be gathered from Facebook profiles.
Next, explicit Facebook preferences may be enriched with new artists related to those the user already likes. In this paper we compare two enrichment techniques: the first leverages the knowledge stored in DBpedia, while the second is based on similarity calculations between semantic descriptions of artists. The final playlist is then ranked and presented to the user, who can express her feedback. A prototype version of Play.me was made available online and a preliminary user study was performed to identify the best enrichment technique. Generally speaking, this work belongs to the area of music recommendation (MR), a topic that has been widely covered in the literature: an early attempt at handling the MR problem is due to Shardanand [5], who proposed collaborative filtering to provide music recommendations. Similarly to our work, in [1] Lamere analyzed the use of tags as a source for music recommendation, while the use of Linked Data is investigated in [4].

³ http://www.digitalmusicnews.com/permalink/2012/120425itunes

2 Play.me: personalized playlists generator

Fig. 1. Play.me architecture

The general architecture of Play.me is depicted in Figure 1. The generation is triggered directly by the user, who invokes the playlist generator module. The set of her favourite artists is built by mapping the preferences gathered from her Facebook profile (specifically, by mining the links she posted as well as the pages she likes) to a set of artists extracted from Last.fm. Given this preliminary set, the playlist enricher adds new artists by using different enrichment strategies. Finally, for each artist in that set, the most popular tracks are extracted and the final playlist is shown to the target user, who can express her feedback. A working implementation of Play.me has been made available online (Figure 2). For a complete description of the system we refer the reader to [3]; in this paper we focus only on the enrichment algorithms.

Fig. 2. Play.me screenshot

Enrichment based on Linked Data. The first technique for enriching the user preferences extracted from Facebook relies on DBpedia⁴. Our approach is based on the assumption that each artist can be mapped to a DBpedia node. The underlying idea is that the similarity between two artists can be computed according to the number of properties they share (e.g. two Italian bands playing rock music are probably similar). Thus, we decided to use dbpedia-owl:genre, which describes the genre played by the artist, and dcterms:subject, which provides information about the musical category. Operationally, we queried a SPARQL endpoint to extract the artists that share as many properties as possible with the target one. Finally, we ranked them according to their playcount in Last.fm. The first m artists returned by the endpoint are considered related and added to the set of favourite artists.

Enrichment based on Distributional Models. Each artist in Play.me is described through a set of tags (extracted from Last.fm), where each tag provides information about the genre played by the artist or describes features typical of her songs (e.g. melancholic). According to the insight behind distributional models [2], each artist can be modeled as a point in a semantic vector space, whose position depends on the tags used to describe her and on the co-occurrences between the tags themselves. The rationale behind this strategy is that the relatedness between two artists can be calculated by comparing their vector-space representations through the classical cosine similarity. So, we compute the cosine similarity between the target artist and all the other artists in the dataset, and the m artists with the highest scores are added to the list of favourite ones.

3 Experimental Evaluation

In the experimental evaluation we tried to identify the technique able to generate the most relevant playlists.
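
Before turning to the results, the two enrichment strategies described in Section 2 can be illustrated with a minimal sketch. It is not the implementation used in Play.me: the artists, tags, properties and playcounts below are invented toy data, the function names are hypothetical, the Linked Data strategy is emulated with in-memory property sets instead of a SPARQL endpoint (ordering candidates by shared properties first and playcount second is our assumption), and the distributional strategy uses plain tag-count vectors, ignoring the tag co-occurrences that a full distributional model would take into account.

```python
from collections import Counter
from math import sqrt

# Toy data, invented for illustration only (not taken from the Play.me dataset).
TAGS = {
    "artist_a": ["rock", "italian", "melancholic"],
    "artist_b": ["rock", "italian", "indie"],
    "artist_c": ["jazz", "instrumental"],
    "artist_d": ["rock", "melancholic"],
}
PROPERTIES = {
    "artist_a": {"genre:Rock", "subject:Italian_rock_groups"},
    "artist_b": {"genre:Rock", "subject:Italian_rock_groups"},
    "artist_c": {"genre:Jazz"},
    "artist_d": {"genre:Rock"},
}
PLAYCOUNT = {"artist_a": 900, "artist_b": 500, "artist_c": 700, "artist_d": 800}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def enrich_distributional(target, m):
    """Return the m artists whose tag vectors are closest to the target's."""
    target_vec = Counter(TAGS[target])
    scores = {
        artist: cosine(target_vec, Counter(tags))
        for artist, tags in TAGS.items()
        if artist != target
    }
    return sorted(scores, key=lambda a: scores[a], reverse=True)[:m]


def enrich_linked_data(target, m):
    """Return up to m artists sharing properties with the target.

    Candidates are ordered by number of shared properties, then by playcount
    (an assumed tie-breaking rule, not confirmed by the paper).
    """
    shared = {
        artist: len(PROPERTIES[target] & props)
        for artist, props in PROPERTIES.items()
        if artist != target
    }
    candidates = [a for a in shared if shared[a] > 0]
    candidates.sort(key=lambda a: (shared[a], PLAYCOUNT[a]), reverse=True)
    return candidates[:m]


if __name__ == "__main__":
    print(enrich_distributional("artist_a", 2))
    print(enrich_linked_data("artist_a", 2))
```

On this toy data and with m=2, the distributional sketch ranks artist_d before artist_b for artist_a, while the property-based sketch ranks artist_b before artist_d, which shows how the two strategies can disagree on the same target artist.
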
We carried out an experiment involving 30 users and a Last.fm crawl containing data on 228k artists. In order to identify the best enrichment technique, we asked users to use the application for three weeks. In each of the first two weeks the system was configured with a different enrichment technique, while in the third week a simple baseline based on the most popular artists was used. Given the playlist generated by the system, users were asked to express their feedback only on the tracks generated by the enrichment process. Results are reported in Table 1. The parameter m refers to the number of artists added by the enrichment algorithm for each artist extracted from Facebook.

Table 1. Results: each score represents the ratio of positive feedback.

                         Artists
Strategy                 m=1      m=2      m=3
Linked Data              65.9%    64.6%    63.2%
Distributional Models    76.3%    75.2%    69.7%
Popularity               58%

It is worth noting that both enrichment strategies outperform the baseline. This means that the social network data actually reflect user preferences. The enrichment technique that achieved the best performance is the one based on distributional models. However, even though this technique obtained the best results, a deeper analysis reveals a more nuanced picture. Indeed, with m=3 the gap between the two approaches narrows, from about 10.5 percentage points at m=1 and m=2 to 6.5 at m=3: this suggests that a pure content-based representation introduces more noise than DBpedia, whose effectiveness remains more stable. The good results obtained by the baseline can be explained by the low diversity of the users involved in the evaluation. More details about the experimental settings are reported in [3].

⁴ http://dbpedia.org

4 Conclusions and Future Work

In this work we presented Play.me, a system for building music playlists based on social media. Specifically, we compared two techniques for enriching the playlists, the first based on DBpedia and the second based on similarity calculations in vector spaces.
From the experimental session it emerged that the approach based on distributional models was able to produce the best playlists. Generally speaking, there is still room for future work, since the enrichment might be tuned by analyzing different DBpedia properties or different tags. Furthermore, context-aware personalized playlists could be a promising research direction.

References

1. P. Lamere. Social Tagging and Music Information Retrieval. Journal of New Music Research, 37(2):101–114, 2008.
2. A. Lenci. Distributional approaches in linguistic and cognitive research. Italian Journal of Linguistics, (1):1–31, 2010.
3. C. Musto, G. Semeraro, P. Lops, M. de Gemmis, and F. Narducci. Leveraging social media sources to generate personalized music playlists. In EC-Web 2012.
4. A. Passant and Y. Raimond. Combining Social Music and Semantic Web for Music-Related Recommender Systems. In Social Data on the Web, ISWC Workshop, 2008.
5. U. Shardanand. Social information filtering for music recommendation. Bachelor thesis, Massachusetts Institute of Technology, Massachusetts, 1994.