
Distributional models vs. Linked Data: exploiting crowdsourcing to personalize music playlists

Cataldo Musto

Fedelucio Narducci

narducci@disco.unimib.it 1

Giovanni Semeraro

Pasquale Lops

Marco de Gemmis

0 Department of Computer Science, University of Bari Aldo Moro, Italy

1 Department of Information Science, Systems Theory, and Communication, University of Milano-Bicocca, Italy

This paper presents Play.me, a system that exploits social media to generate personalized music playlists. First, user preferences in music are extracted by mining Facebook profiles. Next, the preliminary playlist built from these explicit preferences is enriched with new artists related to those the user already likes. Two different enrichment techniques are compared: the first relies on the knowledge stored in DBpedia, while the second is based on similarity calculations between semantic descriptions of the artists. A prototype of the tool was made available online in order to carry out a preliminary user study aimed at identifying the best enrichment strategy. This paper summarizes the results presented at EC-Web 2012 [3].

According to a recent study (http://www.digitalmusicnews.com/permalink/2012/120425itunes), 31,000 hours of music (and 28 million songs) are currently available on the iTunes Store. As a consequence, information overload is a concrete problem for online music libraries and multimedia content as well. However, the recent spread of social networks gives researchers a rich source of data to draw on in order to overcome the typical bottleneck of eliciting user preferences.

Given this insight, in this work we propose Play.me, a system that leverages social media to personalize music playlists. The filtering model is based on the assumption that information about music preferences can be gathered from Facebook profiles. Next, the explicit Facebook preferences may be enriched with new artists related to those the user already likes. In this paper we compare two different enrichment techniques: the first leverages the knowledge stored in DBpedia, while the second is based on similarity calculations between semantic descriptions of artists. The final playlist is then ranked and presented to the user, who can express her feedback. A prototype of Play.me was made available online and a preliminary user study was performed to identify the best enrichment technique. Generally speaking, this work falls within the area of music recommendation (MR), a topic that has been widely covered in the literature: an early attempt at handling the MR problem is due to Shardanand [5], who proposed collaborative filtering to provide music recommendations. Similarly to our work, Lamere [1] analyzed the use of tags as a source for music recommendation, while the use of Linked Data is investigated in [4].

Play.me: personalized playlists generator

The general architecture of Play.me is depicted in Figure 1. The generation is triggered directly by the user, who invokes the playlist generator module. The set of her favourite artists is built by mapping the preferences gathered from her own Facebook profile (specifically, by mining the links she posted as well as the pages she likes) to a set of artists extracted from Last.fm. Given this preliminary set, the playlist enricher adds new artists by using different enrichment strategies. Finally, for each artist in that set, the most popular tracks are extracted and the final playlist is shown to the target user, who can express her feedback. A working implementation of Play.me has been made available online (Figure 2). A complete description of the system is given in [3]; in this paper we focus only on the enrichment algorithms.
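The flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`normalize`, `map_to_artists`, `generate_playlist`), the exact-name matching rule, and the in-memory `top_tracks` dictionary are hypothetical and are not taken from the Play.me implementation, which is described in [3].

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def normalize(name: str) -> str:
    """Lower-case and collapse whitespace (assumed matching rule)."""
    return " ".join(name.lower().split())


def map_to_artists(facebook_items: Iterable[str],
                   lastfm_artists: Iterable[str]) -> List[str]:
    """Keep the Facebook items whose normalized name matches a Last.fm artist."""
    catalog = {normalize(a): a for a in lastfm_artists}
    seen: Set[str] = set()
    matched: List[str] = []
    for item in facebook_items:
        key = normalize(item)
        if key in catalog and key not in seen:
            seen.add(key)
            matched.append(catalog[key])
    return matched


def generate_playlist(facebook_items: Iterable[str],
                      lastfm_artists: Iterable[str],
                      enrich: Callable[[str, int], List[str]],
                      top_tracks: Dict[str, List[str]],
                      m: int = 1,
                      tracks_per_artist: int = 2) -> List[Tuple[str, str]]:
    """Map preferences to artists, enrich with m related artists each,
    then collect the most popular tracks of every artist in the set."""
    favourites = map_to_artists(facebook_items, lastfm_artists)
    artists = list(favourites)
    for artist in favourites:
        for related in enrich(artist, m):
            if related not in artists:
                artists.append(related)
    playlist: List[Tuple[str, str]] = []
    for artist in artists:
        for track in top_tracks.get(artist, [])[:tracks_per_artist]:
            playlist.append((artist, track))
    return playlist
```

The `enrich` callable is the plug-in point for either enrichment strategy: it receives an artist and m and returns up to m related artists.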

Enrichment based on Linked Data. The first technique for enriching the user preferences extracted from Facebook relies on DBpedia (http://dbpedia.org). Our approach assumes that each artist can be mapped to a DBpedia node. The underlying idea is that the similarity between two artists can be computed according to the number of properties they share (e.g. two Italian bands playing rock music are probably similar). Thus, we decided to use dbpedia-owl:genre, which describes the genre played by the artist, and dcterms:subject, which provides information about the musical category. Operationally, we queried a SPARQL endpoint to extract the artists that share as many properties as possible with the target one. Finally, we ranked them according to their playcount in Last.fm. The first m artists returned by the endpoint are considered related and added to the set of favourite artists.
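The shared-property idea can be illustrated without a live endpoint. The sketch below replaces DBpedia with a small in-memory dictionary of (property, value) pairs; the function name, the toy data, and the exact ordering rule (number of shared properties first, Last.fm playcount as tie-breaker) are assumptions made for illustration, since the text does not fully specify how the two criteria are combined.

```python
from typing import Dict, List, Set, Tuple

# Toy stand-in for DBpedia: artist -> set of (property, value) pairs.
Properties = Dict[str, Set[Tuple[str, str]]]


def related_by_shared_properties(target: str,
                                 properties: Properties,
                                 playcount: Dict[str, int],
                                 m: int) -> List[str]:
    """Return up to m artists sharing at least one property value with target.

    Assumed ordering: more shared properties first, then higher playcount,
    then artist name (only to make the result deterministic).
    """
    target_props = properties.get(target, set())
    scored: List[Tuple[str, int]] = []
    for artist, props in properties.items():
        if artist == target:
            continue
        shared = len(target_props & props)
        if shared > 0:
            scored.append((artist, shared))
    scored.sort(key=lambda pair: (-pair[1], -playcount.get(pair[0], 0), pair[0]))
    return [artist for artist, _ in scored[:m]]


# Hypothetical data: two Italian rock bands, two other rock bands, one jazz act.
toy_properties: Properties = {
    "Band A": {("dbpedia-owl:genre", "Rock"), ("dcterms:subject", "Italian_rock_groups")},
    "Band B": {("dbpedia-owl:genre", "Rock"), ("dcterms:subject", "Italian_rock_groups")},
    "Band C": {("dbpedia-owl:genre", "Rock")},
    "Band D": {("dbpedia-owl:genre", "Rock")},
    "Band E": {("dbpedia-owl:genre", "Jazz")},
}
toy_playcount = {"Band B": 100, "Band C": 50, "Band D": 900}

print(related_by_shared_properties("Band A", toy_properties, toy_playcount, m=2))
# prints ['Band B', 'Band D']
```

In this toy example Band B comes first because it shares both properties with the target, while Band D precedes Band C only because of its higher playcount.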

Enrichment based on Distributional Models. Each artist in Play.me is described through a set of tags (extracted from Last.fm), where each tag provides information about the genre played by the artist or describes features typical of her songs (e.g. melancholic). According to the insight behind distributional models [2], each artist can be modeled as a point in a semantic vector space, whose position depends on the tags used to describe her and on the co-occurrences between the tags themselves. The rationale behind this strategy is that the relatedness between two artists can be calculated by comparing their vector-space representations through the classical cosine similarity. So, we compute the cosine similarity between the target artist and all the other artists in the dataset, and the m artists with the highest scores are added to the list of favourite ones.
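A minimal sketch of the similarity step follows. It assumes that each artist is already available as a sparse tag vector (tag to weight); how those weights are derived from tag co-occurrences is not detailed here, so the sketch uses raw toy weights and does not reproduce the actual vector space construction. The names `cosine` and `most_similar` and the data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

TagVector = Dict[str, float]  # sparse vector: tag -> weight


def cosine(u: TagVector, v: TagVector) -> float:
    """Cosine similarity between two sparse tag vectors (0.0 if either is empty)."""
    dot = sum(weight * v[tag] for tag, weight in u.items() if tag in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(target: str, vectors: Dict[str, TagVector], m: int) -> List[str]:
    """Return the m artists whose vectors are closest to the target's vector."""
    target_vec = vectors[target]
    scored: List[Tuple[float, str]] = [
        (cosine(target_vec, vec), artist)
        for artist, vec in vectors.items()
        if artist != target
    ]
    # Highest similarity first; artist name breaks ties deterministically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [artist for _, artist in scored[:m]]


# Hypothetical tag weights for four artists.
toy_vectors: Dict[str, TagVector] = {
    "Artist A": {"rock": 3.0, "melancholic": 1.0},
    "Artist B": {"rock": 2.0, "melancholic": 1.0},
    "Artist C": {"jazz": 5.0},
    "Artist D": {"rock": 1.0, "jazz": 1.0},
}

print(most_similar("Artist A", toy_vectors, m=2))
# prints ['Artist B', 'Artist D']
```

Because cosine similarity depends only on the direction of the vectors, an artist with many tag assignments is not favoured over a less popular one with the same tag profile.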

Experimental Evaluation

In the experimental evaluation we tried to identify the technique able to generate the most relevant playlists. We carried out an experiment involving 30 users, against a Last.fm crawl containing data on 228k artists. In order to identify the best enrichment technique, we asked users to use the application for three weeks. In each of the first two weeks the system was configured with a different enrichment technique, while in the last week a simple baseline based on the most popular artists was used. Given the playlist generated by the system, users were asked to express their feedback only on the tracks generated by the enrichment process. Results are reported in Table 1. The parameter m refers to the number of artists added by the enrichment algorithm for each artist extracted from Facebook. It is worth noting that both enrichment strategies outperform the baseline. This means that the social network data actually reflect user preferences. The enrichment technique that achieved the best performance is the one based on distributional models. However, even though this technique achieved the best results, a deeper analysis gives a more nuanced picture. Indeed, with m=3 the gap between the approaches shrinks: this suggests that a pure content-based representation introduces more noise than DBpedia, whose effectiveness stays constant. The good results obtained by the baseline can be explained by the low diversity of the users involved in the evaluation. More details about the experimental settings are reported in [3].

Conclusions and Future Work

In this work we presented Play.me, a system for building music playlists based on social media. Specifically, we compared two techniques for enriching the playlists, the first based on DBpedia and the second based on similarity calculations in vector spaces. The experimental session showed that the approach based on distributional models was able to produce the best playlists. Generally speaking, there is still room for future work, since the enrichment might be tuned by analyzing different DBpedia properties or different tags. Furthermore, context-aware personalized playlists could be a promising research direction.

1. P. Lamere. Social Tagging and Music Information Retrieval. Journal of New Music Research, 37(2):101-114, 2008.

2. A. Lenci. Distributional approaches in linguistic and cognitive research. Italian Journal of Linguistics, (1):1-31, 2010.

3. C. Musto, G. Semeraro, P. Lops, M. de Gemmis, and F. Narducci. Leveraging social media sources to generate personalized music playlists. In EC-Web 2012.

4. A. Passant and Y. Raimond. Combining Social Music and Semantic Web for Music-Related Recommender Systems. In Social Data on the Web, ISWC Workshop, 2008.

5. U. Shardanand. Social information filtering for music recommendation. Bachelor thesis, Massachusetts Institute of Technology, Massachusetts, 1994.