<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using Semantic Relations in Context-based Music Recommendations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>İpek Tatlı</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ayşenur Birtürk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Engineering, METU Inonu Bulvari</institution>
          ,
          <addr-line>06531 Ankara</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe an approach for creating music recommendations based on user-supplied tags that are augmented with a hierarchical structure extracted for top-level genres from Dbpedia. In this structure, each genre is represented by its stylistic origins, typical instruments, derivative forms, subgenres and fusion genres. We use this well-organized structure for dimensionality reduction in user and item profiling. We compare two recommenders: one using our method and the other using Latent Semantic Analysis (LSA) for dimensionality reduction. The recommender using our approach outperforms the other. In addition to different dimensionality reduction methods, we evaluate the recommenders with different user profiling methods.</p>
      </abstract>
      <kwd-group>
        <kwd>Recommendation systems</kwd>
        <kwd>user profiling</kwd>
        <kwd>social tagging</kwd>
        <kwd>semantic relations</kwd>
        <kwd>dimensionality reduction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>These days, most social-networking sites let their
members participate in content generation. For example, users
can label artists, albums and tracks with tags in Last.fm.
A tag can be anything, but it is essentially a short description
of the item. Because tags represent why a
listener likes an item, but not how much he or she likes it, they
are better identifiers of user profiles than ratings, which are
usually numerical values assigned to items by users. Thus,
we concentrate on the tag-based contextual representations
of music tracks.</p>
      <p>WOMRAD 2011 2nd Workshop on Music Recommendation and Discovery,
colocated with ACM RecSys 2011 (Chicago, US).
Copyright ©. This is an open-access article distributed under the terms
of the Creative Commons Attribution License 3.0 Unported, which permits
unrestricted use, distribution, and reproduction in any medium, provided
the original author and source are credited.</p>
      <p>Items are mostly represented in vector spaces in
recommendation systems. In tag-based recommendation
systems, users and items are defined in terms of weighted
vectors of social tags. When there is a large number of tags,
calculating the items to be recommended becomes hard,
because working with huge, sparse matrices is expensive; a
common remedy is to represent individual tracks
(songs) in lower-dimensional spaces. In order to reduce the
dimensionality, we focus on the genre information of the
tags. Each genre has a relationship with some
instrumentation, with some subgenre information and with style
information, each of which may be entered as tags in the music
domain. In our work, for each genre, Dbpedia (http://dbpedia.org), a
structured form of Wikipedia (http://en.wikipedia.org), is crawled
to set the relationships between the genre and its stylistic origins,
typical instruments, derivative forms, subgenres and fusion genres. The
contributions of our approach are that: (1) we provide a
"semantic relations" method for dimensionality reduction in very
large vector spaces and (2) we compare
our method against the classical Singular Value
Decomposition (SVD) method, which is the basis of Latent Semantic
Analysis (LSA). Our method outperforms the traditional
one.</p>
    </sec>
    <sec id="sec-2">
      <title>2. RELATED WORK</title>
      <p>
        In music recommendation systems, tracks can be profiled
in terms of their audio content (rhythm, timbre,
tonality, instrumentation). In addition to audio descriptions,
tracks can be profiled in terms of text descriptions such as
metadata, lyrics, tags and reviews mined from various
blogs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Metadata information is mostly supplied by
experts. The artist's name, the album's name, genre, duration
and year are some attributes in the metadata. Attributes
are global descriptions of items and do not change
according to users whereas tags are local descriptors and might
change from user to user [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In our study, we focus on text
descriptions, namely tags, in track profiling.
      </p>
      <p>
        Recommender systems either predict ratings for unseen
items or predict items that a user may like. Most social
websites like Last.fm do not have a rating mechanism.
Instead of explicit ratings, today's recommender systems use
implicit ratings (users' listening habits, purchase
histories, etc.). Thus the rating scale in implicit rating
mechanisms is 0-1. Tags can be used in rating-based collaborative
filtering systems with the help of an implicit rating
mechanism [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. If the tag is used by the user, its rating is 1;
otherwise its rating is 0. In most previous studies, 2-dimensional
projections of the music space are taken into consideration (item-user,
user-tag or item-tag). User-tag and item-tag relations
can be used to extend the rating data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A new approach
which uses all dimensionalities of the music space is
proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Each 'user-item-tag' triple is a tensor entry in this
study, and the researchers propose a Higher Order Singular
Value Decomposition (HOSVD) technique for 3-dimensional
social tagging data. The HOSVD method outperforms the
classical methods. In contrast, the three 2-dimensional
relations among users, tags and items have been used in a
new similarity measure generation which outperforms the
traditional item-based and user-based collaborative filtering
methods [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In this approach, neighborhood generation is
effected through the similarity of users' tags, the similarity of
users' items and the similarity of users' tag-item
relationships. In addition to user similarities, item similarities have
been calculated with common tags, common users and
common tag-item relationships. Moreover, tags can be clustered
and these clusters improve the personalized
recommendations [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
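      <p>The implicit 0-1 rating scheme described above can be sketched as follows (a minimal illustration; the function and variable names are ours, not from the cited work):</p>

```python
def implicit_tag_ratings(user_tags, vocabulary):
    """Binary implicit ratings: 1 if the user applied the tag, else 0."""
    return [1 if tag in user_tags else 0 for tag in vocabulary]

# A user who applied "rock" and "energetic" out of a four-tag vocabulary.
vocab = ["rock", "jazz", "seen live", "energetic"]
alice = {"rock", "energetic"}
print(implicit_tag_ratings(alice, vocab))  # [1, 0, 0, 1]
```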
      <p>
        To the best of our knowledge, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] uses a similar approach to ours,
extracting the top 50 music facets from Wikipedia, but the
main objective of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is to provide an automatic method for
uncovering the music facets and to classify tags according
to these facets. On the other hand, we create a hierarchical
genre structure and evaluate the usefulness of our approach
in music recommendation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. OUR APPROACH</title>
      <p>Our system performs 6 main tasks, shown in Figure 1: web
crawling, creating an ontology of musical genres, classifying
tags according to the ontology, track profiling, user profiling
and enacting the recommendation process. The circles
denote the phases of the system.</p>
      <p>Users listen to music and enter tags for tracks in their
Last.fm profiles. In the web crawling phase of the system,
a data set is generated. Details of the data set are given in
Section 4.</p>
      <p>
        Tags may be about genre, instrumentation, location, moods,
style, personal opinions and/or artists' names [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For
example, two users of Last.fm tagged some songs as follows:
the first one loved listening to "The Wall" by "Pink Floyd"
and tagged the track with the words "energetic" and "seen
live". The second one loved "Only Girl" by "Rihanna" and
tagged "Only Girl" also with the words "energetic" and "seen
live". Thus both "Only Girl" and "The Wall" have the same
tags. According to the recommender's similarity
function, they appear as very similar tracks, although in most
other ways (genre, instrumentation for instance) they are
not. For such reasons, subjective tags like personal
opinions and moods are ignored in the track and user
representation in our system. Instrumentation, subgenre, fusion
genre, derivative forms and stylistic information are used
in our track and user representations. Firstly, we decided on
the main genres in the music domain. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], 14
mainstream genres (country, folk, jazz, blues, r&amp;b, heavy metal,
alternative/indie, punk, rap, electro, reggae, classical, rock
and pop) are used. We enriched these genres with Last.fm's
mainstream genres, which can be found in the left frame
of the page http://www.last.fm/music. The main genres
used in our system are as follows: acoustic, ambient, blues,
classical, country, electronic, emo, folk, hardcore, hip hop,
indie, jazz, Latin, metal, pop, pop punk, punk, reggae, r&amp;b,
rock, soul, world. Having identified our genres, we decided
to crawl the Wikipedia page for each main genre, but then
we switched to Dbpedia since it is more structured for
web crawling. We crawled the Dbpedia page for each main genre.
The obtained information is illustrated in Table 1 for "rock
music". In the ontology creation phase, we created a small
ontology, a hierarchical structure, with the help of the data
crawled from Dbpedia. The relations in our ontology can
be seen in Table 2. In this structure, instrumentation,
stylistic origins, derivative forms, subgenres and fusion genres are
the classes; the crawled data are the instances. For example,
"New Age Music" and "Synthpop" are instances of the class
"Derivative forms", as can be seen in Table 1.
      </p>
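      <p>The class-instance structure described above can be sketched as a simple mapping (a minimal illustration; only the two "rock" instances quoted in the text are filled in, and the remaining lists would be populated from the Dbpedia crawl):</p>

```python
# Per-genre relation structure mirroring Table 2: each relation (class)
# maps to a list of crawled instances. Empty lists are placeholders for
# data that would come from the Dbpedia crawl.
rock = {
    "hasStylisticOrigins": [],
    "hasInstrumentation": [],
    "hasDerivativeForms": ["New Age Music", "Synthpop"],
    "hasSubGenres": [],
    "hasFusionGenres": [],
}

# The full ontology keys each of the 22 main genres to such a structure.
ontology = {"rock": rock}
```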
      <p>LSA does not use word order and morphology. In order
not to differentiate "electronic" from "electronica", we
applied a stemming algorithm. Stemming is a technique
that converts similar words to a common base form. This base
form does not have to have a meaning from a linguistic point
of view (such as reducing synonyms to a single word, or
finding the root of the word). Various stemming algorithms
exist for the English language. We used the Porter stemmer
(http://tartarus.org/~martin/PorterStemmer), a classical
stemming algorithm. By using a stemming algorithm,
morphology is taken into consideration in our approach.
In the tag classification phase of our system, we parsed
the instances existing in the ontology into single
words. We applied the stemming algorithm to each single
word. Then we concatenated the stemmed roots with "%" in
order to preserve the "word order" that LSA does not use. Some
examples of the stemming results can be seen in Table 3.</p>
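      <p>The stem-and-concatenate step can be sketched as follows (a minimal illustration; the toy suffix stripper stands in for the actual Porter stemmer, and the function names are ours):</p>

```python
def toy_stem(word):
    """Very rough suffix stripping; a stand-in for the Porter stemmer."""
    for suffix in ("ica", "ic", "al", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def instance_to_pattern(instance):
    """Stem each word and join the stems with '%' so a SQL LIKE pattern
    still respects word order, e.g. 'Aboriginal rock' -> '%aborigin%rock%'."""
    stems = [toy_stem(w.lower()) for w in instance.split()]
    return "%" + "%".join(stems) + "%"

pattern = instance_to_pattern("Aboriginal rock")
sql = f"select * from tags where tag_name like '{pattern}'"
```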
      <p>All the tags in our dataset are saved in the "tags" table.
The reason why we use "%" in the new versions of the instances is
that we use these versions in our SQL statements, such as
"select * from tags where tag name like '%Aborigin%rock%'".
With this method, about 100000 tags out of
160000 tags are used in the track representation. In the track
profiling phase, the size of a track vector is the number of
mainstream genres (22 in our case). Last.fm provides integer
percentages (between 0 and 100) relative to the most used
tags per track. We updated these percentages by adding 1
to each percentage value in order not to discard any tag
having a 0 percentage. Each entry in the vector is calculated as
follows:</p>
      <p>g(i, j) = Σ hasInstrumentation(i, j) + Σ hasStylisticOrigins(i, j) + Σ hasDerivativeForms(i, j) + Σ hasSubGenres(i, j) + Σ hasFusionGenres(i, j)</p>
      <p>Here, g(i, j) is the ith term (genre) in the jth track, and
hasInstrumentation(i, j) is the total percentage (between
1 and 101) of the tags of the jth track which are found to
be similar to the new instance versions of the
instrumentation class of the ith genre (with the help of the aforementioned
SQL statements). The term count is usually normalized to
prevent a bias towards longer documents (which may have
a higher term count regardless of the actual importance of
that term in the document). The term frequency (TF) value
gives local information about a tag. An inverse document
frequency (IDF) value is calculated for each different tag in
the training set. This is calculated by dividing the total
number of tracks by the number of tracks that refer to that
feature. The IDF value gives global information about a tag.
Thus, tracks in our dataset are represented as a weighted
list of genres, and the weights of the genres are calculated
with TF*IDF.</p>
      <p>w_i = (n_i,t / n_t) * log(|T| / |T_i|)</p>
      <p>In the formula above, w_i is the weight of the ith genre; n_i,t
equals the number of times the ith genre appears in the tth track;
n_t is the total number of genres in the tth track; |T| is the total
number of tracks, and |T_i| equals the number of tracks in
which the ith genre appears. In the user profiling phase, 3
different methods are used:</p>
      <p>1- using the users' own tags (personal tags) that they
entered</p>
      <p>2- using the users' friends' tags (friends' tags) that their
friends entered</p>
      <p>3- using all the tags of the tracks (social tags) that they
listened to</p>
      <p>In the first method, users are profiled with their own tags;
in the second, with their friends' tags. Semantic relations
are used in user profiling methods 1 and 2, just
as in track profiling. In method 3, a user profile is
the sum of the vectors of the tracks that the user has
listened to. The weights of
the genres in user vectors are also calculated with the TF*IDF
method. The main goal after creating a user profile from
the training set is to recommend the items in the test set.</p>
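      <p>The TF*IDF weighting above can be sketched as follows (a minimal illustration; the function name is ours):</p>

```python
import math

def tfidf_weight(n_it, n_t, total_tracks, tracks_with_genre):
    """w_i = (n_i,t / n_t) * log(|T| / |T_i|), as in the formula above."""
    tf = n_it / n_t                              # local weight of the genre
    idf = math.log(total_tracks / tracks_with_genre)  # global rarity
    return tf * idf

# A genre appearing 3 times among 10 genre terms of a track, in a
# collection of 13312 tracks of which 1000 contain that genre.
weight = tfidf_weight(3, 10, 13312, 1000)
```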
      <p>In the recommendation phase, we use the common
cosine similarity method. The cosine similarity formula is
given below:</p>
      <p>CosSim(track, user) = (track_vector · user_vector) / (|track_vector| |user_vector|)</p>
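      <p>The cosine similarity computation can be sketched as follows (a minimal illustration; the function name is ours):</p>

```python
import math

def cos_sim(track_vec, user_vec):
    """Cosine of the angle between a track vector and a user vector."""
    dot = sum(t * u for t, u in zip(track_vec, user_vec))
    norm = math.sqrt(sum(t * t for t in track_vec)) * \
           math.sqrt(sum(u * u for u in user_vec))
    return dot / norm if norm else 0.0

# Tracks are ranked for a user by decreasing cosine similarity.
similarity = cos_sim([3.0, 0.0, 1.0], [1.0, 2.0, 0.5])
```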
    </sec>
    <sec id="sec-4">
      <title>4. DATA SET</title>
      <p>We use real Last.fm data in this study. In order not to
use similar users from our own friend lists, and in order to
achieve diversity, we selected 69 users from a group named
"join Last.fm" (http://www.facebook.com/group.php?gid=2246697136&amp;v=wall),
whose members share their Last.fm nicknames. We crawled their Last.fm
profiles with the help of the Last.fm API (http://last.fm/api).
Since our approach is
not collaborative but content-based, this number of users is
reasonable. Firstly, we gathered their top 300 tracks. Then
we extracted their "loved" tracks. For each track, we
extracted the singer names and tags. We also gathered the
tag counts per track. Finally, for each user we extracted
their tags and their friends' tags. Details of our data set can
be seen in Table 4.</p>
    </sec>
    <sec id="sec-5">
      <title>5. EXPERIMENTS</title>
    </sec>
    <sec id="sec-6">
      <title>5.1 Methodology</title>
      <p>We performed a 4-fold cross-validation in which the
training data size was 75% and the test data 25%. User
profiles were created using the training set, and the task of
our recommender system was to predict the correct items in
the test set.</p>
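      <p>The 4-fold split can be sketched as follows (a minimal illustration; the function name is ours):</p>

```python
def four_fold_splits(items):
    """Yield 4 (train, test) splits; each fold holds out 25% for testing."""
    n = len(items)
    fold = n // 4
    for k in range(4):
        test = items[k * fold:(k + 1) * fold]
        train = items[:k * fold] + items[(k + 1) * fold:]
        yield train, test
```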
    </sec>
    <sec id="sec-7">
      <title>5.2 Metrics</title>
      <p>In this study we used the most common evaluation metric:
Precision at the top N ranked results (P@N). Precision is the
ratio of relevant tracks selected correctly to the total number of
tracks selected.</p>
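      <p>P@N can be sketched as follows (a minimal illustration; the function name is ours):</p>

```python
def precision_at_n(ranked, relevant, n):
    """P@N: fraction of the top-N ranked tracks that are relevant."""
    top = ranked[:n]
    return sum(1 for track in top if track in relevant) / len(top)
```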
    </sec>
    <sec id="sec-8">
      <title>5.3 Results</title>
      <p>
        In Last.fm, although users listen to music, they rarely
enter tags for the tracks that they like. Thus, user profiles in
Last.fm are smaller than in other social tagging sites, so
the performance of pure content-based recommendation
is not satisfying [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In Table 5, two recommenders are compared in terms of
recommending the corresponding tracks in the test set: one
using LSA with an optimized parameter k and one using our
method for dimensionality reduction. LSA is
applied to the track-tag matrix whose size is 13312*169174
(13312 tracks, 169174 tags). On the other hand, the
recommender using the semantic relations method decreases the
matrix size to 13312*22 (13312 tracks, 22 genres). In this
recommender, each genre is semantically related to
instruments, stylistic origins, subgenres, fusion genres and
derivative forms. Thus, semantically related tags are counted as
the same genre in this representation. As seen in Table 5,
the recommender using semantic relations
outperforms the recommender using LSA for
dimensionality reduction, because it handles the semantic gap problem
in social tagging. Moreover, the recommender using users'
own tags in user profiling performs better than the
recommender using friends' tags. However, the recommender
using all social tags in user profiling provides
the best results because of the increased number of tags in
user vectors.
      </p>
    </sec>
    <sec id="sec-9">
      <title>6. CONCLUSION</title>
      <p>User-annotated texts, tags in our case, are huge in size, but
the representation matrix is very sparse. Using such giant
matrices in calculations is a time- and resource-consuming
job. For document categorization and text
summarization, LSA has been used for years because it is easy to use
and reliable. As an alternative, with the help of Dbpedia,
we created an ontology-like semantic relations structure for
the music domain. In this paper, we evaluated two methods
which can be used in dimensionality reduction. In the
evaluation, a Last.fm dataset was used and the recommenders were
evaluated with different user profiling methods. Our method
has the advantage of using "word order" and "morphology",
unlike LSA. We plan to extend our work by assigning
different weights to different relations. For instance,
hasInstrumentation and hasSubGenres may have different weights
in the track profiling.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>O.</given-names>
            <surname>Celma</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamere</surname>
          </string-name>
          .
          <article-title>Music Recommendation Tutorial</article-title>
          . ISMIR. Vienna, Austria,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.H.L.</given-names>
            <surname>Tso-Sutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.B.</given-names>
            <surname>Marinho</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Schmidt-Thieme</surname>
          </string-name>
          .
          <article-title>Tag-aware recommender systems by fusion of collaborative filtering algorithms</article-title>
          .
          <source>ACM symposium on Applied Computing. Brazil</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Symeonidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ruxanda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nanopoulos</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Manolopoulos</surname>
          </string-name>
          .
          <article-title>Ternary semantic analysis of social tags for personalized music recommendation</article-title>
          .
          <source>ISMIR</source>
          . Philadelphia, USA,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Nayak</surname>
          </string-name>
          .
          <source>Collaborative Filtering Recommender Systems Using Tag Information. Web Intelligence and Intelligent Agent Technology. Sydney, Australia</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shepitsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gemmell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mobasher</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Burke</surname>
          </string-name>
          .
          <article-title>Personalized recommendation in social tagging systems using hierarchical clustering</article-title>
          .
          <source>ACM conference on Recommender systems. Lausanne, Switzerland</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sordo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gouyon</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Sarmento</surname>
          </string-name>
          .
          <article-title>A Method for Obtaining Semantic Facets of Music Tags</article-title>
          . WOMRAD. Barcelona, Spain,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Cantador</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bellogin</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Vallet</surname>
          </string-name>
          .
          <source>Content-based Recommendation Systems in Social Tagging Systems. ACM conference on Recommender systems. Barcelona, Spain</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Levy</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sandler</surname>
          </string-name>
          .
          <article-title>Learning Latent Semantic Models For Music From Social Tags</article-title>
          .
          <source>Journal of New Music Research</source>
          . pages
          <fpage>137</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>