Exploiting Twitter’s Collective Knowledge for Music Recommendations∗

Eva Zangerle, Wolfgang Gassler, Günther Specht
Databases and Information Systems, Institute of Computer Science
University of Innsbruck, Austria
{firstname.lastname}@uibk.ac.at

∗ This research was partially funded by the University of Innsbruck (Nachwuchsförderung 2011).

ABSTRACT
Twitter is the largest source of public opinion and also contains a vast amount of information about its users’ musical preferences and listening behaviour. However, this source has not yet been exploited for the recommendation of music. In this paper, we present how Twitter can be used to create a data set upon which music recommendations can be computed. The data set is based on microposts which were automatically generated by music player software or posted by users and which may also contain further information about audio tracks.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications - Data Mining

General Terms
Algorithms, Performance, Human Factors, Experimentation

Keywords
Recommender Systems, Music Recommendation, Twitter

1. INTRODUCTION
In recent years, music recommendation services have become very popular in both academia and industry. The goal of such services is the recommendation of suitable music for a certain user. This is traditionally accomplished by either (i) taking into account the user profile, consisting of the tracks the user listened to in the past and (if available) the user’s ratings for songs, or (ii) analysing the song itself and using the extracted features in order to find similar songs. For the recommendation of music, huge corpora and user profiles are required as there are millions of different audio tracks. There are some large services, such as last.fm1, which own such big corpora. However, most of them are not publicly available. Especially for academic purposes, only few (mostly small) data sets are available for the evaluation of the proposed approaches, e.g. the million song data set [4].

Twitter is a publicly available service which holds huge amounts of data and is still growing tremendously. Twitter stated that there are about 140 million new messages a day. Such messages can also be exploited in the context of music recommendations. Many audio players offer the functionality of automatically posting a tweet containing the title and artist of the track the user is currently listening to. These tweets traditionally contain keywords like nowplaying or listeningto, e.g. in the tweet “#nowplaying Tom Waits-Temptation”. For users who frequently make use of such a service, the set of these tweets can be seen as a user profile in terms of their musical preferences and provides well-suited data for e.g. a music recommendation corpus.

In this paper we present an approach for gathering such data and refining it such that the tweeted artists and tracks can directly be related to the free music databases FreeDB and MusicBrainz. As a use case scenario, we present the recommendation of music based on the data set.

This paper is structured as follows. Section 2 describes the processes underlying the creation of the proposed data set. Section 3 features the approach for the recommendation of suitable music tracks as a use case for the gathered data. Section 4 contains related work and Section 5 concludes the paper and discusses future work.

2. DATA SET CREATION
The goal of this approach is the creation of a corpus of music tracks gathered from tweets of users. These tweets contain tracks the user previously listened to and tweeted about (the so-called user stream). In particular, we propose to make use of tweets which have been posted by users or audio players and which contain the title and artist of the music track currently played, e.g. “#NowPlaying Best Thing I Never Had by Beyonce”.
The following sections describe the steps taken for the creation of the data set.

1 http://www.last.fm

2.1 Crawling of Twitter Data Set and Analysis
The data set was crawled via the Twitter Streaming API between July 2011 and February 2012. The only publicly available access method is the Spritzer access, which provides real-time access to only about 1% of all posted Twitter messages. Due to these restrictions, we crawled 4,734,014 tweets containing one of the keywords nowplaying, listento or listeningto, posted by 864,736 different users. This implies an average of 5.5 tweets per user. Within our data set, the distribution of tweets per user resembles a long-tail distribution, as can be seen in Table 1. Since recommendations can only be made if a user has posted about two or more tracks, a total of 457,675 users and their respective tweets cannot be used for our approach, as only one tweet of each of these users is featured within the data set.

Tweets in stream    Users
1                   457,675
> 3                 196,422
> 5                 126,783
> 10                 63,017
> 100                 3,190
> 1,000                 253
> 10,000                  5

Table 1: Population of User Streams

In total, 5,916,294 hashtags were used within the data set. Owing to the search keywords we used, the hashtags #nowplaying and #listeningto were the most prominent hashtags within the crawled data set. General hashtags, e.g. #music, #radio or #video, were also used frequently. Music streaming services and online radios also make use of hashtags when tweeting about the currently playing track (e.g. #cityfm or #fizy).

A total of 1,413,983 tweets (29.8% of the whole corpus) featured hyperlinks. An analysis of these URLs revealed that they are mostly used to point to music services, e.g. Youtube or Spotify, an online music streaming service. A large part of the hyperlinks lead to the website of the service which was used to post the track information on Twitter, e.g. tweetmylast.fm or tinysong.com.

2.2 Resolution of Twittered Tracks
This task aims at parsing the gathered tweets and recognizing the artist name and track title mentioned in each tweet. Consider e.g. the tweet “#NowPlaying Best Thing I Never Had by Beyonce”. For this tweet, we have to extract Beyonce as the artist and “Best Thing I Never Had” as the title of the audio track and match it with a reference music database. Most of the crawled messages are very noisy and consist of many terms which are not concerned with the music track itself. Consider e.g. the tweet “listening to Hey Hey My My (Out Of The Blue) by Neil Young on @Grooveshark: #nowplaying #musicmonday http://t.co/7os3eeA”, which contains information about the online radio service, a URL and other information not related to the music track. Especially when dealing with such noisy tweets, the matching is a crucial task as the quality of the data resulting from this step significantly influences the quality of the resulting recommendations.

2.2.1 Resolution Approach
As a reference database for artists and the according tracks, we made use of the publicly available databases FreeDB2 and MusicBrainz3. FreeDB contains information about more than 37 million audio tracks, roughly 3,000,000 discs and 766,909 different artists. MusicBrainz was also considered as a reference database as we expected it to be of higher quality than FreeDB. MusicBrainz contains about 8 million tracks of about 650,000 different artists.

2 http://www.FreeDB.org
3 http://www.MusicBrainz.org

The goal of this task is to assign each tweet a FreeDB and a MusicBrainz entry which represents the title and the according artist extracted from the tweet. We tackle this resolution task by making use of a Lucene fulltext index as it allows a simple matching of strings, namely the tweet and a certain FreeDB or MusicBrainz entry. The fulltext index is filled with a combined string containing both the artist and the title of each track within the reference databases. In a next step, we query this fulltext index for each of the tweets within the data set in order to obtain the most suitable FreeDB/MusicBrainz candidates for the title and artist of the track. We then use the top-20 search results of Lucene as candidates for the assignment of tracks to the information mentioned in the according tweet.

Lucene’s ranking function is based on the term frequency/inverse document frequency measure (tf/idf). This measure is dependent on the length of the query, which is not favourable in our approach as tweets contain a high degree of noise (e.g. URLs, feelings, smilies, etc.) which is not part of a track title but is part of the query (the tweet). Therefore, we implemented a bag-of-words similarity measure between the query and the documents contained within the Lucene index, similar to the Jaccard similarity measure. Our proposed similarity measure is defined as the ratio between the size of the term intersection of the query and the track and the number of terms contained in the track, as can be seen in Equation 1.

\[ \mathrm{sim}_{music}(tweet, track) = \frac{|tweet \cap track|}{|track|} \tag{1} \]

The advantage of such a measure is its independence of the length of the query and the reduced influence of the noise in tweets. Furthermore, as our goal is to find the best matching audio track for each given tweet, it is crucial that most terms within the track are matched. However, in the case of multiple search results having obtained an equal score, we still rely on the tf/idf values computed by Lucene. Our proposed score is used for a ranking of the Lucene search results. For each of the tweets, the track which obtained the highest score is assigned to the tweet. In order to be able to set a certain threshold for the scores of the matching entries later, we also store the computed sim_music score.

2.2.2 Evaluation of Resolution
For the evaluation of the resolution and the comparison of FreeDB and MusicBrainz, we created a ground truth data set which consists of 100 tweets randomly chosen from the data set. Subsequently, we tried to assign matching tracks in the FreeDB and MusicBrainz databases manually. This task was done by the same person for both reference databases and also included the resolution of abbreviations and of mentions which link to the artist’s Twitter account. For example, the tweet “#nowplaying @Lloyd_YG ft. @LilTunechi - You” can be resolved to the two Twitter accounts Lloyd-Young Goldie and Lil Wayne WEEZY F and therefore to the MusicBrainz entry Lil Wayne feat. Lloyd - You. Having gathered all possible information from the tweet, the assigning person searched for matching tracks in the database. If the artist or the title of the track was not directly recognizable in the tweet, single words were used to search the database and find matching artists or titles. We only considered tweets which were resolved to both the according track and artist. Tweets such as “Chris Duarte, famous blues musician - free videos here: http://t.co/UZMXaGQ #blues #guitar #music #roots #free #nowplaying #musicmonday”, which only contain information about the artist, were not counted as a match. However, such information is also very valuable as it describes the musical taste of a user.

For our ground truth data set, we were able to manually assign 57 tracks of FreeDB and 59 tracks of MusicBrainz. This shows that both databases cover a similar number of the tweeted tracks; however, the FreeDB data set is very noisy (typos, spelling errors and variations).

Subsequently, we ran our automated Lucene-based resolution process on the ground truth data set using both reference databases (see details in Table 2). Considering a sim_music score threshold of 0.8, we were able to resolve 73% of the MusicBrainz ground truth correctly and had an error rate (false positives) of about 10% of all matched tracks. The high number of false positives using the FreeDB data set can be traced back to the noisy entries in FreeDB.

RefDB          Manually    Automated    False Pos.
MusicBrainz    59          43 (73%)     5 (10%)
FreeDB         57          31 (54%)     18 (36%)

Table 2: Resolution Ground Truth (100 tweets)

Due to these results, we used MusicBrainz for all further computations (e.g. music recommendations).

3. MUSIC RECOMMENDATIONS
As a use case, we implemented a music recommendation service on top of the data set. The necessary steps for a recommendation of music are described in the following.

The proposed approach for the recommendation of music titles relies on the co-occurrence of titles within a user stream. Based on the obtained tweets and the assigned tracks, we propose to use association rules [2] in order to be able to model the co-occurrence of items efficiently. In the case of the co-occurrence of tweeted music titles, an association rule t1 → t2 describes that a particular user who tweeted about song t1 also tweeted about song t2. These rules are the basis for the further recommendation process and are stored as triples r = (t1, t2, c), where t1 and t2 are tracks which have been tweeted by the same user and c is a variable holding the popularity of the rule. Hence, such a rule denotes that track t1 and track t2 have both been listened to by c users.

3.1 Ranking of Recommendation Candidates
In this step, the computed association rules are analysed and so-called recommendation candidates are extracted. Based on the rules, the recommended tracks for a certain user are computed by selecting a subset C ⊆ T of track recommendation candidates, which is obtained by determining all rules which feature tracks occurring in the user stream. The final step for the recommendation of tracks is the ranking of the recommendation candidates within the set C. For this purpose, we make use of the count value c describing the popularity of a certain track within all association rules matching the tracks of the input user stream. Hence, all recommendation candidates are ranked by their respective count values, where a higher count value results in a higher rank for the candidate.

3.2 Offline Evaluation
As a first evaluation we performed an offline evaluation and compared the computed track recommendations with recommendations provided by the last.fm API4, which lists tracks similar to a given track including a score stating the relevance of the song (matching score).

4 http://www.last.fm/api

We made use of the MusicBrainz data set as it contains cleaner data than FreeDB. Firstly, we removed all tweets of users who contributed only one tweet and all tweets which were matched with a MusicBrainz track with sim_music < 0.8, in order to dismiss uncertain mappings. Hence our final data set consisted of 2.5 million tweets of 525,751 users. Based on this data set we computed the according association rules and obtained 500 million distinct rules. For reasons of computability and due to API limitations, we chose a subset consisting of the most popular tracks and the according rules which are present more than 10 times (c > 10). The final data set consisted of 15,000 unique tracks and 90 million distinct rules.

We called the last.fm API for all tracks and the API was able to recognize 13,138 out of 15,000 songs. The API returned 3.2 million similar tracks which we matched with our internal MusicBrainz database. In total, 83% of all tracks with a score > 0.8 were matched. We transformed the gathered last.fm data to association rules and computed the overlap of these rules with our rule set. 19% of the last.fm rules are covered by the Twitter-based rules. If we consider only similar tracks of last.fm with a matching score (gathered via the last.fm API) higher than 0.6, the Twitter-based rules cover 79% of all rules in the set. When comparing the top-10 recommendations on both sides, the coverage is only about 1% of all rules. These low numbers can be traced back to the restrictions of the Twitter API and the resulting sparse data set. Especially the incomplete user profiles decrease the coverage. For example, within the “taste” subset of the million song data set roughly 70% of the tracks were played more than 10 times. In contrast, in our data set only 5% of the tweeted tracks were contained more than 10 times. This fact strengthens the evidence that the crawled data set is not representative enough, which can be traced back to the API limitation and uncertainties in the matching process. Furthermore, due to the diversity of music tracks, such an offline evaluation may not reveal the full potential of the approach. Online evaluations may achieve better results for our proposed approach and are subject to future work.

4. RELATED WORK
Research related to the presented approach can be categorized into (i) approaches dealing with recommendations either for Twitter or based on tweets and (ii) approaches mainly dealing with the recommendation of music.

The utilization of a corpus of tweets for the recommendation of resources has been a popular research topic. For example, the recommendation of suitable hashtags is discussed in [14]. Many approaches aim at the recommendation of users who might be interesting to follow, e.g. [7]. Such approaches are typically based on the social ties of a user (their followees and followers). There are also many approaches which exploit these ties to recommend resources, such as websites [6] or news [12].

As for the second category of related work, the recommendation of music, many different approaches have been presented. Celma [5] provides an overview of this topic. Within recommender systems, in principle two major approaches are distinguished [1]: content-based recommendations and collaborative filtering (CF) approaches. Content-based recommendation systems aim at recommending resources which are similar to the resources the user already consumed or showed interest in. Collaborative filtering approaches aim at finding users with a profile similar to the current user in order to recommend items which these similar users were also in favor of. This categorization also holds within music recommendations. Content-based methods for music titles typically rely on the extraction and analysis of audio features. The presented approach relies on the second type, as the computation of association rules based on user profiles can be assigned to the class of CF approaches.

However, for music recommendations a third important aspect is also exploited for the computation of recommendations: context. The notion of context has e.g. been defined by Schmidt et al. as being threefold: physical environment, human factors and time [13]. These three factors have all been addressed by music recommendation research. As for the physical environment of a user, Kaminskas and Ricci presented a location-aware approach for music recommendations [8]. The mood of users has been incorporated into the computation of recommendations in [9], and Baltrunas et al. [3] considered temporal factors when recommending music.

Many approaches exploited user profiles in social networks to recommend resources. Mesnage et al. [10] showed that people prefer the music that their friends in the social network prefer. The Serendip.me project5 provides its users with music which is selected solely based on the Twitter ties (the followees) of the user. The dbrec project [11] is concerned with recommending music based on the DBPedia data set. In particular, the authors developed a distance metric for resources within DBPedia which enables them to recommend similar artists.

5 http://serendip.me

However, to the best of our knowledge there are no approaches concerned with the recommendation of music based on an analysis of “nowplaying” user streams on Twitter.

5. CONCLUSION AND FUTURE WORK
In this paper we showed that tweets can be exploited to build a corpus for music recommendations. The comparison with the recommendation service of last.fm showed that despite the sparse corpus due to Twitter’s API limitations, the coverage of last.fm’s recommendations is up to 79%. The results are very promising, although the approach has to be enhanced to be usable in real-world recommendation environments. A major improvement would be the expansion of the data set, as currently the corpus is very sparse and the user profiles are incomplete. Also, the task of matching noisy tweets deteriorates the quality of recommendations. This is due to the fact that many uncertain matching results have to be dismissed and hence, the size of the usable data corpus decreases. Future work also comprises the enhancement of the matching process by using metadata such as location, URLs or further sentiment analysis. Additionally, applying CF techniques for the exploitation of the social ties of the user is subject to future work. In order to evaluate the approach from a user’s point of view, online user tests are also part of the future work.

6. REFERENCES
[1] G. Adomavicius and A. Tuzhilin. Toward the Next Generation of Recommender Systems. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749, 2005.
[2] R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. In Proc. of the 20th Intl. Conf on Very Large Data Bases, pages 487–499, 1994.
[3] L. Baltrunas and X. Amatriain. Towards Time-Dependant Recommendation based on Implicit Feedback. Workshop on ContextAware Recommender Systems CARS 2009 in ACM Recsys, 2009:1–5.
[4] T. Bertin-Mahieux, D. P. Ellis, B. Whitman, and P. Lamere. The Million Song Dataset. In Proc. of the 12th Intl. Conf. on Music Information Retrieval, 2011.
[5] Ò. Celma. Music Recommendation and Discovery - The Long Tail, Long Fail, and Long Play in the Digital Music Space. Springer, 2010.
[6] J. Chen, R. Nairn, L. Nelson, M. Bernstein, and E. Chi. Short and Tweet: Experiments on Recommending Content from Information Streams. In Proc. of the 28th Intl. conference on Human Factors in Computing Systems, pages 1185–1194. ACM, 2010.
[7] J. Hannon, M. Bennett, and B. Smyth. Recommending Twitter Users to Follow using Content and Collaborative Filtering Approaches. In Proc. of the 4th ACM Conf. on Recommender Systems, pages 199–206. ACM, 2010.
[8] M. Kaminskas and F. Ricci. Location-Adapted Music Recommendation Using Tags. In User Modeling, Adaption and Personalization 2011, Girona, Spain, July 11-15, 2011, volume 6787 of LNCS, pages 183–194. Springer, 2011.
[9] J. Lee and J. Lee. Context Awareness by Case-based Reasoning in a Music Recommendation System. In Proc. of the 4th Intl. Conference on Ubiquitous Computing Systems, pages 45–58. Springer, 2007.
[10] C. Mesnage, A. Rafiq, S. Dixon, and R. Brixtel. Music Discovery with Social Networks. In Proc. of the Workshop on Music Recommendation and Discovery 2011 in conjunction with ACM RecSys, volume 793, pages 1–6. CEUR-WS, 2011.
[11] A. Passant. dbrec - Music Recommendations Using DBpedia. The Semantic Web–ISWC 2010, pages 209–224, 2010.
[12] O. Phelan, K. McCarthy, and B. Smyth. Using Twitter to Recommend Real-Time Topical News. In Proc. of the third ACM conference on Recommender systems, RecSys ’09, pages 385–388, New York, NY, USA, 2009. ACM.
[13] A. Schmidt, M. Beigl, and H. Gellersen. There is more to Context than Location. Computers & Graphics, 23(6):893–901, 1999.
[14] E. Zangerle, W. Gassler, and G. Specht. Using Tag Recommendations to Homogenize Folksonomies in Microblogging Environments. In Social Informatics, volume 6984 of LNCS, pages 113–126. Springer, 2011.

Copyright © 2012 held by author(s)/owner(s). Published as part of the #MSM2012 Workshop proceedings, available online as CEUR Vol-838, at: http://ceur-ws.org/Vol-838. #MSM2012 (2nd Workshop on Making Sense of Microposts), April 16, 2012, Lyon, France.
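
As an illustration of the similarity measure of Equation 1 (Section 2.2.1), the following minimal Python sketch scores a tweet against candidate tracks and picks the best one. It is not the authors' implementation: the tokenizer, the function names `tokenize`, `sim_music` and `best_match`, and the assumption that the candidate list is already ordered by Lucene's tf/idf score (so that ties keep that order) are all hypothetical choices made for this example.

```python
import re


def tokenize(text):
    """Hypothetical tokenizer: lowercase the text and return its set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def sim_music(tweet, track):
    """Equation 1: |tweet ∩ track| / |track|, computed on term sets."""
    tweet_terms = tokenize(tweet)
    track_terms = tokenize(track)
    if not track_terms:
        return 0.0  # assumption: an empty track string matches nothing
    return len(tweet_terms & track_terms) / len(track_terms)


def best_match(tweet, candidates):
    """Return (track, score) for the highest-scoring candidate.

    `candidates` is assumed to be ordered by the fulltext engine's own
    ranking; max() returns the first maximal element, so ties fall back
    to that order.
    """
    scored = ((track, sim_music(tweet, track)) for track in candidates)
    return max(scored, key=lambda pair: pair[1])
```

Because the denominator only counts the terms of the track, extra terms in the tweet (hashtags, URLs, "by", and so on) do not lower the score, which is the property the paper motivates. The stored score can later be compared against a threshold such as the 0.8 used in the evaluation.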
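
The association rules r = (t1, t2, c) of Section 3 and the count-based ranking of Section 3.1 can be sketched in a similar way. The code below is a simplified illustration under stated assumptions, not the method as implemented in the paper: the names `mine_rules` and `recommend` are hypothetical, rules are counted per ordered pair of distinct tracks in a user stream, candidate scores are obtained by summing c over all matching rules, tracks already in the input stream are excluded, and ties are broken alphabetically.

```python
from collections import Counter
from itertools import permutations


def mine_rules(user_streams):
    """Map each ordered pair (t1, t2) to c, the number of users who tweeted both tracks."""
    rules = Counter()
    for tracks in user_streams.values():
        # set() ensures that a user counts at most once per rule
        for t1, t2 in permutations(set(tracks), 2):
            rules[(t1, t2)] += 1
    return rules


def recommend(stream, rules, top_n=10):
    """Rank candidate tracks for one user stream by their summed rule counts."""
    known = set(stream)
    scores = Counter()
    for (t1, t2), c in rules.items():
        # a rule matches if its left side occurs in the user stream;
        # skipping already known tracks is an assumption of this sketch
        if t1 in known and t2 not in known:
            scores[t2] += c
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [track for track, _ in ranked[:top_n]]
```

A production system would mine the rules with a dedicated algorithm such as the one in [2] and would prune rare rules (the paper keeps rules with c > 10); the quadratic pair enumeration here is only meant to make the data structure concrete.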