Exploiting Twitter’s Collective Knowledge for Music
                           Recommendations∗

                                          Eva Zangerle, Wolfgang Gassler, Günther Specht
                                    Databases and Information Systems, Institute of Computer Science
                                                     University of Innsbruck, Austria
                                                    {firstname.lastname}@uibk.ac.at


ABSTRACT                                                                               as last.fm1, which own such big corpora. However, most
Twitter is the largest source of public opinion and also con-                          of them are not publicly available. Especially for academic
tains a vast amount of information about its users’ music                              purposes, only few (mostly small) data sets for the evalua-
tastes and listening behaviour. However, this source has not                           tion of the proposed approaches are available, e.g. the
been exploited for the recommendation of music yet. In this                            Million Song Dataset [4].
paper, we present how Twitter can be used for the creation                             Twitter is a publicly available service, which holds huge
of a data set upon which music recommendations can be                                  amounts of data and is still growing tremendously. Twit-
computed. The data set is based on microposts which                                    ter stated that there are about 140 million new messages a
were automatically generated by music player software or                               day. Such messages can also be exploited in the context of
posted by users and may also contain further information                               music recommendations. Many audio players offer the func-
about audio tracks.                                                                    tionality of automatically posting a tweet containing the ti-
                                                                                       tle and artist of the track the user currently is listening to.
                                                                                       These tweets traditionally contain keywords like nowplay-
Categories and Subject Descriptors                                                     ing or listeningto, e.g. in the tweet “#nowplaying Tom
H.2.8 [Database Management]: Database Applications—                                    Waits-Temptation”. For users who frequently make use of
Data Mining                                                                            such a service, the set of these tweets can be seen as a user
                                                                                       profile in terms of their musical preferences and provides
General Terms                                                                          well-suited data for e.g. a music recommendation corpus.
                                                                                          In this paper we present an approach for gathering such
Algorithms, Performance, Human Factors, Experimentation                                data and refining it such that the tweeted artists and tracks
                                                                                       can directly be related to the free music databases FreeDB
Keywords                                                                               and MusicBrainz. As a use case scenario, we present the
Recommender Systems, Music Recommendation, Twitter                                     recommendation of music based on the data set.
                                                                                          This paper is structured as follows. Section 2 describes
                                                                                       the processes underlying the creation of the proposed data
1. INTRODUCTION                                                                        set. Section 3 features the approach for the recommendation
  Throughout the last years, music recommendation ser-                                 of suitable music tracks as a use case for the gathered data.
vices have become very popular in both academia and in-                                Section 4 contains related work and Section 5 concludes the
dustry. The goal of such services is the recommendation of                             paper and discusses future work.
suitable music for a certain user. This is traditionally ac-
complished by (i) either taking the user profile consisting of
the tracks the user listened to in the past and (if available)                         2.     DATA SET CREATION
the user’s rating for songs into account or (ii) analysing the                           The goal of this approach is the creation of a corpus of
song itself and using the extracted features in order to find                          music tracks gathered from tweets of users. These tweets
similar songs. For the recommendation of music, huge cor-                              contain tracks the user previously listened to and tweeted
pora and user profiles are required as there are millions of                           about (the so-called user stream). In particular, we propose
different audio tracks. There are some large services, such                            to make use of tweets which have been posted by users or
∗This research was partially funded by the University of                               audio players and contain the title and artist of the music
Innsbruck (Nachwuchsförderung 2011).                                                   track currently played, e.g. “#NowPlaying Best Thing
                                                                                       I Never Had by Beyonce”. The following sections describe
                                                                                       the steps taken for the creation of the data set.
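As an illustration of how such user streams could be assembled, the following minimal Python sketch groups raw tweets by user and keeps only those containing one of the keywords mentioned above. The function name, the (user, text) input format and the simple substring test are assumptions made for this example and are not taken from the actual crawler.

```python
from collections import defaultdict

# Keywords named in the text; matching by plain substring is an assumption.
KEYWORDS = ("nowplaying", "listeningto")


def build_user_streams(tweets):
    """Group music-related tweets by user.

    `tweets` is an iterable of (user, text) pairs (hypothetical input format).
    Returns a dict mapping each user to the list of matching tweet texts.
    """
    streams = defaultdict(list)
    for user, text in tweets:
        lowered = text.lower()
        if any(keyword in lowered for keyword in KEYWORDS):
            streams[user].append(text)
    return dict(streams)


sample = [
    ("alice", "#NowPlaying Best Thing I Never Had by Beyonce"),
    ("alice", "Good morning!"),
    ("bob", "#nowplaying Tom Waits-Temptation"),
]
streams = build_user_streams(sample)
```

Tweets without any of the keywords are dropped, so each resulting stream only contains messages that can later be resolved to a track.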

Permission to make digital or hard copies of all or part of this work for              2.1     Crawling of Twitter Data Set and Analysis
personal or classroom use is granted without fee provided that copies are                The data set was crawled via the Twitter Streaming API
not made or distributed for profit or commercial advantage and that copies             between July 2011 and February 2012. The only publicly
bear this notice and the full citation on the first page. To copy otherwise, to        available access method is the Spritzer access which only
republish, to post on servers or to redistribute to lists, requires prior specific     provides real-time access to about 1% of all posted Twitter
permission and/or a fee.
Copyright © 2012 held by author(s)/owner(s).
Published as part of the #MSM2012 Workshop proceedings,
available online as CEUR Vol-838, at: http://ceur-ws.org/Vol-838
#MSM2012, April 16, 2012, Lyon, France.                                                1
WWW Lyon, France, 2012                                                                     http://www.last.fm


· #MSM2012 · 2nd Workshop on Making Sense of Microposts ·
messages. Due to these restrictions, we crawled 4,734,014           than 37 million audio tracks, roughly 3,000,000 discs and
tweets containing one of the keywords nowplaying, lis-              766,909 different artists. MusicBrainz was also considered
tento or listeningto posted by 864,736 different users.             as a reference database as we expected it to be of higher
This implies an average of 5.5 tweets for each user. Within         quality than FreeDB. MusicBrainz contains about 8 million
our data set, the distribution of tweets per user resembles         tracks of about 650,000 different artists.
a long-tail distribution, as can be seen in Table 1. Since            The goal of this task is to assign each tweet a FreeDB
recommendations can only be made if a user has posted               and a MusicBrainz entry which represents the title and the
about two or more tracks, such a distribution implies that          corresponding artist extracted from the tweet. We tackle this
a total of 457,675 users and their tweets cannot be used            resolution task by making use of a Lucene fulltext index as
for our approach, as only one tweet of each of these users          it allows a simple matching of strings, namely the tweet and
is contained in the data set.                                       a certain FreeDB or MusicBrainz entry. The fulltext index
                                                                    is filled with a combined string containing both the artist
                 Tweets in stream       Users                       and the title of all tracks within the reference databases.
                 1                    457,675                          In a next step, we query this fulltext index for each of the
                 >3                   196,422                       tweets within the data set in order to obtain the most suit-
                 >5                   126,783                       able FreeDB/MusicBrainz candidates for the title and artist
                 > 10                  63,017                       of the track. We then use the top-20 search results of Lucene
                 > 100                  3,190                       as candidates for the assignment of tracks to the informa-
                 > 1,000                  253                       tion mentioned in the corresponding tweet. Lucene’s ranking
                 > 10,000                   5                       function is based on the term frequency/inverse document
                                                                    frequency measure (tf/idf). This measure depends on the
          Table 1: Population of User Streams                       length of the query, which is not favourable in our ap-
   In total, 5,916,294 hashtags were used within the data set.      proach as tweets contain a high degree of noise (e.g. URLs,
Due to the search keywords used, the hashtags #nowplaying           feelings, smilies, etc.) which is not part of a track title
and #listeningto were the most prominent hashtags                   but is still part of the query (the tweet). Therefore, we im-
within the crawled data set. General hashtags such as               plemented a bag-of-words similarity measure between the
#music, #radio or #video have also been used frequently.            query and the documents contained within the Lucene in-
Music streaming services or online radios also make use of          dex, similar to the Jaccard similarity measure. Our proposed
hashtags when tweeting about the currently playing track            similarity measure is defined by the ratio between the size
(e.g. #cityfm or #fizy).                                            of the (term-) intersection of the query and the track and
   A total of 1,413,983 tweets (29.8% of the whole corpus)          the number of terms contained in the track, as can be seen
featured hyperlinks. An analysis of these URLs revealed             in Equation 1.
that URLs are mostly used to point to music services like
e.g. Youtube or Spotify, an online music streaming service.         $sim_{music}(tweet, track) = \frac{|tweet \cap track|}{|track|}$    (1)
A large part of the hyperlinks lead to the website of the
service which was used to post the track information on                The advantage of such a measure is the independence of
Twitter, like e.g. tweetmylast.fm or tinysong.com.                  the length of the query and the reduced influence of the
                                                                    noise in tweets. Furthermore, as our goal is to find the
2.2     Resolution of Twittered Tracks                              best matching audio track for all given tweets, it is crucial
   This task aims at parsing the gathered tweets and rec-           that most terms within the track are matched. However, in
ognizing the artist name and track title mentioned in the           the case of multiple search results having obtained an equal
tweet. Consider e.g. the tweet “#NowPlaying Best Thing              score, we still rely on the tf/idf values computed by Lucene.
I Never Had by Beyonce”. For this tweet, we have to ex-             Our proposed score is used for a ranking of the Lucene search
tract Beyonce as the artist and “Best Thing I Never Had” as         results. For each of the tweets, the track which obtained the
the title of the audio track and match it with a reference mu-      highest score is assigned to the tweet. In order to be able to
sic database. Most of the crawled messages are very noisy           set a certain threshold for the scores of the matching entries
and consist of many terms which are not concerned with the          later, we also store the computed simmusic-score.
music track itself. Consider e.g. the tweet “listening to
Hey Hey My My (Out Of The Blue) by Neil Young on                    2.2.2    Evaluation of Resolution
@Grooveshark: #nowplaying #musicmonday                                 For the evaluation of the resolution and the comparison of
http://t.co/7os3eeA”, which contains further information about      FreeDB and MusicBrainz, we created a ground truth data
the online radio service, a URL and other information which is      set which consists of 100 tweets randomly chosen from the
not related to the music track. Especially when dealing with        data set. Subsequently, we tried to assign matching tracks in
such noisy tweets, the matching is a crucial task as the qual-      the FreeDB and MusicBrainz databases manually. This task
ity of the data resulting from this step significantly influences   was done by the same person for both reference databases
the quality of the resulting recommendations.                       and also contains the resolution of abbreviations or men-
2.2.1 Resolution Approach                                           tions which link to the artist’s Twitter account. For exam-
                                                                    ple the tweet #nowplaying @Lloyd_YG ft. @LilTunechi
  As reference databases for artists and the corresponding tracks,  - You can be resolved to the two Twitter accounts Lloyd-
we made use of the publicly available databases FreeDB2 and         Young Goldie and Lil Wayne WEEZY F and therefore to
MusicBrainz3. FreeDB contains information about more                the MusicBrainz entry Lil Wayne feat. Lloyd - You. Having
2
    http://www.FreeDB.org                                           gathered all possible information from the tweet, the assign-
3
    http://www.MusicBrainz.org                                      ing person searched for matching tracks in the database.
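The similarity measure of Equation 1 and the re-ranking of the Lucene candidates described in Section 2.2.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the tokenizer (lower-casing and splitting on non-alphanumeric characters), the treatment of the bag of words as a set, and the function names are choices made for this example and are not specified in the text; the retrieval scores stand in for the tf/idf values returned by Lucene.

```python
import re

# Assumed tokenizer: lower-case alphanumeric runs (not specified in the paper).
TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def terms(text):
    """Return the set of terms contained in a string."""
    return set(TOKEN_PATTERN.findall(text.lower()))


def sim_music(tweet, track):
    """Equation 1: |tweet ∩ track| / |track|, computed on term sets."""
    track_terms = terms(track)
    if not track_terms:
        return 0.0
    return len(terms(tweet) & track_terms) / len(track_terms)


def best_match(tweet, candidates):
    """Select the best candidate for a tweet.

    `candidates` is a list of (track_string, retrieval_score) pairs, e.g. the
    top-20 search results. Candidates are ranked by sim_music; ties are broken
    by the retrieval score. Returns (track_string, sim_music score) or None.
    """
    if not candidates:
        return None
    track, _ = max(candidates, key=lambda c: (sim_music(tweet, c[0]), c[1]))
    return track, sim_music(tweet, track)


tweet = "#NowPlaying Best Thing I Never Had by Beyonce"
candidates = [("Beyonce Halo", 2.1), ("Beyonce Best Thing I Never Had", 1.7)]
match = best_match(tweet, candidates)
```

Because the denominator only counts the terms of the track, additional noise terms in the tweet (hashtags, URLs, mentions) do not lower the score; returning the score together with the track allows a threshold to be applied later.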


If the artist or the title of the track were not directly rec-     input user stream. Hence, all recommendation candidates
ognizable in the tweet, single words are used to search the        are ranked by the respective count values where a higher
database and find matching artists or titles. We only consid-      count value results in a higher rank for the candidate.
ered tweets which were resolved to both the according track
and artist. Tweets such as Chris Duarte, famous blues              3.2     Offline Evaluation
musician - free videos here: http://t.co/UZMXaGQ                      As a first evaluation we performed an offline evaluation
#blues #guitar #music #roots #free #nowplaying #mu-                and compared the computed track recommendations with
sicmonday which only contain information about the artist          recommendations provided by the last.fm API4 which lists
were not counted as a match. However, such information is          tracks similar to a given track including a score stating the
also very valuable as it describes the musical taste of a user.    relevance of the song (matching score).
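The rule construction described in Section 3 and the candidate ranking of Section 3.1 can be sketched in a few lines of Python. The sketch assumes that every user stream has already been resolved to a list of track identifiers; the function names, the generation of rules in both directions and the summation of the count values c over all matching rules are illustrative assumptions rather than the exact procedure used for the evaluation.

```python
from collections import Counter
from itertools import permutations


def build_rules(user_streams):
    """Build association rules (t1, t2) -> c from resolved user streams.

    `user_streams` maps a user to the list of tracks they tweeted about.
    c counts the users whose stream contains both t1 and t2.
    """
    rules = Counter()
    for tracks in user_streams.values():
        # Each user contributes at most once to a rule, hence set(tracks).
        for t1, t2 in permutations(set(tracks), 2):
            rules[(t1, t2)] += 1
    return rules


def recommend(user_tracks, rules, top_n=10):
    """Rank recommendation candidates for one user stream.

    A candidate is any track t2 of a rule (t1, t2, c) whose t1 occurs in the
    user stream and which the user has not tweeted about yet. Candidates are
    ranked by the summed count values (assumed aggregation).
    """
    seen = set(user_tracks)
    scores = Counter()
    for (t1, t2), c in rules.items():
        if t1 in seen and t2 not in seen:
            scores[t2] += c
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [track for track, _ in ranked[:top_n]]


streams = {
    "u1": ["A", "B"],
    "u2": ["A", "B", "C"],
    "u3": ["A", "C"],
    "u4": ["B"],
    "u5": ["A", "B"],
}
rules = build_rules(streams)
```

Users with a single tweeted track (such as u4 above) produce no rules, which mirrors the observation that streams containing only one tweet cannot contribute to the recommendations.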
For our ground truth data set, we were able to manually               We made use of the MusicBrainz data set as it contains
assign 57 tracks of FreeDB and 59 tracks of MusicBrainz.           cleaner data than FreeDB. Firstly, we removed all tweets
This suggests that both databases cover the tweeted tracks         of users who contributed only one tweet and which were
to a similar extent; however, the FreeDB data set is very          matched with a MusicBrainz track with simmusic < 0.8 to
noisy (typos, spelling errors and variations).                     dismiss uncertain mappings. Hence, our final data set con-
   Subsequently, we ran our automated Lucene-based reso-           sisted of 2.5 million tweets of 525,751 users. Based on this
lution process on the ground truth data set using both ref-        data set we computed the corresponding association rules and
erence databases (see details in Table 2). Considering a           obtained 500 million distinct rules. For reasons of computa-
simmusic-score threshold of 0.8, we were able to resolve           bility and due to API limitations, we chose a subset consisting
73% of the ground truth correctly with MusicBrainz and had         of the most popular tracks and the corresponding rules which are
an error rate (false positives) of about 10% of all matched        present more than 10 times (c > 10). The final data set con-
tracks. The high number of false positives using the               sisted of 15,000 unique tracks and 90 million distinct rules.
FreeDB data set can be attributed to its noisy entries.               We called the last.fm API for all tracks and the API was
                                                                   able to recognize 13,138 out of 15,000 songs. The API re-
  RefDB           Manually      Automated        False Pos.        turned 3.2 million similar tracks which we matched with our
  MusicBrainz     59            43 (73%)         5 (10%)           internal MusicBrainz database. In total, 83% of all tracks
  FreeDB          57            31 (54%)         18 (36%)          with a score > 0.8 were matched. We transformed the gath-
                                                                   ered last.fm data to association rules and computed the over-
  Table 2: Resolution Ground Truth (100 tweets)                    lap of rules with our rule set. 19% of the last.fm rules are
                                                                   covered by the Twitter-based rules. If we consider only sim-
Based on these results, we used MusicBrainz for all                ilar tracks of last.fm with a matching score (gathered via the
further computations (e.g. music recommendations).                 last.fm API) higher than 0.6, the Twitter-based rules cover
                                                                   79% of all rules in the set. When comparing the top-10
3. MUSIC RECOMMENDATIONS                                           recommendations on both sides, the coverage is only about
   As a use case, we implemented a music recommendation            1% of all rules. These low numbers can be attributed to
service on top of the data set. The necessary steps for a          the restrictions of the Twitter API and the resulting sparse
recommendation of music are described in the following.            data set. In particular, the incomplete user profiles decrease the
   The proposed approach for the recommendation of mu-             coverage. For example, within the “taste” subset of the Million Song
sic titles relies on the co-occurrence of titles within a user     Dataset roughly 70% of the tracks were played more than
stream. Based on the obtained tweets and the assigned              10 times. In contrast, in our data set only 5% of the tweets
tracks, we propose to use association rules [2] in order to        were contained more than 10 times. This fact strengthens
model the co-occurrence of items efficiently. In the case          the evidence that the crawled data set is not representative
of the co-occurrence of tweeted music titles, an as-               enough, which can be attributed to the API limitation and
sociation rule t1 → t2 describes that a particular user who        uncertainties in the matching processes. Furthermore, due
tweeted about song t1 also tweeted about song t2 . These           to the diversity of music tracks, such an offline evaluation
rules are the basis for the further recommendation process         may not reveal the full potential of the approach. Online
and are stored as triples r = (t1 , t2 , c), where t1 and t2 are   evaluations may achieve better results for our proposed ap-
tracks which have been tweeted by the same user. c is a vari-      proach and are subject to future work.
able holding the popularity of the rule. Hence, such a rule
denotes that track t1 and track t2 have both been listened         4.     RELATED WORK
to by c users.                                                        Research related to the presented approach can be cate-
3.1    Ranking of Recommendation Candidates                        gorized into (i) approaches dealing with recommendations
                                                                   either for Twitter or based on tweets and (ii) approaches
  In this step, the computed association rules are analysed        mainly dealing with the recommendation of music.
and so-called recommendation candidates are extracted.                The utilization of a corpus of tweets for the recommenda-
Based on the rules, the recommended tracks for a certain user      tion of resources has been a popular research topic. For ex-
are computed by selecting a subset C ⊆ T of track recom-           ample, the recommendation of suitable hashtags is discussed
mendation candidates by determining all rules which feature        in [14]. Many approaches aim at the recommendation of
tracks occurring in the user stream. The final step for the        users who might be interesting to follow, e.g. [7].
recommendation of tracks is the ranking of the recommen-           Such approaches are typically based on the social ties of a
dation candidates within the set C. Therefore, we make use         user (his followees and followers). There are also many ap-
of the count value c describing the popularity of a certain
                                                                   4
track within all association rules matching the tracks of the          http://www.last.fm/api


proaches which exploit these ties to recommend resources,         applying CF techniques for the exploitation of the social ties
such as websites [6] or news [12].                                of the user are subject to future work. In order to evaluate
   As for the second category of related work, the recom-         the approach from a user’s point-of-view, online user tests
mendation of music, many different approaches have been           are also part of the future work.
presented. Celma [5] provides an overview of this topic.
Within Recommender Systems, in principle two major ap-            6.   REFERENCES
proaches are distinguished [1]: content-based recommenda-          [1] G. Adomavicius and A. Tuzhilin. Toward the Next
tions and collaborative filtering (CF) approaches. Content-            Generation of Recommender Systems. IEEE
based recommendation systems aim at recommending re-                   Transactions on Knowledge and Data Engineering,
sources which are similar to the resources the user already            17(6):734–749, 2005.
consumed or showed interest in. Collaborative filtering ap-        [2] R. Agrawal and R. Srikant. Fast Algorithms for
proaches aim at finding users with a profile similar to the            Mining Association Rules. In Proc. of the 20th Intl.
current user in order to recommend items which these simi-             Conf on Very Large Data Bases, pages 487–499, 1994.
lar users also were in favor of. This categorization also holds    [3] L. Baltrunas and X. Amatriain. Towards
within music recommendations. Content-based methods for                Time-Dependant Recommendation based on Implicit
music titles typically rely on the extraction and analysis of          Feedback. Workshop on ContextAware Recommender
audio features. The presented approach relies on the second            Systems CARS 2009 in ACM Recsys, 2009:1–5.
type as the computation of association rules based on user         [4] T. Bertin-Mahieux, D. P. Ellis, B. Whitman, and
profiles can be assigned to the class of CF approaches.                P. Lamere. The Million Song Dataset. In Proc. of the
   However, for music recommendations also a third impor-              12th Intl. Conf. on Music Information Retrieval, 2011.
tant aspect is exploited for the computation of recommenda-
                                                                   [5] Ò. Celma. Music Recommendation and Discovery -
tions: context. The notion of context has e.g. been defined
                                                                       The Long Tail, Long Fail, and Long Play in the
by Schmidt et al. as being threefold: physical environment,
                                                                       Digital Music Space. Springer, 2010.
human factors and time [13]. These three factors have all
                                                                   [6] J. Chen, R. Nairn, L. Nelson, M. Bernstein, and
been addressed by music recommendation research. As for
                                                                       E. Chi. Short and Tweet: Experiments on
the physical environment of a user, e.g. Kaminskas and Ricci
                                                                       Recommending Content from Information Streams. In
presented a location-aware approach for music recommenda-
                                                                       Proc. of the 28th Intl. conference on Human Factors
tions [8]. The mood of users has been incorporated for the
                                                                       in Computing Systems, pages 1185–1194. ACM, 2010.
computation of recommendations in [9] and Baltrunas et al.
[3] considered temporal facts when recommending music.             [7] J. Hannon, M. Bennett, and B. Smyth.
   Many approaches exploited user profiles in social networks          Recommending Twitter Users to Follow using Content
to recommend resources. Mesnage et al. [10] showed that                and Collaborative Filtering Approaches. In Proc. of
people prefer the music that their friends in the social net-          the 4th ACM Conf. on Recommender Systems, pages
work prefer. The Serendip.me project5 provides its users               199–206. ACM, 2010.
with music which is selected solely based on the Twitter           [8] M. Kaminskas and F. Ricci. Location-Adapted Music
ties (the followees) of the user. The dbrec project [11] is            Recommendation Using Tags. In User Modeling,
concerned with recommending music based on the DBPedia                 Adaption and Personalization 2011, Girona, Spain,
data set. In particular, the authors developed a distance              July 11-15, 2011, volume 6787 of LNCS, pages
metric for resources within DBPedia which enables the au-              183–194. Springer, 2011.
thors to recommend similar artists.                                [9] J. Lee and J. Lee. Context Awareness by Case-based
   However, to the best of our knowledge there are no ap-              Reasoning in a Music Recommendation System. In
proaches concerned with the recommendation of music based              Proc. of the 4th Intl. Conference on Ubiquitous
on an analysis of “nowplaying” user streams on Twitter.                Computing Systems, pages 45–58. Springer, 2007.
                                                                  [10] C. Mesnage, A. Rafiq, S. Dixon, and R. Brixtel. Music
                                                                       Discovery with Social Networks. In Proc. of the
5. CONCLUSION AND FUTURE WORK                                          Workshop on Music Recommendation and Discovery
   In this paper we showed that tweets can be exploited to             2011 in conjunction with ACM RecSys, volume 793,
build a corpus for music recommendations. The compari-                 pages 1–6. CEUR-WS, 2011.
son with the recommendation service of last.fm showed that        [11] A. Passant. dbrec - Music Recommendations Using
despite the sparse corpus due to Twitter’s API limitations,            DBpedia. The Semantic Web–ISWC 2010, pages
the coverage of last.fm’s recommendations is up to 79%. The            209–224, 2010.
results are very promising although the approach has to be        [12] O. Phelan, K. McCarthy, and B. Smyth. Using
enhanced to be usable in real-world recommendation envi-               Twitter to Recommend Real-Time Topical News. In
ronments. A major improvement would be the expansion                   Proc. of the third ACM conference on Recommender
of the data set as currently the corpus is very sparse and             systems, RecSys ’09, pages 385–388, New York, NY,
the user profiles are incomplete. Also, the matching task              USA, 2009. ACM.
of noisy tweets deteriorates the quality of recommendations.
                                                                  [13] A. Schmidt, M. Beigl, and H. Gellersen. There is more
This is due to the fact that many uncertain matching results
                                                                       to Context than Location. Computers & Graphics,
have to be dismissed and hence, the size of the usable data
                                                                       23(6):893–901, 1999.
corpus decreases. Future work also comprises the enhance-
                                                                  [14] E. Zangerle, W. Gassler, and G. Specht. Using Tag
ment of the matching process by using metadata such as
                                                                       Recommendations to Homogenize Folksonomies in
location, URLs or further sentiment analysis. Additionally,
                                                                       Microblogging Environments. In Social Informatics,
5
    http://serendip.me                                                 volume 6984 of LNCS, pages 113–126. Springer, 2011.

