<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Combining Spotify and Twitter Data for Generating a Recent and Public Dataset for Music Recommendation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Pichl</string-name>
          <email>martin.pichl@uibk.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eva Zangerle</string-name>
          <email>eva.zangerle@uibk.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Günther Specht</string-name>
          <email>guenther.specht@uibk.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Databases and Information Systems, Institute of Computer Science, University of Innsbruck</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <abstract>
        <p>In this paper, we present a dataset based on publicly available information. It contains listening histories of Spotify users who posted what they were listening to on the microblogging platform Twitter. The dataset was derived using the Twitter Streaming API and is updated regularly. To show an application of this dataset, we implement and evaluate a pure collaborative filtering-based recommender system. The performance of this system can be seen as a baseline for evaluating further, more sophisticated recommendation approaches. These approaches will be implemented and benchmarked against our baseline approach in future work.</p>
      </abstract>
      <kwd-group>
        <kwd>Music Recommender Systems</kwd>
        <kwd>Collaborative Filtering</kwd>
        <kwd>Social Media</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        More and more music is available for consumption due
to new distribution channels enabled by the rise of the web.
These new distribution channels, for instance music
streaming platforms, generate and store valuable data about users
and their listening behavior. However, most of the time the
data gathered by these companies is not publicly available.
There are datasets based on such private data
corpora, which are widely used for implementing and evaluating
recommender systems, e.g., the Million Song Dataset (MSD)
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]; however, datasets like the MSD are often no longer
recent. Thus, in order to address the lack
of recent and publicly available data for the development and
evaluation of recommender systems, we exploit the fact that
many users of music streaming platforms post what they are
listening to on the microblogging platform Twitter. An example of
such a tweet is "#NowPlaying Human (The Killers)
#craigcardiff #spotify http://t.co/N08f2MsdSt". Using a dataset
derived from such tweets, we implement and evaluate a
collaborative filtering (CF) based music recommender system
and show that this is a promising approach. Music
recommender systems are of interest, as the volume and variety
of available music has increased dramatically, as mentioned at
the beginning. Besides commercial vendors like Spotify<sup>1</sup>,
there are also open platforms like SoundCloud<sup>2</sup> or Promo
DJ<sup>3</sup>, which foster this development. On those platforms,
users can upload and publish their own creations. As more
and more music is available to be consumed, it gets difficult
for the user, or rather the customer, to navigate through it. By
giving music recommendations, recommender systems help
the user to identify music he or she wants to listen to
without browsing through the whole collection. By supporting
the user in finding items he or she likes, the platform
operators benefit from increased usability and thus increased
customer satisfaction.
      </p>
      <p>As the recommender system implemented in this work
delivers suitable results, we will gradually enlarge the dataset
with further sources and assess how the enlargement
influences the performance of the recommender system in
future work. Additionally, as the dataset also contains
timestamps and a part of the captured tweets contains a
geolocation, more sophisticated recommendation approaches
utilizing this additional context-based information can be
compared against the baseline pure CF-based approach in
future work.</p>
      <p>The remainder of this paper is structured as follows: in
Section 2 we present the dataset creation process as well as
the dataset itself in more detail. Afterwards, in Section 3 we
briefly present the recommendation approach, which is
evaluated in Section 4. Before we present the conclusions drawn
from the evaluation in Section 6, related work is discussed
in Section 5.</p>
      <p><sup>1</sup> http://www.spotify.com
<sup>2</sup> http://soundcloud.com
<sup>3</sup> http://promodj.com</p>
    </sec>
    <sec id="sec-2">
      <title>2. THE SPOTIFY DATASET</title>
      <p>In this section, the dataset<sup>4</sup> used for developing and
evaluating the recommender system is presented.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Dataset Creation</title>
      <p>
        For crawling a sufficiently large dataset, we relied on
the Twitter Streaming API, which allows for crawling tweets
containing specified keywords. Since July 2011, we have crawled
tweets containing the keywords nowplaying, listento and
listeningto. Until October 2014, we were able to crawl more
than 90 million tweets. In contrast to other contributions
aiming at extracting music information from Twitter, where
the tweet's content is used to extract artist and track
information [
        <xref ref-type="bibr" rid="ref16 ref17 ref7">17, 7, 16</xref>
        ], we propose to exploit the subset
of crawled tweets containing a URL leading to the website
of the Spotify music streaming service<sup>5</sup>. That is, information
about the artist and the track is extracted from the
website mentioned in the tweet, rather than from the content
of the tweet. This enables an unambiguous resolution
of the tweets, in contrast to the contributions
mentioned above, where the text of the tweets is compared to
entries in a reference database using some similarity
measure. A typical tweet published via Spotify looks as
follows: "#nowPlaying I Tried by Total on #Spotify
http://t.co/ZaFHZAokbV", where a user published that he
or she listened to the track "I Tried" by the band "Total" on
Spotify. Additionally, a shortened URL is provided. Besides
this shortened URL, Twitter also provides the corresponding
resolved URL via its API. This allows for directly identifying
all Spotify-URLs by searching for all URLs containing the
string "spotify.com" or "spoti.fi". By following the identified
URLs, the artist and the track can be extracted from the
title tag of the corresponding website. For instance, the title of
the website behind the URL stated above is "&lt;title&gt;I tried
by Total on Spotify &lt;/title&gt;". Using the regular expression
"&lt;title&gt;(.*) by (.*) on.*&lt;/title&gt;", the name of the track
(group 1) and the artist (group 2) can be extracted.
      </p>
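      <p>The extraction step described above can be illustrated with a minimal Python sketch. It uses the regular expression and the two URL substrings given in the text; the function names extract_track_and_artist and is_spotify_url are hypothetical and chosen for illustration only, and the sketch is not the crawler actually used for the dataset.</p>
      <preformat><![CDATA[
```python
import re

# Regular expression from the text: group 1 is the track, group 2 the artist.
TITLE_PATTERN = re.compile(r"<title>(.*) by (.*) on.*</title>")


def is_spotify_url(resolved_url):
    """Check a resolved URL for the two substrings named in the text."""
    return "spotify.com" in resolved_url or "spoti.fi" in resolved_url


def extract_track_and_artist(html):
    """Return (track, artist) from a page title, or None if it does not match.

    Hypothetical helper: the name and the None convention are assumptions.
    """
    match = TITLE_PATTERN.search(html)
    if match is None:
        return None
    return match.group(1).strip(), match.group(2).strip()


# Title string quoted in the text above.
example = "<title>I tried by Total on Spotify </title>"
print(extract_track_and_artist(example))
```
]]></preformat>
      <p>Because both groups are greedy, a track name that itself contains " by " would be split at its last occurrence; such cases would need separate handling.</p>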
      <p>By applying the presented approach to the crawled tweets,
we were able to extract artist and track information from
7.08% of all tweets, or 49.45% of all tweets containing
at least one URL. We refer to the subset of tweets for which
we are able to extract the artist and the track as "matched
tweets". An overview of the captured tweets is given in Table
1. 1.94% of the tweets containing a Spotify-URL could not
be matched due to HTTP 404 Not Found and HTTP 500
Internal Server Error responses.</p>
      <sec id="sec-3-1">
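        <table-wrap>
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Restriction</th><th>Number of Tweets</th><th>Percentage</th></tr>
            </thead>
            <tbody>
              <tr><td>None</td><td>90,642,123</td><td>100.00%</td></tr>
              <tr><td>At least one URL</td><td>12,971,482</td><td>14.31%</td></tr>
              <tr><td>A Spotify-URL</td><td>6,541,595</td><td>7.22%</td></tr>
              <tr><td>Matched</td><td>6,414,702</td><td>7.08%</td></tr>
            </tbody>
          </table>
        </table-wrap>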
        <p>Using the dataset creation approach presented above,
we were able to gather 6,414,702 tweets and extract
artist and track data from the contained Spotify-URLs.</p>
        <p><sup>4</sup> available at: http://dbis-twitterdata.uibk.ac.at/spotifyDataset/
<sup>5</sup> http://www.spotify.com</p>
        <p>2.2</p>
        <p>Based on the raw data presented in the previous
section, we generate a final dataset of &lt;user, artist,
track&gt; triples, which contains 5,504,496 tweets of 569,722 unique
users who listened to 322,647 tracks by 69,271 artists. In
this final dataset, users considered as not valuable for
recommendations, e.g., the @SpotifyNowPlay Twitter account,
which retweets tweets sent via @Spotify, are removed. These
users were identified manually by the authors.</p>
        <p>
          As typical for social media datasets, our dataset has a
long-tailed distribution among the users and their respective
number of posted tweets [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. This means that only
a few users tweet rather often in this dataset,
while numerous users tweet rarely and can be found
in the long tail. This long-tailed distribution can be seen in
Table 2 and Figure 1, where the logarithm of the number of
tweets is plotted against the corresponding number of users.
        </p>
      </sec>
      <sec id="sec-3-2">
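        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>Number of Tweets</th><th>Number of Users</th></tr>
            </thead>
            <tbody>
              <tr><td>&gt;0</td><td>569,722</td></tr>
              <tr><td>&gt;1</td><td>354,969</td></tr>
              <tr><td>&gt;10</td><td>91,217</td></tr>
              <tr><td>&gt;100</td><td>7,419</td></tr>
              <tr><td>&gt;1,000</td><td>198</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-3">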
        <p>The performance of a pure collaborative filtering-based
recommender system increases with the level of detail of a
user profile. Especially for new users in a system, about whom
no or only little data is available, this poses a
problem, as no suitable recommendations can be computed.
In our case, problematic users are users who tweeted rarely
and thus can be found in the long tail.</p>
        <p>Besides the long tail among the number of posted tweets,
there is another long tail in the distribution of the artist
play-counts in the dataset: there are a few popular artists
occurring in a large number of tweets, while many artists are
mentioned only in a limited number of tweets. This is shown
in Figure 2, where the logarithm of the number of tweets in
which an artist occurs (the play-count) is plotted against
the number of artists. Thus, this plot states how many
artists are mentioned how often in the dataset.</p>
        <p>[Figure 2: logarithm of the number of tweets (play-count) plotted against the number of artists]</p>
        <p>How the presented dataset is used as input and evaluation
data for a music recommender system is presented in the
next section.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. THE BASELINE RECOMMENDATION APPROACH</title>
    </sec>
    <sec id="sec-5">
      <p>
        In order to present how the dataset can be applied, we
use our dataset as input and evaluation data for an artist
recommendation system. This recommender system is based
on the open source machine learning library Mahout[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The
performance of this recommender system is shown in Section
4 and serves as a benchmark for future work.
      </p>
    </sec>
    <sec id="sec-6">
      <title>3.1 Recommendation Approach</title>
      <p>To show the usefulness of our dataset, we implemented
a user-based CF approach. User-based CF recommends
items by solely utilizing past user-item interactions. For the
music recommender system, a user-item interaction states
that a user listened to a certain track by a certain artist.
Thus, the past user-item interactions represent the listening
history of a user. In the following, we describe the basic
approach taken for computing artist recommendations and
provide details about the implementation.</p>
      <p>
        In order to estimate the similarity of two users, we
computed a linear combination of the Jaccard coefficients [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
based on the listening histories of the users. The
Jaccard coefficient is defined in Equation 1 and measures the
proportion of common items in two sets.
      </p>
      <p><disp-formula><label>(1)</label><tex-math>\mathrm{jaccard}_{i,j} = \frac{|A_i \cap A_j|}{|A_i \cup A_j|}</tex-math></disp-formula></p>
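      <p>For each user, there are two listening histories we take
into consideration: the set of all tracks a user listened to
and the set of all artists a user listened to. Thus, we are
able to compute an artist similarity (artistSim) and a track
similarity (trackSim) as shown in Equations 2 and 3.</p>
      <p><disp-formula><label>(2)</label><tex-math>\mathrm{artistSim}_{i,j} = \frac{|\mathrm{artists}_i \cap \mathrm{artists}_j|}{|\mathrm{artists}_i \cup \mathrm{artists}_j|}</tex-math></disp-formula></p>
      <p><disp-formula><label>(3)</label><tex-math>\mathrm{trackSim}_{i,j} = \frac{|\mathrm{tracks}_i \cap \mathrm{tracks}_j|}{|\mathrm{tracks}_i \cup \mathrm{tracks}_j|}</tex-math></disp-formula></p>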
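      <p>The final user similarity is computed using a weighted
average of both the artistSim and the trackSim, as depicted in
Equation 4.</p>
      <p><disp-formula><label>(4)</label><tex-math>\mathrm{sim}_{i,j} = w_a \cdot \mathrm{artistSim}_{i,j} + w_t \cdot \mathrm{trackSim}_{i,j}</tex-math></disp-formula></p>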
      <p>The weights w<sub>a</sub> and w<sub>t</sub> determine the influence of the
artist and the track listening history on the user
similarity, where w<sub>a</sub> + w<sub>t</sub> = 1. Thus, if w<sub>t</sub> = 0, only the artist
listening history is taken into consideration. We call such a
recommender system an artist-based recommender system.
Analogously, if w<sub>a</sub> = 0, we call such a recommender system
track-based. If w<sub>a</sub> &gt; 0 and w<sub>t</sub> &gt; 0, both the artist and track
listening histories are used. Hence, we obtain a hybrid
recommender system for artist recommendations.</p>
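      <p>The user similarity of Equations 1 to 4 can be sketched in a few lines of Python. This is a minimal illustration on plain Python sets, not the Mahout-based implementation used in this work; the function names and the convention of returning 0 for two empty histories are assumptions.</p>
      <preformat><![CDATA[
```python
def jaccard(a, b):
    """Jaccard coefficient of two sets (Equation 1).

    Assumption: two empty sets are given a similarity of 0.0.
    """
    union = a.union(b)
    if not union:
        return 0.0
    return len(a.intersection(b)) / len(union)


def user_similarity(artists_i, artists_j, tracks_i, tracks_j, w_a, w_t):
    """Weighted combination of artist and track similarity (Equation 4)."""
    if abs(w_a + w_t - 1.0) > 1e-9:
        raise ValueError("w_a and w_t must sum to 1")
    artist_sim = jaccard(artists_i, artists_j)  # Equation 2
    track_sim = jaccard(tracks_i, tracks_j)     # Equation 3
    return w_a * artist_sim + w_t * track_sim


# Toy listening histories (illustrative data only).
artists_1 = {"Total", "The Killers"}
artists_2 = {"The Killers", "Lana Del Rey"}
tracks_1 = {("The Killers", "Human"), ("Total", "I Tried")}
tracks_2 = {("The Killers", "Human")}

# Hybrid setting with equal weights.
print(user_similarity(artists_1, artists_2, tracks_1, tracks_2, 0.5, 0.5))
```
]]></preformat>
      <p>Setting w_a = 0 and w_t = 1 in this sketch yields the track-based variant, and w_a = 1 and w_t = 0 the artist-based one.</p>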
      <p>The presented weights have to be predetermined. In this
work, we use a grid search for finding suitable input
parameters for our recommender system, as described in Section 4.2.</p>
    </sec>
    <sec id="sec-7">
      <title>4. EVALUATION</title>
      <p>In this section, we present the performance of the
implemented artist recommender system, but also discuss the
limitations of the conducted offline evaluation.</p>
    </sec>
    <sec id="sec-8">
      <title>4.1 Evaluation Setup</title>
      <p>
        The performance of the recommender system with different
input parameters was evaluated using precision and
recall. Although we focus on precision, for the sake of
completeness we also include recall in the evaluation, as
this is usual in the field of information retrieval [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
metrics were computed using a Leave-n-Out algorithm, which
can be described as follows:
      </p>
      <list list-type="order">
        <list-item><p>Randomly remove n items from the listening history of a user</p></list-item>
        <list-item><p>Recommend m items to the user</p></list-item>
        <list-item><p>Calculate precision and recall by comparing the m recommended and the n removed items</p></list-item>
        <list-item><p>Repeat steps 1 to 3 p times</p></list-item>
        <list-item><p>Calculate the mean precision and the mean recall</p></list-item>
      </list>
      <p>Each evaluation in the following sections has been
repeated five times (p = 5) and the size of the test set was
fixed to 10 items (n = 10). Thus, we can evaluate the
performance of the recommender for recommending up to 10
items.</p>
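      <p>The five steps of the Leave-n-Out procedure can be sketched as follows. This is a simplified illustration under stated assumptions: the recommender is passed in as a hypothetical callable named recommend, users with at most n items are skipped, items must be sortable (e.g., strings), and precision is computed as hits divided by m. It is not the evaluation code used for the reported results.</p>
      <preformat><![CDATA[
```python
import random


def leave_n_out(histories, recommend, n=10, m=5, p=5, seed=0):
    """Return (mean precision, mean recall) under Leave-n-Out.

    histories: dict mapping a user to the set of items in the listening history
    recommend: hypothetical callable(user, training_histories, m) -> list of items
    """
    rng = random.Random(seed)
    precisions, recalls = [], []
    for _ in range(p):                                    # step 4: repeat p times
        for user, items in histories.items():
            if len(items) <= n:                           # assumption: skip short histories
                continue
            removed = set(rng.sample(sorted(items), n))   # step 1: remove n items
            training = dict(histories)
            training[user] = items - removed
            recommended = recommend(user, training, m)    # step 2: recommend m items
            hits = len(removed.intersection(recommended))
            precisions.append(hits / m)                   # step 3: precision
            recalls.append(hits / n)                      # step 3: recall
    if not precisions:
        return 0.0, 0.0
    # step 5: mean precision and mean recall
    return sum(precisions) / len(precisions), sum(recalls) / len(recalls)
```
]]></preformat>
      <p>With n = 10 and p = 5, the sketch corresponds to the setup described above; varying m from 1 to 10 gives precision and recall for up to 10 recommended items.</p>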
    </sec>
    <sec id="sec-9">
      <title>4.2 Determining the Input Parameters</title>
      <p>
        In order to determine good input parameters for the
recommender system, a grid search was conducted. For this purpose,
we define a grid of parameters, and the possible
combinations are evaluated using a performance measure [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In our
case, we relied on the precision of the recommender system
(cf. Figure 3), as the task of a music recommender system
is to find a certain number of items a user will listen to (or
buy), but not necessarily to find all good items. Precision
is a reasonable metric for this so-called Find Good Items
task [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and was assessed using the Leave-n-Out
algorithm explained above. For this grid search, we recommended one item
and the size of the test set was fixed to 10 items. In order
to find good input parameters, the following grid
parameters determining the computation of the user similarity were
varied:
      </p>
      <list list-type="bullet">
        <list-item><p>Number of nearest neighbors k</p></list-item>
        <list-item><p>Weight of the artist similarity w<sub>a</sub></p></list-item>
        <list-item><p>Weight of the track similarity w<sub>t</sub></p></list-item>
      </list>
      <p>The result can be seen in Figure 3. For our dataset,
the best results are achieved with a track-based
recommender system (w<sub>a</sub> = 0, w<sub>t</sub> = 1) and 80 nearest
neighbors (k = 80). Thus, for the performance evaluation of the
recommender system in the next section, we use the
following parameters:</p>
      <list list-type="bullet">
        <list-item><p>Number of nearest neighbors: 80</p></list-item>
        <list-item><p>Weight of the artist similarity: 0</p></list-item>
        <list-item><p>Weight of the track similarity: 1</p></list-item>
      </list>
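      <p>A grid search over these parameters can be sketched as follows. The callable evaluate is a hypothetical hook that would return the precision for one parameter combination (for instance via the Leave-n-Out procedure); the grid values in the usage example are illustrative assumptions, since the exact grid is not stated in the text.</p>
      <preformat><![CDATA[
```python
from itertools import product


def grid_search(evaluate, ks, artist_weights):
    """Return (best_precision, best_params) over a grid of k and w_a.

    evaluate: hypothetical callable(k, w_a, w_t) -> precision
    The track weight is derived as w_t = 1 - w_a, so the weights sum to 1.
    """
    best = None
    for k, w_a in product(ks, artist_weights):
        w_t = 1.0 - w_a
        precision = evaluate(k, w_a, w_t)
        if best is None or precision > best[0]:
            best = (precision, {"k": k, "w_a": w_a, "w_t": w_t})
    return best


def toy_evaluate(k, w_a, w_t):
    """Stand-in for a real evaluation run; peaks at k = 80 and w_t = 1."""
    return w_t - abs(k - 80) / 1000.0


# Illustrative grid; the values are assumptions, not those of the paper.
print(grid_search(toy_evaluate, [20, 40, 80, 160], [0.0, 0.5, 1.0]))
```
]]></preformat>
      <p>Deriving w_t from w_a keeps the constraint w_a + w_t = 1 and reduces the grid to two dimensions.</p>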
    </sec>
    <sec id="sec-10">
      <title>4.3 Performance of the Baseline Recommender System</title>
    </sec>
    <sec id="sec-11">
      <p>
        In this section, the performance of the recommender
system using the optimized input parameters is presented. Prior
to the evaluation, we also examined real implementations
of music recommender systems: Last.fm, a music discovery
service, for instance recommends 6 artists<sup>6</sup> when
displaying a certain artist. If an artist is displayed on Spotify<sup>7</sup>,
7 similar artists are recommended on the first page. This
number of items also corresponds to the work of Miller [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ],
who argues that people are able to process about 7 items at
a glance, or rather that the span of attention is too short
for processing long lists of items. The precision@6 and the
precision@7 of our recommender are 0.20 and 0.19,
respectively. In such a setting, about 20% of the recommended items
computed by the proposed recommender system would be a
hit. In other words, with six or seven recommendations, a customer
should on average be interested in at least one of the
recommended artists. An overview of
the precision@n of the recommender is given in Table 3.
      </p>
      <p><sup>6</sup> http://www.last.fm/music/Lana+Del+Rey
<sup>7</sup> http://play.spotify.com/artist/00FQb4jTyendYWaN8pK0wa</p>
      <p>As shown in Figure 4, with an increasing number of
recommendations, the performance of the presented recommender
system declines. Thus, for a high number of
recommendations the recommender system is rather limited. This is
because the chance of false positives increases if the size of the
test set is kept constant. For computing the recall metric,
the 10 items in the test set are considered as relevant items
(and hence are desirable to recommend to the user). The
recall metric describes the fraction of relevant artists that
are recommended. For example, when recommending 5 items, even
if all recommended items are relevant, the maximum recall is
still only 50%, as 10 items are considered relevant. Thus,
in the evaluation setup, recall is bounded by an upper limit,
which is the number of recommended items divided by the
size of the test set.</p>
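      <p>The upper bound on recall can be made concrete with a short sketch. The function name max_recall is hypothetical; the bound itself follows directly from the evaluation setup, in which the number of relevant items equals the size of the test set.</p>
      <preformat><![CDATA[
```python
def max_recall(num_recommended, test_set_size):
    """Upper bound on recall when all test-set items count as relevant."""
    return min(num_recommended, test_set_size) / test_set_size


# Bound for 1, 5 and 10 recommendations with a test set of 10 items.
for m in (1, 5, 10):
    print(m, max_recall(m, 10))
```
]]></preformat>
      <p>For 5 recommended items and a test set of 10 items, the bound is 0.5, which matches the 50% stated above.</p>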
    </sec>
    <sec id="sec-12">
      <title>4.4 Limitations of the Evaluation</title>
      <p>
        Besides discussing the results, it is also worth mentioning
two limitations of the evaluation approach: First, only
recommendations for items the user already interacted with can
be evaluated [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. If something new is recommended, it cannot
be stated whether the user likes the item or not. We can
only state that it is not part of the user's listening history
in our dataset. Thus, this evaluation does not fit
perfectly to the intended use of providing recommendations for
new artists. However, this evaluation approach enabled us
to find the optimal input parameters using a grid search.
Secondly, as we do not have any preference values, the
assumption has to be made that a certain user likes the artists
he or she listened to.
      </p>
      <p>
        Both drawbacks can be eliminated by conducting a
user-centric evaluation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Thus, in future work, it would be
worthwhile to conduct a user experiment using the optimized
recommender system.
      </p>
    </sec>
    <sec id="sec-13">
      <title>5. RELATED WORK</title>
      <p>As already mentioned in the introduction, there exist
several other publicly available datasets suitable for music
recommendations. A quick overview of these datasets is given
in this Section.</p>
      <p>
        One of the biggest available music datasets is the Million
Song Dataset (MSD) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This dataset contains information
about one million songs from different sources. Besides real
user play counts, it provides audio features of the songs and
is therefore suitable for CF-based, content-based (CB) and hybrid recommender
systems. At the moment, the Taste Profile subset<sup>8</sup> of the
MSD is bigger than the dataset presented in this work;
however, it was released in 2011 and is therefore not as recent.
      </p>
      <p>Besides the MSD, Yahoo! also published big datasets<sup>9</sup>
containing ratings for artists and songs suitable for CF. The
biggest dataset contains 136,000 songs along with ratings
given by 1.8 million users. The data itself was gathered
by monitoring users of the Yahoo! Music Services
between 2002 and 2006. Like the MSD, the Yahoo!
dataset is not as recent. In addition to the ratings, the Yahoo!
dataset contains genre information, which can be exploited
by a hybrid recommender system.</p>
      <p><sup>8</sup> http://labrosa.ee.columbia.edu/millionsong/tasteprofile
<sup>9</sup> available at: http://webscope.sandbox.yahoo.com/catalog.php?datatype=r</p>
      <p>
        Celma also provides a music dataset containing data
retrieved from last.fm<sup>10</sup>, a music discovery service. It
contains users, artists and play counts as well as the MusicBrainz
identifiers for 360,000 users. This dataset was published in
2010 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Besides the datasets presented above, which are based on
data of private companies, there exist several datasets based
on publicly available information. Sources exploited have
been websites in general [
        <xref ref-type="bibr" rid="ref12 ref14 ref15">12, 15, 14</xref>
        ], Internet radios posting
their play lists [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and micro-blogging platforms, in
particular Twitter [
        <xref ref-type="bibr" rid="ref13 ref17">17, 13</xref>
        ]. However, using these sources has a
drawback: cleaning and matching the data requires
high effort.
      </p>
      <p>
        One of the datasets most similar to the dataset used in
this work is the Million Musical Tweets Dataset<sup>11</sup>
by Hauger et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Like our dataset, it was created using
the Twitter Streaming API, from September 2011 to April
2013; however, all tweets not containing a geolocation were
removed and thus it is much smaller. The dataset
contains 1,086,808 tweets by 215,375 users. Within the dataset,
25,060 unique artists have been identified [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        Another dataset based on publicly available data, which
is similar to the MovieLens dataset, is the MovieTweetings
dataset published by Dooms et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The
MovieTweetings dataset is continually updated and has the same format
as the MovieLens dataset in order to foster exchange. At
the moment, a snapshot containing 200,000 ratings is
available<sup>12</sup>. The dataset is generated by crawling well-structured
tweets and extracting the desired information using regular
expressions. Using these regular expressions, the name of the
movie, the rating and the corresponding user are extracted.
The data is afterwards linked to the IMDb, the Internet
Movie Database<sup>13</sup>.
      </p>
    </sec>
    <sec id="sec-14">
      <title>6. CONCLUSION AND FUTURE WORK</title>
      <p>In this work, we have shown that the presented dataset
is valuable for evaluating and benchmarking different
approaches for music recommendation. We implemented a
working music recommender system; however, as shown in
Section 4, for a high number of recommendations the
performance of our baseline recommendation approach is limited.
Thus, we see a need for action at two points: First, we will
enrich the dataset with further context-based information
that is available; in this case, this can be the timestamp
or the geolocation. Secondly, hybrid recommender systems
utilizing this additional context-based information are of
interest. Therefore, in future work, we will focus on the
implementation of such recommender systems and compare
them to the presented baseline approach. First experiments
were already conducted with a recommender system trying
to exploit the geolocation. Two different implementations
are being evaluated at the moment: The first uses the normalized
linear distance between two users for approximating a user
similarity. The second one, which in an early stage of
evaluation seems to be the more promising one, increases the
user similarity if the distance falls below a certain threshold.
However, there remains the open question of how to determine
this distance threshold.</p>
      <p><sup>10</sup> http://www.last.fm
<sup>11</sup> available at: http://www.cp.jku.at/datasets/MMTD/
<sup>12</sup> https://github.com/sidooms/MovieTweetings
<sup>13</sup> http://www.imdb.com</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Aizenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Somekh</surname>
          </string-name>
          .
          <article-title>Build your own music recommender by modeling internet radio streams</article-title>
          .
          <source>In Proceedings of the 21st International Conference on World Wide Web (WWW</source>
          <year>2012</year>
          ), pages
          <fpage>1</fpage>–<lpage>10</lpage>. ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Apache</given-names>
            <surname>Software Foundation</surname>
          </string-name>
          . What is Apache Mahout?,
          <year>March 2014</year>
          .
          <source>Retrieved July 13</source>
          ,
          <year>2014</year>
          , from http://mahout.apache.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Baeza-Yates</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Ribeiro-Neto</surname>
          </string-name>
          .
          <article-title>Modern Information Retrieval: The Concepts and Technology behind Search (2nd Edition) (ACM Press Books)</article-title>
          .
          <source>Addison-Wesley Professional</source>
          ,
          <edition>2nd edition</edition>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Bertin-Mahieux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P. W.</given-names>
            <surname>Ellis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Whitman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamere</surname>
          </string-name>
          .
          <article-title>The million song dataset</article-title>
          . In A. Klapuri and C. Leider, editors,
          <source>Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR</source>
          <year>2011</year>
          ), pages
          <fpage>591</fpage>–<lpage>596</lpage>
          . University of Miami,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Celma</surname>
          </string-name>
          .
          <article-title>Music Recommendation and Discovery - The Long Tail, Long Fail, and Long Play in the Digital Music Space</article-title>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Dooms</surname>
          </string-name>
          , T. De Pessemier, and
          <string-name>
            <given-names>L.</given-names>
            <surname>Martens</surname>
          </string-name>
          .
          <article-title>Movietweetings: a movie rating dataset collected from twitter</article-title>
          .
          <source>In Workshop on Crowdsourcing and Human Computation for Recommender Systems at the 7th ACM Conference on Recommender Systems (RecSys</source>
          <year>2013</year>
          ),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hauger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kosir</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Tkalcic</surname>
          </string-name>
          .
          <article-title>The million musical tweet dataset - what we can learn from microblogs</article-title>
          . In A. de Souza Britto Jr.,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gouyon</surname>
          </string-name>
          , and S. Dixon, editors,
          <source>Proceedings of the 14th International Society for Music Information Retrieval Conference (ISMIR</source>
          <year>2013</year>
          ), pages
          <fpage>189</fpage>–<lpage>194</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Herlocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. G.</given-names>
            <surname>Terveen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Riedl</surname>
          </string-name>
          .
          <article-title>Evaluating collaborative filtering recommender systems</article-title>
          .
          <source>ACM Transactions on Information Systems</source>
          ,
          <volume>22</volume>
          (
          <issue>1</issue>
          ):
          <fpage>5</fpage>
          –
          <lpage>53</lpage>
          ,
          <month>Jan</month>
          .
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Hsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Chang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <article-title>A practical guide to support vector classification</article-title>
          . Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Jaccard</surname>
          </string-name>
          .
          <article-title>The distribution of the flora in the alpine zone</article-title>
          .
          <source>New Phytologist</source>
          ,
          <volume>11</volume>
          (
          <issue>2</issue>
          ):
          <fpage>37</fpage>
          –
          <lpage>50</lpage>
          ,
          <month>Feb</month>
          .
          <year>1912</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>The magical number seven, plus or minus two: Some limits on our capacity for processing information</article-title>
          .
          <volume>62</volume>
          :
          <fpage>81</fpage>
          –
          <lpage>97</lpage>
          ,
          <year>1956</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Passant</surname>
          </string-name>
          .
          <article-title>dbrec - Music Recommendations Using DBpedia</article-title>
          .
          <source>In Proceedings of the 9th International Semantic Web Conference (ISWC</source>
          <year>2010</year>
          ), volume
          <volume>6497</volume>
          of Lecture Notes in Computer Science, pages
          <fpage>209</fpage>
          –
          <lpage>224</lpage>
          . Springer Berlin Heidelberg,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          .
          <article-title>Leveraging Microblogs for Spatiotemporal Music Information Retrieval</article-title>
          .
          <source>In Proceedings of the 35th European Conference on Information Retrieval (ECIR</source>
          <year>2013</year>
          ), pages
          <fpage>796</fpage>
          –
          <lpage>799</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Knees</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Widmer</surname>
          </string-name>
          .
          <article-title>Investigating web-based approaches to revealing prototypical music artists in genre taxonomies</article-title>
          .
          <source>In Proceedings of the 1st International Conference on Digital Information Management (ICDIM</source>
          <year>2006</year>
          ), pages
          <fpage>519</fpage>
          –
          <lpage>524</lpage>
          . IEEE,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Liem</surname>
          </string-name>
          , G. Peeters, and
          <string-name>
            <given-names>N.</given-names>
            <surname>Orio</surname>
          </string-name>
          .
          <article-title>A Professionally Annotated and Enriched Multimodal Data Set on Popular Music</article-title>
          .
          <source>In Proceedings of the 4th ACM Multimedia Systems Conference (MMSys</source>
          <year>2013</year>
          ), pages
          <fpage>78</fpage>
          –
          <lpage>83</lpage>
          ,
          <month>February–March</month>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Schnitzer</surname>
          </string-name>
          .
          <article-title>Hybrid Retrieval Approaches to Geospatial Music Recommendation</article-title>
          .
          <source>In Proceedings of the 35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gassler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Specht</surname>
          </string-name>
          .
          <article-title>Exploiting Twitter's collective knowledge for music recommendations</article-title>
          .
          <source>In Proceedings of the 2nd Workshop on Making Sense of Microposts (#MSM2012)</source>
          , pages
          <fpage>14</fpage>
          –
          <lpage>17</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>