<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Semantic Music Information Extraction from the Web Using Rule Patterns and Supervised Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peter Knees</string-name>
          <email>peter.knees@jku.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Schedl</string-name>
          <email>markus.schedl@jku.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computational Perception, Johannes Kepler University</institution>
          ,
          <addr-line>Linz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <fpage>3</fpage>
      <lpage>10</lpage>
      <abstract>
<p>We present first steps towards automatic Music Information Extraction, i.e., methods to automatically extract semantic information and relations about musical entities from arbitrary textual sources. The corresponding approaches allow us to derive structured meta-data from unstructured or semi-structured sources and can be used to build advanced recommendation systems and browsing interfaces. In this paper, several approaches to identify and extract two specific semantic relations from related Web documents are presented and evaluated. The addressed relations are members of a music band (band members) and artists' discographies (artist albums, EPs, and singles). In addition, the proposed methods are shown to be useful to relate (Web-)documents to musical artists. For all purposes, supervised learning approaches and rule-based methods are systematically evaluated on two different sets of Web documents.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
<p>J.5 [Arts and Humanities]: Music; I.2.7 [Artificial Intelligence]: Natural Language Processing - Text analysis</p>
      <p>General Terms: Algorithms</p>
    </sec>
    <sec id="sec-2">
      <title>MOTIVATION AND INTRODUCTION</title>
<p>
        Measuring similarity between artists, tracks, or other musical entities, be it audio-based, Web-based, or a combination of both, is a key concept for music retrieval and recommendation. However, the type of relations between these entities, i.e., what makes them similar, is often neglected. Especially in the music domain, the number of potential relations between two entities is large. Such relations comprise, e.g., cover versions of songs, live versions, re-recordings, remixes, or mash-ups. Semantic high-level concepts such as "song X was inspired by artist A" or "band B is the new band of artist A" are very prominent in many users' conception and perception of music and should therefore be given attention in similarity estimation approaches. By focusing solely on acoustic properties, such relations are hard to detect (as can be seen, e.g., from research on cover version detection [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]).
      </p>
      <p>WOMRAD 2011, 2nd Workshop on Music Recommendation and Discovery, colocated with ACM RecSys 2011 (Chicago, US). Copyright ©. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.</p>
<p>
        A promising approach to deal with the limitations of signal-based methods is to exploit contextual information (for an overview see, e.g., [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]). Recent work in music information retrieval has shown that at least some cultural aspects can be modeled by analyzing extra-musical sources (often referred to as community metadata [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]). In the majority of work, this data, typically originating from Web sources and user data, is used for description/tagging of music (e.g., [
        <xref ref-type="bibr" rid="ref10 ref23 ref24">10, 23, 24</xref>
        ]) and assessment of similarity between artists (e.g., [
        <xref ref-type="bibr" rid="ref17 ref21 ref22 ref25">17, 21, 22, 25</xref>
        ]). However, while for these tasks standard information retrieval (IR) methods that reduce the obtained information to simple representations such as the bag-of-words model may suffice, important information on entities like artists' full names, band member names, album and track titles, related artists, as well as some music-specific concepts like instrument names and musical styles may be dismissed. Addressing this issue, essential progress towards identifying relevant entities and, in particular, relations between these could be made. These kinds of information would also be highly valuable to automatically populate music-specific ontologies, such as the Music Ontology (http://www.musicontology.com) [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
<p>In this paper, we aim at developing automatic methods to discover semantic relations between musical entities by analyzing texts from the Web. More precisely, to assess the feasibility of this goal, we focus on two specific sub-tasks, namely automatic band member detection, i.e., determining which persons a band consists (or consisted) of, and automatic discography extraction, i.e., recognition of released records (i.e., albums, EPs, and singles). Band member detection is strongly related to one of the central tasks of information extraction (IE) and named entity detection (NED), i.e., the recognition of persons' names in documents. While persons' names typically exhibit some common patterns in terms of orthography and number of tokens, detection of artist names and band members is a bigger challenge as they frequently comprise or consist of nicknames, pseudonyms, or just a symbol (cf. Prince for a limited time). Discography detection in unstructured text is an even more challenging task as song or album names (release names in the following) are not bound to any conventions. That is, release names can consist of an unknown number of tokens (including zero tokens, cf. The Beatles' "white album", or Weezer's "blue", "green", and "red" albums, which might lead to inconsistent references on different sources), just special characters (e.g., Justice's "Cross"), a differential equation (track 2 on Aphex Twin's "Windowlicker" single), or whole paragraphs (e.g., the full title of a Soulwax album often abbreviated as Most of the remixes consists of 552 characters). Especially the last example demonstrates some of the challenges of a discography-targeted named entity recognition approach as the full album title itself exhibits linguistic structures and even contains another band's name (Einstürzende Neubauten). Hence, general methods not tailored to (or even aware of) music-related entities might not be able to deal with such specifics.</p>
<p>To investigate the potential and suitability of language-processing-based approaches for semantic music information extraction from (Web-)texts, two strategies commonly used in IE tasks are explored in this paper: manual tailoring of rule patterns to extract entities of interest (the "knowledge engineer" approach) and automatic learning of patterns from labeled data (supervised learning). Since, particularly for the latter, pre-labeled data is required, which is difficult to obtain for most types of semantic relations, band-membership and discography extraction are, from our point of view, good starting points as these types of information are also largely available in a structured format (e.g., via Web services such as MusicBrainz, http://musicbrainz.org). In addition, the methods presented are also applied to relate documents to musical artists, which is useful for further tasks such as automatic music-focused crawling and indexing of the Web. In the bigger picture, these are supposed to be but the first steps towards a collection of methods to identify high-level musical relations between pieces, like cover versions, variations, remasterings, live interpretations, medleys, remixes, samples, etc. As some of these concepts are (partly) deducible from the audio signal itself, well-considered methods for combining information from the audio with (Web-based) meta-information are required to automatically discover such relations.</p>
    </sec>
    <sec id="sec-3">
      <title>RELATED WORK</title>
<p>
        The two music information extraction tasks addressed in this paper, i.e., band member and discography extraction, are specific cases of relation extraction. Since in the scenarios considered in this paper, one of the relational concepts is considered to be known (i.e., the band a text deals with), semantic relation extraction is reduced to named entity recognition and extraction tasks (i.e., extraction of band members and released records). Named entity recognition itself is a well-researched topic (for an overview see, e.g., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]) and comprises the identification of proper names in structured or unstructured text as well as the classification of these names by means of rule-based or supervised learning approaches. While rule-based methods rely on experts that uncover patterns for the specific task and domain, supervised learning approaches require large amounts of labeled training data (which could, for instance, also stem from an ontology, cf. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). For the music domain, despite the numerous contributions that exploit Web-based sources to describe music or to derive similarity (cf. Section 1), the number of publications aiming at extracting factual meta-data for musical entities by applying language processing methods is rather small.
      </p>
<p>
        In [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], we propose a first step to automatically extract the line-up of a music band, i.e., not only the members of a band but also their corresponding instruments and roles. As data source, up to 100 Web documents for each band B, obtained via Google queries such as "B" music, "B" music members, or "B" lineup music, are utilized. From the retrieved pages, n-grams (where n ∈ {2, 3, 4}), whose tokens consist of capitalized, non-common speech words of length greater than one, are extracted. For band member and role extraction, a Hearst pattern approach (cf. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) is applied to the extracted n-grams and their surrounding text. The seven patterns used are 1. M plays the I, 2. M who plays the I, 3. R M, 4. M is the R, 5. M, the R, 6. M (I), and 7. M (R), where M is the n-gram/potential band member, I an instrument, and R a role. For I and R, roles in a "standard rock band line-up", i.e., singer, guitarist, bassist, drummer, and keyboardist, as well as synonyms of these, are considered. After extraction, the document frequency of each rule is counted, i.e., on how many Web pages each of the above rules applies. Entities that occur on a percentage of band B's Web pages that is below a given threshold are discarded. The remaining member-role relations are predicted for B. In this paper, evaluation of the presented approaches is also carried out on the best-performing document set from [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and compared against the Hearst pattern approach.
      </p>
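      <p>The seven Hearst patterns above can be approximated with plain regular expressions. The following Python sketch is illustrative only: the pattern strings, role/instrument lists, candidate definition, and threshold value are our own simplifications, not the code of the cited work.</p>
      <p>
```python
import re
from collections import defaultdict

# Candidate member: 2-4 capitalized tokens; role/instrument lists are toy examples.
MEMBER = r"((?:[A-Z][a-z]+\s?){2,4})"
ROLE = r"(?:singer|guitarist|bassist|drummer|keyboardist)"
INSTR = r"(?:guitar|bass|drums|keyboards|vocals)"

# Rough regex versions of the seven patterns: 1. M plays the I, 2. M who plays
# the I, 3. R M, 4. M is the R, 5. M, the R, 6. M (I), 7. M (R).
PATTERNS = [
    MEMBER + r"plays the " + INSTR,
    MEMBER + r"who plays the " + INSTR,
    ROLE + r" " + MEMBER,
    MEMBER + r"is the " + ROLE,
    MEMBER + r", the " + ROLE,
    MEMBER + r"\(" + INSTR + r"\)",
    MEMBER + r"\(" + ROLE + r"\)",
]

def extract_members(pages, min_df=0.25):
    """Keep candidates whose document frequency (fraction of pages on which
    any pattern fires) reaches the threshold, as described above."""
    df = defaultdict(set)
    for idx, text in enumerate(pages):
        for pat in PATTERNS:
            for match in re.finditer(pat, text):
                df[match.group(1).strip()].add(idx)
    return {m for m, docs in df.items() if len(docs) / len(pages) >= min_df}
```
      </p>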
<p>
        In [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], we investigate several approaches to determine the country of origin for a given artist, including an approach that performs keyword spotting for terms such as "born" or "founded" in the context of countries' names on Web pages. Another approach for country-of-origin determination is presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Govaerts and Duval use selected Web sites and services, such as Freebase (http://www.freebase.com), Wikipedia (http://www.wikipedia.org), and Last.fm (http://last.fm), and propose three heuristics to determine the artist's country of origin using the occurrences of country names in biographies (highest overall occurrence, strongly favoring early occurrences, weakly favoring early occurrences). In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Geleijnse and Korst apply patterns like G bands such as A, for example A1 and A2, or M mood by A (where G represents a genre, A an artist name, and M a possible mood) to unveil genre-artist, artist-artist, and mood-artist relations, respectively.
      </p>
<p>While these music-specific information extraction methods mainly build upon a few simple patterns or term frequency statistics, the work presented in this paper aims at incorporating more general methods that take advantage of linguistic features of the underlying texts and automatically learn models to derive musical entities from annotated examples.</p>
    </sec>
    <sec id="sec-4">
      <title>METHODOLOGY</title>
<p>
        The methods presented in this paper make use of the linguistic properties of texts related to music bands. To assess this information, for both approaches investigated (rule-based and supervised-learning-based), several pre-processing steps are required to obtain these linguistic features. Apart from initial preparation steps such as markup removal (if necessary), text tokenization (i.e., splitting the text into single tokens based on white spaces), and sentence splitting (based on punctuation), this comprises the following steps: 1. Part-of-Speech Tagging (PoS): assigns PoS tags to tokens, i.e., annotates each token with its linguistic category (noun, verb, preposition, etc.), cf. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. 2. Gazetteer Annotation: annotates occurrences of pre-defined keywords known to represent a specific concept, e.g., company names or persons' (first) names. These annotations can be used as look-up information for subsequent steps (see below). For the music domain, in this step, we also include lists of musical genres, instruments, and band roles, as well as a list of country names, cf. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. 3. Transducing Step: identifies named entities such as persons, companies, locations, or dates using manually generated grammar rules. These rules can include lexical expressions, PoS information, look-up entities extracted via the gazetteer, or any other type of available annotation.
      </p>
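      <p>As a rough illustration of these pre-processing steps, the following Python sketch performs naive sentence splitting, tokenization, and gazetteer annotation. The gazetteer entries are toy examples; the paper uses GATE, and PoS tagging and the transducing step are omitted here.</p>
      <p>
```python
import re

# Toy gazetteer; the real lists cover genres, instruments, band roles,
# country names, persons' first names, etc.
GAZETTEER = {
    "guitar": "Instrument", "drums": "Instrument",
    "drummer": "BandRole", "singer": "BandRole",
    "Austria": "Country", "metal": "Genre",
}

def sentence_split(text):
    # naive splitting at sentence-final punctuation
    return [s.strip() for s in re.findall(r"[^.!?]+[.!?]?", text) if s.strip()]

def tokenize(sentence):
    # whitespace/punctuation tokenization
    return re.findall(r"\w+|[^\w\s]", sentence)

def annotate(text):
    """Return, per sentence, a list of (token, gazetteer label) pairs;
    tokens not covered by the gazetteer get None."""
    return [[(tok, GAZETTEER.get(tok)) for tok in tokenize(sent)]
            for sent in sentence_split(text)]
```
      </p>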
<p>
        For all of these steps, the functionalities included in the GATE software package (General Architecture for Text Engineering [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) are utilized. In GATE's transducing step, detection of the different kinds of named entities is performed simultaneously in an interwoven process, i.e., decisions whether proper names represent persons or organizations are made after a number of shared intermediate steps. For instance, for person detection, information on first names and titles obtained from the gazetteer annotations is combined with information on initials, first names, surnames, and endings detected from orthographic characteristics (e.g., capitalization) and PoS tags. Finally, persons' surnames are removed if they contain certain stopwords or can be attributed to an organization. Details about this process can be found in Appendix F of the GATE User Guide (http://gate.ac.uk/userguide/).
      </p>
<p>The transducing step is also where we add additional rule patterns designed to detect band members, releases, and artist names as described in the following section.</p>
    </sec>
    <sec id="sec-5">
      <title>Rule-Pattern Approach</title>
<p>
        The first approach to extract music-related entities consists of generating specific rules that operate on the annotations obtained in the pre-processing steps. This requires the labor-intensive task of manually detecting textual patterns that indicate certain entities in exemplary documents and writing (generalized) rules suited to capture other entities of the same concept also in new documents. For this purpose, for a set of 83 artists/bands, related Web pages such as band profiles and biographies from Last.fm, Wikipedia, and allmusic (http://www.allmusic.com) are examined. Based on these observations, rules that consider orthographic features, punctuation, surrounding entities (such as those identified via the gazetteer lists), and surrounding keywords are designed. The rules are formalized as so-called JAPE (Java Annotation Patterns Engine) grammars that are used in the transducer step of GATE. The complete set of JAPE grammars for music-specific entity recognition can be found in Appendix B of [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and can also be obtained by contacting the authors. In the following, we show one exemplary (and easily accessible) rule for each concept to demonstrate the idea and structure behind the rule patterns for band member, media, and artist name extraction, respectively.
      </p>
<p>For the purpose of band member extraction, a JAPE grammar rule that aims at finding band members by searching for information about members leaving or joining the band is given as:
Rule: leftJoinedBand (
  ((MemberName)): BandMember
  ({Token.string == "had"} | {Token.string == "has"})?
  ({Token.string == "left"} |
   {Token.string == "joined"} |
   {Token.string == "rejoined"} |
   {Token.string == "replaced"})
) --&gt; :BandMember.Member = {kind = "BandMember", rule = "leftJoinedBand"}</p>
      <p>To extract record releases, the following rule matches patterns that start with the potential media name (optionally in quotation marks) and point to production, release, performance, or similar events in the past or future:
Rule: MediaPassivReleased (
  ({Token.string == "\""})?
  ((Medium)): Media
  ({Token.string == "\""})?
  ({Token.string == "was"} |
   ({Token.string == "will"} {Token.string == "be"}))
  ({Token.string == "released"} |
   {Token.string == "issued"} |
   {Token.string == "produced"} |
   {Token.string == "recorded"} |
   {Token.string == "played"} |
   {Token.string == "performed"})
) --&gt; :Media.Media = {kind = "Media", rule = "MediaPassivReleased"}</p>
      <p>To identify occurrences of band names, the following rule focuses on the entity occurring before terms such as was founded or were supported:
Rule: Formed (
  ((BandN)): BandName
  ({Token.string == "was"} |
   {Token.string == "were"})
  ({Token.string == "formed"} |
   {Token.string == "supported"} |
   {Token.string == "founded"})
) --&gt; :BandName.bandname = {kind = "Band", rule = "Formed"}</p>
<p>Elaborating such rules is a tedious task and (especially in heterogeneous data environments such as the Web) unlikely to generalize well and cover all cases. Therefore, in the next section we describe a supervised learning approach that makes use of automatically labeled data.</p>
    </sec>
    <sec id="sec-6">
      <title>Supervised Learning Approach</title>
<p>
        Instead of manually examining unstructured text for occurrences of musical entities and potential patterns to identify them, the idea of this approach is to apply a supervised learning algorithm to a set of pre-annotated examples. Using the learned model, relevant information should then be found also in new documents. Several approaches, more precisely several types of machine learning algorithms, have been proposed for automatic information extraction tasks, such as hidden Markov models [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], decision trees [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], or support vector machines (SVMs) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Since the latter work demonstrates that SVMs may yield results that rival those of optimized rule-based approaches, SVMs are chosen as classifier for the tasks at hand (for more details see [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ]). For training of the SVMs, a set of documents that contain annotations of the entities of interest is required. Since this step, too, can be labor-intensive, we opted for an automatic annotation approach. For the collection of training documents, ground truth information (on band member history and band discography) is obtained by either manually compiling lists or by invoking Web services such as MusicBrainz or Freebase. Using this information, occurrences of the band name, its members (full name as well as last name only), and releases are annotated using regular expressions.
      </p>
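      <p>A minimal sketch of this automatic annotation step could look as follows; the ground-truth lists here are made up for illustration (the real lists come from MusicBrainz and Freebase):</p>
      <p>
```python
import re

def auto_annotate(text, entities):
    """Locate known ground-truth surface forms in the raw text and return
    (start, end, label, matched text) spans usable as training annotations.
    entities: mapping label -> list of known surface forms."""
    spans = []
    for label, names in entities.items():
        for name in names:
            # whole-word, case-insensitive match of the known name
            for m in re.finditer(r"\b" + re.escape(name) + r"\b", text, re.I):
                spans.append((m.start(), m.end(), label, m.group()))
    return sorted(spans)

# Illustrative ground truth: full member name, last name only, and a release.
ground_truth = {
    "BandMember": ["Tony Iommi", "Iommi"],
    "Release": ["Paranoid"],
}
```
      </p>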
<p>
        Construction of the features and SVM training is carried out as described by Li et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. First, for each token, a feature vector representation has to be obtained. In the given scenario, for each token, its content (i.e., the actual string), orthographic properties, PoS information, gazetteer-based entity information, and identified person entities are considered. In a second scenario, in addition to these, also the output of the rule-based approach (more precisely, the name of the rule responsible for prediction of an entity) serves as an input feature. Ideally, this incorporates indicators of high relevance and allows for supervised selection of the manually generated rules for the final predictions. For each prediction task, the corresponding annotation type is also added to the features as target class.
      </p>
<p>
        To construct the feature vectors, the training corpus is scanned for all occurring values of any of the considered attributes (i.e., annotations). Then, each token is represented by a vector where each distinct annotation value corresponds to one dimension which is set to 1 if the token is annotated with the corresponding value. In addition, the context of each token (consisting of a window that includes the 5 preceding and the 5 subsequent tokens) is incorporated. This is achieved by creating an SVM input vector for each token that is a concatenation of the feature vectors of all tokens in the context window. To reflect the distance of the surrounding tokens to the actual token (i.e., the center of the window), a reciprocal weighting is applied, meaning that "the nonzero components of the feature vector corresponding to the jth right or left neighboring word are set to be equal to 1/j in the combined input vector" [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. In our experiments, this typically results in feature vectors with approximately 1.5 million dimensions.
      </p>
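      <p>The window concatenation with reciprocal weighting can be sketched as follows, using a ±2 window instead of ±5 and sparse dicts instead of real vectors; the feature names are hypothetical:</p>
      <p>
```python
def window_vectors(token_feats, window=2):
    """token_feats: one dict of binary features per token.
    Returns one combined sparse vector per token, keyed by
    (offset, feature name); a neighbor at distance j is weighted 1/j."""
    combined = []
    for i in range(len(token_feats)):
        vec = {}
        for off in range(-window, window + 1):
            j = i + off
            if j in range(len(token_feats)):
                weight = 1.0 if off == 0 else 1.0 / abs(off)
                for name in token_feats[j]:
                    vec[(off, name)] = weight
        combined.append(vec)
    return combined
```
      </p>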
<p>
        In the SVM learning phase, the input vectors corresponding to every single token in all training documents serve as examples. According to the central idea of [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], two distinct SVM classifiers are trained for each concept of interest. The first classifier is trained to predict the beginning of an entity (i.e., to classify whether a token is the first token of an entity), the second to predict the end (i.e., whether a token is the last token of an entity). To deal with the unbalanced distribution of positive and negative training examples, a special form of SVMs is used, namely an SVM with uneven margins [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. From the obtained predictions of start and end positions, actual entities, as well as corresponding confidence scores, are determined in a post-processing step. First, start tokens without a matching end token, as well as end tokens without a matching start token, are removed. Second, entities with a length (in terms of the number of tokens) that does not match any training example's length are discarded. Third, a confidence score is calculated based on a probabilistic interpretation of the SVM output for all possible classes. More precisely, for each entity, the conjunction of the sigmoid-transformed SVM output probabilities of start and end token is calculated for each possible output class. Finally, the class (label) with the highest probability is predicted for the entity if its probability is greater than 0.25. The probability of the predicted class serves as a confidence score.
      </p>
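      <p>The pairing and filtering of predicted start and end tokens can be sketched like this; all data structures and numbers are illustrative, and the probabilities stand for the sigmoid-transformed SVM outputs:</p>
      <p>
```python
def pair_entities(starts, ends, seen_lengths, threshold=0.25):
    """starts/ends: {token index: probability of being a start/end token};
    seen_lengths: entity lengths (in tokens) observed in the training data."""
    entities = []
    for s, p_start in sorted(starts.items()):
        candidates = [e for e in ends if e >= s]
        if not candidates:
            continue  # start without matching end token: removed
        e = min(candidates)  # nearest end at or after the start
        if (e - s + 1) not in seen_lengths:
            continue  # length never seen during training: discarded
        confidence = p_start * ends[e]  # conjunction of start and end scores
        if confidence > threshold:
            entities.append((s, e, confidence))
    return entities
```
      </p>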
    </sec>
    <sec id="sec-7">
      <title>Entity Consolidation and Prediction</title>
<p>
        From the extraction step (either rule- or learning-based), for each processed text and each concept of interest, a list of potential entities is obtained. For each band, the lists from all texts associated with the band are joined and the occurrences of each entity as well as the number of texts an entity occurs in are counted (term and document frequency, respectively). The joined list usually contains a lot of noise and redundant data, calling for a filtering and merging step. First, all entities extracted by the learning-based method that have a confidence score below 0.5 are removed since, according to the classification step, they are more likely not to represent band members than to represent band members. On the cleaned list, the same observations as described in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] can be made. For instance, on the list of extracted band members, some members are referenced with different spellings (Paavo Lötjönen vs. Paavo Lotjonen), with abbreviated first names (Phil Anselmo vs. Philip Anselmo), with nicknames (Darrell Lance Abbott vs. Dimebag Darrell or just Dimebag), or only by their last name (Iommi). On the discography lists, release names are often followed by additional information such as release year or type of release. This is dealt with by introducing an approximate string matching function, namely the level-two Jaro-Winkler similarity, cf. [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] (for calculation, the open-source Java toolkit SecondString, http://secondstring.sourceforge.net, is utilized). For both entity types, this type of similarity function is well suited as it assigns higher matching scores to pairs of strings that start with the same sequence of characters. In the level-two variant, the two entities to compare are split into substrings and similarity is calculated as an aggregated similarity of pairwise comparisons of the substrings. To reduce redundancies, two entities are considered synonymous and thus merged if their level-two Jaro-Winkler similarity is above 0.9. In addition, to deal with the occurrence of last names, an entity consisting of one token is considered a synonym of another entity if it matches the other entity's last token.
      </p>
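      <p>The merging logic can be illustrated with the following sketch. The similarity function here is a crude exact-match stand-in for the level-two Jaro-Winkler measure, so only the structure of the grouping is meaningful:</p>
      <p>
```python
def similarity(a, b):
    # stand-in for level-two Jaro-Winkler: 1.0 on case-insensitive equality
    return 1.0 if a.lower() == b.lower() else 0.0

def consolidate(entities):
    """Group extracted surface forms into synonym groups: merge when the
    similarity exceeds 0.9 or when a one-token entity equals another
    entity's last token (the last-name rule described above)."""
    groups = []
    for ent in entities:
        for group in groups:
            head = group[0]
            close = similarity(ent, head) > 0.9
            last_name = (len(ent.split()) == 1
                         and ent.lower() == head.split()[-1].lower())
            if close or last_name:
                group.append(ent)
                break
        else:
            groups.append([ent])
    return groups
```
      </p>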
<p>This consolidated list is usually still noisy, calling for additional filtering steps. To this end, two threshold parameters are introduced. The first threshold, tf ∈ ℕ₀, determines the minimum number of occurrences of an entity (or its synonyms) in the band's set for it to get predicted. The second threshold, tdf ∈ [0, 1], controls the lower bound of the fraction of texts/documents associated with the band an entity has to occur in (document frequency in relation to the total number of documents per band). The impact of these two parameters is systematically evaluated in the following section.</p>
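      <p>The two thresholds translate into a one-line filter; the counts and values below are made up for illustration:</p>
      <p>
```python
def predict(counts, n_docs, tf=2, tdf=0.1):
    """counts: {entity: (term frequency, document frequency)}.
    Keep entities reaching both the absolute term-frequency threshold tf
    and the relative document-frequency threshold tdf."""
    return [ent for ent, (term_f, doc_f) in counts.items()
            if term_f >= tf and doc_f / n_docs >= tdf]

# e.g., counted over 10 documents associated with a band:
counts = {"Ozzy Osbourne": (12, 8), "Spurious Match": (1, 1)}
```
      </p>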
    </sec>
    <sec id="sec-8">
      <title>EVALUATION</title>
<p>To assess the potential of the proposed approaches and to measure the impact of the parameters, systematic experiments are conducted. This section details the test collections used as well as the applied evaluation measures and reports on the results of the experiments.</p>
<p>
        For evaluation, two collections with different characteristics are used: the first, a previously published collection used in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]; the second, a larger-scale test collection consisting of band biographies.
      </p>
      <sec id="sec-8-1">
        <title>Metal Page Sets</title>
<p>
          The first collection is a set of Web pages introduced in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. This set consists of Google's 100 top-ranked Web pages retrieved using the query "band name" music members (cf. Section 2) for 51 Rock and Metal bands (resulting in a total of 5,028 Web pages). In [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], this query setting yielded the best results and is therefore chosen as reference for the task of band member extraction. As ground truth, the membership relations that include former members are chosen (i.e., the Mf ground truth set of [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]). For this evaluation collection, also the results obtained by applying the Hearst patterns proposed in [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] are available, allowing for a direct comparison of the approaches' band member extraction capabilities.
        </p>
<p>For the discography extraction evaluation, no reference data is available in the original set. Therefore, and since the discography of the contained bands has changed since the creation of the set, a new Web crawl has been conducted to retrieve recent (and more related) data. Since the aim of this new set is to extract released media, for each of the 51 bands in the metal set the query "band name" discography is sent to Google and the top 100 pages are downloaded (resulting in a total of 5,090 Web pages). To obtain a discography ground truth, titles of albums, EPs, and singles released by each band are downloaded from MusicBrainz.</p>
<p>To speed up processing of the collections, all Web pages with a file size over 100 kilobytes are discarded, resulting in set sizes of 4,561 and 4,625 documents for the member set and the discography set, respectively. Evaluation of the supervised learning approach is performed as a 2-fold cross-validation (by splitting the band set and separating the associated Web pages), where in each fold a random sample of 100 documents is drawn for training.</p>
      </sec>
      <sec id="sec-8-2">
        <title>Biography Set</title>
<p>The second test collection is a larger-scale collection consisting only of band biographies found on the Web. Biographies are investigated as they should contain both information on (past) band members and information on (important) released records.</p>
        <p>Starting from a snapshot of the MusicBrainz database
from December 2010, all artists marked as bands and all
corresponding band members as well as albums, EPs, and
singles are extracted. In addition, band-membership
information from Freebase (http://www.freebase.com) is retrieved
and merged with the MusicBrainz information to make the
ground truth data set more comprehensive. After this step,
band-membership information is available for 34,238 bands.
For each band name, the echonest API
(http://developer.echonest.com) is invoked to obtain related
biographies. Using the echonest's Web service, related
biographies (e.g., from Wikipedia, Last.fm, allmusic, or Aol
Music, http://music.aol.com) can be conveniently retrieved in
plain text format. Since among the provided biographies for a
band, duplicates or near-duplicates, as well as short snippets,
can be observed, (near-)duplicates as well as biographies
consisting of fewer than 100 characters are filtered out. After
filtering (near-)duplicates and snippets, at least one biography
remains for 23,386 bands (68%). In total, a set of 38,753
biographies is obtained. To keep processing times short, all
documents that contain more than 10 megabytes of
annotations after the initial processing step are furthermore
filtered out.</p>
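        <p>The paper does not specify how (near-)duplicates are detected; one common choice, shown here purely as an illustrative sketch, is Jaccard overlap of character shingles combined with the minimum-length filter:</p>

```python
def shingles(text, k=8):
    """Character k-shingles of a whitespace-normalized, lowercased text."""
    t = " ".join(text.lower().split())
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}

def filter_biographies(bios, min_len=100, jaccard_threshold=0.9):
    """Drop short snippets and (near-)duplicates. The similarity measure
    is an assumption; the paper only states that near-duplicates and
    biographies of fewer than 100 characters are removed."""
    kept, kept_shingles = [], []
    for bio in bios:
        if len(bio) >= min_len:  # discard short snippets
            s = shingles(bio)
            dup = any(len(s.intersection(t)) / len(s.union(t)) >= jaccard_threshold
                      for t in kept_shingles)
            if not dup:
                kept.append(bio)
                kept_shingles.append(s)
    return kept
```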
        <p>For training of the supervised learner, a random subset
of 100 biographies is chosen. All biographies by any artist
that is part of the training set are removed from the test set,
resulting in a final test set of 37,664 biographies by 23,030
distinct bands.</p>
        <p>In comparison to the first test sets, i.e., the Metal page
sets, the biography set contains more bands and more specific
documents in a homogeneous format (i.e., biographies
instead of semi-structured Web pages from various sources),
but fewer associated documents (on average 1.63 documents
per band, as opposed to an average of 90 documents per
band for the Metal page set).</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>Evaluation Metrics</title>
      <p>For evaluation, precision and recall are calculated
separately for each band and averaged over all bands to obtain
a final score. The metrics are defined as follows:

precision = |T ∩ P| / |P| if |P| &gt; 0, and 1 otherwise    (1)

recall = |T ∩ P| / |T|    (2)

where P is the set of predicted entities and T the ground
truth set of the band. To assess whether an extracted entity
is correct, again the level-two Jaro-Winkler similarity (see
Section 3.3) is applied. More precisely, if the Jaro-Winkler
similarity between a predicted entity and an entity contained
in the ground truth is greater than 0.9, the prediction is
considered to be correct. Furthermore, if a predicted band
member name consists of only one token, it is considered
correct if it matches the last token of a member in the
ground truth. These weakened definitions of matching allow
for tolerating small spelling variations, name abbreviations,
extracted last names, additional information on releases, as
well as string encoding differences.</p>
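        <p>The per-band metrics and the weakened matching rules can be sketched as follows; note that difflib's SequenceMatcher ratio is only a stand-in for the level-two Jaro-Winkler similarity actually used, and the function names are our own:</p>

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.9):
    """Stand-in for the level-two Jaro-Winkler similarity of the paper
    (difflib's ratio only approximates that measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def matches(predicted, truth_entity):
    """A prediction is correct if it is similar to a ground-truth entity,
    or if it is a single token equal to that entity's last token."""
    tokens = predicted.split()
    if len(tokens) == 1 and tokens[0].lower() == truth_entity.split()[-1].lower():
        return True
    return similar(predicted, truth_entity)

def band_scores(predicted, truth):
    """Precision/recall for one band (Equations 1 and 2)."""
    hits = {p for p in predicted if any(matches(p, t) for t in truth)}
    covered = {t for t in truth if any(matches(p, t) for p in predicted)}
    precision = len(hits) / len(predicted) if predicted else 1.0
    recall = len(covered) / len(truth) if truth else 1.0  # guard; the paper assumes non-empty T
    return precision, recall
```

        <p>The final score is then the arithmetic mean of these per-band values over all bands.</p>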
      <p>
        For comparison with the Hearst pattern approach for band
member detection on the Metal page set, it has to be noted
that in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], calculation of precision and recall is done on
the full set of bands and members (and their corresponding
roles), yielding global precision and recall values, whereas
here, the evaluation metrics are calculated separately for
each band and are then averaged over all bands to remove
the influence of a band's size. Under the global evaluation
scheme, e.g., orchestras are given far more importance than,
for instance, duos in the overall evaluation, although for a
duo, the individual members are generally more important
than for an orchestra. Therefore, in the following, the
different approaches are compared based on macro-averaged
evaluation metrics (calculated using the arithmetic mean of
the individual results).
      </p>
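        <p>The difference between the two averaging schemes can be illustrated with a small sketch (the numbers are invented for illustration only):</p>

```python
def macro_average(per_band):
    """Arithmetic mean of per-band scores: every band counts equally."""
    return sum(per_band) / len(per_band)

def micro_recall(bands):
    """Global (micro) recall: pools all members, so large bands dominate.
    `bands` is a list of (num_correct, num_truth) pairs."""
    correct = sum(c for c, n in bands)
    total = sum(n for c, n in bands)
    return correct / total

# A 40-member orchestra with poor recall vs. a duo with perfect recall:
bands = [(10, 40), (2, 2)]
macro = macro_average([10 / 40, 2 / 2])  # (0.25 + 1.0) / 2 = 0.625
micro = micro_recall(bands)              # 12 / 42, roughly 0.286
```

        <p>Under the micro scheme the orchestra dominates the score; the macro scheme used in this paper weights both bands equally.</p>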
    </sec>
    <sec id="sec-10">
      <title>Evaluation Results</title>
      <p>
        In the following, the proposed rule-patterns, the SVM
approach, as well as the SVM approach that utilizes the
output of the rule-patterns are compared for the tasks of
band-member detection and discography extraction. [Figure 1:
precision-recall curves; legend: Baseline, Hearst Patterns,
Rule-Patterns, SVM, SVM (w/Rules), Recall Upper Bound.]
For detecting band members, a baseline reference consisting
of the person entity prediction functionality of GATE is
provided. On the Metal page set, band-member prediction is
further compared to the Hearst pattern approach from [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. For the task of
discography extraction, no such reference is available. For
all evaluations, an additional upper bound for the recall is
calculated. This upper bound is implied by the underlying
documents, since band members and releases that do not
occur in any of the documents cannot be predicted.
      </p>
      <sec id="sec-10-1">
        <title>Band-Member Detection</title>
        <p>The left part of Figure 1 shows precision-recall curves
for the different band member detection approaches on the
Metal page set. For a systematic comparison with the Hearst
pattern approach, the threshold tdf, i.e., the threshold that
determines on which fraction of a band's total documents a
band member has to appear in order to be predicted, is varied.
It can be seen that the rule-based approach clearly performs
best. Also, SVM and SVM using the rules' output outperform
the Hearst pattern approach. It becomes apparent that on the
Metal set, rule patterns, the GATE person baseline, and the
supervised approaches can yield recall values close to the upper
bound, i.e., these approaches capture nearly all members
contained in the documents at least once. For the Hearst
patterns, recall remains low. However, when comparing with
the Hearst patterns, it has to be noted that this approach was
initially designed to also detect the roles of the band
members, a feature none of the other approaches is capable of.</p>
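        <p>The document-frequency thresholding described above reduces to a simple filter; this sketch uses our own naming and assumes the extraction counts are already available:</p>

```python
def predict_members(candidate_docs, num_docs, tdf):
    """Predict an entity as a band member if it was extracted on at
    least a fraction `tdf` of the band's documents. `candidate_docs`
    maps each extracted entity to the number of documents it appears on."""
    return {name for name, df in candidate_docs.items()
            if df / num_docs >= tdf}
```

        <p>On the biography set, the analogous tf threshold filters by total number of occurrences instead of document fraction.</p>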
        <p>Since on the biography set only 1.63 documents per band
are available on average, variation of the tdf threshold is not
as interesting as on the Metal page set. Therefore, the right
part of Figure 1 depicts curves of the proposed approaches
with varied values of tf, i.e., the threshold that determines
how often an entity has to be detected to be predicted as
a band member. On this set, the supervised learning
approaches tend to outperform the rule-based extraction
approach slightly. However, there is basically no difference
between the SVM approaches and the baseline, with the only
exception that the SVM approaches can yield higher recall
values. Another observation is that the upper recall
boundary on the biography set is rather low at about 0.6.</p>
      </sec>
      <sec id="sec-10-2">
        <title>Discography Extraction</title>
        <p>For discography extraction the situation is similar, as can
be seen from Figure 2. Also for this task, the rule-based
approach outperforms the SVM approaches (this time also on
the biography set). Recall is also close to the upper bound
using SVMs on the Metal page set, while on the biography
set, none of the approaches is capable of reaching the already
low upper recall boundary of 0.36. Conversely, on the
biography set, all proposed approaches yield rather high
precision values. However, due to the lack of a baseline reference,
it is difficult to draw final conclusions about the quality of
these approaches for the task of discography extraction.</p>
        <p>What can be seen from both the evaluations on
discography and band-member extraction is that, despite all the
work required, rule-patterns are preferable over supervised
learning methods. Another consistent finding so far is that SVMs
that utilize the output of the rule-pattern classification
process are superior to SVMs without this information, but still
inferior to the predictions of the rule-patterns alone.</p>
        <p>The most unexpected result can be observed for
band-member extraction on the biography set. None of the
proposed methods outperforms the standard person detection
approach of GATE. A possible explanation could be that
the baseline itself is already high. Since biographies typically
follow a certain writing style and consist, in contrast to
arbitrary Web pages, mostly of grammatically well-formed
sentences, natural language processing techniques such as
PoS tagging perform better on this type of input. Thus, the
person detection approach simply works better on the
biography data than on the Metal page set.</p>
        <p>[Figure 2: precision-recall curves; legend: Rule-Patterns,
SVM, SVM (w/Rules), Recall Upper Bound.]</p>
        <p>In terms of the different sources of data, i.e., the chosen
test collections, it can be seen that using biographies,
generally lower recall values (and higher precision values) should
be expected. This can also be seen from the upper recall
bounds, which are rather low for both tasks. When using Web
documents, more information can be accessed, which also
results in higher recall values. On the discography Metal set,
a recall of 0.7 can be observed, which is already close to the
upper bound of 0.74. However, using Web documents
requires consideration of which documents to examine (e.g., by
formulating an appropriate query to obtain many relevant
pages) as well as dealing with a lot of noise in the data.</p>
      </sec>
      <sec id="sec-10-3">
        <title>Relating Documents to Artists</title>
        <p>In addition to the two main tasks of this paper, we also
briefly investigate the applicability of the presented methods
to identifying the central artist or band in a text about music,
which could be useful for future relation extraction tasks
and tools such as music-focused Web crawling and indexing.
To this end, we utilize the rule-patterns aimed at detecting
occurrences of artists and train SVMs on occurrences of the
name of the band a page belongs to. For prediction, the most
frequently extracted entity with a number of occurrences
greater than a threshold tf is selected. As a baseline, simple
prediction of any sequence of capitalized tokens at the
beginning of the text is chosen. The results can be seen in
Figure 3. For this task, SVMs perform better than the
rule-patterns. However, rather surprisingly, the highest recall
value can be observed for the simple baseline.</p>
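        <p>The prediction rule and the capitalized-prefix baseline can be sketched as follows (function names and the exact token pattern are our own assumptions):</p>

```python
import re
from collections import Counter

def predict_central_artist(extracted, tf):
    """Most frequently extracted entity, if it occurs more than tf times."""
    if not extracted:
        return None
    name, count = Counter(extracted).most_common(1)[0]
    return name if count > tf else None

def capitalized_prefix_baseline(text):
    """Baseline: the sequence of capitalized tokens at the start of the text."""
    m = re.match(r"([A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*)", text)
    return m.group(1) if m else None
```

        <p>The baseline exploits the convention that biographies usually open with the band's name, which explains its surprisingly high recall.</p>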
      </sec>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>In this paper, we presented first steps towards semantic
Music Information Extraction. We focused on two specific
tasks, namely determining the members of a music band and
determining the discography of an artist (also explored on
sets of bands). For both purposes, supervised learning
approaches and rule-based methods were systematically
evaluated on two different sets of documents. From the conducted
evaluations, it became evident that manually generated rules
yield superior results. Furthermore, it could be seen that
careful selection of the underlying data source is crucial to
achieve reliable results.</p>
      <p>In general, the results obtained show great potential for
these and also related tasks. By focusing on biographies alone,
even more highly relevant meta-information on music could
be extracted. For instance, consider the following paragraph
taken from the Wikipedia page of the Alkaline Trio
(http://en.wikipedia.org/w/index.php?title=Alkaline_Trio&amp;oldid=431587984):
"In September 2006, Patent Pending, the debut album
by Matt Skiba's side project Heavens was released. The
band consisted of Skiba on guitar and vocals, and Josiah
Steinbrick (of hardcore punk outfit F-Minus) on bass. On
the album, the duo were joined by The Mars Volta's Isaiah
'Ikey' Owens on organ and Matthew Compton on drums
and percussion."</p>
      <p>This short paragraph contains band-membership and
lineup information for the Alkaline Trio, for the band
Heavens, for the band F-Minus, and for the band The Mars
Volta. In addition, discographical information for
Heavens, genre information for F-Minus, and a nickname/alias
for Isaiah Owens can be inferred from this small piece of
text. Furthermore, relations between the mentioned bands
("side project") as well as the mentioned persons
(collaborations) can be discovered. Using further information
extraction methods, in future work, it should be possible to
capture at least some of this semantic information and
relations and to advance the current state of the art in music
retrieval and recommendation. However, for systematic
experimentation and targeted development, the creation of a
comprehensive and thoroughly (manually) annotated text
corpus for music seems unavoidable.</p>
    </sec>
    <sec id="sec-12">
      <title>ACKNOWLEDGMENTS</title>
      <p>Thanks are due to Andreas Krenmair for conceiving the
music-related JAPE patterns and sharing his implementation.
This research is supported by the Austrian Science Fund
(FWF) under grants L511-N15 and P22856-N23.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Alani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.E.</given-names>
            <surname>Millard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.J.</given-names>
            <surname>Weal</surname>
          </string-name>
          , W. Hall,
          <string-name>
            <given-names>P.H.</given-names>
            <surname>Lewis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.R.</given-names>
            <surname>Shadbolt</surname>
          </string-name>
          .
          <article-title>Automatic Ontology-Based Knowledge Extraction from Web Documents</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>18</volume>
          (
          <issue>1</issue>
          ):
          <fpage>14</fpage>
          -
          <lpage>21</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Bikel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Weischedel</surname>
          </string-name>
          .
          <article-title>Nymble: a High-Performance Learning Name-finder</article-title>
          .
          <source>In Proc. 5th Conference on Applied Natural Language Processing</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Brill</surname>
          </string-name>
          .
          <article-title>A Simple Rule-Based Part of Speech Tagger</article-title>
          .
          <source>In Proc. 3rd Conference on Applied Natural Language Processing</source>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Callan</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Mitamura</surname>
          </string-name>
          .
          <article-title>Knowledge-Based Extraction of Named Entities</article-title>
          .
          <source>In Proc. 11th International Conference on Information and Knowledge Management (CIKM)</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Tablan</surname>
          </string-name>
          .
          <article-title>GATE: A framework and graphical development environment for robust NLP tools and applications</article-title>
          .
          <source>In Proc. 40th Anniversary Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Geleijnse</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Korst</surname>
          </string-name>
          .
          <article-title>Web-based artist categorization</article-title>
          .
          <source>In Proc. 7th International Conference on Music Information Retrieval (ISMIR)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Gómez</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Herrera</surname>
          </string-name>
          .
          <article-title>The song remains the same: Identifying versions of the same piece using tonal descriptors</article-title>
          .
          <source>In Proc. 7th International Conference on Music Information Retrieval (ISMIR)</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Govaerts</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duval</surname>
          </string-name>
          .
          <article-title>A Web-Based Approach to Determine the Origin of an Artist</article-title>
          .
          <source>In Proc. 10th International Society for Music Information Retrieval Conference (ISMIR)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hearst</surname>
          </string-name>
          .
          <article-title>Automatic acquisition of hyponyms from large text corpora</article-title>
          .
          <source>In Proc. 14th Conference on Computational Linguistics - Vol. 2</source>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.</given-names>
            <surname>Knees</surname>
          </string-name>
          .
          <article-title>Text-Based Description of Music for Indexing, Retrieval, and Browsing</article-title>
          .
          <source>PhD thesis</source>
          , Johannes Kepler Universität, Linz, Austria,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krenmair</surname>
          </string-name>
          .
          <article-title>Musikspezifische Informationsextraktion aus Webdokumenten</article-title>
          . Diplomarbeit, Johannes Kepler Universität, Linz, Austria,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          .
          <article-title>SVM Based Learning System for Information Extraction</article-title>
          . In J. Winkler,
          <string-name>
            <given-names>M.</given-names>
            <surname>Niranjan</surname>
          </string-name>
          , and N. Lawrence, eds.,
          <source>Deterministic and Statistical Methods in Machine Learning</source>
          , vol.
          <volume>3635</volume>
          of LNCS. Springer,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Cunningham</surname>
          </string-name>
          .
          <article-title>Adapting SVM for Data Sparseness and Imbalance: A Case Study on Information Extraction</article-title>
          .
          <source>Natural Language Engineering</source>
          ,
          <volume>15</volume>
          (
          <issue>2</issue>
          ):
          <fpage>241</fpage>
          -
          <lpage>271</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Shawe-Taylor</surname>
          </string-name>
          .
          <article-title>The SVM with uneven margins and Chinese document categorization</article-title>
          .
          <source>In Proc. 17th Pacific Asia Conference on Language, Information and Computation (PACLIC)</source>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Raimond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Abdallah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sandler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Giasson</surname>
          </string-name>
          .
          <article-title>The Music Ontology</article-title>
          .
          <source>In Proc. 8th International Conference on Music Information Retrieval (ISMIR)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Knees</surname>
          </string-name>
          .
          <article-title>Context-based Music Similarity Estimation</article-title>
          .
          <source>In Proc. 3rd International Workshop on Learning the Semantics of Audio Signals (LSAS)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Knees</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Widmer</surname>
          </string-name>
          .
          <article-title>A Web-Based Approach to Assessing Artist Similarity using Co-Occurrences</article-title>
          .
          <source>In Proc. 4th International Workshop on Content-Based Multimedia Indexing (CBMI)</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schiketanz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Seyerlehner</surname>
          </string-name>
          .
          <article-title>Country of Origin Determination via Web Mining Techniques</article-title>
          .
          <source>In Proc. IEEE International Conference on Multimedia and Expo (ICME): 2nd International Workshop on Advances in Music Information Research (AdMIRe)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schedl</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Widmer</surname>
          </string-name>
          .
          <article-title>Automatically Detecting Members and Instrumentation of Music Bands via Web Content Mining</article-title>
          .
          <source>In Proc. 5th Workshop on Adaptive Multimedia Retrieval (AMR)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sekine</surname>
          </string-name>
          .
          <article-title>NYU: Description of the Japanese NE system used for MET-2</article-title>
          .
          <source>In Proc. 7th Message Understanding Conference (MUC-7)</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shavitt</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Weinsberg</surname>
          </string-name>
          .
          <article-title>Songs Clustering Using Peer-to-Peer Co-occurrences</article-title>
          .
          <source>In Proc. IEEE International Symposium on Multimedia (ISM): International Workshop on Advances in Music Information Research (AdMIRe)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Slaney</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>White</surname>
          </string-name>
          .
          <article-title>Similarity Based on Rating Data</article-title>
          .
          <source>In Proc. 8th International Conference on Music Information Retrieval (ISMIR)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sordo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Laurier</surname>
          </string-name>
          , and
          <string-name>
            <given-names>O.</given-names>
            <surname>Celma</surname>
          </string-name>
          .
          <article-title>Annotating Music Collections: How Content-based Similarity Helps to Propagate Labels</article-title>
          .
          <source>In Proc. 8th International Conference on Music Information Retrieval (ISMIR)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>D.</given-names>
            <surname>Turnbull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barrington</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Lanckriet</surname>
          </string-name>
          .
          <article-title>Five Approaches to Collecting Tags for Music</article-title>
          .
          <source>In Proc. 9th International Conference on Music Information Retrieval (ISMIR)</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>B.</given-names>
            <surname>Whitman</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Lawrence</surname>
          </string-name>
          .
          <article-title>Inferring Descriptions and Similarity for Music from Community Metadata</article-title>
          .
          <source>In Proc. International Computer Music Conference (ICMC)</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>