=Paper=
{{Paper
|id=None
|storemode=property
|title=Towards Semantic Music Information Extraction from the Web Using Rule Patterns and Supervised Learning
|pdfUrl=https://ceur-ws.org/Vol-793/womrad2011_paper4.pdf
|volume=Vol-793
}}
==Towards Semantic Music Information Extraction from the Web Using Rule Patterns and Supervised Learning==
Peter Knees and Markus Schedl
Department of Computational Perception, Johannes Kepler University, Linz, Austria
peter.knees@jku.at, markus.schedl@jku.at
ABSTRACT
We present first steps towards automatic Music Information Extraction, i.e., methods to automatically extract semantic information and relations about musical entities from arbitrary textual sources. The corresponding approaches allow us to derive structured meta-data from unstructured or semi-structured sources and can be used to build advanced recommendation systems and browsing interfaces. In this paper, several approaches to identify and extract two specific semantic relations from related Web documents are presented and evaluated. The addressed relations are members of a music band (band−members) and artists’ discographies (artist−albums, EPs, singles). In addition, the proposed methods are shown to be useful to relate (Web-)documents to musical artists. For all purposes, supervised learning approaches and rule-based methods are systematically evaluated on two different sets of Web documents.

Categories and Subject Descriptors
J.5 [Arts and Humanities]: Music; I.2.7 [Artificial Intelligence]: Natural Language Processing – Text analysis

General Terms
Algorithms

Keywords
Music Information Extraction, Band-Member Relationship, Discography Extraction

WOMRAD 2011 2nd Workshop on Music Recommendation and Discovery, colocated with ACM RecSys 2011 (Chicago, US). Copyright ©. This is an open-access article distributed under the terms of the Creative Commons Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

1. MOTIVATION AND INTRODUCTION
Measuring similarity between artists, tracks, or other musical entities – be it audio-based, Web-based, or a combination of both – is a key concept for music retrieval and recommendation. However, the type of relations between these entities, i.e., what makes them similar, is often neglected. Especially in the music domain, the number of potential relations between two entities is large. Such relations comprise, e.g., cover versions of songs, live versions, re-recordings, remixes, or mash-ups. Semantic high-level concepts such as “song X was inspired by artist A” or “band B is the new band of artist A” are very prominent in many users’ conception and perception of music and should therefore be given attention in similarity estimation approaches. By focusing solely on acoustic properties, such relations are hard to detect (as can be seen, e.g., from research on cover version detection [7]).

A promising approach to deal with the limitations of signal-based methods is to exploit contextual information (for an overview see, e.g., [16]). Recent work in music information retrieval has shown that at least some cultural aspects can be modeled by analyzing extra-musical sources (often referred to as community metadata [25]). In the majority of work, this data – typically originating from Web sources and user data – is used for description/tagging of music (e.g., [10, 23, 24]) and assessment of similarity between artists (e.g., [17, 21, 22, 25]). However, while for these tasks standard information retrieval (IR) methods that reduce the obtained information to simple representations such as the bag-of-words model may suffice, important information on entities like artists’ full names, band member names, album and track titles, related artists, as well as some music-specific concepts like instrument names and musical styles may be dismissed. Addressing this issue, essential progress towards identifying relevant entities and, in particular, relations between these could be made. These kinds of information would also be highly valuable to automatically populate music-specific ontologies, such as the Music Ontology^1 [15].

In this paper, we aim at developing automatic methods to discover semantic relations between musical entities by analyzing texts from the Web. More precisely, to assess the feasibility of this goal, we focus on two specific sub-tasks, namely automatic band member detection, i.e., determining which persons a band consists (or consisted) of, and automatic discography extraction, i.e., recognition of released records (i.e., albums, EPs, and singles). Band member detection is strongly related to one of the central tasks of information extraction (IE) and named entity detection (NED), i.e., the recognition of persons’ names in documents. While persons’ names typically exhibit some common patterns in terms of orthography and number of tokens, detection of artist names and band members is a bigger challenge as they frequently comprise or consist of nicknames, pseudonyms, or just a symbol (cf. Prince for a limited time). Discography detection in unstructured text is an even more challenging task as song or album names (release names in the following) are not bound to any conventions. That is, release names can consist of an unknown number of tokens (including zero tokens, cf. The Beatles’s “white album”, or Weezer’s “blue”, “green”, and “red” albums, which might lead to inconsistent references on different sources), just special characters (e.g., Justice’s “Cross”), a differential equation (track 2 on Aphex Twin’s “Windowlicker” single), or whole paragraphs (e.g., the full title of a Soulwax album often abbreviated as “Most of the remixes” consists of 552 characters). Especially the last example demonstrates some of the challenges of a discography-targeted named entity recognition approach as the full album title itself exhibits linguistic structures and even contains another band’s name (Einstürzende Neubauten). Hence, general methods not tailored to (or even aware of) music-related entities might not be able to deal with such specifics.

To investigate the potential and suitability of language-processing-based approaches for semantic music information extraction from (Web-)texts, two strategies commonly used in IE tasks are explored in this paper: manual tailoring of rule patterns to extract entities of interest (the “knowledge engineer” approach) and automatic learning of patterns from labeled data (supervised learning). Since particularly for the latter, pre-labeled data is required – which is difficult to obtain for most types of semantic relations – band-membership and discography extraction are, from our point of view, good starting points as these types of information are also largely available in a structured format (e.g., via Web services such as MusicBrainz^2). In addition, the methods presented are also applied to relate documents to musical artists, which is useful for further tasks such as automatic music-focused crawling and indexing of the Web. In the bigger picture, these are supposed to be but the first steps towards a collection of methods to identify high-level musical relations between pieces, like cover versions, variations, remasterings, live interpretations, medleys, remixes, samples, etc. As some of these concepts are (partly) deducible from the audio signal itself, well-considered methods for combining information from the audio with (Web-based) meta-information are required to automatically discover such relations.

^1 http://www.musicontology.com
^2 http://musicbrainz.org/

2. RELATED WORK
The two music information extraction tasks addressed in this paper, i.e., band member and discography extraction, are specific cases of relation extraction. Since in the scenarios considered in this paper, one of the relational concepts is considered to be known (i.e., the band a text deals with), semantic relation extraction is reduced to named entity recognition and extraction tasks (i.e., extraction of band members and released records). Named entity recognition itself is a well-researched topic (for an overview see, e.g., [4]) and comprises the identification of proper names in structured or unstructured text as well as the classification of these names by means of rule-based or supervised learning approaches. While rule-based methods rely on experts that uncover patterns for the specific task and domain, supervised learning approaches require large amounts of labeled training data (which could, for instance, also stem from an ontology, cf. [1]). For the music domain – despite the numerous contributions that exploit Web-based sources to describe music or to derive similarity (cf. Section 1) – the number of publications aiming at extracting factual meta-data for musical entities by applying language processing methods is rather small.

In [19], we propose a first step to automatically extract the line-up of a music band, i.e., not only the members of a band but also their corresponding instruments and roles. As data source, up to 100 Web documents for each band B, obtained via Google queries such as “B” music, “B” music members, or “B” lineup music, are utilized. From the retrieved pages, n-grams (where n ∈ {2, 3, 4}) whose tokens consist of capitalized, non-common speech words of length greater than one are extracted. For band member and role extraction, a Hearst pattern approach (cf. [9]) is applied to the extracted n-grams and their surrounding text. The seven patterns used are 1. M plays the I, 2. M who plays the I, 3. R M, 4. M is the R, 5. M, the R, 6. M (I), and 7. M (R), where M is the n-gram/potential band member, I an instrument, and R a role. For I and R, roles in a “standard rock band line-up”, i.e., singer, guitarist, bassist, drummer, and keyboardist, as well as synonyms of these, are considered. After extraction, the document frequency of each rule is counted, i.e., on how many Web pages each of the above rules applies. Entities that occur on a percentage of band B’s Web pages that is below a given threshold are discarded. The remaining member-role relations are predicted for B. In this paper, evaluation of the presented approaches is also carried out on the best-performing document set from [19] and compared against the Hearst pattern approach.

In [18], we investigate several approaches to determine the country of origin for a given artist, including an approach that performs keyword spotting for terms such as “born” or “founded” in the context of countries’ names on Web pages. Another approach for country of origin determination is presented in [8]. Govaerts and Duval use selected Web sites and services, such as Freebase^3, Wikipedia^4, and Last.fm^5. They propose three heuristics to determine the artist’s country of origin using the occurrences of country names in biographies (highest overall occurrence, strongly favoring early occurrences, weakly favoring early occurrences). In [6], Geleijnse and Korst apply patterns like “G bands such as A”, “for example A1 and A2”, or “M mood by A” (where G represents a genre, A an artist name, and M a possible mood) to unveil genre-artist, artist-artist, and mood-artist relations, respectively.

While these music-specific information extraction methods mainly build upon few simple patterns or term frequency statistics, the work presented in this paper aims at incorporating more general methods that take advantage of linguistic features of the underlying texts and automatically learn models to derive musical entities from annotated examples.

^3 http://www.freebase.com
^4 http://www.wikipedia.org
^5 http://last.fm

3. METHODOLOGY
The methods presented in this paper make use of the linguistic properties of texts related to music bands. To assess this information, for both approaches investigated (rule-based and supervised-learning-based), several pre-processing
steps are required to obtain these linguistic features. Apart from initial preparation steps such as markup removal (if necessary), text tokenization (i.e., splitting the text into single tokens based on white spaces) and sentence splitting (based on punctuation), this comprises the following steps:

1. Part-of-Speech Tagging (PoS): assigns PoS tags to tokens, i.e., annotates each token with its linguistic category (noun, verb, preposition, etc.), cf. [3].

2. Gazetteer Annotation: annotates occurrences of pre-defined keywords known to represent a specific concept, e.g., company names or persons’ (first) names. These annotations can be used as look-up information for subsequent steps (see below). For the music domain, in this step, we also include lists of musical genres, instruments, and band roles, as well as a list of country names, cf. [11].

3. Transducing Step: identifies named entities such as persons, companies, locations, or dates using manually generated grammar rules. These rules can include lexical expressions, PoS information, look-up entities extracted via the gazetteer, or any other type of available annotation.

For all of these steps the functionalities included in the GATE software package (General Architecture for Text Engineering [5]) are utilized. In GATE’s transducing step, detection of the different kinds of named entities is performed simultaneously in an interwoven process, i.e., decisions whether proper names represent persons or organizations are made after a number of shared intermediate steps. For instance, for person detection, information on first names and titles obtained from the gazetteer annotations is combined with information on initials, first names, surnames, and endings detected from orthographic characteristics (e.g., capitalization) and PoS tags. Finally, persons’ surnames are removed if they contain certain stopwords or can be attributed to an organization. Details about this process can be found in Appendix F of the GATE User Guide^6. The transducing step is also where we add additional rule-patterns designed to detect band members, releases, and artist names as described in the following section.

^6 http://gate.ac.uk/userguide/

3.1 Rule-Pattern Approach
The first approach to extract music-related entities consists of generating specific rules that operate on the annotations obtained in the pre-processing steps. This requires the labor-intensive task of manually detecting textual patterns that indicate certain entities in exemplary documents and writing (generalized) rules suited to capture other entities of the same concept also in new documents. For this purpose, for a set of 83 artists/bands, related Web pages such as band profiles and biographies from Last.fm, Wikipedia, and allmusic^7 are examined. Based on the observations made, rules that consider orthographic features, punctuation, surrounding entities (such as those identified via the gazetteer lists), and surrounding keywords are designed. The rules are formalized as so-called JAPE grammars^8 that are used in the transducer step of GATE. The complete set of JAPE grammars for music-specific entity recognition can be found in Appendix B of [11] and can also be obtained by contacting the authors. In the following, we show one exemplary (and easily accessible) rule for each concept to demonstrate the idea and structure behind the rule-patterns for band member, media, and artist name extraction, respectively.

For the purpose of band member extraction, a JAPE grammar rule that aims at finding band members by searching for information about members leaving or joining the band is given as:

  Rule : leftJoinedBand (
    ( ( MemberName ) ) : BandMember
    ({Token.string == "had"} | {Token.string == "has"})?
    ({Token.string == "left"} |
     {Token.string == "joined"} |
     {Token.string == "rejoined"} |
     {Token.string == "replaced"})
  )--> :BandMember.Member =
    {kind = "BandMember", rule = "leftJoinedBand"}

To extract record releases, the following rule matches patterns that start with the potential media name (optionally in quotation marks) and point to production, release, performance, or similar events in the past or future:

  Rule : MediaPassivReleased (
    ({Token.string == "\""})?
    ( ( Medium ) ):Media
    ({Token.string == "\""})?
    ({Token.string == "was"} |
     ({Token.string == "will"} {Token.string == "be"}))
    ({Token.string == "released"} |
     {Token.string == "issued"} |
     {Token.string == "produced"} |
     {Token.string == "recorded"} |
     {Token.string == "played"} |
     {Token.string == "performed"} )
  )--> :Media.Media =
    {kind = "Media", rule = "MediaPassivReleased"}

To identify occurrences of band names, the following rule focuses on the entity occurring before terms such as “was founded” or “were supported”:

  Rule : Formed (
    ( ( BandN ) ) : BandName
    ({Token.string == "was"} |
     {Token.string == "were"})
    ({Token.string == "formed"} |
     {Token.string == "supported"} |
     {Token.string == "founded"})
  )--> :BandName.bandname =
    {kind = "Band", rule = "Formed"}

Elaborating such rules is a tedious task and (especially in heterogeneous data environments such as the Web) unlikely to generalize well and cover all cases. Therefore, in the next section we describe a supervised learning approach that makes use of automatically labeled data.

^7 http://www.allmusic.com
^8 Acronym for Java Annotation Patterns Engine

3.2 Supervised Learning Approach
Instead of manually examining unstructured text for occurrences of musical entities and potential patterns to identify them, the idea of this approach is to apply a supervised learning algorithm to a set of pre-annotated examples. Using the learned model, relevant information should then be found also in new documents. Several approaches, more precisely several types of machine learning algorithms, have been proposed for automatic information extraction tasks, such as hidden Markov models [2], decision trees [20], or support vector machines (SVM) [12]. Since the latter demonstrates that SVMs may yield results that rival those of optimized rule-based approaches, SVMs are chosen as classifier for the tasks at hand (for more details see [12, 13]).
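For comparison with the learning-based approach introduced here, the hand-written rule leftJoinedBand from Section 3.1 can be approximated outside GATE by a regular expression over plain text. The following is only a minimal sketch under stated assumptions: the MemberName macro is replaced by a hypothetical stand-in (two or three capitalized words), whereas the original rule operates on GATE annotations such as PoS tags and gazetteer look-ups, and the function name find_band_members is invented for illustration.

```python
import re

# Hypothetical stand-in for the MemberName macro: 2-3 capitalized words.
NAME = r"(?P<member>[A-Z][\w'-]+(?:\s+[A-Z][\w'-]+){1,2})"

# Mirrors the token sequence of the leftJoinedBand rule:
# name, optional "had"/"has", then one of the four trigger verbs.
LEFT_JOINED = re.compile(
    NAME + r"\s+(?:(?:had|has)\s+)?(?:left|joined|rejoined|replaced)\b"
)

def find_band_members(text):
    """Return one annotation-like dict per match of the approximated rule."""
    return [
        {"kind": "BandMember", "rule": "leftJoinedBand", "string": m.group("member")}
        for m in LEFT_JOINED.finditer(text)
    ]
```

Such a pattern only sees surface strings, so it will also fire on capitalized phrases that are not persons; in the setting described above, the preceding annotation steps narrow the candidates before a rule is applied.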
For training of the SVMs, a set of documents that contain annotations of the entities of interest is required. Since also this step can be labor-intensive, we opted for an automatic annotation approach. For the collection of training documents, ground truth information (on band member history and band discography) is obtained by either manually compiling lists or by invoking Web services such as MusicBrainz or Freebase. Using this information, occurrences of the band name, its members (full name as well as last name only), and releases are annotated using regular expressions.

Construction of the features and SVM training is carried out as described by Li et al. [12]. First, for each token, a feature vector representation has to be obtained. In the given scenario, for each token, its content (i.e., the actual string), orthographic properties, PoS information, gazetteer-based entity information, and identified person entities are considered. In a second scenario, in addition to these, also the output of the rule-based approach (more precisely, the name of the rule responsible for prediction of an entity) serves as an input feature. Ideally, this incorporates indicators of high relevance and allows for supervised selection of the manually generated rules for the final predictions. For each prediction task, the corresponding annotation type is also added to the features as target class.

To construct the feature vectors, the training corpus is scanned for all occurring values of any of the considered attributes (i.e., annotations). Then, each token is represented by a vector where each distinct annotation value corresponds to one dimension which is set to 1 if the token is annotated with the corresponding value. In addition, the context of each token (consisting of a window that includes the 5 preceding and the 5 subsequent tokens) is incorporated. This is achieved by creating an SVM input vector for each token that is a concatenation of the feature vectors of all tokens in the context window. To reflect the distance of the surrounding tokens to the actual token (i.e., the center of the window), a reciprocal weighting is applied, meaning that “the nonzero components of the feature vector corresponding to the j-th right or left neighboring word are set to be equal to 1/j in the combined input vector.” [12]. In our experiments, this typically results in feature vectors with approximately 1.5 million dimensions.

In the SVM learning phase, the input vectors corresponding to every single token in all training documents serve as examples. According to the central idea of [12], two distinct SVM classifiers are trained for each concept of interest. The first classifier is trained to predict the beginning of an entity (i.e., to classify whether a token is the first token of an entity), the second to predict the end (i.e., whether a token is the last token of an entity). To deal with the unbalanced distribution of positive and negative training examples, a special form of SVMs is used, namely an SVM with uneven margins [14]. From the obtained predictions of start and end positions, actual entities, as well as corresponding confidence scores, are determined in a post-processing step. First, start tokens without matching end token, as well as end tokens without matching start token, are removed. Second, entities with a length (in terms of the number of tokens) that does not match any training example’s length are discarded. Third, a confidence score is calculated based on a probabilistic interpretation of the SVM output for all possible classes. More precisely, for each entity, the conjunction of the sigmoid-transformed SVM output probabilities of start and end token is calculated for each possible output class. Finally, the class (label) with the highest probability is predicted for the entity if its probability is greater than 0.25. The probability of the predicted class serves as a confidence score.

3.3 Entity Consolidation and Prediction
From the extraction step (either rule- or learning-based), for each processed text and each concept of interest, a list of potential entities is obtained. For each band, the lists from all texts associated with the band are joined and the occurrences of each entity as well as the number of texts an entity occurs in are counted (term and document frequency, respectively). The joined list usually contains a lot of noise and redundant data, calling for a filtering and merging step. First, all entities extracted by the learning-based method that have a confidence score below 0.5 are removed since, according to the classification step, they are more likely not to represent band members than to represent them. On the cleaned list, the same observations as described in [19] can be made. For instance, on the list of extracted band members, some members are referenced with different spellings (Paavo Lötjönen vs. Paavo Lotjonen), with abbreviated first names (Phil Anselmo vs. Philip Anselmo), with nicknames (Darrell Lance Abbott vs. Dimebag Darrell or just Dimebag), or only by their last name (Iommi). On the discography lists, release names are often followed by additional information such as release year or type of release. This is dealt with by introducing an approximate string matching function, namely the level-two Jaro-Winkler similarity, cf. [19].^9 For both entity types, this type of similarity function is well suited as it assigns higher matching scores to pairs of strings that start with the same sequence of characters. In the level-two variant, the two entities to compare are split into substrings and similarity is calculated as an aggregated similarity of pairwise comparison of the substrings. To reduce redundancies, two entities are considered synonymous and thus merged if their level-two Jaro-Winkler similarity is above 0.9. In addition, to deal with the occurrence of last names, an entity consisting of one token is considered a synonym of another entity if it matches the other entity’s last token.

This consolidated list is usually still noisy, calling for additional filtering steps. To this end, two threshold parameters are introduced. The first threshold, t_f ∈ ℕ₀, determines the minimum number of occurrences of an entity (or its synonyms) in the band’s set to get predicted. The second threshold, t_df ∈ [0, 1], controls the lower bound of the fraction of texts/documents associated with the band an entity has to occur in (document frequency in relation to the total number of documents per band). The impact of these two parameters is systematically evaluated in the following section.

^9 For calculation, the open-source Java toolkit SecondString (http://secondstring.sourceforge.net) is utilized.

4. EVALUATION
To assess the potential of the proposed approaches and to measure the impact of the parameters, systematic experiments are conducted. This section details the test collections used as well as the applied evaluation measures and reports on the results of the experiments.
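Since the experiments below vary the two thresholds of Section 3.3, it may help to see the consolidation step spelled out. The following is a minimal sketch, not the SecondString-based implementation used in the paper. Several points are assumptions made for illustration: the level-two aggregation is taken to be the mean best token-level Jaro-Winkler score, made symmetric by taking the minimum over both directions; longer surface forms are visited first and become the canonical name of a merged entity; strings are compared case-insensitively; and the confidence filter for learning-based extractions is omitted. All function names are hypothetical.

```python
from collections import Counter

def jaro(s, t):
    """Plain Jaro similarity between two strings."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit, t_hit = [False] * len(s), [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3

def jaro_winkler(s, t, scale=0.1):
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * scale * (1 - j)

def level2_jw(a, b):
    """Assumed level-two aggregation over whitespace tokens (see lead-in)."""
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return 0.0
    def directed(xs, ys):
        return sum(max(jaro_winkler(x, y) for y in ys) for x in xs) / len(xs)
    return min(directed(ta, tb), directed(tb, ta))

def synonymous(a, b, threshold=0.9):
    """Similarity above the threshold, or a single token equal to the last token."""
    if level2_jw(a, b) > threshold:
        return True
    ta, tb = a.lower().split(), b.lower().split()
    if not ta or not tb:
        return False
    return (len(ta) == 1 and ta[0] == tb[-1]) or (len(tb) == 1 and tb[0] == ta[-1])

def consolidate(doc_entities, t_f=0, t_df=0.0):
    """doc_entities: one list of extracted entity strings per document of a band."""
    tf, docs = Counter(), {}
    for doc_id, entities in enumerate(doc_entities):
        for e in entities:
            tf[e] += 1
            docs.setdefault(e, set()).add(doc_id)
    clusters = []
    # Assumption: forms with more tokens (then higher frequency) come first.
    for e in sorted(tf, key=lambda x: (-len(x.split()), -tf[x])):
        for c in clusters:
            if synonymous(e, c["name"]):
                c["tf"] += tf[e]
                c["docs"] |= docs[e]
                break
        else:
            clusters.append({"name": e, "tf": tf[e], "docs": set(docs[e])})
    n_docs = max(len(doc_entities), 1)
    return [c["name"] for c in clusters
            if c["tf"] >= t_f and len(c["docs"]) / n_docs >= t_df]
```

Raising t_f or t_df trades recall for precision: entities mentioned only once, or in only a small share of a band's documents, are dropped first.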
4.1 Test Collections
For evaluation, two collections with different characteristics are used – the first a previously published collection used in [19], the second a larger-scale test collection consisting of band biographies.

4.1.1 Metal Page Sets
The first collection is a set of Web pages introduced in [19]. This set consists of Google’s 100 top-ranked Web pages retrieved using the query “band name” music members (cf. Section 2) for 51 Rock and Metal bands (resulting in a total of 5,028 Web pages). In [19], this query setting yielded best results and is therefore chosen as reference for the task of band-member extraction. As ground truth, the membership relations that include former members are chosen (i.e., the M_f ground truth set of [19]). For this evaluation collection also the results obtained by applying the Hearst patterns proposed in [19] are available, allowing for a direct comparison of the approaches’ band member extraction capabilities.

For the discography extraction evaluation, no reference data is available in the original set. Therefore – and since the discography of the contained bands has changed since the creation of the set – a new Web crawl has been conducted to retrieve recent (and more related) data. Since the aim of this new set is to extract released media, for each of the 51 bands in the metal set the query “band name” discography is sent to Google and the top 100 pages are downloaded (resulting in a total of 5,090 Web pages). To obtain a discography ground truth, titles of albums, EPs, and singles released by each band are downloaded from MusicBrainz.

To speed up processing of the collections, all Web pages with a file size over 100 kilobytes are discarded, resulting in set sizes of 4,561 and 4,625 documents for the member set and the discography set, respectively. Evaluation of the supervised learning approach is performed as a 2-fold cross validation (by splitting the band set and separating the associated Web pages), where in each fold a random sample of 100 documents is drawn for training.

4.1.2 Biography Set
The second test collection is a larger-scale collection consisting only of band biographies to be found on the Web. Biographies are investigated as they should contain both information on (past) band members and information on (important) released records.

Starting from a snapshot of the MusicBrainz database from December 2010, all artists marked as bands and all corresponding band members as well as albums, EPs, and singles are extracted. In addition, also band-membership information from Freebase^10 is retrieved and merged with the MusicBrainz information to make the ground truth data set more comprehensive. After this step, band-membership information is available for 34,238 bands. For each band name, the echonest API^11 is invoked to obtain related biographies. Using the echonest’s Web service, related biographies (e.g., from Wikipedia, Last.fm, allmusic, or Aol Music^12) can be conveniently retrieved in plain text format. Since among the provided biographies for a band, duplicates or near-duplicates, as well as only short snippets can be observed, (near-)duplicates as well as biographies consisting of less than 100 characters are filtered out. After filtering (near-)duplicates and snippets, for 23,386 bands (68%) at least one biography remains. In total, a set of 38,753 biographies is obtained. To keep processing times short, furthermore all documents that contain more than 10 megabytes of annotations after the initial processing step are filtered out.

For training of the supervised learner, a random subset of 100 biographies is chosen. All biographies by any artist that is part of the training set are removed from the test set, resulting in a final test set of 37,664 biographies by 23,030 distinct bands.

In comparison to the first test sets, i.e., the Metal page sets, the biography set contains more bands, more specific documents in a homogeneous format (i.e., biographies instead of semi-structured Web pages from various sources), but fewer associated documents (on average 1.63 documents per band, as opposed to an average of 90 documents per band for the Metal page set).

4.2 Evaluation Metrics
For evaluation, precision and recall are calculated separately for each band and averaged over all bands to obtain a final score. The metrics are defined as follows:

\[ \mathrm{precision} = \begin{cases} \dfrac{|T \cap P|}{|P|} & \text{if } |P| > 0 \\ 1 & \text{otherwise} \end{cases} \tag{1} \]

\[ \mathrm{recall} = \frac{|T \cap P|}{|T|} \tag{2} \]

where P is the set of predicted entities and T the ground truth set of the band. To assess whether an extracted entity is correct, again the level-two Jaro-Winkler similarity (see Section 3.3) is applied. More precisely, if the Jaro-Winkler similarity between a predicted entity and an entity contained in the ground truth is greater than 0.9, the prediction is considered to be correct. Furthermore, if a predicted band member name consists of only one token, it is considered correct if it matches the last token of a member in the ground truth. These weakened definitions of matching allow for tolerating small spelling variations, name abbreviations, extracted last names, additional information of releases, as well as string encoding differences.

For comparison with the Hearst pattern approach for band member detection on the Metal page set, it has to be noted that in [19], calculation of precision and recall is done on the full set of bands and members (and their corresponding roles), yielding global precision and recall values, whereas here, the evaluation metrics are calculated separately for each band and are then averaged over all bands to remove the influence of a band’s size. Using the global evaluation scheme, e.g., orchestras are given far more importance than, for instance, duos in the overall evaluation, although for a duo, the individual members are generally more important than for an orchestra. Therefore, in the following, the different approaches are compared based on macro-averaged evaluation metrics (calculated using the arithmetic mean of the individual results).
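The per-band metrics of Eqs. (1) and (2) and their macro-average can be sketched as follows. This is an illustrative sketch under stated assumptions, not the evaluation code behind the reported results: the matcher is_match is a simplified stand-in (case-insensitive equality plus the last-token rule) for the level-two Jaro-Winkler test with threshold 0.9; with fuzzy matching, |T ∩ P| is counted over predictions for precision and over ground-truth entries for recall; and recall for an empty ground truth, which the definitions above do not cover, is set to 0 here. All names are hypothetical.

```python
def is_match(pred, true):
    """Stand-in matcher: exact (case-insensitive) or single token == last token."""
    p, t = pred.lower().split(), true.lower().split()
    if p == t:
        return True
    return len(p) == 1 and bool(t) and p[0] == t[-1]

def band_scores(predicted, truth, match=is_match):
    """Precision and recall for one band, following Eqs. (1) and (2)."""
    correct_pred = sum(any(match(p, t) for t in truth) for p in predicted)
    found_truth = sum(any(match(p, t) for p in predicted) for t in truth)
    precision = correct_pred / len(predicted) if predicted else 1.0
    recall = found_truth / len(truth) if truth else 0.0  # assumption, see lead-in
    return precision, recall

def macro_average(per_band):
    """Arithmetic mean of per-band (precision, recall) pairs.

    per_band: non-empty list of (predicted, truth) pairs, one per band.
    """
    scores = [band_scores(p, t) for p, t in per_band]
    n = len(scores)
    return (sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n)
```

Because every band contributes one pair of values regardless of its size, a duo and an orchestra weigh equally in the averages, which is the property motivated above.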
10
http://www.freebase.com 4.3 Evaluation Results
11
http://developer.echonest.com In the following, the proposed rule-patterns, the SVM ap-
12
http://music.aol.com proach, as well as the SVM approach that utilizes the out-
[Figure 1: two precision-recall plots (x-axis: Recall, y-axis: Precision). Left panel: Metal Set "music members" for t_df in [0...0.6], t_f = 0; curves: Baseline, Hearst Patterns, Rule-Patterns, SVM, SVM (w/Rules), Recall Upper Bound. Right panel: Biographies retrieved via echonest for t_f in [0...9]; curves: Baseline, Rule-Patterns, SVM, SVM (w/Rules), Recall Upper Bound.]
Figure 1: Precision-recall plots for band-member prediction on the Metal page set (left) and on the biography set (right). Curves are obtained by systematically varying threshold parameters (t_df and t_f for Metal page set and biography set, respectively). Precision and recall values are macro-averaged over all bands in the corresponding test set.
put of the rule-patterns are compared for the tasks of band- proaches tend to outperform the rule-based extraction ap-
member detection and discography extraction. For detecting proach slightly. However, there is basically no difference be-
band-members, a baseline reference consisting of the person tween the SVM approaches and the baseline with the only
entity prediction functionality of GATE is provided. On the exception that the SVM approaches can yield higher recall
Metal page set, band-member prediction is further compared values. Another observation is that the upper recall bound-
to the Hearst pattern approach from [19]. For the task of ary on the biography set is rather low at about 0.6.
discography extraction, no such reference is available. For
all evaluations, an additional upper bound for the recall is
4.3.2 Discography Extraction
calculated. This upper bound is implied by the underlying
documents, since band members and releases that do not For discography extraction the situation is similar as can
occur on any of the documents can not be predicted. be seen from Figure 2. Also for this task the rule-based ap-
proach outperforms the SVM approaches (this time also on
the biography set). Recall is also close to the upper bound
4.3.1 Band-Member Detection using SVMs on the Metal page set while on the biography
The left part of Figure 1 shows precision-recall curves set, none of the approaches is capable of reaching the already
for the different band member detection approaches on the low upper recall boundary at 0.36. Conversely, on the biog-
Metal page set. For a systematic comparison with the Hearst raphy set, all proposed approaches yield rather high preci-
pattern approach, the tdf , i.e., the threshold that determines sion values. However, due to the lack of a baseline reference,
on which fraction of a band’s total documents a band mem- it is difficult to draw final conclusions about the quality of
ber has to appear on to be predicted, is varied. It can be seen these approaches for the task of discography extraction.
that the rule-based approach clearly performs best. Also What can be seen from both the evaluations on discogra-
SVM and SVM using the rules output outperform the Hearst phy and band-member extraction is that – despite all work
pattern approach. It becomes apparent that on the Metal required – rule-patterns are preferable over supervised learn-
set, rule patterns, the GATE person baseline, and the super- ing methods. Another consistent finding so far is that SVMs
vised approaches can yield recall values close to the upper that utilize the output of the rule-pattern classification pro-
bound, i.e., these approaches capture nearly all members cess are superior to SVMs without this information, but still
contained in the documents at least once. For the Hearst inferior to the predictions of the rule-patterns alone.
patterns, recall remains low. However, when comparing the The most unexpected result can be observed for band-
Hearst patterns, it has to be noted that this approach was member extraction on the biography set. None of the pro-
initially designed to also detect the roles of the band mem- posed methods outperforms the standard person detection
bers — a feature none of the other approaches is capable of. approach by GATE. A possible explanation could be that
Since on the biography set only 1.63 documents per band the baseline itself is already high. Since biographies typically
are available on average, variation of the tdf threshold is not follow a certain writing style and consist — in contrast to ar-
as interesting as on the Metal page set. Therefore, the right bitrary Web pages — mostly of grammatically well-formed
part of Figure 1 depicts curves of the proposed approaches sentences, natural language processing techniques such as
with varied values of tf , i.e., the threshold that determines PoS tagging perform better on this type of input. Thus, the
how often an entity has to be detected to be predicted as person detection approach just works better on the biogra-
a band member. On this set, the supervised learning ap- phy data than on the Metal page set.
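The two thresholds varied in Section 4.3.1 can be illustrated with a minimal sketch. It assumes that an extractor (rule-patterns, SVM, or the baseline) has already produced one list of candidate names per document of a band. The function name predict_members, the input layout, and the strict "greater than" comparisons are assumptions made for illustration and are not taken from the implementation evaluated here.

```python
from collections import Counter


def predict_members(doc_entities, t_df=0.0, t_f=0):
    """Select band-member candidates from per-document extractions.

    doc_entities: list with one list of extracted names per document
                  belonging to the band (hypothetical input layout).
    t_df: minimum fraction of the band's documents a name must appear on.
    t_f:  minimum number of detections of a name over all documents.
    Both thresholds are applied with a strict ">" (an assumption).
    """
    n_docs = len(doc_entities)
    if n_docs == 0:
        return set()

    term_freq = Counter()  # total detections per name
    doc_freq = Counter()   # number of documents mentioning the name
    for entities in doc_entities:
        term_freq.update(entities)
        doc_freq.update(set(entities))

    return {
        name
        for name in term_freq
        if term_freq[name] > t_f and doc_freq[name] / n_docs > t_df
    }


# Toy example with made-up names: three documents for one band.
docs = [["Alice", "Bob", "Alice"], ["Alice", "Carol"], ["Bob"]]
all_candidates = predict_members(docs)            # both thresholds at 0
frequent_on_pages = predict_members(docs, t_df=0.5)
detected_often = predict_members(docs, t_f=2)
```

Raising either threshold removes rarely detected candidates, which typically trades recall for precision; sweeping the threshold over a range yields one point per value on a precision-recall curve.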
[Figure 2 plots. Left panel: Metal Set "discography" for t_df in [0...0.6], t_f = 0. Right panel: Biographies retrieved via echonest for t_f in [0...5]. Axes: Recall (x), Precision (y). Curves: Rule-Patterns, SVM, SVM (w/Rules), Recall Upper Bound.]
Figure 2: Precision-recall plots for discography detection on the Metal page set (left) and on the biography
set (right). Settings as in Figure 1.
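The difference between the macro-averaged values reported in the figures and the global evaluation scheme of [19] can be made concrete with a small sketch. The function names, the dictionary layout, and the convention of scoring an empty prediction with a precision of 0 are assumptions for illustration only.

```python
def precision_recall(predicted, truth):
    """Precision and recall for one band (sets of names)."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def macro_average(predictions, ground_truth):
    """Arithmetic mean of the per-band precision and recall values."""
    scores = [
        precision_recall(predictions.get(band, set()), truth)
        for band, truth in ground_truth.items()
    ]
    n = len(scores)
    return sum(p for p, _ in scores) / n, sum(r for _, r in scores) / n


def global_average(predictions, ground_truth):
    """Precision and recall computed on counts pooled over all bands."""
    tp = sum(len(predictions.get(b, set()) & t) for b, t in ground_truth.items())
    n_pred = sum(len(predictions.get(b, set())) for b in ground_truth)
    n_true = sum(len(t) for t in ground_truth.values())
    return (tp / n_pred if n_pred else 0.0, tp / n_true if n_true else 0.0)


# Toy data: a duo that is fully recovered and an eight-piece ensemble
# of which two members are found alongside two wrong names.
ground_truth = {
    "duo": {"a1", "a2"},
    "ensemble": {"b1", "b2", "b3", "b4", "b5", "b6", "b7", "b8"},
}
predictions = {
    "duo": {"a1", "a2"},
    "ensemble": {"b1", "b2", "x1", "x2"},
}
macro_p, macro_r = macro_average(predictions, ground_truth)
global_p, global_r = global_average(predictions, ground_truth)
```

In this toy case the macro-averaged recall is 0.625, whereas the pooled recall is 0.4, because the larger ensemble dominates the pooled counts. This is the size effect that the macro-averaged scheme is meant to remove.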
In terms of the different sources of data, i.e., the chosen test collections, it can be seen that when using biographies, lower recall values (and higher precision values) should generally be expected. This can be seen also from the upper recall bounds that are rather low for both tasks. When using Web documents, more information can be accessed, which also results in higher recall values. On the discography Metal set, a recall of 0.7 can be observed, which is already close to the upper bound of 0.74. However, using Web documents requires considering which documents to examine (e.g., by formulating an appropriate query to obtain many relevant pages) as well as dealing with a lot of noise in the data.
4.3.3 Relating Documents to Artists
In addition to the two main tasks of this paper, we also briefly investigate the applicability of the presented methods to identify the central artist or band in a text about music, which could be useful for future relation extraction tasks and tools such as music-focused Web crawling and indexing. To this end, we utilize the rule-patterns aiming at detecting occurrences of artists and train SVMs on occurrences of the name of the band a page belongs to. For prediction, the most frequently extracted entity with occurrences greater than a threshold t_f is selected. As a baseline, simple prediction of any sequence of capitalized tokens at the beginning of the text is chosen. The results can be seen in Figure 3. For this task, SVMs perform better than the rule-patterns. However, rather surprisingly, the highest recall value can be observed for the simple baseline.
[Figure 3 plot. Panel: Biographies retrieved via echonest for t_f in [0...5]. Axes: Recall (x), Precision (y). Curves: Baseline, Rule-Patterns, SVM, SVM (w/Rules), Recall Upper Bound.]
Figure 3: Precision-recall plots for discography detection on the biography set. Curves obtained by varying threshold parameter t_f. Precision and recall values averaged over all pages.
5. CONCLUSIONS AND FUTURE WORK
In this paper, we presented first steps towards semantic Music Information Extraction. We focused on two specific tasks, namely determining the members of a music band and determining the discography of an artist (also explored on sets of bands). For both purposes, supervised learning approaches and rule-based methods were systematically evaluated on two different sets of documents. From the conducted evaluations, it became evident that manually generated rules yield superior results. Furthermore, it could be seen that careful selection of the underlying data source is crucial to achieve reliable results.
In general, the results obtained show great potential for these and also related tasks. By just focusing on biographies, even more highly relevant meta-information on music could be extracted. For instance, consider the following paragraph taken from the Wikipedia page of the Alkaline Trio:
“In September 2006, Patent Pending, the debut album by Matt Skiba’s side project Heavens was released. The band consisted of Skiba on guitar and vocals, and Josiah Steinbrick (of hardcore punk outfit F-Minus) on bass. On the album, the duo were joined by The Mars Volta’s Isaiah “Ikey” Owens on organ and Matthew Compton on drums and percussion.”13
13 http://en.wikipedia.org/w/index.php?title=Alkaline_Trio&oldid=431587984
This short paragraph contains band-membership and line-up information for the Alkaline Trio, for the band Heavens, for the band F-Minus, and for the band The Mars
Volta. In addition, discographical information for Heavens, genre information for F-Minus, and a nickname/alias for Isaiah Owens can be inferred from this small piece of text. Furthermore, relations between the mentioned bands (“side-project”) as well as the mentioned persons (collaborations) can be discovered. Using further information extraction methods, in future work, it should be possible to capture at least some of this semantic information and relations and to advance the current state-of-the-art in music retrieval and recommendation. However, for systematic experimentation and targeted development, the creation of a comprehensive and thoroughly (manually) annotated text corpus for music seems unavoidable.
6. ACKNOWLEDGMENTS
Thanks are due to Andreas Krenmair for conceiving the music-related JAPE patterns and sharing his implementation. This research is supported by the Austrian Research Fund (FWF) under grants L511-N15 and P22856-N23.
7. REFERENCES
[1] H. Alani, S. Kim, D.E. Millard, M.J. Weal, W. Hall, P.H. Lewis, and N.R. Shadbolt. Automatic Ontology-Based Knowledge Extraction from Web Documents. IEEE Intelligent Systems, 18(1):14–21, 2003.
[2] D. M. Bikel, S. Miller, R. Schwartz, and R. Weischedel. Nymble: a High-Performance Learning Name-finder. In Proc. 5th Conference on Applied Natural Language Processing, 1997.
[3] E. Brill. A Simple Rule-Based Part of Speech Tagger. In Proc. 3rd Conference on Applied Natural Language Processing, 1992.
[4] J. Callan and T. Mitamura. Knowledge-Based Extraction of Named Entities. In Proc. 11th International Conference on Information and Knowledge Management (CIKM), 2002.
[5] H. Cunningham, D. Maynard, K. Bontcheva, and V. Tablan. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proc. 40th Anniversary Meeting of the Association for Computational Linguistics, 2002.
[6] G. Geleijnse and J. Korst. Web-based artist categorization. In Proc. 7th International Conference on Music Information Retrieval (ISMIR), 2006.
[7] E. Gómez and P. Herrera. The song remains the same: Identifying versions of the same piece using tonal descriptors. In Proc. 7th International Conference on Music Information Retrieval (ISMIR), 2006.
[8] S. Govaerts and E. Duval. A Web-Based Approach to Determine the Origin of an Artist. In Proc. 10th International Society for Music Information Retrieval Conference (ISMIR), 2009.
[9] M. A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proc. 14th Conference on Computational Linguistics - Vol. 2, 1992.
[10] P. Knees. Text-Based Description of Music for Indexing, Retrieval, and Browsing. PhD thesis, Johannes Kepler Universität, Linz, Austria, 2010.
[11] A. Krenmair. Musikspezifische Informationsextraktion aus Webdokumenten. Diplomarbeit, Johannes Kepler Universität, Linz, Austria, 2010.
[12] Y. Li, K. Bontcheva, and H. Cunningham. SVM Based Learning System for Information Extraction. In J. Winkler, M. Niranjan, and N. Lawrence, eds., Deterministic and Statistical Methods in Machine Learning, vol. 3635 of LNCS. Springer, 2005.
[13] Y. Li, K. Bontcheva, and H. Cunningham. Adapting SVM for Data Sparseness and Imbalance: A Case Study on Information Extraction. Natural Language Engineering, 15(2):241–271, 2009.
[14] Y. Li and J. Shawe-Taylor. The SVM with uneven margins and Chinese document categorization. In Proc. 17th Pacific Asia Conference on Language, Information and Computation (PACLIC), 2003.
[15] Y. Raimond, S. Abdallah, M. Sandler, and F. Giasson. The Music Ontology. In Proc. 8th International Conference on Music Information Retrieval (ISMIR), 2007.
[16] M. Schedl and P. Knees. Context-based Music Similarity Estimation. In Proc. 3rd International Workshop on Learning the Semantics of Audio Signals (LSAS), 2009.
[17] M. Schedl, P. Knees, and G. Widmer. A Web-Based Approach to Assessing Artist Similarity using Co-Occurrences. In Proc. 4th International Workshop on Content-Based Multimedia Indexing (CBMI), 2005.
[18] M. Schedl, C. Schiketanz, and K. Seyerlehner. Country of Origin Determination via Web Mining Techniques. In Proc. IEEE International Conference on Multimedia and Expo (ICME): 2nd International Workshop on Advances in Music Information Research (AdMIRe), 2010.
[19] M. Schedl and G. Widmer. Automatically Detecting Members and Instrumentation of Music Bands via Web Content Mining. In Proc. 5th Workshop on Adaptive Multimedia Retrieval (AMR), 2007.
[20] S. Sekine. NYU: Description of the Japanese NE system used for MET-2. In Proc. 7th Message Understanding Conference (MUC-7), 1998.
[21] Y. Shavitt and U. Weinsberg. Songs Clustering Using Peer-to-Peer Co-occurrences. In Proc. IEEE International Symposium on Multimedia (ISM): International Workshop on Advances in Music Information Research (AdMIRe), 2009.
[22] M. Slaney and W. White. Similarity Based on Rating Data. In Proc. 8th International Conference on Music Information Retrieval (ISMIR), 2007.
[23] M. Sordo, C. Laurier, and O. Celma. Annotating Music Collections: How Content-based Similarity Helps to Propagate Labels. In Proc. 8th International Conference on Music Information Retrieval (ISMIR), 2007.
[24] D. Turnbull, L. Barrington, and G. Lanckriet. Five Approaches to Collecting Tags for Music. In Proc. 9th International Conference on Music Information Retrieval (ISMIR), 2008.
[25] B. Whitman and S. Lawrence. Inferring Descriptions and Similarity for Music from Community Metadata. In Proc. International Computer Music Conference (ICMC), 2002.