=Paper= {{Paper |id=None |storemode=property |title=Towards Semantic Music Information Extraction from the Web Using Rule Patterns and Supervised Learning |pdfUrl=https://ceur-ws.org/Vol-793/womrad2011_paper4.pdf |volume=Vol-793 }} ==Towards Semantic Music Information Extraction from the Web Using Rule Patterns and Supervised Learning== https://ceur-ws.org/Vol-793/womrad2011_paper4.pdf
  Towards Semantic Music Information Extraction from the
    Web Using Rule Patterns and Supervised Learning

                                                 Peter Knees and Markus Schedl
                    Department of Computational Perception, Johannes Kepler University, Linz, Austria
                                         peter.knees@jku.at, markus.schedl@jku.at


ABSTRACT
We present first steps towards automatic Music Information Extraction, i.e., methods to
automatically extract semantic information and relations about musical entities from arbitrary
textual sources. The corresponding approaches allow us to derive structured meta-data from
unstructured or semi-structured sources and can be used to build advanced recommendation
systems and browsing interfaces. In this paper, several approaches to identify and extract two
specific semantic relations from related Web documents are presented and evaluated. The
addressed relations are members of a music band (band−members) and artists’ discographies
(artist−albums, EPs, singles). In addition, the proposed methods are shown to be useful to
relate (Web-)documents to musical artists. For all purposes, supervised learning approaches and
rule-based methods are systematically evaluated on two different sets of Web documents.

Categories and Subject Descriptors
J.5 [Arts and Humanities]: Music; I.2.7 [Artificial Intelligence]: Natural Language
Processing – Text analysis

General Terms
Algorithms

Keywords
Music Information Extraction, Band-Member Relationship, Discography Extraction

WOMRAD 2011 2nd Workshop on Music Recommendation and Discovery, colocated with ACM RecSys 2011
(Chicago, US)
Copyright ©. This is an open-access article distributed under the terms of the Creative Commons
Attribution License 3.0 Unported, which permits unrestricted use, distribution, and reproduction
in any medium, provided the original author and source are credited.

1.    MOTIVATION AND INTRODUCTION
   Measuring similarity between artists, tracks, or other musical entities – be it audio-based,
Web-based, or a combination of both – is a key concept for music retrieval and recommendation.
However, the type of relations between these entities, i.e., what makes them similar, is often
neglected. Especially in the music domain, the number of potential relations between two
entities is large. Such relations comprise, e.g., cover versions of songs, live versions,
re-recordings, remixes, or mash-ups. Semantic high-level concepts such as “song X was inspired
by artist A” or “band B is the new band of artist A” are very prominent in many users’
conception and perception of music and should therefore be given attention in similarity
estimation approaches. By focusing solely on acoustic properties, such relations are hard to
detect (as can be seen, e.g., from research on cover version detection [7]).
   A promising approach to deal with the limitations of signal-based methods is to exploit
contextual information (for an overview see, e.g., [16]). Recent work in music information
retrieval has shown that at least some cultural aspects can be modeled by analyzing
extra-musical sources (often referred to as community metadata [25]). In the majority of work,
this data – typically originating from Web sources and user data – is used for
description/tagging of music (e.g., [10, 23, 24]) and assessment of similarity between artists
(e.g., [17, 21, 22, 25]). However, while for these tasks standard information retrieval (IR)
methods that reduce the obtained information to simple representations such as the bag-of-words
model may suffice, important information on entities like artists’ full names, band member
names, album and track titles, related artists, as well as some music-specific concepts like
instrument names and musical styles may be dismissed. Addressing this issue, essential progress
towards identifying relevant entities and, in particular, relations between these could be made.
These kinds of information would also be highly valuable to automatically populate
music-specific ontologies, such as the Music Ontology1 [15].
1 http://www.musicontology.com
   In this paper, we aim at developing automatic methods to discover semantic relations between
musical entities by analyzing texts from the Web. More precisely, to assess the feasibility of
this goal, we focus on two specific sub-tasks, namely automatic band member detection, i.e.,
determining which persons a band consists (or consisted) of, and automatic discography
extraction, i.e., recognition of released records (i.e., albums, EPs, and singles). Band member
detection is strongly related to one of the central tasks of information extraction (IE) and
named entity detection (NED), i.e., the recognition of persons’ names in documents. While
persons’ names typically exhibit some common patterns in terms of orthography and number of
tokens, detection of artist names and band members is a bigger challenge as they frequently
comprise or consist of nicknames, pseudonyms, or just a symbol (cf. Prince for a limited time).
Discography detection in unstructured text is an even more challenging task as song or album
names (release names in the following) are not bound to any conventions. That is, release names
can consist of an unknown number of tokens (including zero tokens, cf. The Beatles’s “white
album”, or Weezer’s “blue”, “green”, and “red” albums, which might lead to inconsistent
references on different sources), just special characters (e.g., Justice’s “Cross”), a
differential equation (track 2 on Aphex Twin’s “Windowlicker” single), or whole paragraphs
(e.g., the full title of a Soulwax album often abbreviated as “Most of the remixes” consists of
552 characters). Especially the last example demonstrates some of the challenges of a
discography-targeted named entity recognition approach as the full album title itself exhibits
linguistic structures and even contains another band’s name (Einstürzende Neubauten). Hence,
general methods not tailored to (or even aware of) music-related entities might not be able to
deal with such specifics.
   To investigate the potential and suitability of language-processing-based approaches for
semantic music information extraction from (Web-)texts, two strategies commonly used in IE tasks
are explored in this paper: manual tailoring of rule patterns to extract entities of interest
(the “knowledge engineer” approach) and automatic learning of patterns from labeled data
(supervised learning). Since particularly for the latter, pre-labeled data is required – which
is difficult to obtain for most types of semantic relations – band-membership and discography
extraction are, from our point of view, good starting points as these types of information are
also largely available in a structured format (e.g., via Web services such as MusicBrainz2). In
addition, the methods presented are also applied to relate documents to musical artists, which
is useful for further tasks such as automatic music-focused crawling and indexing of the Web. In
the bigger picture, these are supposed to be but the first steps towards a collection of methods
to identify high-level musical relations between pieces, like cover versions, variations,
remasterings, live interpretations, medleys, remixes, samples, etc. As some of these concepts
are (partly) deducible from the audio signal itself, well-considered methods for combining
information from the audio with (Web-based) meta-information are required to automatically
discover such relations.
2 http://musicbrainz.org/

2.    RELATED WORK
   The two music information extraction tasks addressed in this paper, i.e., band member and
discography extraction, are specific cases of relation extraction. Since in the scenarios
considered in this paper, one of the relational concepts is considered to be known (i.e., the
band a text deals with), semantic relation extraction is reduced to named entity recognition and
extraction tasks (i.e., extraction of band members and released records). Named entity
recognition itself is a well-researched topic (for an overview see, e.g., [4]) and comprises the
identification of proper names in structured or unstructured text as well as the classification
of these names by means of rule-based or supervised learning approaches. While rule-based
methods rely on experts that uncover patterns for the specific task and domain, supervised
learning approaches require large amounts of labeled training data (which could, for instance,
also stem from an ontology (cf. [1])). For the music domain – despite the numerous contributions
that exploit Web-based sources to describe music or to derive similarity (cf. Section 1) – the
number of publications aiming at extracting factual meta-data for musical entities by applying
language processing methods is rather small.
   In [19], we propose a first step to automatically extract the line-up of a music band, i.e.,
not only the members of a band but also their corresponding instruments and roles. As data
source, up to 100 Web documents for each band B, obtained via Google queries such as “B” music,
“B” music members, or “B” lineup music, are utilized. From the retrieved pages, n-grams (where
n ∈ {2, 3, 4}) whose tokens consist of capitalized, non-common speech words of length greater
than one are extracted. For band member and role extraction, a Hearst pattern approach (cf. [9])
is applied to the extracted n-grams and their surrounding text. The seven patterns used are
1. M plays the I, 2. M who plays the I, 3. R M, 4. M is the R, 5. M, the R, 6. M (I), and
7. M (R), where M is the n-gram/potential band member, I an instrument, and R a role. For I and
R, roles in a “standard rock band line-up”, i.e., singer, guitarist, bassist, drummer, and
keyboardist, as well as synonyms of these, are considered. After extraction, the document
frequency of each rule is counted, i.e., on how many Web pages each of the above rules applies.
Entities that occur on a percentage of band B’s Web pages that is below a given threshold are
discarded. The remaining member-role relations are predicted for B. In this paper, evaluation of
the presented approaches is also carried out on the best-performing document set from [19] and
compared against the Hearst pattern approach.
   In [18], we investigate several approaches to determine the country of origin for a given
artist, including an approach that performs keyword spotting for terms such as “born” or
“founded” in the context of countries’ names on Web pages. Another approach for country of
origin determination is presented in [8]. Govaerts and Duval use selected Web sites and
services, such as Freebase3, Wikipedia4, and Last.fm5. They propose three heuristics to
determine the artist’s country of origin using the occurrences of country names in biographies
(highest overall occurrence, strongly favoring early occurrences, weakly favoring early
occurrences). In [6], Geleijnse and Korst apply patterns like “G bands such as A”, “for example
A1 and A2”, or “M mood by A” (where G represents a genre, A an artist name, and M a possible
mood) to unveil genre-artist, artist-artist, and mood-artist relations, respectively.
3 http://www.freebase.com
4 http://www.wikipedia.org
5 http://last.fm
   While these music-specific information extraction methods mainly build upon a few simple
patterns or term frequency statistics, the work presented in this paper aims at incorporating
more general methods that take advantage of linguistic features of the underlying texts and
automatically learn models to derive musical entities from annotated examples.

3.    METHODOLOGY
   The methods presented in this paper make use of the linguistic properties of texts related to
music bands. To assess this information, for both approaches investigated (rule-based and
supervised-learning-based), several pre-processing
steps are required to obtain these linguistic features. Apart from initial preparation steps
such as markup removal (if necessary), text tokenization (i.e., splitting the text into single
tokens based on white spaces) and sentence splitting (based on punctuation), this comprises the
following steps:

    1. Part-of-Speech Tagging (PoS): assigns PoS tags to tokens, i.e., annotates each token
       with its linguistic category (noun, verb, preposition, etc.), cf. [3].

    2. Gazetteer Annotation: annotates occurrences of pre-defined keywords known to represent a
       specific concept, e.g., company names or persons’ (first) names. These annotations can be
       used as look-up information for subsequent steps (see below). For the music domain, in
       this step, we also include lists of musical genres, instruments, and band roles, as well
       as a list of country names, cf. [11].

    3. Transducing Step: identifies named entities such as persons, companies, locations, or
       dates using manually generated grammar rules. These rules can include lexical
       expressions, PoS information, look-up entities extracted via the gazetteer, or any other
       type of available annotation.

   For all of these steps the functionalities included in the GATE software package (General
Architecture for Text Engineering [5]) are utilized. In GATE’s transducing step, detection of
the different kinds of named entities is performed simultaneously in an interwoven process,
i.e., decisions whether proper names represent persons or organizations are made after a number
of shared intermediate steps. For instance, for person detection, information on first names and
titles obtained from the gazetteer annotations is combined with information on initials, first
names, surnames, and endings detected from orthographic characteristics (e.g., capitalization)
and PoS tags. Finally, persons’ surnames are removed if they contain certain stopwords or can be
attributed to an organization. Details about this process can be found in Appendix F of the GATE
User Guide6.
   The transducing step is also where we add additional rule-patterns designed to detect band
members, releases, and artist names as described in the following section.

3.1     Rule-Pattern Approach
   The first approach to extract music-related entities consists of generating specific rules
that operate on the annotations obtained in the pre-processing steps. This requires the
labor-intensive task of manually detecting textual patterns that indicate certain entities in
exemplary documents and writing (generalized) rules suited to capture other entities of the same
concept also in new documents. For this purpose, for a set of 83 artists/bands, related Web
pages such as band profiles and biographies from Last.fm, Wikipedia, and allmusic7 are examined.
Based on the observations made, rules that consider orthographic features, punctuation,
surrounding entities (such as those identified via the gazetteer lists), and surrounding
keywords are designed. The rules are formalized as so-called JAPE grammars8 that are used in the
transducer step of GATE. The complete set of JAPE grammars for music-specific entity recognition
can be found in Appendix B of [11] and can also be obtained by contacting the authors. In the
following, we show one exemplary (and easily accessible) rule for each concept to demonstrate
the idea and structure behind the rule-patterns for band member, media, and artist name
extraction, respectively.
6 http://gate.ac.uk/userguide/
7 http://www.allmusic.com
8 Acronym for Java Annotation Patterns Engine
   For the purpose of band member extraction, a JAPE grammar rule that aims at finding band
members by searching for information about members leaving or joining the band is given as:

Rule : leftJoinedBand (
( ( MemberName ) ) : BandMember
({Token.string == "had"} | {Token.string == "has"})?
({Token.string == "left"} |
 {Token.string == "joined"} |
 {Token.string == "rejoined"} |
 {Token.string == "replaced"})
)--> :BandMember.Member =
     {kind = "BandMember", rule = "leftJoinedBand"}

   To extract record releases, the following rule matches patterns that start with the potential
media name (optionally in quotation marks) and point to production, release, performance, or
similar events in the past or future:

Rule : MediaPassivReleased (
({Token.string == "\""})?
( ( Medium ) ):Media
({Token.string == "\""})?
({Token.string == "was"} |
 ({Token.string == "will"} {Token.string == "be"}))
({Token.string == "released"} |
 {Token.string == "issued"} |
 {Token.string == "produced"} |
 {Token.string == "recorded"} |
 {Token.string == "played"} |
 {Token.string == "performed"})
)--> :Media.Media =
     {kind = "Media", rule = "MediaPassivReleased"}

   To identify occurrences of band names, the following rule focuses on the entity occurring
before terms such as “was founded” or “were supported”:

Rule : Formed (
( ( BandN ) ) : BandName
({Token.string == "was"} |
 {Token.string == "were"})
({Token.string == "formed"} |
 {Token.string == "supported"} |
 {Token.string == "founded"})
)--> :BandName.bandname =
     {kind = "Band", rule = "Formed"}

   Elaborating such rules is a tedious task and (especially in heterogeneous data environments
such as the Web) unlikely to generalize well and cover all cases. Therefore, in the next section
we describe a supervised learning approach that makes use of automatically labeled data.

3.2     Supervised Learning Approach
   Instead of manually examining unstructured text for occurrences of musical entities and
potential patterns to identify them, the idea of this approach is to apply a supervised learning
algorithm to a set of pre-annotated examples. Using the learned model, relevant information
should then be found also in new documents. Several approaches, more precisely several types of
machine learning algorithms, have been proposed for automatic information extraction tasks, such
as hidden Markov models [2], decision trees [20], or support vector machines (SVM) [12]. Since
the latter demonstrates that SVMs may yield results that rival those of optimized rule-based
approaches, SVMs are chosen as the classifier for the tasks at hand (for more details see
[12, 13]).
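To make the rule-pattern idea from Section 3.1 concrete outside of GATE, the following is a minimal Python sketch that approximates the leftJoinedBand rule with a regular expression. It is not the JAPE grammar itself: the MemberName annotation, which GATE derives from gazetteer and PoS information, is replaced here by the crude assumption that a member name is a run of two to four capitalized tokens. The names MEMBER, LEFT_JOINED, and find_members are hypothetical and chosen for illustration only.

```python
import re

# Assumption: a "MemberName" is approximated as 2-4 capitalized tokens.
# GATE would instead use gazetteer, orthography, and PoS annotations.
MEMBER = r"(?P<member>[A-Z][\w'.-]+(?:\s+[A-Z][\w'.-]+){1,3})"

# Optional auxiliary ("had"/"has") followed by one of the trigger verbs,
# mirroring the token alternatives listed in the leftJoinedBand rule.
LEFT_JOINED = re.compile(
    MEMBER + r"\s+(?:(?:had|has)\s+)?(?:left|joined|rejoined|replaced)\b"
)


def find_members(text):
    """Return candidate band member names found by the pattern."""
    return [m.group("member") for m in LEFT_JOINED.finditer(text)]


if __name__ == "__main__":
    sample = "Jane Doe joined the band in 2001, and John Smith has left."
    print(find_members(sample))
```

Because capitalization is the only cue, a capitalized word at the start of a sentence would be absorbed into the candidate name; this is the kind of case the additional annotations in the GATE pipeline are meant to handle.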
   For training of the SVMs, a set of documents that contain annotations of the entities of
interest is required. Since this step can also be labor-intensive, we opted for an automatic
annotation approach. For the collection of training documents, ground truth information (on band
member history and band discography) is obtained by either manually compiling lists or by
invoking Web services such as MusicBrainz or Freebase. Using this information, occurrences of
the band name, its members (full name as well as last name only), and releases are annotated
using regular expressions.
   Construction of the features and SVM training is carried out as described by Li et al. [12].
First, for each token, a feature vector representation has to be obtained. In the given
scenario, for each token, its content (i.e., the actual string), orthographic properties, PoS
information, gazetteer-based entity information, and identified person entities are considered.
In a second scenario, in addition to these, also the output of the rule-based approach (more
precisely, the name of the rule responsible for prediction of an entity) serves as an input
feature. Ideally, this incorporates indicators of high relevance and allows for supervised
selection of the manually generated rules for the final predictions. For each prediction task,
the corresponding annotation type is also added to the features as target class.
   To construct the feature vectors, the training corpus is scanned for all occurring values of
any of the considered attributes (i.e., annotations). Then, each token is represented by a
vector where each distinct annotation value corresponds to one dimension which is set to 1 if
the token is annotated with the corresponding value. In addition, the context of each token
(consisting of a window that includes the 5 preceding and the 5 subsequent tokens) is
incorporated. This is achieved by creating an SVM input vector for each token that is a
concatenation of the feature vectors of all tokens in the context window. To reflect the
distance of the surrounding tokens to the actual token (i.e., the center of the window), a
reciprocal weighting is applied, meaning that “the nonzero components of the feature vector
corresponding to the j-th right or left neighboring word are set to be equal to 1/j in the
combined input vector.” [12]. In our experiments, this typically results in feature vectors with
approximately 1.5 million dimensions.
   In the SVM learning phase, the input vectors corresponding to every single token in all
training documents serve as examples. According to the central idea of [12], two distinct SVM
classifiers are trained for each concept of interest. The first classifier is trained to predict
the beginning of an entity (i.e., to classify whether a token is the first token of an entity),
the second to predict the end (i.e., whether a token is the last token of an entity). To deal
with the unbalanced distribution of positive and negative training examples, a special form of
SVMs is used, namely an SVM with uneven margins [14]. From the obtained predictions of start and
end positions, actual entities, as well as corresponding confidence scores, are determined in a
post-processing step. First, start tokens without a matching end token, as well as end tokens
without a matching start token, are removed. Second, entities with a length (in terms of the
number of tokens) that does not match any training example’s length are discarded. Third, a
confidence score is calculated based on a probabilistic interpretation of the SVM output for all
possible classes. More precisely, for each entity, the conjunction of the sigmoid-transformed
SVM output probabilities of start and end token is calculated for each possible output class.
Finally, the class (label) with the highest probability is predicted for the entity if its
probability is greater than 0.25. The probability of the predicted class serves as a confidence
score.

3.3    Entity Consolidation and Prediction
   From the extraction step (either rule- or learning-based), for each processed text and each
concept of interest, a list of potential entities is obtained. For each band, the lists from all
texts associated with the band are joined and the occurrences of each entity as well as the
number of texts an entity occurs in are counted (term and document frequency, respectively). The
joined list usually contains a lot of noise and redundant data, calling for a filtering and
merging step. First, all entities extracted by the learning-based method that have a confidence
score below 0.5 are removed since, according to the classification step, they are more likely
not to represent band members than to represent them. On the cleaned list, the same observations
as described in [19] can be made. For instance, on the list of extracted band members, some
members are referenced with different spellings (Paavo Lötjönen vs. Paavo Lotjonen), with
abbreviated first names (Phil Anselmo vs. Philip Anselmo), with nicknames (Darrell Lance Abbott
vs. Dimebag Darrell or just Dimebag), or only by their last name (Iommi). On the discography
lists, release names are often followed by additional information such as release year or type
of release. This is dealt with by introducing an approximate string matching function, namely
the level-two Jaro-Winkler similarity, cf. [19].9 For both entity types, this type of similarity
function is well suited as it assigns higher matching scores to pairs of strings that start with
the same sequence of characters. In the level-two variant, the two entities to compare are split
into substrings and similarity is calculated as an aggregated similarity of pairwise comparison
of the substrings. To reduce redundancies, two entities are considered synonymous and thus
merged if their level-two Jaro-Winkler similarity is above 0.9. In addition, to deal with the
occurrence of last names, an entity consisting of one token is considered a synonym of another
entity if it matches the other entity’s last token.
9 For calculation, the open-source Java toolkit SecondString (http://secondstring.sourceforge.net)
  is utilized.
   This consolidated list is usually still noisy, calling for additional filtering steps. To
this end, two threshold parameters are introduced. The first threshold, tf ∈ N0, determines the
minimum number of occurrences of an entity (or its synonyms) in the band’s set to get predicted.
The second threshold, tdf ∈ [0...1], controls the lower bound of the fraction of texts/documents
associated with the band an entity has to occur in (document frequency in relation to the total
number of documents per band). The impact of these two parameters is systematically evaluated in
the following section.

4.    EVALUATION
   To assess the potential of the proposed approaches and to measure the impact of the
parameters, systematic experiments are conducted. This section details the test collections used
as well as the applied evaluation measures and reports on the results of the experiments.
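The consolidation and filtering steps of Section 3.3 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the implementation used in the paper: the level-two Jaro-Winkler similarity from SecondString is replaced by the standard library's difflib ratio as a stand-in, extractions are assumed to arrive as (document id, entity string, confidence) triples, and rule-based extractions (which carry no classifier score) would have to be passed with a confidence of 1.0. The names similarity, is_synonym, and consolidate are hypothetical; the parameters t_f and t_df correspond to the thresholds tf and tdf in the text.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Stand-in similarity in [0, 1] (difflib ratio, NOT Jaro-Winkler)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_synonym(name, other, merge_sim, sim):
    """True if `name` should be merged with the known surface form `other`."""
    if sim(name, other) > merge_sim:
        return True
    # A one-token entity is a synonym if it equals the other's last token.
    tokens = other.split()
    return (len(name.split()) == 1 and bool(tokens)
            and tokens[-1].lower() == name.lower())


def consolidate(extractions, n_docs, t_f=1, t_df=0.0,
                min_conf=0.5, merge_sim=0.9, sim=similarity):
    """Merge and filter (doc_id, entity, confidence) triples for one band."""
    # Step 1: drop candidates with a confidence score below min_conf.
    kept = [x for x in extractions if x[2] >= min_conf]
    # Visit multi-token names first so bare last names attach to full names.
    kept.sort(key=lambda x: -len(x[1].split()))

    # Step 2: group surface forms into synonym clusters and count them.
    clusters = []  # each cluster: surface forms, term count, document ids
    for doc_id, name, _ in kept:
        cluster = next((c for c in clusters
                        if any(is_synonym(name, o, merge_sim, sim)
                               for o in c["names"])), None)
        if cluster is None:
            cluster = {"names": [], "count": 0, "docs": set()}
            clusters.append(cluster)
        if name not in cluster["names"]:
            cluster["names"].append(name)
        cluster["count"] += 1
        cluster["docs"].add(doc_id)

    # Step 3: keep clusters that meet both frequency thresholds.
    return [c["names"][0] for c in clusters
            if c["count"] >= t_f and len(c["docs"]) / n_docs >= t_df]


if __name__ == "__main__":
    found = [(1, "Tony Iommi", 0.9), (2, "Iommi", 0.8), (2, "Tony Iommi", 0.7),
             (1, "Some Noise", 0.3), (3, "Geezer Butler", 0.6)]
    print(consolidate(found, n_docs=3, t_f=2, t_df=0.5))
```

Processing longer names first is a design choice of this sketch, made so that a last-name-only mention always finds the full name it belongs to; the text does not prescribe an order. Any other similarity function with the same signature can be passed via the sim parameter.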
4.1      Test Collections
   For evaluation, two collections with different characteristics are used: the first is a
previously published collection used in [19], the second a larger-scale test collection consisting
of band biographies.

 4.1.1    Metal Page Sets
   The first collection is a set of Web pages introduced in [19]. This set consists of Google’s 100
top-ranked Web pages retrieved using the query “band name” music members (cf. Section 2) for 51
Rock and Metal bands (resulting in a total of 5,028 Web pages). In [19], this query setting yielded
the best results and is therefore chosen as the reference for the task of band-member extraction.
As ground truth, the membership relations that include former members are chosen (i.e., the Mf
ground truth set of [19]). For this evaluation collection, the results obtained by applying the
Hearst patterns proposed in [19] are also available, allowing for a direct comparison of the
approaches’ band member extraction capabilities.
   For the discography extraction evaluation, no reference data is available in the original set.
Therefore, and since the discography of the contained bands has changed since the creation of the
set, a new Web crawl has been conducted to retrieve recent (and more related) data. Since the aim
of this new set is to extract released media, for each of the 51 bands in the metal set the query
“band name” discography is sent to Google and the top 100 pages are downloaded (resulting in a
total of 5,090 Web pages). To obtain a discography ground truth, titles of albums, EPs, and singles
released by each band are downloaded from MusicBrainz.
   To speed up processing of the collections, all Web pages with a file size over 100 kilobytes are
discarded, resulting in set sizes of 4,561 and 4,625 documents for the member set and the
discography set, respectively. Evaluation of the supervised learning approach is performed as a
2-fold cross validation (by splitting the band set and separating the associated Web pages), where
in each fold a random sample of 100 documents is drawn for training.

 4.1.2    Biography Set
   The second test collection is a larger-scale collection consisting only of band biographies to
be found on the Web. Biographies are investigated as they should contain both information on (past)
band members and information on (important) released records.
   Starting from a snapshot of the MusicBrainz database from December 2010, all artists marked as
bands and all corresponding band members as well as albums, EPs, and singles are extracted. In
addition, band-membership information from Freebase (footnote 10) is retrieved and merged with the
MusicBrainz information to make the ground truth data set more comprehensive. After this step,
band-membership information is available for 34,238 bands. For each band name, the echonest API
(footnote 11) is invoked to obtain related biographies. Using the echonest’s Web service, related
biographies (e.g., from Wikipedia, Last.fm, allmusic, or Aol Music (footnote 12)) can be
conveniently retrieved in plain text format. Since among the provided biographies for a band,
duplicates or near-duplicates, as well as only short snippets can be observed, (near-)duplicates as
well as biographies consisting of fewer than 100 characters are filtered out. After filtering
(near-)duplicates and snippets, for 23,386 bands (68%) at least one biography remains. In total, a
set of 38,753 biographies is obtained. To keep processing times short, all documents that contain
more than 10 megabytes of annotations after the initial processing step are furthermore filtered
out.
   For training of the supervised learner, a random subset of 100 biographies is chosen. All
biographies by any artist that is part of the training set are removed from the test set, resulting
in a final test set of 37,664 biographies by 23,030 distinct bands.
   In comparison to the first test sets, i.e., the Metal page sets, the biography set contains more
bands, more specific documents in a homogeneous format (i.e., biographies instead of
semi-structured Web pages from various sources), but fewer associated documents (on average 1.63
documents per band, as opposed to an average of 90 documents per band for the Metal page set).

10
   http://www.freebase.com
11
   http://developer.echonest.com
12
   http://music.aol.com

4.2    Evaluation Metrics
   For evaluation, precision and recall are calculated separately for each band and averaged over
all bands to obtain a final score. The metrics are defined as follows:

   \[ \text{precision} = \begin{cases} \dfrac{|T \cap P|}{|P|} & \text{if } |P| > 0 \\ 1 & \text{otherwise} \end{cases} \qquad (1) \]

   \[ \text{recall} = \dfrac{|T \cap P|}{|T|} \qquad (2) \]

   where P is the set of predicted entities and T the ground truth set of the band. To assess
whether an extracted entity is correct, again the level-two Jaro-Winkler similarity (see Section
3.3) is applied. More precisely, if the Jaro-Winkler similarity between a predicted entity and an
entity contained in the ground truth is greater than 0.9, the prediction is considered to be
correct. Furthermore, if a predicted band member name consists of only one token, it is considered
correct if it matches the last token of a member in the ground truth. These weakened definitions of
matching allow for tolerating small spelling variations, name abbreviations, extracted last names,
additional information of releases, as well as string encoding differences.
   For comparison with the Hearst pattern approach for band member detection on the Metal page set,
it has to be noted that in [19], calculation of precision and recall is done on the full set of
bands and members (and their corresponding roles), yielding global precision and recall values,
whereas here, the evaluation metrics are calculated separately for each band and are then averaged
over all bands to remove the influence of a band’s size. Using the global evaluation scheme, e.g.,
orchestras are given far more importance than, for instance, duos in the overall evaluation,
although for a duo, the individual members are generally more important than for an orchestra.
Therefore, in the following, the different approaches are compared based on macro-averaged
evaluation metrics (calculated using the arithmetic mean of the individual results).

4.3    Evaluation Results
   In the following, the proposed rule-patterns, the SVM approach, as well as the SVM approach that
utilizes the out-
[Figure 1 plots. Left panel: Metal Set "music members" for tdf in [0...0.6], tf = 0; curves:
Baseline, Hearst Patterns, Rule-Patterns, SVM, SVM (w/Rules), Recall Upper Bound. Right panel:
Biographies retrieved via echonest for tf in [0...9]; curves: Baseline, Rule-Patterns, SVM,
SVM (w/Rules), Recall Upper Bound. Axes in both panels: Recall (x), Precision (y).]

Figure 1: Precision-recall plots for band-member prediction on the Metal page set (left) and on the biogra-
phy set (right). Curves are obtained by systematically varying threshold parameters (tdf and tf for Metal
page set and biography set, respectively). Precision and recall values macro-averaged over all bands in the
corresponding test set.


put of the rule-patterns are compared for the tasks of band-member detection and discography
extraction. For detecting band members, a baseline reference consisting of the person entity
prediction functionality of GATE is provided. On the Metal page set, band-member prediction is
further compared to the Hearst pattern approach from [19]. For the task of discography extraction,
no such reference is available. For all evaluations, an additional upper bound for the recall is
calculated. This upper bound is implied by the underlying documents, since band members and
releases that do not occur in any of the documents cannot be predicted.

4.3.1                  Band-Member Detection
   The left part of Figure 1 shows precision-recall curves for the different band member detection
approaches on the Metal page set. For a systematic comparison with the Hearst pattern approach,
the threshold tdf, i.e., the threshold that determines on which fraction of a band’s total
documents a band member has to appear to be predicted, is varied. It can be seen that the
rule-based approach clearly performs best. Also SVM and SVM using the rules output outperform the
Hearst pattern approach. It becomes apparent that on the Metal set, rule patterns, the GATE person
baseline, and the supervised approaches can yield recall values close to the upper bound, i.e.,
these approaches capture nearly all members contained in the documents at least once. For the
Hearst patterns, recall remains low. However, when comparing the Hearst patterns, it has to be
noted that this approach was initially designed to also detect the roles of the band members, a
feature none of the other approaches is capable of.
   Since on the biography set only 1.63 documents per band are available on average, variation of
the tdf threshold is not as interesting as on the Metal page set. Therefore, the right part of
Figure 1 depicts curves of the proposed approaches with varied values of tf, i.e., the threshold
that determines how often an entity has to be detected to be predicted as a band member. On this
set, the supervised learning approaches tend to outperform the rule-based extraction approach
slightly. However, there is basically no difference between the SVM approaches and the baseline,
with the only exception that the SVM approaches can yield higher recall values. Another
observation is that the upper recall boundary on the biography set is rather low at about 0.6.

4.3.2                   Discography Extraction
   For discography extraction the situation is similar, as can be seen from Figure 2. Also for this
task the rule-based approach outperforms the SVM approaches (this time also on the biography set).
Recall is also close to the upper bound using SVMs on the Metal page set, while on the biography
set, none of the approaches is capable of reaching the already low upper recall boundary at 0.36.
Conversely, on the biography set, all proposed approaches yield rather high precision values.
However, due to the lack of a baseline reference, it is difficult to draw final conclusions about
the quality of these approaches for the task of discography extraction.
   What can be seen from both the evaluations on discography and band-member extraction is that,
despite all work required, rule-patterns are preferable over supervised learning methods. Another
consistent finding so far is that SVMs that utilize the output of the rule-pattern classification
process are superior to SVMs without this information, but still inferior to the predictions of
the rule-patterns alone.
   The most unexpected result can be observed for band-member extraction on the biography set. None
of the proposed methods outperforms the standard person detection approach by GATE. A possible
explanation could be that the baseline itself is already high. Since biographies typically follow
a certain writing style and consist (in contrast to arbitrary Web pages) mostly of grammatically
well-formed sentences, natural language processing techniques such as PoS tagging perform better
on this type of input. Thus, the person detection approach just works better on the biography data
than on the Metal page set.
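
   The thresholds tdf and tf varied in Sections 4.3.1 and 4.3.2 can be read as simple filters over
the entities extracted from a band's documents. The following is a minimal Python sketch of that
reading, not the authors' implementation. The function name predict_entities, the input layout
(one list of extracted entity strings per document), and the choice of ">=" for tdf and ">" for tf
are assumptions made for illustration; the example names are invented.

```python
from collections import Counter


def predict_entities(docs, t_df=0.0, t_f=0):
    """Filter extracted entities of one band by document and occurrence thresholds.

    docs: list of lists; each inner list holds the entity strings extracted
          from one document of the band (hypothetical input layout).
    t_df: minimum fraction of the band's documents an entity must appear in
          (assumed inclusive).
    t_f:  an entity must be detected more than t_f times overall
          (assumed strict, following the "greater than" wording of Section 4.3.3).
    """
    if not docs:
        return set()
    doc_freq = Counter()   # number of documents containing the entity
    term_freq = Counter()  # total number of detections of the entity
    for entities in docs:
        term_freq.update(entities)
        doc_freq.update(set(entities))
    n_docs = len(docs)
    return {
        entity
        for entity in term_freq
        if doc_freq[entity] / n_docs >= t_df and term_freq[entity] > t_f
    }


# Invented example: four documents of one band.
docs = [
    ["Ann Lee", "Bo Park", "Ann Lee"],
    ["Ann Lee", "Cy Moss"],
    ["Bo Park"],
    ["Ann Lee"],
]
print(predict_entities(docs, t_df=0.5))
print(predict_entities(docs, t_f=2))
```

   Sweeping t_df (or t_f) over a range of values and evaluating each resulting prediction set
yields one point per threshold, which is how precision-recall curves of the kind shown in
Figures 1 and 2 can be traced.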
[Figure 2 plots. Left panel: Metal Set "discography" for tdf in [0...0.6], tf = 0. Right panel:
Biographies retrieved via echonest for tf in [0...5]. Curves in both panels: Rule-Patterns, SVM,
SVM (w/Rules), Recall Upper Bound. Axes in both panels: Recall (x), Precision (y).]

Figure 2: Precision-recall plots for discography detection on the Metal page set (left) and on the biography
set (right). Settings as in Figure 1.
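
   The macro-averaged metrics of Section 4.2 (Equations 1 and 2) and the weakened matching rule
can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not
the evaluation code used in the paper: plain Jaro-Winkler is used as a simplified stand-in for the
level-two Jaro-Winkler similarity computed with SecondString, |T ∩ P| is interpreted as the number
of predictions (for precision) or ground truth entries (for recall) that have a match, recall for
an empty ground truth is set to 0, and all function names are hypothetical.

```python
from statistics import mean


def jaro(s, t):
    """Jaro similarity of two strings, in [0, 1]."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo, hi = max(0, i - window), min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as transpositions.
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s, t, p=0.1):
    """Jaro similarity boosted for a common prefix of up to 4 characters."""
    sim = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:4], t[:4]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def is_match(predicted, truth, threshold=0.9):
    """Weakened matching: similarity above threshold, or last-name match."""
    if jaro_winkler(predicted, truth) > threshold:
        return True
    pred_tokens, truth_tokens = predicted.split(), truth.split()
    # Single-token predictions may match the last token of a ground truth
    # name (the paper applies this rule to band member names).
    return (len(pred_tokens) == 1 and bool(truth_tokens)
            and pred_tokens[0] == truth_tokens[-1])


def precision_recall(predicted, truth):
    """Per-band precision and recall; precision is 1 if nothing is predicted."""
    hit_pred = sum(any(is_match(p, t) for t in truth) for p in predicted)
    hit_truth = sum(any(is_match(p, t) for p in predicted) for t in truth)
    precision = hit_pred / len(predicted) if predicted else 1.0
    recall = hit_truth / len(truth) if truth else 0.0  # assumption for empty T
    return precision, recall


def macro_average(bands):
    """Arithmetic mean of per-band scores; bands is a non-empty list of (P, T)."""
    scores = [precision_recall(p, t) for p, t in bands]
    return mean(s[0] for s in scores), mean(s[1] for s in scores)
```

   Because every band contributes one precision and one recall value regardless of its number of
members, a duo and an orchestra weigh equally in the final score, which is the motivation given in
Section 4.2 for macro-averaging.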

   In terms of the different sources of data, i.e., the chosen test collections, it can be seen
that using biographies, in general lower recall values (and higher precision values) should be
expected. This can be seen also from the upper recall bounds that are rather low for both tasks.
When using Web documents, more information can be accessed, which also results in higher recall
values. On the discography Metal set, a recall of 0.7 can be observed, which is already close to
the upper bound of 0.74. However, using Web documents requires considering which documents to
examine (e.g., by formulating an appropriate query to obtain many relevant pages) as well as
dealing with a lot of noise in the data.

4.3.3                  Relating Documents to Artists
   In addition to the two main tasks of this paper, we also briefly investigate the applicability
of the presented methods to identify the central artist or band in a text about music, which could
be useful for future relation extraction tasks and tools such as music-focused Web crawling and
indexing. To this end, we utilize the rule-patterns aiming at detecting occurrences of artists and
train SVMs on occurrences of the name of the band a page belongs to. For prediction, the most
frequently extracted entity with occurrences greater than a threshold tf is selected. As a
baseline, simple prediction of any sequence of capitalized tokens at the beginning of the text is
chosen. The results can be seen in Figure 3. For this task, SVMs perform better than the
rule-patterns. However, rather surprisingly, the highest recall value can be observed for the
simple baseline.

[Figure 3 plot. Panel title: Biographies retrieved via echonest for tf in [0...5]; curves:
Baseline, Rule-Patterns, SVM, SVM (w/Rules), Recall Upper Bound. Axes: Recall (x), Precision (y).]

Figure 3: Precision-recall plots for discography detection on the biography set. Curves obtained by
varying threshold parameter tf. Precision and recall values averaged over all pages.

5.                 CONCLUSIONS AND FUTURE WORK
   In this paper, we presented first steps towards semantic Music Information Extraction. We
focused on two specific tasks, namely determining the members of a music band and determining the
discography of an artist (also explored on sets of bands). For both purposes, supervised learning
approaches and rule-based methods were systematically evaluated on two different sets of
documents. From the conducted evaluations, it became evident that manually generated rules yield
superior results. Furthermore, it could be seen that careful selection of the underlying data
source is crucial to achieve reliable results.
   In general, the results obtained show great potential for these and also related tasks. By just
focusing on biographies, even more highly relevant meta-information on music could be extracted.
For instance, consider the following paragraph taken from the Wikipedia page of the Alkaline Trio:
   “In September 2006, Patent Pending, the debut album by Matt Skiba’s side project Heavens was
released. The band consisted of Skiba on guitar and vocals, and Josiah Steinbrick (of hardcore
punk outfit F-Minus) on bass. On the album, the duo were joined by The Mars Volta’s Isaiah “Ikey”
Owens on organ and Matthew Compton on drums and percussion.” (footnote 13)

13
   http://en.wikipedia.org/w/index.php?title=Alkaline_Trio&oldid=431587984

   This short paragraph contains band-membership and line-up information for the Alkaline Trio, for
the band Heavens, for the band F-Minus, and for the band The Mars
Volta. In addition, discographical information for Heav-       [12] Y. Li, K. Bontcheva, and H. Cunningham. SVM Based
ens, genre information for F-Minus, and a nickname/alias            Learning System for Information Extraction. In
for Isaiah Owens can be inferred from this small piece of           J. Winkler, M. Niranjan, and N. Lawrence, eds.,
text. Furthermore, relations between the mentioned bands            Deterministic and Statistical Methods in Machine
(“side-project”) as well as the mentioned persons (collabo-         Learning, vol. 3635 of LNCS. Springer, 2005.
rations) can be discovered. Using further information ex-      [13] Y. Li, K. Bontcheva, and H. Cunningham. Adapting
traction methods, in future work, it should be possible to          SVM for Data Sparseness and Imbalance: A Case
capture at least some of this semantic information and re-          Study on Information Extraction. Natural Language
lations and to advance the current state-of-the-art in music        Engineering, 15(2):241–271, 2009.
retrieval and recommendation. However, for systematic ex-      [14] Y. Li and J. Shawe-Taylor. The SVM with uneven
perimentation and targeted development, the creation of a           margins and Chinese document categorization. In
comprehensive and thoroughly (manually) annotated text              Proc. 17th Pacific Asia Conference on Language,
corpus for music seems unavoidable.                                 Information and Computation (PACLIC), 2003.
                                                               [15] Y. Raimond, S. Abdallah, M. Sandler, and F. Giasson.
6.   ACKNOWLEDGMENTS                                                The Music Ontology. In Proc. 8th International
   Thanks are due to Andreas Krenmair for conceiving the            Conference on Music Information Retrieval (ISMIR),
music-related JAPE patterns and sharing his implementa-             2007.
tion. This research is supported by the Austrian Research      [16] M. Schedl and P. Knees. Context-based Music
Fund (FWF) under grants L511-N15 and P22856-N23.                    Similarity Estimation. In Proc. 3rd International
                                                                    Workshop on Learning the Semantics of Audio Signals
                                                                    (LSAS), 2009.
7.   REFERENCES
 [1] H. Alani, S. Kim, D.E. Millard, M.J. Weal, W. Hall,
     P.H. Lewis, and N.R. Shadbolt. Automatic
     Ontology-Based Knowledge Extraction from Web Documents.
     IEEE Intelligent Systems, 18(1):14–21, 2003.
 [2] D. M. Bikel, S. Miller, R. Schwartz, and
     R. Weischedel. Nymble: a High-Performance Learning
     Name-finder. In Proc. 5th Conference on Applied
     Natural Language Processing, 1997.
 [3] E. Brill. A Simple Rule-Based Part of Speech Tagger.
     In Proc. 3rd Conference on Applied Natural Language
     Processing, 1992.
 [4] J. Callan and T. Mitamura. Knowledge-Based
     Extraction of Named Entities. In Proc. 11th
     International Conference on Information and
     Knowledge Management (CIKM), 2002.
 [5] H. Cunningham, D. Maynard, K. Bontcheva, and
     V. Tablan. GATE: A framework and graphical
     development environment for robust NLP tools and
     applications. In Proc. 40th Anniversary Meeting of the
     Association for Computational Linguistics, 2002.
 [6] G. Geleijnse and J. Korst. Web-based artist
     categorization. In Proc. 7th International Conference
     on Music Information Retrieval (ISMIR), 2006.
 [7] E. Gómez and P. Herrera. The song remains the same:
     Identifying versions of the same piece using tonal
     descriptors. In Proc. 7th International Conference on
     Music Information Retrieval (ISMIR), 2006.
 [8] S. Govaerts and E. Duval. A Web-Based Approach to
     Determine the Origin of an Artist. In Proc. 10th
     International Society for Music Information Retrieval
     Conference (ISMIR), 2009.
 [9] M. A. Hearst. Automatic acquisition of hyponyms
     from large text corpora. In Proc. 14th Conference on
     Computational Linguistics - Vol. 2, 1992.
[10] P. Knees. Text-Based Description of Music for
     Indexing, Retrieval, and Browsing. PhD thesis,
     Johannes Kepler Universität, Linz, Austria, 2010.
[11] A. Krenmair. Musikspezifische Informationsextraktion
     aus Webdokumenten. Diplomarbeit, Johannes Kepler
     Universität, Linz, Austria, 2010.
[17] M. Schedl, P. Knees, and G. Widmer. A Web-Based
     Approach to Assessing Artist Similarity using
     Co-Occurrences. In Proc. 4th International Workshop
     on Content-Based Multimedia Indexing (CBMI), 2005.
[18] M. Schedl, C. Schiketanz, and K. Seyerlehner.
     Country of Origin Determination via Web Mining
     Techniques. In Proc. IEEE International Conference
     on Multimedia and Expo (ICME): 2nd International
     Workshop on Advances in Music Information
     Research (AdMIRe), 2010.
[19] M. Schedl and G. Widmer. Automatically Detecting
     Members and Instrumentation of Music Bands via
     Web Content Mining. In Proc. 5th Workshop on
     Adaptive Multimedia Retrieval (AMR), 2007.
[20] S. Sekine. NYU: Description of the Japanese NE
     system used for MET-2. In Proc. 7th Message
     Understanding Conference (MUC-7), 1998.
[21] Y. Shavitt and U. Weinsberg. Songs Clustering Using
     Peer-to-Peer Co-occurrences. In Proc. IEEE
     International Symposium on Multimedia (ISM):
     International Workshop on Advances in Music
     Information Research (AdMIRe), 2009.
[22] M. Slaney and W. White. Similarity Based on Rating
     Data. In Proc. 8th International Conference on Music
     Information Retrieval (ISMIR), 2007.
[23] M. Sordo, C. Laurier, and O. Celma. Annotating
     Music Collections: How Content-based Similarity
     Helps to Propagate Labels. In Proc. 8th International
     Conference on Music Information Retrieval (ISMIR),
     2007.
[24] D. Turnbull, L. Barrington, and G. Lanckriet. Five
     Approaches to Collecting Tags for Music. In Proc. 9th
     International Conference on Music Information
     Retrieval (ISMIR), 2008.
[25] B. Whitman and S. Lawrence. Inferring Descriptions
     and Similarity for Music from Community Metadata.
     In Proc. International Computer Music Conference
     (ICMC), 2002.