<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kartik Asooja</string-name>
          <email>kartik.asooja@insight-</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sindhu Kiranmai Ernala</string-name>
          <email>sindhukiranmai.ernala@students.iiit.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Buitelaar</string-name>
          <email>paul.buitelaar@insight-</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Center for Exact Humanities</institution>
          ,
          <addr-line>IIIT Hyderabad</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Insight Centre for Data Analytics</institution>
          ,
          <addr-line>NUI Galway</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper describes our submission to the C@merata task at MediaEval 2015. The submission is a revision of the system submitted for the same task at MediaEval 2014, including several bug fixes. The system answers natural language queries over musical scores. The approach is based on two main steps: identifying the musical entities and relations present in the query, and retrieving the relevant music passages containing those entities from the associated MusicXML file. Our approach builds a sequence of the musical entities in the query, and then searches for a sequence of passages satisfying that sequence of entities. Musical entities in the query are recognized with the help of regular expressions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        This work describes our system submitted to the C@merata task
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] at MediaEval 2015. The task targets natural language question
answering over musical scores. We were provided with a set of
question types and the data over which the search was to be
performed. The questions in the task are short noun phrases in
English referring to musical features in the music scores, for
instance, “F# followed two crotchets later by a G”. Every
question is a single natural language noun phrase using English
or American music terminology. The music scores are provided in
MusicXML [2][
        <xref ref-type="bibr" rid="ref2">3</xref>
        ], which is a standard open
format for exchanging digital sheet music. The music repertoire
consists of Western Classical works from the Renaissance and
Baroque periods by composers such as Dowland, Bach, Handel, and
Scarlatti. The answers comprise the music passages from the
score that contain the musical features mentioned in the query
string; each answer thus points to the location(s) of the
requested musical features in the score. An answer passage
consists of a start/end time signature, start/end division value,
and start/end beat. The task provides two datasets: one for
training and development, consisting of the 236 natural language
queries used for last year's task, and a newly introduced test
set containing 200 questions. This year, the questions are
linguistically more difficult and the scores are more complex.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. APPROACH</title>
      <p>A query can mention different types of musical features, such as a
note, a melodic phrase and others. These musical features can be
referred to as musical entities or can be defined with the help of
such entities. We therefore identify some of the musical entities
in the natural language text, and perform a location search by
comparing the extracted entity values against the corresponding
values in the music score to retrieve the answer passages. For
complex queries that require combining entities according to
particular relations, we consider only the sequential relation
between the musical entities as they appear in the query string.
Rather than building a system that differentiates between question
types, we apply a rather simple approach that assumes only this
sequential relation. In contrast, the approach we submitted last
year performed a union or intersection of the answer passages
found for each musical entity. Our current approach consists of
two main steps: identification of the sequence of musical entities
in the query string, and retrieval of the sequences of relevant
music passages matching the sequence of entities. Figure 1
summarizes the approach.</p>
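      <p>The first step, building the entity sequence, can be sketched as follows. This is a minimal illustration, not the actual implementation: each recognizer reports (character offset, entity) pairs, and the combined hits are sorted by offset so that entities keep the order in which they appear in the query. The toy pitch recognizer and all function names here are assumptions.</p>

```python
import re

# Toy stand-in for an entity recognizer: finds bare pitch letters.
# The real system uses richer regular expressions and dictionaries.
def pitch_recognizer(query):
    return [(m.start(), m.group(0)) for m in re.finditer(r"\b[A-G]\b", query)]

def entity_sequence(query, recognizers):
    # Collect (offset, entity) hits from every recognizer, then sort by
    # offset so entities appear in query order.
    hits = []
    for recognize in recognizers:
        hits.extend(recognize(query))
    return [entity for _, entity in sorted(hits, key=lambda h: h[0])]
```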
    </sec>
    <sec id="sec-3">
      <title>2.1 Identification of Musical Entities</title>
      <p>We use regular expressions and hand-built dictionaries to recognize
musical entities in the query strings. The target entity types are:
1. Notes: A note defines a particular pitch, duration or dynamics
using strings such as Do, crotchet C, quarter note C in the right
hand, or semibreve C. The note recognizer comprises three basic
music entity recognizers: duration, pitch and staff. We first
recognize all the pitches appearing in the query string, and
separately identify all the durations and staves. To assign the
correct duration/staff to a pitch, we measure the string distance
between each pitch and each duration/staff. A duration/staff that
occurs within a threshold distance of a pitch is paired with it to
form the note. The pitches and durations are identified using
regular expressions.</p>
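      <p>A minimal sketch of this pairing step, under assumed patterns and an assumed distance threshold (the actual regular expressions, vocabulary and threshold in the system differ):</p>

```python
import re

# Simplified recognizers; the real patterns cover a larger vocabulary.
PITCH = re.compile(r"\b([A-G])(#|b(?![a-z])|\s+sharp|\s+flat)?")
DURATION = re.compile(
    r"\b(crotchet|quaver|minim|semibreve|quarter note|half note|whole note|eighth note)\b")
STAFF = re.compile(r"\b(right hand|left hand)\b")

def nearest(pos, candidates, threshold):
    # Closest candidate within the character-distance threshold, else None.
    near = [(abs(c_pos - pos), c) for c_pos, c in candidates
            if abs(c_pos - pos) <= threshold]
    return min(near)[1] if near else None

def recognize_notes(query, threshold=25):
    # Find pitches, durations and staves independently, then attach the
    # nearest duration/staff to each pitch to form a note.
    pitches = [(m.start(), m.group(0)) for m in PITCH.finditer(query)]
    durations = [(m.start(), m.group(0)) for m in DURATION.finditer(query)]
    staves = [(m.start(), m.group(0)) for m in STAFF.finditer(query)]
    return [{"pitch": pitch,
             "duration": nearest(p_pos, durations, threshold),
             "staff": nearest(p_pos, staves, threshold)}
            for p_pos, pitch in pitches]
```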
      <p>Duration: This defines the playing time of a pitch. In natural
language, it is expressed by terms such as quarter, semibreve,
and whole. We write a regular expression covering an extensive
vocabulary of duration terms in both English and American
music terminology.</p>
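      <p>The dual terminology can be handled by mapping both the English and American terms to a common length value and deriving the regular expression from that vocabulary. The following is a sketch with an assumed, abbreviated word list (lengths are in units of a whole note; the longest terms are tried first so that multi-word terms win):</p>

```python
import re

# Assumed duration vocabulary: English and American terms mapped to the
# same length, in units of a whole note.
DURATION_VALUES = {
    "semibreve": 1.0, "whole note": 1.0,
    "minim": 0.5, "half note": 0.5,
    "crotchet": 0.25, "quarter note": 0.25,
    "quaver": 0.125, "eighth note": 0.125,
    "semiquaver": 0.0625, "sixteenth note": 0.0625,
}
# Build one alternation, longest terms first, so "quarter note" is
# preferred over any shorter overlapping match.
DURATION_RE = re.compile(
    r"\b(?:"
    + "|".join(sorted(map(re.escape, DURATION_VALUES), key=len, reverse=True))
    + r")\b")
```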
      <p>Pitch: This is a perceptual property that allows the ordering of
sounds on a frequency-related scale. Some examples of pitches
written in natural language are: D sharp, E#, and A flat. We form
a regular expression to identify the pitches in a query string.
Staff: To identify the staves mentioned in a string, we find the
occurrences of the strings “right hand” and “left hand” in it.
The three basic musical entities, duration, pitch and staff,
collectively form the note entity.
2. Instruments: To find the instruments mentioned in the
query string, we manually created a dictionary of instrument-related
n-grams using the training and test data. The dictionary
includes words like viola, piano, alto, violoncello, soprano, tenor,
bass, violin, guitar, sopran, alt, voice, and harpsichord.
3. Clefs: To identify the clef, we simply check for the presence of
strings like bass clef, F-clef, treble clef and G-clef in the query.
The implementation, including the regular expressions and the
dictionaries used, can be found in a publicly available code
repository at GitHub1.</p>
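      <p>The instrument and clef lookups amount to case-insensitive whole-word dictionary matching. A sketch, with word lists taken from the paper and hits returned in order of appearance (function name is an assumption):</p>

```python
import re

# Dictionaries as listed above; matching is case-insensitive whole-word
# search over the query string.
INSTRUMENTS = {"viola", "piano", "alto", "violoncello", "soprano", "tenor",
               "bass", "violin", "guitar", "sopran", "alt", "voice",
               "harpsichord"}
CLEFS = {"bass clef", "f-clef", "treble clef", "g-clef"}

def find_entities(query, vocab):
    # Return dictionary terms found in the query, in order of appearance.
    q = query.lower()
    hits = []
    for term in vocab:
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", q):
            hits.append((m.start(), term))
    return [term for _, term in sorted(hits)]
```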
    </sec>
    <sec id="sec-4">
      <title>2.2 Music Passage Retrieval</title>
      <p>The values of the musical entities identified in the query are
compared against the corresponding values extracted from the
MusicXML file associated with the question. The system matches
the musical features sequentially, in the order they appear in the
query string. Finally, the passage sequences that completely match
the sequence of musical entities are selected as answer
passages.</p>
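      <p>This matching step can be sketched as a sliding-window search over the score, assuming the score has already been flattened into an ordered list of note records parsed from the MusicXML file (the field names and function names below are assumptions):</p>

```python
def entity_matches(entity, note):
    # A field left unspecified in the query entity acts as a wildcard.
    return all(note.get(k) == v for k, v in entity.items() if v is not None)

def find_passages(entities, score_notes):
    # Slide a window of the same length as the entity sequence over the
    # score; keep windows where every entity matches its note in order.
    m = len(entities)
    passages = []
    for start in range(len(score_notes) - m + 1):
        window = score_notes[start:start + m]
        if all(entity_matches(e, note) for e, note in zip(entities, window)):
            passages.append((start, start + m - 1))  # start/end note indices
    return passages
```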
    </sec>
    <sec id="sec-5">
      <title>3. RESULTS AND DISCUSSION</title>
      <p>The system performance is measured for each question type, and
an overall weighted average over all questions is also calculated.
Table 1 shows the results obtained by our submission for some
question types. As discussed in the approach section, the current
implementation only recognizes a few types of musical entities,
which constrains the question types that can be answered. The
results clearly show that the system could not answer many
question types, such as texture and harmonic questions, because
detection of such musical features was not implemented in the
current system. Compared to the system submitted last year, we
removed many bugs related to the meaning of different tags in the
MusicXML reader, as we implemented our own reader in Java. In the
current version, our system only uses string and regular
expression matching to identify musical elements, while string
distance is used to identify the relations between the elements,
where required. However, deeper syntactic and lexical analysis of
the query has the potential to identify relations between the
entities more accurately.</p>
    </sec>
    <sec id="sec-6">
      <title>4. CONCLUSION</title>
      <p>We have presented a simple pipeline for natural language question
answering on musical scores. The pipeline is based upon
identifying the different types of musical entities and their
relations in the query string, and comparing them against the
corresponding values extracted from the MusicXML file.</p>
    </sec>
    <sec id="sec-7">
      <title>5. ACKNOWLEDGEMENTS</title>
      <p>This work has been funded in part by a research grant from
Science Foundation Ireland (SFI) under Grant Number
SFI/12/RC/2289 (INSIGHT). We are very grateful to Mr. Robert
Solyom from the Music Academy, Galway, for helpful suggestions
and references.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sutcliffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Crawford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Fox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.L.</given-names>
            <surname>Root</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Hovy</surname>
          </string-name>
          .
          <article-title>The C@merata Task at MediaEval 2015: Natural language queries on classical music scores</article-title>
          . In
          <source>MediaEval 2015 Workshop</source>
          , Wurzen, Germany, September 15-16,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Good</surname>
          </string-name>
          , Recordare LLC.
          <article-title>Lessons from the Adoption of MusicXML as an Interchange Standard</article-title>
          .
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [3] MusicXML: http://www.musicxml.com/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>