<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLAS at the MediaEval 2015 C@merata Task</article-title>
      </title-group>
      <contrib-group>
<contrib contrib-type="author">
          <string-name>Stephen Wan</string-name>
          <aff>Language and Social Computing Team, CSIRO, Sydney, Australia</aff>
          <email>stephen.wan@csiro.au</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>1263</volume>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
<p>The CLAS 2015 system treats the C@merata task as a Q&amp;A problem specified with a controlled language. In this year's system, we added a context-free grammar for the music controlled language using the Natural Language ToolKit. Crucially, this provides an in-built feature unification mechanism, allowing us to replace the ad-hoc unification component in the 2014 system. The CLAS 2015 system with this modification finished first in the C@merata shared task. In this paper, we describe the approach behind our participation in the shared task and discuss arguments for and against using a feature-based context-free grammar to parse queries.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The C@merata task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] provides an opportunity to
investigate natural language queries to structured data, in this
case, music data. This data, which is akin to time-series data, is
composed of sequenced events, each with associated metadata.
      </p>
      <p>
        In contrast to the 2014 task [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], this year’s shared task
included complex queries with constraints to restrict candidate
answers. For example, the query “ten staccato quarter notes in the
Violoncello in measures 1-60 followed by two staccato quarter
notes in the Violin 1” requires finding two answer sequences that
are juxtaposed in time. Furthermore, the answer sequences
occur in different musical parts, played by the violoncello and the
first violin.
      </p>
      <p>
        The CLAS 2015 system, like its 2014 predecessor [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], is
based on the general notion of unification between the
lexicosemantic features of the query and the metadata for each musical
event. In brief, the system interprets a natural language query,
converting the query to a conceptual representation. This is in
turn processed to form a query representation, defining the type of
answer required. Feature unification is generally used to find a
subset of the data that serves as a candidate answer.
      </p>
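<p>The unification step can be illustrated with a minimal sketch in plain Python (the dictionary-based <italic>unify</italic> helper and the feature names below are our own simplification for exposition, not the system's actual representation):</p>

```python
def unify(query_feats, event_feats):
    """Return True if every feature the query specifies is compatible
    with the metadata of a music event; features the query leaves
    unspecified unify with anything."""
    return all(event_feats.get(k) == v for k, v in query_feats.items())

# A query such as "dotted minim F#4" becomes a feature structure:
query = {"pitch": "F#4", "duration": "minim", "dotted": True}

# Each music event carries metadata from the score:
events = [
    {"bar": 1, "pitch": "F#4", "duration": "minim", "dotted": True},
    {"bar": 2, "pitch": "G4", "duration": "crotchet", "dotted": False},
]

# Candidate answers are the events that unify with the query.
answers = [e for e in events if unify(query, e)]
```

Unspecified features (here, the bar index) act as free variables, which is what makes underspecified queries such as "a minim" match many events.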
<p>In the CLAS 2014 system, the components that detect
linguistic features and unify them with metadata were
purpose-built for the C@merata 2014 task. In particular, our system did
not rely heavily on phrase structure in the query when extracting
linguistic features to match against: aside from specific nouns
indicating the beginning of a new noun-phrase subsequence, no
other phrase structure was inferred.</p>
      <p>However, the inclusion of more complex queries provided
some cases where a more complex syntactic phrase structure is
required to adequately represent the meaning of the query. For
example, numbers can be used to refer to a specific bar (for
example, “a note in bar 4”), to specify a range of bar indices (for
example, “a note in bars 1 to 4”) or to indicate cardinality (for
example, “4 crotchets”). Syntactic structure can help in these
cases to interpret the query correctly.</p>
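<p>A toy illustration of the three readings (the pattern-based classifier below is ours, for exposition only; the system itself resolves these readings through the phrase-structure grammar, not regular expressions):</p>

```python
import re

def classify_number_use(query):
    """Toy classifier for the three uses of numbers discussed above.
    Returns 'bar', 'bar_range', or 'cardinality'."""
    # A range of bar indices: "bars 1 to 4", "bars 1-60".
    if re.search(r"\bbars?\s+\d+\s*(?:to|-)\s*\d+", query):
        return "bar_range"
    # A single bar reference: "bar 4".
    if re.search(r"\bbar\s+\d+\b", query):
        return "bar"
    # A cardinality reading: "4 crotchets".
    if re.search(r"\b\d+\s+\w+", query):
        return "cardinality"
    return "unknown"
```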
      <p>
        Consequently, this year, the CLAS system uses the natural
language feature-based parsing facilities in the Python modules
distributed as part of the Natural Language ToolKit (NLTK) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
and a Context-Free Grammar (CFG) defined by the author. One
side-effect of this approach is that the feature unification facility
of the parser can be used to match against feature structures based
on the music events.
      </p>
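<p>By way of illustration, rules of the following shape let lexical features percolate to the phrase level. This is a hypothetical fragment in NLTK's feature-grammar notation, not the grammar we actually shipped:</p>

```text
# Sketch of feature-based CFG rules in NLTK notation (illustrative only).
# Variables such as ?d and ?p unify the features of daughters and mother.
NP[PITCH=?p, DUR=?d] -> DUR[DUR=?d] PITCH[PITCH=?p]
DUR[DUR='minim']     -> 'minim'
DUR[DUR='quaver']    -> 'quaver'
PITCH[PITCH='F#4']   -> 'F#4'
```

The feature structure assembled at the NP node is then available at the root for matching against event metadata.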
      <p>The CLAS 2015 system achieved 0.60 precision and 0.63
recall when specifying answers at the granularity of “beats” in a
bar (more accurately, a subdivision of a beat, as specified by the
question). When examining accuracy at the granularity of bars,
our system achieved a 0.64 precision and 0.67 recall.</p>
      <p>In the remainder of this paper, Section 2 describes the overall
system with an emphasis on how we employ NLTK and how we
designed the feature-based CFG. Section 3 presents an overview
of the system’s performance in the C@merata 2015 tasks with a
preliminary discussion of how our approach fared with different
query types. We end with some final comments in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>2. APPROACH</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Strategy for participation</title>
<p>Our general approach to participation for the 2015 entry was
to port the domain-specific rules mapping from a query to a
conceptual representation in the CLAS 2014 system to a
feature-based CFG for use with NLTK's parsing tools. In the event of
an out-of-vocabulary error or an empty parse, the system reverted
to the 2014 system.</p>
<p>This grammar was then extended to cover the complex
queries of the 2015 shared task. We compiled a list of the new
queries from the documentation of the shared task. Grammar
development was done by checking that there was an
intuitive parse and that this led to a candidate answer. The
correctness of the answer was vetted manually.</p>
<p>This year, we had the benefit of a training data set. We set
up a simple evaluation framework to gauge whether changes to the CFG
corresponded to overall improvement. As we did not have the
evaluation code to measure precision and recall (and we did not
have sufficient time to implement our own), we used the diff tool
(with the -w option to ignore whitespace) to compare the
2014 gold standard with our system results. The number of
differing lines was used as a rough measure of performance: fewer
lines were taken as indicating a grammar with better coverage.</p>
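<p>The same rough measure can be reproduced with Python's standard-library difflib rather than the diff tool (a sketch under the assumption that gold and system output are line-aligned files read into lists of strings):</p>

```python
import difflib

def rough_score(gold_lines, system_lines):
    """Count lines that differ between gold standard and system output,
    ignoring whitespace differences (in the spirit of diff -w)."""
    # Normalise away all whitespace within each line.
    norm = lambda lines: ["".join(l.split()) for l in lines]
    sm = difflib.SequenceMatcher(None, norm(gold_lines), norm(system_lines))
    # Sum the sizes of all non-matching blocks (larger side of each).
    return sum(max(i2 - i1, j2 - j1)
               for op, i1, i2, j1, j2 in sm.get_opcodes() if op != "equal")
```

A lower score suggests better coverage, with the same caveats as the line-count heuristic above.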
    </sec>
    <sec id="sec-4">
      <title>2.2 Designing a Context-Free Grammar</title>
      <p>In general, the CLAS 2015 grammar models the query as a
nested sequence of musical noun phrases. These phrases are
based predominantly on the basic noun phrases that were handled
in the 2014 CLAS system but extended to include new aspects for
2015 such as chords in a specific key, solfege nomenclature for
notes, and references to scales. No morphological analysis was
performed, and plurals were hardcoded into the lexicon. A nested
semantic feature structure was propagated to the root to allow for
matching against the data.</p>
      <table-wrap id="tab1">
        <label>Table 1.</label>
        <caption>
          <p>Query types and examples.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Question Type</th>
              <th>Example</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>1_melody</td><td>dotted minim F#4</td></tr>
            <tr><td>n_melody</td><td>five note melody in bars 1-10</td></tr>
            <tr><td>1_harm</td><td>chord D2 E5 G5 in bars 54-58</td></tr>
            <tr><td>texture</td><td>monophonic passage</td></tr>
            <tr><td>follow</td><td>A minim followed by a quaver</td></tr>
            <tr><td>synch</td><td>chord C4 E4 against a C5</td></tr>
            <tr><td>perf</td><td>sforzando F2</td></tr>
            <tr><td>instr</td><td>harmonic second in the Violin 2</td></tr>
            <tr><td>clef</td><td>four Gs in the treble clef</td></tr>
            <tr><td>time</td><td>F sharp in 6/8 time in bars 1-20</td></tr>
            <tr><td>key</td><td>sixteenth note G in G minor</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>We used domain-specific inferences to handle the queries.
These depended, in part, on where prepositional phrases are
attached. For example, a preposition at the root of the parse was
used for restrictions to bars and parts. Prepositions attached to
sequence-based phrase constituents (typically the last element in
the sequence) were used to represent metadata constraints that
should be inherited by all elements in the sequence (for example,
“C, D, E in crotchets”). Finally, a prepositional phrase in the
noun phrase for the musical event itself was used to qualify the
metadata (for example, “a chord in C”).</p>
      <p>Referring expressions proved to be a minor complication, as
we did not want to hardcode every enumerated object, such as a
number, in the lexicon. We replaced numbers with a placeholder
token “_NUM_” during the parsing process. The actual numeric
value was then heuristically reinserted into the parse structure.
The same mechanism was used for lyrics and enumerated part
names like “Violin II”.</p>
      <p>When phrases like “followed by” were detected in the query,
we split the query at that point to form two component queries.
Each query was then treated independently, and all candidates that
were adjacent in time (bar and beat indices) were
considered an answer.</p>
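<p>The splitting step for “followed by” queries can be sketched as follows; a toy illustration in which the span representation with (bar, beat) tuples is our simplification, not the system's actual data structure:</p>

```python
def split_query(query):
    """Split a complex query at 'followed by' into component queries."""
    return [part.strip() for part in query.split("followed by")]

def adjacent(first_span, second_span):
    """Two candidate answers are juxtaposed when the second starts
    exactly where the first ends, comparing (bar, beat) positions."""
    return first_span["end"] == second_span["start"]

parts = split_query("a minim followed by a quaver")

# Candidate spans found independently for each component query:
a = {"start": (1, 1.0), "end": (1, 3.0)}   # a minim filling beats 1-3 of bar 1
b = {"start": (1, 3.0), "end": (1, 3.5)}   # a quaver starting at beat 3
```

Each component query is answered on its own, and only pairs of candidates passing the adjacency test are reported.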
    </sec>
    <sec id="sec-5">
      <title>3. RESULTS AND DISCUSSION</title>
      <p>The evaluation was divided into a number of different query
types. Examples of these are presented in Table 1 with the key
elements in bold. We present the official evaluation results in
Table 2. Results are reported for both beat and bar granularities.</p>
<p>In 2014, the worst category for our system was harmonic
intervals, and so we focused on this category, adding
domain-specific inferences about notes, chords and scales to our handling
of harmonic intervals. We are pleased to see that this led to good
performance for this category.</p>
<p>Our approach of splitting a query into component parts that
are juxtaposed led to good precision for the “synch” category but
leaves room for improvement in terms of recall. Our inference
mechanism based on the attachment points of prepositional
phrases led to reasonable performance for the “instr”, “clef”
and “time” query categories.</p>
<p>We ignored cadences and texture in this year’s effort, and our
performance suffers correspondingly for these query types. Due
to time restrictions, we were unable to add domain-specific
knowledge to handle musical references such as “Alberti bass”,
“arpeggios”, and “descending scale”. We were also only partially
able to handle the restriction “melody”, as in “melody C, D, E in
the violin”, as this requires an analysis of texture.</p>
<table-wrap id="tab2">
        <label>Table 2.</label>
        <caption>
          <p>Evaluation results for different query types.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Question Type</th>
              <th>Beat Prec.</th>
              <th>Beat Recall</th>
              <th>Bar Prec.</th>
              <th>Bar Recall</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>1_melody</td><td>0.655</td><td>0.812</td><td>0.687</td><td>0.852</td></tr>
            <tr><td>n_melody</td><td>0.716</td><td>0.52</td><td>0.77</td><td>0.559</td></tr>
            <tr><td>1_harm</td><td>0.66</td><td>0.62</td><td>0.702</td><td>0.66</td></tr>
            <tr><td>texture</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>
            <tr><td>follow</td><td>0.312</td><td>0.484</td><td>1</td><td>0.5</td></tr>
            <tr><td>synch</td><td>0.818</td><td>0.25</td><td>0.323</td><td>0.306</td></tr>
            <tr><td>perf</td><td>0.955</td><td>0.467</td><td>0.955</td><td>0.467</td></tr>
            <tr><td>instr</td><td>0.677</td><td>0.708</td><td>0.72</td><td>0.753</td></tr>
            <tr><td>clef</td><td>0.415</td><td>0.519</td><td>0.431</td><td>0.538</td></tr>
            <tr><td>time</td><td>0.679</td><td>0.905</td><td>0.75</td><td>1</td></tr>
            <tr><td>key</td><td>1</td><td>0.625</td><td>1</td><td>0.625</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Evaluation performance aside, it is worth reflecting on the
strengths and weaknesses of our approach. Our NLTK-based
system was notably slower on the 2014 training data set than
our 2014 version. It is difficult to say which system is easier to
maintain and develop. Intuitively, we believe the CFG may be
easier to maintain, given the ease of porting the 2014 resources to
a CFG and the ability to write domain-specific rules based on
phrase structure.</p>
      <p>One limitation is that we can only handle queries
licensed by the grammar, meaning we are unable to handle
ungrammatical queries. This is potentially too prescriptive,
particularly if this were a real application. Our CFG
deliberately allows metadata for notes to be accepted in any order
(for example, “minim dotted # C4”), but this is the extent to which
we accept an ungrammatical query. Finally, we found that feature
unification as a paradigm for matching against metadata breaks
down at times. The simplest case is that of intervals: the
note name of the second note depends on the context of the
first note. For example, a “perfect fifth” is not always a “C,
G” pattern. Enumerating all fifths seems inelegant. For these
cases, other answering mechanisms are needed.</p>
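<p>The point about intervals can be made concrete: a perfect fifth is any pair of notes seven semitones apart, so a matcher must compute relative pitch rather than unify against a fixed note name. A sketch (ours, ignoring enharmonic spelling subtleties):</p>

```python
# Semitone offsets of the natural pitch classes within an octave.
SEMITONE = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def pitch_to_semitones(name):
    """Convert a note name like 'F#4' to an absolute semitone count."""
    letter, rest = name[0], name[1:]
    accidental = rest.count("#") - rest.count("b")
    octave = int(rest.lstrip("#b"))
    return 12 * octave + SEMITONE[letter] + accidental

def is_perfect_fifth(low, high):
    """A perfect fifth is exactly 7 semitones, regardless of note names."""
    return pitch_to_semitones(high) - pitch_to_semitones(low) == 7

# "C, G" is a perfect fifth, but so is "D, A": enumerating all such
# pairs in a lexicon is exactly the inelegance noted above.
```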
    </sec>
    <sec id="sec-6">
      <title>4. CONCLUSION</title>
      <p>The CLAS 2015 system treats the C@merata task as a Q&amp;A
problem using a controlled language. We use a feature-based
context-free grammar to define a controlled language for the
music domain and parse queries using the Natural Language
ToolKit. The CLAS 2015 system with this modification finished
first in the C@merata shared task.</p>
    </sec>
    <sec id="sec-7">
      <title>5. ACKNOWLEDGEMENTS</title>
      <p>We would like to thank the organisers of C@merata for such
an engaging, enlightening, and yet entertaining shared task.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R. F. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Root</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>The C@merata Task at MediaEval 2015: Natural language queries on classical music scores</article-title>
          .
<source>In MediaEval 2015 Workshop</source>
          , Wurzen, Germany, September 14-15,
          <year>2015</year>
          . http://ceur-ws.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crawford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Root</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>The C@merata Task at MediaEval 2014: Natural language queries on classical music scores</article-title>
.
          <source>In MediaEval 2014 Workshop</source>
          , Barcelona, Catalunya, Spain, October 16-17,
          <year>2014</year>
          , CEUR-WS.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
<string-name>
            <surname>Wan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>The CLAS System at the MediaEval 2014 C@merata Task</article-title>
          .
          <source>In the Working Notes Proceedings of MediaEval 2014 Workshop</source>
          . Barcelona, Catalunya, Spain,
          <source>October 16-17</source>
          ,
          <year>2014</year>
          , CEUR-WS.org
        </mixed-citation>
      </ref>
      <ref id="ref4">
<mixed-citation>
          [4]
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2009</year>
          .
          <source>Natural Language Processing with Python</source>
          . O'Reilly Media Inc.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>