<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TNKG at the MediaEval 2015 C@merata Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikhil Kini</string-name>
          <address>
            <addr-line>Mumbai, India</addr-line>
          </address>
          <email>nikhil.n.kini@gmail.com</email>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper explores a method for answering natural language musicological queries against electronic music scores, with the aim of developing a complete grammar specification in the domain. Our system takes a three-step approach to finding the answers to the queries: first, it replaces the musical features in the question with our own tags in a key-value form, and also replaces words in the question with equivalents from a synonym list. This normalizes the natural language and minimizes the search space. Second, based on the word sequence in the normalized question, the type of question is inferred using handwritten rules. Finally, functions search through the MusicXML for the answer based on the question type.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        The C@merata task [
        <xref ref-type="bibr" rid="ref1 ref2">1,2</xref>
        ] aims at interpreting natural
language queries on electronic music scores to aid musicologists
in locating occurrences of musical features of interest at precise
points in Western classical sheet music. The task is of special
interest to students and professors of music theory, to the music
retrieval community, and to musicologists studying similarity in the
works of composers.
      </p>
      <p>
        This paper continues the work presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], approaching
the problem of natural language understanding in the following
manner: a) convert the NL query into tokens, b) parse the
tokenized query to identify the type of query, c) use search
functions to search the MusicXML for answers based on the type
of query.
      </p>
      <p>This is the second year of the C@merata task as well as of
our participation. We found the questions this year were more
complex than last year, at least linguistically, which made
adjustments and improvements to the system necessary. There
were on average approximately 7 words per question, up from
approximately 3 words per question in 2014. In particular, the
vocabulary of questions increased to 253 unique words from 116.
Consequently, we updated our synonym list, token-searching
regular expressions, and question-inference regular expressions to
accommodate this increase in complexity.</p>
      <p>In this paper we show how we have extended the token set, and
we also briefly describe work in progress on a grammar
capable of generating the language defined by the set of available
questions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. APPROACH</title>
      <p>Our approach remains the same as last year's; the outline is
shown in Figure 1. Musical features in the question are identified
by the token identification module, and the synonyms list holds the
words to be replaced. The tokenized sentence is
analyzed by the question-type inference module, which uses the
question rules to infer what kind of question the query might be.
Next, a Python function corresponding to the inferred question
type searches through the MusicXML to arrive at an answer.</p>
      <p>A brief explanation of the modules and improvements in
them follows.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1 Tokenizing and Synonyms</title>
      <p>In the tokenizing step, we mark musically relevant features
with three- or four-letter markers that we call the token class. For
example, quarter note is a duration and is marked DUR, while octave
is an interval and is marked INT. Thus, "quarter note then half note
then quarter note in the tenor voice" is output as "(DUR, quarter note)
(SEQ, then) (DUR, half note) (SEQ, then) (DUR, quarter note) in
the (PRT, tenor voice)". Changes to the module this year included:
1) new musical features, culled from the 2014 test set, were added,
such as the double sharp ‘x’ alongside sharps and
flats; 2) intervals such as the seventeenth and nineteenth were added; 3)
SCP (scope) was added as a token class to explicitly mark the scope of
the search.</p>
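      <p>As an illustration, the tokenizing and synonym-substitution steps can be sketched in a few lines of Python. The synonym table and token patterns below are small hypothetical excerpts, not the system's actual lists:</p>

```python
# Sketch of tokenizing with synonym substitution (hypothetical tables,
# not the system's actual synonym list or token patterns).
import re

# Map surface variants to a canonical form before tokenizing.
SYNONYMS = {"crotchet": "quarter note", "minim": "half note"}

# Token classes: short markers for musical features (DUR, INT, PRT, SEQ, ...).
TOKEN_PATTERNS = [
    ("DUR", r"(?:quarter|half|whole|eighth|sixteenth) note"),
    ("INT", r"octave|seventeenth|nineteenth"),
    ("PRT", r"(?:soprano|alto|tenor|bass) voice"),
    ("SEQ", r"\bthen\b"),
]

def tokenize(question):
    """Normalize synonyms, then tag musical features as (CLASS, text)."""
    for variant, canonical in SYNONYMS.items():
        question = re.sub(r"\b%s\b" % variant, canonical, question)
    combined = "|".join("(?P<%s>%s)" % (cls, pat) for cls, pat in TOKEN_PATTERNS)
    return re.sub(combined, lambda m: "(%s, %s)" % (m.lastgroup, m.group(0)),
                  question)

print(tokenize("crotchet then minim then crotchet in the tenor voice"))
# (DUR, quarter note) (SEQ, then) (DUR, half note) (SEQ, then)
# (DUR, quarter note) in the (PRT, tenor voice)
```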
    </sec>
    <sec id="sec-4">
      <title>2.2 Question rules and inferring the question</title>
      <p>The tokenized output (with synonym-list substitution) is the
input to the module that infers the question type. A handcrafted
set of rules is used to guess what type of question is asked, based
on the constituent tokens. While we introduced no new question
types, we worked on expanding the implementation of the
"Combination of the above" type.</p>
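      <p>A minimal sketch of such handwritten rules follows; the rule patterns and question-type names here are illustrative assumptions, not the actual rule set:</p>

```python
# Sketch of inferring the question type from the tokenized question
# (hypothetical rule patterns and type names, not the actual rule set).
import re

# Each rule maps a regex over the token-class sequence to a question type.
# Rules are checked in order; the first match wins.
QUESTION_RULES = [
    (r"^DUR( SEQ DUR)+( PRT)?$", "melodic_sequence"),
    (r"^DUR PRT$", "note_in_part"),
    (r"^DUR$", "single_note"),
]

def infer_question_type(token_classes):
    """token_classes: e.g. ['DUR', 'SEQ', 'DUR'] from the tokenizer output."""
    signature = " ".join(token_classes)
    for pattern, qtype in QUESTION_RULES:
        if re.search(pattern, signature):
            return qtype
    return "unknown"

print(infer_question_type(["DUR", "SEQ", "DUR", "SEQ", "DUR", "PRT"]))
# melodic_sequence
```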
      <p>
        We made an attempt at grammar inference of the natural
language queries. Grammar inference (or grammar induction) is
the process of learning a formal grammar by observing the
samples of language that are presented [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This is our long-term
goal for the task: to parse the question to arrive at a question
type, rather than inferring it through regular-expression rules, and to be able to
dynamically extend the queries that are understood by the system,
thereby making it a self-learning system. Here we briefly discuss
our present attempts and the future work we intend to do on
grammar inference.
      </p>
      <p>
        Consider a short demonstrative example
where the language consists only of durations and pitch values,
and queries such as:
      </p>
      <p>Q1: Quaver</p>
      <p>Q2: Whole note C#</p>
      <p>A grammar for such a language is:</p>
      <preformat>S        → NOTE
NOTE     → P D P
P        → PTCH_CLS OCTV ACCDNTL | ϵ
D        → breve | semibreve | quaver | whole | half | quarter … | ϵ
PTCH_CLS → A | B | C | D | E | F | G
OCTV     → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 … | ϵ
ACCDNTL  → # | ## | b | bb | x | flat | sharp | double sharp | ϵ</preformat>
      <p>A total of 419 questions were available, sourced from
examples in the task specifications, the training sets, the sample
questions, and the 2014 gold-standard test set. These 419 questions
yield a vocabulary of 253 words, which is a subset of the terminal
symbols in the grammar. The full set would also contain the
synonyms and musical terms that do not appear in our sample
questions. It is important to note that there may be out-of-vocabulary
words, from sources such as lyrics, for which it is
difficult to account in the grammar. We use the UNK token to
mark out-of-vocabulary words and, as far as possible, include it in
the production rules. For example, "minim on the word 'Der'" can
be represented by the production rules:</p>
      <preformat>NOTE_ON_LYRICS → NOTE SCOPE LYR
NOTE           → P D P (same as above)
SCOPE          → on the word | after | over the lyric
LYR            → UNK</preformat>
      <p>We carried out experiments to see whether
production rules could be learnt from the questions available to
us. Unfortunately, this did not succeed. At the time of writing this
paper, we know of no software that can automatically infer a grammar
from positive examples alone. We tried two tools, GI-Toolbox
(https://code.google.com/p/gitoolbox/) and Sequitur
(http://www.sequitur.info/), but could not adapt them to
the alphabet of musicological terms in time, so we used a
manually created grammar. Our objective is to parse the tokenized
representation of each query into a question type; each
question type can then be mapped to a retrieval function that
provides answers. The parameters of the retrieval functions are the
specific values of the non-terminal symbols in the parse tree.</p>
      <p>Once the formal grammar is written, a parser generated with
ANTLR (http://www.antlr.org/) can be used to parse
sentences in the language. The parser is run on the input
natural language queries to obtain a parse tree; the root of the
parse tree is the question type.</p>
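      <p>For illustration, a toy NOTE grammar of this kind (NOTE → P D P, where either P may be empty) is small enough to recognize with a few lines of hand-written Python. This is a sketch of the idea, not the ANTLR-generated parser, and the token format is an assumption:</p>

```python
# Sketch of a hand-written recognizer for a toy grammar
# (NOTE -> P D P, either P possibly empty). Not the ANTLR-generated parser.
import re

DURATIONS = {"breve", "semibreve", "quaver", "whole", "half", "quarter"}

def parse_pitch(tokens, i):
    """P: a pitch class with optional accidental and octave, written here
    as one token such as 'C#', 'Eb4', or 'G'; or the empty production."""
    if i < len(tokens):
        m = re.fullmatch(r"([A-G])(#{1,2}|b{1,2}|x)?([0-9]?)", tokens[i])
        if m:
            pitch = {"pitch_class": m.group(1), "accidental": m.group(2),
                     "octave": m.group(3) or None}
            return pitch, i + 1
    return None, i  # the empty production

def parse_note(tokens):
    """NOTE -> P D P: a duration with an optional pitch on either side."""
    p1, i = parse_pitch(tokens, 0)
    if i < len(tokens) and tokens[i].lower() in DURATIONS:
        duration = tokens[i].lower()
        i += 1
        if i < len(tokens) and tokens[i].lower() == "note":  # e.g. "whole note"
            i += 1
        p2, i = parse_pitch(tokens, i)
        if i == len(tokens):
            return {"duration": duration, "pitch": p1 or p2}
    return None  # not in the language of the toy grammar

print(parse_note(["Quaver"]))               # Q1
print(parse_note(["Whole", "note", "C#"]))  # Q2
```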
    </sec>
    <sec id="sec-5">
      <title>2.3 Search scope</title>
      <p>Search scope is important for increasing the precision of the
system: it helps retrieve not just the requested musical
feature, but that feature within the context specified in the query. This
year's approach to setting questions also made this scope explicit,
specifying basic question types and
modifications of these questions in six different ways:
performance, instrument, clef, time, measure/bar, and key. We
updated our system to include the SCP token so that scope could
be recognized and search could be restricted to only the part of the
music that satisfies the scope.</p>
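      <p>The idea of scope-restricted search can be sketched as follows, where a parsed scope becomes a predicate on candidate hits; the hit record format here is a hypothetical stand-in for the system's data structures:</p>

```python
# Sketch of scope-restricted search: the SCP token becomes a predicate that
# candidate hits must satisfy. The hit record format here is hypothetical.

def scope_predicate(scope):
    """Build a filter from a parsed scope, e.g. {'part': 'tenor'}."""
    def admits(hit):
        return all(hit.get(key) == value for key, value in scope.items())
    return admits

hits = [
    {"measure": 3, "part": "tenor", "time_sig": "3/8", "note": "C#4"},
    {"measure": 7, "part": "alto", "time_sig": "3/8", "note": "C#4"},
]

# Restrict matches for "C#" to the tenor part only.
in_scope = list(filter(scope_predicate({"part": "tenor"}), hits))
print(in_scope)  # only the measure-3 tenor hit remains
```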
    </sec>
    <sec id="sec-6">
      <title>2.4 Searching for the answer</title>
      <p>
        Once the question type is inferred, the last step is to search
the MusicXML score for the identified token or token combination.
We added harmonic analysis of the score, which was
missing last year. Updates to the music21 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] library enabled us to
find "verticalities" in the score, that is, information about the
notes active at a particular time-interval slice. These were analyzed to
search for requested harmonic features in the score.
      </p>
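      <p>The notion of a verticality, the set of notes sounding at a given time slice, can be illustrated independently of music21; the (start, duration, pitch) triples below are a hypothetical format:</p>

```python
# Illustration of a "verticality": the set of notes sounding at a given
# offset. The (start, duration, pitch) triples are a hypothetical format,
# standing in for what the music21 library provides.

def verticality(notes, offset):
    """Return the pitches active at `offset`."""
    return sorted(pitch for start, duration, pitch in notes
                  if start <= offset < start + duration)

notes = [(0.0, 2.0, "C4"), (0.0, 2.0, "E4"), (1.0, 1.0, "G4"), (2.0, 1.0, "F4")]
print(verticality(notes, 1.5))  # ['C4', 'E4', 'G4']
```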
    </sec>
    <sec id="sec-7">
      <title>2.5 Score index</title>
      <p>This is a list of all the notes in the score, stored with
metadata about each note and implemented as a reverse dictionary
for quick retrieval. It is unchanged from last
year.</p>
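      <p>A reverse dictionary of this kind might look like the following sketch; the record format and index keys are assumptions for illustration:</p>

```python
# Sketch of a score index as a reverse dictionary: each (attribute, value)
# pair maps to the positions where it occurs, so repeated lookups avoid
# rescanning the score. The record format is hypothetical.
from collections import defaultdict

def build_score_index(score_notes):
    """score_notes: dicts with 'pitch', 'duration', 'measure', and 'beat'."""
    index = defaultdict(list)
    for n in score_notes:
        location = (n["measure"], n["beat"])
        index[("pitch", n["pitch"])].append(location)
        index[("duration", n["duration"])].append(location)
    return index

score_notes = [
    {"pitch": "C#4", "duration": "quarter", "measure": 1, "beat": 1.0},
    {"pitch": "D4", "duration": "quarter", "measure": 1, "beat": 2.0},
    {"pitch": "C#4", "duration": "half", "measure": 2, "beat": 1.0},
]

index = build_score_index(score_notes)
print(index[("pitch", "C#4")])  # [(1, 1.0), (2, 1.0)]
```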
    </sec>
    <sec id="sec-8">
      <title>3. RESULTS AND DISCUSSION</title>
      <p>We did not expect to perform very well on the retrieval task
since there was insufficient time to complete the search functions
within our system by the deadline. Results from the submitted run
show that though we have decent beat recall and measure
recall, the precision is poor. The beat precision and measure
precision values for the entire run are 0.0605 and 0.0727
respectively, and the beat recall and measure recall are 0.488,
0.586. The best recall is achieved for the 1_melod type where a
beat recall of 0.865 and measure recall of 0.908 were seen. Forty
questions were accurately answered, both beat and measure recall
and precision being 1. Twenty-nine of these were of the simple
1_melod type. However, questions like “dotted quarter note Eb
followed by sixteenth note F”, “G#3 in the "III" part” and “dotted
quarter note in 3/8 time” were also answered exactly. Further, 40
more questions were answered with a beat recall and measure
recall of 1, but precision over a range of 0.004 to 0.9. Most of
these were 1_melod modified by perf, instr, clef or time. Clearly,
the system in its current iteration is decent at recognizing notes,
but work needs to be done on other aspects of musical concept
retrieval.</p>
    </sec>
    <sec id="sec-9">
      <title>4. CONCLUSION</title>
      <p>Although our results did not improve over last year's, we
believe progress has been made towards our goal of reaching a
formal grammar specification of natural language queries on
Western sheet music. We will continue with the implementation
of the pending retrieval functions and the parser for the grammar,
along with improvements to the grammar, and are confident of
achieving strong results next time.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R. F. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Root</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>The C@merata Task at MediaEval 2015: Natural language queries on classical music scores</article-title>
          .
          <source>In MediaEval 2015 Workshop</source>
          , Wurzen, Germany, September 14-15,
          <year>2015</year>
          . http://ceur-ws.org/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crawford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Root</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>The C@merata Task at MediaEval 2014: Natural language queries on classical music scores</article-title>
          .
          <source>In MediaEval 2014 Workshop</source>
          , Barcelona, Spain, October 16-17,
          <year>2014</year>
          . http://ceur-ws.org/Vol-1263/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Kini</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>TCSL at the MediaEval 2014 C@merata Task</article-title>
          .
          <source>In MediaEval 2014 Workshop</source>
          , Barcelona, Spain, October 16-17,
          <year>2014</year>
          . http://ceur-ws.org/Vol-1263/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>De la Higuera</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>Grammatical inference: learning automata and grammars</article-title>
          . Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Cuthbert</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ariza</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>music21: A toolkit for computer-aided musicology and symbolic music data</article-title>
          .
          <source>In ISMIR 2010</source>
          , Utrecht, Netherlands, August 9-13,
          <year>2010</year>
          , 637-642.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>