<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikhil Kini</string-name>
          <email>nikhil.kini@tcs.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tata Consultancy Services Ltd., Innovation Labs</institution>,
          <addr-line>Thane</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <fpage>16</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>We describe a system to address the MediaEval 2014 C@merata task of natural language queries on classical music scores. Our system first tokenizes the question, tagging its musically relevant features using pattern matching. In this stage, suitable word replacements are made in the question based on a list of synonyms. From the tokenized sentence we infer the question type using a set of handwritten rules, and then search the input music score, based on the question type, to find the requested musical features. MIT's music21 library [2] is used for indexing, accessing and traversing the score.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2. APPROACH</title>
      <p>Figure 1 presents the main modules of our system. Since we treat
the problem as one of natural language understanding (of the
question) and searching (through the MusicXML score), we define a set
of question classes based on the searchable musical features, and
propose a specific search method for each type of question. The
main operations performed by our system are described in the following subsections.</p>
    </sec>
    <sec id="sec-2">
      <title>2.1 Identifying tokens in the question</title>
      <p>In the tokenizing step, words representing musically important
features are marked, or tokenized. We use three- or four-letter markers for
the token classes. After tokenization, the sentence contains
token-value pairs, each pair enclosed in parentheses with the token
and its value separated by a comma. For example, "quarter note then half note
then quarter note in the tenor voice" is output as "(DUR, quarter
note) (SEQ, then) (DUR, half note) (SEQ, then) (DUR, quarter
note) in the (PRT, tenor voice)". As another example, "melodic
octave" becomes "(HRML, melodic) (INT, octave)".</p>
    </sec>
    <sec id="sec-3">
      <title>2.2 Synonyms List</title>
      <p>A list of synonyms is consulted during tokenizing to substitute
words that refer to the same feature. This serves two purposes: 1)
covering the many ways of asking for the same feature, and 2)
standardizing these variants so that specifying the subsequent modules
becomes simpler. The list of synonyms can be updated as new ways of
asking for the same feature are discovered when users actually query
the system.</p>
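      <p>A minimal sketch of this substitution step follows. The paper does not publish its synonym list, so the mappings here (British note names to their American canonical forms) are illustrative assumptions.</p>

```python
import re

# Hypothetical synonym table mapping variants to a canonical form;
# the system's actual list is not published.
SYNONYMS = {
    "crotchet": "quarter note",
    "minim": "half note",
    "quaver": "eighth note",
    "semibreve": "whole note",
}

def normalize(question: str) -> str:
    """Substitute synonyms so later modules see one canonical phrasing."""
    out = question
    for variant, canonical in SYNONYMS.items():
        out = re.sub(rf"\b{variant}\b", canonical, out)
    return out
```

<p>Running normalization before tokenization means the tokenizer's patterns only need to cover the canonical forms.</p>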
    </sec>
    <sec id="sec-4">
      <title>2.3 Inferring the question type</title>
      <p>The tokenized output (with synonym substitutions applied) is the input
to the module that infers the question type. A handcrafted set of
rules is used to guess what type of question is asked based on
the constituent tokens (see section 2.4). Looking at all questions
available to us so far (task description, training set, test set), we
specify the following types of questions: simple note, note with
expression, interval (harmonic), interval (melodic), lyrics, extrema
(highest or lowest note), time signature, key signature, cadence,
triads, texture, bar with dynamics, consecutive notes, and combinations
of the above.</p>
    </sec>
    <sec id="sec-5">
      <title>2.4 Question rules</title>
      <p>Based on the tokens present in the question phrase, we can write
rules to guess the type of the question. For simple questions made
up of only one elementary question type this is straightforward. For
phrases that combine several elementary question types, some
parsing capability might be necessary. We will address this in
future work.</p>
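      <p>For the straightforward single-type case, such rules can be sketched as an ordered list of token-set conditions. The actual rules are not published; the conditions below are illustrative assumptions using the token classes shown in section 2.1, checked from most to least specific.</p>

```python
import re

# Hypothetical handcrafted rules: (required token set, question type).
# Ordered from most to least specific so the first match wins.
RULES = [
    ({"HRML", "INT"}, "interval"),
    ({"DUR", "SEQ"}, "consecutive notes"),
    ({"DUR"}, "simple note"),
]

def infer_question_type(tokenized: str) -> str:
    """Guess the question type from the token classes present."""
    present = set(re.findall(r"\((\w+),", tokenized))
    for required, qtype in RULES:
        if required <= present:  # all required tokens are present
            return qtype
    return "unclassified"
```

<p>Rule ordering is what keeps a "consecutive notes" phrase from being mistaken for a plain "simple note" question, since both contain DUR tokens.</p>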
    </sec>
    <sec id="sec-6">
      <title>2.5 Search scope</title>
      <p>An important part of getting the right answer is limiting the search
scope. For example, in the question "A sharp in the Treble clef",
we are not looking for just any A#, but specifically one in the treble
clef. Our tokens PRT and CLF can be used to scope the search:
we either look only in these parts during searching, or we keep as
answers only those search results that fall within the search scope.</p>
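      <p>The second strategy, filtering results after the fact, can be sketched as follows. The field name <monospace>part</monospace> on each candidate match is an assumption for illustration.</p>

```python
# Sketch of post-hoc scope filtering: keep only candidate matches
# whose part matches the scope requested in the question. The match
# record layout (a dict with a "part" field) is an assumption.
def filter_by_scope(matches, part=None):
    """Drop matches outside the requested part; no part means no filter."""
    if part is None:
        return matches
    return [m for m in matches if m["part"] == part]
```

<p>Restricting the search up front avoids scanning irrelevant parts at all, while filtering afterwards keeps the search code uniform; the two are interchangeable for correctness.</p>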
    </sec>
    <sec id="sec-7">
      <title>2.6 Searching for the answer</title>
      <p>The last step is searching the MusicXML score for the identified
token or token combination. This step is still a work in progress. We
make extensive use of music21's capabilities.</p>
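      <p>As one concrete case, a "consecutive notes" query such as "quarter note then half note then quarter note" reduces to finding a duration sequence in the flat note index of section 2.7. This is a sketch over that index, with durations in quarter lengths and the <monospace>"length"</monospace> field name assumed.</p>

```python
# Sketch: locate every position in a flat note index where a given
# sequence of durations (in quarter lengths) occurs. The index is the
# per-note list described in section 2.7; field names are assumptions.
def find_duration_sequence(index, durations):
    """Return start positions of each occurrence of the sequence."""
    hits = []
    n = len(durations)
    for i in range(len(index) - n + 1):
        window = index[i:i + n]
        if [note["length"] for note in window] == durations:
            hits.append(i)
    return hits
```

<p>Each hit position maps back to a bar and offset through the index entry, which is what the task requires as an answer.</p>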
    </sec>
    <sec id="sec-8">
      <title>2.7 Score index</title>
      <p>This is a list of all the notes in the score, stored with the following
associated information for each note: note name, note letter,
accidental, pitch class, note octave, bar, offset, note length, part
number, part id, and whether the element is a rest or a note. (This
terminology is as defined in music21.)</p>
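      <p>One entry of such an index can be pictured as a plain record; the field names follow the list above, while the concrete values and the <monospace>"Soprano"</monospace> part id are illustrative assumptions.</p>

```python
# Sketch of one score-index entry as a dictionary; in practice each
# field would be populated from music21's note objects while
# traversing the parsed score.
def make_index_entry(note_name, letter, accidental, pitch_class,
                     octave, bar, offset, length, part_number,
                     part_id, is_rest):
    return {
        "name": note_name, "letter": letter, "accidental": accidental,
        "pitch_class": pitch_class, "octave": octave, "bar": bar,
        "offset": offset, "length": length, "part_number": part_number,
        "part_id": part_id, "is_rest": is_rest,
    }

# Example: an A#4 quarter note at offset 2.0 in bar 3 of part 0.
entry = make_index_entry("A#", "A", "#", 10, 4, bar=3, offset=2.0,
                         length=1.0, part_number=0, part_id="Soprano",
                         is_rest=False)
```

<p>Precomputing this flat list means every search type reads the same structure instead of re-traversing the music21 score graph per query.</p>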
    </sec>
    <sec id="sec-9">
      <title>3. RESULTS AND DISCUSSION</title>
      <p>Upon release of the results, we saw that the organizers had also
used a classification scheme for the questions. Reconciling the
organizers' question types with ours, we found that, as far as the
test questions go, we had all possibilities covered.</p>
      <p>Table 1 shows beat and measure precision and recall scores for the
results produced by our system on the test set. The strongest performance
is seen in the 'simple notes' category (simple pitch, simple length,
pitch and length). This is no surprise, as these question phrases are
the easiest to handle. Perf_spec questions are simple note with
expression questions (involving, for example, a mordent or a
trill). Word_spec is pitch/length occurring over a certain word in
the lyrics. Although these were not handled by our
implementation, some results were returned because the system
fell back to the simple note type, which explains the non-zero
precision and recall; for example, "F trill" returns all F notes.
Followed_by is equivalent to consecutive notes. Melodic_interval
is a type in our system too, and the system performs decently on
both these types.</p>
      <p>Although search was not implemented for harmonic_interval,
cadence_spec, triad_spec and texture_spec, nearly all questions
of these types were correctly classified by our system. No
answers were returned for these question types, which results
in the zero scores seen in the table. Only 8 questions remained
unclassified in the test data.</p>
    </sec>
    <sec id="sec-10">
      <title>4. CONCLUSION</title>
      <p>The system implemented based on the specifications in this paper
performs decently on single musical feature retrieval. A study of
the errors in this implementation might take the precision and
recall for such simple types to 1, and this will be the
aim of the next cycle of development.</p>
      <p>While our system performs well on simple question phrases,
the more complex question phrases still need work. As a question
grows more complicated to include multiple musical features, we
will need a more sophisticated parsing strategy to identify it.
The specification of the system may need to be revisited to take
all the possibilities into account: its scope is limited mainly to
what we have observed in the task description and the training set,
which are by no means exhaustive of the types of queries that can
be asked.</p>
    </sec>
    <sec id="sec-11">
      <title>5. ACKNOWLEDGMENTS</title>
      <p>Many thanks to Dr. Sunil Kumar Kopparapu, my supervisor, for
his help in shaping this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>R.</given-names> <surname>Sutcliffe</surname></string-name>,
          <string-name><given-names>T.</given-names> <surname>Crawford</surname></string-name>,
          <string-name><given-names>C.</given-names> <surname>Fox</surname></string-name>,
          <string-name><given-names>D. L.</given-names> <surname>Root</surname></string-name>
          and
          <string-name><given-names>E.</given-names> <surname>Hovy</surname></string-name>.
          <article-title>The C@merata Task at MediaEval 2014: Natural language queries on classical music scores</article-title>.
          In <source>MediaEval 2014 Workshop</source>, Barcelona, Spain, October 16-17,
          <year>2014</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><surname>Cuthbert</surname>, <given-names>M. S.</given-names></string-name>
          &amp;
          <string-name><surname>Ariza</surname>, <given-names>C.</given-names></string-name>
          <year>2010</year>.
          <article-title>music21: A toolkit for computer-aided musicology and symbolic music data</article-title>.
          In <source>ISMIR 2010</source>, Utrecht, Netherlands, August 9-13, 2010, pp.
          <fpage>637</fpage>-<lpage>642</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><surname>Allam</surname>, <given-names>A. M. N.</given-names></string-name>
          &amp;
          <string-name><surname>Haggag</surname>, <given-names>M. H.</given-names></string-name>
          <year>2012</year>.
          <article-title>The Question Answering Systems: A Survey</article-title>.
          <source>International Journal of Research and Reviews in Information Sciences (IJRRIS)</source>,
          <volume>2</volume>(<issue>3</issue>).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Gabriel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Large data sets &amp; recommender systems: A feasible approach to learning music</article-title>
          .
          <source>In Proceedings of the Sound and Music Computing Conference</source>
          <year>2013</year>
          ,
          <string-name>
            <surname>SMC</surname>
          </string-name>
          <year>2013</year>
          , Stockholm, Sweden, p.
          <fpage>701</fpage>
          -
          <lpage>706</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Downie</surname>
            ,
            <given-names>J. S.</given-names>
          </string-name>
          <year>1999</year>
          .
          <article-title>Evaluating a simple approach to music information retrieval: Conceiving melodic n-grams as text (Doctoral dissertation</article-title>
          , The University of Western Ontario).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Viro</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Peachnote: Music Score Search and Analysis Platform</article-title>
          .
          <source>In ISMIR</source>
          <year>2011</year>
          , Miami,
          <source>Florida (USA) October 24-28</source>
          ,
          <year>2011</year>
          (
          <volume>359</volume>
          -
          <fpage>362</fpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Ganseman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scheunders</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>D'haes</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <year>2008</year>
          .
          <article-title>Using XQuery on MusicXML Databases for Musicological Analysis</article-title>
          .
          <source>In ISMIR</source>
          <year>2008</year>
          , Philadelphia,
          <source>Pennsylvania (USA) September 14-18</source>
          ,
          <year>2008</year>
          433-
          <fpage>438</fpage>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>