<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The CLAS System at the MediaEval 2017 C@merata Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stephen Wan CSIRO Data</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Australia Stephen.Wan@csiro.au</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>In this paper, we describe the 2017 CLAS system as entered into the C@merata shared task. This year, our aim was to use the challenge as a case study of how one manages natural language queries to structured data, and so we focused on the use of a NoSQL database for managing the retrieval of passages from musical scores. We also extended the 2015 CLAS system to handle queries about harmonies between specific parts, repetition of sequences, and thematic queries about sequences and imitation. Given that the queries were quite diverse, we explored the use of paraphrase methods to transform queries into a canonical form, where possible, which might then be parsed using a feature-based CFG. Our system achieved a measure precision and recall of 0.122 and 0.26, respectively.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>
        In this paper, we describe the CLAS submission for the 2017
C@merata shared task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As in previous years, the task is one
where a system must find portions of a musical score that match a
natural language query. For example, the query “two eighth
notes, an eighth note rest, three eighth notes, an eighth note rest
and three eighth notes in measures 68-80, all in the Violoncello”
(example from the 2016 data set) might be used to identify a
portion of the score, as specified by the starting and ending bar
numbers as well as the beat offsets in a manner prescribed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
such as:
      </p>
      <p>Passage:
start_bar="78"
start_beat_type="4"
start_beats="4"
start_divisions="1"
start_offset="1"
end_bar="79"
end_beat_type="4"
end_beats="4"
end_divisions="1"
end_offset="1"</p>
      <p>
        Our general approach in 2017 is consistent with the earlier
CLAS submissions in 2014 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and 2015 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which viewed the
task as a Q&amp;A problem with natural language queries posed in a
controlled language. We view the task as a natural language
query task to structured data, where the data has a temporal
element and can be decomposed into multiple aligned streams of
data. In terms of music, these streams correspond to different
musical instruments or parts, temporally aligned. This year,
however, we focused on using the shared task as a case study on
natural language queries to structured data using database
technologies.
      </p>
      <p>
        The 2017 submission builds on the 2015 CLAS entry, which
uses a feature-based Context-Free Grammar (CFG) to specify the
controlled language for C@merata music queries. Parsing using
NLTK [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provides a feature structure corresponding to the key
semantic elements of the query which is then used to retrieve
results. In the 2015 CLAS system, the feature structure was used
to find the matching events in the musical score. The Music21 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
library was used to transform the XML version of a score into an
array of music events, each represented with a set of
attribute-value features. Events were then retrieved for a query, using
feature unification between the query feature structure and the
features of events.
      </p>
      <p>This year, we varied our approach in the following ways:</p>
      <p>1. Instead of making sequential passes through the
music events for a piece of music, we used the NoSQL
database MongoDB to store and retrieve musical
events using MongoDB queries based on
attribute-value sets.</p>
      <p>2. We enhanced the 2015 feature-based CFG to
recognise queries about:
a. multiple parts
b. repetition
c. sequences and themes</p>
      <p>3. We used a paraphrase or near-paraphrase back-off
stage if the original query could not be parsed.</p>
      <p>Our system focuses on queries based on music theory, as
opposed to those that would rely on musical interpretation. In
preparation for the 2017 shared task, we used the gold standard
data from 2014-2016 as software development regression tests.
This year, our CLAS system achieved a measure precision
and recall of 0.122 and 0.26, respectively.</p>
    </sec>
    <sec id="sec-2">
      <title>3 DATA STORAGE</title>
    </sec>
    <sec id="sec-3">
      <title>3.1 Indexing with MongoDB</title>
      <p>Instead of making a single pass of a music score (iterating through
all possible notes) to find matching events using feature
unification, we used a database to store and retrieve all music
events.</p>
      <p>Specifically, we used four tables:
1. Titles (and global score attributes)
2. Musical Events, one table per musical score
3. Sequences, one table per musical score
4. Analysis, one table per musical score</p>
      <p>The Titles (and global score attributes) database table maps
each XML music score filename to an internal ID, in addition to
keeping global information (determined using Music21 functions)
such as the time signature, key signature, and the number of parts.
Examples of such records are presented in Figure 1:
{
  "_id": ObjectId("598ddda11d41c896ba36332b"),
  "name": "air_from_handels_water_music_suite.xml",
  "time_signature": { "1": [4, 4] },
  "key": "F major",
  "parts": 4
}
{
  "_id": ObjectId("598dde0b1d41c896ba36364e"),
  "name": "and_the_glory_of_the_lord_from_handels_messiah.xml",
  "time_signature": { "1": [3, 4] },
  ...
}</p>
      <p>For each musical score, we dynamically created a Music
Events table. The table name is the unique identifier specified in
the Titles table. The table for a score stores music events which
could either be of a note or a chord type. Events contain offsets
and other attributes. For note events, for a score consisting of
multiple parts, the notes of each part were read sequentially to
form the note events. For chord events, each part was also
transformed into a series of Music21 Chord objects using the
corresponding function in the Music21 library. Metadata for each
chord was then stored in the table. Finally, the same functionality
for creating chords at each offset was performed on the entire
score, and then this list of chord objects was indexed. An
example of a note record is presented in Figure 2 and a chord
record in Figure 3.</p>
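      <p>To make the indexing step concrete, the sketch below shows how note events might be flattened into MongoDB-ready records. The field names mirror the event records shown in the figures, but the helper names (make_note_event, index_score) and the exact schema are illustrative assumptions, not the system's actual code.</p>

```python
# Illustrative sketch only: field names mirror the event records shown in
# the figures; make_note_event and index_score are hypothetical helpers.

def make_note_event(part, bar, offset, note, octave, length, ordering):
    """Build one attribute-value record for a note event."""
    return {
        "stream": part,              # part name, e.g. "Violoncello"
        "bar": bar,
        "offset": offset,            # beat offset within the bar
        "note": note,                # pitch name without octave
        "octave": octave,
        "name": f"{note}{octave}",   # absolute pitch, e.g. "C3"
        "length": length,            # duration in quarter lengths
        "ordering": ordering,        # global position, used for sequences
    }

def index_score(collection, events):
    """Insert all events for one score into its per-score collection.

    `collection` would be db[score_id] given a PyMongo connection.
    """
    collection.insert_many(events)

events = [
    make_note_event("Violoncello", 78, 1, "C", 3, 0.5, 0),
    make_note_event("Violoncello", 78, 1.5, "D", 3, 0.5, 1),
]
```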
      <p>For some queries, chords (or harmonies) across specific parts
were required. To avoid computing every single permutation of
parts, the specific combination was checked for at query time, and
if it did not exist, the relevant parts were extracted from the XML,
merged into a temporary score and then the entire score
“chordified”, and the results indexed dynamically. Querying then
resumed as above.</p>
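      <p>The check-then-index logic for part combinations can be sketched as follows. The combination-key format and function names are illustrative assumptions; in the real system the build step merges the relevant parts, "chordifies" them with Music21, and indexes the result.</p>

```python
# Sketch of dynamic indexing for a specific combination of parts: the
# expensive chordify-and-index step runs only if the combination has not
# been indexed before.  Names and key format are illustrative.

def combo_key(parts):
    """Canonical key for a combination of parts, order-insensitive."""
    return "+".join(sorted(parts))

def ensure_combo_indexed(indexed_tables, parts, chordify_and_index):
    """Index chords for this part combination only if not already present."""
    key = combo_key(parts)
    if key not in indexed_tables:
        indexed_tables[key] = chordify_and_index(parts)
    return indexed_tables[key]

calls = []
tables = {}
ensure_combo_indexed(tables, ["Tenor", "Quintus"],
                     lambda p: calls.append(p) or "chord_table")
# same combination in a different order: no re-indexing occurs
ensure_combo_indexed(tables, ["Quintus", "Tenor"],
                     lambda p: calls.append(p) or "chord_table")
```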
      <p>
        The Sequences table for a score stores passages or sequences
detected in the music. These were found by segmenting series of
notes using rests, with a maximum sequence length of 8 bars.
      </p>
      <p>An example chord record:
{
  "_id": ObjectId("598ddda11d41c896ba36348a"),
  "name": "C3-dominant seventh chord",
  "bar": 3,
  "offset": 2,
  "length": 0.75,
  "key": "F major",
  "chord_fn": "5",
  "chord_match": 3,
  "inversion": 0,
  "bass": "C3",
  "root": "C3",
  "notes": ["B-", "E", "G", "C"],
  "notes_with_octave": ["B-4", "E4", "G3", "C3"],
  "ordered_pitch_classes": [0, 4, 7, 10],
  "root_fn": "5",
  "function": "5",
  "type": "dominant seventh chord",
  "raw_type": "dominant seventh chord",
  "ties": 4,
  "passing": true,
  "dynamic": null,
  "intervals": [0, 2, 3, 4, 5, 6],
  "interval_names": ["Minor Tenth", "Perfect Fifth", "Major Sixth",
    "Perfect Unison", "Major Tenth", "Diminished Fifth",
    "Minor Fourteenth"],
  "stream": "chords",
  "ordering": 19
}</p>
      <p>Sequences were represented with unique identifiers
representing the entirety of the passage, using each of three
possible key generators: the unique names of the notes in the
sequence and their lengths (but not their octave offsets), a variant
of this using relative displacement between notes in terms of
semitones, and a diatonic variant using interval classes. Each
sequence had an associated start and end point (specified as bar
and offset), which was stored in the table.</p>
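      <p>The three key generators described above can be sketched as follows. For brevity this assumes natural (unaltered) note names only; the function names and key formats are illustrative assumptions. A sequence is a list of (note_name, octave, length) tuples.</p>

```python
# Sketch of the three sequence key generators: exact note names and
# lengths (octave-insensitive), chromatic displacement in semitones,
# and diatonic displacement in scale steps.  Naturals only, for brevity.

PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
STEPS = "CDEFGAB"

def exact_key(seq):
    """Note names and lengths, ignoring octaves."""
    return "|".join(f"{name}:{length}" for name, _octave, length in seq)

def chromatic_key(seq):
    """Relative displacement between successive notes, in semitones."""
    pitches = [PITCH_CLASS[n] + 12 * o for n, o, _l in seq]
    return "|".join(str(b - a) for a, b in zip(pitches, pitches[1:]))

def diatonic_key(seq):
    """Relative displacement in diatonic steps (interval classes)."""
    degrees = [STEPS.index(n) + 7 * o for n, o, _l in seq]
    return "|".join(str(b - a) for a, b in zip(degrees, degrees[1:]))

seq = [("C", 4, 1.0), ("E", 4, 1.0), ("G", 4, 0.5)]
```

      <p>A transposed repetition of a passage changes its exact key but preserves its chromatic and diatonic keys, which is what allows transposed imitation between parts to be detected.</p>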
      <p>The Analysis table for each score keeps track of which
sequence unique identifiers occur in different parts of the score,
allowing some representation of thematic sequences (those that
appear in multiple parts). An example of a sequence from the
Analysis table is presented in Figure 4 and the corresponding
metadata from the Sequence table is presented in Figure 5.</p>
    </sec>
    <sec id="sec-4">
      <title>3.2 Querying the Database</title>
      <p>Each record in a table (aside from the Titles table) is an event
represented as a set of attribute-value features. Using the Python
programming language and the PyMongo library
(https://api.mongodb.com/python/current/), which provides
an interface to MongoDB (www.mongodb.com), queries are
essentially comparisons between dictionary objects. In this way,
the queries are very similar to the feature unification method in
the 2015 CLAS system. There is a subtle difference, however.
Feature unification allows under-specification on either of the two
structures being compared: they unify as long as there are no direct
contradictions. This is not the case with MongoDB queries. If a
query is overly specific and includes attributes not present in the
event, then a match is not possible. Thus, each feature structure
first had to be transformed into an equivalent NoSQL query to
account for this.</p>
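      <p>One way to emulate unification's tolerance of missing attributes, sketched below, is to match "equal or absent" for attributes not guaranteed on every event; this exploits the fact that a MongoDB query on a null value also matches documents lacking the field. The field partition and function name are illustrative assumptions, not the paper's exact transformation.</p>

```python
# Sketch: convert a query feature structure into a MongoDB filter that
# behaves like feature unification.  Core fields are assumed present on
# every event; other attributes are allowed to be absent.

CORE_FIELDS = {"note", "octave", "name", "length", "bar", "offset", "stream"}

def to_mongo_query(features):
    query = {}
    for attr, value in features.items():
        if attr in CORE_FIELDS:
            query[attr] = value                   # always stored on events
        else:
            # optional attribute: match the value OR a missing/null field,
            # mirroring unification's handling of under-specification
            query[attr] = {"$in": [value, None]}
    return query

query = to_mongo_query({"note": "C", "octave": 3, "dynamic": "p"})
```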
      <p>Querying for sequences of notes was performed by first using
a query for the first note of the sequence and then performing an
ordered series of queries, each checking if the event at the next
timestep corresponded to the relevant sequence note. This was
handled through a recursive function that worked its way down a
list of elements in the query.</p>
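      <p>A minimal sketch of that recursive check follows: after the first note is found, each remaining query step must match the event at the next timestep. The event representation (a dict keyed by a global "ordering" position) and field names are illustrative assumptions.</p>

```python
# Sketch of the recursive sequence check: steps[0] must match the event
# at position `start`, steps[1] the event at start + 1, and so on.

def match_sequence(events_by_ordering, start, steps):
    """Return True if the events from `start` onward satisfy `steps`."""
    if not steps:
        return True                      # every step matched
    event = events_by_ordering.get(start)
    if event is None:
        return False                     # ran off the end of the part
    wanted = steps[0]
    if any(event.get(k) != v for k, v in wanted.items()):
        return False                     # this timestep does not match
    return match_sequence(events_by_ordering, start + 1, steps[1:])

events = {0: {"note": "C", "length": 0.5},
          1: {"note": "D", "length": 0.5},
          2: {"note": "E", "length": 1.0}}
```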
      <p>Using MongoDB to provide search facilities greatly increased
the speed at which matches could be found, utilizing a database
index for direct lookup instead of making a sequential pass of the
score for each query.</p>
    </sec>
    <sec id="sec-5">
      <title>3.3 Forming NoSQL Queries</title>
      <p>If we are searching for a single note, then the query would be the
set of attribute-value features for that note as represented by a
Python dictionary. Such queries were simple to implement with a
NoSQL database like MongoDB. Others, however, were more
complex.</p>
      <p>Melodic intervals are treated as sequences. In this
case, for every note in each part, we search for the absolute pitch
and octave of the note that would correspond to the interval based
on the current note. Thus, for melodic intervals, “search” becomes
a single pass through the score.</p>
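      <p>The per-note target derivation can be sketched with simple semitone arithmetic. The interval table, restriction to natural note names, and helper name are illustrative assumptions for the sketch only.</p>

```python
# Sketch: given the current note, compute the absolute pitch the *next*
# note must have to form the queried melodic interval.  Naturals only.

SEMITONES = {"minor third": 3, "major third": 4, "perfect fourth": 5,
             "perfect fifth": 7, "octave": 12}
PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
NAME_OF = {v: k for k, v in PITCH_CLASS.items()}   # natural names only

def target_pitch(note, octave, interval, ascending=True):
    """Return (note_name, octave) of the required next note."""
    value = PITCH_CLASS[note] + 12 * octave
    value += SEMITONES[interval] if ascending else -SEMITONES[interval]
    # None for the name if the target is not a natural note in this sketch
    return NAME_OF.get(value % 12), value // 12
```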
      <p>For harmonic intervals, we search through the chord events,
which at indexing time are broken down into their component
interval classes between all note pairs in the chord. Each of these
intervals is stored as a list (with an equivalence
encoded for augmented fourths, diminished fifths, and tritones).
The MongoDB query is then a list membership test for the string
name of the interval of interest. We note that the analysis of the
chord into its component intervals is provided by Music21, and the
success of this method depends on that library’s ability to
correctly analyse the chord. For harmonic intervals and chords,
we added search constraints such as requiring at least two
notes in the chord event.</p>
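      <p>As a sketch, such a harmonic-interval filter might be expressed as the MongoDB query document below. The field names follow the chord record shown earlier; the exact constraints are illustrative assumptions.</p>

```python
# Sketch of a harmonic-interval filter as a MongoDB query document.

def harmonic_interval_query(interval_name):
    return {
        "stream": "chords",
        # "notes.1" exists only when the notes array has >= 2 elements,
        # i.e. the event really contains a chord
        "notes.1": {"$exists": True},
        # regex membership test over the interval_names list, so "fourth"
        # also matches an entry like "Perfect Fourth"
        "interval_names": {"$regex": interval_name, "$options": "i"},
    }

query = harmonic_interval_query("fourth")
```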
      <p>Chord queries specifying the exact notes of the chord
are implemented using tests for membership of the notes in the
chord event.</p>
      <p>Cadences are treated as sequences of chord events. Tests are
based on the features of adjacent chords, such as their chord function
(as a Roman numeral) and the notes one would expect to see
(derived from the chord function).</p>
      <p>Whenever string matches are required in the mongoDB query,
these are implemented as regular expressions to allow “fourth” to
match to “perfect fourth”, or “Violin” to match to “Violin 1”.</p>
    </sec>
    <sec id="sec-6">
      <title>4 QUERY EXTENSIONS</title>
      <p>In this system, we extended the feature-based CFG and the query
generation components to cater to three new types of queries:
1. harmonies between multiple parts
2. repetition
3. sequences and themes</p>
      <sec id="sec-6-1">
        <title>4.1 Harmonies between Multiple Parts</title>
        <p>The extension for queries indicating multiple parts specifically
relates to harmonies, such as “minor third between Quintus and
Tenor in bars 11-18”. In this case, the part names “Quintus” and
“Tenor” are mapped to a generic symbol “PART_NAME”, which
can be resolved later to the original values. Special grammar rules
for part names in this configuration (“between part1 and part2”)
are used to keep track of semantic features that flag that
processing of the query should use logic relating to a harmony
between parts. When such flags are encountered during resolution
of the query, the system first checks to see if the required
combination of parts has been indexed in the database. If not, this
triggers the dynamic indexing of chords in the specified parts
(as mentioned above).</p>
        <p>We note that queries such as “B4 in the left hand followed by
C5 in the right hand in bars 7-8” are covered by a mechanism
from the 2015 system in which the query is divided into two
portions (split by the phrase “followed by”). Each is an atomic
query that could be found in different parts (here, left hand versus
right hand, assuming a simple mapping from left to bass clef and
right to treble clef).</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>4.2 Repetition</title>
      <p>Queries like “six crotchet notes repeated twice” are treated as a
sequence of “six crotchet notes” which is then copied a second
time. In this system, handling of repetition requires that the
number of copies be specified. That is, an unbounded repetition
query, such as “alternating fourths and fifths in the Oboe in bars
1-100”, is not handled. To allow such queries, the system
arbitrarily transforms such a query to be equivalent to “repeated
twice”. This copying of sequences extends the 2015 system,
which allowed copies of single notes to be specified in the query.</p>
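      <p>The expansion step above can be sketched as a simple duplication of the parsed sequence specification; the names and the step representation are illustrative assumptions.</p>

```python
# Sketch of bounded repetition expansion: the parsed sequence steps are
# copied the requested number of times, with unbounded queries
# arbitrarily falling back to two copies, as described above.

DEFAULT_COPIES = 2   # unbounded repetition treated as "repeated twice"

def expand_repetition(sequence_steps, copies=None):
    """Return the sequence spec duplicated `copies` times."""
    if copies is None:           # e.g. "alternating fourths and fifths"
        copies = DEFAULT_COPIES
    return sequence_steps * copies

steps = [{"length": 1.0}] * 6            # "six crotchet notes"
expanded = expand_repetition(steps, 2)   # "... repeated twice"
```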
    </sec>
    <sec id="sec-8">
      <title>4.3 Sequences and Themes</title>
      <p>To make use of the sequence and analysis tables described above,
we extend our grammar to accept relevant queries, focusing on
repeated sequences or imitation between parts. For such queries
to be resolved, we search the sequence and analysis tables
outlined in Section 3. The analysis table shows when a sequence
has been repeated verbatim with the same note names (but
allowing variation in the octave), or transposed, with relative
intervals between sequence notes specified in both diatonic and
chromatic forms.</p>
      <p>The analysis table shows when any of these sequences has
been repeated between parts, and includes metadata that allows
trivial sequences below a certain length to be filtered out. Once
repeated sequences are found, the start and end offsets are
retrieved from the relevant sequence table.</p>
    </sec>
    <sec id="sec-8b">
      <title>5 SIMPLE PARAPHRASE AND CONSTRAINT RELAXATION</title>
      <p>In this system, we introduced a new method to allow graceful
degradation in the case where the grammar did not cover the
input query. In such a case, a prioritised set of paraphrase rules
was used to see if the query could be transformed into a form that
was covered by the grammar.</p>
      <p>These paraphrase rules included synonyms, such as the
mapping between “Violin 1” and “Vln 1”, and phrasal equivalents,
such as “from bars X-Y” and “in bars X-Y”.</p>
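      <p>A prioritised rule set of this kind can be sketched as an ordered list of regular-expression rewrites, applied one at a time until the query parses. The specific patterns below are taken from the examples in the text, but the rule representation and function names are illustrative assumptions, not the system's actual rule set.</p>

```python
import re

# Illustrative paraphrase rules in priority order: a synonym, a phrasal
# equivalent, a near-paraphrase, and a low-value phrase that is dropped.
RULES = [
    (re.compile(r"\bVln\.?\s*1\b"), "Violin 1"),
    (re.compile(r"\bfrom (bars? \d+)"), r"in \1"),
    (re.compile(r"\bascending in single steps\b"), "ascending"),
    (re.compile(r"\s*\bin a row\b"), ""),
]

def canonicalise(query, parses):
    """Apply rules one at a time, stopping as soon as the query parses."""
    for pattern, replacement in RULES:
        if parses(query):
            return query
        query = pattern.sub(replacement, query).strip()
    return query if parses(query) else None

# toy stand-in "grammar" that only accepts queries phrased with "in bars"
parses = lambda q: "in bar" in q
```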
      <p>The rules also included “movement” of certain phrases so
that they occurred at set positions. For example, constraints about
the bars (“from bar X-Y”) were moved to the end, as expected by
the grammar.</p>
      <p>We can thus think of the grammar as covering the canonical form
of a query. For example, constraints about multiple parts in
queries such as “cello and viola playing dotted minims an octave
apart in bars 40-70” were mapped to “dotted minim octave
between cello and viola in bars 40-70”. Similarly, repetition
queries such as “repeated Bb4 whole note” were transformed
into a canonical form like: &lt;note noun phrase&gt;&lt;repetition
constraint&gt;.</p>
      <p>Finally, we encoded near paraphrases to handle concepts that
were similar to those already covered by the grammar. For
example, we used a mapping from “ascending in single steps” to
“ascending”. For some queries, the additional information was
deemed to be of little value (given the default behaviour of the
system) and was thus dropped. For example, “in a row” was
simply deleted given that most sequence queries require that the
target notes occur in series.</p>
      <p>Ideally, these rules would be learnt from data. We see this as
future work. Here, we simply provide a way for paraphrase rules,
however they are obtained (by manual inspection or machine
learning), to be used once acquired.</p>
    </sec>
    <sec id="sec-9">
      <title>6 RESULTS AND ANALYSIS</title>
      <p>Our overall approach for preparing this year’s submission was to
use the gold standard results as regression tests, specifically using
the 2016 test set for the development of new features. The recall
and precision at the beat level for the three preceding years of data
are shown in Table 1. While this is the result of development on
the 2014-2016 test sets, the performance is based on generic
mechanisms without memorisation of answers. We note that
the scores for 2014 and 2015 are slightly under the officially
reported performance of the CLAS system for those years. This is
due to system differences between the 2017 system and the earlier
systems: the 2015 system was a blend of the 2014 system and the
use of the feature-based CFG. For the 2017 system, the earlier
chunking system of 2014 has been dropped altogether.</p>
    </sec>
    <sec id="sec-10">
      <title>7 DISCUSSION</title>
      <p>The 2017 system is based on the feature-based CFG system from
2015. We note that the queries have become more and more
complex, to the extent that one might question whether treating the
queries as coming from a controlled language is still viable. This
may be one reason why the performance of the system has
degraded compared to previous years’ submissions.</p>
      <p>From a software engineering point of view, we note that the
grammar for the queries is increasingly difficult to manage with
each subsequent year’s set of queries. Whether the paraphrase
component is an effective way to manage this diversity of queries
remains to be tested.</p>
      <p>The move to the mongoDB search engine did introduce
efficiencies in running large sets of queries from prior years as
regression tests. Previously, the 2015 system would have required
hours to complete given that each query required a pass through
the score. Currently, each year’s queries take a few minutes to
answer. This would be faster if not for the passes needed through
the scores for melodic queries. We note that melodic intervals too
could be annotated upfront at indexing time, an optimization
which we simply did not have time to implement.</p>
      <p>However, some analyses require batch processing of the score
to identify features of interest. For these, the potentially
time-consuming analysis has to be performed ahead of database
indexing.</p>
    </sec>
    <sec id="sec-11">
      <title>8 FUTURE WORK</title>
      <p>In future work, we intend to further extend the system to answer
musicology-related queries as opposed to just music theory
queries. This will involve additional preprocessing of the scores
in advance. For example, thematic sequences are currently
delimited using rests. However, additional cues in the
score, such as phrase boundaries and lyrics, could also help
provide other possible segmentations.</p>
    </sec>
    <sec id="sec-12">
      <title>9 CONCLUSIONS</title>
      <p>We describe the CLAS system as entered into the 2017 C@merata
shared task. In this system, we focused on the use of a NoSQL
database for managing the retrieval of passages from musical
scores. We also extended the 2015 CLAS system to handle
queries about harmonies between specific parts, repetition of
sequences, and thematic queries about sequences and imitation.
We also explored the use of paraphrase methods to transform
queries into a canonical form, where possible, which might then
be parsed using a feature-based CFG.</p>
    </sec>
    <sec id="sec-13">
      <title>ACKNOWLEDGMENTS</title>
      <p>We thank the organisers of the C@merata 2017 challenge for their
hard work and efforts in running the event.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R. F. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ó Maidín</surname>
            ,
            <given-names>D. S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>The C@merata task at MediaEval 2017: Natural Language Queries about Music, their JSON Representations, and Matching Passages in MusicXML Scores</article-title>
          .
          <source>Proceedings of the MediaEval 2017 Workshop</source>
          , Trinity College Dublin, Ireland,
          <source>September 13-15</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>The CLAS System at the MediaEval 2014 C@merata Task</article-title>
          .
          <source>In the Working Notes Proceedings of MediaEval 2014 Workshop</source>
          . Barcelona, Catalunya, Spain,
          <source>October 16-17</source>
          ,
          <year>2014</year>
          , CEUR-WS.org
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>The CLAS System at the MediaEval 2015 C@merata Task</article-title>
          .
          <source>In: Mediaeval</source>
          <year>2015</year>
          ;
          <fpage>14</fpage>
          -
          <lpage>15</lpage>
          September 2015; Wurzen, Germany. Mediaeval;
          <year>2015</year>
          . 1-
          <fpage>6</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <source>Natural Language Processing with Python</source>. O'Reilly Media Inc
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Cuthbert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ariza</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>music21: A Toolkit for Computer-Aided Musicology and Symbolic Music Data</article-title>
          .
          <source>Proceedings of the International Symposium on Music Information Retrieval</source>
          , pp.
          <fpage>637</fpage>
          -
          <lpage>42</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>