          The CLAS System at the MediaEval 2017 C@merata Task
                                                             Stephen Wan
                                                       CSIRO Data61, Australia
                                                        Stephen.Wan@csiro.au

ABSTRACT
In this paper, we describe the 2017 CLAS system as entered into the C@merata shared task. This year, our aim was to use the challenge as a case study of how one manages natural language queries to structured data, and so we focused on the use of a NoSQL database for managing the retrieval of passages from musical scores. We also extended the 2015 CLAS system to handle queries about harmonies between specific parts, repetition of sequences, and thematic queries about sequences and imitation. Given that the queries were quite diverse, we explored the use of paraphrase methods to transform queries into a canonical form, where possible, which might then be parsed using a feature-based CFG. Our system achieved a measure precision and recall of 0.122 and 0.260 respectively.

1 INTRODUCTION
In this paper, we describe the CLAS submission for the 2017 C@merata shared task [1]. As in previous years, the task is one where a system must find portions of a musical score that match a natural language query. For example, the query “two eighth notes, an eighth note rest, three eighth notes, an eighth note rest and three eighth notes in measures 68-80, all in the Violoncello” (example from the 2016 data set) might be used to identify a portion of the score, as specified by the starting and ending bar numbers as well as the beat offsets in a manner prescribed in [1], such as:

   Passage
       •     end_bar="79"
       •     end_beat_type="4"
       •     end_beats="4"
       •     end_divisions="1"
       •     end_offset="1"
       •     start_bar="78"
       •     start_beat_type="4"
       •     start_beats="4"
       •     start_divisions="1"
       •     start_offset="1"

   Our general approach in 2017 is consistent with the earlier CLAS submissions in 2014 [2] and 2015 [3], which viewed the task as a Q&A problem with natural language queries posed in a controlled language. We view the task as a natural language query task to structured data, where the data has a temporal element and can be decomposed into multiple aligned streams of data. In terms of music, these streams correspond to different musical instruments or parts, temporally aligned. This year, however, we focused on using the shared task as a case study on natural language queries to structured data using database technologies.
   The 2017 submission builds on the 2015 CLAS entry, which uses a feature-based Context-Free Grammar (CFG) to specify the controlled language for C@merata music queries. Parsing using NLTK [4] provides a feature structure corresponding to the key semantic elements of the query, which is then used to retrieve results. In the 2015 CLAS system, the feature structure was used to find the matching events in the musical score. The Music21 [5] library was used to transform the XML version of a score into an array of music events, each represented with a set of attribute-value features. Events were then retrieved for a query, using feature unification between the query feature structure and the features of events.
   This year, we varied our approach in the following ways:

       1.  Instead of making sequential passes through the music events for a piece of music, we used the NoSQL database MongoDB to store and retrieve musical events using MongoDB queries based on attribute-value sets.
       2.  We enhanced the 2015 feature-based CFG to recognise queries about:
               a. multiple parts
               b. repetition
               c. sequences and themes
       3.  We used a paraphrase or near-paraphrase back-off stage if the original query could not be parsed.

   Our system focuses on queries based on music theory as opposed to those that would rely on music interpretation. In preparation for the 2017 shared task, we used the gold standard data from 2014-2016 as software development regression tests. This year, our CLAS system was able to achieve a measure precision and recall of 0.122 and 0.260 respectively.


Copyright is held by the owner/author(s).
MediaEval’17, September 2017, Dublin, Ireland
3 DATA STORAGE

3.1 Indexing with MongoDB
Instead of making a single pass of a music score (iterating through all possible notes) to find matching events using feature unification, we used a database to store and retrieve all music events.
   Specifically, we used four tables:
       1. Titles (and global score attributes)
       2. Musical Events, one table per musical score
       3. Sequences, one table per musical score
       4. Analysis, one table per musical score

   In the Titles (and global score attributes) database table, information mapping from the XML music score filename to an internal ID was kept, in addition to global information (determined using Music21 functions) such as the time signature, key signature, and the number of parts. Examples of such records are presented in Figure 1.

{
  "_id":ObjectId("598ddda11d41c896ba36332b"),
  "name":"air_from_handels_water_music_suite.xml",
  "time_signature":{
    "1":[4, 4]
  },
  "key":"F major",
  "parts":4
}
{
  "_id":ObjectId("598dde0b1d41c896ba36364e"),
  "name":"and_the_glory_of_the_lord_from_handels_messiah.xml",
  "time_signature":{
    "1":[3, 4]
  },
  "key":"E major",
  "parts":18
}

Figure 1: Example JSON records from the Titles table.

   For each musical score, we dynamically created a Music Events table. The table name is the unique identifier specified in the Titles table. The table for a score stores music events, which could be of either a note or a chord type. Events contain offsets and other attributes. For note events, for a score consisting of multiple parts, the notes of each part were read sequentially to form the note events. For chord events, each part was also transformed into a series of Music21 Chord objects using the corresponding function in the Music21 library. Metadata for each chord was then stored in the table. Finally, the same functionality for creating chords at each offset was performed on the entire score, and then this list of chord objects was indexed. An example of a note record is presented in Figure 2 and a chord record in Figure 3.
   For some queries, chords (or harmonies) across specific parts were required. To avoid computing every single permutation of parts, the specific combination was checked for at query time, and if it did not exist, the relevant parts were extracted from the XML, merged into a temporary score, the entire score “chordified”, and the results indexed dynamically. Querying then resumed as above.
   The Sequences table for a score stores passages or sequences detected in the music. These were found by segmenting series of notes using rests, using a maximum sequence length of 8 bars.

{
  "_id":ObjectId("598ddda11d41c896ba36332c"),
  "name":"A",
  "letter":"A",
  "accidental":"",
  "pitch_class":9,
  "octave":4,
  "bar":1,
  "offset":0,
  "length":0.75,
  "part":0,
  "lyric":null,
  "freq":440,
  "articulation":[
    "DOWNBOW","BOWING","TECHNICALINDICATION","ARTICULATION","MUSIC21OBJECT","OBJECT"
  ],
  "expression":null,
  "solfeg":"",
  "type":"note",
  "dynamic":"piano",
  "tie":"",
  "slur":"1",
  "voice":"VIOLIN I",
  "clef":"G",
  "voice_num":"0",
  "stream":"notes",
  "ordering":0
}

Figure 2: Example of a note JSON record from the Music Events table for “air_from_handels_water_music_suite.xml”, which in this case was stored in table “db.T598ddda11d41c896ba36332b” (notice the identifier following the “T” is the same as in the first record of Figure 1).
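   To make the indexing step concrete, the following is a minimal sketch of how note events might be extracted with Music21 and stored with PyMongo. The database name and the attribute set are simplified from the records shown in Figures 1-3, and the helper itself is illustrative rather than our exact code.

   # A minimal sketch of the indexing step, assuming a local MongoDB
   # instance; the attribute set is simplified from Figures 1-3.
   from music21 import converter
   from pymongo import MongoClient

   def index_score(xml_path):
       db = MongoClient()["camerata"]          # hypothetical database name
       score = converter.parse(xml_path)
       # Record the score and its global attributes in the Titles table.
       title_id = db["titles"].insert_one({
           "name": xml_path,
           "parts": len(score.parts),
       }).inserted_id
       events = db["T%s" % title_id]           # one Music Events table per score
       for part_num, part in enumerate(score.parts):
           for ordering, n in enumerate(part.flat.notes):
               if not n.isNote:                # chords are indexed separately
                   continue
               events.insert_one({
                   "name": n.pitch.name,
                   "letter": n.pitch.step,
                   "octave": n.pitch.octave,
                   "bar": n.measureNumber,
                   "offset": float(n.offset),
                   "length": float(n.quarterLength),
                   "part": part_num,
                   "type": "note",
                   "stream": "notes",
                   "ordering": ordering,
               })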
{
  "_id":ObjectId("598ddda11d41c896ba36348a"),
  "name":"C3-dominant seventh chord",
  "bar":3,
  "offset":2,
  "length":0.75,
  "key":"F major",
  "chord_fn":"5",
  "chord_match":3,
  "inversion":0,
  "bass":"C3",
  "root":"C3",
  "notes":["B-","E","G","C"],
  "notes_with_octave":["B-4","E4","G3","C3"],
  "ordered_pitch_classes":[0, 4, 7, 10],
  "root_fn":"5",
  "function":"5",
  "type":"dominant seventh chord",
  "raw_type":"dominant seventh chord",
  "ties":4,
  "passing":true,
  "dynamic":null,
  "intervals":[0, 2, 3, 4, 5, 6],
  "interval_names":[
    "Minor Tenth","Perfect Fifth","Major Sixth","Perfect Unison",
    "Major Tenth","Diminished Fifth","Minor Fourteenth"
  ],
  "stream":"chords",
  "ordering":19
}

Figure 3: A chord JSON record from the Music Events table for “air_from_handels_water_music_suite.xml”.

{
  "_id":ObjectId("598de46b1d41c896ba374f8f"),
  "name":"ABS::++E-_3.0000_0.0000++D_1.0000_0.0000++F_1.0000_1.0000++D_1.0000_2.0000++C_1.0000_0.0000++E-_1.0000_1.0000++C_1.0000_2.0000++B-_1.0000_0.0000",
  "key_type":"absolute",
  "seen_parts":["FLUTE 1 2", "BASSOON 1 2"],
  "motif_length":8
}
{
  "_id":ObjectId("598de46b1d41c896ba374f90"),
  "name":"DIA::++-2_1.0000_0.0000++3_1.0000_1.0000++-3_1.0000_2.0000++-2_1.0000_0.0000++3_1.0000_1.0000++-3_1.0000_2.0000++-2_1.0000_0.0000",
  "key_type":"diatonic",
  "seen_parts":["FLUTE 1 2", "BASSOON 1 2"],
  "motif_length":8
}

Figure 4: A sequence in both its absolute note and diatonic interval forms that occurs in two parts, here Flute and Bassoon, from “beethoven_symphony_3_movement_iii_muse.xml”.

{
  "_id":ObjectId("598de4691d41c896ba3748f2"),
  "name":"ABS::++E-_3.0000_0.0000++D_1.0000_0.0000++F_1.0000_1.0000++D_1.0000_2.0000++C_1.0000_0.0000++E-_1.0000_1.0000++C_1.0000_2.0000++B-_1.0000_0.0000",
  "voice":"FLUTE 1 2",
  "clef":"G",
  "voice_num":"0",
  "key_type":"ABS",
  "type":"motif",
  "start_bar":26,
  "start_offset":0,
  "end_bar":29,
  "end_offset":1,
  "end_duration":2,
  "note_length":8,
  "duration":10
}

Figure 5: Metadata for the sequence with id=598de46b1d41c896ba374f8f in the score “beethoven_symphony_3_movement_iii_muse.xml”.

   Sequences were represented with unique identifiers representing the entirety of the passage, using each of three possible key generators: the unique names of the notes in the sequence and their lengths (but not their octaves), a variant of this using relative displacement between notes in terms of semitones, and a diatonic variant using interval classes. Each sequence had an associated start and end point (specified as bar and offset), which was stored in the table.
   The Analysis table for each score keeps track of which sequence unique identifiers occur in different parts of the score, allowing some representation of thematic sequences (those that appear in multiple parts). An example of a sequence from the Analysis table is presented in Figure 4, and the corresponding metadata from the Sequences table is presented in Figure 5.
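   To make the key-generation scheme concrete, the following is a rough sketch of how an absolute-pitch sequence key in the “ABS::” format of Figure 4 might be produced; the helper name and the input representation are hypothetical, while the output format follows the records above.

   # A hypothetical helper producing an absolute-pitch sequence key in
   # the "ABS::" format of Figure 4 from (name, length, offset) triples.
   def absolute_key(notes):
       # notes: e.g. [("E-", 3.0, 0.0), ("D", 1.0, 0.0), ...]
       fields = ["%s_%.4f_%.4f" % (name, length, offset)
                 for name, length, offset in notes]
       return "ABS::++" + "++".join(fields)

   # absolute_key([("E-", 3.0, 0.0), ("D", 1.0, 0.0)])
   # -> 'ABS::++E-_3.0000_0.0000++D_1.0000_0.0000'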
3.2 Querying the Database
Each record in these tables (aside from the Titles table) is an event represented as a set of attribute-value features. Using the Python programming language and the PyMongo library (https://api.mongodb.com/python/current/), which provides an interface to MongoDB (https://www.mongodb.com), queries are essentially comparisons between dictionary objects. In this way, the queries are very similar to the feature unification method in the 2015 CLAS system. There is a subtle difference, however. Feature unification would allow under-specifications on either of the two structures being compared to unify as long as there were no direct contradictions. This is not the case with MongoDB queries. If a query is overly specific and includes attributes not present in the event, then a match is not possible. Thus, each feature structure first had to be transformed into an equivalent NoSQL query to account for this.
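   For example, a query for a single note in a named part can be expressed directly as a Python dictionary, so long as it mentions only attributes that actually occur on note events. A minimal sketch, with the collection name taken from the Figure 2 caption and a hypothetical database name:

   # A minimal sketch of a single-note lookup with PyMongo; the query
   # dictionary mentions only attributes present on note events.
   from pymongo import MongoClient

   db = MongoClient()["camerata"]              # hypothetical database name
   events = db["T598ddda11d41c896ba36332b"]    # Music Events table (Figure 2)

   # "A in Violin I"
   query = {"letter": "A", "voice": "VIOLIN I", "type": "note"}
   for event in events.find(query).sort("ordering"):
       print(event["bar"], event["offset"])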
   Querying for sequences of notes was performed by first using a query for the first note of the sequence and then performing an ordered series of queries, each checking whether the event at the next timestep corresponded to the relevant sequence note. This was handled through a recursive function that worked its way down a list of elements in the query.
   Using MongoDB to provide search facilities greatly increased the speed at which matches could be found, utilising a database index for direct lookup instead of doing a sequential pass of the score for each query.
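   A sketch of this recursive check is given below; the function layout is illustrative rather than our exact implementation, and it assumes the per-score events collection from the sketch above.

   # A sketch of the recursive sequence check: anchor on candidates for
   # the first note, then verify each successive note at the next timestep.
   def match_sequence(events, seq):
       # seq: a list of attribute-value dictionaries, one per query note.
       matches = []
       for start in events.find(dict(seq[0], type="note")):
           if _continues(events, start, seq[1:]):
               matches.append(start)
       return matches

   def _continues(events, prev, rest):
       if not rest:
           return True
       nxt = events.find_one(dict(rest[0], type="note",
                                  part=prev["part"],
                                  ordering=prev["ordering"] + 1))
       return nxt is not None and _continues(events, nxt, rest[1:])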
3.3 Forming NoSQL Queries
If we are searching for a single note, then the query would be the set of attribute-value features for that note, as represented by a Python dictionary. Such queries were simple to implement with a NoSQL database like MongoDB. Others, however, were more complex.
   For melodic intervals, these are treated as sequences. In this case, for every note in each part, we search for the absolute pitch and octave of a note that would correspond to the interval based on the current note. Thus, for melodic intervals, “search” becomes a single pass through the score.
   For harmonic intervals, we search through the chord events, which at indexing time are broken down into their component interval classes between all note pairs in the chord. Each of these intervals is stored, at indexing time, as a list (with an equivalence encoded for augmented fourths, diminished fifths, and tritones). The MongoDB query is then a list membership test for the string name of the interval of interest. We note that the analysis of the chord into its component intervals is provided by Music21, and the success of this method depends on that library’s ability to correctly analyse the chord. For harmonic intervals and chords, we added search constraints such as there needing to be at least two notes in the chord event.
   For chord queries specifying the exact notes of the chord, this is implemented using tests for membership of the notes in the chord event.
   Cadences are treated as sequences of chord events. Tests are based on the features of adjacent chords, such as their chord function (as a Roman numeral) and the notes one would expect to see (derived from the chord function).
   Whenever string matches are required in the MongoDB query, these are implemented as regular expressions to allow “fourth” to match to “perfect fourth”, or “Violin” to match to “Violin 1”.
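   As an illustration, a harmonic-interval query can combine the list-membership test with a regular expression along the following lines; this is a sketch using the Figure 3 attribute names and the events collection from the earlier sketches.

   import re

   # Find chord events containing an interval whose name ends in
   # "Fourth" (so "fourth" also matches "Perfect Fourth"), requiring at
   # least two notes in the chord event.
   query = {
       "stream": "chords",
       "interval_names": re.compile(r"Fourth$", re.IGNORECASE),
       "notes.1": {"$exists": True},   # i.e. the chord has >= 2 notes
   }
   for chord in events.find(query):
       print(chord["bar"], chord["offset"], chord["name"])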
4 QUERY EXTENSIONS
In this system, we extended the feature-based CFG and the query generation components to cater to three new types of queries:
       1. harmonies between multiple parts
       2. repetition
       3. sequences and themes

4.1 Harmonies between Multiple Parts
The extension for queries indicating multiple parts specifically relates to harmonies, such as “minor third between Quintus and Tenor in bars 11-18”. In this case, the part names “Quintus” and “Tenor” are mapped to a generic symbol “PART_NAME”, which can be resolved later to the original values. Special grammar rules for part names in this configuration (“between part1 and part2”) are used to keep track of semantic features that flag that processing of the query should use logic relating to a harmony between parts. When such flags are encountered during resolution of the query, the system first checks to see whether the required combination of parts has been indexed in the database. If not, this then triggers the dynamic indexing of chords in the specified parts (as mentioned above).
   We note that queries such as “B4 in the left hand followed by C5 in the right hand in bars 7-8” are covered by a mechanism from the 2015 system in which the query is divided into two portions (split by the phrase “followed by”). Each is an atomic query that could be found in different parts (here, left hand versus right hand, assuming a simple mapping from left to bass clef and right to treble clef).
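   The dynamic indexing step described above can be sketched with Music21 as follows; the part-selection logic is simplified and the helper name is ours.

   # A simplified sketch of merging two requested parts and chordifying
   # the result with Music21 when their combination is not yet indexed.
   from music21 import converter, stream

   def chordify_parts(xml_path, part_names):
       score = converter.parse(xml_path)
       merged = stream.Score()
       for part in score.parts:
           if part.partName in part_names:
               merged.insert(0, part)
       # chordify() collapses the selected parts into one chord stream,
       # whose chords can then be indexed as in Section 3.
       return merged.chordify()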
4.2 Repetition
Queries like “six crotchet notes repeated twice” are treated as a sequence of “six crotchet notes” which is then copied a second time. In this system, handling of repetition requires that the number of copies be specified. That is, an unbounded repetition query, such as “alternating fourths and fifths in the Oboe in bars 1-100”, is not handled. To allow such queries, the system arbitrarily transforms such a query to be equivalent to “repeated twice”. This copying of sequences extends the 2015 system, which allowed copies of single notes to be specified in the query.
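   In implementation terms, this amounts to expanding the parsed sequence before matching, roughly as follows (a sketch; the number of copies comes from the parsed query, defaulting to two for unbounded phrasings):

   # A sketch of expanding a repetition query: "six crotchet notes
   # repeated twice" becomes the base sequence copied twice.
   def expand_repetition(base_seq, times=2):
       return base_seq * times

   crotchet = {"length": 1.0, "type": "note"}
   query_seq = expand_repetition([crotchet] * 6, times=2)   # 12 events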
4.3 Sequences and Themes
To make use of the sequence and analysis tables described above, we extend our grammar to accept relevant queries, focusing on repeated sequences or imitation between parts. For such queries to be resolved, we search the sequence and analysis tables outlined in Section 3. The analysis table shows when a sequence has been repeated verbatim, with the same note names (but allowing variation in the octave), or transposed, with relative intervals between sequence notes specified in both a diatonic and a chromatic form.
   The analysis table shows when any of these sequences has been repeated between parts, and includes metadata that allows trivial sequences below a certain length to be filtered out. Once repeated sequences are found, the start and end offsets are retrieved from the relevant sequence table.
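   Imitation between parts can then be found with a simple query against the Analysis table, for example (a sketch using the Figure 4 attribute names; the analysis collection handle and the length threshold are illustrative):

   # A sketch: find sequences seen in at least two parts and long enough
   # to be non-trivial, using the Figure 4 Analysis-table attributes.
   query = {
       "motif_length": {"$gte": 4},        # filter out trivial sequences
       "seen_parts.1": {"$exists": True},  # repeated in >= 2 parts
   }
   for motif in analysis.find(query):
       print(motif["name"], motif["seen_parts"])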
5 SIMPLE PARAPHRASE AND CONSTRAINT RELAXATION
In this system, we introduced a new method to allow graceful degradation in the case where the grammar did not cover the input query. In such a case, a prioritised set of paraphrase rules was used to see if the query could be transformed into a form that was covered by the grammar.
   These paraphrase rules included synonyms, such as the mapping between “Violin 1” and “Vln 1”, and phrasal equivalents, such as “from bars X-Y” and “in bars X-Y”.
   The rules also included “movement” of certain phrases that occurred at set positions. For example, constraints about the bars (“from bar X-Y”) were moved to the end, as was expected by the grammar.
   We can thus think of the grammar as covering the canonical form of a query. For example, constraints about multiple parts in queries such as “cello and viola playing dotted minims an octave apart in bars 40-70” were mapped to “dotted minim octave between cello and viola in bars 40-70”. Similarly, repetition queries such as “repeated Bb4 whole note” were transformed into an equivalent canonical form.
   Finally, we encoded near-paraphrases to handle concepts that were similar to those already covered by the grammar. For example, we used a mapping from “ascending in single steps” to “ascending”. For some queries, the additional information was deemed to be of little value (given the default behaviour of the system) and was thus dropped. For example, “in a row” was simply deleted, given that most sequence queries require that the target notes occur in series.
   Ideally, these rules would be learnt from data. We see this as future work. Here we simply provide a way for paraphrase rules, once acquired, to be used however they are obtained (by manual inspection or machine learning).
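   Such a rule set can be sketched as an ordered list of regular-expression rewrites applied until the grammar accepts the query; the rules shown simply restate the examples above, and the parses predicate stands in for the CFG parser.

   import re

   # A sketch of the paraphrase back-off: ordered rewrites tried in
   # priority order until the grammar can parse the query.
   PARAPHRASE_RULES = [
       (re.compile(r"\bVln\b", re.IGNORECASE), "Violin"),
       (re.compile(r"\bfrom bars\b", re.IGNORECASE), "in bars"),
       (re.compile(r"\bascending in single steps\b"), "ascending"),
       (re.compile(r"\bin a row\b"), ""),   # of little value: dropped
   ]

   def paraphrase_backoff(query, parses):
       # parses: predicate returning True if the grammar accepts the query.
       for pattern, replacement in PARAPHRASE_RULES:
           if parses(query):
               return query
           query = re.sub(r"\s+", " ", pattern.sub(replacement, query)).strip()
       return query if parses(query) else None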

6 RESULTS AND ANALYSIS
Our overall approach for preparing this year’s submission was to use the gold standard results as regression tests, specifically using the 2016 test set for the development of new features. The recall and precision at the beat level for the three preceding years of data is shown in Table 1. While this is the result of development on the 2014-2016 test sets, the performance is based on generic mechanisms without memorisation of answers. We note that the scores for 2014 and 2015 are slightly under the officially reported performance of the CLAS system for those years. This is due to system differences between the 2017 system and the earlier systems: the 2015 system was a blend of the 2014 system and the use of the feature-based CFG. For the 2017 system, the earlier chunking system of 2014 has been dropped altogether.

Table 1: Performance on data from prior years.

           Recall     Precision
   2016    0.387      0.350
   2015    0.565      0.541
   2014    0.786      0.720

   For the 2017 results, our system achieved the following scores:
   Beat Precision: 0.099
   Beat Recall: 0.212
   Measure Precision: 0.122
   Measure Recall: 0.260

   Table 2 reports the results by question type. We note that the system does best for harmony queries, followed by melodic queries, as shown by the recall scores. This is unsurprising given that the 2017 system was developed with a focus on queries related to music theory. For the more complex texture and synch queries, the system does not do so well.
   Although a system for sequences and themes was included this year, we note that the method for segmenting sequences in this system is quite simplistic and may not correspond to analyses by a musicologist. Furthermore, the grammar coverage for these queries was not exhaustive.

Table 2: Results by question type.

              Measure    Measure      Beat       Beat
              Recall     Precision    Recall     Precision
   1 Melod    0.475      0.062        0.328      0.043
   N melod    0.213      0.176        0.167      0.137
   1 harm     0.156      0.636        0.156      0.636
   N harm     0.304      0.263        0.290      0.250
   Texture    0.103      0.176        0.103      0.176
   Follow     0.052      0.133        0.026      0.067
   Synch      0.000      0.000        0.000      0.000
7 DISCUSSION
The 2017 system is based on the feature-based CFG system from
2015. We note that the queries have become more and more
complex to the extent that one might question if treating the
queries as coming from a controlled language is still viable. This
may be one reason why the performance of the system has
degraded compared to previous years’ submissions.
    From a software engineering point of view, we note that the
grammar for the queries is increasingly difficult to manage with
each subsequent year’s set of queries. Whether the paraphrase
component is an effective way to manage this diversity of queries
remains to be tested.
    The move to the MongoDB search engine did introduce
efficiencies in running large sets of queries from prior years as
regression tests. Previously, the 2015 system would have required
hours to complete given that each query required a pass through
the score. Currently, each year’s queries take a few minutes to
answer. This would be faster if not for the passes needed through
the scores for melodic queries. We note that melodic intervals too
could be annotated upfront at indexing time, an optimization
which we simply did not have time to implement.
    However, some analyses require batch processing of the score
to identify features of interest. For these, the potentially time-consuming analysis has to be performed ahead of database indexing.

8 FUTURE WORK
In future work, we intend to further extend the system to answer
musicology related queries as opposed to just music theory
queries. This will involve additional preprocessing of the scores
in advance. For example, thematic sequences are currently
delimited using rests. However, additional cues in the
score, such as phrase boundaries and lyrics, could also help
provide other possible segmentations.

9 CONCLUSIONS
We describe the CLAS system as entered into the 2017 C@merata
shared task. In this system, we focused on the use of a NoSQL
database for managing the retrieval of passages from musical
scores. We also extended the 2015 CLAS system to handle
queries about harmonies between specific parts, repetition of
sequences, and thematic queries about sequences and imitation.
We also explored the use of paraphrase methods to transform
queries into a canonical form, where possible, which might then
be parsed using a feature-based CFG.

ACKNOWLEDGMENTS
We thank the organisers of the C@merata 2017 challenge for their
hard work and efforts in running the event.
REFERENCES
[1]   Sutcliffe, R. F. E., Ó Maidín, D. S., Hovy, E. (2017). The
      C@merata task at MediaEval 2017: Natural Language
      Queries about Music, their JSON Representations, and
      Matching Passages in MusicXML Scores. Proceedings of the
      MediaEval 2017 Workshop, Trinity College Dublin, Ireland,
      September 13-15, 2017.
[2]   Wan, S. (2014). The CLAS System at the MediaEval 2014
      C@merata Task. In the Working Notes Proceedings of
      MediaEval 2014 Workshop. Barcelona, Catalunya, Spain,
      October 16-17, 2014, CEUR-WS.org
[3]   Wan, S. (2015). The CLAS System at the MediaEval 2015
      C@merata Task. In the Working Notes Proceedings of the
      MediaEval 2015 Workshop. Wurzen, Germany, September
      14-15, 2015, CEUR-WS.org
[4]   Bird, S., Loper, E., Klein, E. (2009). Natural Language
      Processing with Python. O’Reilly Media Inc.
[5]   Cuthbert, M., Ariza, C. (2010). music21: A Toolkit for
      Computer-Aided Musicology and Symbolic Music Data.
      Proceedings of the International Symposium on Music
      Information Retrieval, pp. 637–642.



