The C@merata task at MediaEval 2017: Natural Language Queries about Music, their JSON Representations, and Matching Passages in MusicXML Scores

Richard Sutcliffe (1), Donncha S. Ó Maidín (2), Eduard Hovy (3)
(1) University of Essex, UK  (2) University of Limerick, Ireland  (3) Carnegie-Mellon University, USA
rsutcl@essex.ac.uk, donncha.omaidin@ul.ie, hovy@cmu.edu

Copyright is held by the owner/author(s). MediaEval'17, September 2017, Dublin, Ireland.

ABSTRACT

The C@merata task at MediaEval started in 2014 and is now in its fourth year. It is a combination of Natural Language Processing and Music Information Retrieval. The input is a short query ('six consecutive sixths in the right hand in bars 1-25') against a classical music score in MusicXML. The required output is a set of matching passages in the score. There are 200 queries and 20 scores each year. There were several innovations for 2017: First, some queries such as cadences required an answer which was a point in a score rather than a passage; second, queries were contributed by participants as well as by the organisers; third, some of the queries were directly taken from real texts such as articles and webpages; fourth, the organisers provided experimental representations of the input queries in the form of JSON feature structures. These capture many aspects of the queries in a form which is much closer to an MIR query. There were just two participants in the evaluation, and scores were understandably low given the considerable difficulty of the queries. However, this year we have significantly advanced our knowledge of how music is talked about in natural language texts, how these relate to MIR queries, and how to go about converting a text into a query.

1 INTRODUCTION

The C@merata evaluations are concerned with the relationship between Natural Language Processing (NLP) and Music Information Retrieval (MIR). Descriptions of classical music in books, papers, reviews and web pages are often very detailed and technical; however, experts can readily understand how they relate to the music. How can computers attain an equal level of understanding? The C@merata task aims to answer this question.

Each year, there are 200 questions against twenty classical music scores in MusicXML. Participants have to build a system which can return a set of one or more answer passages for each query. In previous years, each passage marked the start and end in the score of one answer to the question. This year, in addition, some answers are points in the score. For example, it is sometimes not clear where a cadence starts and ends; what is clear is the very instant when the V-I transition occurs. Therefore, by using points, we can reduce ambiguity. The use of points allowed new types of query such as key changes, which are clearly points not passages.

In the next section we briefly describe the task, including the method of evaluation. After that, we outline the preparation of the Gold Standard data, comprising queries, scores and answers. Next, we briefly describe the work on feature structure representations for queries, expressed in JSON. The details of the 2017 campaign are then presented, together with the results. Finally, we draw conclusions from the 2017 C@merata task.
2 2017 TASK

The C@merata task has remained almost unchanged since 2014 and detailed descriptions can be found in previous papers [4,5,6,7,8,9]. We summarise the main points here. There are 200 questions, each one being a single noun phrase in English. Half of the questions use American terminology (quarter note, measure) while the other half use English terminology (crotchet, bar). There are twenty MusicXML scores, and ten queries are set against each one. Participants must answer each query by means of one or more answer passages or answer points.

There are now two forms of answer, passages and points. A passage specifies part of a score, beginning and ending at specific places. For example, [4/4,1,1:1-2:4] means we are in 4/4 time, divisions is set to one (i.e. we are measuring in crotchets), the passage starts before the first crotchet beat of bar one (1:1) and the passage ends after the fourth crotchet beat of bar two (2:4).

A point specifies the instant at which something happens in the score. For example, [ 3/4, 1, 7a3 ] means we are in 3/4 time, beating in crotchets, and the point falls in bar 7 after the third crotchet beat. In other words, this is the very end of bar seven. Of course, we have to consider whether that is the same as the start of bar eight, which we would write as [ 3/4, 1, 8b1 ]. There are subtle differences; a repeat mark could be considered as being at the end of a bar and not at the start of the next bar, while a key signature would be at the start of a bar, not at the end of the previous one. For the 2017 task we resolve this ambiguity by stating that all points must be specified in the 'a' form (e.g. '7a3') except the start of the piece, which by definition will need to be in the 'b' form.

We have reflected on the significance of points vs. passages in the current campaign. Firstly, clefs, key signatures and time signatures are all points, because they all have zero length. Changes of clef, key signature etc. in the middle of the piece are also points. Grace notes are points, because, while they do have a length in performance, they have zero length from the perspective of the beat arithmetic within the bar in which they occur. A particularly interesting example can be found towards the end of Der Dichter Spricht from Kinderszenen by Robert Schumann. This is an extended passage lasting perhaps sixteen seconds, all taking up no beats in the score! Cadences are points not passages, because it is always clear where the transition from the V chord to the I chord takes place: it lies immediately before the start of the I chord. On the other hand, the start of the V chord may not be that clear, leading to problems if a cadence is specified as a passage with a beginning and end. Interestingly, this problem seems not to have come to light in the context of music theory exams and oral tests; the examiner may ask for a cadence to be identified (as perfect or plagal, for example) but they never ask for exactly where it is, as this is considered to be obvious.

The next issue concerning points is that the start of a bar is not the same as the end of the previous one, as we have already mentioned. Finally, graphic symbols such as dynamics (p, f) could have a position which was the closest point in the score (in the case of printed scores).
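To make the answer notation concrete, the following minimal sketch (our illustration only, not part of the official task infrastructure) parses a passage string such as [4/4,1,1:1-2:4] or a point string such as [ 3/4, 1, 7a3 ] into a small Python structure; the class and field names are invented for the example.

import re
from dataclasses import dataclass

@dataclass
class Passage:
    time_sig: str    # e.g. "4/4"
    divisions: int   # 1 means we are measuring in crotchets
    start_bar: int
    start_beat: int  # the passage starts before this beat of the start bar
    end_bar: int
    end_beat: int    # the passage ends after this beat of the end bar

@dataclass
class Point:
    time_sig: str
    divisions: int
    bar: int
    side: str        # 'a' = after the given beat, 'b' = before it
    beat: int

def parse_answer(text):
    """Parse a C@merata answer string into a Passage or a Point."""
    body = text.strip().strip("[]").replace(" ", "")
    time_sig, divisions, location = body.split(",")
    m = re.fullmatch(r"(\d+):(\d+)-(\d+):(\d+)", location)
    if m:  # passage, e.g. 1:1-2:4
        sb, sbt, eb, ebt = (int(g) for g in m.groups())
        return Passage(time_sig, int(divisions), sb, sbt, eb, ebt)
    m = re.fullmatch(r"(\d+)([ab])(\d+)", location)
    if m:  # point, e.g. 7a3 or 8b1
        return Point(time_sig, int(divisions), int(m.group(1)), m.group(2), int(m.group(3)))
    raise ValueError("unrecognised answer string: " + text)

print(parse_answer("[4/4,1,1:1-2:4]"))
print(parse_answer("[ 3/4, 1, 7a3 ]"))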
Once a system has answered a question, we need a method of scoring the passage or passages returned. We use an automatic evaluation procedure. A passage is beat correct if it starts at the correct beat in the correct start bar and it ends at the correct beat in the correct end bar. So, if the correct answer is a crotchet, the passage must start immediately before the crotchet in question and it must end immediately after it. Similarly, a passage is measure correct if it starts in the correct start bar (but not necessarily at the correct beat) and ends in the correct end bar. The notion of 'beat correct' is used as the basis for the strict evaluation measures, while 'measure correct' is used for the lenient evaluation measures. Based on beat correct and measure correct, we can define the following:

Beat Precision (BP) is the number of beat-correct passages returned by a system, in answer to a question, divided by the number of passages (correct or incorrect) returned.

Beat Recall (BR) is the number of beat-correct passages returned by a system divided by the total number of answer passages known to exist.

Beat F-Score (BF) is the harmonic mean of BP and BR.

Measure Precision (MP) is the number of measure-correct passages returned by a system divided by the number of passages (correct or incorrect) returned.

Measure Recall (MR) is the number of measure-correct passages returned by a system divided by the total number of answer passages known to exist.

Measure F-Score (MF) is the harmonic mean of MP and MR.
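The arithmetic behind these measures is simple; the sketch below (ours, with hypothetical beat_correct and measure_correct predicates standing in for the actual checking of start and end positions) computes all six figures for a single question.

def f_score(p, r):
    """Harmonic mean of a precision and a recall (zero when both are zero)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def evaluate_question(returned, gold, beat_correct, measure_correct):
    """Compute BP, BR, BF, MP, MR and MF for one question.
    returned and gold are lists of passages; beat_correct and measure_correct
    are predicates comparing a returned passage with a gold answer."""
    n_beat = sum(1 for p in returned if any(beat_correct(p, g) for g in gold))
    n_meas = sum(1 for p in returned if any(measure_correct(p, g) for g in gold))
    bp = n_beat / len(returned) if returned else 0.0
    br = n_beat / len(gold) if gold else 0.0
    mp = n_meas / len(returned) if returned else 0.0
    mr = n_meas / len(gold) if gold else 0.0
    return {"BP": bp, "BR": br, "BF": f_score(bp, br),
            "MP": mp, "MR": mr, "MF": f_score(mp, mr)}

# Toy example with passages as (start_bar, start_beat, end_bar, end_beat) tuples
beat_ok = lambda p, g: p == g
measure_ok = lambda p, g: (p[0], p[2]) == (g[0], g[2])
gold = [(1, 1, 1, 1), (5, 3, 5, 3)]
returned = [(1, 1, 1, 1), (5, 1, 5, 1), (9, 2, 9, 2)]
print(evaluate_question(returned, gold, beat_ok, measure_ok))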
3 PREPARATION OF GOLD STANDARD

The Gold Standard consists of twenty scores, ten questions against each, and a complete list of answer passages for each question. The scores chosen can be seen in Table 3. This year we decided to re-use some scores from earlier years, as indicated in the 'Origin' column. There were two from 2014, one from 2015, and seven from 2016, with the remaining ten being new this year. Because of our wish to set questions based on real documents, we were particularly looking for well-known scores so that there were plenty of suitable texts. For example, symphonies by Haydn, Mozart and Beethoven, together with string quartets by the same composers, seemed likely to fit our criteria. However, such works are not that readily available as high-quality MusicXML scores. Hence, we wanted to re-use such scores wherever possible.

As we have remarked in earlier papers, there are essentially two sources of MusicXML scores. Firstly, there are exports made from scores in Finale, Sibelius etc. Such scores are typically created by amateur enthusiasts; they may contain mistakes and are not subject to any consistency or accuracy checks. Moreover, the various score-writing programs all generate different MusicXML for the 'same' music. Secondly, there is a large body of scores in Kern format, mainly from CCARH at Stanford. These scores are extremely accurate in their original format; however, the conversion to MusicXML is not good. Problems with scores have dogged our evaluations in previous years, so for the current campaign Donncha Ó Maidín undertook to check them carefully for syntactic and semantic errors such as incorrect elements, attributes, inconsistent bar numbering and so on.

Concerning the scores themselves, they range from one stave up to eighteen (Table 4). There are four symphonic movements by Mozart and Beethoven, the Berlioz Corsaire overture, and 'And the Glory of the Lord' from Handel's Messiah. These pieces range from ten up to eighteen staves. Next, there are two Vivaldi concertos, each on eight staves. Then there are two string quartet movements, by Haydn and Beethoven, on four staves, together with a Schubert song (Ständchen, D923) which is on three staves. Four Scarlatti sonatas, two Beethoven sonata movements, and pieces by Mussorgsky and Bartók lie on two staves, and a movement from the first Bach Cello Suite (BWV1007) occupies a single stave.

All the scores are well-known pieces, so we were hopeful that suitable texts discussing those pieces would be available. This year, for the first time, we asked participants to set ten questions each. Firstly, the participant was asked to devise five questions, one each of types 1_melod, n_melod, 1_harm, n_harm, and texture (see Table 1 for example queries of these types). Secondly, the participant was asked to search likely text documents for noun phrases which could be used as queries. For example, here is such a noun phrase about Bartók (but not about the Ten Easy Pieces, in fact): 'two whole-tone pentachords A-E# and F#-C' (http://homepage.tinet.ie/~braddellr/disso/ch4.htm). We allowed minor modifications to the text for practical purposes, and we also permitted a bar restriction to be added (e.g. '...in bars 10-20') in order to reduce the search for matching passages in the scores. The participants did indeed set good questions and also returned well-formed answers, so this innovation was a success.

Concerning evaluation, we decided to ignore the effect of knowing ten questions in advance, because all participants were in the same position. Thus we did not eliminate from evaluation answers to the questions a participant had themselves set.

The remaining questions were set by the organisers. We tried to find suitable texts for the scores and then scanned these looking for suitable noun phrases. We then made up the balance with devised questions. In all, 49 queries were set from real texts (by participants or organisers) while the remaining 151 were devised by us.

Concerning the categories of query, the five basic types of previous years were used: 1_melod, n_melod, 1_harm, n_harm and texture. Examples can be seen in Table 1. A 1_melod is based on a single note while an n_melod is a series of notes of some kind. A 1_harm is a single chord and an n_harm is a series of chords. Finally, we have the texture classification for 'counterpoint', 'melody with accompaniment' etc. It is a basic classification but it has worked well for us.

As well as the five basic categories, some of the queries are additionally assigned one of the types follow and synch (Table 2). These were also used in previous years, and allow more complicated queries. In essence, a follow query is one musical event coming after another ('continuo passage then a ripieno passage in measures 5-18'). In MIR terms, the juxtaposition of two phrases in this manner greatly reduces the number of matching passages, in the same way that word bigrams in Natural Language Processing are less frequent than the constituent words alone. This year, some follow queries had three phrases in sequence. A synch query specifies that two musical events are going on at the same time ('quarter notes C#, F# during a crescendo').
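As a rough illustration of why juxtaposition is so constraining, the following sketch (not taken from any participating system) combines the passages matching two sub-phrases into matches for a follow query, keeping only pairs where the second passage begins where the first one ends; an analogous overlap test would serve for synch. The tuple representation of a passage is an assumption made for the example.

def follows(first, second):
    """True if `second` starts exactly where `first` ends (one simple
    reading of 'followed by'); passages are (start_bar, start_beat,
    end_bar, end_beat) tuples sharing the same divisions setting."""
    return (second[0], second[1]) == (first[2], first[3])

def overlaps(a, b):
    """True if the two passages share some span (a simple synch test)."""
    return not ((a[2], a[3]) < (b[0], b[1]) or (b[2], b[3]) < (a[0], a[1]))

def match_follow(first_matches, second_matches):
    """Combine the matches of two sub-phrases into matches of a follow query."""
    return [(f, s) for f in first_matches for s in second_matches if follows(f, s)]

# Toy data: several matches for each phrase alone, far fewer in sequence
continuo = [(5, 1, 8, 4), (10, 1, 12, 4), (20, 1, 22, 4)]
ripieno = [(8, 4, 10, 2), (15, 1, 18, 4)]
print(match_follow(continuo, ripieno))   # [((5, 1, 8, 4), (8, 4, 10, 2))]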
Following the pattern of previous years, queries were composed originally in ASCII Short Form and then converted into XML to make the Gold Standard. All questions and answers were checked by a second expert.

4 QUERY REPRESENTATIONS IN JSON

At the MediaEval 2016 Technical Retreat in Hilversum, The Netherlands, one possibility discussed was to divide the C@merata task into two stages: the first stage was to convert the natural language query into an intermediate representation which was closer to an Information Retrieval query; the second stage was to search the score using this derived information. The aim was to separate the NLP aspect of the task from the MIR aspect. We decided to try this idea out. We built a multi-stage top-down parser which carries out the recognition and analysis of many kinds of musical textual phrase. The basic algorithm was from an earlier parser we built [3]. The output of this parser is then passed to the spaCy statistical parser (https://spacy.io/). In the final stage, the output is analysed and a feature structure created. We chose JSON to represent the information as it is very flexible, well known, and compatible with Python.

The various stages of the parser can be seen in Table 9. For each stage, an example input is shown together with the JSON output produced by the parser for that stage of processing. Each stage is concerned with one kind of construct. For example, Stage 2 recognises that 'C Major' is a key and creates some attribute-value pairs to capture this information; similarly, Stage 19 recognises 'five-note melody' as a melody having a particular number of notes, this information being represented as further attribute-value pairs. Later on, a phrase like 'five-note melody in C Major' can be converted into a feature structure by combining the outputs of Stages 2 and 19. At the end, spaCy parses the entire pre-processed input and can recognise any grammatical constructs, whether expected or not. However, the use of pre-processing considerably reduces the parsing ambiguity.

The parser was developed using the 2014-2016 data sets. It was then run on the 2017 queries and the resulting JSONs were added to the XML Gold Standard and made available to participants on request.

So far, this is work in progress. When we found constructs or concepts which were not being handled, we extended the representation accordingly. We carried out some error checking and made corrections along the way. However, there has so far been no formal evaluation.

Table 10 shows some examples of output for queries in the 2017 test set. As can be seen, quite complex constructs can be handled. We also have some provision for adding 'unknown' information into the JSON so that some downstream processor could carry out further analysis if need be. For example, 'a yearning melody in A flat' would be recognised with some certainty as a sequence of notes in a stated key. However, we could also note that a required property of the melody was that it was 'yearning', even though we might not know what that actually meant.

The aim of this work is to see what the limit is concerning the representation of a musical text, whether it is highly specific or quite vague. So far, we feel that a considerable amount of complex information can indeed be handled in this way, but further detailed work is needed to take this further.
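As a concrete picture of the combination step mentioned above, the sketch below (ours, not the actual parser code) merges the Table 9 outputs for 'five-note melody' (Stage 19) and 'C major' (Stage 2) into a single feature structure and serialises it as JSON; the combine function and its conflict check are assumptions made for the illustration.

import json

# Attribute-value pairs as produced for the individual phrases (cf. Table 9)
stage_19_melody = {"melody_word": True, "note_count": 5}                    # 'five-note melody'
stage_2_key = {"key_name": "C", "key_accidental": 0, "key_type": "major"}  # 'C major'

def combine(*fragments):
    """Merge stage outputs into one feature structure, refusing to
    overwrite an attribute silently with a conflicting value."""
    merged = {}
    for fragment in fragments:
        for attribute, value in fragment.items():
            if attribute in merged and merged[attribute] != value:
                raise ValueError("conflicting values for " + attribute)
            merged[attribute] = value
    return merged

# 'five-note melody in C Major'
feature_structure = combine(stage_19_melody, stage_2_key)
print(json.dumps(feature_structure, indent=2, sort_keys=True))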
5 2017 CAMPAIGN

This year there were just two participants, CLAS and DMUN (Table 5). CLAS took part in 2014 and 2015 but was unable to undertake the task last year. DMUN has taken part in all four years of the C@merata task, but with a different leader for 2016 and 2017.

The CLAS system [11] was built using Python, NLTK, Music21 and MongoDB. Firstly, the query is parsed using NLTK, together with a feature-based Context-Free Grammar which specifies the controlled language for the C@merata music queries. This grammar was an extension of the one used in 2014 and 2015. The result is a feature structure corresponding to the key semantic elements of the query which is then used to retrieve results. The twenty MusicXML scores are indexed using MongoDB tables. For each score, there were four tables: Titles, Musical Events, Sequences and Analysis.

The query is answered by carrying out searches of the MongoDB tables using the query's feature structure, and combining information derived from the results returned. A particular problem to be overcome was that feature unification as used in previous versions of the CLAS system can ignore attribute-value pairs, on one side or the other of the unification, which do not match. MongoDB queries do not have this property. Thus each feature structure derived from the original input query had to be converted into a NoSQL query to take account of this. Querying for sequences of notes in the database was performed by a series of searches, each checking if the event at the next timestep corresponded to the relevant sequence note. A similar approach was taken for chords and other sequences.
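The stepwise sequence search described for CLAS can be pictured with a small sketch; here a plain Python list of event dictionaries stands in for a MongoDB 'Musical Events' table, and the field names (onset, pitch, duration) are our own illustration rather than the actual CLAS schema.

# Events of one part in onset order (a stand-in for a 'Musical Events' table)
events = [
    {"onset": 0.0, "pitch": "C#4", "duration": 1.0},
    {"onset": 1.0, "pitch": "D4", "duration": 1.0},
    {"onset": 2.0, "pitch": "E4", "duration": 1.0},
    {"onset": 3.0, "pitch": "C#4", "duration": 1.0},
    {"onset": 4.0, "pitch": "D4", "duration": 2.0},
]

def find_note_sequence(events, wanted_pitches):
    """Return the (start, end) onsets of every run of consecutive events
    whose pitches match the wanted sequence, checking one timestep at a time."""
    matches = []
    for i in range(len(events) - len(wanted_pitches) + 1):
        if all(events[i + k]["pitch"] == p for k, p in enumerate(wanted_pitches)):
            first, last = events[i], events[i + len(wanted_pitches) - 1]
            matches.append((first["onset"], last["onset"] + last["duration"]))
    return matches

print(find_note_sequence(events, ["C#4", "D4"]))   # [(0.0, 2.0), (3.0, 6.0)]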
The DMUN system [2] takes as input a text query and initialises a query-parser object by loading a .json language file, a dictionary with single term-types for keys and sets of terms for values. The query parser converts the text of the query into a Formal Information Request (FIR), another dictionary, by gradually identifying and replacing the terms, term types and compound types of the query with their types found in the language file, until a top-level description of the query is found. The FIR is then sent to the Music Information Retrieval (MIR) module which in turn selects the corresponding information-request retrieval function. All the currently possible information requests are implemented as combinations of three core types of MIR function that find, relate and constrain music entities such as notes/rests and note sets (melodies, chords, etc.). Lastly, the output of the MIR functions, which comprises music elements, is converted into passages.

Each participant submitted one run. The results are shown in Table 6. According to the author of DMUN, most queries were processed with manual intervention and only a few queries were answered entirely automatically. Therefore the only automatic run is CLAS. The overall BF score for CLAS is 0.135 with a very similar MF of 0.166. Last year, the best scores were for DMUN01 (BF=0.070, MF=0.106), so this year is an improvement. Moreover, the task is definitely harder than last year, and in addition DMUN in 2016 also declared some manual intervention and analysis on the basis of only a subset of attempted queries. So the CLAS result for this year is very good. CLAS scored very highly in 2014 (BF=0.797, MF=0.854) and 2015 (BF=0.620, MF=0.656). On the other hand, the task has increased greatly in difficulty; in the early years there were mostly simple notes etc. (F#, crotchet rest) and none of the advanced musical concepts and complicated syntactic structures we now have.

Table 7 shows the average results over both runs for different query types. By looking at the BF column, for example, we can gain some insight into the relative difficulty of the different types. Interestingly, 1_harm has the highest score (BF=0.286) followed by n_harm (0.255), n_melod (0.218), texture (0.158) and finally 1_melod (0.148). One would expect the 1_melod questions to be the easiest as they refer to simple individual notes, whereas n_harm queries are sequences of chords and include cadences. Table 8 provides the same information but just for CLAS, this being the only fully automatic run. Once again, n_harm is the best (BF=0.269) followed by 1_harm (0.251), n_melod (0.151), texture (0.130) and finally 1_melod (0.076). This is once again surprising, but could be accounted for by the fact that CLAS employed sophisticated chord and chord sequence processing, using in part the music21 chordify function, and building on their experience with the handling of chords in previous editions of C@merata.
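For reference, the chordify step mentioned above can be reproduced in a few lines of music21 [1]; this is a generic illustration of the library call on one of music21's bundled corpus scores, not the CLAS code or a task score.

from music21 import corpus

# Parse a chorale bundled with music21 and collapse its parts into single chords
score = corpus.parse('bach/bwv66.6')
chords = score.chordify()

# Print bar number, offset and pitches for the first few resulting chords
for chord in list(chords.recurse().getElementsByClass('Chord'))[:5]:
    print(chord.measureNumber, chord.offset, [p.nameWithOctave for p in chord.pitches])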
Perhaps the 1_melod and n_melod queries were not that simple. 1_melod included 'G# at the start of a bar', where you need to know what the start of a bar means; 'a dip down to the lowest note on the instrument', which is similarly rather poetic; and 'D two octaves lower than D4', which is a pitch relative to another pitch. For n_melod we had 'ten consecutive quavers in the left hand in bars 50-end', where you have to interpret 'ten consecutive' as well as knowing what 'end' is. Another example is 'four descending quavers', which involves knowing what 'descending' means. Of course, these vaguer and more figurative expressions are much more realistic and they show where natural language can come into its own as a means of specifying musical information. By contrast, simple, unambiguous concepts like 'C#' can be adequately handled in a symbolic query language suitably adapted to the music domain.

6 CONCLUSIONS

This year, there were several important innovations in the C@merata task. First, queries were contributed by participants for the first time. This involved a significant effort and commitment by the participants, for which we are very grateful. Moreover, they had to assimilate the finer points of the coding process such as the exact ASCII Short Form syntax, the procedures to follow in cases of ambiguity and the assignment of query-type tags like 1_melod and n_harm. However, it was a success and there was a significant gain in having composed queries contributed by new people who are moreover extremely expert in their own fields of music. They also extracted some very nice queries from real texts and along the way highlighted some important new sources for such texts.

As far as evaluation is concerned within a scenario where each participant knows some of the questions, we ignored this issue for 2017. There were only ten queries for each participant, out of 200, and indeed the participants could not necessarily answer their own questions anyway. The overall scores are low, so knowledge of 10/200 queries, i.e. 5%, does not seem to us a significant factor.

The second innovation was that more queries overall came from real texts than before – 49/200, with twenty contributed by participants and 29 contributed by the organisers. This number could be higher, but there is a significant difficulty in finding and analysing such texts. We need to devote more time to the collection of texts and their corresponding scores as an activity in itself. As always, we are hampered by the minimal supply of high-quality MusicXML scores.

Third, all the MusicXML scores were carefully checked this year for conformance to the standard in terms of elements, their context of use (relative to other elements), the attributes they have, and their values. For this invaluable work we must thank Donncha Ó Maidín, who devoted a great deal of time to it. In previous years, Donncha has analysed MusicXML files from first principles using his own CPN software, while most participants tend to use Music21 [1]. This has given him a unique insight into the finer points. In addition to conformance, scores were also checked in respect of bar numbering. In MusicXML scores there can be problems in respect of anacrusis bars (which are conventionally numbered zero, but can be numbered one or even not numbered at all) and repeat bars (which can have a non-numerical number like 10a or, once again, no number). Bar numbering is extremely important to us as answer passages are identified in terms of bar numbers.

Fourth, we adopted points in the score in addition to the passages we have used since 2014. A point can capture an event which is instantaneous and which therefore does not involve a range of beats. Examples include cadences, where the change from the V chord to the I chord is the key defining aspect (and indeed the only one which is unambiguous), and changes of key signature or time signature. There were only a few such queries and, due to the low results overall, it is not possible to assess their effect reliably. However, this was useful work since points are clearly the correct way of answering certain questions.

Fifth, this year we once again used our five-way categorisation of queries – 1_melod, n_melod, 1_harm, n_harm and texture – and this worked well for us. No classification can work perfectly, but this one is largely correct for most queries and is therefore useful enough to be worth employing. A more detailed classification is much more complicated to work with for question encoders and is likely to display a larger number of shortcomings as well. In addition to the five types, we once again employed two modification types, follow and synch; loosely applied, these can characterise many of the more complicated kinds of queries, because when there are two or more music events, either they are happening partly together or not together, and if they are not together, one must be either before or after the other. Hence, follow and synch classifiers can capture many complex musical descriptions.
Sixth, we worked on capturing the content of a query using a hierarchical feature structure which we expressed in JSON. Moreover, we wrote a parser for converting queries to JSON and this was developed in terms of the 2014-16 C@merata test sets. What we found was that this type of analysis was indeed possible, that a high proportion of the semantic content of our queries could be captured in such a way, and that our parser, while far from perfect, was surprisingly good. This is work in progress, which took place in parallel with the C@merata evaluation, and more detailed and comprehensive work needs to be done on it. Also, an evaluation needs to be carried out. Such a representation can obviously not capture every subtlety of a query; consider the last example of Table 10: 'rocking eighth-note chords in the piano right hand against half-note octaves in the piano left hand in measures 1-10'. The JSON captures the length of the chords, the hand and the instrument playing them, the harmonic octaves and their length, the relationship between these two, and the bar restriction. It does not capture the vaguely-specified plurality of both components or the meaning of 'rocking'. The former can readily be addressed, and we would propose that the latter kind of description could be captured by ad hoc attributes (e.g. additional_description) which could then be processed by a downstream component if required. There are always going to be vague and ambiguously specified aspects to a musical description, and we need a way of working with these within a more specific type of feature structure.

Turning now to the overall conclusion for the task, the CLAS result was very good considering the difficulty, but there were insufficient participants and hence runs to get a reasonable spread of results which could be analysed properly.

Finally, concerning work which we can do in future, there are several possibilities. First, we can derive more queries from real texts; we need to work on this more systematically, and over a longer time period than the C@merata task itself. Second, we could expand our passage representation to pick out answers more precisely; at present, we have vertical 'lines' through the score but we have no 'horizontal' ones – we do not identify which stave(s) contain the answer and also we do not know which parts within a matching stave are relevant. For example, there could be two horn parts on the one stave, but only one of those could match the answer. Ways of doing this have been suggested in other contexts [10] which we might be able to adopt. Third, we can work more on the JSON representations as already discussed, and fourth, we can consider widening participation by further parameterisation of the task.

REFERENCES

[1] Cuthbert, M. S., & Ariza, C. (2010). music21: a toolkit for computer-aided musicology and symbolic music data. Proceedings of the International Symposium on Music Information Retrieval, Utrecht, The Netherlands, August 9-13, 2010, 637-642.

[2] Katsiavalos, A. (2017). The DMUN System at the MediaEval 2017 C@merata Task. Proceedings of the MediaEval 2017 Workshop, Trinity College Dublin, Ireland, September 13-15, 2017.

[3] Sutcliffe, R. F. E. (2000). Using a Robust Layered Parser to Analyse Technical Manual Text. Cuadernos de Filología Inglesa, 9(1), 167-189. Número Monográfico: Corpus-based Research in English Language and Linguistics. http://www.csis.ul.ie/staff/Richard.Sutcliffe/murcia_parsing_paper00_repaginated.pdf

[4] Sutcliffe, R. F. E., Collins, T., Hovy, E., Lewis, R., Fox, C., & Root, D. L. (2016). The C@merata task at MediaEval 2016: Natural Language Queries Derived from Exam Papers, Articles and Other Sources against Classical Music Scores in MusicXML. Proceedings of the MediaEval 2016 Workshop, Hilversum, The Netherlands, October 20-21, 2016. http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_55.pdf

[5] Sutcliffe, R. F. E., Crawford, T., Fox, C., Root, D. L., & Hovy, E. (2014). The C@merata Task at MediaEval 2014: Natural language queries on classical music scores. Proceedings of the MediaEval 2014 Workshop, Barcelona, Spain, October 16-17, 2014. http://ceur-ws.org/Vol-1263/mediaeval2014_submission_46.pdf

[6] Sutcliffe, R. F. E., Crawford, T., Fox, C., Root, D. L., & Hovy, E. (2014). Shared Evaluation of Natural Language Queries against Classical Music Scores: A Full Description of the C@merata 2014 Task. Proceedings of the C@merata Task at MediaEval 2014. http://csee.essex.ac.uk/camerata/

[7] Sutcliffe, R. F. E., Crawford, T., Fox, C., Root, D. L., Hovy, E., & Lewis, R. (2015). Relating Natural Language Text to Musical Passages. Proceedings of the 16th International Society for Music Information Retrieval Conference, Malaga, Spain, October 26-30, 2015. http://ismir2015.uma.es/
[8] Sutcliffe, R. F. E., Fox, C., Root, D. L., Hovy, E., & Lewis, R. (2015). The C@merata Task at MediaEval 2015: Natural language queries on classical music scores. Proceedings of the MediaEval 2015 Workshop, Dresden, Germany, September 14-15, 2015. http://ceur-ws.org/Vol-1436/Paper12.pdf

[9] Sutcliffe, R. F. E., Fox, C., Root, D. L., Hovy, E., & Lewis, R. (2015). Second Shared Evaluation of Natural Language Queries against Classical Music Scores: A Full Description of the C@merata 2015 Task. Proceedings of the C@merata Task at MediaEval 2015. http://csee.essex.ac.uk/camerata/

[10] Viglianti, R. (2015). Enhancing Music Notation Addressability. http://mith.umd.edu/research/project/enhancing-music-notation-addressability/

[11] Wan, S. (2017). The CLAS System at the MediaEval 2017 C@merata Task. Proceedings of the MediaEval 2017 Workshop, Trinity College Dublin, Ireland, September 13-15, 2017.

Table 1: Query Types

1_melod (26 queries):
  G# at the start of a bar
  a dip down to the lowest note on the instrument
  half-note C in the left hand in the treble clef
  the highest note in the vocal line
n_melod (75 queries):
  semiquaver melody G D B A B D B D
  ten consecutive quavers in the left hand in bars 50-end
  arpeggio-like passage in the right hand in measures 1-12
  series of eighth notes in the right hand, starting on an off beat
1_harm (30 queries):
  Db triad in the right hand
  dominant seventh broken chord of the key of C
  five-note chord
  quarter-note chord F# C# A# E in the whole orchestra in measures 150-180
n_harm (47 queries):
  six consecutive sixths in the right hand in bars 1-25
  a cadence, G to C in measures 7-10
  chord of F major, chord of C major, chord of F major
  double stopping on two successive notes in the bass
texture (22 queries):
  monophony
  first five notes of the fugal entry commencing at bar 27
  melody with accompaniment in measures 1-6
  homophonic passage in measures 164-170
All: 200 queries

Table 2: follow and synch Queries within 1_melod, n_melod, 1_harm and n_harm

follow (19 queries):
  three thirds followed by three crotchets followed by three thirds in the left hand in bars 1-43
  melody Bb C Eb Bb A followed by a Db chord
  six repeated chords E G B C# followed by six repeated chords F# C# A#
  continuo passage then a ripieno passage in measures 5-18
synch (14 queries):
  triplets against even eighth notes in measures 1 to 10
  quarter notes C#, F# during a crescendo
  four descending sixteenth notes in the strings against a chord in the winds in measures 190-199
  cellos and basses leading into the shadows while the upper strings accompany with gently throbbing harmonies in measures 73-87
Table 3: Scores Used. 'Origin 2014' means the score was previously used in 2014. (Work | Staves | Scoring | Lang | Origin)

bach_cello_suite_1_bwv1007_prelude | 1 | vc | Eng. | 2014
scarlatti_sonata_k30 | 2 | hpd | Eng. | 2015
scarlatti_sonata_k281 | 2 | hpd | Eng. | 2016
scarlatti_sonata_k320 | 2 | hpd | Eng. | 2016
scarlatti_sonata_k466 | 2 | hpd | Amer. | 2014
bartok_10_easy_pieces_n4_sostenuto | 2 | pf | Amer. | 2017
mussorgsky_pictures_promenade_m1 | 2 | pf | Eng. | 2017
beethoven_piano_sonata_14_op_27_no_2_m1 | 2 | pf | Amer. | 2017
beethoven_piano_sonata_14_op_27_no_2_m2 | 2 | pf | Amer. | 2017
schubert_staendchen_d923 | 3 | T, pf | Amer. | 2017
beethoven_string_quartet_op_18_no_3_m1 | 4 | 2 vn, va, vc | Amer. | 2017
haydn_string_quartet_no_57_op_74_no_1_m1 | 4 | 2 vn, va, vc | Amer. | 2017
vivaldi_concerto_4_vn_rv580 | 8 | 4 vn, 2 va, vc, db | Amer. | 2016
vivaldi_conc_vn_op6_no6_rv239_m1 | 8 | 3 vn, va, vc, db, hpd | Eng. | 2016
mozart_symphony_no40_m4 | 10 | fl, 2 ob, 2 bn, 2 hn, 2 vn, va, vc, db | Eng. | 2016
beethoven_symphony_no_1_m1 | 12 | 2 fl, 2 ob, 2 cl, 2 bs, 2 hn, 2 tpt, timp, 2 vn, va, vc, db | Amer. | 2017
beethoven_symphony_no_4_m1 | 12 | fl, 2 ob, 2 cl, 2 bs, 2 hn, 2 tpt, timp, 2 vn, va, vc, db | Amer. | 2017
beethoven_symphony_3_movement_iii_muse | 13 | 2 fl, 2 ob, 2 cl, 2 bs, 2 hn, 2 tpt, timp, 2 vn, va, vc, db | Eng. | 2016
berlioz_corsaire_overture_h101 | 17 | fl, 2 ob, 2 cl, 4 bs, 4 hn, 2 tpt, 3 trbn, tuba, timp, 2 vn, va, vc, db | Eng. | 2017
handel_messiah_and_the_glory | 18 | fl, 2 ob, cl, bs, hn, trbn, tuba, SATB, hpd, 2 vn, va, vc, db | Eng. | 2016

Table 4: Distribution of Scores by Number of Staves (Staves | Frequency)

2 | 6
3 | 1
4 | 6
5 | 2
8 | 2
10 | 1
13 | 1
18 | 1
All | 20

Table 5: C@merata Participants (Runtag | Leader | Affiliation | Country)

CLAS | Stephen Wan | CSIRO | Australia
DMUN | Andreas Katsiavalos | De Montfort University | England

Table 6: Results for All Questions. The DMUN01 run was experimental and used manual intervention on all but 2-3 queries; thus CLAS01 is the best automatic run. BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score, MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score.

Run | BP | BR | BF | MP | MR | MF
CLAS01 | 0.099 | 0.212 | 0.135 | 0.122 | 0.260 | 0.166
DMUN01 | 0.833 | 0.155 | 0.261 | 0.924 | 0.172 | 0.290
Maximum | 0.833 | 0.212 | 0.261 | 0.924 | 0.260 | 0.290
Minimum | 0.099 | 0.155 | 0.135 | 0.122 | 0.172 | 0.166
Average | 0.466 | 0.184 | 0.198 | 0.523 | 0.216 | 0.228

Table 7: Average Results by Question Type. BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score, MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score. Note that follow and synch questions are across 1_melod, n_melod, 1_harm and n_harm.

Type | BP | BR | BF | MP | MR | MF
1_melod | 0.355 | 0.230 | 0.148 | 0.448 | 0.320 | 0.192
n_melod | 0.550 | 0.167 | 0.218 | 0.569 | 0.190 | 0.239
1_harm | 0.727 | 0.178 | 0.286 | 0.727 | 0.178 | 0.286
n_harm | 0.482 | 0.218 | 0.255 | 0.632 | 0.254 | 0.310
texture | 0.588 | 0.103 | 0.158 | 0.588 | 0.103 | 0.158
follow | 0.534 | 0.026 | 0.044 | 0.567 | 0.040 | 0.063
synch | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000

Table 8: CLAS Results by Question Type. BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score, MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score.

Type | BP | BR | BF | MP | MR | MF
1_melod | 0.043 | 0.328 | 0.076 | 0.062 | 0.475 | 0.110
n_melod | 0.137 | 0.167 | 0.151 | 0.176 | 0.213 | 0.193
1_harm | 0.636 | 0.156 | 0.251 | 0.636 | 0.156 | 0.251
n_harm | 0.250 | 0.290 | 0.269 | 0.263 | 0.304 | 0.282
texture | 0.176 | 0.103 | 0.130 | 0.176 | 0.103 | 0.130
follow | 0.067 | 0.026 | 0.037 | 0.133 | 0.053 | 0.076
synch | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
Table 9: JSON Analysis: Stages of Processing (No | Stage | Example Input | JSON)

1 | texture | two-part texture | { 'texture': 'two_part' }
2 | key | C major | { 'key_name': 'C', 'key_accidental': 0, 'key_type': 'major' }
3 | note | slurred double whole note trill | { "note_divisions": 48, "note_length": 384, "note_ornament": "trill", "note_performance": "slurred" }
4 | note_sequence | C#4 D4 | "note_sequence": [ { "note_accidental": 1, "note_name": "c", "note_octave": 4 }, { "note_accidental": 0, "note_name": "d", "note_octave": 4 } ]
5 | measure | bars 1-10 | { "measure_from": 1, "measure_to": 10 }
6 | underlay | on the word "Der" | { "note_underlay": "Der" }
7 | instrument_quote | "Cello" | { "instrument": "Cello" }
8 | staff | left hand | { 'staff_hand': 'left' }
9 | instrument | violin I divisi | { "instrument": "violin", "instrument_direction": "divisi", "instrument_group": 1 }
10 | instrument_sequence | cellos and double basses | { "instrument_list": [ { "instrument": "cello" }, { "instrument": "doublebass" } ] }
11 | triad | Ib triad | { "relative_pitch": 1, "triad_inversion": 1 }
12 | interval | doubly diminished harmonic fifth | { "interval_augmentation": -2, "interval_harm_melod": "harmonic", "interval_size": 5 }
13 | interval_sequence | alternating fourths and fifths | { "interval_list": [ { "interval_harm_melod": "harmonic", "interval_size": 4 }, { "interval_harm_melod": "harmonic", "interval_size": 5 } ], "interval_seq_pattern": "alternating" }
14 | cadence | interrupted cadence | { "cadence": "interrupted" }
15 | inversion | in the first inversion | { "triad_inversion": 1 }
16 | chord | chord of F#3, D4 and A4 | { "chord_word": true, "note_sequence": [ { "note_accidental": 1, "note_name": "f", "note_octave": 3 }, { "note_accidental": 0, "note_name": "d", "note_octave": 4 }, { "note_accidental": 0, "note_name": "a", "note_octave": 4 } ] }
17 | arpeggio | F sharp minor arpeggio | { "arpeggio_word": true, "key_accidental": 1, "key_name": "F", "key_type": "minor" }
18 | scale | C major scale | { "key_accidental": 0, "key_name": "C", "key_type": "major", "scale_word": true }
19 | melody | five-note melody | { "melody_word": true, "note_count": 5 }
20 | by | See examples below | -
21 | simultaneous | See examples below | -
22 | loose_indication | fermata on a whole note | { "note_divisions": 48, "note_length": 192, "note_performance": "fermata" }
23 | time_signature | 12/8 | { "time_higher": 12, "time_lower": 8 }

Table 10: Examples of Queries Converted to JSON

dotted crotchet Bb in the right hand in bars 23-40
{ "first": { "measure_from": 23, "measure_to": 40, "note_accidental": -1, "note_divisions": 48, "note_length": 48, "note_length_multiplier": 1.5, "note_name": "b", "note_octave": -1, "staff_hand": "right" }, "second": {}, "type": "simple" }

dotted crotchet chord B2 B3 D#5 in bars 1-46
{ "first": { "chord_word": true, "measure_from": 1, "measure_to": 46, "note_divisions": 48, "note_length": 48, "note_length_multiplier": 1.5, "note_sequence": [ { "note_accidental": 0, "note_name": "b", "note_octave": 2 }, { "note_accidental": 0, "note_name": "b", "note_octave": 3 }, { "note_accidental": 1, "note_name": "d", "note_octave": 5 } ] }, "second": {}, "type": "simple" }
monophonic passage lasting twelve crotchet beats
{ "first": { "note_divisions": 48, "note_length": 48, "number": 12, "texture": "monophony" }, "second": {}, "type": "simple" }

G# quaver in the right hand against a crotchet in the left hand in bars 1-25
{ "first": { "note_accidental": 1, "note_divisions": 48, "note_length": 24, "note_name": "g", "note_octave": -1, "staff_hand": "right" }, "second": { "measure_from": 1, "measure_to": 25, "note_divisions": 48, "note_length": 48, "staff_hand": "left" }, "type": "against" }

descending arpeggio in quavers followed by ascending arpeggio in quavers in bars 1-30
{ "first": { "arpeggio_word": true, "direction": "falling", "note_divisions": 48, "note_length": 24 }, "second": { "arpeggio_word": true, "direction": "rising", "measure_from": 1, "measure_to": 30, "note_divisions": 48, "note_length": 24 }, "type": "followed_now" }

rocking eighth-note chords in the piano right hand against half-note octaves in the piano left hand in measures 1-10
{ "first": { "chord_word": true, "instrument": "piano", "note_divisions": 48, "note_length": 24, "staff_hand": "right" }, "second": { "instrument": "piano", "interval_harm_melod": "harmonic", "interval_size": 8, "measure_from": 1, "measure_to": 10, "note_divisions": 48, "note_length": 96, "staff_hand": "left" }, "type": "against" }