<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The C@merata task at MediaEval 2017: Natural Language Queries about Music, their JSON Representations, and Matching Passages in MusicXML Scores</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Richard Sutcliffe</string-name>
          <email>C@merata</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donncha S. Ó Maidín</string-name>
          <email>donncha.omaidin@ul.ie</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eduard Hovy</string-name>
          <email>hovy@cmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Carnegie Mellon University</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Essex</institution>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Limerick</institution>
          ,
          <country country="IE">Ireland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>The C@merata task at MediaEval started in 2014 and is now in its fourth year. It is a combination of Natural Language Processing and Music Information Retrieval. The input is a short query ('six consecutive sixths in the right hand in bars 1-25') against a classical music score in MusicXML. The required output is a set of matching passages in the score. There are 200 queries and 20 scores each year. There were several innovations for 2017: First, some queries such as cadences required an answer which was a point in a score rather than a passage; second, queries were contributed by participants as well as by the organisers; third, some of the queries were directly taken from real texts such as articles and webpages; fourth, the organisers provided experimental representations of the input queries in the form of JSON feature structures. These capture many aspects of the queries in a form which is much closer to an MIR query. There were just two participants in the evaluation, and scores were understandably low given the considerable difficulty of the queries. However, this year we have significantly advanced our knowledge of how music is talked about in natural language texts, how these relate to MIR queries, and how to go about converting a text into a query.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 INTRODUCTION</title>
      <p>The C@merata evaluations are concerned with the relationship
between Natural Language Processing (NLP) and Music
Information Retrieval (MIR). Descriptions of classical music in
books, papers, reviews and web pages are often very detailed and
technical; however, experts can readily understand how they
relate to the music. How can computers attain an equal level of
understanding? The C@merata task aims to answer this question.</p>
      <p>Each year, there are 200 questions against twenty classical
music scores in MusicXML. Participants have to build a system
which can return a set of one or more answer passages for each
query. In previous years, each passage marked the start and end in
the score of one answer to the question. This year, in addition,
some answers are points in the score. For example, it is sometimes
not clear where a cadence starts and ends; what is clear is the
instant when the V-I transition occurs. Therefore, by using points,
we can reduce ambiguity. The use of points allowed new types of
query such as key changes, which are clearly points not passages.</p>
      <p>In the next section we briefly describe the task, including the
method of evaluation. After that, we outline the preparation of the
Gold Standard data, comprising queries, scores and answers.
Next, we briefly describe the work on feature structure
representations for queries, expressed in JSON. The details of the
2017 campaign are then presented, together with the results.
Finally, we draw conclusions from the 2017 C@merata task.
</p>
    </sec>
    <sec id="sec-1b">
      <title>2 2017 TASK</title>
      <p>
        The C@merata task has remained almost unchanged since 2014
and detailed descriptions can be found in previous papers
[
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8 ref9">4,5,6,7,8,9</xref>
        ]. We summarise the main points here. There are 200
questions, each one being a single noun phrase in English. Half of
the questions use American terminology (quarter note, measure)
while the other half use English terminology (crotchet, bar).
      </p>
      <p>There are twenty MusicXML scores, and ten queries are set
against each one. Participants must answer each query by means
of one or more answer passages or answer points. A passage specifies
part of a score, beginning and ending at specific places. For
example, [4/4,1,1:1-2:4] means we are in 4/4 time, divisions is set
to one (i.e. we are measuring in crotchets), the passage starts
before the first crotchet beat of bar one (1:1), and the passage ends
after the fourth crotchet beat of bar two (2:4).</p>
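      <p>The bracketed passage notation can be unpacked mechanically. A minimal sketch (the helper name, the regular expression, and the tuple layout are ours, not part of the task's official tooling):</p>
      <preformat>
```python
# Parse a C@merata answer-passage string such as "[4/4,1,1:1-2:4]".
# The fields are: time signature, divisions (the beat unit), and a
# start bar:beat to end bar:beat range.
import re

def parse_passage(spec):
    """Return (time_sig, divisions, (start_bar, start_beat), (end_bar, end_beat))."""
    m = re.fullmatch(
        r"\[\s*(\d+/\d+)\s*,\s*(\d+)\s*,\s*(\d+):(\d+)-(\d+):(\d+)\s*\]", spec)
    if m is None:
        raise ValueError("not a passage spec: %r" % spec)
    time_sig, div = m.group(1), int(m.group(2))
    start = (int(m.group(3)), int(m.group(4)))
    end = (int(m.group(5)), int(m.group(6)))
    return time_sig, div, start, end

print(parse_passage("[4/4,1,1:1-2:4]"))
# ('4/4', 1, (1, 1), (2, 4))
```
      </preformat>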
      <p>A point specifies the instant at which something happens in the
score. For example, [ 3/4, 1, 7a3 ] means we are in 3/4 time,
beating in crotchets, and the point falls in bar 7 after the third
crotchet beat. In other words, this is the very end of bar seven. Of
course, we have to consider whether that is the same as the start of
bar eight, which we would write as [ 3/4, 1, 8b1 ]. There are subtle
differences; a repeat mark could be considered as being at the end
of a bar and not at the start of the next bar, while a key signature
would be at the start of a bar, not at the end of the previous one.</p>
      <p>For the 2017 task we resolve this ambiguity by stating that all
points must be specified in the ‘a’ form (e.g. ‘7a3’) except the
very start of the piece, which by definition will need to be in the
‘b’ form.</p>
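      <p>The rule above amounts to a normalisation: any ‘b’-form point other than the very start of the piece can be rewritten as the equivalent ‘a’ form. A sketch of such a normaliser (the helper is hypothetical; it assumes the beat count of the prevailing time signature is supplied by the caller):</p>
      <preformat>
```python
# Normalise a C@merata point to the required 'a' form: "8b1" (before beat 1
# of bar 8) denotes the same instant as "7a3" in 3/4 time (after the last
# beat of bar 7). Only the very start of the piece stays in 'b' form.
import re

def to_a_form(point, beats_per_bar):
    """Rewrite a 'b'-form point as its 'a'-form equivalent where possible."""
    m = re.fullmatch(r"(\d+)([ab])(\d+)", point)
    bar, form, beat = int(m.group(1)), m.group(2), int(m.group(3))
    if form == "a":
        return point
    if bar == 1 and beat == 1:
        return point  # the very start of the piece must remain 'b' form
    if beat == 1:
        # before beat 1 of bar n == after the last beat of bar n-1
        return "%da%d" % (bar - 1, beats_per_bar)
    # before beat k == after beat k-1 within the same bar
    return "%da%d" % (bar, beat - 1)

print(to_a_form("8b1", 3))   # prints 7a3
```
      </preformat>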
      <p>We have reflected on the significance of points vs. passages in
the current campaign. Firstly, clefs, key signatures and time
signatures are all points, because they all have zero length.
Changes of clef, key signature etc. in the middle of the piece are
also points. Grace notes are points, because, while they do have a
length in performance, they have zero length from the perspective
of the beat arithmetic within the bar in which they occur. A
particularly interesting example can be found towards the end of
Der Dichter spricht from Kinderszenen by Robert Schumann.
This is an extended passage lasting perhaps sixteen seconds, all
taking up no beats in the score! Cadences are points not passages,
because it is always clear where the transition from the V chord to
the I chord takes place as it lies immediately before the start of the
I chord. On the other hand, the start of the V chord may not be
that clear, leading to problems if a cadence is specified as a
passage with a beginning and end. Interestingly, this problem
seems not to have come to light in the context of music theory
exams and aural tests; the examiner may ask for a cadence to be
identified (as perfect or plagal, for example) but never asks
exactly where it is, as this is considered to be obvious.</p>
      <p>The next issue concerning points is that the start of a bar is not
the same as the end of the previous one, as we have already
mentioned. Finally, graphic symbols such as dynamics (p, f) could
be assigned a position at the closest point in the score (in the
case of printed scores).</p>
      <p>Once a system has answered a question, we need a method of
scoring the passage or passages returned. We use an automatic
evaluation procedure. A passage is beat correct if it starts at the
correct beat in the correct start bar and it ends at the correct beat
in the correct end bar. So, if the correct answer is a crotchet, the
passage must start immediately before the crotchet in question and
it must end immediately after it. Similarly, a passage is measure
correct if it starts in the correct start bar (but not necessarily at the
correct beat) and ends in the correct end bar. The notion of ‘beat
correct’ is used as the basis for the strict evaluation measures,
while ‘measure correct’ is for the lenient evaluation measures.
Based on beat correct and measure correct, we can define the
following:</p>
      <p>Beat Precision (BP) is the number of beat-correct passages
returned by a system, in answer to a question, divided by the
number of passages (correct or incorrect) returned.</p>
      <p>Beat Recall (BR) is the number of beat-correct passages
returned by a system divided by the total number of answer
passages known to exist.</p>
      <p>Beat F-Score (BF) is the harmonic mean of BP and BR.</p>
      <p>Measure Precision (MP) is the number of measure-correct
passages returned by a system divided by the number of passages
(correct or incorrect) returned.</p>
      <p>Measure Recall (MR) is the number of measure-correct
passages returned by a system divided by the total number of
answer passages known to exist.</p>
      <p>Measure F-Score (MF) is the harmonic mean of MP and MR.</p>
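      <p>The six measures just defined can be sketched in a few lines. This is our own illustrative implementation, not the official evaluation script; a passage is represented as ((start_bar, start_beat), (end_bar, end_beat)), with beat-correct requiring exact equality and measure-correct comparing bars only:</p>
      <preformat>
```python
# Hedged sketch of the strict (beat) and lenient (measure) scoring above.
def f_score(p, r):
    return 2 * p * r / (p + r) if p + r else 0.0

def evaluate(returned, gold):
    beat_ok = [p for p in returned if p in gold]
    meas_ok = [p for p in returned
               if any(p[0][0] == g[0][0] and p[1][0] == g[1][0] for g in gold)]
    bp = len(beat_ok) / len(returned) if returned else 0.0
    br = len(beat_ok) / len(gold) if gold else 0.0
    mp = len(meas_ok) / len(returned) if returned else 0.0
    mr = len(meas_ok) / len(gold) if gold else 0.0
    return {"BP": bp, "BR": br, "BF": f_score(bp, br),
            "MP": mp, "MR": mr, "MF": f_score(mp, mr)}

# Two gold answers; the system finds one exactly, and one more passage
# lands in the right bars but at the wrong beats, so strict and lenient
# measures diverge.
gold = [((1, 1), (2, 4)), ((10, 1), (10, 4))]
returned = [((1, 1), (2, 4)), ((1, 2), (2, 3))]
scores = evaluate(returned, gold)
```
      </preformat>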
    </sec>
    <sec id="sec-2">
      <title>3 PREPARATION OF GOLD STANDARD</title>
      <p>The Gold Standard consists of twenty scores, ten questions against
each, and a complete list of answer passages for each question.
The scores chosen can be seen in Table 3. This year we decided to
re-use some scores from earlier years, as indicated in the ‘Origin’
column. There were two from 2014, one from 2015, and seven
from 2016, with the remaining ten being new this year. Because
of our wish to set questions based on real documents, we were
particularly looking for well-known scores so that there were
plenty of suitable texts. For example, symphonies by Haydn,
Mozart and Beethoven together with string quartets by the same
composers seemed likely to fit our criteria. However, such scores
are not that readily available in high-quality MusicXML scores.
Hence, we wanted to re-use such scores wherever possible.</p>
      <p>As we have remarked in earlier papers, there are essentially
two sources of MusicXML scores. Firstly, there are exports made
from scores in Finale, Sibelius etc. Such scores are typically
created by amateur enthusiasts; they may contain mistakes and are
not subject to any consistency or accuracy checks. Moreover, the
various score-writing programs all generate different MusicXML
for the ‘same’ music. Secondly, there is a large body of scores in
Kern format, mainly from CCARH at Stanford. These scores are
extremely accurate in their original format. However, the
conversion to MusicXML is not good. Problems with scores have
dogged our evaluations in previous years, so for the current
campaign Donncha Ó Maidín undertook to check them carefully
for syntactic and semantic errors such as incorrect elements,
attributes, inconsistent bar numbering and so on.</p>
      <p>Concerning the scores themselves, they range from 1 stave up
to eighteen (Table 4). There are four symphonic movements by
Mozart and Beethoven, the Berlioz Corsaire overture, and ‘And
the Glory of the Lord’ from Handel’s Messiah. These pieces range
from ten up to eighteen staves. Next, there are two Vivaldi
concertos, each on eight staves. Then there are two string quartet
movements, by Haydn and Beethoven, on four staves, together
with a Schubert song (Ständchen, D923) which is on three staves.
Four Scarlatti sonatas, two Beethoven sonata movements, and
pieces by Mussorgsky and Bartók lie on two staves and a
movement from the first Bach Cello Suite (BWV1007) occupies a
single stave.</p>
      <p>All the scores are well-known pieces, so we were hopeful that
suitable texts discussing those pieces would be available. This
year, for the first time, we asked participants to set ten questions
each. Firstly, the participant was asked to devise five questions,
one each of types 1_melod, n_melod, 1_harm, n_harm, and
texture (see Table 1 for example queries of these types). Secondly,
the participant was asked to search likely text documents for noun
phrases which could be used as queries. For example, here is such
a noun phrase about Bartók (though not in fact about the Ten Easy
Pieces): ‘two whole-tone pentachords A-E# and F#-C’
(http://homepage.tinet.ie/~braddellr/disso/ch4.htm). We allowed
minor modifications to the text for practical purposes, and we also
permitted a bar restriction to be added (e.g. ‘...in bars 10-20’) in
order to reduce the search for matching passages in the scores.
The participants did indeed set good questions and also returned
well-formed answers, so this innovation was a success.
Concerning evaluation, we decided to ignore the effect of
knowing ten questions in advance, because all participants were in
the same position. Thus we did not eliminate from evaluation
answers to the questions a participant had themselves set.</p>
      <p>The remaining questions were set by the organisers. We tried
to find suitable texts for the scores and then scanned these looking
for suitable noun phrases. We then made up the balance with
devised questions. In all, 49 queries were set from real texts (by
participants or organisers) while the remaining 151 were devised
by us.</p>
      <p>Concerning the categories of query, the five basic types of
previous years were used: 1_melod, n_melod, 1_harm, n_harm
and texture. Examples can be seen in Table 1. A 1_melod is based
on a single note while an n_melod is a series of notes of some
kind. A 1_harm is a single chord and an n_harm is a series of
chords. Finally, we have the texture classification for
‘counterpoint’, ‘melody with accompaniment’ etc. It is a basic
classification but it has worked well for us.</p>
      <p>As well as the five basic categories, some of the queries are
additionally assigned one of the types follow and synch (Table 2).
These were also used in previous years, and allow more
complicated queries. In essence, a follow query is one musical
event coming after another (‘continuo passage then a ripieno
passage in measures 5-18’). In MIR terms, the juxtaposition of
two phrases in this manner greatly reduces the number of
matching passages, in the same way that word bigrams in Natural
Language Processing are less frequent than the constituent words
alone. This year, some follow queries had three phrases in
sequence. A synch query specifies that two musical events are
going on at the same time (‘quarter notes C#, F# during a
crescendo’).</p>
      <p>Following the pattern of previous years, queries were
composed originally in ASCII Short Form and then converted into
XML to make the Gold Standard. All questions and answers were
checked by a second expert.</p>
    </sec>
    <sec id="sec-3">
      <title>4 QUERY REPRESENTATIONS IN JSON</title>
      <p>
        At the MediaEval 2016 Technical Retreat in Hilversum, The
Netherlands, one possibility discussed was to divide the
C@merata task into two stages: The first stage was to convert the
natural language query into an intermediate representation which
was closer to an Information Retrieval query; the second stage
was to search the score using this derived information. The aim
was to separate the NLP aspect of the task from the MIR. We
decided to try this idea out. We built a multi-stage top-down
parser which carries out the recognition and analysis of many
kinds of musical textual phrase. The basic algorithm was from an
earlier parser we built [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The output of this parser is then passed
to the spaCy statistical parser (https://spacy.io/). In the final stage, the output is
analysed and a feature structure created. We chose JSON to
represent the information as it is very flexible, well-known, and
compatible with Python.
      </p>
      <p>The various stages of the parser can be seen in Table 9. For
each stage, an example input is shown together with the JSON
output produced by the parser for that stage of processing. Each
stage is concerned with one kind of construct. For example, Stage
2 recognises that ‘C Major’ is a key and creates some
attribute-value pairs to capture this information; similarly, Stage 19
recognises ‘five-note melody’ as a melody having a particular
number of notes, this information being represented as further
attribute-value pairs. Later on, a phrase like ‘five-note melody in
C Major’ can be converted into a feature structure by combining
the outputs of Stages 2 and 19. At the end, spaCy parses the
entire pre-processed input and can recognise any grammatical
constructs, whether expected or not. However, the use of
pre-processing considerably reduces the parsing ambiguity.</p>
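      <p>Combining stage outputs amounts to merging the per-stage attribute-value pairs into one feature structure. A sketch for ‘five-note melody in C Major’: the key_* attributes follow the paper's Table 9 example, while the note_sequence_length attribute and the merge helper are our assumptions:</p>
      <preformat>
```python
import json

# Per-stage outputs. The key structure is quoted from Table 9; the melody
# stage output is an assumed shape for 'five-note melody'.
stage_key = {"key_name": "C", "key_accidental": 0, "key_type": "major"}
stage_melody = {"note_sequence_length": 5}

def combine(*stages):
    """Merge stage feature structures into one query representation."""
    feature_structure = {}
    for s in stages:
        feature_structure.update(s)
    return feature_structure

query = combine(stage_melody, stage_key)
print(json.dumps(query, sort_keys=True))
```
      </preformat>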
      <p>The parser was developed using the 2014-2016 data sets. It
was then run on the 2017 queries and the resulting JSONs were
added to the XML Gold Standard and made available to
participants on request.</p>
      <p>So far, this is work in progress. When we found constructs or
concepts which were not being handled, we extended the
representation accordingly. We carried out some error checking
and made corrections along the way. However, there has so far
been no formal evaluation.</p>
      <p>Table 10 shows some examples of output for queries in the
2017 test set. As can be seen, quite complex constructs can be
handled. We also have some provision for adding ‘unknown’
information into the JSON so that some downstream processor
could carry out further analysis if need be. For example, ‘a
yearning melody in A flat’ would be recognised with some
certainty as a sequence of notes in a stated key. However, we
could also note that a required property of the melody was that it
was ‘yearning’, even though we might not know what that
actually meant.</p>
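      <p>For instance, the ‘yearning melody in A flat’ example might carry its unanalysed property as follows. The key_* attributes mirror the Table 9 style, while the -1 encoding for one flat and the additional_description attribute are our assumptions:</p>
      <preformat>
```python
# Sketch of carrying an unanalysed property ('yearning') in the JSON
# feature structure, for some downstream processor to interpret.
feature = {
    "key_name": "A",
    "key_accidental": -1,                  # assumed encoding for 'flat'
    "additional_description": "yearning",  # kept for downstream analysis
}
```
      </preformat>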
      <p>The aim of this work is to see what the limit is concerning the
representation of a musical text, whether it is highly specific or
quite vague. So far, we feel that a considerable amount of
complex information can indeed be handled in this way, but more
detailed work is needed.</p>
    </sec>
    <sec id="sec-3b">
      <title>5 2017 CAMPAIGN</title>
      <p>This year there were just two participants, CLAS and DMUN
(Table 5). CLAS took part in 2014 and 2015 but was unable to
undertake the task last year. DMUN has taken part in all four
years of the C@merata task, but with a different leader for 2016
and 2017.</p>
      <p>
        The CLAS system [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] was built using Python, NLTK,
Music21 and MongoDB. Firstly, the query is parsed using NLTK,
together with a feature-based Context-Free Grammar which
specifies the controlled language for the C@merata music queries.
This grammar was an extension of the one used in 2014 and 2015.
The result is a feature structure corresponding to the key semantic
elements of the query which is then used to retrieve results. The
twenty MusicXML scores are indexed using MongoDB tables.
For each score, there were four tables: Titles, Musical Events,
Sequences and Analysis.
      </p>
      <p>The query is answered by carrying out searches of the
MongoDB tables using the query's feature structure, and
combining information derived from the results returned. A
particular problem to be overcome was that feature unification as
used in previous versions of the CLAS system can ignore
attribute-value pairs, on one side or the other of the unification,
which do not match. MongoDB queries do not have this property.
Thus each feature structure derived from the original input query
had to be converted into a NoSQL query to take account of this.
Querying for sequences of notes in the database was performed by
a series of searches, each checking if the event at the next
timestep corresponded to the relevant sequence note. A similar
approach was taken for chords and other sequences.</p>
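      <p>The stepwise sequence search can be illustrated without the database layer. This is a hedged stand-in for the CLAS approach: instead of MongoDB tables, the score's events sit in a plain list ordered by timestep, and we advance one timestep at a time checking each required note:</p>
      <preformat>
```python
# Toy event index ordered by timestep (pitches only, for illustration).
events = ["E4", "C#4", "F#4", "G4", "C#4", "F#4"]

def find_sequences(events, wanted):
    """Return start indices where the note sequence 'wanted' occurs."""
    hits = []
    for start in range(len(events) - len(wanted) + 1):
        # check, event by event, that each next timestep carries the
        # required note, as the per-note database searches do
        if all(events[start + i] == note for i, note in enumerate(wanted)):
            hits.append(start)
    return hits

print(find_sequences(events, ["C#4", "F#4"]))   # [1, 4]
```
      </preformat>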
      <p>
        The DMUN system [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] takes as input a text query and
initialises a query-parser object by loading a .json language file, a
dictionary with single term-types for keys, and sets of terms for
values. The query parser converts the text of the query into a
Formal Information Request (FIR), another dictionary, by
gradually identifying and replacing the terms, term types and
compound types of the query with their types found in the
language file, until a top-level description of the query is found.
The FIR is then sent to the Music Information Retrieval (MIR)
module, which in turn selects the corresponding
information-request retrieval function. All the currently possible information
requests are implemented as combinations of three core types of
MIR function that find, relate and constrain music entities such as
notes/rests and note sets (melodies, chords, etc.). Lastly, the
output of the MIR functions, which comprises music elements, is
converted into passages.
      </p>
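      <p>The DMUN-style rewriting from text to FIR can be sketched as a dictionary lookup over a language file. The term types, term sets, and FIR shape below are illustrative assumptions, not the actual DMUN language file:</p>
      <preformat>
```python
# A toy 'language file': term types mapped to sets of surface terms.
language = {
    "LENGTH": {"quarter", "crotchet", "half", "minim"},
    "PITCH": {"C", "C#", "D", "Eb", "F#"},
}

def to_fir(query):
    """Build a crude Formal Information Request (a dict) from a text query
    by tagging each known term with its type from the language file."""
    fir = {"terms": []}
    for token in query.split():
        for term_type, terms in language.items():
            if token in terms:
                fir["terms"].append((term_type, token))
    return fir

print(to_fir("quarter note C#"))
# {'terms': [('LENGTH', 'quarter'), ('PITCH', 'C#')]}
```
      </preformat>
      <p>The real system iterates this replacement, building compound types, until a top-level description of the query emerges; the sketch shows only the first tagging pass.</p>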
      <p>Each participant submitted one run. The results are shown in
Table 6. According to the author of DMUN, most queries were
processed with manual intervention and only a few queries were
answered entirely automatically. Therefore the only automatic run
is CLAS. The overall BF score for CLAS is 0.135, with a slightly
higher MF of 0.166. Last year, the best scores were for DMUN01
(BF=0.070, MF=0.106), so this year is an improvement. Moreover,
the task is definitely harder than last year's, and DMUN
in 2016 also declared some manual intervention and analysed only
a subset of attempted queries. The CLAS result for this
year is therefore very good. CLAS scored very highly in 2014
(BF=0.797, MF=0.854) and 2015 (BF=0.620, MF=0.656). On the
other hand, the task has increased greatly in difficulty; in the early
years there were mostly simple notes etc. (F#, crotchet rest) and
none of the advanced musical concepts and complicated syntactic
structures we now have.</p>
      <p>Table 7 shows the average results over both runs for different
query types. By looking at the BF column, for example, we can
gain some insight into the relative difficulty of the different types.
Interestingly, 1_harm has the highest score (BF=0.286) followed
by n_harm (0.255), n_melod (0.218), texture (0.158) and finally
1_melod (0.148). One would expect the 1_melod questions to be
the easiest, as they refer to simple individual notes, yet they score
lowest; n_harm queries, by contrast, are sequences of chords and
include cadences. Table 8 provides
the same information but just for CLAS, this being the only fully
automatic run. Once again, n_harm is the best (BF=0.269)
followed by 1_harm (0.251), n_melod (0.151), texture (0.130) and
finally 1_melod (0.076). This is once again surprising but could
be accounted for by the fact that CLAS employed sophisticated
chord and chord sequence processing, using in part the music21
chordify function, and building on their experience with the
handling of chords in previous editions of C@merata.</p>
      <p>Perhaps the 1_melod and n_melod were not that simple.
1_melod included ‘G# at the start of a bar’ where you need to
know what the start of a bar means, and ‘a dip down to the lowest
note on the instrument’ which is similarly rather poetic, and ‘D
two octaves lower than D4’ which is a pitch relative to another
pitch. For n_melod we had ‘ten consecutive quavers in the left
hand in bars 50-end’ where you have to interpret ‘ten consecutive’
as well as knowing what ‘end’ is. Another example is ‘four
descending quavers’, which involves knowing what ‘descending’
means. Of course, these vaguer and more figurative expressions
are much more realistic and they show where natural language can
come into its own as a means of specifying musical information.
By contrast, simple, unambiguous concepts like ‘C#’ can be
adequately handled in a symbolic query language suitably adapted
to the music domain.</p>
    </sec>
    <sec id="sec-4">
      <title>6 CONCLUSIONS</title>
      <p>This year, there were several important innovations in the
C@merata task. First, queries were contributed by participants for
the first time. This involved a significant effort and commitment
by the participants, for which we are very grateful. Moreover,
they had to assimilate the finer points of the coding process such
as the exact ASCII Short Form syntax, the procedures to follow in
cases of ambiguity and the assignment of query-type tags like
1_melod and n_harm. However, it was a success and there was a
significant gain in having composed queries contributed by new
people who are moreover extremely expert in their own fields of
music. They also extracted some very nice queries from real texts
and along the way highlighted some important new sources for
such texts.</p>
      <p>As far as evaluation is concerned within a scenario where each
participant knows some of the questions, we ignored this issue for
2017. There were only ten queries for each participant, out of 200,
and indeed the participants could not necessarily answer their own
questions anyway. The overall scores are low, so knowledge of
10/200 queries, i.e. 5%, does not seem to us a significant factor.</p>
      <p>The second innovation was that more queries overall came
from real texts than before – 49/200, with 20 contributed by
participants and 29 by the organisers. This number
could be higher, but there is a significant difficulty in finding and
analysing such texts. We need to devote more time to the
collection of texts and their corresponding scores as an activity in
itself. As always, we are hampered by the minimal supply of
high-quality MusicXML scores.</p>
      <p>
        Third, all the MusicXML scores were carefully checked this
year for conformance to the standard in terms of elements, their
context of use (relative to other elements), the attributes they
have, and their values. For this invaluable work we must thank
Donncha Ó Maidín who devoted a great deal of time to it. In
previous years, Donncha has analysed MusicXML files from first
principles using his own CPN software, while most participants
tend to use Music21 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This has given him a unique insight into
the finer points. In addition to conformance, scores were also
checked in respect of bar numbering. In MusicXML scores there
can be problems in respect of anacrusis bars (which are
conventionally numbered zero, but can be numbered one or even
not numbered at all) and repeat bars (which can have a
non-numerical number like 10a or, once again, no number). Bar
numbering is extremely important to us, as answer passages are
identified in terms of bar numbers.
      </p>
      <p>Fourth, we adopted points in the score in addition to the
passages we have used since 2014. A point can capture an event
which is instantaneous and which therefore does not involve a
range of beats. Examples include cadences, where the change
from the V chord to the I chord is the key defining aspect (and
indeed the only one which is unambiguous), and changes of key
signature or time signature. There were only a few such queries
and, due to the low results overall, it is not possible to assess their
effect reliably. However, this was useful work since points are
clearly the correct way of answering certain questions.</p>
      <p>Fifth, this year we once again used our five-way categorisation
of queries – 1_melod, n_melod, 1_harm, n_harm and texture – and
this worked well for us. No classification can work perfectly, but
this one is largely correct for most queries and is therefore useful
enough to be worth employing. A more detailed classification is
much more complicated to work with for question encoders and is
likely to display a larger number of shortcomings as well. In
addition to the five types, we once again employed two
modification types, follow and synch; loosely applied, these can
characterise many of the more complicated kinds of queries
because when there are two or more music events, either they are
happening partly together or not together; if they are not together,
one must be either before or after the other. Hence, follow and
synch classifiers can capture many complex musical descriptions.</p>
      <p>Sixth, we worked on capturing the content of a query using a
hierarchical feature structure which we expressed in JSON.
Moreover, we wrote a parser for converting queries to JSON and
this was developed in terms of the 2014-16 C@merata test sets.
What we found was that this type of analysis was indeed possible,
a high proportion of the semantic content of our queries could be
captured in such a way, and that our parser, while far from
perfect, was surprisingly good. This is work in progress, which
took place in parallel with the C@merata evaluation, and more
detailed and comprehensive work needs to be done on it. Also, an
evaluation needs to be carried out. Such a representation can
obviously not capture every subtlety of a query; consider the last
example of Table 10: ‘rocking eighth-note chords in the piano
right hand against half-note octaves in the piano left hand in
measures 1-10’. The JSON captures the length of the chords, the
hand and the instrument playing them, the harmonic octaves and
their length, the relationship between these two, and the bar
restriction. It does not capture the vaguely-specified plurality of
both components or the meaning of ‘rocking’. The former can
readily be addressed and we would propose that the latter kind of
description could be captured by ad hoc attributes (e.g.
additional_description) which could then be processed by a
downstream component if required. There are always going to be
vague and ambiguously specified aspects to a musical description,
and we need a way of working with these within a more specific
type of feature structure.</p>
      <p>Turning now to the overall conclusion for the task, the CLAS
result was very good considering the difficulty, but there were
insufficient participants and hence runs to get a reasonable spread
of results which could be analysed properly.</p>
      <p>
        Finally, concerning work which we can do in future, there are
several possibilities. First, we can derive more queries from real
texts; we need to work on this more systematically, and over a
longer time period than the C@merata task itself. Second, we
could expand our passage representation to pick out answers more
precisely; at present, we have vertical ‘lines’ through the score but
we have no ‘horizontal’ ones – we do not identify which stave(s)
contain the answer and also we do not know which parts within a
matching stave are relevant. For example, there could be two horn
parts on the one stave, but only one of those could match the
answer. Ways of doing this have been suggested in other contexts
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] which we might be able to adopt. Third, we can work more
on the JSON representations as already discussed, and fourth we
can consider widening participation by further parameterisation of
the task.
      </p>
      <sec id="sec-4-1">
        <title>Query Types</title>
        <p>[Table: query counts by Type — 1_melod, n_melod, 1_harm, n_harm, texture, All.]</p>
        <p>Table 2: follow and synch Queries within 1_melod, n_melod, 1_harm and n_harm. Example queries of these types include: ‘four descending sixteenth notes in the strings against a chord in the winds in measures 190-199’ and ‘cellos and basses leading into the shadows while the upper strings accompany with gently throbbing harmonies in measures 73-87’.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Participants</title>
        <table-wrap id="tab-participants">
          <table>
            <thead>
              <tr><th>Runtag</th><th>Leader</th><th>Affiliation</th><th>Country</th><th>Run</th></tr>
            </thead>
            <tbody>
              <tr><td>CLAS</td><td>Stephen Wan</td><td>CSIRO</td><td>Australia</td><td>CLAS01</td></tr>
              <tr><td>DMUN</td><td>Andreas Katsiavalos</td><td>De Montfort University</td><td>England</td><td>DMUN01</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>[Table: results for runs CLAS01 and DMUN01 — Maximum, Minimum and Average scores; Average BP 0.099.]</p>
      </sec>
      <sec id="sec-4-3">
        <title>JSON Representations of Queries</title>
        <table-wrap id="tab-json">
          <table>
            <thead>
              <tr><th>Type</th><th>Example Input</th><th>JSON</th></tr>
            </thead>
            <tbody>
              <tr><td>texture</td><td>two-part texture</td><td>{ "texture": "two_part" }</td></tr>
              <tr><td>key</td><td>C major</td><td>{ "key_name": "C", "key_accidental": 0, "key_type": "major" }</td></tr>
              <tr><td>note</td><td>slurred double whole note trill</td><td>{ "note_divisions": 48, "note_length": 384, "note_ornament": "trill", "note_performance": "slurred" }</td></tr>
              <tr><td>note</td><td>fermata on a whole note</td><td>{ "note_divisions": 48, "note_length": 192, "note_performance": "fermata" }</td></tr>
              <tr><td>note_sequence</td><td>C#4 D4</td><td>{ "note_sequence": [ { "note_accidental": 1, "note_name": "c", "note_octave": 4 }, { "note_accidental": 0, "note_name": "d", "note_octave": 4 } ] }</td></tr>
              <tr><td>instrument_quote</td><td>"Cello"</td><td>{ "instrument": "Cello" }</td></tr>
              <tr><td>measure</td><td>bars 1-10</td><td>{ "measure_from": 1, "measure_to": 10 }</td></tr>
              <tr><td>underlay</td><td>on the word "Der"</td><td>{ "note_underlay": "Der" }</td></tr>
              <tr><td>staff</td><td>left hand</td><td>{ "staff_hand": "left" }</td></tr>
              <tr><td>instrument</td><td>violin I divisi</td><td>{ "instrument": "violin", "instrument_direction": "divisi", "instrument_group": 1 }</td></tr>
              <tr><td>instrument_sequence</td><td>cellos and double basses</td><td>{ "instrument_list": [ { "instrument": "cello" }, { "instrument": "doublebass" } ] }</td></tr>
              <tr><td>triad</td><td>Ib triad</td><td>{ "relative_pitch": 1, "triad_inversion": 1 }</td></tr>
              <tr><td>interval</td><td>doubly diminished harmonic fifth</td><td>{ "interval_augmentation": -2, "interval_harm_melod": "harmonic", "interval_size": 5 }</td></tr>
              <tr><td>interval_sequence</td><td>alternating fourths and fifths</td><td>{ "interval_list": [ { "interval_harm_melod": "harmonic", "interval_size": 4 }, { "interval_harm_melod": "harmonic", "interval_size": 5 } ], "interval_seq_pattern": "alternating" }</td></tr>
              <tr><td>cadence</td><td>interrupted cadence</td><td>{ "cadence": "interrupted" }</td></tr>
              <tr><td>inversion</td><td>in the first inversion</td><td>{ "triad_inversion": 1 }</td></tr>
              <tr><td>chord</td><td>chord of F#3, D4 and A4</td><td>{ "chord_word": true, "note_sequence": [ { "note_accidental": 1, "note_name": "f", "note_octave": 3 }, { "note_accidental": 0, "note_name": "d", "note_octave": 4 }, { "note_accidental": 0, "note_name": "a", "note_octave": 4 } ] }</td></tr>
              <tr><td>arpeggio</td><td>F sharp minor arpeggio</td><td>{ "arpeggio_word": true, "key_accidental": 1, "key_name": "F", "key_type": "minor" }</td></tr>
              <tr><td>scale</td><td>C major scale</td><td>{ "key_accidental": 0, "key_name": "C", "key_type": "major", "scale_word": true }</td></tr>
              <tr><td>melody</td><td>five-note melody</td><td>{ "melody_word": true, "note_count": 5 }</td></tr>
              <tr><td>time_signature</td><td/><td>{ "time_higher": 12, "time_lower": 8 }</td></tr>
              <tr><td>by</td><td>See examples below</td><td/></tr>
              <tr><td>simultaneous</td><td>See examples below</td><td/></tr>
              <tr><td>loose_indication</td><td>See examples below</td><td/></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Examples of by, simultaneous and loose_indication queries:</p>
        <p>dotted crotchet Bb in the right hand in bars 23-40</p>
        <p>G# quaver in the right hand against a crotchet in the left hand in bars 1-25</p>
        <p>descending arpeggio in quavers followed by ascending arpeggio in quavers in bars 1-30</p>
        <p>monophonic passage lasting twelve crotchet beats:</p>
        <preformat>{ "first": { "note_divisions": 48, "note_length": 48, "number": 12, "texture": "monophony" } }</preformat>
        <p>rocking eighth-note chords in the piano right hand against half-note octaves in the piano left hand in measures 1-10:</p>
        <preformat>{ "first": { "chord_word": true, "instrument": "piano", "note_divisions": 48, "note_length": 24, "staff_hand": "right" },
  "second": { "instrument": "piano", "interval_harm_melod": "harmonic", "interval_size": 8, "measure_from": 1, "measure_to": 10, "note_divisions": 48, "note_length": 96, "staff_hand": "left" } }</preformat>
      </sec>
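<p>To illustrate how fragments of a query might be mapped onto JSON feature structures of the kinds listed above, the following sketch (our own illustration, not the organisers' converter) handles bar ranges and note names:</p>

```python
import re

# Illustrative converters from query fragments to C@merata-style JSON
# feature structures; the real task converter is far more elaborate.

def parse_measure(text):
    """'bars 1-10' / 'measures 190-199' -> measure feature structure."""
    m = re.search(r"(?:bars?|measures?)\s+(\d+)\s*-\s*(\d+)", text)
    if not m:
        return None
    return {"measure_from": int(m.group(1)), "measure_to": int(m.group(2))}

ACCIDENTALS = {"#": 1, "b": -1, "": 0}

def parse_note(token):
    """'C#4' -> note feature structure with name, accidental and octave."""
    m = re.fullmatch(r"([A-G])([#b]?)(\d)", token)
    if not m:
        return None
    return {"note_name": m.group(1).lower(),
            "note_accidental": ACCIDENTALS[m.group(2)],
            "note_octave": int(m.group(3))}

print(parse_measure("bars 1-10"))
print([parse_note(t) for t in "C#4 D4".split()])
```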
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Cuthbert</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Ariza</surname>
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2010</year>
          ).
          <article-title>music21: a toolkit for computer-aided musicology and symbolic music data</article-title>
          .
          <source>Proceedings of the International Symposium on Music Information Retrieval</source>
          , Utrecht, The Netherlands, August 9-13,
          <year>2010</year>
          ,
          <fpage>637</fpage>
          -
          <lpage>642</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Katsiavalos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>The DMUN System at the MediaEval 2017 C@merata Task</article-title>
          .
          <source>Proceedings of the MediaEval 2017 Workshop</source>
          , Trinity College Dublin, Ireland,
          <source>September 13-15</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R. F. E.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Using A Robust Layered Parser to Analyse Technical Manual Text</article-title>
          . Cuadernos de Filología Inglesa,
          <volume>9</volume>
          (
          <issue>1</issue>
          ),
          <fpage>167</fpage>
          -
          <lpage>189</lpage>
          . Número Monográfico:
          <article-title>Corpus-based Research in English Language and Linguistics</article-title>
          . http://www.csis.ul.ie/staff/Richard.Sutcliffe/murcia_parsing_paper00_repaginated.pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R. F. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collins</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Root</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          (
          <year>2016</year>
          ).
          <article-title>The C@merata task at MediaEval 2016: Natural Language Queries Derived from Exam Papers, Articles and Other Sources against Classical Music Scores in MusicXML</article-title>
          .
          <source>Proceedings of the MediaEval 2016 Workshop</source>
          , Hilversum,
          <source>The Netherlands, October 20-21</source>
          ,
          <year>2016</year>
          . http://ceur-ws.org/Vol-1739/MediaEval_2016_paper_55.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R. F. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crawford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Root</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>The C@merata Task at MediaEval 2014: Natural language queries on classical music scores</article-title>
          .
          <source>Proceedings of the MediaEval 2014 Workshop</source>
          , Barcelona, Spain, October 16-17,
          <year>2014</year>
          . http://ceur-ws.org/Vol-1263/mediaeval2014_submission_46.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R. F. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crawford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Root</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2014</year>
          ).
          <article-title>Shared Evaluation of Natural Language Queries against Classical Music Scores: A Full Description of the C@merata 2014 Task</article-title>
          .
          <source>Proceedings of the C@merata Task at MediaEval 2014</source>
          . http://csee.essex.ac.uk/camerata/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R. F. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crawford</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Root</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Relating Natural Language Text to Musical Passages</article-title>
          .
          <source>Proceedings of the 16th International Society for Music Information Retrieval Conference</source>
          , Malaga, Spain, 26-30 October,
          <year>2015</year>
          . http://ismir2015.uma.es/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R. F. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Root</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>The C@merata Task at MediaEval 2015: Natural language queries on classical music scores</article-title>
          .
          <source>Proceedings of the MediaEval 2015 Workshop</source>
          , Dresden, Germany, September 14-15
          <year>2015</year>
          . http://ceur-ws.org/Vol-1436/Paper12.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Sutcliffe</surname>
            ,
            <given-names>R. F. E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fox</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Root</surname>
            ,
            <given-names>D. L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Second Shared Evaluation of Natural Language Queries against Classical Music Scores: A Full Description of the C@merata 2015 Task</article-title>
          .
          <source>Proceedings of the C@merata Task at MediaEval 2015</source>
          . http://csee.essex.ac.uk/camerata/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Viglianti</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Enhancing Music Notation Addressability</article-title>
          . http://mith.umd.edu/research/project/enhancing-music-notation-addressability/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Wan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>The CLAS System at the MediaEval 2017 C@merata Task</article-title>
          .
          <source>Proceedings of the MediaEval 2017 Workshop</source>
          , Trinity College Dublin, Ireland,
          <source>September 13-15</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>