=Paper= {{Paper |id=Vol-1739/MediaEval_2016_paper_55 |storemode=property |title=The C@merata task at MediaEval 2016: Natural Language Queries Derived from Exam Papers, Articles and Other Sources against Classical Music Scores in MusicXML |pdfUrl=https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_55.pdf |volume=Vol-1739 |dblpUrl=https://dblp.org/rec/conf/mediaeval/SutcliffeCHLFR16 }} ==The C@merata task at MediaEval 2016: Natural Language Queries Derived from Exam Papers, Articles and Other Sources against Classical Music Scores in MusicXML== https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_55.pdf
  The C@merata task at MediaEval 2016: Natural Language
   Queries Derived from Exam Papers, Articles and Other
   Sources against Classical Music Scores in MusicXML
           Richard Sutcliffe                                 Tom Collins                                   Eduard Hovy
            School of CSEE                            Department of Psychology                  Language Technologies Institute
           University of Essex                           Lehigh University                        Carnegie-Mellon University
            Colchester, UK                              Bethlehem, PA, USA                           Pittsburgh, PA, USA
         rsutcl@essex.ac.uk                        tomthecollins@gmail.com                              hovy@cmu.edu
            Richard Lewis                                 Chris Fox                                     Deane L. Root
      Department of Computing                               School of CSEE                             Department of Music
   Goldsmiths, University of London                        University of Essex                        University of Pittsburgh
            London, UK                                      Colchester, UK                             Pittsburgh, PA, USA
    richard.lewis@gold.ac.uk                            foxcj@essex.ac.uk                                  dlr@pitt.edu
ABSTRACT                                                               discuss the development of the question types over the past three
Cl@ssical Music Extraction of Relevant Aspects by Text Analysis        years and in particular focus on the more sophisticated methods
(C@merata) is a shared evaluation held at MediaEval and this is        adopted for question generation this year. We will then present
the third time the task has run. The input is a natural language       the participating systems for this year and discuss the results
query (‘F# in the cello’) and the output is a passage in a             which they obtained.
MusicXML score which contains this note played on the
instrument in question. There are 200 such questions each year
                                                                       2. TASK SUMMARY
                                                                            Participants have to build a system which will take as input an
and evaluation is via modified versions of Precision, Recall and F-
                                                                       XML file containing 200 questions and produce as output a file
Measure. In 2014 and 2015 the best Beat F (BF) scores were
                                                                       containing one or more answers for each of them. Questions are
0.797 and 0.620, both attained by CLAS. This year, queries were
                                                                       grouped into twenty sets of ten, each corresponding to a particular
more difficult and in addition the most experienced groups from
                                                                       MusicXML score. There are thus twenty scores in total and each
previous years were unable to take part. In consequence, the best
                                                                       year these are different. The scores for each year are listed in
BF was 0.070. This year, there was progress concerning the
                                                                       Table 2 (at the end of the paper).
development of the queries, many of these being derived from real
                                                                            Scores are chosen from public sources, are in MusicXML [9]
sources such as exam papers, books and scholarly articles. We are
                                                                       and are required to fall on a predefined distribution of staves
thus converging on our goal of relating musical references in
                                                                       (Table 3). There are two main forms of English musical
complex natural language texts to passages in music scores.
                                                                       terminology in use today, European English (crotchet, bar) and
                                                                       American English (quarter note, measure). For this reason, half of
1. INTRODUCTION                                                        the questions in the task each year are in European English and
     The aim of the Cl@ssical Music Extraction of Relevant             the other half are in American English.
Aspects by Text Analysis (C@merata) evaluations is to advance               Concerning an answer, we use the concept of a passage
our knowledge of how musical references in texts relate to actual      which starts at a particular beat in a bar and ends at another beat
passages within symbolic music scores. The evaluation takes            in a bar. Bar numbers are taken from the MusicXML. Beats are
place at MediaEval each year and consists of a series of 200           measured in divisions, a concept taken from the MusicXML
questions. Each question consists of a short string making             standard. A division is an integer and a value of one means we are
reference to a musical feature (A4 sung to the word 'bow') and a       counting in crotchets; a value of two indicates quavers and so on.
score in MusicXML. Participants have to build a system which           Where necessary, a high divisions value (e.g. 12) can be used
can answer each question by specifying a set of one or more            where we wish to beat in crotchets, quavers, crotchet triplets or
passages which exactly demarcate the feature in question. In the       quaver triplets for different answers to the same question. In
example above, we wish to know the exact beginning and end of a        C@merata, the question always specifies the divisions value to be
note of unspecified length which has pitch A4 and which is being       used for all answers to a particular question. In summary, a
sung to the word specified (‘bow’).                                    passage such as [4/4,1,1:1-2:4] means we are in 4/4 time,
     The general organisation of the task has remained the same        divisions is set to one (crotchets), the passage starts before the first
for three years; this includes the XML format of the question file,    crotchet beat of bar one (1:1) and the passage ends after the fourth
the format of the answer file, the type of music scores used           crotchet beat of bar two (2:4).
(MusicXML) and the means of evaluating answers (Beat                        In evaluation, a passage is beat correct if it starts at the
Recall/Precision and Measure Recall/Precision). Detailed               correct beat in the correct start bar and it ends at the correct beat
descriptions of all these can be found in the 2014 [15,16] and         in the correct end bar. So, if the correct answer is a crotchet, the
2015 [18,19] overview papers. In this paper, we will start off by      passage must start immediately before the crotchet in question and
summarising the task and the means of evaluation. We then              it must end immediately after it. Similarly, a passage is measure
                                                                       correct if it starts in the correct start bar and ends in the correct
Copyright is held by the author/owner(s).                              end bar. Based on beat correct and measure correct, we can define
MediaEval 2016 Workshop, October 20-21, 2016, Amsterdam.               the following:
     Beat Precision (BP) as the number of beat-correct passages        interesting; they were closer to the kinds of musical passage
returned by a system, in answer to a question, divided by the          which people would actually be interested in asking about.
number of passages (correct or incorrect) returned.
     Beat Recall (BR) is the number of beat-correct passages           3.3 2016 Questions
returned by a system divided by the total number of answer                   This year, questions were chosen from the Renaissance,
passages known to exist.                                               Baroque, Classical and Early Romantic periods. The twenty
     Beat F-Score (BF) is the harmonic mean of BP and BR.              MusicXML scores were selected from kern.ccarh.org and from
     Measure Precision (MP) is the number of measure-correct           musescore.com. The former scores are from Stanford and have
passages returned by a system divided by the number of passages        been prepared from various public domain and out-of-copyright
(correct or incorrect) returned.                                       sources. They are created in the kern format and are also available
     Measure Recall (MR) is the number of measure-correct              in a conversion to MusicXML.
passages returned by a system divided by the total number of                 The distribution of scores in terms of staves can be seen in
answer passages known to exist.                                        Table 4. It is similar to 2015 except that there are now five scores
     Measure F-Score (MF) is the harmonic mean of MP and               with eight or more staves: two on eight staves and one each on
MR.                                                                    ten, thirteen and eighteen staves.
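
The passage notation and the definitions of beat correct, measure correct and the six measures can be sketched in a few lines of Python. This is a minimal illustration rather than the official evaluation script: the names `Passage`, `parse_passage` and `precision_recall_f` are hypothetical, scoring is shown for a single question only, and duplicate answers are assumed to count once.

```python
import re
from dataclasses import dataclass

# Assumed textual form of a passage, e.g. "[4/4,1,1:1-2:4]":
# [time signature, divisions, start_bar:start_beat-end_bar:end_beat]
PASSAGE_RE = re.compile(r"\[(\d+)/(\d+),(\d+),(\d+):(\d+)-(\d+):(\d+)\]")


@dataclass(frozen=True)
class Passage:
    time_num: int
    time_den: int
    divisions: int
    start_bar: int
    start_beat: int
    end_bar: int
    end_beat: int


def parse_passage(text):
    """Parse a passage string into a Passage (hypothetical helper)."""
    match = PASSAGE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a passage: {text!r}")
    return Passage(*map(int, match.groups()))


def beat_correct(answer, gold):
    """Start bar, start beat, end bar and end beat must all agree."""
    return (answer.start_bar, answer.start_beat, answer.end_bar, answer.end_beat) == (
        gold.start_bar, gold.start_beat, gold.end_bar, gold.end_beat)


def measure_correct(answer, gold):
    """Only the start bar and the end bar must agree."""
    return (answer.start_bar, answer.end_bar) == (gold.start_bar, gold.end_bar)


def precision_recall_f(returned, gold, is_correct):
    """Precision, recall and their harmonic mean for one question."""
    returned = set(returned)  # assumption: a repeated answer counts once
    correct = sum(1 for a in returned if any(is_correct(a, g) for g in gold))
    precision = correct / len(returned) if returned else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


gold = [parse_passage("[4/4,1,1:1-2:4]"), parse_passage("[4/4,1,5:1-5:2]")]
returned = [parse_passage("[4/4,1,1:1-2:4]"), parse_passage("[4/4,1,5:1-5:4]")]

bp, br, bf = precision_recall_f(returned, gold, beat_correct)
mp, mr, mf = precision_recall_f(returned, gold, measure_correct)
```

In this made-up example the second returned passage lies in the right bar but ends on the wrong beat, so it counts towards MP and MR but not towards BP and BR.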
                                                                             The scores themselves can be seen in Table 3, which shows
                                                                       the work, number of staves and scoring (i.e. the instruments used).
3. QUESTIONS                                                           Composers this year were Bach, Beethoven, Bennet, Chopin,
                                                                       Handel, Morley, Mozart, Palestrina, Scarlatti, Schubert, Vivaldi
3.1 2014 Questions                                                     and Weelkes. There were six works for keyboard (three for
     For the first year of the task [15,16], questions were chosen     harpsichord and three for piano), one Schubert song for voice and
from the Renaissance and Baroque periods. There were twenty            piano and two string quartet movements; there were three a
MusicXML scores chosen mainly from musescore.com. There                capella vocal works for SATB and one each for SATTB and
were six scores containing two staves and six on three staves, four    SSATB; there were two Vivaldi concertos for strings and
on one stave and two each on four staves and five staves.              continuo and two symphony movements, one by Mozart and the
Composers were Bach, Carissimi, Charpentier, Corelli, Cutting,         other by Beethoven. Finally there was a movement from Handel’s
Dowland, Lassus, Lully, Monteverdi, Purcell, Scarlatti, Tallis,        Messiah for SATB and orchestra.
Telemann, Vivaldi and Weiss. Ten questions were posed against                The distribution of question types is shown in Tables 1 and
each score, i.e. 200 in total. For ten of the twenty scores, English   2. These tables also show several examples of each type. All
terminology was used (‘crotchet’ etc) while for the other ten,         1_melod questions are concerned with one note and can be
American terminology was used (‘quarter note’ etc). The 200            modified by bar/measure (‘A#1 in bars 44-59’). Thirty-six of the
questions were on a predefined distribution: 30 simple pitch, 30       forty can also be modified by perf (‘forte’), instr (‘in the violin’),
simple length, 30 pitch and length, 10 performance specification,      clef (‘in the bass clef’), time (‘in 3/4’) or key (‘with G major key
20 stave specification, 5 word specification, 30 followed by, 19       signature’).
melodic interval, 11 harmonic interval, 5 cadence specification, 5           n_melod questions are concerned with a sequence of notes
triad specification and 5 texture specification. So half of the        which can be specified exactly (‘D4 D5 A5 D6 in sixteenth
questions were quite basic and essentially asking for a note, while    notes’) or inexactly (‘two-note dotted rhythm’). They can also be
half the questions dealt with more complex concepts such as            modified by bar/measure, perf etc in the same way as 1_melod
intervals, cadences, chords and texture. Questions of type follow      questions.
asked about one note followed by another.                                    1_harm questions deal with single chords (‘whole-note
                                                                       unison E2 E3 E4’) which can be less specific (‘chord of C’ or
3.2 2015 Questions                                                     ‘five-note chord in the bass’). Once again they can be modified as
      For the second year [18,19], questions came from the             above (‘chord of F#3, D4 and A4 in the lower three parts’,
Baroque, Classical and Early Romantic periods. Once again there        ‘harmonic octave in the bass clef’). Note that we allow two notes
were twenty MusicXML scores, mostly from musescore.com.                to be a chord, including octaves etc. This year, references to
There were now some more complex scores: there were six on two         inversions (‘Ia chord’) are considered of 1_harm type.
staves and six on four staves, two on one stave and two on three             n_harm questions deal with sequences of chords with the
staves, and one each on six, seven, eight and nineteen staves. So      usual modifications (‘three consecutive thirds in bars 1-43’).
the collection now included one full orchestral score. Composers       Cadences are also included here, since they are sequences of
were Bach, CPE Bach, Beethoven, Haydn, Marcello, Monteverdi,           specific chords (‘plagal cadence in bars 134-138’). There are also
Mozart, Purcell, Schubert, Sweelinck and Vivaldi. The                  some more complex types here (‘A5 pedal in bars 116-138’).
distribution of question types of the 200 questions was now: 40              Finally, there are texture questions (‘all three violin parts in
1_melod unqualified, 40 1_melod qualified by clef, instrument          unison in measures 1-59’, ‘counterpoint in bars 1-14’). Some
etc., 20 n_melod unqualified, 20 n_melod qualified by clef,            more complex forms were added this year (‘imitative texture in
instrument etc., 20 1_harm, 6 texture, 40 follow and 14 synch.         bars 1-18’).
1_melod were essentially notes while n_melod were specified                  Table 2 shows two further forms of question, follow and
sequences of notes; 1_harm queries were chords and texture was         synch. There are twenty of the former and thirteen of the latter. In
polyphony etc. Questions of type follow were now more complex          a departure from last year, these are not separate types, but range
because chords and sequences of notes could be followed by other       over the queries of type 1_melod, n_melod, 1_harm and n_harm
chords and note sequences. The synch questions dealt with one          as shown in Table 1. Thus the examples in Table 2 are all within
event against another such as specified notes on one instrument        the distribution of query types shown in Table 1. A follow
being played at the same time as notes on another instrument.          question allows us to specify some passage followed by another
Generally the questions were getting more difficult and more           passage. Each such passage can be of type 1_melod, n_melod etc.
This allows quite complex sequences to be specified (‘D C# in the      4. 2016 CAMPAIGN
right hand, then F A G Bb in semiquavers in the left hand’, ‘5 B4s
followed by a C5’).                                                    4.1 Participants
      Questions of type synch can link two passages which must              This year, ten groups registered for the task, the largest
occur at the same time. In the simplest case, each passage is of       number so far in the three years. Unfortunately, however, only
exactly the same length (‘quarter note E5 against a quarter note       four of these were able to return a run. Participants are shown in
C#3’). However, this is not necessarily the case (‘C#3 minim and       Table 5. There was one each from England, Ireland, Poland and
E4 semibreve simultaneously’); here, according to our rules, the       Russia.
whole of the minim must lie somewhere within the duration of the
semibreve. The length of the passages need not be specified (‘D3       4.2 Approaches used by Systems
in the bass at the same time as C5 in soprano 1’, ‘three-note chord         The following are short notes on the various systems this
in the harpsichord right hand against a two-note chord in the          year. Full details can be found in the participant papers in this
harpsichord left hand in measures 45-52’).                             volume.
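
The containment rule for synch questions described above can be illustrated with a short sketch. It assumes, purely for illustration, that each event has already been reduced to an onset and a duration counted in divisions on a single timeline; the `Event` class and both function names are hypothetical and not part of the task definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    onset: int     # position in divisions from the start of the score (assumed)
    duration: int  # length in divisions

    @property
    def end(self):
        return self.onset + self.duration


def lies_within(inner, outer):
    """True if the whole of `inner` falls inside the span of `outer`."""
    return outer.onset <= inner.onset and inner.end <= outer.end


def is_synch(a, b):
    """The shorter event must lie entirely within the longer one.

    For events of equal length this reduces to exact coincidence.
    """
    shorter, longer = sorted((a, b), key=lambda e: e.duration)
    return lies_within(shorter, longer)


# With divisions set to one (crotchets): semibreve = 4, minim = 2.
semibreve = Event(onset=0, duration=4)
minim_inside = Event(onset=1, duration=2)
minim_overhanging = Event(onset=3, duration=2)

inside_ok = is_synch(minim_inside, semibreve)
overhang_ok = is_synch(minim_overhanging, semibreve)
```

Here the first minim sounds wholly during the semibreve and so qualifies, while the second continues after the semibreve has ended and does not.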
      When we reach the follow and synch questions, they are
starting to become interesting from a musicological perspective as     DMUN
such musical phenomena as these cannot readily be specified
except in a natural language. The key advantage of language here            This year, the DMUN system was re-written. The group
is that it can vary in specificity from the constrained to the open;   developed a text query parser that, given a sentence such as a
to interpret the open queries requires considerable musical            C@merata question, generates a script for music operations. The
knowledge. Hence, C@merata starts to become interesting and            script contains the music concepts and their relations as described
not merely a simple exercise in finding notes.                         in the query, but in a structured form related to SQL in such a way
      We will finish this section by summarising how we derived        that workflows of specific music data operations are formed. A
the questions. In previous years, we decided the question types        parser then reads the script and calls the corresponding functions
and distribution first; we then selected scores and devised queries    from a framework created on top of music21 [1].
by going through them, trying to find passages against which
queries could be posed. This reverse-logic approach was a simple       KIAM
development of the one which we had used at CLEF for many                     The KIAM system is written in PHP and is based on regular
years [10,11,12,13,14]. This year we had aimed to generate some        expressions. Queries are categorised and analysed using regular
of the questions using a more realistic approach. Two suggestions      expressions; answers are then extracted from raw MusicXML
had been made in previous years. The first was to base certain         files.
questions on First Species Counterpoint as exemplified for
example in Kitson [6] and indeed Fux [2]. Certain suggestions          OMDN
had been made by OMDN participant Donncha Ó Maidín:
                                                                            The system used is similar to last year and is based on the
•    Modes: Dorian, Phrygian, Lydian, Mixolydian, Aeolian,             author’s CPNView system.
     Locrian, Ionian (these would be n_melod queries);
•    Melodic intervals: diminished fifth, augmented fourth             UMFC
     (n_melod queries);                                                    The UMFC system was based on recurrent neural networks
•    Harmonic intervals: perfect concords, imperfect concords          which is undoubtedly the most original approach.
     and discord (1_harm queries);
•    Movement of parts: similar, contrary, oblique and parallel        4.3 Runs and Results
     (n_melod against n_melod);                                              Overall results are shown in Table 6. The best run was
•    Special relationships: false relation of the tritone (1_harm)     DMUN01 which scored BF 0.070 and MF 0.106. These results
•    exposed fifths and octaves (n_harm).                              are very low. Moreover, they are only based on a small subset of
                                                                       the queries. The best scores in 2014 were BF 0.797 and MF
      In the event, we managed queries relating to melodic             0.854, and in 2015 were BF 0.620 and MF 0.656. In both
intervals and movement of parts and we plan to investigate the         previous years, the CLAS system was the best but unfortunately
others in future. The second suggestion was to base questions on       they could not participate this year. The questions were more
music exam papers set in English schools at GCSE level (aged           difficult this year but another factor was that three of the four
sixteen) and A Level (aged eighteen). This had been proposed by        participants had not taken part before. All systems this year were
DMUN participant Tom Collins and by co-organiser Richard               able to answer simple note questions such as ‘C# crotchet’ but the
Lewis. DMUN participation was handed over to Andreas                   task has moved on and there are now very few 1_melod questions
Katsiavalos this year, so Tom Collins joined the organisers and        as simple as that.
did indeed generate some questions based on a study of exam                  Table 6 shows the average results by question type. As
papers.                                                                expected, 1_melod are the easiest overall (BF 0.054) followed
      The third strand of work was concerned with the derivation       by n_melod (BF 0.028). After that come 1_harm (BF 0.019) and
of queries from musicological texts. For this campaign, we re-         n_harm (BF 0.013). These questions are progressively more
visited some of the texts we studied previously [17] and styled        difficult, so this order is to be expected. Texture questions this
some of the more complex questions accordingly.                        year could not be answered by any system (BF 0).
                                                                             Turning to the more complex follow and synch questions
                                                                       (which were a subset of the questions of other types this year), the
                                                                       bottom half of Table 6 shows the average scores for follow (BF
0.078) and synch (BF 0). The follow score is higher than might be      contain errors or anomalies. For example, an anacrusis bar may be
expected and this is because KIAM scored well on these questions       numbered one rather than the more usual zero.
(see Table 13, KIAM, BF 0.227, the next highest being DMUN                  The situation for the kern scores is rather similar. Conversion
with BF 0.044). While BF for synch questions was 0, MF was             to MusicXML is carried out by a script at the end; MusicXML is
0.018, so some systems could at least determine the correct start      not considered the primary format as it is assumed that kern will
and end bar, if not the exact beat.                                    always be used. We have found that some of these conversions are
     Looking at the results for different systems across question      inaccurate or crash Musescore. In some cases we were able to re-
types, the performance of UMFC on 1_harm questions is                  convert from kern using the latest version of music21 [1]. In other
noteworthy (BF 0.042); this system scored BF 0 on 1_melod and          cases, a new score had to be selected for the evaluation.
n_melod questions.                                                          Finally, looking to the future, there have now been three
     Generally, all the systems scored very low on Recall, because     years where the C@merata query and answer format remained the
they missed out many correct answers. However, they were much          same: the input was a noun phrase and the output was a passage.
better on Precision, because answers returned tended to be             Over the years we have also begun to investigate the relationship
correct. Looking at Table 6, average BP over all participants was      between our queries and actual musicological texts [17]. This year
0.167 and average MP was 0.447. The best BP was 0.420 and the          we looked also at GCSE and A Level exam questions. Naturally,
best MP was 0.640 (both DMUN). These are more respectable              such questions are not in the raw C@merata format and have to be
looking figures. In fact, looking at MP in Table 6, three out of the   converted in order to fit the task. Sometimes this is not possible
four systems scored 0.511 or better. The problem with systems          because of restrictions in our task. An interesting instance of this
this year was that they were not sophisticated enough to handle        is exam questions like ‘Which one of the following terms best
complex questions; those that they did answer were often correct.      describes the music at bars x-y? ’. This is a sort of
     If we look at the MP figures for the various different            ranking task as several terms are specified and we must say which
question types (Tables 8-14) we can see the highest values scored      is the most important. The nature of the question implies that all
on any measure in this year’s campaign: For 1_melod, DMUN              the terms specified apply to the passage to a greater or lesser
scored 0.857 and KIAM scored 0.727; for n_melod, DMUN                  extent. This is one way of reflecting the intrinsic ambiguity of
scored 0.706; for 1_harm OMDN scored 0.8; for n_harm, DMUN             musical analysis which we could consider for future campaigns.
scored 0.5. Finally, concerning follow and synch, for follow,
KIAM scored 1 and OMDN scored 0.333; for synch, OMDN
scored 0.333. Of course, in most of these cases, a system was
chancing to match passages for the particular queries and was not
in fact attempting the vast majority of questions, leading to very
low MR and MF scores.

5. DISCUSSION AND CONCLUSIONS
      The main achievement this year was to develop some very
interesting questions against some quite complex scores.
Questions were a lot more complicated than those of last year
which in turn were more complicated than in the first year when
many were quite elementary.
      Unfortunately, the systems have not kept up with the task.
They can answer very few of the questions, though answers when
returned are often correct, at least for MF. There is some
ambiguity concerning the exact beats of certain questions, in
particular those of types follow and synch, and this can partly
account for the much lower BF scores than MF scores.
      Concerning practical aspects of the organisation, as always
there were two problems in finding scores. The first was that there
is a limited choice of public domain works, especially ones which
are of sufficiently high quality. Over the years we have more-or-
less exhausted the musescore archive [8] because most scores
there are not licensed to be distributed, only for private use.
Moreover, our scores need to fit into a complex distribution of
musical periods, numbers of staves and instrumentation. For these
reasons we have turned increasingly to the Stanford CCARH Kern
scores [5]. There is a very good choice of scores and the level of
scholarship and accuracy there are high.
      However, this leads to the second problem, which is the
quality of the MusicXML. Scores on musescore are produced
mainly by amateurs using many different types of software.
Conversion to MusicXML is carried out at the end of the
encoding process and relies on the extent to which the score
writing software supports it. We have found that many
MusicXML files do not load in the Musescore software [7] or
                                                                       [12] Peñas, A., Hovy, E., Forner, P., Rodrigo, A., Sutcliffe, R.,
6. REFERENCES                                                               Sporleder, C., Forascu, C., Benajiba, Y., Osenova, P. (2012).
[1] Cuthbert, M. S., and Ariza C. 2010. music21: a toolkit for              Overview of QA4MRE at CLEF 2012: Question Answering
    computer-aided musicology and symbolic music data. In                   for Machine Reading Evaluation. Proceedings of QA4MRE-
    Proceedings of the International Symposium on Music                     2012. Held as part of CLEF 2012.
    Information Retrieval (Utrecht, The Netherlands, August 09 -
                                                                       [13] Peñas, A., Hovy, E., Forner, P., Rodrigo, A., Sutcliffe, R.,
    13, 2010). 637-642.
                                                                            Morante, R. (2013). QA4MRE 2011-2013: Overview of
[2] Fux, J. J. (1725). Gradus ad Parnassum (Practical Rules for             Question Answering for Machine Reading Evaluation.
    Learning Composition translated from a Work intitled                    Lecture Notes in Computer Science Volume 8138, 2013, pp
    Gradus ad Parnassum written originally in Latin by John                 303-320.
    Joseph Feux). Translated around 1750 by unknown
    translator. London, Welcker.
    http://imslp.nl/imglnks/usimg/3/31/IMSLP370587-PMLP187246-practicalrulesfo00fuxj.pdf
[3] Huron, D. (1997). Humdrum and Kern: Selective Feature
    Encoding. In Beyond MIDI, ed E. Selfridge-Field
    (Cambridge, Massachusetts: The MIT Press, 1997), pp. 375-401.
[4] Huron, D. (2002). Music information processing using the
    Humdrum toolkit: concepts, examples, and lessons. Comput.
    Music J. 26, 2, 11-26.
[5] Kern Scores (2016). http://kern.ccarh.org
[6] Kitson, C. H. (1907). The Art of Counterpoint and its
    Application as a Decorative Principle. Oxford, UK,
    Clarendon Press.
    https://archive.org/details/artofcounterpoin00kitsuoft
[7] Musescore (2016). Music Composition and Notation
    Software. http://musescore.org/
[8] Musescore Music Archive (2016). https://musescore.com
[9] MusicXML (2016). http://www.musicxml.com/
[10] Peñas, A., Forner, P., Sutcliffe, R., Rodrigo, A., Forascu, C.,
     Alegria, I., Giampiccolo, D., Moreau, N. and Osenova, P.
     (2009). Overview of ResPubliQA 2009: Question Answering
     Evaluation over European Legislation. Notebook of the Cross
     Language Evaluation Forum, CLEF 2009, Corfu, Greece, 30
     September - 2 October.
[11] Peñas, A., Hovy, E., Forner, P., Rodrigo, A., Sutcliffe, R.,
     Forascu, C., Sporleder, C. (2011). Overview of QA4MRE at
     CLEF 2011: Question Answering for Machine Reading
     Evaluation. Proceedings of QA4MRE-2011. Held as part of
     CLEF 2011.
[14] Peñas, A., Magnini, B., Forner, P., Sutcliffe, R., Rodrigo, A.,
     & Giampiccolo, D. (2012). Question Answering at the
     Cross-Language Evaluation Forum 2003-2010. Language
     Resources and Evaluation Journal, 46(2), 177-217.
[15] Sutcliffe, R. F. E., Crawford, T., Fox, C., Root, D. L., &
     Hovy, E. (2014). The C@merata Task at MediaEval 2014:
     Natural language queries on classical music scores. In
     Proceedings of the MediaEval 2014 Workshop, Barcelona,
     Spain, October 16-17 2014.
     http://ceur-ws.org/Vol-1263/mediaeval2014_submission_46.pdf
[16] Sutcliffe, R. F. E., Crawford, T., Fox, C., Root, D. L., &
     Hovy, E. (2014). Shared Evaluation of Natural Language
     Queries against Classical Music Scores: A Full Description
     of the C@merata 2014 Task. Proceedings of the C@merata
     Task at MediaEval 2014. http://csee.essex.ac.uk/camerata/.
[17] Sutcliffe, R. F. E., Crawford, T., Fox, C., Root, D. L., Hovy,
     E., & Lewis, R. (2015). Relating Natural Language Text to
     Musical Passages. Proceedings of the 16th International
     Society for Music Information Retrieval Conference,
     Malaga, Spain, 26-30 October, 2015.
     http://ismir2015.uma.es/.
[18] Sutcliffe, R. F. E., Fox, C., Root, D. L., Hovy, E., & Lewis,
     R. (2015). The C@merata Task at MediaEval 2015: Natural
     language queries on classical music scores. In Proceedings of
     the MediaEval 2015 Workshop, Dresden, Germany,
     September 14-15 2015.
     http://ceur-ws.org/Vol-1436/Paper12.pdf.
[19] Sutcliffe, R. F. E., Fox, C., Root, D. L., Hovy, E. and Lewis,
     R. (2015). Second Shared Evaluation of Natural Language
     Queries against Classical Music Scores: A Full Description
     of the C@merata 2015 Task. Proceedings of the C@merata
     Task at MediaEval 2015. http://csee.essex.ac.uk/camerata/.
Q31: minim on the word "cease"
A: [ 4/4, 1, 7:3-7:4 ], [ 4/4, 1, 9:1-9:2 ]
Q32: note on the word "mine" against a note on the word "eyes"
A: [ 4/4, 1, 3:4-3:4 ]
Q33: semibreve tied to a minim in the Bass clef
A: [ 4/4, 1, 1:1-2:2 ], [ 4/4, 1, 3:1-4:2 ]
Q35: semibreve tied to a minim followed by crotchet, crotchet
A: [ 4/4, 1, 1:1-2:4 ], [ 4/4, 1, 2:1-3:4 ]
Q36: minor third between Basse and Tenor in bars 1-35
A: [ 4/4, 1, 1:3-2:2 ], [ 4/4, 1, 8:1-8:2 ], [ 4/4, 1, 9:3-9:4 ]
Q37: Ia chord in bars 1-10
A: [ 4/4, 1, 2:1-2:2 ], [ 4/4, 1, 5:3-5:4 ], [ 4/4, 1, 8:3-8:4 ], [ 4/4, 1, 9:3-9:4 ]
Q40: counterpoint in bars 1-14
A: [ 4/4, 1, 1:3-14:4 ]
                            Figure 1. Sample questions and answers from the 2016 task
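The answer lines in Figure 1 follow a regular bracketed pattern, so they can be read mechanically. The sketch below is illustrative only and is not part of the task software. It assumes that each passage has the form [ time signature, divisions, start bar:start beat - end bar:end beat ]; the field names and the `parse_answer` helper are hypothetical labels chosen for this example.

```python
import re
from typing import NamedTuple


class Passage(NamedTuple):
    # Field names are assumptions made for this sketch.
    time_sig: str    # e.g. "4/4"
    divisions: int   # second field of the passage (assumed to be the divisions value)
    start_bar: int
    start_beat: int
    end_bar: int
    end_beat: int


# Matches one passage such as "[ 4/4, 1, 7:3-7:4 ]", tolerating extra spaces.
_PASSAGE = re.compile(
    r"\[\s*(\d+/\d+)\s*,\s*(\d+)\s*,\s*(\d+):(\d+)\s*-\s*(\d+):(\d+)\s*\]"
)


def parse_answer(line: str) -> list[Passage]:
    """Return every bracketed passage found in an answer line."""
    return [
        Passage(m[1], int(m[2]), int(m[3]), int(m[4]), int(m[5]), int(m[6]))
        for m in _PASSAGE.finditer(line)
    ]


if __name__ == "__main__":
    # Answer to Q31 in Figure 1.
    for passage in parse_answer("A: [ 4/4, 1, 7:3-7:4 ], [ 4/4, 1, 9:1-9:2 ]"):
        print(passage)
```

Parsing into named fields rather than raw strings makes it straightforward to compare a system's passages with the gold-standard ones bar by bar or beat by beat.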
                                    Table 1. Query Types
Type                  No    Example
1_melod                4    A#1 in bars 44-59
                            quarter-note rest in measures 1-5
1_melod qualified     36    dotted quarter note D6 in the first violin
by perf, instr,             solo C5 in the oboe in measures 32 onwards
clef, time, key             flute dotted half note only against strings
                            half note on the tonic in the bass clef
                            A4 sung to the word 'bow'
n_melod               15    two-note dotted rhythm in measures 1-24
                            eight note rising passage in quarter notes
                            repeated Bb4 whole note
                            D4 D5 A5 D6 in sixteenth notes repeated twice
                            two tied dotted minims in bars 72-88
n_melod qualified     45    dotted minims C B A in the Bass clef in bars 70-90
by perf, instr,             melodic interval of a minor 7th in the voice
clef, time, key             rising arpeggio in the left hand in measures 1-10
                            five-note melody in the cello in measures 20-28
                            whole note rest, quarter note in the Violin 4 in measures 1-103
1_harm                17    7th triad in measures 1-3
                            Ia chord in bars 1-10
                            chord of C
                            whole-note unison E2 E3 E4
                            chord III in bars 44-59
1_harm possibly       23    chord of F#3, D4 and A4 in the lower three parts
qualified by perf,          harmonic fifth in the oboe
instr, clef, time,          harmonic octave in the bass clef
key                         harmonic perfect fourth between the Soprano and Alto in bars 1-9
                            cello and viola playing dotted minims an octave apart in bars 40-70
n_harm                25    interrupted cadence
                            A5 pedal in bars 116-138
                            authentic cadence in measures 14-18
                            plagal cadence in bars 134-138
                            three consecutive thirds in bars 1-43
n_harm possibly       15    consecutive sixths between the Altos and Basses in measures 73-80
qualified by perf,          flute, oboe and bassoon in unison in measures 1-56
instr, clef, time,          consecutive descending sixths in the left hand
key                         alternating fourths and fifths in the Oboe in bars 1-100
                            Soprano and Alto moving one step down together in measures 1-12
texture               20    all three violin parts in unison in measures 1-59
                            polyphony in measures 5-12
                            homophonic texture in measures 125-138
                            imitative texture in bars 1-18
                            counterpoint in bars 1-14
All                  200
     Table 2. follow and synch Queries within 1_melod, n_melod, 1_harm and n_harm
Type                  No    Example
follow possibly       20    C D E F D E C in semiquavers repeated after a semiquaver
qualified on either         eighth-note twelfth followed by whole-note minor tenth between Cello and Viola
or both sides by            D C# in the right hand, then F A G Bb in semiquavers in the left hand
perf, instr, clef,          B flat in the cbass followed a quarter note later by B natural in the cbass
time, key                   5 B4s followed by a C5
synch possibly        13    quarter note E5 against a quarter note C#3
qualified in either         C#3 minim and E4 semibreve simultaneously
or both parts by            D3 in the bass at the same time as C5 in soprano 1
perf, instr, clef,          three-note chord in the harpsichord right hand against a two-note chord in
time, key                     the harpsichord left hand in measures 45-52
                            A#3 in the piano and F#5 in the voice simultaneously

                                     Table 3. Scores Used
Work                                      Staves   Scoring                             Lang
bach_2_part_invention_no1_bwv772             2     hpd                                 Eng.
beethoven_piano_sonata_no2_m4                2     pf                                  Amer.
beethoven_piano_sonata_no5_m1                2     pf                                  Eng.
chopin_prelude_op28_no15                     2     pf                                  Eng.
scarlatti_sonata_k281                        2     hpd                                 Eng.
scarlatti_sonata_k320                        2     hpd                                 Amer.
schubert_an_die_musik_d547                   3     S, pf                               Amer.
bach_chorale_bwv347                          4     SATB                                Amer.
beethoven_str_quartet_op127_m1               4     2 vn, va, vc                        Eng.
bennet_weep_o_mine_eyes                      4     SATB                                Eng.
handel_water_music_suite_air                 4     2 vn, va, vc                        Amer.
palestrina_alma_redemptoris_mater            4     SATB                                Amer.
schubert_str_quartet_no10_op125_d87_m3       4     2 vn, va, vc                        Eng.
morley_now_is_the_month_of_maying            5     SATTB                               Eng.
weelkes_hark_all_ye_lovely_saints            5     SSATB                               Eng.
vivaldi_conc_4_vns_op3_no10_rv580            8     4 vn, 2 va, vc, db                  Amer.
vivaldi_conc_vn_op6_no6_rv239_m1             8     3 vn, va, vc, db, hpd               Amer.
mozart_symphony_no40_m4                     10     fl, 2 ob, 2 bn, 2 hn, 2 vn, va,     Eng.
                                                   vc, db
beethoven_symphony_no3_m3                   13     2 fl, 2 ob, 2 cl, 2 bs, 2 hn,       Amer.
                                                   2 tpt, timp, 2 vn, va, vc, db
handel_messiah_and_the_glory                18     fl, 2 ob, cl, bs, hn, tbn, tuba,    Amer.
                                                   SATB, hpd, 2 vn, va, vc, db
                     Table 4. Distribution of Scores by number of Staves
                                    Staves          Frequency
                                        2               6
                                        3               1
                                        4               6
                                        5               2
                                        8               2
                                        10              1
                                        13              1
                                        18              1
                                        All            20
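The frequencies in Table 4 follow directly from the Staves column of Table 3. As a quick consistency check, the sketch below tallies them with Python's standard `collections.Counter`; the dictionary name `staves_by_work` is a label invented for this example, and the values are copied from Table 3.

```python
from collections import Counter

# Number of staves per score, copied from Table 3.
staves_by_work = {
    "bach_2_part_invention_no1_bwv772": 2,
    "beethoven_piano_sonata_no2_m4": 2,
    "beethoven_piano_sonata_no5_m1": 2,
    "chopin_prelude_op28_no15": 2,
    "scarlatti_sonata_k281": 2,
    "scarlatti_sonata_k320": 2,
    "schubert_an_die_musik_d547": 3,
    "bach_chorale_bwv347": 4,
    "beethoven_str_quartet_op127_m1": 4,
    "bennet_weep_o_mine_eyes": 4,
    "handel_water_music_suite_air": 4,
    "palestrina_alma_redemptoris_mater": 4,
    "schubert_str_quartet_no10_op125_d87_m3": 4,
    "morley_now_is_the_month_of_maying": 5,
    "weelkes_hark_all_ye_lovely_saints": 5,
    "vivaldi_conc_4_vns_op3_no10_rv580": 8,
    "vivaldi_conc_vn_op6_no6_rv239_m1": 8,
    "mozart_symphony_no40_m4": 10,
    "beethoven_symphony_no3_m3": 13,
    "handel_messiah_and_the_glory": 18,
}

# Count how many scores share each stave count.
distribution = Counter(staves_by_work.values())

if __name__ == "__main__":
    for staves in sorted(distribution):
        print(f"{staves:>6} {distribution[staves]:>10}")
    print(f"{'All':>6} {sum(distribution.values()):>10}")
```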

                              Table 5. C@merata Participants
Runtag   Leader                Affiliation                                Country
DMUN     Andreas Katsiavalos   De Montfort University                     England
KIAM     Marina Mytrova        Keldysh Institute of Applied Mathematics   Russia
OMDN     Donncha Ó Maidín      University of Limerick                     Ireland
UMFC     Paweł Cyrta           Fryderyk Chopin University of Music        Poland

Table 6. Results for All Questions: DMUN01 is the best run. BP=Beat Precision, BR=Beat Recall,
BF=Beat F-Score, MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score.
         Run         BP          BR            BF           MP         MR      MF
      DMUN01        0.420       0.038         0.070         0.640     0.058   0.106
       KIAM01       0.194       0.011         0.021         0.613     0.035   0.066
      OMDN01        0.042       0.004         0.007         0.511     0.044   0.081
       UMFC01       0.012       0.038         0.018         0.022     0.073   0.034
      Maximum       0.420       0.038         0.070         0.640     0.073   0.106
      Minimum       0.012       0.004         0.007         0.022     0.035   0.034
       Average      0.167       0.023         0.029         0.447     0.053   0.072
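The F-Score columns are consistent with the usual harmonic mean of precision and recall, and the Average row with a plain mean over the four runs. The sketch below assumes those two definitions rather than quoting the task's scoring software, and checks them against the beat-level figures in Table 6. The names `f_score` and `beat_results` are hypothetical. Because the inputs are already rounded to three decimals, a recomputed value could in principle differ from a published one in the last digit.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (BP, BR) pairs copied from Table 6.
beat_results = {
    "DMUN01": (0.420, 0.038),
    "KIAM01": (0.194, 0.011),
    "OMDN01": (0.042, 0.004),
    "UMFC01": (0.012, 0.038),
}

if __name__ == "__main__":
    for run, (bp, br) in beat_results.items():
        print(f"{run}  BF = {f_score(bp, br):.3f}")

    # Average row: arithmetic mean of BP over the runs.
    mean_bp = sum(bp for bp, _ in beat_results.values()) / len(beat_results)
    print(f"Average BP = {mean_bp:.3f}")
```

The zero guard matters here because several runs in the per-type tables have both precision and recall equal to 0.000.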
Table 7. Average Results by Question Type: BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score,
MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score. Note that in 2016
follow and synch questions are spread across 1_melod, n_melod, 1_harm and n_harm.
        Type             BP          BR         BF       MP         MR          MF
       1_melod         0.232      0.044      0.054      0.520       0.101      0.129
       n_melod         0.125      0.016      0.028      0.384       0.051      0.086
       1_harm          0.076      0.023      0.019      0.300       0.033      0.035
       n_harm          0.063      0.007      0.013      0.128       0.032      0.030
       texture         0.000      0.000      0.000      0.000       0.000      0.000
        follow         0.317      0.047      0.078      0.458       0.076      0.126
        synch          0.000      0.000      0.000      0.103       0.011      0.018

Table 8. Results for 1_melod Questions: BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score,
MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score.
         Run        BP          BR         BF         MP         MR          MF
      DMUN01       0.643       0.066      0.120      0.857      0.088       0.160
      KIAM01       0.273       0.044      0.076      0.727      0.118       0.203
      OMDN01       0.000       0.000      0.000      0.474      0.066       0.116
      UMFC01       0.011       0.066      0.019      0.022      0.132       0.038
      Maximum      0.643       0.066      0.120      0.857      0.132       0.203
      Minimum      0.000       0.000      0.000      0.022      0.066       0.038
       Average     0.232       0.044      0.054      0.520      0.101       0.129

Table 9. Results for n_melod Questions: BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score,
MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score.
         Run        BP          BR         BF         MP         MR          MF
      DMUN01       0.412       0.050      0.089      0.706      0.085       0.152
      KIAM01       0.000       0.000      0.000      0.333      0.021       0.040
      OMDN01       0.087       0.014      0.024      0.478      0.078       0.134
      UMFC01       0.000       0.000      0.000      0.019      0.021       0.020
      Maximum      0.412       0.050      0.089      0.706      0.085       0.152
      Minimum      0.000       0.000      0.000      0.019      0.021       0.020
       Average     0.125       0.016      0.028      0.384      0.051       0.086

Table 10. Results for 1_harm Questions: BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score,
MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score.
         Run        BP          BR         BF         MP         MR          MF
      DMUN01       0.273       0.018      0.034      0.364      0.024       0.045
      KIAM01       0.000       0.000      0.000      0.000      0.000       0.000
      OMDN01       0.000       0.000      0.000      0.800      0.024       0.047
      UMFC01       0.030       0.072      0.042      0.035      0.084       0.049
      Maximum      0.273       0.072      0.042      0.800      0.084       0.049
      Minimum      0.000       0.000      0.000      0.000      0.000       0.000
       Average     0.076       0.023      0.019      0.300      0.033       0.035
Table 11. Results for n_harm Questions: BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score,
MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score.
         Run         BP         BR          BF         MP         MR         MF
      DMUN01        0.250      0.028      0.050       0.500      0.056       0.101
      KIAM01        0.000      0.000      0.000       0.000      0.000       0.000
      OMDN01        0.000      0.000      0.000       0.000      0.000       0.000
      UMFC01        0.000      0.000      0.000       0.011      0.070       0.019
      Maximum       0.250      0.028      0.050       0.500      0.070       0.101
      Minimum       0.000      0.000      0.000       0.000      0.000       0.000
       Average      0.063      0.007      0.013       0.128      0.032       0.030

Table 12. Results for texture Questions: BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score,
MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score.
         Run         BP         BR          BF         MP         MR         MF
      DMUN01        0.000      0.000      0.000       0.000      0.000       0.000
      KIAM01        0.000      0.000      0.000       0.000      0.000       0.000
      OMDN01        0.000      0.000      0.000       0.000      0.000       0.000
      UMFC01        0.000      0.000      0.000       0.000      0.000       0.000
      Maximum       0.000      0.000      0.000       0.000      0.000       0.000
      Minimum       0.000      0.000      0.000       0.000      0.000       0.000
       Average      0.000      0.000      0.000       0.000      0.000       0.000

Table 13. Results for follow Questions: BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score,
MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score.
         Run         BP         BR          BF         MP         MR         MF
      DMUN01        0.500      0.023      0.044       0.500      0.023       0.044
      KIAM01        0.600      0.140      0.227       1.000      0.233       0.378
      OMDN01        0.167      0.023      0.040       0.333      0.047       0.082
      UMFC01        0.000      0.000      0.000       0.000      0.000       0.000
      Maximum       0.600      0.140      0.227       1.000      0.233       0.378
      Minimum       0.000      0.000      0.000       0.000      0.000       0.000
       Average      0.317      0.047      0.078       0.458      0.076       0.126

Table 14. Results for synch Questions: BP=Beat Precision, BR=Beat Recall, BF=Beat F-Score,
MP=Measure Precision, MR=Measure Recall, MF=Measure F-Score.
         Run         BP         BR          BF         MP         MR         MF
      DMUN01        0.000      0.000      0.000       0.000      0.000       0.000
      KIAM01        0.000      0.000      0.000       0.000      0.000       0.000
      OMDN01        0.000      0.000      0.000       0.333      0.021       0.040
      UMFC01        0.000      0.000      0.000       0.077      0.021       0.033
      Maximum       0.000      0.000      0.000       0.333      0.021       0.040
      Minimum       0.000      0.000      0.000       0.000      0.000       0.000
       Average      0.000      0.000      0.000       0.103      0.011       0.018