=Paper= {{Paper |id=Vol-1739/MediaEval_2016_paper_58 |storemode=property |title=OMDN at the C@merata 2016 Task: Approaches to and Issues Arising from Answering Natural Language Questions about Music Scores |pdfUrl=https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_58.pdf |volume=Vol-1739 |dblpUrl=https://dblp.org/rec/conf/mediaeval/Maidin16 }} ==OMDN at the C@merata 2016 Task: Approaches to and Issues Arising from Answering Natural Language Questions about Music Scores== https://ceur-ws.org/Vol-1739/MediaEval_2016_paper_58.pdf
     OMDN at the C@merata 2016 Task: Approaches to and
       Issues Arising from Answering Natural Language
                Questions about Music Scores
                                                         Donncha S. Ó Maidín
                                                           Department of CSIS
                                                           University of Limerick
                                                             Limerick, Ireland
                                                      Donncha.OMaidin@ul.ie

ABSTRACT                                                                 represented in single templates. In addition to carrying details of
The query is modeled as a parse tree where clauses from the query        the target entity for matching, a template also carries relevant
form children of the tree’s root. Children of each clause, in turn       contextual information such as bar ranges, clef and instrument
represent the query’s atomic units. A complete search involves           designations wherever appropriate.
visiting all score objects that are potential starting points for a            The following examples use [>], [^], [] to represent
match. Atomic units of the parse tree are compared, in sequence,         respectively sequential [>], simultaneous [^] or terminal
with corresponding score objects. From each score object,                relationships in nodes.
appropriate paths are explored to find objects that match the
sequence of tree’s atomic units. The parse tree is traversed in a
depth first order. Iteration paths through the score follow either
vertical or horizontal routes, the former for chord or harmonic
interval comparisons, and the latter for melodic comparisons.
Iteration between clauses results in the selection of new score
staring points, appropriate to the temporal relationship between
clauses. Results were limited on account of problems with
importing some of the MusicXML files.                                                   Figure 1. ‘D4 G4 in sixteenth notes’


1. INTRODUCTION
     A natural language query is first split into component queries
X and Y by scanning the query for one of the following
structures:

•    X followed by Y
•    X against Y                                                                            Figure 2. ‘chord Eb Ab Bb’
•    simultaneous X and Y
•    X at the same time as Y
•    X followed [distance] by Y
•    X

      For structures 1 and 5 queries X and Y are searched in
sequence, while for structures 2, 3, and 4, X and Y are searched in
parallel. Units within X or Y in turn result in sequential or parallel
searches between their contained atomic units. A sequential                    Figure 3.‘D4 E4 F#4 followed by the chord B3 D4 G4
search is involved where the atomic units in X or Y are notes of a
melody or melodic intervals; a parallel search is required where X
or Y contains notes of a chord or a harmonic interval.
     Each atomic unit is represented by a template that contains         2. CPNView
details of all the necessary requirements for a unit match with a             Common Practice Notation View, or CPNView is used.
score object. An instance of an atomic unit may be a note, a rest or     CPNView formed the main topic of the author’s PhD
it may specify a more complex object or texture such as an               dissertation[1]. The name CPNView was not used in the
arpeggio or scale. Templates also contain information on how             dissertation, but appeared in later publications [2][3][4].
they are linked together. Lists of templates are used to represent            CPNView models a score as an objected-oriented container.
either sequential or parallel sets of notes or rests. Arpeggios or       The CPNView model is designed to provide a value-neutral
scales are examples of more complex atomic objects that are              representation of elements of the score notation. The score's
                                                                         internal content is available using iterators. The iterator object
                                                                         keeps track of the context in which an object resides in addition to
Copyright is held by the author/owner(s).                                providing access to the score object itself through its member
MediaEval 2016 Workshop, October 20-21, 2016, Amsterdam.                 functions. The iterators and their member functions can be viewed
as paralleling the actions of a human reader. Typically, a human                  For processing each question, the initial phases consist of
might access a score from the start and read through it serially in         two separate steps: (1) the importation of the music file to build a
time sequence. For some purposes the reader may traverse the                CPNView representation of the score, and (2) the creation of a
score along one of the staves. Where a harmonic or polyphonic               tree of matching templates from the query text. Following these
texture is of interest it is appropriate to access the score as a           input phases, the MusicXML file plays no further role. All
sequence of vertical slices, in time order.                                 subsequent processing is performed on the CPNView
                                                                            representation.
     A score object is created by specifying a file path:                         In the CPNView score representation, MusicXML layout and
                                                                            ‘division’ information is absent. XML fields for items such as
     Score score(path);                                                     ‘mode’ and ‘voice’ are discarded as well. Inclusion of such would
                                                                            undermine the ideal of a neutral score representation. Instead of
      This model requires no user knowledge of how the score is             relying on such added cues, the integrity of the analysis is best
represented. CPNView representations are built by importation               maintained by practices similar to those employed by musicians.
from files in a number of different standard encodings.                     For example voice leading, especially in keyboard scores, is best
      Access to the internals of the score is facilitated by an             garnered from visual cues such as note stem directions and from
iterator object:                                                            knowledge of compositional principles. The challenge involved is
                                                                            to build software that mimics music expertise.
     ScoreIterator cursor(score);                                                 As outlined in the abstract and in the introduction above, a
     or ScoreIterator cursor(score, 1);                                     tree is built from the question text, with a template at each tree
                                                                            leaf that contains matching criteria for each atomic element of the
      The first instance creates an iterator that initially points to the   search. An example is where a template is used for matching a
start and can be used to visit all of the objects in a score in time        note or a rest in the score. Such templates may additionally
order. This is an appropriate iterator for harmonic analysis. The           contain contextual information relating to factors such as bar
second form has an additional parameter and is used to iterate a            range, clef and key signature identity. Note and rest templates may
single stave; above the stave with identifier 1 is selected.                be grouped into larger clauses, corresponding to lists in the
      The iterator can step through all of the objects in the score         original query such as ‘A4 C5 F4’. Such lists are used for both
using the step member function. The step function returns a value           sequential and harmonic searches. Other types of query clauses
true following a test that the succeeding object exists. The                contain single items such as Alberti basses, pedals, cadences,
following code skeleton makes all objects available, in sequence            textures and also named chords corresponding to texts such as
to any code that replaces the ellipsis.                                     ‘tonic’ or ‘IIa’. Each such clause is matched against a score by a
                                                                            specialized component.
     while ( cursor.step()) {...}                                                 A single query with search text ‘A B’ is performed as
                                                                            follows. For illustration purposes, the score is assumed to have
    If it is required to visit only the notes in the score, a               only one stave containing a single melodic line:
parameter may be given to the step function as in the following
code to count the notes in a score.                                         1.   Score is imported into CPNView.
                                                                            2.   A search tree is created with two template nodes, the first one
     long count = 0;                                                             for identifying the note ‘A’ and the second for ‘B’.
     while ( cursor.step(NOTE)) {count++;.}                                 3.   A score iterator ‘SI1’ is created and positioned at the first
                                                                                 note of the score.
      A locate member may be used to place the score iterator in            4.   ‘SI2’, a copy of ‘SI1’ is created.
an arbitrary position. For example the iterator may be positioned           5.   If the note at ‘SI2’ is not ‘A’, move to (9).
at the start of bar 20 by means of                                          6.   Advance ‘SI2’ to the next note.
                                                                            7.   If the note ‘SI2’ is not ‘B’, move to (9).
     cursor.locate(BARLINE, 200);                                           8.   Output the result.
                                                                            9.   Finish if no more notes are available, else increment ‘SI1’ to
     The ScoreIterator object has a comprehensive range of                       point to the next note and then go to (4).
member functions to retrieve the information that is contained
within the score.                                                                The code that implements a generalized version of the above
     A query that searches for all of the D notes and prints details        is more complex. It handles note lists of any size, handles search
of each note arrived at is achieved by                                      ranges corresponding to query text entries such as ‘between bars 5
                                                                            and 6’ or ‘in the soprano’ and handles multiple staves and
     while ( cursor.step(NOTE))                                             multiple sequential paths such as is found in keyboard music.
     if ( cursor.getAlpha() == 'D' )                                             Components are available for the recognition of vertical
     cout << cursor << “\n”;                                                combinations of notes and for other constructions such as those
                                                                            corresponding to texts such as ‘major triad’, ‘dominant chord’ and
     CPNView also has a set of components that facilitate                   ‘perfect cadence’. In some cases components may invoke other
processing musical information. They include container/iterator             components. For example the component to identity cadences will
classes for Lists, Sets and Stores.                                         also invoke key-finding as well as chord-identifying components.
     A specialised class exists for calculating pitch class sets. The            While examples of coding that do substantial work are too
pitch class object is based on the classification system of Alan            long for inclusion here, the following is used to give a flavor of
Forte[6], that has been modified for the classifying tonal sets,            how CPNView programming works. It is based on one existing
such as those that occur in scales, modes and in harmony[2].                component used to check if the score iterator position in cursor is
within the query template range. The template is called templ, and    3.   A search in the fig.4 score for G3 B4 F4 E4 D4 C#4 should
the range is between the values in startBarNo and endBarNo. The            yield [3/4,2,23:1-23:6], although this solution is forbidden in
cursor object has a member function called getCurrentBarNo()               the current C@merata rules that prohibit the identification of
that retrieves the relevant bar number for the iterator. Where no          melodic sequences between notes on different staves, a rule
range is specified the template fields have a value of -1.                 that is reasonable, provided it is not applied to keyboard
                                                                           notation.
bool filterRange(ScoreIterator & cursor, Template templ)
{
   long scoreBarNo = cursor.getCurrentBarNo();
   if ( templ.startBarNo != -1 &&
        thisBarNo < nrst.startBarNo) return false;
   if ( nrst.endBarNo != -1 &&
        thisBarNo   > nrst.endBarNo)   return false;
   return true;
}


      The existing function used is a bit more complicated in order
to allow for variation in the ways that bar numbers get counted in             Figure 4. Scarlatti_k30 taken from C@merata 2015.
the MusicXML scores used in C@merata.
      All of the software is deterministic. Templates are formed      4.   Search texts such as ‘tonic triad’ or ‘triad of E major’ where
from the queries and in cases where a query has not been                   the component notes have a common starting and ending
envisaged beforehand no action takes place. The structure of               point, with no other pitch class present, are relatively easy to
queries is based largely on examples from the task description.            handle. However there are many cases where the identity of a
      The execution of queries in some cases involves taking               chord can be maintained although notes foreign to it are
‘unsafe’ short cuts that have worked with previous C@merata                present. Suspensions, retardations, passing notes either
challenges. For instance the identification of melodic segments is         accented or unaccented, and pedals or inverted pedals are
done in a relatively unsophisticated way, by searching for                 such cases. Context plays a major role here. Metrical
unbroken sequences of notes of sufficient length, bounded is some          considerations and the behavior of preceding or following
way, such as by rests. This does not take into account the full            notes have to be taken into account.
range of possibilities for melody structuring. Also key
identification is performed using the short cut of examining final    5.   Keyboard scores present additional challenges for voice-
chords and key signatures. One interesting challenge is presented          leading and chord identification tasks. In many cases the
in question1 59, ‘passing modulation to B minor’. Answering this           music textures can be viewed as having an underlying
query requires a tonality-tracking component. Another                      traditional 4-part harmonic structure, when instrumental
challenging case can be found in question 12, ‘imitation between           idioms such as octave doubling and the spreading out of
Sopranos and Clarinet in measures 110-119’, that requires the              chords and crossing of staves are allowed for. Figure 4 shows
implementation of a melodic similarity algorithm.                          a case where the middle melodic line moves from bass to
                                                                           treble clef. Figure 5 illustrates that use of spread out chords
3. MATCHING QUERY TO SCORES                                                that conform to good voice leading rules.
     The complexity of processing a search depends on both the
score and on the nature of the query. An otherwise simple task
applied to the score of a hymn tune might prove complex when
applied to a piano score or to a score with transposing
instruments. Some of the issues identified are listed below:

1.   Query ‘A3’. This specifies one of the simplest possible                   Figure 5. Mozart piano sonata no.5, first movement.
     search tasks. However when applied to a score with
     transposing instruments, it yields two valid interpretations.         The melody has a chordal accompaniment with G major root
     The issue is whether the search is for a notated or for a             position in bar 1, D dominant seventh second inversion in
     sounding A3.                                                          bar 2 continued into bar 3 but falling its first inversion and
                                                                           returning to G major in bar 4.
2.   Query ‘Bb4 G4’. This is a simple query when applied to a
     score with monophonic staves. Were it used to search a score          There is a clear bass line G in bar 1 A in bar 2, F# in bar 3
     with multiple simultaneous notes on some of the staves, an            and G in bar 4.
     issue arises on whether or not voice-leading considerations
     are taken into account when identifying all Bb4 that are              Two middle parts can be identified, the upper one on D
     adjacent in time to G4. Where voice leading is not taken into         throughout, and the lower one with B in bar 1, C in bar 2, A
     account the query ‘Bb4 G4’ applied to Figure 1 below would            in bar 3 and B in bar 4. All of these lines are well behaved,
     yield [3/4,2,23:1-23:2]. The nub of the problem here is that          avoiding forbidden parallelisms. Similar considerations are
     much keyboard writing is based on historic practices of part          relevant to instance Alberti basses and some arpeggios.
     writing from previous centuries. In the Scarlatti score
     fragment below, three parts or voices may be identified with     6.   The C@merata exercise provides an opportunity for testing
     pitch sequences (Bb4 C#5 D5 E5 F#5 G5) in the top part,               definitions of music terms. The question arises regarding the
     (G3 B4 F4 E4 D4 C#4) in the middle part and (G3) in the               meaning of some music terms. A designer of recognition of
     lower part. Hence the answer [3/4,2,23:1-23:2] is incorrect.          software might start with a definition from a music
                                                                           dictionary. While a dictionary definition gives necessary
     conditions, they frequently leave borderline cases
     unresolved, or hedge around a definition with words like
     ‘usually’ or ‘normally’. The definition of ‘chord’, for
     example may include statements such as that it ‘usually
     contains more than two notes’. This reveals that there is a
     level of complexity involved that cannot be put into a
     concise manner suitable for a dictionary definition. An
     exploration of the use of composition textbooks could prove                              x       x
     beneficial.                                                                                  Figure 9. sonata02-4

      One way of examining some of the issues here is to attempt           Figure      Initial Responses        Arpeggio Appropriate
to find how musicians conceptualise a term.                                   s            ‘arpeggio’
      A short exploration of some of the issues involved is                   4                20%                        60%
undertaken here in an attempt to design a recognizer for                      5                20%                       100%
‘arpeggio’, using instances from questions in C@merata 2015 and
2016.                                                                         6                80%                       100%
      One music dictionary definition of ‘arpeggio’ is:                       7                80%                       100%

     “A chord ‘spread’ – i.e. the note heard one after the other            Table 1. Participant responses to arpeggios in Figures 4-7
     from the bottom upwards, or sometimes from the top
     downwards.” [5].                                                       The majority of the respondents agreed with the designation
                                                                        except in the case of figure 6. Two of the participants
      Interestingly, the answers in the Gold Standard 2015 are in       conceptualized an arpeggio as having at least four notes, requiring
accordance with the dictionary definition.                              one octave to be present. One found the term ‘broken chords’ to
      While the definition seems to meet the necessary conditions       be more appropriate to the configuration in figure 6.
for an arpeggio, the question arises whether the definition is              Following this two cases were presented where the arpeggio
sufficient. It depends on the meaning of ‘chord’ that itself can be     candidate notes lay in larger context. In the first case the
somewhat problematic, allowing for the possibility of a two-note        candidate arpeggio was three notes within a melodic line; in the
arpeggio. Additionally, if an arpeggio-like construct is part of a      second case it lay in a larger broken chord context.
larger structure, such as a broken chord, can it still be regarded as
an arpeggio?                                                                              x       x
      In a brief exploration, responses were elucidated from five
practicing musicians to extracts used in C@merata questions. All
respondents professional musicians with keyboard competence
and were mainly from the teaching community. Two have
substantial composing experience. The respondents were not
initially aware that arpeggio recognition was at issue.
                                                                                    Figure 10. clementi_sonatina_op36_no1_v2




                      x               x
            Figure 6. clementi_sonatina_op36_no1_v2
                                                                                       x                x
                                                                              Figure 11. bach_violin_sonata_no1_bwv1001_presto
                                                 x          x

                                                                                      Figure             Agree Arpeggio
                                                                                         s                Designation
          Figure 7. bach_violin_sonata_no1_bwv1001_presto                               10                    40%
                                                                                        11                    40%
                                       x          x
                                                                           Table 2. Participant responses to arpeggios in Figures 10-11


                                                                             The opinion of the majority was not in agreement with the
                                                                        dictionary definition of ‘arpeggio’. Interesting both ‘yes’
                 Figure 8. mozart_an_chloe_k526                         respondents said that they did not make a great distinction
                                                                        between ‘broken chords’ and ‘arpeggios’, while the ‘no’
                                                                        respondents did. Excluding the respondents that were ambivalent
                                                                        about this distinction would have resulted in both figures 10 and
11 from being excluded from the category ‘arpeggio’ by all of               that feedback is available to correct and determine suitability.
remaining respondents.                                                      Previously used scores that have proved satisfactory could be
     This exercise highlights a problem in the creation of                  re-used with new questions without compromising
recognition algorithms. It would make for an interesting                    challenges in any way.
investigation to test instances of performed music for the
identification of features such as arpeggios.                          2.   Gold Standard: An intensive study of the 2015 challenge
                                                                            questions revealed problems with the contents of the Gold
                                                                            Standard document: The answers for approximately 4% of
4. FUTURE CHALLENGES FOR SCORE                                              the questions were incorrect, while an approximately 15%
SEARCHING                                                                   more valid answers were absent from the Gold Standard
     Solving C@merata queries involve challenges in both the                document. Careful visual inspection of each answer in the
interpretation of natural language concepts and in music analysis.          score in conjunction with adherence to the guideline
Music XML score representation is poorly structured. The lack of            document should minimize or eliminate the presence of
basic one-to-one correspondences from score objects to Music                incorrect answers if done in time, before the challenge is
XML entries is at the root of many problems. One other                      activated. The detection of missing answers will prove much
requirement is that score entities are modeled in an objective way,         more difficult. By publishing the Gold Standard following
a requirement not met by the Common Music Notation part of                  the deadline for submitting answers and giving participants a
MEI [6].                                                                    role in formulating revisions followed by re-scoring of all
     The following points should be considered in the design in             answers should be considered. Alternate to this is the
the future of C@merata. They focus on acquiring experience on               evaluation of all answers that are classified as incorrect
the fundamental fabric of music scores.                                     against the printed score.

1.   Voice leading: Explore the applicability of the rules of
     counterpoint. Initially use only vocal or other scores with one
     voice per stave. Focus the tasks on the first four species of
     counterpoint, by basing questions on cases such as parallel
     fifths and octaves, exposed octaves, sequences of thirds and
     fifths, false relations and others.
2.   Harmony: Explore the problems of identifying chords with
     foreign notes present.
3.   Devote only a small number of questions to keyboard scores.
4.   For keyboard scores provide questions to identify the
     underlying harmonies in broken chords.
5.   In keyboard scores use questions to explore underlying voice
     leading.

5. PRACTICAL CONSIDERATIONS
    A number of steps are proposed on how C@merata might be
improved in future.

1.   Ensure Music XML sources are error-free: In both the 2015
     and 2016 challenges many scores were found with errors and
     omissions. 14 of the 20 2016 scores could not be processed
     initially in CPNView. While it has not been possible to
     identify problems to date, it was possible to hand-edit some
     scores to make them useable. ‘Eroica’, ‘Weep, O mine eyes’
     and ‘Am die Musik’ had no time signatures. An additional
     mid-bar measure number appeared in bar 5 of Chor002. In
     Sonata02-4 tuplets were not properly formatted, and
     sextuplets were incorrectly encoded; grace note heads appear
     as whole notes. Strange part and voice numbers appear in
     quartet10-3_m21i,       Inven01_m21,        quartet10-3_m21,
     quartet12-1_m21, sometimes in a form that looks like 32-
     character hexadecimal numbers, where numbers such as 1, 2
     and 3 are normal in Music XML.

     The 2015 scores fared a bit better, but problems were found
     including scores with invalid bar lengths, missing rests, bar
     numbering and others.

     Much of this can be avoided by circulating scores to
     participants well in advance of the C@merata challenge so
                                                                       Walter B. Hewlett and Eleanor Selfridge-Field (Cambridge
6. REFERENCES                                                          Massachusetts and London, The MIT Press, 1998), pp. 65-
[1] Ó Maidin, D.S. 1995. A Programmer's Environment for                72.
    Music Analysis. PhD Thesis, University College Cork, 1995.
                                                                 [4]    Forte, A. 1973. The Structure of Atonal Music. (New Haven
[2] Ó Maidin, D.S. and Cahill, M. 2001. Score Processing for           and London: Yale University Press, 1973).
    MIR. In Proceedings of the International Society for Music
                                                                 [5] Sholes, P.A. 1974. The Concise Oxford Dictionary of Music.
    Information Retrieval, Bloomington, Indiana 2001, pp.59-
                                                                     (London, Oxford University Press).
    64.
                                                                 [6] Music Encoding Initiative Guidelines, Version 3.0.0. on
[3] Ó Maidin, D.S. 1998. A Geometrical Algorithm for Melodic
                                                                     http://music-encoding.org/downloads/guide
    Difference. In Computing and Musicology II. Melodic
    Similarity. Concepts, Procedures, and Applications, ed