OMDN at the C@merata 2016 Task: Approaches to and Issues Arising from Answering Natural Language Questions about Music Scores

Donncha S. Ó Maidín
Department of CSIS, University of Limerick, Limerick, Ireland
Donncha.OMaidin@ul.ie

ABSTRACT
The query is modeled as a parse tree in which the clauses of the query form the children of the tree's root. The children of each clause, in turn, represent the query's atomic units. A complete search involves visiting all score objects that are potential starting points for a match. Atomic units of the parse tree are compared, in sequence, with corresponding score objects. From each score object, appropriate paths are explored to find objects that match the sequence of the tree's atomic units. The parse tree is traversed in depth-first order. Iteration paths through the score follow either vertical or horizontal routes, the former for chord or harmonic-interval comparisons and the latter for melodic comparisons. Iteration between clauses results in the selection of new score starting points, appropriate to the temporal relationship between the clauses. Results were limited on account of problems with importing some of the MusicXML files.

Copyright is held by the author/owner(s).
MediaEval 2016 Workshop, October 20-21, 2016, Amsterdam.

1. INTRODUCTION
A natural language query is first split into component queries X and Y by scanning the query for one of the following structures:

1. X followed by Y
2. X against Y
3. simultaneous X and Y
4. X at the same time as Y
5. X followed [distance] by Y
6. X

For structures 1 and 5, queries X and Y are searched in sequence, while for structures 2, 3 and 4, X and Y are searched in parallel. Units within X or Y in turn result in sequential or parallel searches between their contained atomic units. A sequential search is involved where the atomic units in X or Y are the notes of a melody or melodic intervals; a parallel search is required where X or Y contains the notes of a chord or a harmonic interval.

Each atomic unit is represented by a template that contains details of all the requirements necessary for a match with a score object. An instance of an atomic unit may be a note or a rest, or it may specify a more complex object or texture such as an arpeggio or a scale. Templates also contain information on how they are linked together. Lists of templates are used to represent either sequential or parallel sets of notes or rests. Arpeggios and scales are examples of more complex atomic objects that are represented in single templates. In addition to carrying details of the target entity for matching, a template also carries relevant contextual information such as bar ranges, clef and instrument designations wherever appropriate.

The following examples use [>], [^] and [] to represent, respectively, sequential, simultaneous and terminal relationships in nodes.

Figure 1. 'D4 G4 in sixteenth notes'
Figure 2. 'chord Eb Ab Bb'
Figure 3. 'D4 E4 F#4 followed by the chord B3 D4 G4'
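The internal data structures are not spelled out in this paper. Purely as a rough sketch, and using hypothetical type and field names (NoteTemplate, ClauseNode, Relation are illustrative, not the OMDN classes), the query tree and templates described above might be modeled along the following lines, with the node relations corresponding to the [>], [^] and [] markers used in Figures 1-3.

// Hypothetical sketch only: illustrative names, not the OMDN classes.
// A NoteTemplate holds the matching criteria for one atomic unit; a
// ClauseNode links its units sequentially [>], simultaneously [^] or
// as a terminal [] node.
#include <vector>

struct NoteTemplate {
    char alpha = '\0';        // pitch letter, e.g. 'D'; '\0' = unspecified
    int octave = -1;          // e.g. 4 for D4; -1 = unspecified
    double duration = -1.0;   // in crotchet units; -1 = unspecified
    long startBarNo = -1;     // contextual bar range; -1 = no restriction
    long endBarNo = -1;
};

enum class Relation { Sequential, Simultaneous, Terminal };   // [>], [^], []

struct ClauseNode {
    Relation relation = Relation::Terminal;
    std::vector<NoteTemplate> units;      // atomic units of this clause
    std::vector<ClauseNode> children;     // sub-clauses X, Y, ...
};

int main()
{
    ClauseNode clause;                    // 'D4 G4 in sixteenth notes' (Figure 1)
    clause.relation = Relation::Sequential;
    clause.units = { {'D', 4, 0.25}, {'G', 4, 0.25} };
}

Under this reading, 'D4 G4 in sixteenth notes' becomes a single sequential clause holding two note templates, each with its duration set to a sixteenth note.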
2. CPNView
Common Practice Notation View, or CPNView, is used. CPNView formed the main topic of the author's PhD dissertation [1]. The name CPNView was not used in the dissertation, but appeared in later publications [2][3]. CPNView models a score as an object-oriented container. The CPNView model is designed to provide a value-neutral representation of the elements of score notation. The score's internal content is available through iterators. The iterator object keeps track of the context in which an object resides, in addition to providing access to the score object itself through its member functions. The iterators and their member functions can be viewed as paralleling the actions of a human reader. Typically, a human might access a score from the start and read through it serially in time sequence. For some purposes the reader may traverse the score along one of the staves. Where a harmonic or polyphonic texture is of interest, it is appropriate to access the score as a sequence of vertical slices, in time order.

A score object is created by specifying a file path:

Score score(path);

This model requires no user knowledge of how the score is represented. CPNView representations are built by importation from files in a number of different standard encodings.

Access to the internals of the score is facilitated by an iterator object:

ScoreIterator cursor(score);
or
ScoreIterator cursor(score, 1);

The first instance creates an iterator that initially points to the start and can be used to visit all of the objects in a score in time order. This is an appropriate iterator for harmonic analysis. The second form has an additional parameter and is used to iterate over a single stave; in the example above, the stave with identifier 1 is selected.

The iterator can step through all of the objects in the score using the step member function. The step function returns true if a succeeding object exists. The following code skeleton makes all objects available, in sequence, to any code that replaces the ellipsis.

while ( cursor.step() ) {...}

If it is required to visit only the notes in the score, a parameter may be given to the step function, as in the following code to count the notes in a score.

long count = 0;
while ( cursor.step(NOTE) ) { count++; }

A locate member may be used to place the score iterator at an arbitrary position. For example, the iterator may be positioned at the start of bar 20 by means of

cursor.locate(BARLINE, 20);

The ScoreIterator object has a comprehensive range of member functions to retrieve the information that is contained within the score. A query that searches for all of the D notes and prints details of each note arrived at is achieved by

while ( cursor.step(NOTE) )
    if ( cursor.getAlpha() == 'D' )
        cout << cursor << "\n";

CPNView also has a set of components that facilitate the processing of musical information. They include container/iterator classes for Lists, Sets and Stores. A specialised class exists for calculating pitch class sets. The pitch class object is based on the classification system of Allen Forte [4], modified for classifying tonal sets such as those that occur in scales, modes and harmony [2].
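The pitch class set class itself is not shown here. Purely as an illustration of the underlying idea, and not as CPNView code, the pitch-class set of a group of notes can be computed by reducing each pitch to a value modulo 12, the usual starting point for Forte-style classification; pitchClass and pitchClassSet below are simplified stand-ins.

// Illustrative only (not the CPNView class): compute the pitch-class set of
// a group of notes, i.e. the sorted set of pitches modulo 12 (C = 0).
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Map a note name such as "C#4" or "Eb5" to a pitch class 0-11.
int pitchClass(const std::string& note)
{
    static const int base[] = {9, 11, 0, 2, 4, 5, 7};      // A B C D E F G
    int pc = base[note[0] - 'A'];
    for (char c : note.substr(1)) {
        if (c == '#') pc += 1;                              // sharps raise
        else if (c == 'b') pc -= 1;                         // flats lower
    }
    return ((pc % 12) + 12) % 12;
}

std::set<int> pitchClassSet(const std::vector<std::string>& notes)
{
    std::set<int> pcs;
    for (const std::string& n : notes) pcs.insert(pitchClass(n));
    return pcs;
}

int main()
{
    for (int pc : pitchClassSet({"G3", "B3", "D4", "G4"}))  // G major triad
        std::cout << pc << ' ';                             // prints: 2 7 11
    std::cout << '\n';
}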
For processing each question, the initial phases consist of two separate steps: (1) the importation of the music file to build a CPNView representation of the score, and (2) the creation of a tree of matching templates from the query text. Following these input phases, the MusicXML file plays no further role. All subsequent processing is performed on the CPNView representation.

In the CPNView score representation, MusicXML layout and 'division' information is absent. XML fields for items such as 'mode' and 'voice' are discarded as well. Inclusion of such fields would undermine the ideal of a neutral score representation. Instead of relying on such added cues, the integrity of the analysis is best maintained by practices similar to those employed by musicians. For example, voice leading, especially in keyboard scores, is best garnered from visual cues such as note stem directions and from knowledge of compositional principles. The challenge involved is to build software that mimics music expertise.

As outlined in the abstract and in the introduction above, a tree is built from the question text, with a template at each tree leaf containing the matching criteria for each atomic element of the search. An example is where a template is used for matching a note or a rest in the score. Such templates may additionally contain contextual information relating to factors such as bar range, clef and key signature identity. Note and rest templates may be grouped into larger clauses, corresponding to lists in the original query such as 'A4 C5 F4'. Such lists are used for both sequential and harmonic searches. Other types of query clauses contain single items such as Alberti basses, pedals, cadences, textures and also named chords corresponding to texts such as 'tonic' or 'IIa'. Each such clause is matched against the score by a specialized component.

A single query with search text 'A B' is performed as follows. For illustration purposes, the score is assumed to have only one stave containing a single melodic line:

1. The score is imported into CPNView.
2. A search tree is created with two template nodes, the first for identifying the note 'A' and the second for 'B'.
3. A score iterator 'SI1' is created and positioned at the first note of the score.
4. 'SI2', a copy of 'SI1', is created.
5. If the note at 'SI2' is not 'A', move to (9).
6. Advance 'SI2' to the next note.
7. If the note at 'SI2' is not 'B', move to (9).
8. Output the result.
9. Finish if no more notes are available; otherwise advance 'SI1' to the next note and go to (4).

The code that implements a generalized version of the above is more complex. It handles note lists of any size, handles search ranges corresponding to query text entries such as 'between bars 5 and 6' or 'in the soprano', and handles multiple staves and multiple sequential paths such as are found in keyboard music.
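A self-contained illustration of steps (3) to (9) is given below, with a plain sequence of pitch letters standing in for the notes that CPNView's iterator would visit; searchAB is an illustrative function, not the OMDN component, which, as noted above, is considerably more general.

// Self-contained illustration of steps (3)-(9) above: the same two-cursor
// scheme applied to a plain sequence of pitch letters.
#include <cstddef>
#include <iostream>
#include <vector>

void searchAB(const std::vector<char>& notes)
{
    for (std::size_t si1 = 0; si1 < notes.size(); ++si1) {   // (3), (9): SI1 visits every note
        std::size_t si2 = si1;                                // (4): SI2 starts as a copy of SI1
        if (notes[si2] != 'A') continue;                      // (5)
        ++si2;                                                // (6): advance SI2
        if (si2 < notes.size() && notes[si2] == 'B')          // (7)
            std::cout << "match at note " << si1 + 1 << "\n"; // (8): report the match (1-based)
    }
}

int main()
{
    searchAB({'C', 'A', 'B', 'A', 'G', 'A', 'B'});   // reports matches at notes 2 and 6
}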
Components are available for the recognition of vertical combinations of notes and for other constructions, such as those corresponding to texts like 'major triad', 'dominant chord' and 'perfect cadence'. In some cases components may invoke other components. For example, the component that identifies cadences will also invoke key-finding as well as chord-identifying components.

While examples of coding that do substantial work are too long for inclusion here, the following is used to give a flavor of how CPNView programming works. It is based on an existing component used to check whether the score iterator position in cursor lies within the query template's bar range. The template is called templ, and the range lies between the values in startBarNo and endBarNo. The cursor object has a member function called getCurrentBarNo() that retrieves the relevant bar number for the iterator. Where no range is specified, the template fields have a value of -1.

bool filterRange(ScoreIterator& cursor, Template templ)
{
    long scoreBarNo = cursor.getCurrentBarNo();
    if ( templ.startBarNo != -1 && scoreBarNo < templ.startBarNo ) return false;
    if ( templ.endBarNo != -1 && scoreBarNo > templ.endBarNo ) return false;
    return true;
}

The existing function used is a bit more complicated, in order to allow for variation in the ways that bar numbers are counted in the MusicXML scores used in C@merata.

All of the software is deterministic. Templates are formed from the queries, and in cases where a query has not been envisaged beforehand no action takes place. The structure of queries is based largely on examples from the task description.

The execution of queries in some cases involves taking 'unsafe' short cuts that have worked with previous C@merata challenges. For instance, the identification of melodic segments is done in a relatively unsophisticated way, by searching for unbroken sequences of notes of sufficient length, bounded in some way, such as by rests (a simple sketch appears below). This does not take into account the full range of possibilities for melody structuring. Key identification is also performed using the short cut of examining final chords and key signatures. One interesting challenge is presented by question 59, 'passing modulation to B minor'. Answering this query requires a tonality-tracking component. Another challenging case can be found in question 12, 'imitation between Sopranos and Clarinet in measures 110-119', which requires the implementation of a melodic similarity algorithm.
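As a self-contained sketch of the melodic-segment short cut mentioned above, the following code scans a stave as a sequence of note and rest events and keeps every unbroken run of notes that reaches a minimum length. The Event type, the melodicSegments function and the length threshold are illustrative choices, not OMDN's own code.

// Illustrative sketch of the melodic-segment short cut: keep maximal runs
// of notes bounded by rests, provided they are long enough.
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

struct Event {
    bool isRest;
    std::string pitch;   // empty for rests
};

std::vector<std::vector<std::string>> melodicSegments(const std::vector<Event>& stave,
                                                      std::size_t minLength)
{
    std::vector<std::vector<std::string>> segments;
    std::vector<std::string> run;
    for (const Event& e : stave) {
        if (e.isRest) {                                          // a rest bounds the current run
            if (run.size() >= minLength) segments.push_back(run);
            run.clear();
        } else {
            run.push_back(e.pitch);
        }
    }
    if (run.size() >= minLength) segments.push_back(run);       // run ending the stave
    return segments;
}

int main()
{
    std::vector<Event> stave = {
        {false, "D4"}, {false, "E4"}, {false, "F#4"}, {true, ""},
        {false, "G4"}, {true, ""},
        {false, "A4"}, {false, "B4"}, {false, "C#5"}, {false, "D5"}
    };
    for (const auto& seg : melodicSegments(stave, 3)) {
        for (const auto& p : seg) std::cout << p << ' ';
        std::cout << '\n';      // prints the two segments of three or more notes
    }
}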
3. MATCHING QUERY TO SCORES
The complexity of processing a search depends both on the score and on the nature of the query. An otherwise simple task applied to the score of a hymn tune might prove complex when applied to a piano score or to a score with transposing instruments. Some of the issues identified are listed below:

1. Query 'A3'. This specifies one of the simplest possible search tasks. However, when applied to a score with transposing instruments, it yields two valid interpretations. The issue is whether the search is for a notated or for a sounding A3.

2. Query 'Bb4 G4'. This is a simple query when applied to a score with monophonic staves. Were it used to search a score with multiple simultaneous notes on some of the staves, an issue arises as to whether or not voice-leading considerations are taken into account when identifying all Bb4s that are adjacent in time to a G4. Where voice leading is not taken into account, the query 'Bb4 G4' applied to Figure 4 below would yield [3/4,2,23:1-23:2]. The nub of the problem here is that much keyboard writing is based on historic practices of part writing from previous centuries. In the Scarlatti score fragment below, three parts or voices may be identified, with pitch sequences (Bb4 C#5 D5 E5 F#5 G5) in the top part, (G3 B4 F4 E4 D4 C#4) in the middle part and (G3) in the lower part. Hence the answer [3/4,2,23:1-23:2] is incorrect.

3. A search in the Figure 4 score for G3 B4 F4 E4 D4 C#4 should yield [3/4,2,23:1-23:6], although this solution is forbidden by the current C@merata rules, which prohibit the identification of melodic sequences between notes on different staves, a rule that is reasonable provided it is not applied to keyboard notation.

Figure 4. Scarlatti_k30, taken from C@merata 2015.

4. Search texts such as 'tonic triad' or 'triad of E major', where the component notes have a common starting and ending point with no other pitch class present, are relatively easy to handle. However, there are many cases where the identity of a chord can be maintained although notes foreign to it are present. Suspensions, retardations, passing notes (accented or unaccented), and pedals or inverted pedals are such cases. Context plays a major role here. Metrical considerations and the behavior of preceding or following notes have to be taken into account.

5. Keyboard scores present additional challenges for voice-leading and chord-identification tasks. In many cases the music textures can be viewed as having an underlying traditional four-part harmonic structure, once instrumental idioms such as octave doubling, the spreading out of chords and the crossing of staves are allowed for. Figure 4 shows a case where the middle melodic line moves from the bass to the treble clef. Figure 5 illustrates the use of spread-out chords that conform to good voice-leading rules.

Figure 5. Mozart piano sonata no. 5, first movement.

The melody has a chordal accompaniment: G major in root position in bar 1, a D dominant seventh in second inversion in bar 2, continued into bar 3 but falling to its first inversion, and returning to G major in bar 4. There is a clear bass line: G in bar 1, A in bar 2, F# in bar 3 and G in bar 4. Two middle parts can be identified, the upper one on D throughout, and the lower one with B in bar 1, C in bar 2, A in bar 3 and B in bar 4. All of these lines are well behaved, avoiding forbidden parallelisms. Similar considerations are relevant, for instance, to Alberti basses and some arpeggios.

6. The C@merata exercise provides an opportunity for testing definitions of music terms, since the question arises as to the meaning of some of them. A designer of recognition software might start with a definition from a music dictionary. While a dictionary definition gives necessary conditions, it frequently leaves borderline cases unresolved, or hedges around a definition with words like 'usually' or 'normally'. The definition of 'chord', for example, may include statements such as that it 'usually contains more than two notes'. This reveals a level of complexity that cannot be captured in a manner concise enough for a dictionary definition. An exploration of the use of composition textbooks could prove beneficial.

One way of examining some of the issues here is to attempt to find out how musicians conceptualise a term. A short exploration is undertaken here in an attempt to design a recognizer for 'arpeggio', using instances from questions in C@merata 2015 and 2016.

One music dictionary definition of 'arpeggio' is: "A chord 'spread' – i.e. the notes heard one after the other from the bottom upwards, or sometimes from the top downwards." [5]

Interestingly, the answers in the 2015 Gold Standard are in accordance with the dictionary definition. While the definition seems to state the necessary conditions for an arpeggio, the question arises whether it is sufficient. It depends on the meaning of 'chord', which is itself somewhat problematic, allowing for the possibility of a two-note arpeggio. Additionally, if an arpeggio-like construct is part of a larger structure, such as a broken chord, can it still be regarded as an arpeggio?

In a brief exploration, responses were elicited from five practicing musicians to extracts used in C@merata questions. All respondents were professional musicians with keyboard competence, mainly from the teaching community. Two have substantial composing experience. The respondents were not initially aware that arpeggio recognition was at issue.

Figure 6. clementi_sonatina_op36_no1_v2
Figure 7. bach_violin_sonata_no1_bwv1001_presto
Figure 8. mozart_an_chloe_k526
Figure 9. sonata02-4

Figure   Initial responses 'arpeggio'   Arpeggio appropriate
4        20%                            60%
5        20%                            100%
6        80%                            100%
7        80%                            100%
Table 1. Participant responses to arpeggios in Figures 4-7.

The majority of the respondents agreed with the designation except in the case of Figure 6. Two of the participants conceptualized an arpeggio as having at least four notes, requiring an octave to be present. One found the term 'broken chords' to be more appropriate to the configuration in Figure 6.

Following this, two cases were presented in which the arpeggio candidate notes lay in a larger context. In the first case the candidate arpeggio was three notes within a melodic line; in the second case it lay within a larger broken-chord context.

Figure 10. clementi_sonatina_op36_no1_v2
Figure 11. bach_violin_sonata_no1_bwv1001_presto

Figure   Agree with 'arpeggio' designation
10       40%
11       40%
Table 2. Participant responses to arpeggios in Figures 10-11.

The opinion of the majority was not in agreement with the dictionary definition of 'arpeggio'. Interestingly, both 'yes' respondents said that they did not make a great distinction between 'broken chords' and 'arpeggios', while the 'no' respondents did. Excluding the respondents who were ambivalent about this distinction would have resulted in both Figures 10 and 11 being excluded from the category 'arpeggio' by all of the remaining respondents.

This exercise highlights a problem in the creation of recognition algorithms. It would make for an interesting investigation to test instances of performed music for the identification of features such as arpeggios.
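To make the difficulty concrete, the following is a deliberately naive, self-contained recognizer built directly from the dictionary definition quoted above: the candidate notes must move steadily in one direction, and in leaps rather than steps, so that they could plausibly be the spread notes of a chord. The MIDI-pitch representation and both thresholds are illustrative choices rather than OMDN's, and, as the responses above suggest, any such literal reading will still disagree with some musicians' judgements.

// Deliberately naive arpeggio test: one direction only, leaps of a third or
// more, at least three notes.  Pitches are MIDI note numbers.
#include <cstddef>
#include <cstdlib>
#include <iostream>
#include <vector>

bool isArpeggioCandidate(const std::vector<int>& midiPitches)
{
    if (midiPitches.size() < 3) return false;
    int direction = midiPitches[1] > midiPitches[0] ? 1 : -1;
    for (std::size_t i = 1; i < midiPitches.size(); ++i) {
        int step = midiPitches[i] - midiPitches[i - 1];
        if (direction * step <= 0) return false;     // must keep rising (or falling)
        if (std::abs(step) < 3) return false;        // steps of a 2nd look scale-like
    }
    return true;
}

int main()
{
    std::cout << isArpeggioCandidate({60, 64, 67, 72}) << '\n';  // C E G C: 1
    std::cout << isArpeggioCandidate({60, 62, 64, 65}) << '\n';  // C D E F: 0 (scale)
    std::cout << isArpeggioCandidate({60, 64, 62}) << '\n';      // changes direction: 0
}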
4. FUTURE CHALLENGES FOR SCORE SEARCHING
Solving C@merata queries involves challenges both in the interpretation of natural language concepts and in music analysis. MusicXML score representation is poorly structured. The lack of basic one-to-one correspondences from score objects to MusicXML entries is at the root of many problems. One other requirement is that score entities be modeled in an objective way, a requirement not met by the Common Music Notation part of the MEI [6].

The following points should be considered in the future design of C@merata. They focus on acquiring experience of the fundamental fabric of music scores.

1. Voice leading: Explore the applicability of the rules of counterpoint. Initially use only vocal or other scores with one voice per stave. Focus the tasks on the first four species of counterpoint, by basing questions on cases such as parallel fifths and octaves, exposed octaves, sequences of thirds and fifths, false relations and others.
2. Harmony: Explore the problems of identifying chords with foreign notes present.

3. Devote only a small number of questions to keyboard scores.

4. For keyboard scores, provide questions to identify the underlying harmonies in broken chords.

5. In keyboard scores, use questions to explore underlying voice leading.

5. PRACTICAL CONSIDERATIONS
A number of steps are proposed on how C@merata might be improved in future.

1. Ensure MusicXML sources are error-free: In both the 2015 and 2016 challenges many scores were found to contain errors and omissions. 14 of the 20 2016 scores could not initially be processed in CPNView. While it has not been possible to identify all of the problems to date, it was possible to hand-edit some scores to make them usable. 'Eroica', 'Weep, O mine eyes' and 'An die Musik' had no time signatures. An additional mid-bar measure number appeared in bar 5 of Chor002. In Sonata02-4 tuplets were not properly formatted, sextuplets were incorrectly encoded and grace note heads appear as whole notes. Strange part and voice numbers appear in quartet10-3_m21i, Inven01_m21, quartet10-3_m21 and quartet12-1_m21, sometimes in a form that looks like 32-character hexadecimal numbers, where numbers such as 1, 2 and 3 are normal in MusicXML. The 2015 scores fared a bit better, but problems were found there too, including invalid bar lengths, missing rests, bar numbering and others. Much of this can be avoided by circulating scores to participants well in advance of the C@merata challenge, so that feedback is available to correct them and to determine their suitability. Previously used scores that have proved satisfactory could be re-used with new questions without compromising the challenges in any way.

2. Gold Standard: An intensive study of the 2015 challenge questions revealed problems with the contents of the Gold Standard document: the answers for approximately 4% of the questions were incorrect, while approximately 15% more valid answers were absent from the Gold Standard document. Careful visual inspection of each answer in the score, in conjunction with adherence to the guideline document, should minimize or eliminate the presence of incorrect answers if done in time, before the challenge is activated. The detection of missing answers will prove much more difficult. Publishing the Gold Standard after the deadline for submitting answers, and giving participants a role in formulating revisions followed by a re-scoring of all answers, should be considered. An alternative is to evaluate, against the printed score, all answers that are classified as incorrect.

6. REFERENCES
[1] Ó Maidín, D.S. 1995. A Programmer's Environment for Music Analysis. PhD thesis, University College Cork, 1995.
[2] Ó Maidín, D.S. and Cahill, M. 2001. Score Processing for MIR. In Proceedings of the International Society for Music Information Retrieval, Bloomington, Indiana, 2001, pp. 59-64.
[3] Ó Maidín, D.S. 1998. A Geometrical Algorithm for Melodic Difference. In Computing in Musicology 11: Melodic Similarity: Concepts, Procedures, and Applications, ed. Walter B. Hewlett and Eleanor Selfridge-Field. Cambridge, Massachusetts and London: The MIT Press, 1998, pp. 65-72.
[4] Forte, A. 1973. The Structure of Atonal Music. New Haven and London: Yale University Press, 1973.
[5] Scholes, P.A. 1974. The Concise Oxford Dictionary of Music. London: Oxford University Press.
[6] Music Encoding Initiative Guidelines, Version 3.0.0. http://music-encoding.org/downloads/guide