=Paper=
{{Paper
|id=Vol-1984/Mediaeval_2017_paper_36
|storemode=property
|title=The CLAS System at the MediaEval 2017 C@merata Task
|pdfUrl=https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_36.pdf
|volume=Vol-1984
|authors=Stephen Wan
|dblpUrl=https://dblp.org/rec/conf/mediaeval/Wan17
}}
==The CLAS System at the MediaEval 2017 C@merata Task==
Stephen Wan
CSIRO Data61, Australia
Stephen.Wan@csiro.au

ABSTRACT

In this paper, we describe the 2017 CLAS system as entered into the C@merata shared task. This year, our aim was to use the challenge as a case study of how one manages natural language queries to structured data, and so we focused on the use of a NoSQL database for managing the retrieval of passages from musical scores. We also extended the 2015 CLAS system to handle queries about harmonies between specific parts, repetition of sequences, and thematic queries about sequences and imitation. Given that the queries were quite diverse, we explored the use of paraphrase methods to transform queries into a canonical form, where possible, which might then be parsed using a feature-based CFG. Our system achieved a measure precision and recall of 0.122 and 0.26 respectively.

Copyright is held by the owner/author(s). MediaEval'17, 13-15 September 2017, Dublin, Ireland.

1 INTRODUCTION

In this paper, we describe the CLAS submission for the 2017 C@merata shared task [1]. As in previous years, the task is one where a system must find portions of a musical score that match a natural language query. For example, the query "two eighth notes, an eighth note rest, three eighth notes, an eighth note rest and three eighth notes in measures 68-80, all in the Violoncello" (an example from the 2016 data set) might be used to identify a portion of the score, specified by the starting and ending bar numbers as well as the beat offsets in the manner prescribed in [1], such as:

Passage
* end_bar="79"
* end_beat_type="4"
* end_beats="4"
* end_divisions="1"
* end_offset="1"
* start_bar="78"
* start_beat_type="4"
* start_beats="4"
* start_divisions="1"
* start_offset="1"

2 APPROACH

Our general approach in 2017 is consistent with the earlier CLAS submissions in 2014 [2] and 2015 [3], which viewed the task as a Q&A problem with natural language queries posed in a controlled language. We view the task as a natural language query task to structured data, where the data has a temporal element and can be decomposed into multiple aligned streams. In terms of music, these streams correspond to different musical instruments or parts, temporally aligned. This year, however, we focused on using the shared task as a case study on natural language queries to structured data using database technologies.

The 2017 submission builds on the 2015 CLAS entry, which uses a feature-based Context-Free Grammar (CFG) to specify the controlled language for C@merata music queries. Parsing with NLTK [4] provides a feature structure corresponding to the key semantic elements of the query, which is then used to retrieve results. In the 2015 CLAS system, the feature structure was used to find the matching events in the musical score. The Music21 [5] library was used to transform the XML version of a score into an array of music events, each represented with a set of attribute-value features. Events were then retrieved for a query using feature unification between the query feature structure and the features of events.

This year, we varied our approach in the following ways:

1. Instead of making sequential passes through the music events for a piece of music, we used the NoSQL database MongoDB to store and retrieve musical events using MongoDB queries based on attribute-value sets.
2. We enhanced the 2015 feature-based CFG to recognise queries about: (a) multiple parts; (b) repetition; (c) sequences and themes.
3. We used a paraphrase or near-paraphrase back-off stage if the original query could not be parsed.

Our system focuses on queries based on music theory, as opposed to those that rely on musical interpretation. In preparation for the 2017 shared task, we used the gold standard data from 2014-2016 as software development regression tests. This year, our CLAS system achieved a measure precision and recall of 0.122 and 0.26 respectively.
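To make the parsing step concrete, the sketch below parses a toy query with a miniature feature-based CFG in NLTK and extracts its feature structure. The grammar and the query_features helper are hypothetical stand-ins for the much larger CLAS grammar, not the actual system code.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureEarleyChartParser

# A hypothetical miniature grammar in the spirit of the CLAS
# feature-based CFG; the real grammar covers far more query types.
GRAMMAR = FeatureGrammar.fromstring("""
% start QUERY
QUERY[SEM=[note=?n, len=?l]] -> LEN[SEM=?l] NOTE[SEM=?n]
LEN[SEM='quarter'] -> 'quarter' | 'crotchet'
LEN[SEM='half'] -> 'half' | 'minim'
NOTE[SEM='C4'] -> 'C4'
NOTE[SEM='B-4'] -> 'Bb4'
""")

PARSER = FeatureEarleyChartParser(GRAMMAR)

def query_features(text):
    """Return the SEM feature structure of the first parse, or None
    if the query falls outside the controlled language."""
    for tree in PARSER.parse(text.split()):
        return tree.label()['SEM']
    return None

print(query_features("crotchet Bb4"))  # e.g. [len='quarter', note='B-4']
```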
3 DATA STORAGE

3.1 Indexing with MongoDB

Instead of making a single pass over a music score (iterating through all possible notes) to find matching events using feature unification, we used a database to store and retrieve all music events.

Specifically, we used four tables:

1. Titles (and global score attributes)
2. Musical Events, one table per musical score
3. Sequences, one table per musical score
4. Analysis, one table per musical score

The Titles (and global score attributes) table keeps information mapping from the XML music score filename to an internal ID, in addition to global information (determined using Music21 functions) such as the time signature, key signature, and the number of parts. Examples of such records are presented in Figure 1.

```json
{
  "_id": ObjectId("598ddda11d41c896ba36332b"),
  "name": "air_from_handels_water_music_suite.xml",
  "time_signature": {"1": [4, 4]},
  "key": "F major",
  "parts": 4
}
{
  "_id": ObjectId("598dde0b1d41c896ba36364e"),
  "name": "and_the_glory_of_the_lord_from_handels_messiah.xml",
  "time_signature": {"1": [3, 4]},
  "key": "E major",
  "parts": 18
}
```

Figure 1: Example JSON records from the Titles table.

For each musical score, we dynamically created a Music Events table, whose name is the unique identifier specified in the Titles table. The table for a score stores music events, which can be either of a note or a chord type; events contain offsets and other attributes. For note events, for a score consisting of multiple parts, the notes of each part were read sequentially to form the note events. For chord events, each part was also transformed into a series of Music21 Chord objects using the corresponding function in the Music21 library, and metadata for each chord was then stored in the table. Finally, the same chord-creation functionality was applied at each offset of the entire score, and this list of chord objects was indexed. An example of a note record is presented in Figure 2 and a chord record in Figure 3.

```json
{
  "_id": ObjectId("598ddda11d41c896ba36332c"),
  "name": "A",
  "letter": "A",
  "accidental": "",
  "pitch_class": 9,
  "octave": 4,
  "bar": 1,
  "offset": 0,
  "length": 0.75,
  "part": 0,
  "lyric": null,
  "freq": 440,
  "articulation": ["DOWNBOW", "BOWING", "TECHNICALINDICATION",
                   "ARTICULATION", "MUSIC21OBJECT", "OBJECT"],
  "expression": null,
  "solfeg": "",
  "type": "note",
  "dynamic": "piano",
  "tie": "",
  "slur": "1",
  "voice": "VIOLIN I",
  "clef": "G",
  "voice_num": "0",
  "stream": "notes",
  "ordering": 0
}
```

Figure 2: Example of a note JSON record from the Music Events table for "air_from_handels_water_music_suite.xml", which in this case was stored in table "db.T598ddda11d41c896ba36332b" (notice that the identifier following the "T" is the same as in the first record of Figure 1).

```json
{
  "_id": ObjectId("598ddda11d41c896ba36348a"),
  "name": "C3-dominant seventh chord",
  "bar": 3,
  "offset": 2,
  "length": 0.75,
  "key": "F major",
  "chord_fn": "5",
  "chord_match": 3,
  "inversion": 0,
  "bass": "C3",
  "root": "C3",
  "notes": ["B-", "E", "G", "C"],
  "notes_with_octave": ["B-4", "E4", "G3", "C3"],
  "ordered_pitch_classes": [0, 4, 7, 10],
  "root_fn": "5",
  "function": "5",
  "type": "dominant seventh chord",
  "raw_type": "dominant seventh chord",
  "ties": 4,
  "passing": true,
  "dynamic": null,
  "intervals": [0, 2, 3, 4, 5, 6],
  "interval_names": ["Minor Tenth", "Perfect Fifth", "Major Sixth",
                     "Perfect Unison", "Major Tenth", "Diminished Fifth",
                     "Minor Fourteenth"],
  "stream": "chords",
  "ordering": 19
}
```

Figure 3: A chord JSON record from the Music Events table for "air_from_handels_water_music_suite.xml".
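As an illustration of the indexing step, the following sketch stores a score's events in a per-score MongoDB collection with PyMongo. The database name, the index_score helper, and the choice of indexed fields are our own assumptions rather than the actual CLAS code.

```python
from pymongo import ASCENDING, MongoClient

client = MongoClient()   # assumes a MongoDB server on localhost
db = client["camerata"]  # hypothetical database name

def index_score(title_record, events):
    """Store one Music Events collection per score, named after the
    score's internal ID (cf. "db.T598ddda..." in Figure 2's caption)."""
    table = db["T%s" % title_record["_id"]]
    table.insert_many(events)  # each event is a plain dict of features
    # Index the fields most queries filter on, so retrieval is a
    # direct lookup rather than a sequential pass over the score.
    table.create_index([("bar", ASCENDING), ("offset", ASCENDING)])
    table.create_index([("stream", ASCENDING), ("ordering", ASCENDING)])
```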
For some queries, chords (or harmonies) across specific parts were required. To avoid computing every possible permutation of parts, the specific combination was checked for at query time; if it did not exist, the relevant parts were extracted from the XML, merged into a temporary score, the entire score "chordified", and the results indexed dynamically. Querying then resumed as above.

The Sequences table for a score stores passages or sequences detected in the music. These were found by segmenting series of notes using rests, with a maximum sequence length of 8 bars.

Sequences were represented with unique identifiers covering the entirety of the passage, using each of three possible key generators: the unique names of the notes in the sequence and their lengths (but not their octaves), a variant of this using the relative displacement between notes in semitones, and a diatonic variant using interval classes. Each sequence had an associated start and end point (specified as bar and offset), which was stored in the table.

The Analysis table for each score keeps track of which sequence unique identifiers occur in different parts of the score, allowing some representation of thematic sequences (those that appear in multiple parts). An example of a sequence from the Analysis table is presented in Figure 4, and the corresponding metadata from the Sequences table is presented in Figure 5.

```json
{
  "_id": ObjectId("598de46b1d41c896ba374f8f"),
  "name": "ABS::++E-_3.0000_0.0000++D_1.0000_0.0000++F_1.0000_1.0000++D_1.0000_2.0000++C_1.0000_0.0000++E-_1.0000_1.0000++C_1.0000_2.0000++B-_1.0000_0.0000",
  "key_type": "absolute",
  "seen_parts": ["FLUTE 1 2", "BASSOON 1 2"],
  "motif_length": 8
}
{
  "_id": ObjectId("598de46b1d41c896ba374f90"),
  "name": "DIA::++-2_1.0000_0.0000++3_1.0000_1.0000++-3_1.0000_2.0000++-2_1.0000_0.0000++3_1.0000_1.0000++-3_1.0000_2.0000++-2_1.0000_0.0000",
  "key_type": "diatonic",
  "seen_parts": ["FLUTE 1 2", "BASSOON 1 2"],
  "motif_length": 8
}
```

Figure 4: A sequence in both its absolute note and diatonic interval forms, occurring in two parts (here Flute and Bassoon) of "beethoven_symphony_3_movement_iii_muse.xml".

```json
{
  "_id": ObjectId("598de4691d41c896ba3748f2"),
  "name": "ABS::++E-_3.0000_0.0000++D_1.0000_0.0000++F_1.0000_1.0000++D_1.0000_2.0000++C_1.0000_0.0000++E-_1.0000_1.0000++C_1.0000_2.0000++B-_1.0000_0.0000",
  "voice": "FLUTE 1 2",
  "clef": "G",
  "voice_num": "0",
  "key_type": "ABS",
  "type": "motif",
  "start_bar": 26,
  "start_offset": 0,
  "end_bar": 29,
  "end_offset": 1,
  "end_duration": 2,
  "note_length": 8,
  "duration": 10
}
```

Figure 5: Metadata for the sequence with id=598de46b1d41c896ba374f8f in the score "beethoven_symphony_3_movement_iii_muse.xml".
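To illustrate the key format visible in Figures 4 and 5, here is a minimal sketch of an "absolute" key generator. The format is inferred from the figures; the absolute_key function is our own reconstruction, not the system's source.

```python
def absolute_key(notes):
    """Build an "absolute" sequence key of the form seen in Figure 4
    from (note_name, quarter_length, beat_offset) triples."""
    return "ABS::++" + "++".join(
        "%s_%.4f_%.4f" % (name, length, offset)
        for name, length, offset in notes)

print(absolute_key([("E-", 3.0, 0.0), ("D", 1.0, 0.0), ("F", 1.0, 1.0)]))
# ABS::++E-_3.0000_0.0000++D_1.0000_0.0000++F_1.0000_1.0000
```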
3.2 Querying the Database

Each record in a table (aside from the Titles table) is an event represented as a set of attribute-value features. Using the Python programming language and the PyMongo library (https://api.mongodb.com/python/current/), which provides an interface to MongoDB (www.mongodb.com), queries are essentially comparisons between dictionary objects. In this way, the queries are very similar to the feature unification method in the 2015 CLAS system. There is a subtle difference, however. Feature unification allows under-specification on either of the two structures being compared, with unification succeeding as long as there are no direct contradictions. This is not the case with MongoDB queries: if a query is overly specific and includes attributes not present in the event, then a match is not possible. Thus, each feature structure first had to be transformed into an equivalent NoSQL query to account for this.

Querying for sequences of notes was performed by first issuing a query for the first note of the sequence and then performing an ordered series of queries, each checking whether the event at the next timestep corresponded to the relevant sequence note. This was handled through a recursive function that worked its way down the list of elements in the query.

Using MongoDB to provide search facilities greatly increased the speed at which matches could be found, utilizing a database index for direct lookup instead of a sequential pass over the score for each query.
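The recursive sequence search might look like the following sketch. It assumes, consistent with Figure 2, that consecutive note events in a part carry consecutive "ordering" values; this assumption and the helper names are ours, not the system's.

```python
def find_sequence(table, note_queries):
    """Find a run of note events matching the ordered list of
    attribute-value queries; return (first, last) events or None."""
    first_query = dict(note_queries[0], stream="notes")
    for first in table.find(first_query):
        last = _extend(table, first, note_queries[1:])
        if last is not None:
            return first, last
    return None

def _extend(table, prev, remaining):
    """Recursively check that each remaining query matches the event
    at the next timestep in the same part."""
    if not remaining:
        return prev
    step = dict(remaining[0], stream="notes", part=prev["part"],
                ordering=prev["ordering"] + 1)
    nxt = table.find_one(step)
    return None if nxt is None else _extend(table, nxt, remaining[1:])

# e.g. two quarter-note As followed by a B, in any part:
# find_sequence(table, [{"name": "A", "length": 1.0},
#                       {"name": "A", "length": 1.0},
#                       {"name": "B", "length": 1.0}])
```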
3.3 Forming NoSQL Queries

If we are searching for a single note, then the query is simply the set of attribute-value features for that note, represented as a Python dictionary. Such queries were simple to implement with a NoSQL database like MongoDB. Others, however, were more complex.

Melodic intervals are treated as sequences. In this case, for every note in each part, we search for the absolute pitch and octave of the note that would correspond to the interval based on the current note. Thus, for melodic intervals, "search" becomes a single pass through the score.

For harmonic intervals, we search through the chord events, each of which is broken down at indexing time into the component interval classes between all note pairs in the chord. Each of these intervals is stored, at indexing time, as a list (with an equivalence encoded for augmented fourths, diminished fifths, and tritones). The MongoDB query is then a list membership test for the string name of the interval of interest. We note that the analysis of a chord into its component intervals is provided by Music21, and the success of this method depends on that library's ability to correctly analyse the chord. For harmonic intervals and chords, we added search constraints such as requiring at least 2 notes in the chord event.

For chord queries specifying the exact notes of the chord, matching is implemented using tests for membership of the notes in the chord event.

Cadences are treated as sequences of chord events. Tests are based on the features of adjacent chords, such as the chord function (as a Roman numeral) and the notes one would expect to see (derived from the chord function).

Whenever string matches are required in the MongoDB query, these are implemented as regular expressions, allowing "fourth" to match "perfect fourth", or "Violin" to match "Violin 1".
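The sketch below illustrates the query-forming devices just described using standard MongoDB operators: a regular-expression membership test over the stored interval names and an at-least-two-notes constraint. The field names follow Figures 2 and 3; the harmonic_interval_query helper itself is our own illustration.

```python
import re

def harmonic_interval_query(interval_name, part=None):
    """Build a MongoDB query for a chord event containing the named
    harmonic interval. A regex over the 'interval_names' list lets
    "fourth" also match "Perfect Fourth"; requiring that array index
    "notes.1" exists enforces at least 2 notes in the chord."""
    query = {
        "stream": "chords",
        "interval_names": {"$regex": re.escape(interval_name),
                           "$options": "i"},
        "notes.1": {"$exists": True},
    }
    if part is not None:
        # "Violin" should also match "Violin 1".
        query["voice"] = {"$regex": re.escape(part), "$options": "i"}
    return query

# An exact-notes chord query can use a membership test instead:
# table.find({"stream": "chords", "notes": {"$all": ["C", "E", "G"]}})
```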
4 QUERY EXTENSIONS

In this system, we extended the feature-based CFG and the query generation components to cater to three new types of queries:

1. harmonies between multiple parts
2. repetition
3. sequences and themes

4.1 Harmonies between Multiple Parts

The extension for queries indicating multiple parts specifically relates to harmonies, such as "minor third between Quintus and Tenor in bars 11-18". In this case, the part names "Quintus" and "Tenor" are mapped to a generic symbol "PART_NAME", which can be resolved later to the original values. Special grammar rules for part names in this configuration ("between part1 and part2") keep track of semantic features flagging that processing of the query should use logic relating to a harmony between parts. When such flags are encountered during resolution of the query, the system first checks whether the required combination of parts has been indexed in the database. If not, this triggers the dynamic indexing of chords in the specified parts (as mentioned above).

We note that queries such as "B4 in the left hand followed by C5 in the right hand in bars 7-8" are covered by a mechanism from the 2015 system in which the query is divided into two portions (split by the phrase "followed by"). Each is an atomic query that can be matched in different parts (here, left hand versus right hand, assuming a simple mapping from left to bass clef and right to treble clef).

4.2 Repetition

Queries like "six crotchet notes repeated twice" are treated as a sequence of "six crotchet notes" which is then copied a second time. In this system, handling of repetition requires that the number of copies be specified. That is, an unbounded repetition query, such as "alternating fourths and fifths in the Oboe in bars 1-100", is not handled. To allow such queries, the system arbitrarily transforms them to be equivalent to "repeated twice". This copying of sequences extends the 2015 system, which allowed copies of single notes to be specified in the query.

4.3 Sequences and Themes

To make use of the Sequences and Analysis tables described above, we extended our grammar to accept relevant queries, focusing on repeated sequences or imitation between parts. For such queries to be resolved, we search the Sequences and Analysis tables outlined in Section 3. The Analysis table shows when a sequence has been repeated verbatim, with the same note names (but allowing variation in the octave), or as a transposition with relative intervals between sequence notes, specified in both a diatonic and a chromatic form. The Analysis table also shows when any of these sequences has been repeated between parts, and includes metadata that allows trivial sequences below a certain length to be filtered out. Once repeated sequences are found, the start and end offsets are retrieved from the relevant Sequences table.

5 SIMPLE PARAPHRASE AND CONSTRAINT RELAXATION

In this system, we introduced a new method to allow graceful degradation in cases where the grammar did not cover the input query. In such a case, a prioritised set of paraphrase rules was used to see if the query could be transformed into a form that was covered by the grammar.

These paraphrase rules included synonyms, such as the mapping between "Violin 1" and "Vln 1", and phrasal equivalents, such as "from bars X-Y" and "in bars X-Y".

The rules also included "movement" of certain phrases that occur at set positions. For example, constraints about the bars ("from bar X-Y") were moved to the end of the query, as expected by the grammar.

We can thus think of the grammar as covering the canonical form of a query. For example, the multiple-part constraint in queries such as "cello and viola playing dotted minims an octave apart in bars 40-70" was mapped to "dotted minim octave between cello and viola in bars 40-70". Similarly, the repetition in queries such as "repeated Bb4 whole note" was transformed into its canonical form.

Finally, we encoded near-paraphrases to handle concepts similar to those already covered by the grammar. For example, we used a mapping from "ascending in single steps" to "ascending". For some queries, the additional information was deemed to be of little value (given the default behaviour of the system) and was thus dropped. For example, "in a row" was simply deleted, given that most sequence queries require that the target notes occur in series.

Ideally, these rules would be learnt from data; we see this as future work. Here we simply provide a way for paraphrase rules, once acquired, to be used, however they are obtained (by manual inspection or machine learning).
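As a sketch of this back-off stage, the following applies prioritised regular-expression rewrite rules until the transformed query parses. The rules shown are the examples from this section; the parse_with_backoff function is our own illustration rather than the system's actual rule engine.

```python
import re

# Prioritised paraphrase rules: synonyms, phrasal equivalents,
# near-paraphrases, and deletions of low-value phrases. Only the
# examples mentioned in the text are shown here.
PARAPHRASE_RULES = [
    (re.compile(r"\bVln\b", re.IGNORECASE), "Violin"),
    (re.compile(r"\bfrom bars\b", re.IGNORECASE), "in bars"),
    (re.compile(r"\bascending in single steps\b", re.IGNORECASE), "ascending"),
    (re.compile(r"\s*\bin a row\b", re.IGNORECASE), ""),  # dropped as low-value
]

def parse_with_backoff(query, parse):
    """Try the grammar first; on failure, apply paraphrase rules in
    priority order, re-parsing after every rewrite that changes the
    query. `parse` returns a feature structure or None."""
    result = parse(query)
    if result is not None:
        return result
    for pattern, replacement in PARAPHRASE_RULES:
        rewritten = pattern.sub(replacement, query)
        if rewritten != query:
            query = rewritten
            result = parse(query)
            if result is not None:
                return result
    return None
```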
6 RESULTS AND ANALYSIS

Our overall approach in preparing this year's submission was to use the gold standard results as regression tests, specifically using the 2016 test set for the development of new features. The recall and precision at the beat level for the three preceding years of data are shown in Table 1. While this reflects development on the 2014-2016 test sets, the performance is based on generic mechanisms without memorisation of answers. We note that the scores for 2014 and 2015 are slightly below the officially reported performance of the CLAS system for those years. This is due to differences between the 2017 system and the earlier systems: the 2015 system was a blend of the 2014 system and the feature-based CFG, whereas in the 2017 system the earlier chunking system of 2014 has been dropped altogether.

Table 1: Performance on data from prior years.

  Year   Recall   Precision
  2016   0.387    0.350
  2015   0.565    0.541
  2014   0.786    0.720

For the 2017 test set, our system achieved the following scores:

* Beat Precision: 0.099
* Beat Recall: 0.212
* Measure Precision: 0.122
* Measure Recall: 0.260

Table 2 reports the results by question type. We note that the system does best on harmony queries, followed by melodic queries, as shown by the recall scores. This is unsurprising, given that the 2017 system was developed with a focus on queries related to music theory. For the more complex texture and synch queries, the system does not do so well.

Although a system for sequences and themes was included this year, we note that the method for segmenting sequences in this system is quite simplistic and may not correspond to analyses by a musicologist. Furthermore, the grammar coverage for these queries was not exhaustive.

Table 2: 2017 results by question type.

  Type      Measure Recall   Measure Precision   Beat Recall   Beat Precision
  1 Melod   0.475            0.062               0.328         0.043
  N melod   0.213            0.176               0.167         0.137
  1 harm    0.156            0.636               0.156         0.636
  N harm    0.304            0.263               0.290         0.250
  Texture   0.103            0.176               0.103         0.176
  Follow    0.052            0.133               0.026         0.067
  Synch     0.000            0.000               0.000         0.000

7 DISCUSSION

The 2017 system is based on the feature-based CFG system from 2015. We note that the queries have become more and more complex, to the extent that one might question whether treating them as coming from a controlled language is still viable. This may be one reason why the performance of the system has degraded compared to previous years' submissions. From a software engineering point of view, we note that the grammar for the queries is increasingly difficult to manage with each subsequent year's set of queries. Whether the paraphrase component is an effective way to manage this diversity of queries remains to be tested.

The move to the MongoDB search engine did introduce efficiencies in running large sets of queries from prior years as regression tests. Previously, the 2015 system would have required hours to complete, given that each query required a pass through the score. Currently, each year's queries take a few minutes to answer. This would be faster if not for the passes still needed through the scores for melodic queries. We note that melodic intervals, too, could be annotated upfront at indexing time, an optimization which we simply did not have time to implement. However, some analyses require batch processing of the score to identify features of interest; for these, the potentially time-consuming analysis has to be performed ahead of database indexing.

8 FUTURE WORK

In future work, we intend to extend the system further to answer musicology-related queries, as opposed to just music theory queries. This will involve additional preprocessing of the scores in advance. For example, thematic sequences are currently delimited using rests; however, additional cues in the score, such as phrase boundaries and lyrics, could also help provide other possible segmentations.

9 CONCLUSIONS

We have described the CLAS system as entered into the 2017 C@merata shared task. In this system, we focused on the use of a NoSQL database for managing the retrieval of passages from musical scores. We also extended the 2015 CLAS system to handle queries about harmonies between specific parts, repetition of sequences, and thematic queries about sequences and imitation. Finally, we explored the use of paraphrase methods to transform queries into a canonical form, where possible, which might then be parsed using a feature-based CFG.

ACKNOWLEDGMENTS

We thank the organisers of the C@merata 2017 challenge for their hard work and efforts in running the event.

REFERENCES

[1] Sutcliffe, R. F. E., Ó Maidín, D. S., Hovy, E. (2017). The C@merata Task at MediaEval 2017: Natural Language Queries about Music, their JSON Representations, and Matching Passages in MusicXML Scores. In Proceedings of the MediaEval 2017 Workshop, Trinity College Dublin, Ireland, September 13-15, 2017.
[2] Wan, S. (2014). The CLAS System at the MediaEval 2014 C@merata Task. In Working Notes Proceedings of the MediaEval 2014 Workshop, Barcelona, Catalunya, Spain, October 16-17, 2014, CEUR-WS.org.
[3] Wan, S. (2015). The CLAS System at the MediaEval 2015 C@merata Task. In Working Notes Proceedings of the MediaEval 2015 Workshop, Wurzen, Germany, September 14-15, 2015, pp. 1-6.
[4] Bird, S., Loper, E., Klein, E. (2009). Natural Language Processing with Python. O'Reilly Media Inc.
[5] Cuthbert, M., Ariza, C. (2010). music21: A Toolkit for Computer-Aided Musicology and Symbolic Music Data. In Proceedings of the International Symposium on Music Information Retrieval, pp. 637-642.