UNLP at the MediaEval 2015 C@merata Task

Kartik Asooja (Insight Centre for Data Analytics, NUI Galway, Ireland; kartik.asooja@insight-centre.org)
Sindhu Kiranmai Ernala (Center for Exact Humanities, IIIT Hyderabad, India; sindhukiranmai.ernala@students.iiit.ac.in)
Paul Buitelaar (Insight Centre for Data Analytics, NUI Galway, Ireland; paul.buitelaar@insight-centre.org)

ABSTRACT
This paper describes our submission to the C@merata task at MediaEval 2015. The submission is a revision of the system submitted for the same task at MediaEval 2014, including some bug fixes. The system answers natural language queries over musical scores. The approach is based on two main steps: identifying the musical entities and relations present in the query, and retrieving the relevant music passages containing those entities from the associated MusicXML file. Our approach forms a sequence of the musical entities in the query and then searches for a sequence of passages satisfying that sequence of entities. The musical entities in the query are recognized with the help of regular expressions.

1. INTRODUCTION
This work describes our system submitted to the C@merata task [1] at MediaEval 2015. The task targets natural language question answering over musical scores. We were provided with a set of question types and the data over which the search was to be performed. The questions in the task consist of short noun phrases in English referring to musical features in the music scores, for instance, "F# followed two crotchets later by a G". Every question refers to a single natural language noun phrase using English or American music terminology. The music scores are provided in MusicXML [2][3], a standard open format for exchanging digital sheet music. The music repertoire consists of Western Classical works from the Renaissance and Baroque periods by composers such as Dowland, Bach, Handel, and Scarlatti. The answers comprise the music passages from the score that contain the musical features mentioned in the query string; each answer thus points to the location(s) of the requested musical features in the score. An answer passage consists of a start/end time signature, a start/end division value, and a start/end beat. The task provides two datasets: one for training and development, consisting of the 236 natural language queries used for last year's task, and one newly introduced for testing, which contains 200 questions. This year, the questions are linguistically more difficult and the scores are more complex.

2. APPROACH
Different types of musical features can be mentioned in a query, such as notes, melodic phrases and others. These musical features can be referred to as musical entities, or can be defined with the help of such entities. We therefore identify some of the musical entities in the natural language text and perform a location search by comparing the extracted entity values against the corresponding values in the music score to retrieve the answer passages. For complex queries requiring combinations according to particular relations between the entities, we consider only the sequential relation between the musical entities as they appear in the query string. Rather than building a system that differentiates between question types, we apply a rather simple approach assuming just this sequential relation. In contrast, the approach we submitted last year performed a union or intersection of the answer passages found for each musical entity. Our current approach consists of the following two main steps: identification of the sequence of musical entities in the query string, and retrieval of the answer sequences of relevant music passages matching the sequence of entities. Figure 1 summarizes the approach.

[Figure 1: Approach]

2.1 Identification of Musical Entities
We use regular expressions and manually created dictionaries to recognize musical entities in the query strings. The target entity types are:

1. Notes: A note defines a particular pitch, duration or dynamics, using strings such as Do, crotchet C, quarter note C in the right hand, or semibreve C. The note recognizer comprises three basic music entity recognizers: duration, pitch and staff. We first recognize all the pitches appearing in the query string, and separately identify all the durations and staves. To assign the correct duration/staff to a pitch, we measure the string distance between all the pitches and durations/staves. A duration/staff that occurs within a threshold distance of a pitch is paired with it to form the note. The pitches and durations are identified using regular expressions.

Duration: This defines the playing time of a pitch. In natural language, it is expressed by terms like quarter, semibreve, and whole. We write a regular expression covering the extensive vocabulary defining durations in both English and American music terminology.

Pitch: This is a perceptual property that allows the ordering of sounds on a frequency-related scale. Some examples of pitches written in natural language are D sharp, E#, and A flat. We form a regular expression to identify the pitches in a query string.

Staff: To identify the staves mentioned in a string, we find occurrences of the strings "right hand" and "left hand" in it.

The three basic musical entities duration, pitch and staff collectively form the note entity.
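To illustrate this step, the sketch below shows one way such a recognizer can be structured in Java: regular expressions for pitch, duration and staff, with the nearest duration/staff mention within a character-distance threshold paired to each pitch. The patterns, the NoteRecognizer class and the 25-character threshold are illustrative assumptions, not the exact implementation (which is available in the repository referenced at the end of this section).

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A minimal sketch of the regex-based note recognizer described above.
// Patterns, class name and threshold are illustrative assumptions.
public class NoteRecognizer {

    // Pitch: a letter A-G plus an optional accidental ("F#", "D sharp", "A flat").
    private static final Pattern PITCH =
        Pattern.compile("\\b([A-G])(#|b)?(\\s+(sharp|flat|natural))?(?![A-Za-z])");

    // Duration: a (non-exhaustive) slice of the English/American vocabulary.
    private static final Pattern DURATION = Pattern.compile(
        "\\b(semibreve|minim|crotchet|quaver|semiquaver|whole note|half note|"
        + "quarter note|eighth note|sixteenth note)\\b");

    // Staff: "right hand" / "left hand" mentions.
    private static final Pattern STAFF = Pattern.compile("\\b(right|left) hand\\b");

    // Assumed threshold (in characters) for pairing a duration/staff with a pitch.
    private static final int THRESHOLD = 25;

    // Returns the match of p nearest to offset 'from', or null if none
    // lies within the threshold distance.
    private static String nearest(Pattern p, String query, int from) {
        String best = null;
        int bestDist = THRESHOLD + 1;
        Matcher m = p.matcher(query);
        while (m.find()) {
            int dist = Math.abs(m.start() - from);
            if (dist < bestDist) { bestDist = dist; best = m.group(); }
        }
        return best;
    }

    public static List<String> recognize(String query) {
        List<String> notes = new ArrayList<>();
        Matcher pitch = PITCH.matcher(query);
        while (pitch.find()) {
            // Pair each pitch with the nearest duration/staff within the threshold.
            String duration = nearest(DURATION, query, pitch.start());
            String staff = nearest(STAFF, query, pitch.start());
            notes.add((duration == null ? "" : duration + " ") + pitch.group()
                + (staff == null ? "" : " [" + staff + "]"));
        }
        return notes;
    }

    public static void main(String[] args) {
        // Prints: [quarter note C [right hand]]
        System.out.println(recognize("quarter note C in the right hand"));
    }
}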
2. Instruments: To find the instruments mentioned in the query string, we manually created a dictionary of instrument-related n-grams using the training and test data. The dictionary includes words like viola, piano, alto, violoncello, soprano, tenor, bass, violin, guitar, sopran, alt, voice, and harpsichord.

3. Clef: To identify the clef, we simply check for the presence of strings like bass clef, F-clef, treble clef and G-clef in the query.

The implementation, including the regular expressions and the dictionaries used, is publicly available in our code repository on GitHub (https://github.com/kasooja/camerata).

2.2 Music Passage Retrieval
The values of the musical entities identified in the query are compared against the corresponding values extracted from the MusicXML file associated with the question. The system matches the musical features sequentially, in the order in which they appear in the query string. Finally, the passage sequences that completely match the sequence of musical entities are selected as answer passages.
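As a rough illustration of this sequential matching, the following sketch slides the query's entity sequence over the sequence of notes extracted from the score and reports every span that matches completely. The ScoreNote and QueryEntity classes are simplified stand-ins for values parsed from the MusicXML file, and, for brevity, the sketch reports only start/end measures rather than the task's full time-signature/division/beat answer format.

import java.util.ArrayList;
import java.util.List;

// A minimal sketch of the sequential passage matching described in
// Section 2.2; the types here are illustrative, not the actual ones.
public class PassageRetriever {

    static class ScoreNote {
        final String pitch, duration;
        final int measure; // 1-based measure number in the score
        ScoreNote(String pitch, String duration, int measure) {
            this.pitch = pitch;
            this.duration = duration;
            this.measure = measure;
        }
    }

    static class QueryEntity {
        final String pitch;
        final String duration; // null when the query leaves it unspecified
        QueryEntity(String pitch, String duration) {
            this.pitch = pitch;
            this.duration = duration;
        }
        // Every feature the query specifies must agree with the score note.
        boolean matches(ScoreNote n) {
            return pitch.equals(n.pitch)
                && (duration == null || duration.equals(n.duration));
        }
    }

    // Slide the query's entity sequence over the score's note sequence and
    // collect the start/end measures of every completely matching span.
    static List<int[]> findPassages(List<ScoreNote> score, List<QueryEntity> query) {
        List<int[]> passages = new ArrayList<>();
        for (int i = 0; i + query.size() <= score.size(); i++) {
            boolean ok = true;
            for (int j = 0; j < query.size() && ok; j++) {
                ok = query.get(j).matches(score.get(i + j));
            }
            if (ok) {
                passages.add(new int[] { score.get(i).measure,
                                         score.get(i + query.size() - 1).measure });
            }
        }
        return passages;
    }

    public static void main(String[] args) {
        List<ScoreNote> score = List.of(
            new ScoreNote("F#", "crotchet", 3),
            new ScoreNote("A", "crotchet", 3),
            new ScoreNote("G", "minim", 4));
        // Entity sequence for a query like "F# followed by a crotchet A".
        List<QueryEntity> query = List.of(
            new QueryEntity("F#", null),
            new QueryEntity("A", "crotchet"));
        for (int[] p : findPassages(score, query)) {
            System.out.println("passage in measures " + p[0] + "-" + p[1]);
        }
    }
}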
3. RESULTS AND DISCUSSION
The system performance is measured for each question type, and an overall weighted average over all questions is also calculated. Table 1 shows the results obtained by our submission for some of the question types. As discussed in the approach section, the current implementation recognizes only a few types of musical entities, which constrains the question types that can be answered. The results clearly show that the system could not answer many question types, such as texture and harmonic, because the detection of such musical features is not implemented in the current system. Compared to the system submitted last year, we removed many bugs related to the meaning of different tags in MusicXML, as we implemented our own MusicXML reader in Java. In the current version, our system uses only string and regular expression matching for the identification of musical elements, while string distance is used to identify the relations between the elements, where required. However, deep syntactic and lexical analysis of the query has the potential to identify relations between the entities more accurately.

Table 1. Results per question type (beat- and measure-level precision/recall)

Query Type      Beat Prec.  Beat Rec.  Measure Prec.  Measure Rec.
Overall         0.126       0.43       0.149          0.508
1 Harmonic      0.0         0.0        0.0            0.0
Synch           0.0181      0.194      0.0207         0.222
1 Melody Alone  0.79        0.942      0.79           0.942
Perf.           0.0789      0.6        0.0877         0.667
Instru.         0.562       0.202      0.562          0.202
Clef            0.145       0.481      0.157          0.519
Followed by     0.25        0.0968     0.625          0.242
1 Melody        0.406       0.769      0.408          0.773
n Melody        0.0247      0.196      0.058          0.461
Key             0.0         0.0        0.0            0.0
Time            0.208       0.762      0.247          0.905
Texture         0.0         0.0        0.0            0.0
1 Melody Clef   0.875       0.875      0.875          0.875

4. CONCLUSION
We have presented a simple pipeline for natural language question answering on musical scores. The pipeline is based on identifying the different types of musical entities and their relations in the query string, and comparing them against the corresponding values extracted from the MusicXML file.

5. ACKNOWLEDGEMENTS
This work has been funded in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (INSIGHT). We are very grateful to Mr. Robert Solyom from the Music Academy, Galway, for helpful suggestions and references.

6. REFERENCES
[1] R. Sutcliffe, T. Crawford, C. Fox, D. L. Root and E. Hovy. The C@merata Task at MediaEval 2015: Natural Language Queries on Classical Music Scores. In MediaEval 2015 Workshop, Wurzen, Germany, September 15-16, 2015.
[2] M. Good, Recordare LLC. Lessons from the Adoption of MusicXML as an Interchange Standard. 2006.
[3] MusicXML: http://www.musicxml.com/