=Paper=
{{Paper
|id=Vol-1263/paper52
|storemode=property
|title=UNLP at the MediaEval 2014 C@merata Task
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_52.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/AsoojaEB14
}}
==UNLP at the MediaEval 2014 C@merata Task==
Kartik Asooja, Insight Centre for Data Analytics, NUI Galway, Ireland (kartik.asooja@insight-centre.org)
Sindhu Kiranmai Ernala, Center for Exact Humanities, IIIT Hyderabad, India (sindhukiranmai.ernala@students.iiit.ac.in)
Paul Buitelaar, Insight Centre for Data Analytics, NUI Galway, Ireland (paul.buitelaar@insight-centre.org)

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain.

ABSTRACT

This paper describes our submission to the C@merata task at MediaEval 2014. The system answers natural language queries over musical scores. The approach is based on two main steps: identifying the musical entities and relations present in the query, and retrieving the relevant music passages containing those entities from the associated MusicXML file. We submitted two runs for the task: the first takes the union of the passages retrieved for each musical entity, while the second takes their intersection to answer the query. Musical entities in the query are recognized with the help of regular expressions.

1. INTRODUCTION

This work describes our system submitted to the C@merata task [1] at MediaEval 2014. The task targets natural language question answering over musical scores. We were provided with a set of question types and the data over which the search was to be performed.

The questions in the task consist of short noun phrases in English referring to musical features in the music scores, for instance, "F# followed two crotchets later by a G". Every question is a single natural language noun phrase using English or American music terminology. The music scores are provided in MusicXML [2], a standard open format for exchanging digital sheet music. The music repertoire consists of Western Classical works from the Renaissance and Baroque periods by composers such as Dowland, Bach, Handel, and Scarlatti. The answers comprise the music passages from the score that contain the musical features mentioned in the query string; an answer thus points to the location(s) of the requested musical features in the score. An answer passage consists of the start/end time signature, start/end division value, and start/end beat. The task provides two datasets: a development set of 36 natural language queries and a test set of 200 questions.

2. APPROACH

A query can mention different types of musical features, such as notes, melodic phrases, and others. These musical features can be referred to as musical entities, or can be defined with the help of such entities. We therefore identify some of the basic entities in the natural language text and perform the location search by comparing the extracted entity values against the corresponding values in the music score to retrieve the answer passages. The current implementation recognizes only basic musical entities. For complex features that require combining entities according to particular relations between them, we simply take the union or intersection of the answer measures retrieved separately for the different entities appearing in the query. Thus, our approach consists of two main steps: identification of musical entities in the query, and retrieval of the relevant music passages from the provided MusicXML file. Figure 1 summarizes the approach.

[Figure 1: Approach]

2.1 Identification of Musical Entities

We use regular expressions and manually created dictionaries to recognize musical entities in the query strings. The target entity types are listed below; a sketch of the note recognizer follows the list.

1. Notes: A note defines a particular pitch, duration or dynamic, such as "C", "crotchet C", "quarter note C in the right hand", or "semibreve C". The note recognizer comprises three basic music entity recognizers: duration, pitch, and staff. We first recognize all the pitches appearing in the query string, and separately identify all the durations and staves. To assign the correct duration/staff to a pitch, we measure the string distance between all the pitches and the durations/staves; a duration or staff occurring within a threshold distance of a pitch is paired with it to form the note. The pitches and durations are identified using regular expressions.

Duration: The duration defines the playing time of a pitch. In natural language, it is expressed by terms like quarter, semibreve, and whole. We wrote a regular expression covering the extensive vocabulary for durations in both English and American music terminology.

Pitch: Pitch is a perceptual property that allows the ordering of sounds on a frequency-related scale. Some examples of pitches written in natural language are D sharp, E#, and A flat. We use a regular expression to identify the pitches in a query string.

Staff: To identify the staves mentioned in a string, we find occurrences of the strings "right hand" and "left hand" in it.

The three basic musical entities duration, pitch, and staff collectively form the note entity.

2. Instruments: To find the instruments mentioned in the query string, we manually created a dictionary of instrument-related n-grams using the training and test data. The dictionary includes words like viola, piano, alto, violoncello, soprano, tenor, bass, violin, guitar, sopran, alt, voice, and harpsichord.

3. Clef: To identify the clef, we simply check for the presence of strings like "bass clef", "F-clef", "treble clef", and "G-clef" in the query.

The implementation, including the regular expressions and dictionaries used, is publicly available in our code repository on GitHub: https://github.com/kasooja/camerata
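To make the pairing step concrete, here is a minimal sketch of the note recognizer in Python. The patterns, the character-offset distance measure, and the threshold value are illustrative assumptions, not the system's actual regular expressions, which can be found in the GitHub repository above.

```python
import re

# Illustrative patterns only; the system's actual regular expressions cover a
# wider English/American vocabulary.
PITCH_RE = re.compile(r"\b([A-G])(?:\s?(?:sharp|flat|natural)|[#b])?(?![A-Za-z])")
DURATION_RE = re.compile(
    r"\b(semibreve|whole note|minim|half note|crotchet|quarter note|"
    r"quaver|eighth note|semiquaver|sixteenth note)\b", re.IGNORECASE)
STAFF_RE = re.compile(r"\b(right|left) hand\b", re.IGNORECASE)

THRESHOLD = 20  # assumed character-offset threshold; the paper gives no value

def nearest(pitch, candidates, threshold=THRESHOLD):
    """Return the candidate match closest to the pitch, if within the threshold."""
    best = min(candidates, key=lambda m: abs(m.start() - pitch.start()), default=None)
    if best is not None and abs(best.start() - pitch.start()) <= threshold:
        return best.group(0)
    return None

def recognize_notes(query):
    """Pair each pitch with its nearest duration and staff to form note entities."""
    durations = list(DURATION_RE.finditer(query))
    staves = list(STAFF_RE.finditer(query))
    return [{"pitch": p.group(0),
             "duration": nearest(p, durations),
             "staff": nearest(p, staves)}
            for p in PITCH_RE.finditer(query)]

print(recognize_notes("a crotchet C in the right hand followed by a G"))
# [{'pitch': 'C', 'duration': 'crotchet', 'staff': 'right hand'},
#  {'pitch': 'G', 'duration': None, 'staff': None}]
```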
2.2 Music Passage Retrieval

The values of the identified musical entities in the query are compared against the corresponding values extracted from the MusicXML file associated with the question. The identification of the musical entities is the same in both submitted runs; the runs differ only in the following two approaches to music passage retrieval (a sketch of this combination step follows the list):

1. Union: the union of the musical measures that contain the target musical entities is used to create the answer passages.

2. Intersection: the intersection of the musical measures that contain the target musical entities is used for the answer passages.
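The two runs thus amount to a set union versus a set intersection over the per-entity measure sets. The following minimal sketch illustrates this; the function name and the example measure numbers are hypothetical, not taken from the system.

```python
def combine_measures(entity_measures, mode="union"):
    """Combine per-entity sets of matching measure numbers.

    Run 1 takes the union, so any measure matching any entity is returned;
    Run 2 takes the intersection, so a measure is returned only if it
    contains every entity mentioned in the query.
    """
    sets = list(entity_measures.values())
    if not sets:
        return []
    combined = set.union(*sets) if mode == "union" else set.intersection(*sets)
    return sorted(combined)

# Hypothetical per-note matches for "F# followed two crotchets later by a G"
matches = {"F#": {3, 7, 12}, "G": {7, 12, 20}}
print(combine_measures(matches, "union"))         # [3, 7, 12, 20]
print(combine_measures(matches, "intersection"))  # [7, 12]
```

As the results below show, this difference matters mainly for "Followed by" queries, where the intersection discards measures that contain only one of the notes.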
3. RESULTS AND DISCUSSION

System performance is measured for each question type, and an overall weighted average over all questions is also calculated. Table 1 shows the results obtained by our two runs. As discussed in the approach section, the current implementation recognizes only a few types of musical entities, which constrains the question types that can be answered. The results clearly show that the system could not answer many question types, such as melodic, harmonic, and cadence, because it cannot detect those musical features.

Our system uses only regular expression matching to identify musical elements, and string distance to identify the relations between the elements where required. However, there is scope for deeper syntactic and lexical analysis of the query string to better identify the relations between the entities. We also found a minor bug related to the word "natural" appearing in a query string: it led to some wrong answers because of an incorrect octave calculation, which has since been fixed in the current implementation on GitHub.

The second run gives much better measure precision than the first, especially on the question type "Followed by". Such queries contain different notes separated by "followed by": the union approach merges all the measures retrieved for the individual notes, which decreases precision, while the intersection returns only those measures that contain both notes. Query types other than "Followed by" generally do not contain more than one type of note, so the two runs produce similar scores.

Table 1. Results per question type (Run 1 | Run 2)

Query Type       Beat Precision   Beat Recall   Measure Precision   Measure Recall
Overall          0.11 | 0.29      0.52 | 0.51   0.16 | 0.39         0.70 | 0.69
Pitch            0.42 | 0.42      0.79 | 0.79   0.48 | 0.48         0.89 | 0.89
Length           0.64 | 0.64      0.80 | 0.80   0.79 | 0.79         0.99 | 0.99
Pitch & Length   0.46 | 0.46      0.70 | 0.70   0.58 | 0.58         0.88 | 0.88
Perf.            0.05 | 0.05      0.59 | 0.59   0.05 | 0.05         0.69 | 0.69
Stave            0.17 | 0.17      0.44 | 0.37   0.23 | 0.24         0.59 | 0.52
Word             0.07 | 0.07      0.83 | 0.83   0.07 | 0.07         0.83 | 0.83
Followed by      0.00 | 0.00      0.00 | 0.00   0.03 | 0.26         0.70 | 0.63
Melodic          0.00 | 0.00      0.00 | 0.00   0.00 | 0.00         0.00 | 0.00
Harmonic         0.00 | 0.00      0.00 | 0.00   0.00 | 0.00         0.00 | 0.00
Cadence          0.00 | 0.00      0.00 | 0.00   0.00 | 0.00         0.00 | 0.00
Triad            0.00 | 0.00      0.00 | 0.00   0.00 | 0.00         0.00 | 0.00
Texture          0.00 | 0.00      0.00 | 0.00   0.00 | 0.00         0.00 | 0.00

4. CONCLUSION

The proposed approach is an initial implementation of natural language question answering on musical scores. The pipeline is based on identifying the different types of musical entities and their relations in the query string, and comparing them against the corresponding values extracted from the MusicXML file to identify the answer passages. As a future direction, we plan to apply deeper natural language processing to the queries to better extract the musical entities and relations.

5. ACKNOWLEDGEMENTS

This work has been funded in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (INSIGHT) and by the EU FP7 programme in the context of the LIDER project (610782). We are very grateful to Mr. Robert Solyom from the Music Academy, Galway, for helpful suggestions and references.

6. REFERENCES

[1] R. Sutcliffe, T. Crawford, C. Fox, D. L. Root, and E. Hovy. The C@merata Task at MediaEval 2014: Natural language queries on classical music scores. In MediaEval 2014 Workshop, Barcelona, Spain, October 16-17, 2014.

[2] M. Good (Recordare LLC). Lessons from the Adoption of MusicXML as an Interchange Standard. 2006.