UNLP at the MediaEval 2014 C@merata Task

Kartik Asooja, Insight Centre for Data Analytics, NUI Galway, Ireland
kartik.asooja@insight-centre.org

Sindhu Kiranmai Ernala, Center for Exact Humanities, IIIT Hyderabad, India
sindhukiranmai.ernala@students.iiit.ac.in

Paul Buitelaar, Insight Centre for Data Analytics, NUI Galway, Ireland
paul.buitelaar@insight-centre.org



ABSTRACT
This paper presents a description of our submission to the C@merata task in MediaEval 2014. The system answers natural language queries over musical scores. The approach is based upon two main steps: identifying the musical entities and relations present in the query, and retrieving the relevant music passages containing those entities from the associated MusicXML file. We submitted two runs for the task. The first takes a union of the passages retrieved for each musical entity, while the second takes their intersection to answer the query. Musical entities in the query are recognized with the help of regular expressions.
1. INTRODUCTION
This work describes our system submitted to the C@merata task [1] at MediaEval 2014. The task targets natural language question answering over musical scores. We were provided with a set of question types, and the data over which the search was required to be performed.

The questions in the task consist of short noun phrases in English referring to musical features in the music scores, for instance, "F# followed two crotchets later by a G". Every question is a single natural language noun phrase using English or American music terminology. The music scores are provided in MusicXML [2], a standard open format for exchanging digital sheet music. The music repertoire consists of Western Classical works from the Renaissance and the Baroque periods by composers such as Dowland, Bach, Handel, and Scarlatti. The answers comprise the music passages from the score that contain the musical features mentioned in the query string; thus, each answer points to the location(s) of the requested musical features in the score. An answer passage consists of a start/end time signature, start/end division value, and start/end beat. The task provides two datasets: one for development, consisting of 36 natural language queries, and one for testing, containing 200 questions.
2. APPROACH
Different types of musical features can be mentioned in the query, such as a note, a melodic phrase, and others. These musical features can be referred to as musical entities, or can be defined with the help of such entities. Therefore, we identify some of the basic entities in the natural language text, and perform the location search by comparing the extracted entity values against the corresponding values in the music score to retrieve the answer passages. In the current implementation, we recognize only basic musical entities. For the complex ones, which require combining entities according to particular relations between them, we just take the union or intersection of the answer measures retrieved separately for the different entities appearing in the query. Thus, our approach consists of the following two main steps: identification of musical entities in the query, and retrieval of the relevant music passages from the provided MusicXML file. Figure 1 summarizes the approach.

2.1 Identification of Musical Entities
We use regular expressions and manually created dictionaries to recognize musical entities in the query strings. The target entity types are listed below; a small code sketch follows the list.

1. Notes: A note defines a particular pitch, duration or dynamic, such as C, crotchet C, quarter note C in the right hand, or semibreve C. The note recognizer comprises three basic music entity recognizers: duration, pitch, and staff. We first recognize all the pitches appearing in the query string, and separately identify all the durations and staves. To assign the correct duration/staff to a pitch, we measure the string distance between all the pitches and durations/staves. A duration/staff that occurs within a threshold distance from a pitch is paired with it to form the note. The pitches and durations are identified using regular expressions.

Duration: It defines the playing time of the pitch. In natural language, it can be expressed by terms like quarter, semibreve, and whole. We write a regular expression covering the extensive vocabulary defining durations in both English and American music terminology.

Pitch: It is a perceptual property that allows the ordering of sounds on a frequency-related scale. Some examples of writing pitches in natural language are: D sharp, E#, and A flat. We form a regular expression to identify the pitches in a query string.

Staff: To identify the staves mentioned in a string, we find occurrences of the strings "right hand" and "left hand" in it.

The three basic musical entities, duration, pitch, and staff, collectively form the note entity.

2. Instruments: To find the instruments mentioned in the query string, we manually created a dictionary of instrument-related n-grams using the training and test data. The dictionary includes words like viola, piano, alto, violoncello, soprano, tenor, bass, violin, guitar, sopran, alt, voice, and harpsichord.

3. Clef: To identify the clef, we just check for the presence of strings like bass clef, F-clef, treble clef, and G-clef in the query.
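The following is a minimal Python sketch of the recognize-and-pair scheme described under Notes. The patterns and the distance threshold here are simplified for exposition and are not the exact ones used in our implementation:

    import re

    # Simplified patterns for exposition; the actual implementation covers a
    # much wider duration/pitch vocabulary.
    PITCH_RE = re.compile(r"\b([A-G](?:#|b|\s+(?:sharp|flat|natural))?)(?![A-Za-z])")
    DURATION_RE = re.compile(
        r"\b(whole|half|quarter|eighth|sixteenth|semibreve|minim|crotchet|"
        r"quaver|semiquaver)(\s+note)?\b", re.IGNORECASE)
    STAFF_RE = re.compile(r"\b(right hand|left hand)\b", re.IGNORECASE)

    THRESHOLD = 25  # maximum character distance for pairing (illustrative value)

    def recognize_notes(query):
        """Pair each pitch with the nearest duration/staff within a threshold."""
        def spans(regex):
            return [(m.group(0), m.start()) for m in regex.finditer(query)]
        notes = []
        for p_text, p_pos in spans(PITCH_RE):
            note = {"pitch": p_text}
            for key, regex in (("duration", DURATION_RE), ("staff", STAFF_RE)):
                near = [(abs(pos - p_pos), text) for text, pos in spans(regex)
                        if abs(pos - p_pos) <= THRESHOLD]
                if near:
                    note[key] = min(near)[1]  # closest match wins
            notes.append(note)
        return notes

    print(recognize_notes("quarter note C in the right hand"))
    # [{'pitch': 'C', 'duration': 'quarter note', 'staff': 'right hand'}]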
The implementation, including the regular expressions and the dictionaries used, can be found in the publicly available code repository on GitHub¹.
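As an illustration of the dictionary-based matching used for instruments and clefs, a simplified n-gram lookup could look as follows; the vocabularies are the ones listed above, and find_terms is a hypothetical helper, not code from our repository:

    INSTRUMENTS = {"viola", "piano", "alto", "violoncello", "soprano", "tenor",
                   "bass", "violin", "guitar", "sopran", "alt", "voice",
                   "harpsichord"}
    CLEFS = {"bass clef", "f-clef", "treble clef", "g-clef"}

    def find_terms(query, vocabulary, max_n=2):
        """Scan the word n-grams of a query against a dictionary (simplified)."""
        words = query.lower().split()
        hits = []
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                gram = " ".join(words[i:i + n])
                if gram in vocabulary:
                    hits.append(gram)
        return hits

    print(find_terms("crotchet C in the bass clef", CLEFS))           # ['bass clef']
    print(find_terms("semibreve D in the violin part", INSTRUMENTS))  # ['violin']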
[Figure 1: Approach]

2.2 Music Passage Retrieval
The values of the musical entities identified in the query are compared against the corresponding values extracted from the MusicXML file associated with the question. A sketch of this extraction step is given below.
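Under simplifying assumptions, such values can be read from an uncompressed MusicXML file with standard XML parsing. The sketch below uses Python's xml.etree for illustration (not necessarily how our system reads the score) and collects the pitch name and note type for every note, grouped by measure number:

    import xml.etree.ElementTree as ET

    def notes_by_measure(musicxml_path):
        """Map measure number -> list of (pitch, type) pairs in the score.

        Illustrative only: assumes an uncompressed score-partwise file with
        valid pitched notes, and ignores divisions, ties, chords, and voices.
        """
        root = ET.parse(musicxml_path).getroot()
        measures = {}
        for part in root.findall("part"):
            for measure in part.findall("measure"):
                number = measure.get("number")
                for note in measure.findall("note"):
                    pitch = note.find("pitch")
                    if pitch is None:      # rests carry no <pitch> element
                        continue
                    step = pitch.findtext("step")      # e.g. "C"
                    alter = pitch.findtext("alter")    # "1" = sharp, "-1" = flat
                    octave = pitch.findtext("octave")  # e.g. "4"
                    kind = note.findtext("type")       # e.g. "quarter"
                    name = step + {"1": "#", "-1": "b"}.get(alter, "")
                    measures.setdefault(number, []).append((name + octave, kind))
        return measures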
The identification of the musical entities remains the same in both submitted runs; they vary only in the following two approaches for music passage retrieval, illustrated in the sketch after this list:

1. The union of the musical measures that contain the target musical entities is used to create the answer passages.

2. The intersection of the musical measures that contain the target musical entities is used to create the answer passages.
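Given the per-entity measure sets, the two runs reduce to a set union versus a set intersection. A minimal sketch, assuming a hypothetical score_index that maps each recognized entity to the set of measure numbers containing it:

    def answer_measures(entities, score_index, mode="union"):
        """Combine the measure sets retrieved for each musical entity."""
        measure_sets = [score_index[e] for e in entities]
        if not measure_sets:
            return set()
        if mode == "union":                      # run 1: any entity suffices
            return set.union(*measure_sets)
        return set.intersection(*measure_sets)   # run 2: all entities required

    # e.g. for "F# followed two crotchets later by a G" (toy index):
    index = {"F#": {3, 7, 12}, "G": {7, 9, 12}}
    print(sorted(answer_measures(["F#", "G"], index, "union")))         # [3, 7, 9, 12]
    print(sorted(answer_measures(["F#", "G"], index, "intersection")))  # [7, 12]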
3. RESULTS AND DISCUSSION
The system performance is measured for each question type, and an overall weighted average over all the questions is also calculated. Table 1 shows the results obtained by our two runs. As discussed in the approach section, the current implementation recognizes only a few types of musical entities, which constrains the question types that can be answered. The results clearly show that the system could not answer many question types, such as melodic, harmonic, and cadence, because it could not detect such musical features.

Our system only uses regular expression matching for the identification of musical elements, and string distance to identify the relations between the elements wherever required. However, there is scope for deeper syntactic and lexical analysis of the query string to better identify the relations between the entities. We also found a minor bug in our system related to the word "natural" appearing in a query string. It led to some wrong answers because of incorrect octave calculation, which is now fixed in the current implementation on GitHub.

The second run gives a much better measure precision than the first, especially on the question type "Followed by". This is because such queries contain different notes separated by "followed by": the union approach merges all the measures retrieved for the individual notes, decreasing precision, while the intersection returns only those measures that contain both notes. However, the other query types do not generally contain more than one type of note; therefore, similar scores are obtained for both runs.

Table 1. Results per question type (each cell gives Run 1 | Run 2)

Query Type      Beat Precision   Beat Recall   Measure Precision   Measure Recall
Overall         0.11 | 0.29      0.52 | 0.51   0.16 | 0.39         0.70 | 0.69
Pitch           0.42 | 0.42      0.79 | 0.79   0.48 | 0.48         0.89 | 0.89
Length          0.64 | 0.64      0.80 | 0.80   0.79 | 0.79         0.99 | 0.99
Pitch & Length  0.46 | 0.46      0.70 | 0.70   0.58 | 0.58         0.88 | 0.88
Perf.           0.05 | 0.05      0.59 | 0.59   0.05 | 0.05         0.69 | 0.69
Stave           0.17 | 0.17      0.44 | 0.37   0.23 | 0.24         0.59 | 0.52
Word            0.07 | 0.07      0.83 | 0.83   0.07 | 0.07         0.83 | 0.83
Followed by     0.00 | 0.00      0.00 | 0.00   0.03 | 0.26         0.70 | 0.63
Melodic         0.00 | 0.00      0.00 | 0.00   0.00 | 0.00         0.00 | 0.00
Harmonic        0.00 | 0.00      0.00 | 0.00   0.00 | 0.00         0.00 | 0.00
Cadence         0.00 | 0.00      0.00 | 0.00   0.00 | 0.00         0.00 | 0.00
Triad           0.00 | 0.00      0.00 | 0.00   0.00 | 0.00         0.00 | 0.00
Texture         0.00 | 0.00      0.00 | 0.00   0.00 | 0.00         0.00 | 0.00

4. CONCLUSION
The proposed approach presents an initial implementation of natural language question answering on musical scores. The pipeline is based upon identifying the different types of musical entities and their relations in the query string, and comparing them against the corresponding values extracted from the MusicXML file to identify the answer passages. As a future direction, we consider applying deeper natural language processing to the queries to better extract the musical entities and relations.

5. ACKNOWLEDGEMENTS
This work has been funded in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (INSIGHT) and by the EU FP7 programme in the context of the project LIDER (610782). We are very grateful to Mr. Robert Solyom from the Music Academy, Galway, for helpful suggestions and references.

6. REFERENCES
[1] R. Sutcliffe, T. Crawford, C. Fox, D. L. Root and E. Hovy. The C@merata Task at MediaEval 2014: Natural language queries on classical music scores. In MediaEval 2014 Workshop, Barcelona, Spain, October 16-17, 2014.

[2] M. Good and L. L. C. Recordare. Lessons from the Adoption of MusicXML as an Interchange Standard. 2006.
¹ https://github.com/kasooja/camerata

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain.