UNLP at the MediaEval 2015 C@merata Task

Kartik Asooja (Insight Centre for Data Analytics, NUI Galway, Ireland; kartik.asooja@insight-centre.org)
Sindhu Kiranmai Ernala (Center for Exact Humanities, IIIT Hyderabad, India; sindhukiranmai.ernala@students.iiit.ac.in)
Paul Buitelaar (Insight Centre for Data Analytics, NUI Galway, Ireland; paul.buitelaar@insight-centre.org)

ABSTRACT
This paper describes our submission to the C@merata task at MediaEval 2015. The submission is a revision of the system submitted for the same task at MediaEval 2014, including some bug fixes. The system answers natural language queries over musical scores. The approach is based on two main steps: identifying the musical entities and relations present in the query, and retrieving the relevant music passages containing those entities from the associated MusicXML file. Our approach forms a sequence of the musical entities in the query and then searches for a sequence of passages satisfying that sequence of entities. The musical entities in the query are recognized with the help of regular expressions.

1. INTRODUCTION
This work describes our system submitted to the C@merata task [1] at MediaEval 2015. The task targets natural language question answering over musical scores. We were provided with a set of question types and the data over which the search was to be performed. The questions in the task consist of short noun phrases in English referring to musical features in the music scores, for instance, "F# followed two crotchets later by a G". Every question refers to a single natural language noun phrase using English or American music terminology. The music scores are provided in MusicXML [2][3], a standard open format for exchanging digital sheet music. The music repertoire consists of Western Classical works from the Renaissance and Baroque periods by composers such as Dowland, Bach, Handel, and Scarlatti. The answers comprise the music passages from the score that contain the musical features mentioned in the query string; each answer thus points to the location(s) of the requested musical features in the score. An answer passage consists of a start/end time signature, a start/end division value, and a start/end beat. The task provides two datasets: one for training and development, consisting of the 236 natural language queries used for last year's task, and one newly introduced for testing, which contains 200 questions. This year, the questions are linguistically more difficult and the scores are more complex.

2. APPROACH
Different types of musical features can be mentioned in a query, such as notes, melodic phrases and others. These musical features can be referred to as musical entities, or can be defined with the help of such entities. We therefore identify some of the musical entities in the natural language text and perform a location search by comparing the extracted entity values against the corresponding values in the music score to retrieve the answer passages. For complex queries requiring combinations according to particular relations between the entities, we consider only the sequential relation between the musical entities as they appear in the query string. Rather than building a system that differentiates between question types, we apply a rather simple approach assuming just this sequential relation. In contrast, the approach we submitted last year performed a union or intersection of the answer passages found for each musical entity. Our current approach consists of the following two main steps: identification of the sequence of musical entities in the query string, and retrieval of the answer sequences of relevant music passages matching the sequence of entities. Figure 1 summarizes the approach.

[Figure 1: Approach]

2.1 Identification of Musical Entities
We use regular expressions and manually created dictionaries to recognize musical entities in the query strings. The target entity types are:

1. Notes: A note defines a particular pitch, duration or dynamics, using strings such as Do, crotchet C, quarter note C in the right hand, or semibreve C. The note recognizer comprises three basic music entity recognizers: duration, pitch and staff. We first recognize all the pitches appearing in the query string, and separately identify all the durations and staves. To assign the correct duration/staff to a pitch, we measure the string distance between all the pitches and durations/staves. A duration/staff that occurs within a threshold distance of a pitch is paired with it to form the note. The pitches and durations are identified using regular expressions.

Duration: This defines the playing time of a pitch. In natural language, it is expressed by terms like quarter, semibreve, and whole. We write a regular expression covering the extensive vocabulary defining durations in both English and American music terminology.

Pitch: This is a perceptual property that allows the ordering of sounds on a frequency-related scale. Some examples of pitches written in natural language are D sharp, E#, and A flat. We form a regular expression to identify the pitches in a query string.

Staff: To identify the staves mentioned in a string, we find occurrences of the strings "right hand" and "left hand" in it.

The three basic musical entities duration, pitch and staff collectively form the note entity.
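To illustrate this step, the sketch below shows one way such a recognizer can be structured in Java: regular expressions for pitch, duration and staff, with the nearest duration/staff mention within a character-distance threshold paired to each pitch. The patterns, the NoteRecognizer class and the 25-character threshold are illustrative assumptions, not the exact implementation (which is available in the repository referenced at the end of this section).

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A minimal sketch of the regex-based note recognizer described above.
// Patterns, class name and threshold are illustrative assumptions.
public class NoteRecognizer {

    // Pitch: a letter A-G plus an optional accidental ("F#", "D sharp", "A flat").
    private static final Pattern PITCH =
        Pattern.compile("\\b([A-G])(#|b)?(\\s+(sharp|flat|natural))?(?![A-Za-z])");

    // Duration: a (non-exhaustive) slice of the English/American vocabulary.
    private static final Pattern DURATION = Pattern.compile(
        "\\b(semibreve|minim|crotchet|quaver|semiquaver|whole note|half note|"
        + "quarter note|eighth note|sixteenth note)\\b");

    // Staff: "right hand" / "left hand" mentions.
    private static final Pattern STAFF = Pattern.compile("\\b(right|left) hand\\b");

    // Assumed threshold (in characters) for pairing a duration/staff with a pitch.
    private static final int THRESHOLD = 25;

    // Returns the match of p nearest to offset 'from', or null if none
    // lies within the threshold distance.
    private static String nearest(Pattern p, String query, int from) {
        String best = null;
        int bestDist = THRESHOLD + 1;
        Matcher m = p.matcher(query);
        while (m.find()) {
            int dist = Math.abs(m.start() - from);
            if (dist < bestDist) { bestDist = dist; best = m.group(); }
        }
        return best;
    }

    public static List<String> recognize(String query) {
        List<String> notes = new ArrayList<>();
        Matcher pitch = PITCH.matcher(query);
        while (pitch.find()) {
            // Pair each pitch with the nearest duration/staff within the threshold.
            String duration = nearest(DURATION, query, pitch.start());
            String staff = nearest(STAFF, query, pitch.start());
            notes.add((duration == null ? "" : duration + " ") + pitch.group()
                + (staff == null ? "" : " [" + staff + "]"));
        }
        return notes;
    }

    public static void main(String[] args) {
        // Prints: [quarter note C [right hand]]
        System.out.println(recognize("quarter note C in the right hand"));
    }
}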
2. Instruments: To find the instruments mentioned in the query string, we manually created a dictionary of instrument-related n-grams using the training and test data. The dictionary includes words like viola, piano, alto, violoncello, soprano, tenor, bass, violin, guitar, sopran, alt, voice, and harpsichord.

3. Clef: To identify the clef, we simply check for the presence of strings like bass clef, F-clef, treble clef and G-clef in the query.

The implementation, including the regular expressions and the dictionaries used, is publicly available in our code repository on GitHub (https://github.com/kasooja/camerata).

2.2 Music Passage Retrieval
The values of the musical entities identified in the query are compared against the corresponding values extracted from the MusicXML file associated with the question. The system matches the musical features sequentially, in the order in which they appear in the query string. Finally, the passage sequences that completely match the sequence of musical entities are selected as answer passages.
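As a rough illustration of this sequential matching, the following sketch slides the query's entity sequence over the sequence of notes extracted from the score and reports every span that matches completely. The ScoreNote and QueryEntity classes are simplified stand-ins for values parsed from the MusicXML file, and, for brevity, the sketch reports only start/end measures rather than the task's full time-signature/division/beat answer format.

import java.util.ArrayList;
import java.util.List;

// A minimal sketch of the sequential passage matching described in
// Section 2.2; the types here are illustrative, not the actual ones.
public class PassageRetriever {

    static class ScoreNote {
        final String pitch, duration;
        final int measure; // 1-based measure number in the score
        ScoreNote(String pitch, String duration, int measure) {
            this.pitch = pitch;
            this.duration = duration;
            this.measure = measure;
        }
    }

    static class QueryEntity {
        final String pitch;
        final String duration; // null when the query leaves it unspecified
        QueryEntity(String pitch, String duration) {
            this.pitch = pitch;
            this.duration = duration;
        }
        // Every feature the query specifies must agree with the score note.
        boolean matches(ScoreNote n) {
            return pitch.equals(n.pitch)
                && (duration == null || duration.equals(n.duration));
        }
    }

    // Slide the query's entity sequence over the score's note sequence and
    // collect the start/end measures of every completely matching span.
    static List<int[]> findPassages(List<ScoreNote> score, List<QueryEntity> query) {
        List<int[]> passages = new ArrayList<>();
        for (int i = 0; i + query.size() <= score.size(); i++) {
            boolean ok = true;
            for (int j = 0; j < query.size() && ok; j++) {
                ok = query.get(j).matches(score.get(i + j));
            }
            if (ok) {
                passages.add(new int[] { score.get(i).measure,
                                         score.get(i + query.size() - 1).measure });
            }
        }
        return passages;
    }

    public static void main(String[] args) {
        List<ScoreNote> score = List.of(
            new ScoreNote("F#", "crotchet", 3),
            new ScoreNote("A", "crotchet", 3),
            new ScoreNote("G", "minim", 4));
        // Entity sequence for a query like "F# followed by a crotchet A".
        List<QueryEntity> query = List.of(
            new QueryEntity("F#", null),
            new QueryEntity("A", "crotchet"));
        for (int[] p : findPassages(score, query)) {
            System.out.println("passage in measures " + p[0] + "-" + p[1]);
        }
    }
}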
3. RESULTS AND DISCUSSION
The system performance is measured for each question type, and an overall weighted average over all questions is also calculated. Table 1 shows the results obtained by our submission for some of the question types. As discussed in the approach section, the current implementation recognizes only a few types of musical entities, which constrains the question types that can be answered. The results clearly show that the system could not answer many question types, such as texture and harmonic, because the detection of such musical features is not implemented in the current system. Compared to the system submitted last year, we removed many bugs related to the meaning of different tags in MusicXML, as we implemented our own MusicXML reader in Java. In the current version, our system uses only string and regular expression matching for the identification of musical elements, while string distance is used to identify the relations between the elements, where required. However, deep syntactic and lexical analysis of the query has the potential to identify relations between the entities more accurately.

Table 1. Results per question type (beat- and measure-level precision/recall)

Query Type      Beat Prec.  Beat Rec.  Measure Prec.  Measure Rec.
Overall         0.126       0.43       0.149          0.508
1 Harmonic      0.0         0.0        0.0            0.0
Synch           0.0181      0.194      0.0207         0.222
1 Melody Alone  0.79        0.942      0.79           0.942
Perf.           0.0789      0.6        0.0877         0.667
Instru.         0.562       0.202      0.562          0.202
Clef            0.145       0.481      0.157          0.519
Followed by     0.25        0.0968     0.625          0.242
1 Melody        0.406       0.769      0.408          0.773
n Melody        0.0247      0.196      0.058          0.461
Key             0.0         0.0        0.0            0.0
Time            0.208       0.762      0.247          0.905
Texture         0.0         0.0        0.0            0.0
1 Melody Clef   0.875       0.875      0.875          0.875

4. CONCLUSION
We have presented a simple pipeline for natural language question answering on musical scores. The pipeline is based on identifying the different types of musical entities and their relations in the query string, and comparing them against the corresponding values extracted from the MusicXML file.

5. ACKNOWLEDGEMENTS
This work has been funded in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 (INSIGHT). We are very grateful to Mr. Robert Solyom from the Music Academy, Galway, for helpful suggestions and references.

6. REFERENCES
[1] R. Sutcliffe, T. Crawford, C. Fox, D. L. Root and E. Hovy. The C@merata Task at MediaEval 2015: Natural Language Queries on Classical Music Scores. In MediaEval 2015 Workshop, Wurzen, Germany, September 15-16, 2015.
[2] M. Good, Recordare LLC. Lessons from the Adoption of MusicXML as an Interchange Standard. 2006.
[3] MusicXML: http://www.musicxml.com/