=Paper=
{{Paper
|id=Vol-1263/paper51
|storemode=property
|title=TCSL at the MediaEval 2014 C@merata Task
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_51.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/Kini14
}}
==TCSL at the MediaEval 2014 C@merata Task==
Nikhil Kini
Tata Consultancy Services Ltd.
Innovation Labs, Thane
nikhil.kini@tcs.com
ABSTRACT
We describe a system to address the MediaEval 2014 C@merata task of natural language queries on classical music scores. Our system first tokenizes the question, tagging the musically relevant features using pattern matching. In this stage, suitable word replacements are made in the question based on a list of synonyms. Using the tokenized sentence, we infer the question type with a set of handwritten rules. We then search the input music score, based on the question type, to find the musical features requested. MIT's music21 library [2] is used for indexing, accessing and traversing the score.

1. INTRODUCTION
To those interested in studying music scores, and especially in the field of musicology, it is often necessary to search for, or refer to, particular sections of a score that represent relevant musical features, just as a data scientist might look for interesting patterns in an abundance of data. Going through the score manually is inefficient, prone to oversight, and requires the expert knowledge needed to actually understand the score. This paper aims to develop specifications for tools that automate the search and retrieval of musical passages from a score with natural language queries. The problem may be defined as: given a computer representation of a score in a particular format (in our case, MusicXML), and given a short English noun phrase referring to musical features in the score, find and list the locations of all occurrences of those musical features in the score. A complete description of the task can be found in [1].

While a large body of work is available on natural language understanding, as well as on searching through sheet music scores, we did not come across any work that combines the two. A survey of natural language understanding systems may be found in [3], and work on non-trivial search over sheet music scores in [4], [5], [6], [7].

2. APPROACH
Figure 1 presents the main modules of our system. Since we treat the problem as one of natural language understanding (of the question) and searching (through the MusicXML), we define a set of question classes based on the searchable musical features and propose a specific search method for each type of question. The main operations performed by our system are as follows.

Figure 1. System for natural language sheet music querying

2.1 Identifying tokens in the question
In the tokenizing step, words representing musically important features are marked/tokenized. We use three- or four-letter markers for the token classes. After tokenization, the sentence contains tokens paired with their values: each token-value pair is grouped by parentheses, with the token and the value separated by a comma. For example, "quarter note then half note then quarter note in the tenor voice" is output as "(DUR, quarter note) (SEQ, then) (DUR, half note) (SEQ, then) (DUR, quarter note) in the (PRT, tenor voice)". As another example, "melodic octave" becomes "(HRML, melodic) (INT, octave)".

2.2 Synonyms List
A list of synonyms is consulted during tokenizing to substitute words that refer to the same feature. This serves two purposes: 1) covering all manners of asking for the same feature, and 2) standardizing the different ways of asking for the same thing, so that specifying the subsequent modules becomes simpler. The synonym list can be updated as new ways of asking for the same feature are discovered once users actually query the system.

2.3 Inferring the question type
The tokenized output (with synonym substitutions) is the input to the module that infers the question type. A handcrafted set of rules is used to guess what type of question is being asked, based on the constituent tokens (see Section 2.4). Looking at all the questions available to us so far (task description, training set, test set), we specify the following question types: simple note, note with expression, interval (harmonic), interval (melodic), lyrics, extrema (highest or lowest note), time signature, key signature, cadence, triads, texture, bar with dynamics, consecutive notes, and combinations of the above.

2.4 Question rules
Based on the tokens present in the question phrase, we can write rules to guess the type of the question. For simple questions made up of only one elementary question type, this is straightforward. For phrases that combine elementary question types, some parsing capability might be necessary; we will address this in future work.

Copyright is held by the author/owner(s).
MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain.
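As a concrete illustration of Sections 2.1-2.4, the following sketch tokenizes a question and infers its type. Only the marker names DUR, SEQ, PRT, HRML and INT come from the paper; the specific patterns, synonym entries and classification rules shown here are illustrative assumptions, not the system's actual lists.

```python
import re

# Synonym substitutions (Section 2.2). Entries are invented examples.
SYNONYMS = {
    "crotchet": "quarter note",
    "minim": "half note",
    "followed by": "then",
}

# Token classes (Section 2.1): three- or four-letter markers with a
# pattern per class. The patterns are guesses at the kind of rules used.
TOKEN_PATTERNS = [
    ("DUR", r"\b(?:whole|half|quarter|eighth) note\b"),
    ("SEQ", r"\bthen\b"),
    ("PRT", r"\b(?:soprano|alto|tenor|bass) voice\b"),
    ("HRML", r"\b(?:melodic|harmonic)\b"),
    ("INT", r"\b(?:unison|third|fifth|octave)\b"),
]

def tokenize(question):
    """Substitute synonyms, then wrap each recognised phrase as (MARKER, text)."""
    for alt, canonical in SYNONYMS.items():
        question = question.replace(alt, canonical)
    for marker, pattern in TOKEN_PATTERNS:
        question = re.sub(pattern,
                          lambda m, mk=marker: f"({mk}, {m.group(0)})",
                          question)
    return question

# Handcrafted question-type rules (Sections 2.3-2.4): the first rule whose
# marker set is contained in the question's markers decides the type.
RULES = [
    ({"HRML", "INT"}, "interval"),
    ({"DUR", "SEQ"}, "consecutive notes"),
    ({"DUR", "PRT"}, "simple note with a staff scope"),
    ({"DUR"}, "simple note"),
]

def infer_type(tokenized):
    """Collect the markers from a tokenized question and apply the rules."""
    found = set(re.findall(r"\((\w{3,4}),", tokenized))
    for required, qtype in RULES:
        if required <= found:
            return qtype
    return "unclassified"
```

For instance, `tokenize("melodic octave")` yields `(HRML, melodic) (INT, octave)`, which the rules then classify as an interval question; the longer example from Section 2.1 is classified as consecutive notes.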
2.5 Search scope
An important part of getting the right answer is limiting the search scope. For example, for the question "A sharp in the Treble clef", we are not looking for just any A#, but specifically one in the treble clef. Our tokens PRT and CLF can be used to scope the search: we either look only in these parts during searching, or we filter the search results to keep only those answers that fall within the search scope.

2.6 Searching for the answer
The last step is searching the MusicXML score for the identified token or token combination. This step is still a work in progress. We make extensive use of music21's capabilities.

2.7 Score index
This is a list of all the notes in the score, stored with the following associated information for each note: note name, note letter, accidental, pitch class, note octave, bar, offset, note length, part number, part id, and whether this is a rest or a note. (This terminology is as defined in music21.)

3. RESULTS AND DISCUSSION
Upon release of the results, we saw that the organizers had also used a classification scheme for the questions. Reconciling the organizers' question types with ours, we found that, as far as the test questions go, we had all possibilities covered.

Table 1. Test set results

 #   Question type       Beat P   Beat R   Measure P   Measure R
 1   simple_length       0.979    0.988    0.991       1
 2   simple_pitch        0.959    0.963    0.982       0.986
 3   pitch_and_length    0.723    0.892    0.754       0.93
 4   stave_spec          0.661    0.987    0.661       0.987
 5   melodic_interval    0.894    0.683    0.904       0.691
 6   followed_by         0.733    0.688    0.842       0.789
 7   word_spec           0.261    1        0.261       1
 8   perf_spec           0.066    0.897    0.066       0.897
 9   harmonic_interval   0        0        0           0
 10  cadence_spec        0        0        0           0
 11  triad_spec          0        0        0           0
 12  texture_spec        0        0        0           0
 13  all                 0.633    0.821    0.652       0.845

Our classes corresponding to the organizers' "Question type" in Table 1 are: 1, 2, 3 - simple note; 4 - simple note with a staff scope; 5 - interval (melodic); 6 - consecutive notes; 7 - simple note with a lyrics scope; 8 - simple note with expression; 9 - interval (harmonic); 10 - cadence; 11 - triad; 12 - texture.

Table 1 shows beat and measure precision and recall scores for the results produced by our system on the test set. The strongest performance is seen in the simple-note categories (simple_pitch, simple_length, pitch_and_length); this is no surprise, as these question phrases are the easiest to handle. Perf_spec questions are simple note with expression questions (involving, for example, a mordent or a trill), and word_spec questions ask for a pitch/length occurring over a certain word in the lyrics. Although neither was handled by our implementation, some results were returned because the system fell back to the simple note type, which explains the non-zero precision and recall; e.g., "F trill" returns all F notes. Followed_by is equivalent to our consecutive notes type, and melodic_interval likewise has a counterpart in our system; the system performs decently on both these types.

Although search was not implemented for harmonic_interval, cadence_spec, triad_spec and texture_spec, nearly all questions of these types were correctly classified by our system. No answers were returned for them, which results in the zero scores seen in the table. Only 8 questions remained unclassified in the test data.

4. CONCLUSION
The system implemented based on the specifications in this paper performs decently on single musical feature retrieval. A study of the errors in this implementation might even be able to take the precision and recall for such simple types to 1; this will be the aim of the next cycle of development.

While our system performs well on simple question phrases, the more complex question phrases still need work. As a question grows more complicated and includes multiple musical features, we will need to evolve a more sophisticated parsing strategy to identify questions. It is possible that the specification of the system will need to be revisited to take all the possibilities into account.

The scope of the system specification is limited mainly to what we have observed in the task description and the training set, and these are in no way exhaustive of the types of queries that can be asked.

5. ACKNOWLEDGMENTS
Many thanks to Dr. Sunil Kumar Kopparapu, my supervisor, for his help in shaping this paper.

6. REFERENCES
[1] R. Sutcliffe, T. Crawford, C. Fox, D. L. Root and E. Hovy. The C@merata task at MediaEval 2014: Natural language queries on classical music scores. In MediaEval 2014 Workshop, Barcelona, Spain, October 16-17, 2014.
[2] M. S. Cuthbert and C. Ariza. music21: A toolkit for computer-aided musicology and symbolic music data. In ISMIR 2010, Utrecht, Netherlands, August 9-13, 2010, 637-642.
[3] A. M. N. Allam and M. H. Haggag. The question answering systems: A survey. International Journal of Research and Reviews in Information Sciences (IJRRIS), 2(3), 2012.
[4] J. Gabriel. Large data sets & recommender systems: A feasible approach to learning music. In Proceedings of the Sound and Music Computing Conference 2013, SMC 2013, Stockholm, Sweden, 701-706.
[5] J. S. Downie. Evaluating a simple approach to music information retrieval: Conceiving melodic n-grams as text. Doctoral dissertation, The University of Western Ontario, 1999.
[6] V. Viro. Peachnote: Music score search and analysis platform. In ISMIR 2011, Miami, Florida, USA, October 24-28, 2011, 359-362.
[7] J. Ganseman, P. Scheunders and W. D'haes. Using XQuery on MusicXML databases for musicological analysis. In ISMIR 2008, Philadelphia, Pennsylvania, USA, September 14-18, 2008, 433-438.
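To make the score index of Section 2.7 and the scope filtering of Section 2.5 concrete, the sketch below uses plain dictionaries in place of music21 note objects. The field names are an abbreviated form of those listed in Section 2.7; the sample notes and the `search` helper are invented for illustration.

```python
# Score index (Section 2.7): one entry per note, with invented sample data.
SCORE_INDEX = [
    {"name": "A#", "octave": 4, "bar": 1, "offset": 0.0,
     "length": 1.0, "part": "Soprano", "clef": "Treble", "is_rest": False},
    {"name": "A#", "octave": 2, "bar": 1, "offset": 1.0,
     "length": 1.0, "part": "Bass", "clef": "Bass", "is_rest": False},
    {"name": "C", "octave": 5, "bar": 2, "offset": 0.0,
     "length": 0.5, "part": "Soprano", "clef": "Treble", "is_rest": False},
]

def search(index, name=None, clef=None, part=None):
    """Return (bar, offset) locations of notes matching the query,
    restricted to the requested clef/part scope (cf. tokens CLF and PRT
    in Section 2.5)."""
    hits = []
    for note in index:
        if note["is_rest"]:
            continue
        if name is not None and note["name"] != name:
            continue
        if clef is not None and note["clef"] != clef:
            continue
        if part is not None and note["part"] != part:
            continue
        hits.append((note["bar"], note["offset"]))
    return hits
```

For the example question "A sharp in the Treble clef" from Section 2.5, restricting the scope via `clef="Treble"` drops the A# that occurs in the bass part, leaving only the hit in bar 1 of the soprano part.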