CLAS at the MediaEval 2015 C@merata Task
                                                            Stephen Wan
                                                Language & Social Computing Team
                                                            CSIRO
                                                        Sydney, Australia
                                                     stephen.wan@csiro.au

ABSTRACT                                                                    Consequently, this year, the CLAS system uses the natural
The CLAS 2015 system treats the C@merata task as a Q&A                language feature-based parsing facilities in the Python modules
problem specified with a controlled language. In this year’s          distributed as part of the Natural Language ToolKit (NLTK) [4]
system, we added a context-free grammar for the music controlled      and a Context-Free Grammar (CFG) defined by the author. One
language using the Natural Language ToolKit. Crucially this           side-effect of this approach is that the feature unification facility
provides an in-built feature unification mechanism allowing us to     of the parser can be used to match against feature structures based
replace the ad-hoc unification component in the 2014 system.          on the music events.
The CLAS 2015 system with this modification finished first in the           The CLAS 2015 system achieved 0.60 precision and 0.63
C@merata shared task. In this paper, we describe the approach         recall when specifying answers at the granularity of “beats” in a
behind our participation in the shared task and discuss arguments     bar (more accurately, a subdivision of a beat, as specified by the
for and against using a feature-based context-free grammar to         question). When examining accuracy at the granularity of bars,
parse queries.                                                        our system achieved a 0.64 precision and 0.67 recall.
                                                                            In the remainder of this paper, Section 2 describes the overall
                                                                      system with an emphasis on how we employ NLTK and how we
1. INTRODUCTION                                                       designed the feature-based CFG. Section 3 presents an overview
      The C@merata task [1] provides an opportunity to                of the system’s performance in the C@merata 2015 tasks with a
investigate natural language queries to structured data, in this      preliminary discussion of how our approach fared with different
case, music data. This data, which is akin to time-series data, is    query types. We end with some final comments in Section 4.
composed of sequenced events, each with associated metadata.
      In contrast to the 2014 task [2], this year’s shared task       2. APPROACH
included complex queries with constraints to restrict candidate
answers. For example, the query “ten staccato quarter notes in the    2.1 Strategy for participation
Violoncello in measures 1-60 followed by two staccato quarter               Our general approach to participation for the 2015 entry was
notes in the Violin 1” requires finding two answer sequences that     to port the domain-specific rules mapping from a query to a
are juxtaposed together. Furthermore, the answer sequences            conceptual representation in the CLAS 2014 system to feature-
occur in different musical parts played by the violoncello and the    based CFG for use with NLTK’s parsing tools. In the advent of
first violin.                                                         an out-of-vocabulary error or an empty parse, the system reverted
      The CLAS 2015 system, like its 2014 predecessor [3], is         to the 2014 system.
based on the general notion of unification between the lexico-              This grammar was then extended to cover the complex
semantic features of the query and the metadata for each musical      queries of the 2015 shared task. We compiled a list of the new
event. In brief, the system interprets a natural language query,      queries from the documentation of the shared task. Grammar
converting the query to a conceptual representation. This is in       development was done by checking to see that there was an
turn processed to form a query representation, defining the type of   intuitive parse and that this led to candidate answer. The
answer required. Feature unification is generally used to find a      correctness of the answer was vetted manually.
subset of the data that serves as a candidate answer.                      This year, we had the benefit of a training data set. We set
      In the CLAS 2014 system, the components to detect               up a simple evaluation framework to gauge if changes to the CFG
linguistic features and unify these with metadata was purpose-        corresponded to overall improvement. As we did not have the
built for the C@merata 2014 task. In particular, our system did       evaluation code to measure precision and recall (and we did not
not heavily rely on phrase structure in the query when extracting     have sufficient time to implement our own), we used the diff tool
linguistic features to match against --- aside from specific nouns    (with the –w option to ignore whitespace) to compare between the
indicating the beginning of a new noun phrases subsequence, no        2014 gold standard and our system results. The number of
other phrase structure was inferred.                                  different lines was used as a rough measure of performance: fewer
      However, the inclusion of more complex queries provided         lines was taken as a indicating a grammar with better coverage.
some cases where a more complex syntactic phrase structure is
required to adequately represent the meaning of the query. For        2.2 Designing a Context-Free Grammar
example, numbers can be used to refer to a specific bar (for               In general, the CLAS 2015 grammar models the query as a
example, “a note in bar 4”), to specify a range of bar indices (for   nested sequence of musical noun phrases. These phrases are
example, “a note in bars 1 to 4”) or to indicate cardinality (for     based predominantly on the basic noun phrases that were handled
example, “4 crotchets”). Syntactic structure can help in these        in the 2014 CLAS system but extended to include new aspects for
cases to interpret the query correctly.                               2015 such as chords in a specific key, solfege nomenclature for
                                                                      notes, and references to scales. No morphological analysis was
Copyright is held by the author/owner(s).                             performed and plurals were hardcoded into the lexicon. A nested
MediaEval 2015 Workshop, September 14-15, 2015, Wurzen, Germany.
semantic feature structure was propagated to the root to allow for   to time restrictions, we were unable to add domain-specific
matching against the data.                                           knowledge to handle musical references such “Alberti bass”,
                                                                     “arpeggios”, and “descending scale”. We were also only partially
                                                                     able to handle the restriction “melody” as in “melody C, D, E in
  Question Type                        Example
                                                                     the violin”, as this requires an analysis of texture.
     1_melody                     dotted minim F#4                                             Beat          Beat         Bar         Bar
     n_melody                five note melody in bars 1-10              Question Type          Prec.        Recall       Prec.       Recall
      1_harm                chord D2 E5 G5 in bars 54-58                   1_melody        0.655        0.812        0.687       0.852
      texture                   monophonic passage                         n_melody        0.716        0.52         0.77        0.559
      follow                A minim followed by a quaver                    1_harm         0.66         0.62         0.702       0.66
       synch                  chord C4 E4 against a C5                      texture        0            0            0           0
       perf                          sforzando F2                           follow         0.312        0.484        0.323       0.5
       instr               harmonic second in the Violin 2                   synch         0.818        0.25         1           0.306
        clef                   four Gs in the treble clef                     perf         0.955        0.467        0.955       0.467
       time                 F sharp in 6/8 time in bars 1-20                 instr         0.677        0.708        0.72        0.753
        key                  sixteenth note G in G minor                      clef         0.415        0.519        0.431       0.538
                Table 1. Query types and examples.                           time          0.679        0.905        0.75        1
     We used domain-specific inferences to handle the queries.               key         1           0.625       1          0.625
These depended, in part, on where preposition phrases are
                                                                          Table 2. Evaluation results for different query types.
attached. For example, a preposition at the root of the parse was
used for restrictions to bars and parts. Prepositions attached to         Evaluation performance aside, it is worth reflecting on the
sequence-based phrase constituents (typically the last element in    strengths and weaknesses of our approach. Our NLTK-based
the sequence) were used to represent metadata constraints that       system was notably slower on the 2014 training data set compared
should be inherited by all elements in the sequence (for example,    to our 2014 version. It is difficult to say which system is easier to
“C, D, E in crotchets”). Finally, a prepositional phrase in the      maintain and develop. Intuitively, we believe the CFG may be
noun phrase for the musical event itself was used to qualify the     easier to maintain given the ease of porting the 2014 resources to
metadata (for example, “a chord in C”).                              a CFG and the ability to write domain-specific rules based on
     Referring expressions proved to be a minor complication as      phrase structure.
we did not want to hardcode every enumerated object, such as a            One limitation is that we are only able to handle queries
number, in the lexicon. We replaced numbers with a placeholder       licensed by the grammar, meaning we are unable to handle
token “_NUM_” during the parsing process. The actual numeric         ungrammatical query. This is potentially too prescriptive,
value was then heuristically reinserted into the parse structure.    particularly if this were to be a real application. Our CFG
The same mechanism was used for lyrics and enumerated part           deliberately allows metadata for notes to be accepted in any order
names like “Violin II”.                                              (for example, “minim dotted # C4”) but this is the extent to which
     When phrases like “followed by” were detected in the query,     we accept an ungrammatical query. Finally, we found that feature
we split the query at that point to form two component queries.      unification as a paradigm for matching against metadata breaks
Each query was then treated independently and all candidates that    down at times. The simplest case is that of intervals, the metadata
were adjacent with respect to its time (bar and beat indices) were   value of the note name for the second note depends on the context
considered an answer.                                                of another note. For example, a “perfect fifth” is not always a “C,
                                                                     G” pattern. Enumerating all fifths seems inelegant. For these
3. RESULTS AND DISCUSSION                                            cases, other answering mechanisms are needed.
     The evaluation was divided into a number of different query
types. Examples of these are presented in Table 1 with the key       4. CONCLUSION
elements in bold. We present the official evaluation results in      The CLAS 2015 system treats the C@merata task as a Q&A
Table 2. Results are reported for both beat and bar granularities.   problem using a controlled language. We use a feature-based
     In 2014, the worst category for our system was harmonic         context-free grammar to define a controlled language for the
intervals and so we focused on this category, adding domain-         music domain and parse queries using the Natural Language
specific inferences about notes, chords and scales to our handling   ToolKit. The CLAS 2015 system with this modification finished
of harmonic intervals. We are pleased to see that this led to good   first in the C@merata shared task.
performance for this category.
     Our approach to splitting a query into component parts that     5. ACKNOWLEDGEMENTS
are juxtaposed led to good precision for the “synch” category but         We would like to thank the organisers of C@merata for such
leaves room for improvement in terms of recall. Our inferences       an engaging, enlightening, and yet entertaining shared task.
mechanism based on the attachment points of prepositional
phrases led to the reasonable performance for the “instr”, “clef”
and “time” query categories.
     We ignored cadences and texture in this year’s effort and our
performance suffers correspondingly for these query types. Due
                                                                         2014 Workshop, Barcelona, Spain, October 16-17 2014.
6. REFERENCES                                                            http://ceur-ws.org/Vol-1263/.
[1] Sutcliffe, R. F. E., Fox, C., Root, D. L., Hovy, E. and Lewis,
                                                                     [3] Stephen Wan. 2014 The CLAS System at the MediaEval
    R. 2015. The C@merata Task at MediaEval 2015: Natural
                                                                         2014 C@merata Task. In the Working Notes Proceedings of
    language queries on classical music scores. In MediaEval
                                                                         MediaEval 2014 Workshop. Barcelona, Catalunya, Spain,
    2015 Workshop, Wurzen, Germany, September 14-15, 2015.
                                                                         October 16-17, 2014, CEUR-WS.org
    http://ceur-ws.org.
                                                                     [4] Bird, Steven, Edward Loper and Ewan Klein. 2009, Natural
[2] Sutcliffe, R., Crawford, T., Fox, C., Root, D.L., and Hovy, E.
                                                                         Language Processing with Python. O’Reilly Media Inc.
    2014. The C@merata Task at MediaEval 2014: Natural
    language queries on classical music scores. In MediaEval