CLAS at the MediaEval 2015 C@merata Task

Stephen Wan
Language & Social Computing Team
CSIRO
Sydney, Australia
stephen.wan@csiro.au

ABSTRACT
The CLAS 2015 system treats the C@merata task as a Q&A problem specified with a controlled language. In this year’s system, we added a context-free grammar for the music controlled language using the Natural Language ToolKit. Crucially, this provides an in-built feature unification mechanism, allowing us to replace the ad-hoc unification component of the 2014 system. The CLAS 2015 system with this modification finished first in the C@merata shared task. In this paper, we describe the approach behind our participation in the shared task and discuss arguments for and against using a feature-based context-free grammar to parse queries.

1. INTRODUCTION
The C@merata task [1] provides an opportunity to investigate natural language queries over structured data, in this case music data. This data, which is akin to time-series data, is composed of sequenced events, each with associated metadata.

In contrast to the 2014 task [2], this year’s shared task included complex queries with constraints to restrict candidate answers. For example, the query “ten staccato quarter notes in the Violoncello in measures 1-60 followed by two staccato quarter notes in the Violin 1” requires finding two answer sequences that are juxtaposed. Furthermore, the answer sequences occur in different musical parts, played by the violoncello and the first violin.

The CLAS 2015 system, like its 2014 predecessor [3], is based on the general notion of unification between the lexico-semantic features of the query and the metadata for each musical event. In brief, the system interprets a natural language query, converting the query to a conceptual representation. This is in turn processed to form a query representation, defining the type of answer required. Feature unification is generally used to find a subset of the data that serves as a candidate answer.

In the CLAS 2014 system, the components that detect linguistic features and unify them with metadata were purpose-built for the C@merata 2014 task. In particular, our system did not heavily rely on phrase structure in the query when extracting linguistic features to match against: aside from specific nouns indicating the beginning of a new noun phrase subsequence, no other phrase structure was inferred.

However, the inclusion of more complex queries provided some cases where a more complex syntactic phrase structure is required to adequately represent the meaning of the query. For example, numbers can be used to refer to a specific bar (for example, “a note in bar 4”), to specify a range of bar indices (for example, “a note in bars 1 to 4”) or to indicate cardinality (for example, “4 crotchets”). Syntactic structure can help in these cases to interpret the query correctly.

Consequently, this year, the CLAS system uses the natural language feature-based parsing facilities in the Python modules distributed as part of the Natural Language ToolKit (NLTK) [4] and a Context-Free Grammar (CFG) defined by the author. One side-effect of this approach is that the feature unification facility of the parser can be used to match against feature structures based on the music events, as illustrated in the sketch below.
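As a minimal sketch of this mechanism (not the actual CLAS grammar: the nonterminals, features and lexicon below are invented for the example; only the NLTK machinery is real):

    from nltk import FeatStruct
    from nltk.grammar import FeatureGrammar
    from nltk.parse import FeatureChartParser

    # Toy grammar: a note phrase is a duration followed by a pitch; the DUR
    # and PITCH values percolate to the root via the variables ?d and ?p.
    grammar = FeatureGrammar.fromstring("""
    % start NP
    NP[DUR=?d, PITCH=?p] -> DUR[DUR=?d] PITCH[PITCH=?p]
    DUR[DUR=dottedminim] -> 'dotted' 'minim'
    DUR[DUR=quaver] -> 'quaver'
    PITCH[PITCH=fs4] -> 'F#4'
    """)

    parser = FeatureChartParser(grammar)
    tree = next(parser.parse('dotted minim F#4'.split()))
    root = tree.label()
    query = FeatStruct(DUR=root['DUR'], PITCH=root['PITCH'])

    # Metadata for one musical event. Unification succeeds because the
    # event's features are consistent with (a superset of) the query's.
    event = FeatStruct(DUR='dottedminim', PITCH='fs4', BAR=4, BEAT=1)
    print(query.unify(event) is not None)  # True, so this event is a candidate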
This is in development was done by checking to see that there was an turn processed to form a query representation, defining the type of intuitive parse and that this led to candidate answer. The answer required. Feature unification is generally used to find a correctness of the answer was vetted manually. subset of the data that serves as a candidate answer. This year, we had the benefit of a training data set. We set In the CLAS 2014 system, the components to detect up a simple evaluation framework to gauge if changes to the CFG linguistic features and unify these with metadata was purpose- corresponded to overall improvement. As we did not have the built for the C@merata 2014 task. In particular, our system did evaluation code to measure precision and recall (and we did not not heavily rely on phrase structure in the query when extracting have sufficient time to implement our own), we used the diff tool linguistic features to match against --- aside from specific nouns (with the –w option to ignore whitespace) to compare between the indicating the beginning of a new noun phrases subsequence, no 2014 gold standard and our system results. The number of other phrase structure was inferred. different lines was used as a rough measure of performance: fewer However, the inclusion of more complex queries provided lines was taken as a indicating a grammar with better coverage. some cases where a more complex syntactic phrase structure is required to adequately represent the meaning of the query. For 2.2 Designing a Context-Free Grammar example, numbers can be used to refer to a specific bar (for In general, the CLAS 2015 grammar models the query as a example, “a note in bar 4”), to specify a range of bar indices (for nested sequence of musical noun phrases. These phrases are example, “a note in bars 1 to 4”) or to indicate cardinality (for based predominantly on the basic noun phrases that were handled example, “4 crotchets”). Syntactic structure can help in these in the 2014 CLAS system but extended to include new aspects for cases to interpret the query correctly. 2015 such as chords in a specific key, solfege nomenclature for notes, and references to scales. No morphological analysis was Copyright is held by the author/owner(s). performed and plurals were hardcoded into the lexicon. A nested MediaEval 2015 Workshop, September 14-15, 2015, Wurzen, Germany. semantic feature structure was propagated to the root to allow for to time restrictions, we were unable to add domain-specific matching against the data. knowledge to handle musical references such “Alberti bass”, “arpeggios”, and “descending scale”. We were also only partially able to handle the restriction “melody” as in “melody C, D, E in Question Type Example the violin”, as this requires an analysis of texture. 1_melody dotted minim F#4 Beat Beat Bar Bar n_melody five note melody in bars 1-10 Question Type Prec. Recall Prec. Recall 1_harm chord D2 E5 G5 in bars 54-58 1_melody 0.655 0.812 0.687 0.852 texture monophonic passage n_melody 0.716 0.52 0.77 0.559 follow A minim followed by a quaver 1_harm 0.66 0.62 0.702 0.66 synch chord C4 E4 against a C5 texture 0 0 0 0 perf sforzando F2 follow 0.312 0.484 0.323 0.5 instr harmonic second in the Violin 2 synch 0.818 0.25 1 0.306 clef four Gs in the treble clef perf 0.955 0.467 0.955 0.467 time F sharp in 6/8 time in bars 1-20 instr 0.677 0.708 0.72 0.753 key sixteenth note G in G minor clef 0.415 0.519 0.431 0.538 Table 1. Query types and examples. 
2.2 Designing a Context-Free Grammar
In general, the CLAS 2015 grammar models the query as a nested sequence of musical noun phrases. These phrases are based predominantly on the basic noun phrases that were handled in the 2014 CLAS system, but extended to include new aspects for 2015 such as chords in a specific key, solfege nomenclature for notes, and references to scales. No morphological analysis was performed, and plurals were hardcoded into the lexicon. A nested semantic feature structure was propagated to the root to allow for matching against the data.

We used domain-specific inferences to handle the queries. These depended, in part, on where prepositional phrases are attached. For example, a preposition at the root of the parse was used for restrictions to bars and parts. Prepositions attached to sequence-based phrase constituents (typically the last element in the sequence) were used to represent metadata constraints that should be inherited by all elements in the sequence (for example, “C, D, E in crotchets”). Finally, a prepositional phrase in the noun phrase for the musical event itself was used to qualify the metadata (for example, “a chord in C”).

Referring expressions proved to be a minor complication, as we did not want to hardcode every enumerated object, such as a number, in the lexicon. We replaced numbers with a placeholder token “_NUM_” during the parsing process. The actual numeric value was then heuristically reinserted into the parse structure. The same mechanism was used for lyrics and enumerated part names like “Violin II”.

When phrases like “followed by” were detected in the query, we split the query at that point to form two component queries. Each query was then treated independently, and all candidate pairs that were adjacent with respect to time (bar and beat indices) were considered an answer, as in the sketch below.
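A minimal sketch of this splitting strategy follows; the candidate-lookup function and the (bar, beat) span representation are assumptions made for the example, not the CLAS internals:

    def answer_followed_by(query, find_candidates):
        """Split on 'followed by', answer each half, keep time-adjacent pairs."""
        left, _, right = query.partition('followed by')
        pairs = []
        for a in find_candidates(left.strip()):
            for b in find_candidates(right.strip()):
                # Adjacent: the second passage starts at the (bar, beat)
                # offset where the first one ends.
                if a['end'] == b['start']:
                    pairs.append((a, b))
        return pairs

    # Toy usage with precomputed candidate spans keyed by component query.
    demo = {'a minim':  [{'start': (1, 1.0), 'end': (1, 3.0)}],
            'a quaver': [{'start': (1, 3.0), 'end': (1, 3.5)}]}
    print(answer_followed_by('a minim followed by a quaver',
                             lambda q: demo.get(q, [])))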
3. RESULTS AND DISCUSSION
The evaluation was divided into a number of different query types. Examples of these are presented in Table 1. We present the official evaluation results in Table 2. Results are reported for both beat and bar granularities.

Question Type   Example
1_melody        dotted minim F#4
n_melody        five note melody in bars 1-10
1_harm          chord D2 E5 G5 in bars 54-58
texture         monophonic passage
follow          A minim followed by a quaver
synch           chord C4 E4 against a C5
perf            sforzando F2
instr           harmonic second in the Violin 2
clef            four Gs in the treble clef
time            F sharp in 6/8 time in bars 1-20
key             sixteenth note G in G minor

Table 1. Query types and examples.

Question Type   Beat Prec.  Beat Recall  Bar Prec.  Bar Recall
1_melody        0.655       0.812        0.687      0.852
n_melody        0.716       0.52         0.77       0.559
1_harm          0.66        0.62         0.702      0.66
texture         0           0            0          0
follow          0.312       0.484        0.323      0.5
synch           0.818       0.25         1          0.306
perf            0.955       0.467        0.955      0.467
instr           0.677       0.708        0.72       0.753
clef            0.415       0.519        0.431      0.538
time            0.679       0.905        0.75       1
key             1           0.625        1          0.625

Table 2. Evaluation results for different query types.

In 2014, the worst category for our system was harmonic intervals, and so we focused on this category, adding domain-specific inferences about notes, chords and scales to our handling of harmonic intervals. We are pleased to see that this led to good performance for this category.

Our approach of splitting a query into component parts that are juxtaposed led to good precision for the “synch” category, but leaves room for improvement in terms of recall. Our inference mechanism based on the attachment points of prepositional phrases led to reasonable performance for the “instr”, “clef” and “time” query categories.

We ignored cadences and texture in this year’s effort, and our performance suffers correspondingly for these query types. Due to time restrictions, we were unable to add domain-specific knowledge to handle musical references such as “Alberti bass”, “arpeggios” and “descending scale”. We were also only partially able to handle the restriction “melody”, as in “melody C, D, E in the violin”, as this requires an analysis of texture.

Evaluation performance aside, it is worth reflecting on the strengths and weaknesses of our approach. Our NLTK-based system was notably slower on the 2014 training data set compared to our 2014 version. It is difficult to say which system is easier to maintain and develop. Intuitively, we believe the CFG may be easier to maintain, given the ease of porting the 2014 resources to a CFG and the ability to write domain-specific rules based on phrase structure.

One limitation is that we are only able to handle queries licensed by the grammar, meaning we are unable to handle ungrammatical queries. This is potentially too prescriptive, particularly if this were to be a real application. Our CFG deliberately allows metadata for notes to be accepted in any order (for example, “minim dotted # C4”), but this is the extent to which we accept an ungrammatical query. Finally, we found that feature unification as a paradigm for matching against metadata breaks down at times. The simplest case is that of intervals: the metadata value of the note name for the second note depends on the context of the first note. For example, a “perfect fifth” is not always a “C, G” pattern. Enumerating all fifths seems inelegant. For these cases, other answering mechanisms are needed.

4. CONCLUSION
The CLAS 2015 system treats the C@merata task as a Q&A problem using a controlled language. We use a feature-based context-free grammar to define a controlled language for the music domain and parse queries using the Natural Language ToolKit. The CLAS 2015 system with this modification finished first in the C@merata shared task.

5. ACKNOWLEDGEMENTS
We would like to thank the organisers of C@merata for such an engaging, enlightening, and yet entertaining shared task.

6. REFERENCES
[1] Sutcliffe, R. F. E., Fox, C., Root, D. L., Hovy, E. and Lewis, R. 2015. The C@merata Task at MediaEval 2015: Natural language queries on classical music scores. In MediaEval 2015 Workshop, Wurzen, Germany, September 14-15, 2015. http://ceur-ws.org.
[2] Sutcliffe, R., Crawford, T., Fox, C., Root, D. L. and Hovy, E. 2014. The C@merata Task at MediaEval 2014: Natural language queries on classical music scores. In MediaEval 2014 Workshop, Barcelona, Spain, October 16-17, 2014. http://ceur-ws.org/Vol-1263/.
[3] Wan, S. 2014. The CLAS System at the MediaEval 2014 C@merata Task. In Working Notes Proceedings of the MediaEval 2014 Workshop, Barcelona, Spain, October 16-17, 2014. CEUR-WS.org.
[4] Bird, S., Loper, E. and Klein, E. 2009. Natural Language Processing with Python. O’Reilly Media Inc.