=Paper=
{{Paper
|id=Vol-1263/paper49
|storemode=property
|title=The CLAS System at the MediaEval 2014 C@merata Task
|pdfUrl=https://ceur-ws.org/Vol-1263/mediaeval2014_submission_49.pdf
|volume=Vol-1263
|dblpUrl=https://dblp.org/rec/conf/mediaeval/Wan14
}}
==The CLAS System at the MediaEval 2014 C@merata Task==
Stephen Wan, CSIRO, Sydney, Australia, Stephen.Wan@csiro.au

Copyright is held by the author/owner(s). MediaEval 2014 Workshop, October 16-17, 2014, Barcelona, Spain.

===ABSTRACT===
This paper describes the CLAS system which accepts natural language queries in the domain of music theory to perform passage retrieval from a musical score. This system was produced for participation in the C@merata MediaEval 2014 shared task. The system uses a domain-specific parser to interpret the query and answer generation methods based on feature unification. Performance on this task was encouraging with 0.76 precision and 0.96 recall.

===1. INTRODUCTION===
This paper describes the CLAS system which selects, processes and retrieves potentially relevant answers from structured data given a natural language query. In this work, the queries and the structured data are in the domain of music theory, as defined by the C@merata 2014 task [1]. The CLAS system produces candidate answers by selecting passages from a musical score (in XML). Answers may be any consecutive time points spanning multiple whole and partial bars.

For example, a query “4 crotchets” should retrieve any sequence of four consecutive elements in the score where each element is a note and each note has the time duration of a crotchet (one quarter of a whole note). In such a system, expert knowledge is needed to interpret the query. However, this is not just limited to definitions of musical concepts (e.g., “crotchet”). For example, the query “4 crotchets” should be interpreted not just as any four notes with crotchet duration within the music (compare this to a general knowledge query “4 composers” requiring any four musical composers to be provided) but specifically as four notes in sequence. Furthermore, these four notes would typically be expected to be in the same voice or part; for example, if it were a piano score for two hands, the four crotchets might be a sequence written in the treble clef, played by the right hand.

In this paper, we describe a system that processes the input query, mapping from words in English to music metadata corresponding to the search criteria, or features, represented as a set of attribute-value pairs. An exhaustive search of an XML score is performed, note by note, for candidate answers using feature unification.

This system achieved an overall performance of 0.76 precision and 0.96 recall. The remainder of the paper outlines the system in more detail and presents the C@merata evaluation results.

===2. APPROACH===
The CLAS system interprets the natural language query (NLQ) to find candidate answer passages from the score. Briefly, the system:
# pre-processes tokens and maps these to a list of concepts, or the concept representation (CR).
# scans the CR and consumes concepts if they define the scope of the answer.
# parses the remaining CR list to construct the query representation (QR), a sequence of feature structures (FSs) that indicate the type of answer required, using handwritten parsing rules which implicitly capture the domain-specific interpretation of the NLQ.
# compares the QR with a subset of the data in the XML, referred to as the Scoped Data (SD), represented as a list of FSs, from which candidate answers can be found using feature unification.

====2.1 Mapping Query Terms to Concepts====
The system uses a handcrafted lexicon that maps from terms in the NLQ to concepts in the music theory domain, using the following five steps.

In Step 1, multi-word entities such as “down bow” are mapped to a single token “down_bow” to allow correct tokenisation. In Step 2, tokens such as “Vb”, denoting the dominant chord (“V”) in the first inversion (“b”), are separated into the two components. In Step 3, quotation marks are used to tag quoted words as being lyrics (Note: the lexicon used here is limited to music theory terms only and does not include the wider language from which lyrics may originate). In Step 4, tokens are separated using whitespace as a delimiter. Finally, in Step 5, tokens are mapped to their conceptual form using the lexicon. Non-contentful words that are not used to construct the QR (e.g., the article “a” or redundant information about sequence order like “followed by”) are mapped to a null token and are thus ignored.

For example, the word "crotchet" is mapped to "_note:length.1", indicating that the word relates to a “note” FS, where the feature “length” takes the value “1”. Similarly, the word "quarter" (as in “quarter note”) is also mapped to this sense "_note:length.1".

Words can have multiple meanings. For example, the word "perfect" is mapped to "_sequence:int_quality.PERFECT; _chord_sequence:cadence.PERFECT", indicating two senses: one referring to the quality of an interval (e.g., “a perfect fifth”), the other to a type of chord sequence (e.g., “a perfect cadence”).

====2.2 Building Scoped Data====
The system labels each NLQ with a type T specifying the type of answer required and the scope of the XML data to be examined for an answer (i.e., the SD). In this work, we defined four types: (i) harmonic, (ii) cadence, (iii) style, and (iv) note. Each type specifies: (1) rules for converting from the XML representation into an SD; (2) parsing rules to convert the CR into a QR; and (3) candidate generation rules.

A scan of the CR is used to determine the type T by searching for concepts specifying the data “granularity”. If any are found, these are removed from the CR and used to set the type. For example, “simultaneous”, as in “simultaneous second” (referring to an interval of a second where both notes are sounded concurrently), is mapped to the concept "_data:granularity.HARMONIC", indicating the harmonic type. In this case, the SD is defined as a list of chordal notes, taken from a block chord view of the score. This view is produced with the chordify method from the music21 package (http://web.mit.edu/music21/).

The cadence and style types also scope the data as a list of chords. If no other type is indicated by a concept in the CR, the default note type is used, defining the SD as the concatenation of the sequence of notes in each voice.

For queries where the voice or clef is specified, for example “treble clef” or “soprano part”, the corresponding concepts are used to filter the data to include just that voice.

====2.3 Building a Query Representation (QR)====
The remaining tokens in the CR are used to create a list of FSs of type T following a bespoke rule-based parsing process. The CR is processed in reverse order (assuming head-final noun phrases) and FSs are constructed in a process loosely based on reduction in a shift-reduce parser.

For example, the query “a C sharp crotchet and a D minim” is mapped to the CR “[_note:name.C, _note:accidental.SHARP, _note:length.1, _note:name.D, _note:length.2]”. The concepts “[_note:name.D, _note:length.2]” are consumed first and used to populate a FS. At this point, the “_note:length.1” concept is encountered. Because the current FS already has a note length value (a “minim”), the FS is popped off and pushed onto the QR list. A new FS is then used to consume the remaining tokens: “[_note:name.C, _note:accidental.SHARP, _note:length.1]”. The CR is now empty and the QR is a list of two FSs corresponding to the notes. Parsing works similarly for the other types. For example, cadences are sequences of chord FSs.

====2.4 Matching a Query Representation to Scoped Data====
Once a QR is generated, the SD sequence is iterated through and at each position a match to the QR is attempted using feature unification. If a match is found, then a candidate answer passage is stored.

For style answers, a different process is used based on simple heuristics. For example, the homophony and polyphony answer generation processes consider chords for passing notes, indicated by implicit ties. Consequently, the QR for this type is an empty list since no feature unification takes place.

===3. RESULTS AND DISCUSSION===

====3.1 Results====
Performance for this system is encouraging. The overall results are presented in Table 1, which lists the recall and precision for answers at two granularities: the correct bars and also the correct beats. Considering the hand-crafted lexicon and the bespoke parsing mechanism, the system performs reasonably well at both granularity answer types, with precision around 0.7 and recall around 0.9. At the time of writing, the average performance of systems participating in the C@merata task is not available.

{| class="wikitable"
|+ Table 1. Overall Results
! Granularity !! Precision !! Recall
|-
| Beat || 0.713 || 0.904
|-
| Bar || 0.764 || 0.967
|}

The C@merata evaluation also provides additional statistics regarding performance based on the type of query. The system does well with queries related to the properties of notes in a sequence. For the categories “simple pitch” (e.g., “G”), “simple length” (e.g., “quarter note rest”), “pitch and length” (e.g., “half note C”), and “expression” (e.g., “fermata A natural”), precision and recall are above 0.86. Indeed, in some cases, recall and precision are 1.0.

The general approach of creating sequences of feature structures (the “followed by” query type, e.g., “quaver C# followed by crotchet B”) performed reasonably, with precision of 0.748 and recall of 0.859 for the beat answer types (performance increases for the bar answer type). From this, we infer that the general assumptions underpinning the way noun phrases about notes are transformed into the query representations using the reduction process are sound.

====3.2 Future Work====
In this work, time constraints affected the choice of methods used in the CLAS system. For example, instead of the bespoke parsing process used here to map from the query tokens to the feature structures in the Query Representation, an alternative method might be to create a context-free grammar for the domain sublanguage and to use a tool like NLTK (http://www.nltk.org/) to parse the tokens, resulting in a syntactic parse. This linguistic structure can then be mapped to the feature structures. In future work, we will examine the parsing of noun phrase structures in which the features for matching are propagated up to an appropriate node in the tree. These can then be collected to form the Query Representation.

Finally, instead of enumerating exhaustively through all notes, in future work, we will examine the use of search engines to find candidate starting positions, from which feature unification processes can then start. In this approach, notes might be treated as quasi-documents, allowing them to be indexed by metadata based on musical properties.

===4. CONCLUSION===
In this work, expert knowledge in music theory was directly incorporated into a bespoke parser and lexicon. These were used to interpret a music NLQ, together with a scoping process that reduces the space of candidate answers. Parsing was performed using a reduce-style process. Matches were performed using feature unification. Performance on this task was encouraging with 0.76 precision and 0.96 recall.

===5. REFERENCES===
[1] Sutcliffe, R., Crawford, T., Fox, C., Root, D.L., and Hovy, E. 2014. The C@merata Task at MediaEval 2014: Natural language queries on classical music scores. In MediaEval 2014 Workshop, Barcelona, Spain, October 16-17, 2014.
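
The pipeline described in Sections 2.1, 2.3 and 2.4 (lexicon lookup, reverse-order construction of the QR, and matching against the SD by feature unification) can be illustrated with a small Python sketch. This is not the CLAS implementation: the names LEXICON, to_concepts, build_qr, unify and find_matches, the toy lexicon entries, and the example score are all hypothetical. The sketch also assumes flat feature structures (plain dictionaries), a single "note" FS type, one sense per word, and fully specified notes in the SD (each note carries an explicit accidental).

```python
from typing import Dict, List, Optional, Tuple

FS = Dict[str, str]  # flat feature structure: feature name -> value (assumption)

# Hypothetical toy lexicon. None marks a null token that is ignored.
LEXICON: Dict[str, Optional[str]] = {
    "a": None, "and": None,
    "c": "_note:name.C", "d": "_note:name.D",
    "sharp": "_note:accidental.SHARP",
    "crotchet": "_note:length.1", "quarter": "_note:length.1",
    "minim": "_note:length.2",
}

def to_concepts(query: str) -> List[str]:
    """Whitespace-tokenise the query and map tokens to concepts (the CR).
    Null tokens and tokens missing from the toy lexicon are dropped."""
    concepts = []
    for token in query.lower().split():
        concept = LEXICON.get(token)
        if concept is not None:
            concepts.append(concept)
    return concepts

def build_qr(concepts: List[str]) -> List[FS]:
    """Consume the CR in reverse order. When a feature is already set in
    the current FS, push that FS onto the QR and start a new one."""
    qr: List[FS] = []
    current: FS = {}
    for concept in reversed(concepts):
        # "_note:length.1" -> feature "length", value "1"
        feature, value = concept.split(":", 1)[1].split(".", 1)
        if feature in current:
            qr.append(current)
            current = {}
        current[feature] = value
    if current:
        qr.append(current)
    qr.reverse()  # assumption: restore query order for left-to-right matching
    return qr

def unify(query_fs: FS, data_fs: FS) -> Optional[FS]:
    """Return the merged FS, or None if any shared feature conflicts."""
    for feature, value in query_fs.items():
        if feature in data_fs and data_fs[feature] != value:
            return None
    return {**data_fs, **query_fs}

def find_matches(qr: List[FS], sd: List[FS]) -> List[Tuple[int, int]]:
    """Try to unify the QR at every position of the SD and return the
    (start, end) indices of each matching passage."""
    matches: List[Tuple[int, int]] = []
    if not qr:
        return matches
    for start in range(len(sd) - len(qr) + 1):
        window = sd[start:start + len(qr)]
        if all(unify(q, d) is not None for q, d in zip(qr, window)):
            matches.append((start, start + len(qr) - 1))
    return matches

# Hypothetical single-voice SD: G crotchet, C# crotchet, D minim, C crotchet, D minim.
EXAMPLE_SD: List[FS] = [
    {"name": "G", "accidental": "NATURAL", "length": "1"},
    {"name": "C", "accidental": "SHARP", "length": "1"},
    {"name": "D", "accidental": "NATURAL", "length": "2"},
    {"name": "C", "accidental": "NATURAL", "length": "1"},
    {"name": "D", "accidental": "NATURAL", "length": "2"},
]
```

With this toy data, the query “a C sharp crotchet and a D minim” yields a QR of two FSs, and find_matches(build_qr(to_concepts(...)), EXAMPLE_SD) reports a single passage covering positions 1 to 2; the later C natural crotchet fails to unify with the SHARP accidental. Because unification succeeds when a feature is absent on one side, the sketch relies on the SD notes being fully specified.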