=Paper=
{{Paper
|id=Vol-1984/Mediaeval_2017_paper_45
|storemode=property
|title=The DMUN System at the MediaEval 2017 C@merata Task
|pdfUrl=https://ceur-ws.org/Vol-1984/Mediaeval_2017_paper_45.pdf
|volume=Vol-1984
|authors=Andreas Katsiavalos
|dblpUrl=https://dblp.org/rec/conf/mediaeval/Katsiavalos17
}}
==The DMUN System at the MediaEval 2017 C@merata Task==
Andreas Katsiavalos
De Montfort University, Leicester, UK
andreas.katsiavalos@gmail.com

ABSTRACT
This paper presents a system that was developed for the C@merata task to perform music information retrieval using text-based queries. The system builds on findings from previous attempts and achieves the best results and functionality so far. The C@merata task is handled by two modules that deal with query parsing and music information retrieval separately. The two sub-tasks are connected by a Formal Information Request, a dictionary that contains the parsing information. The system is not fully developed, but key issues and methods are identified.

1 INTRODUCTION
The C@merata task [1] is a challenging task that aims to bind text-based and music content-based retrieval. The challenges of the task arise mainly from the multiplicity of contexts within which the searched content needs to be defined. The variance in score formats (e.g. orchestral scores in contrast to piano or single-staff scores), the ambiguity of musical-concept descriptions and their exact positioning on the score, and the technicalities of transferring the results of text parsing to music retrieval are some of the problems that need to be solved.
The C@merata task is important because it addresses a fundamental need in music research: a simplified content-based music information retrieval system. Content-based retrieval systems are implemented in fields such as music informatics, with highly specialized applications, and in general text- and multimedia-based systems in web-search engines. However, there are no user-friendly applications that perform what the C@merata task asks for. Thus, the development of text-based query systems for music information retrieval will fill the gap between specialized and non-content-based retrieval services for music.
A service that satisfies the needs of the C@merata task would be helpful to everyone involved with music, especially in higher-level music education, where research often requires the identification of diverse and complex musical elements in large corpora. The textual interface suggested by the task is also very practical for novice music enthusiasts who are beginning to discover the theory of tonal music.

2 RELATED WORK
This paper draws on work from previous C@merata events and on studies in music information retrieval generally. The clear distinction between the C@merata sub-tasks enabled independent development of each component. In 2015 [3], the focus was on the development of highly parameterized music information retrieval functions for high-level musical concepts, such as arpeggios and scales, while the system's text parsing relied on Collins' Stravinsqi algorithm. The following year [2], the focus shifted to language processing for the development of an automated query parser. The results were promising and key tasks were identified and addressed; however, the connection between the query parser and the music information retrieval functions was very poor.

3 APPROACH
3.1 Overview
The system presented in this paper is a prototype method to connect text parsing and music information retrieval. The C@merata task is handled in two main stages: a) text parsing, and b) music information retrieval. A shell was developed that integrates and connects these elements and also handles I/O operations. The two stages operate independently and are connected by the use of a data structure named the Formal Information Request (see below).
Each stage uses custom code that does not depend on high-level external libraries for either language processing or music information processing. Concerning language processing, the system is not able to handle completely 'natural' language but rather a collection of word constructs, where each valid sentence is viewed as a structure of valid terms, types and type combinations. In this prototype system, only selected constructs were implemented as a proof of concept; however, the language is easily extensible. While text parsing is carried out completely from scratch, the reading of MusicXML files and some dictionary-related operations were facilitated by music21.
Two important notions in the system are the Formal Information Request (FIR) and the notion of the (musical) 'durational element'. The FIR is a method to connect the output of the query parsing with the music information retrieval functions: it transfers all the parsing data to a music function selector that further processes the parsing elements so they can be input to the music information retrieval functions. The notion of the durational element is very helpful for chaining input and output between music information retrieval functions.
Overall, as displayed in Figure 1, the system takes a text query as input and initializes a query-parser object by loading a .json language file, a dictionary with single term types for keys and sets of terms for values. The query parser converts the text of the query into a Formal Information Request (FIR), another dictionary, by gradually identifying and replacing the terms, term types and compound types of the query with the types found in the language file, until a top-level description of the query is found. The FIR is then sent to the music information retrieval (MIR) module, which in turn selects the corresponding information-request retrieval function. All currently possible information requests are implemented as combinations of three core types of MIR functions that find, relate and constrain music entities such as notes/rests and note sets (melodies, chords, etc.). Lastly, the output of the MIR functions, which are music elements, is converted into passages.

Figure 1: The overall workflow diagram.
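To make the data flow concrete, the sketch below shows one possible shape for the FIR produced for query 58 and a toy function selector that dispatches it to one of the three core MIR function types described above. The field names echo Tables 1 and 2, but the exact layout and the selector are illustrative assumptions, not the system's actual code.

<pre>
# Assumed sketch of a Formal Information Request (FIR) and the MIR function
# selector it is handed to. Field names follow Tables 1 and 2; the layout
# itself is an illustrative guess, not the system's real data structure.

def get_entity(fir):                 # notes, rests, intervals, chords, melodies
    print("searching for", fir["entity"])

def get_entity_after_entity(fir):    # the 'followed by' relation
    print("searching for", fir["entity"], "followed by", fir["second_entity"])

def get_entity_in_context(fir):      # 'part' and 'measure' qualifications
    print("searching for", fir["entity"], "within", fir["context"])

# The selector maps the top-level query description in the FIR to an MIR function.
MIR_FUNCTIONS = {
    "getEntity": get_entity,
    "getEntityAfterEntity": get_entity_after_entity,
    "getEntityInContext": get_entity_in_context,
}

# A possible FIR for query 58, "chord C# E G# in the bass clef".
fir = {
    "function": "getEntityInContext",
    "entity": {"primaryType": "chord", "pitches": ["C#", "E", "G#"]},
    "context": {"type": "partQualification", "partId": "bass"},
}

MIR_FUNCTIONS[fir["function"]](fir)
</pre>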
3.2 Parsing of text queries
The query parsing module takes the query phrase as input and, after a sequence of parsing operations, outputs the FIR. The parsing is based on a 'language' file that holds all the information required to identify the type of the query. Parts of the language file are generated algorithmically.

Table 1: Example parsing of query number 58

  query     chord C# E G# in the bass clef
  terms     'chord', 'C#', 'E', 'G#', 'in', 'the', 'bass', 'clef'
  types     'primaryType', 'pitch', 'pitch', 'pitch', 'contextRel', 'contextRel', 'partId', 'primaryType'
  cTypes    [0,3, 'chord'], [4,5, 'contextRel'], [6,7, 'partContext']
  mcTypes   [0,3, 'chord'], [4,7, 'partQualification']
  function  getEntityInContext()

Figure 2: The text query parsing steps.

As shown in Figure 2, from top to bottom, the query parsing process starts by breaking the query phrase down into word tokens (terms), while commas (',') are removed. Next, the TYPE of each TERM is identified based on the language TERMS set. Next, compound types (cTYPEs) are identified by searching for maximal subsets of adjacent parsed TYPES. Next, the query is parsed again to check whether there are any multi-compound types (mcTYPEs). At this point, the query is viewed as a high-level pattern of musical entities, relations and qualifications. These patterns cannot be integrated further, and since their content, context and requirements are identified, they are viewed as high-level functions. Table 1 shows query 58 passing through these steps; a small sketch of the same pipeline is given below.
Since all the questions were converted into combinations of Entities (E), Relations (R) and Qualifications (Q), the set of valid combinations can be given by the graph shown in Figure 3, starting with an entity (E). Following this graph during text parsing revealed what kinds of patterns are used and what kinds of functions need to be developed. Currently, some of the functions that are implemented are (using the abbreviations from Figure 3): E, E-E, E-En, E-R, E-Q, E-Q-Q, E-R-E-Q and E-Q-R-E-Q.

Figure 3: Starting with an Entity (E), a query can have any combination of paths in this cyclic graph; however, not all of them are implemented.
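The following minimal sketch illustrates the term / TYPE / cTYPE steps for query 58. The language dictionary and the merge rules are toy assumptions chosen only to reproduce the rows of Table 1; they are not the system's real language file or grouping logic.

<pre>
# A minimal, assumed sketch of the term -> TYPE -> cTYPE grouping steps for
# query 58 ("chord C# E G# in the bass clef").

LANGUAGE = {
    "primaryType": {"note", "rest", "chord", "melody", "clef"},
    "pitch":       {"C", "C#", "D", "E", "F", "G", "G#", "A", "B"},
    "contextRel":  {"in", "the", "of"},
    "partId":      {"bass", "treble", "soprano", "alto", "tenor"},
}

def term_type(term):
    """Return the TYPE whose term set contains the given TERM, else None."""
    return next((t for t, terms in LANGUAGE.items() if term in terms), None)

def compound_types(types):
    """Group adjacent TYPES into cTYPE spans [first_index, last_index, label].
    The merge patterns below are toy rules chosen to reproduce Table 1."""
    merges = [
        (["primaryType", "pitch", "pitch", "pitch"], "chord"),
        (["contextRel", "contextRel"], "contextRel"),
        (["partId", "primaryType"], "partContext"),
    ]
    spans, i = [], 0
    while i < len(types):
        for pattern, label in merges:
            if types[i:i + len(pattern)] == pattern:
                spans.append([i, i + len(pattern) - 1, label])
                i += len(pattern)
                break
        else:
            spans.append([i, i, types[i]])
            i += 1
    return spans

terms = "chord C# E G# in the bass clef".replace(",", " ").split()
types = [term_type(t) for t in terms]
print(types)                  # ['primaryType', 'pitch', 'pitch', 'pitch', ...]
print(compound_types(types))  # [[0, 3, 'chord'], [4, 5, 'contextRel'], [6, 7, 'partContext']]
</pre>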
3.3 Music information retrieval
The music information retrieval module starts from the Formal Information Request produced by the query parser and outputs the music elements that satisfy the query. In general, the reverse of the text-parsing process is followed: while in query parsing the language dictionary was used to find integrations of terms in order to identify the top-level query description, once the function is identified the descriptions are broken down into elements again, this time removing and combining terms to read values and perform music content searching.
The music information retrieval operations are handled by a simple script developed for this purpose. The system operates on 'datapoint' lists, where notes and rests are the atoms. The music entities that are identified by the text parser as (E)ntities are shown in Figure 4; the MIR functions can currently retrieve the elements from the top three rows. Note that all combinations between them are possible.

Figure 4: The musical entities.

There are generally two extremes in declaring and identifying Entities in queries, and each requires a different approach in retrieval: an entity may name its specific constituents, from highly specific (e.g. the query 'C4 E4 G4 chord') to more abstract (e.g. 'major chord').

Table 2: The MIR functions

  getEntity             note, rest, harmonic/melodic interval, chord, melody
  getEntityAfterEntity  only the 'followed by' relation
  getEntityInContext    'part' and 'measure' qualification

The Entities in Figure 4 are durational entities, meaning that they all share similar attributes such as a starting point and an ending point in time. The system makes use of these generic properties with robust MIR functions that can handle and mix any of them. For example, the query 'G4 followed by minor' is served by an MIR function that handles 'Entity-After-Entity' rather than 'Chord-After-Note'. This is an interesting feature that is so far only partially exploited; a sketch of the idea is given below.
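The sketch below shows one way such generic durational entities and an 'Entity-After-Entity' search over a datapoint list could look. The class and function names are hypothetical; the use of music21 for reading MusicXML mirrors the paper's statement that music21 facilitated file reading, not the system's actual code.

<pre>
# Hypothetical sketch of durational entities and a generic 'Entity-After-Entity'
# search over a datapoint list built with music21 (names assumed for illustration).
from dataclasses import dataclass
from music21 import converter

@dataclass
class Datapoint:
    """A durational atom: anything with a start and an end in time."""
    start: float        # offset in quarter notes from the beginning of the score
    end: float
    label: str          # e.g. 'G4', 'C#4.E4.G#4', 'rest'

def load_datapoints(musicxml_path):
    """Flatten a MusicXML score into a list of note/rest/chord datapoints."""
    score = converter.parse(musicxml_path)
    points = []
    for el in score.flatten().notesAndRests:
        if el.isNote:
            label = el.pitch.nameWithOctave
        elif el.isChord:
            label = ".".join(p.nameWithOctave for p in el.pitches)
        else:
            label = "rest"
        points.append(Datapoint(el.offset, el.offset + el.quarterLength, label))
    return points

def get_entity_after_entity(points, first_label, second_label):
    """Return pairs of datapoints where the second starts exactly when the first ends."""
    return [(a, b)
            for a in points if a.label == first_label
            for b in points if b.label == second_label and b.start == a.end]

# Example usage (hypothetical file name):
# matches = get_entity_after_entity(load_datapoints("score.xml"), "G4", "A4")
</pre>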
4 RESULTS AND ANALYSIS
The system had great difficulties with text parsing, and for that reason the answers were split into two groups:

1. 'auto', where the queries were input 'as is' from the C@merata questions file without any alterations.
2. 'altered', where some parts of the query had to be altered to match the parsing capabilities.

Table 3: The 'auto' and 'altered' query groups

  Type          Question numbers
  Auto (7)      4, 58, 60, 63, 64, 92, 132
  Altered (23)  1, 2, 3, 7, 11, 12, 18, 19, 23, 27, 33, 36, 39, 40, 42, 43, 52, 53, 61, 62, 70, 103, 189

The main reasons to alter the original queries were:

• The 'bar' qualification is not implemented yet, and the results had to be checked manually for that range (e.g. 1, 11, 12, 13, 18, 19, 23, 42).
• The 'left' and 'right' 'hand' qualifications are also not implemented, and these queries were altered to use part names instead (e.g. 11, 12, 13, 18, 19, 23, 36, 40, 43, 52).
• All terms were altered to match a single language (e.g. 2, 7, 11, 18, 27, 33, 36, 39, 40, 62). For example, query 27 'D D D C# C# C# B E E D D D in crotchets' was altered to 'D D D C# C# C# B E E D D D in quarters'.
• Not all of the given information was used (e.g. 3, 39, 42, 53, 61, 70). For example, in query 70 'theme' was treated as a 'melody'.

Due to the small number of 'auto' answers, and because the alterations that had to be made are considered trivial, the results for the two groups were summed. The alterations are considered trivial because the methods to parse the original queries are known but not implemented. Also, all the answered questions were manually selected so that the MIR functions would be able to run them. This explains the overall low recall and high precision of the results shown in Figure 5: when the FIR was produced, the MIR was usually successful.
In general, as shown in Figure 5, the overall Beat Recall and Measure Recall did not exceed 0.2 (0.155 and 0.172, respectively), and of the 200 questions in total only 30 were answered. The generally high precision (0.833 for beat and 0.924 for measure) is, as stated earlier, due to the manual selection of queries into feasible and not feasible, and to the minor alterations to their text. More specifically, the 'synch' category was completely excluded and very few 'follow' and 'texture' queries were attempted. Most of the emphasis was given to the 'melodic' and 'harmonic' queries, trying to answer as many as possible, but still with low recall in both.

Figure 5: The results of the system.

5 CONCLUSIONS
The current system presents a working paradigm for the complete C@merata task; however, as a prototype, it does not reach its potential. Although multi-language support was not tested, it could easily be achieved by using a different language file. This way, apart from differences in terms, different grammar constructs can also be used, as the language file is fully customizable, allowing the user to add their own grammatical constructs.
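As a hedged illustration of that claim, the toy snippet below shows how swapping or merging the term sets of the language file could let the same parser accept British duration names ('crotchets') alongside the American ones used in the runs; the dictionary contents are assumptions for illustration only.

<pre>
# Toy illustration (assumed data): multi-language support by merging the term
# sets of two language files, so 'crotchets' and 'quarters' map to the same
# duration type without changing the parser code.
DURATION_TERMS_US = {"duration": {"whole", "half", "quarter", "quarters", "eighth", "sixteenth"}}
DURATION_TERMS_UK = {"duration": {"semibreve", "minim", "crotchet", "crotchets", "quaver", "semiquaver"}}

def merge_languages(*language_files):
    """Union the term sets of several language files, type by type."""
    merged = {}
    for lang in language_files:
        for type_name, terms in lang.items():
            merged.setdefault(type_name, set()).update(terms)
    return merged

LANGUAGE = merge_languages(DURATION_TERMS_US, DURATION_TERMS_UK)
print("crotchets" in LANGUAGE["duration"], "quarters" in LANGUAGE["duration"])  # True True
</pre>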
REFERENCES
[1] Sutcliffe, R. F. E., Ó Maidín, D. S., and Hovy, E. (2017). The C@merata task at MediaEval 2017: Natural Language Queries about Music, their JSON Representations, and Matching Passages in MusicXML Scores. Proceedings of the MediaEval 2017 Workshop, Trinity College Dublin, Ireland, September 13-15, 2017.
[2] Katsiavalos, A. (2016). DMUN: A Textual Interface for Content-Based Music Information Retrieval in the C@merata task for MediaEval 2016. Proceedings of the MediaEval 2016 Workshop, Hilversum, The Netherlands, October 20-21, 2016.
[3] Katsiavalos, A., and Collins, T. (2015). DMUN at the MediaEval 2015 C@merata Task: The Stravinsqi Algorithm. Proceedings of the MediaEval 2015 Workshop, Dresden, Germany, September 14-15, 2015.