         The DMUN System at the MediaEval 2017 C@merata Task
                                                         Andreas Katsiavalos
                                                 De Montfort University, Leicester, UK
                                                    andreas.katsiavalos@gmail.com

ABSTRACT
This paper presents a system that was developed for the C@merata task to perform music information retrieval using text-based queries. The system builds on findings from previous attempts and achieved the best results and functionality so far. The C@merata task is split into two modules that handle query parsing and music information retrieval separately. The sub-tasks are connected by a formal information request, a dictionary that contains the parsing information. The system is not fully extended, but key issues and methods are identified.


1 INTRODUCTION
The C@merata task [1] is a challenging task that aims to bind text- and music-content-based retrieval. The challenges of the task are important mainly because of the multiplicity of contexts within which the content being searched for needs to be defined. The variance in score formats (e.g. orchestral scores in contrast to piano or single-staff scores), the ambiguity in musical-concept descriptions and their exact positioning on the score, and the technicalities of transferring the results of text parsing to music retrieval are some of the problems that need to be solved.
    The C@merata task is important because it addresses a fundamental need in music research, that of a simplified content-based music information retrieval system. Content-based retrieval systems are implemented in fields such as music informatics, with highly specialized applications, and, more generally, in text- and multimedia-based systems such as web search engines. However, there are no user-friendly applications that do what the C@merata task sets as a challenge. Thus, the development of text-based query systems for music information retrieval will fill the gap between specialized and non-content-based retrieval services for music.
    A service that satisfies the needs of the C@merata task would be helpful to everyone involved with music, and especially in higher-level music education, where research often requires the identification of diverse and complex musical elements in large corpora. The textual interface suggested by the task is also very practical for novice music enthusiasts who are beginning to discover the theoretical foundations of tonal music.

2 RELATED WORK
This paper draws from work in previous C@merata events and from studies in music information retrieval generally. The clear distinction between the C@merata sub-tasks of query parsing and music information retrieval enabled independent developments for each system. In 2015 [3], the focus was on the development of highly parameterized music-information-retrieval functions for high-level musical concepts, such as arpeggios and scales, while the system's text parsing relied on Collins' Stravinsqi algorithm. The following year [2], the focus shifted to language processing for the development of an automated query parser. The results were promising and key tasks were identified and addressed; however, the connection between the query parser and the music-information-retrieval functions was very poor.

3 APPROACH
3.1 Overview
The system presented in this paper is a prototype method to connect text parsing and music information retrieval. The C@merata task is handled in two main stages: a) the text parsing, and b) the music information retrieval. A shell was developed that integrates and connects the above elements, also handling I/O operations. The two stages operate independently and are connected by the use of a data structure named Formal Information Request (see below).
    Each stage uses custom code that does not depend on any high-level external libraries for either language processing or music information processing. Concerning language processing, the system is not able to handle completely 'natural' language but rather a collection of word constructs, where each valid sentence is viewed as a structure of valid terms, types and type combinations. In this prototype system, only selected constructs were implemented as a proof of concept; however, the language is easily extensible. While text parsing is carried out completely from scratch, the reading of MusicXML files and some dictionary-related operations were facilitated by music21.
    Two important notions of the system are the Formal Information Request (FIR) and the notion of the (musical) 'durational element'. The FIR is a method to connect the output of the query parsing with the music-information-retrieval functions. It basically transfers all the parsing data to a music function selector that further processes the parsing elements to be input to the music-information-retrieval functions. The notion of the durational element is very helpful in chaining input and output between music-information-retrieval functions.
    Overall, as displayed in Figure 1, the system takes a text query as input and initializes a query-parser object by loading a .json language file, a dictionary with single term types for keys and sets of terms for values. The query parser converts the text of the query into a Formal Information Request (FIR), another
dictionary, by gradually identifying and replacing the terms, term types and compound types of the query with their types found in the language file, until a top-level description of the query is found. The FIR is then sent to the music information retrieval (MIR) module, which in turn selects the corresponding information-request retrieval function. All the currently possible information requests are implemented as combinations of three core types of MIR functions that find, relate and constrain music entities such as notes/rests and note sets (melodies, chords, etc.). Lastly, the output of the MIR functions, which consists of music elements, is converted into passages.
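    Below is a minimal sketch of this workflow, assuming hypothetical names (QueryParser, MIR_FUNCTIONS, answer); it only illustrates the shell described above and in Figure 1, not the actual implementation. The three retrieval functions correspond to those listed later in Table 2.

import json

def get_entity(score, fir):
    """Find the notes/rests, intervals, chords or melodies described by the FIR."""
    return []          # placeholder: content search over 'datapoint' lists

def get_entity_after_entity(score, fir):
    """Handle the 'followed by' relation between two entities."""
    return []

def get_entity_in_context(score, fir):
    """Apply 'part' and 'measure' qualifications to an entity."""
    return []

# Hypothetical selector from the top-level query description to a MIR function.
MIR_FUNCTIONS = {
    'entity': get_entity,
    'entityAfterEntity': get_entity_after_entity,
    'entityInContext': get_entity_in_context,
}

class QueryParser:
    """Converts a text query into a Formal Information Request (FIR)."""

    def __init__(self, language_path):
        # The language file is a dictionary with single term types for keys
        # and sets of terms for values.
        with open(language_path) as f:
            self.language = json.load(f)

    def parse(self, query):
        # terms -> types -> compound types -> multi-compound types, until a
        # top-level description of the query is found (placeholder FIR here).
        return {'query': query, 'pattern': 'entity'}

def answer(query, score, language_path='language.json'):
    fir = QueryParser(language_path).parse(query)   # stage (a): text parsing
    mir_function = MIR_FUNCTIONS[fir['pattern']]    # FIR -> function selector
    elements = mir_function(score, fir)             # stage (b): music retrieval
    return elements                                 # the shell converts these to passages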

           Figure 1: The overall workflow diagram.

           Figure 2: The text query parsing steps.

    As shown in Figure 2, from top to bottom, the query parsing process starts by breaking the query phrase down into word tokens (terms), with commas (',') removed. Next, the TYPE of each TERM is identified based on the language TERMS set. Next, compound types (cTYPEs) are identified by searching for the maximal subset of adjacent parsed TYPES. Next, the query is parsed again to check whether there are any multi-compound types (mcTYPEs). At this point, the query is viewed as a high-level pattern of musical entities, relations and qualifications. These patterns cannot be integrated further and, since their content, context and requirements are identified, they are viewed as high-level functions.

3.2 Parsing of text queries
The query parsing module takes the query phrase as input and, after a sequence of parsing operations, outputs the FIR. The parsing is based on a 'language' file that holds all the information required to identify the type of the query. Parts of the language file are generated algorithmically.

          Table 1: Example parsing of query number 58

  query      chord C# E G# in the bass clef
  terms      'chord', 'C#', 'E', 'G#', 'in', 'the', 'bass', 'clef'
  types      'primaryType', 'pitch', 'pitch', 'pitch', 'contextRel', 'contextRel', 'partId', 'primaryType'
  cTypes     [0,3, 'chord'], [4,5, 'contextRel'], [6,7, 'partContext']
  mcTypes    [0,3, 'chord'], [4,7, 'partQualification']
  function   getEntityInContext()
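    To make the first parsing pass concrete, the fragment below sketches a tiny language dictionary and the term-typing step for query 58. The dictionary contents are invented for illustration (the actual .json language file is not reproduced here); only the term/type pairs of Table 1 come from the paper.

# Hypothetical fragment of a language dictionary: term types as keys,
# sets of terms as values (illustrative, not the actual language file).
LANGUAGE = {
    "primaryType": ["note", "rest", "chord", "melody", "interval", "clef"],
    "pitch":       ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"],
    "contextRel":  ["in", "the", "of"],
    "partId":      ["bass", "treble", "soprano", "alto", "tenor"],
}

def type_of(term):
    """Return the TYPE of a TERM by looking it up in the language TERMS sets."""
    for term_type, terms in LANGUAGE.items():
        if term in terms:
            return term_type
    return "unknown"

query = "chord C# E G# in the bass clef"      # query 58
terms = query.replace(",", "").split()        # word tokens, commas removed
types = [type_of(t) for t in terms]
# types == ['primaryType', 'pitch', 'pitch', 'pitch', 'contextRel',
#           'contextRel', 'partId', 'primaryType'], as in Table 1

    Subsequent passes group adjacent types into compound types (cTypes), such as [0,3, 'chord'], and multi-compound types (mcTypes), such as [4,7, 'partQualification'], which yield the FIR handled by getEntityInContext().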

    Since all the questions were converted into combinations of Entities (E), Relations (R) and Qualifications (Q), the set of valid combinations can be given by the graph shown in Figure 3, starting with an entity (E). Following this graph during text parsing revealed what kinds of patterns are used and what kinds of functions need to be developed.

    Figure 3: Starting with an Entity (E), a query can have any combination of paths in this cyclic graph; however, not all of them are implemented.

    Currently, some of the functions that are implemented are (using the abbreviations from Figure 3): E, E-E, E-En, E-R, E-Q, E-Q-Q, E-R-E-Q and E-Q-R-E-Q.
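    As a rough illustration of how such combinations can be mapped to retrieval functions, the mapping below is an assumption made for this sketch (the actual selector tables are not reproduced in the paper); only a few combinations are shown.

# Hypothetical mapping from E/R/Q patterns to the MIR functions of Table 2;
# the pairings are guesses for illustration only.
PATTERN_TO_FUNCTION = {
    ("E",):               "getEntity",
    ("E", "Q"):           "getEntityInContext",
    ("E", "Q", "Q"):      "getEntityInContext",
    ("E", "R", "E", "Q"): "getEntityAfterEntity",   # then constrained by getEntityInContext
}

def select_function(mc_types):
    """Reduce the mcTypes of a query to an E/R/Q pattern and look it up."""
    kind = {"chord": "E", "note": "E", "melody": "E",
            "contextRel": "R", "partQualification": "Q"}
    pattern = tuple(kind.get(label, "E") for _, _, label in mc_types)
    return PATTERN_TO_FUNCTION.get(pattern, "getEntity")

# Query 58 (Table 1): [0,3,'chord'], [4,7,'partQualification'] -> ('E', 'Q')
assert select_function([(0, 3, "chord"), (4, 7, "partQualification")]) == "getEntityInContext"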
3.3 Music information retrieval
The music information retrieval module starts with the formal information request produced by the query parser and outputs the music elements that satisfy the query question. In general, the reverse
process of text parsing is followed: while in query parsing the language dictionary was used to find integrations of terms in order to identify the top-level query description, once the function is identified, the descriptions are broken down into elements, but this time removing and combining terms to read values and perform music content searching.
    The music information retrieval operations are handled by a simple script that was developed for this purpose. The system operates with 'datapoint' lists, where notes and rests are the atoms. The music entities that are identified by the text parser as (E)ntities are shown in Figure 4; the MIR functions can currently retrieve the elements from the top three rows. Note that all combinations between them are possible.

                   Figure 4: The musical entities.

    There are generally two extremes in declaring and identifying Entities in queries, and each one requires a different approach in retrieval. An entity may specify the exact constituents of the element, ranging from highly specific, e.g. the query 'C4 E4 G4 chord', to more abstract, e.g. 'major chord'.

                  Table 2: The MIR functions

  getEntity              Note, rest, harmonic/melodic interval, chord, melody
  getEntityAfterEntity   Only the 'followed by' relation
  getEntityInContext     'Part' and 'measure' qualification
                                                                       respectively), and from the total of 200 questions only 30 were
    The Entities in Figure 4 are durational entities, meaning that they all have similar attributes, such as a starting point and an ending point in time. The system makes use of these generic properties with robust MIR functions that can handle and mix any of them. For example, the query 'G4 followed by minor' is served by an MIR function that handles 'Entity-After-Entity' rather than 'Chord-After-Note'. This is an interesting feature with only partial exploitation.
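    The sketch below illustrates this generic durational-element idea under a simple onset/offset assumption; the attribute names and the matching rule are illustrative, not the system's actual code.

from dataclasses import dataclass

@dataclass
class DurationalElement:
    """Any music entity with a start and an end point in time (note, rest,
    interval, chord, melody): the generic unit the MIR functions exchange."""
    onset: float     # starting point in time, e.g. in quarter-note beats
    offset: float    # ending point in time
    kind: str        # 'note', 'chord', 'melody', ...
    data: object     # the underlying datapoints (notes/rests)

def followed_by(first_kind, second_kind, elements):
    """Generic 'Entity-After-Entity': works for any pair of durational kinds,
    e.g. 'G4 followed by minor', rather than a special 'Chord-After-Note'."""
    ordered = sorted(elements, key=lambda e: e.onset)
    hits = []
    for a in ordered:
        if a.kind != first_kind:
            continue
        for b in ordered:
            if b.kind == second_kind and b.onset >= a.offset:
                hits.append((a, b))      # nearest element that starts after a ends
                break
    return hits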

4 RESULTS AND ANALYSIS
The system encountered great difficulties with text parsing, and for that reason two groups of answers were made:
    1.  'auto', where the queries were input 'as is' from the C@merata questions file without any alterations.
    2.  'altered', where some parts of the query had to be altered to match the parsing capabilities.

          Table 3: The 'auto' and 'altered' query groups

  Type           Question numbers
  Auto (7)       4, 58, 60, 63, 64, 92, 132
  Altered (23)   1, 2, 3, 7, 11, 12, 18, 19, 23, 27, 33, 36, 39, 40, 42, 43, 52, 53, 61, 62, 70, 103, 189

    The main reasons to alter the original queries were:
    •   The 'bar' qualification is not implemented yet, and the results had to be checked manually for that range (e.g. 1, 11, 12, 13, 18, 19, 23, 42).
    •   The 'left hand' and 'right hand' qualifications are also not implemented, and these queries were altered to use part names instead (e.g. 11, 12, 13, 18, 19, 23, 36, 40, 43, 52).
    •   All the terms were altered to match a single language (e.g. 2, 7, 11, 18, 27, 33, 36, 39, 40, 62). For example, query 27 'D D D C# C# C# B E E D D D in crotchets' was altered to 'D D D C# C# C# B E E D D D in quarters'.
    •   Cases where not all of the given information is used (e.g. 3, 39, 42, 53, 61, 70). For example, in query 70 'theme' is considered a 'melody'.

    Due to the small number of 'auto' answers, and also to the fact that the alterations that had to be made are considered trivial, the results for the two groups were summed. The alterations are considered trivial because the method to parse the original queries is known but not implemented. Also, all the answered questions were manually selected so that the MIR functions would be able to run on them. This explains the overall low recall and high precision of the results shown in Figure 5, meaning that when the FIR was produced the MIR was usually successful.
    In general, as shown in Figure 5, the overall Beat Recall and Measure Recall did not exceed 0.2 (0.155 and 0.172, respectively), and of the total of 200 questions only 30 were answered. The generally high precision (0.833 for beat and 0.924 for measure) is, as stated earlier, due to the manual selection of queries into feasible and not feasible, and to minor alterations of their text. More specifically, the 'synch' category was completely excluded and very few 'follow' and 'texture' queries were tested. Most of the emphasis was given to the 'melodic' and 'harmonic' queries, trying to answer as many as possible, but still with low recall in both.




               Figure 5: The results of the system.

5 CONCLUSIONS
The current system presents a working paradigm for the complete C@merata task; however, as a prototype, it does not reach its full potential. Although multi-language support was not tested, it can easily be achieved by using a different language file. In this way, apart from differences in terms, different grammar constructs can also be used, as the language file is fully customizable, allowing users to add their own grammatical constructs.
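    For instance, a second language file could carry the same term types with terms from another language; the fragment below (shown as the equivalent Python dictionary) is invented purely for illustration:

# Hypothetical German-language counterpart of the language dictionary;
# the term types stay the same, only the term sets change.
GERMAN_LANGUAGE = {
    "primaryType": ["Note", "Pause", "Akkord", "Melodie"],
    "pitch":       ["C", "Cis", "D", "Dis", "E", "F", "Fis", "G", "Gis", "A", "B", "H"],
    "contextRel":  ["in", "im", "der"],
    "partId":      ["Bass", "Sopran", "Alt", "Tenor"],
}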

REFERENCES
[1]   Sutcliffe, R. F. E., Ó Maidín, D. S., Hovy, E. (2017). The
      C@merata task at MediaEval 2017: Natural Language
      Queries about Music, their JSON Representations, and
      Matching Passages in MusicXML Scores. Proceedings of the
      MediaEval 2017 Workshop, Trinity College Dublin, Ireland,
      September 13-15, 2017.
[2]   Katsiavalos, A. (2016). DMUN: A Textual Interface for
      Content-Based Music Information Retrieval in the C@merata
      Task for MediaEval 2016. Proceedings of the MediaEval 2016
      Workshop, Hilversum, The Netherlands, October 20-21, 2016.
[3]   Katsiavalos, A., & Collins, T. (2015). DMUN at the
      MediaEval 2015 C@merata Task: The Stravinsqi Algorithm.
      Proceedings of the MediaEval 2015 Workshop, Dresden,
      Germany, September 14-15, 2015.



