DMUN: A Textual Interface for Content-Based Music
Information Retrieval in the C@merata task
for MediaEval 2016

Andreas Katsiavalos
De Montfort University
Leicester, UK

Copyright is held by the author/owner(s).
MediaEval 2016 Workshop, October 20-21, 2016, Amsterdam.

ABSTRACT
     This paper describes a text-based Question-Answering (QA)
system for content-based music information retrieval (MIR)
according to the C@merata task description [12,13].

1. INTRODUCTION
     Content-based search of music information is an active
research area [4] with applications in education and general
musicological tasks. Apart from collections of music data such as
KernScores [11], even traditional library catalog servers can be
searched based on their content [7]. To access these data and
extract content-based information we developed a text query
parser that, given a sentence such as a C@merata question,
generates a script for music operations. The script contains the
music concepts and their relations as described in the query, but
in a structured form, so that workflows of specific music data
operations are formed. A parser then reads the script and calls the
corresponding functions from a framework we created on top of
music21 [6]. The questions tested are a subset of 28 questions
selected at random from the complete set of questions.
     An overview of the query system is given in section 2, with
more detailed descriptions of important concepts and procedures.
In section 3 we present and discuss the results of the algorithm in
detail. Last, the conclusions are presented in section 4.

2. APPROACH

2.1 Overview
     Query parsing and music content operations are kept separate
and the only connection between them is through an intermediate
layer.
     The three major components of this approach are:

     •    A query interpreter
     •    The script language
     •    A music information workflow interpreter

     The query interpreter resolves the query text into a script that
describes a music information workflow. This is a layered process
that requires hard-coded knowledge about valid query terms and
types (see 2.2).
     The script language consists of “information request”
statements that are formed by the clauses “select”, “from” and
“where”, with functionality similar to that described by the
Structured Query Language (SQL) (see 2.3).
     The music information workflow interpreter connects the
script with a set of music-related functions that are built on top of
the music21 framework (see 2.4).

2.2 The Query Interpreter
     The query interpreter is a class that is initialised with a
“language” file that contains information about valid terms, their
types, composite types, and composite type relations. Composite
types are music concepts and will be referred to as entities. This
file stores generic terms, but some values, e.g. the names of the
parts, are extracted from the music data.
     The terms of a query phrase can be:

     •    values,
     •    music concept/entity keywords, E,
     •    music concept/entity relation keywords, R.

     For example, “dotted quarter note dominant 7th” is a chord
entity. Entities are further categorized into “content” and
“context” types. Although in the question set we tested the context
entities are parts and measures and the content entities are note,
rest, chord and simultaneity, it is the relation keywords that define
which is the search context and which is the target content.
Relations enable the transformation of the query into a structured
request by defining the context-content relation. The conditions
are just the entity attributes.
     Some of the relation types that were identified in the tested
question set are shown below (the “<>” symbol means any type of
entity):

     < > (" ") < > , <(duration, pitch, note, chord)>
     < > ("followed by") < >, <(duration, pitch, note, chord)>
     < > ("in", "in the") < > contextual and conditional
     < > ("of", "of a") < >
     < > ("parallel")
     < > ("repeated") <> ("time","times")
     < > ("between", "between the") < > ("and") < >
     < > ("against", "only against") < >
     …

     The terms of the query phrase are processed in layers, starting
by identifying the type of each one. Next, composite types and
words are grouped into entities. After all the types are matched,
the entity relations are identified. Last, the query is converted into
an information request using “select-from-where” statements.
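
     The layered processing described in section 2.2 can be
illustrated with a minimal Python sketch. It is not the system's
actual code: the LANGUAGE dictionary, the function names and the
tuple format of the conditions are all hypothetical, the vocabulary
covers only two of the tested questions, and the grouping of
composite types into entities is omitted for brevity.

```python
import re

# Hypothetical, heavily reduced stand-in for the "language" file:
# each known term maps to a (kind, value) pair.
LANGUAGE = {
    "seven-note": ("cardinality", 7),
    "chord": ("entity", "chords"),
    "thirds": ("interval", "third"),
    "parallel": ("relation", "parallel"),
    "in": ("relation", "in"),
    "measures": ("entity", "measures"),
    "harpsichord": ("part_name", "harpsichord"),
    "the": ("filler", None),
}
INT_RANGE = re.compile(r"^(\d+)-(\d+)$")


def classify(term):
    """Resolve one query term to a (kind, value) pair."""
    match = INT_RANGE.match(term)
    if match:
        return ("range.int", (int(match.group(1)), int(match.group(2))))
    return LANGUAGE.get(term, ("unknown", term))


def build_statement(typed_terms, source):
    """Turn one group of typed terms into a select-from-where statement."""
    statement = {"select": None, "from": source, "where": []}
    for kind, value in typed_terms:
        if kind == "entity":
            statement["select"] = value
        elif kind == "part_name":
            statement["select"] = "parts"
            statement["where"].append(("part.name", "==", value))
        elif kind == "range.int":
            statement["where"].append(("measure.number", "between", value))
        elif kind == "cardinality":
            statement["where"].append(("chord.cardinality", "==", value))
        elif kind == "interval":
            statement["select"] = "chords"
            statement["where"].append(("chord.type", "is", value))
    return statement


def parse_query(query):
    """Type the terms, split content from context, build the request."""
    typed = [classify(term) for term in query.lower().split()]
    typed = [pair for pair in typed if pair[0] != "filler"]
    # The relation keyword "in" separates content (left) from context (right).
    marker = ("relation", "in")
    split = typed.index(marker) if marker in typed else len(typed)
    content, context = typed[:split], typed[split + 1:]
    return {
        "context": build_statement(context, "parts.all"),
        "content": build_statement(content, "CONTEXT"),
        "relation": [value for kind, value in content if kind == "relation"],
    }


if __name__ == "__main__":
    print(parse_query("seven-note chord in the harpsichord"))
    print(parse_query("parallel thirds in measures 15-18"))
```

     In this sketch the relation keyword “in” alone decides which
terms form the context and which the content, mirroring the role
that section 2.2 assigns to relation keywords. A fuller version
would need one such rule per relation type.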

1.           Load the language file
2.           Parse the query
2.1          First pass: terms to types
2.2          Second pass: type groups and relations
2.3          Third pass: Content and Context identification
2.4          Fourth pass: Make information request
3            Run information request script with music framework

# 14         seven-note chord in the harpsichord

             context : parts, condition: instrument
             get type : chord
             condition : cardinality value

                   Figure 1. Example query analysis

2.3 Information Request using a Script
     After the query phrase analysis, a script that contains a
structured information request is generated by converting the
identified entities and their relations into a sequence of “select-
from-where” statements.

#9           parallel thirds in measures 15-18
             FROM CONTEXT:
                     SELECT      measures
                     FROM        parts.all
                     WHERE       15 <= measure.number <= 18
             SELECT CONTENT
                     SELECT      chords
                     FROM        CONTEXT
                     WHERE       chord.type IS third
             WHERE (RELATION)

# 14         seven-note chord in the harpsichord
             FROM CONTEXT:
                     SELECT      parts
                     FROM        parts.all
                     WHERE       part.name == “harpsichord”
             SELECT CONTENT
                     SELECT      chords
                     FROM        CONTEXT
                     WHERE       chord.cardinality = 7

           Figure 2. Text parsing examples of function calls

     By ordering and nesting such statements, all the queries that
were tested were successfully converted into this workflow.
     The use of a “language” file is a way to pass knowledge to
the system about how to parse phrases. It contains:

       •     value collections grouped in primary types
             ◦ e.g. 15-18 is type range.int
       •     primitive types grouped in music concepts/entities
             ◦ “dotted quarter” is a duration entity
             ◦ “first inversion of a triad” is a chord entity
       •     Relation definitions
             ◦ groups of entities

2.4 Music Content Extraction
     The structured information request that was described in the
previous section is parsed by a music information retrieval
interpreter that compiles an executable music21 script using
music21 functions such as “getElementsByClass()” and the many
attributes of music21 elements to compare against. Operating
within the music21 ontology, we can perform conditional part
selection, measure selection based on range, and get attribute
values for basic elements such as note, rest and chord type.
     One way to avoid over-analyzing the query into complicated
information requests is to use more complex representations, such
as note-sequences (VIS) [1] or Directed Interval Classes [3], and
bypass low-level relations by transferring them to the
representation.

3. RESULTS AND DISCUSSION
     These are preliminary results and the approach is under
development. In the rest of this section we discuss how queries
resolve into information requests and the difficulties in the
process.

#3        octave leap in violin I
context : part, instrument type and number
get type : melodic interval, keyword "leap"
condition : interval value

#5        Bb3, A3, G3, F3, E3
note,con:seq:comma, note, con:seq:comma, note, con:seq:comma,
note, con:seq:comma, note
context : complete piece ? separate parts ?
get type : pitch sequence

#9        parallel thirds in measures 15-18
con:relation, interval_type, con:where:in, key, int:comp:range
context : measures
get type : chords:condition:thirds
condition : parallel

#10       authentic cadence in measures 14-18
cadence_type, key, con:where:in, key, int:comp:range
context : measures
get type : cadence
condition : cadence type

# 18      consecutive sixths between the Altos and Basses in
          measures 73-80
con:temp_relation, num:position, con:selection:between_the,
term, con:and, term, con:where:in, key, num:comp:range
context : measures, int-range
relation : between X and Y
 X type : part
 Y type : part
content : melodic sequence
condition : interval type

#22       flute dotted half note only against strings
term, duration:exp, duration, key, ? , con:temp_relation:against, ?
(find the string parts?) general_polyphony, pitch, on:where:
in_the, term, con:where:in, key, num:int, rule:direction
context : parts, instrument
type      : duration, composite
relation  : only_against
term      : part group conditions > not empty ?

#29        flute, oboe and bassoon in unison in measures 1-56
term, term, con:and, term. conection:where-condition:in,
interval_type, context:where:in, int:comp,range
context : measures
context : parts, the instruments
type      : notes
condition : same notes

#33        semibreve tied to a minim in the Bass clef
duration, con:notation:tied:tied_to_a, duration, con:where:in_the,
term, key=type
context    : parts ? or measures ?
relation   : tied_to
a type     : duration
b type     : duration

# 44     four eighth notes in the bottom part
context : part, relative position
relation : sequence
         : number
type     : note, conditions: duration

# 63     C D E F D E C in semiquavers repeated after a
context : all
relation : X repeated after Y
 X type : sequence, type: pitch-class
 X cond : duration
 Y type : duration

# 77       harmonic octave in the bass clef
context    : measures, clef:
type       : harmonic interval

Notice the assumption in defining the context that bass clef can
appear anywhere in the score and it does mean a complete part.

# 86      whole-note unison E2 E3 E4
context : all parts
type      : chord, from notes in all parts
condition : pitch content
condition : duration

#94        crotchet tied to crotchet
context    : single parts
relation   : X "tied to" Y
 X type    : duration
 Y type    : duration

# 186      whole-note chord
context    : single part ? all parts ?
type       : chord

4. CONCLUSION
     The C@merata task became very demanding this year;
however, this approach seems promising. The use of the
intermediate information level created space for interpretations
and generally allowed operations aimed at language
understanding. Natural language was avoided, but this approach
seems to resemble natural language query patterns. Even if the
query language is restricted to a limited dictionary and syntax, as
long as it serves its purpose as an interface for information
retrieval, it is worth attention.
     The “segmentation ontology” (Fields et al., 2011) is an
interesting idea. This work addresses large parts of the current
approach’s need for an ontology and provides implementations in
the RDF-OWL language for knowledge representation.