<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Unsupervised Learning Approach for Identifying Sub-genres in Music Scores</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>National University of Ireland Galway</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Detecting the genre of a piece of music and whether two pieces of music are similar are subjective matters since audiences may perceive the same music differently. While the problem of the automatic detection of music genres has been studied extensively, it is still an open problem, especially when looking at sub-genres of traditional music. It can however be useful, for example, to discover similarities between multiple collections, to study whether a particular genre has resemblances with other genres, and to trace the origin and evolution of a particular genre. In this paper, we focus on traditional Irish music and the features and algorithms that can be used for analyzing such music through structured data (music scores). More precisely, audio, spectral, and statistical features of music scores are extracted to be used as input to unsupervised clustering methods, to better understand how those features and methods can help identify sub-genres in a music collection and support "genre-driven" similarity-based retrieval of music in such a collection. We show in particular which features best support such tasks, and how a slight modification of the K-Means algorithm to introduce feature weights achieves good performance. We also discuss the possible use of those results, especially through a demonstration application for music information retrieval in Irish traditional music collections.</p>
      </abstract>
      <kwd-group>
        <kwd>Music Scores</kwd>
        <kwd>Music Classification</kwd>
        <kwd>Music Retrieval</kwd>
        <kwd>Music Similarity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Music is an integral part of all cultures across the world and greatly influences
an individual's emotions, productivity, and behavior. In today's digitalized era,
on-demand access to all kinds of music has been significantly simplified and there
has been extensive research to enhance user experience. The discipline that
studies techniques for the retrieval of information from music-related data is Music
Information Retrieval. A simple definition of Music Information Retrieval as
stated by the International Society for Music Information Retrieval (ISMIR) is
"processing, searching, organizing and accessing music-related data".</p>
      <p>
        Music Information Retrieval as an interdisciplinary field integrates music
theory, sound engineering, and machine learning. Music-related data is available
in abundance due to a variety of music modalities such as audio
representations, symbolic notations, and meta-data representations as well as large-scale
commercialization of music on mobile and web platforms [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. This availability
and commercialization of music data have given rise to extensive consumption
of music by users, thus motivating the music industry to render user-friendly
services and encouraging researchers to develop newer techniques in Music
Information Retrieval.
      </p>
      <p>
        However, Music Information Retrieval is not restricted to popular Western
music. It also encompasses traditional or ethnic music. The discipline that
studies ethnic music is Ethnomusicology. Oramas and Cornelis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] precisely describe
Ethnomusicology as "the study and understanding of music by comparing
different cultures to find musical universalities and the origin of music". Ethnic music
differs from Western music in multiple ways.
      </p>
      <p>The objective of this paper is to explore features and clustering techniques
for sub-genre detection in Irish traditional music, as an example of ethnic
music. A genre usually describes the style of a piece of music. Given a genre and
identified sub-genres, studying the similarities and dissimilarities between the
sub-genres can help in understanding the correlation between the genres. For
example, if a collection of Irish traditional music is compared with a collection
of Scottish traditional music, the similarity between the sub-genres as well as
individual tunes within these collections can be determined. This could further
help determine whether the Irish music culture has resemblances with Scottish
music culture and vice-versa. Sub-genre detection can also help trace the origin
of music. For example, archives of genres can be clustered to identify sub-genres
and these sub-genres can be compared with newer tunes. Thus an understanding
of the evolution of newer tunes can be gained by tracing their origins.</p>
      <p>In this paper, we therefore aim to understand which music representations
are more suitable to support sub-genre detection in Irish traditional music, and
how clustering methods perform in sub-genre detection. We test various features
extracted from music scores using a common clustering method (K-Means), and
also show how an adapted version of K-Means that includes weights for different
feature sets (e.g. pitch, beat, spectral features) achieves better performance on
a common collection of Irish traditional song scores.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        Our work generally falls in the area of music information retrieval [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which is
concerned with the retrieval of information from music. Several popular
applications have been developed in the past to support music information retrieval, for
modern Western music as well as for traditional music. Those naturally rely on a
notion of music similarity, which is central to the work presented here, including:
Shazam: Shazam (https://www.shazam.com/) allows users to record audio for 10-12 seconds (irrespective
of the place and presence of noise). It then checks for a perfect match on
its server using the audio fingerprint of the recording. If a perfect match
is found, it returns the title, artist and album of the song. There is also a
provision in Shazam to redirect the user to an application/webpage where
the returned music score can be played.
      </p>
      <p>SoundHound: SoundHound (https://www.soundhound.com/) also identifies songs based on an audio recording,
like Shazam, and returns meta-data for the identified tune. Additional
provisions include returning live lyrics of the tune being played and
returning lists of tunes similar to the one recorded.</p>
      <p>TunePal: TunePal (https://tunepal.org/) is an application that allows users to identify traditional
Irish tunes either by entering the title of the tune or by recording a tune.
It makes use of the ABC notation (http://abcnotation.com/) for the tunes and retrieves similar tunes
using Natural Language Processing.</p>
      <p>
        Various features are traditionally used to support information retrieval,
including pitch, timing, etc. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In this paper, we employ several more or less
"intuitive" representations of those features to cluster tunes into sub-genres.
In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], a classification of the features to support Music Information Retrieval in
ethnic music is provided, which separates them into three groups:
Low-level descriptors: Intrinsic properties of music scores that can be
extracted with the help of sound engineering techniques, including frequency,
audio spectrum, pitch, duration, beat onset and offset, etc.
      </p>
      <p>Mid-level descriptors: Features that capture what the audiences are
listening to, including the loudness of the music score, rhythm, timbre, tonality, etc.
High-level descriptors: Overall interpretation of the music score, including
the mood, genre, expression or emotion of the tune.</p>
      <p>The features we use here fall into the first two categories, being extracted from
both the music scores and the audio signal obtained by rendering those scores
computationally.</p>
      <p>Genres are usually assigned to music scores manually, but when it comes
to a large collection of data, manual allocation of genres to music scores is not
possible. Both supervised and unsupervised machine learning algorithms have
been used for automatic genre identification. A number of different approaches
can be used to identify genres automatically, including content-based approaches
(which we employ here), semantic analysis or collaborative filtering, and it can
be achieved either in a supervised or unsupervised way. Below, we list some of
the more prominent work in genre identification in both cases.</p>
      <sec id="sec-2-1">
        <title>1 https://www.shazam.com/</title>
      </sec>
      <sec id="sec-2-2">
        <title>2 https://www.soundhound.com/</title>
      </sec>
      <sec id="sec-2-3">
        <title>3 https://tunepal.org/</title>
      </sec>
      <sec id="sec-2-4">
        <title>4 http://abcnotation.com/</title>
        <p>
          Using supervised methods: [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] used a hybrid approach in which spectral features,
rhythmic features, and pitch content were extracted from the music scores, which
were then classified using a Support Vector Machine. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] used J48, which is the
Java implementation of C4.5, to classify non-Western music tunes based on their
genres, obtaining an average accuracy of 75%. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] also made use of a OneR
classifier, in which a rule is made for every predictor and the one giving the
smallest total error is selected. They obtained an accuracy of around
65%. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] used an ensemble model in which multiple classifiers were combined to
improve the overall accuracy, obtaining 75-80%.
        </p>
        <p>
          Using unsupervised methods: [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] used the K-Means clustering algorithm on a
dataset that contained Classical, Rock, Jazz, Hip-Hop, and EDM music. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ],
on the other hand, used constraint-based clustering for identifying music
genres. Two types of constraints were considered: positive constraints and negative
constraints. The positive constraints consisted of attributes that were expected
to be together based on the content-based description, whereas negative
constraints consisted of attributes that were not expected to be together. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] used a
self-organising map and observed that better and clearer results could be
obtained compared to other clustering approaches.
        </p>
        <p>In this paper, in contrast with those approaches, we focus on clustering music
using features extracted from scores that are obtained from collections of
traditional Irish music tunes. The objective, beyond obtaining genre-related clusters
for those collections, is to better understand which aspects and features of Irish
traditional music tunes are more relevant to their comparison, and how to
effectively represent them.</p>
    </sec>
    <sec id="sec-3">
      <title>Data collection and pre-processing</title>
      <p>The music data used for this research was available either in the form of ABC
notation files or in the form of MIDI files. Both are symbolic representation
formats which represent scores rather than audio signals. While MIDI represents
tunes through events (e.g. a note starts playing at a given time, with a certain pitch and
a certain velocity), ABC notation is a simple text format representing notes as
letters and timing information through specific characters. The core information
required to play the tune is common to both, so to simplify processing, we
converted all ABC notation files to MIDI.</p>
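      <p>As an illustration of this conversion step, the following is a minimal sketch of the pipeline, assuming the abc2midi (abcMIDI) and Timidity++ command-line tools are available; file and directory names are illustrative.</p>
      <preformat>
# Sketch of the ABC -> MIDI -> WAV conversion pipeline (illustrative paths;
# assumes the abc2midi and timidity command-line tools are installed).
import subprocess
from pathlib import Path

def convert_tune(abc_path, out_dir):
    out_dir.mkdir(parents=True, exist_ok=True)
    midi_path = out_dir / (abc_path.stem + ".mid")
    wav_path = out_dir / (abc_path.stem + ".wav")
    # ABC notation -> MIDI
    subprocess.run(["abc2midi", str(abc_path), "-o", str(midi_path)], check=True)
    # MIDI -> WAV, rendered through Timidity++
    subprocess.run(["timidity", str(midi_path), "-Ow", "-o", str(wav_path)], check=True)
    return midi_path, wav_path

for abc_file in Path("abc_tunes").glob("*.abc"):
    convert_tune(abc_file, Path("converted"))
</preformat>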
      <p>Out of the many communities and websites that facilitate access to Irish traditional
music, the data for this research was collected from the "Irish Traditional
Music Archive" (https://www.itma.ie/) and "The Session" (https://thesession.org):
The Session: The Session is a community dedicated to Irish traditional
music. It hosts a variety of recordings, sessions, tunes, discussions, and music
collections. It also allows artists to collaborate for sessions and events. 400
ABC notation files were obtained from The Session's bulk download facility
(https://thesession.org/tunes/download).
These files span across 4 genres: Jigs, Reels, Polkas and Barn dances. The
ABC notation files were converted to MIDI files and WAV files (converted
from the MIDI files through the Timidity++ software, http://timidity.sourceforge.net/).</p>
        <p>Irish Traditional Music Archive: The Irish Traditional Music Archive (ITMA) is a
national public archive dedicated to Irish traditional music. The largest
Irish folk music collections can be found at ITMA. Along with song
recordings, it holds information about the origin, history and evolution of tunes,
metadata, information about artists and albums, instruments, and Irish dances.
ITMA also welcomes contributions from individuals and artists towards
traditional Irish music. Over 6,000 MIDI files from the ITMA "port" collection of
digitised scores (http://port.itma.ie) were obtained. 32 tunes belonging to both The Session and
ITMA collections were selected, with slightly different representations in the
two collections. Those, in ITMA, cover 7 different genres: Jigs, Reels, Waltz,
Three-Two, Slip Jigs, Hornpipes, and Barn dances.</p>
        <p>MIDI files are instructional, and the messages within these files provide
information such as which note is being played, its pitch, duration, velocity, loudness,
etc. To read the MIDI files and extract messages from them, the Python library
MIDO (https://mido.readthedocs.io/) was used.</p>
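        <p>A minimal sketch of this extraction step is given below; the grouping of note-on/note-off messages into per-note records is an illustrative assumption about how the messages were processed, not the exact pipeline used.</p>
        <preformat>
# Sketch of extracting note events from a MIDI file with mido.
# The grouping of note_on/note_off pairs into (pitch, velocity, duration)
# records is illustrative, not the exact processing used in the paper.
import mido

def extract_notes(midi_path):
    midi = mido.MidiFile(midi_path)
    notes, active, time = [], {}, 0
    for msg in midi:  # msg.time is the delta time in seconds
        time += msg.time
        if msg.type == "note_on" and msg.velocity > 0:
            active[msg.note] = (time, msg.velocity)
        elif msg.type in ("note_off", "note_on") and msg.note in active:
            start, velocity = active.pop(msg.note)
            notes.append({"pitch": msg.note, "velocity": velocity,
                          "duration": time - start})
    return notes

notes = extract_notes("converted/tune.mid")
pitches = [n["pitch"] for n in notes]
</preformat>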
        <p>For extracting audio data from WAV files, the Python library LibROSA
(https://librosa.github.io/) was used. LibROSA is widely used in audio analysis, speech recognition, and
sound engineering applications. It allows visualization of spectral data, extraction of spectral,
temporal, and statistical features, sound filtering, onset detection, etc.</p>
    </sec>
    <sec id="sec-4">
      <title>Feature engineering</title>
      <p>For this research, multiple features were selected and tested, individually and in
combination. Based on the results, data were pre-processed again where needed. A
summary of the feature engineering process is diagrammatically represented
in Figure 1.</p>
      <p>A brief description of the types of features considered is given below:
Audio Features: The audio features considered included the notes' duration
(in ticks and milliseconds, from the NOTE ON and NOTE OFF events in the
MIDI files), pitch (from the note number in MIDI files) and velocity (from
the corresponding field in the MIDI events), as well as the beats of the tune
(extracted from the WAV files using LibROSA).</p>
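        <p>As an illustration, the beat extraction from the rendered WAV files might look as follows (a sketch; the file name and parameters are illustrative).</p>
        <preformat>
# Sketch of beat extraction from a rendered WAV file with LibROSA
# (file name is illustrative).
import librosa

y, sr = librosa.load("converted/tune.wav")
tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
beat_times = librosa.frames_to_time(beat_frames, sr=sr)  # beat onsets in seconds
</preformat>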
      <sec id="sec-4-1">
        <title>7 https://thesession.org/tunes/download</title>
      </sec>
      <sec id="sec-4-2">
        <title>8 http://timidity.sourceforge.net/</title>
      </sec>
      <sec id="sec-4-3">
        <title>9 http://port.itma.ie</title>
        <p>10 https://mido.readthedocs.io/
11 https://librosa.github.io/</p>
        <p>Mel Frequency Cepstral Coefficients (MFCCs): In simple terms, MFCCs
capture the relation between the frequency of a tone as it is perceived
and the actual frequency of the tone. The coefficient numbers provide the
spectral energies. Lower-order coefficients describe the shape of the spectrum
and the average energy of the input signal, whereas higher-order coefficients
provide incrementally finer details of the sound spectrum. Twelve MFCCs
for each tune were extracted from the WAV files and used as input to the
unsupervised learning algorithm.</p>
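        <p>A possible sketch of this step is shown below; averaging the coefficients over time frames to obtain a single twelve-dimensional vector per tune is an illustrative assumption.</p>
        <preformat>
# Sketch of extracting twelve MFCCs per tune with LibROSA; averaging over
# time frames to get one 12-dimensional vector per tune is an assumption.
import librosa

y, sr = librosa.load("converted/tune.wav")
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)  # shape: (12, number of frames)
mfcc_vector = mfcc.mean(axis=1)                     # one averaged value per coefficient
</preformat>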
        <p>Statistical Features: The statistical features considered included distributions,
within a given tune, of the note-specific audio features as well as of the
MFCCs.</p>
    </sec>
    <sec id="sec-5">
      <title>Clustering and cluster evaluation</title>
      <p>
        The K-means clustering algorithm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is used for partitioning data into k clusters.
The main objective of this algorithm is to create clusters in such a way that the
points within a cluster are very close to each other (high intra-cluster cohesion),
whereas the points in different clusters are far from each other (low
inter-cluster cohesion).
      </p>
      <p>In K-means clustering, k data points are arbitrarily selected from the dataset
as centroids. Euclidean distances of all the points from these centroids are
calculated. Points closest to a particular centroid are then assigned to that particular
cluster. The mean value of every cluster is calculated and the centroid is
updated to that mean value. The data points are then re-assigned to their updated
centroid values. This process is repeated until the shape of the cluster remains
unchanged, that is, data points belonging to the same cluster do not get
reassigned to a new cluster.</p>
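      <p>The procedure can be summarised by the following minimal NumPy sketch (in practice, a library implementation such as scikit-learn's KMeans can be used; empty clusters are not handled here).</p>
      <preformat>
# Minimal NumPy sketch of the K-means procedure described above.
# X is an (n_items, n_features) matrix, k the number of clusters.
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # arbitrary initial centroids
    for _ in range(max_iter):
        # assign every point to its closest centroid (Euclidean distance)
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # move each centroid to the mean of the points assigned to it
        updated = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(updated, centroids):  # assignments no longer change
            break
        centroids = updated
    return labels, centroids
</preformat>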
      <p>The key to achieving good clustering in our case, i.e. clusters which are
representative of the sub-genres in the given collections of tunes, is to identify the
features for which a comparison across tunes according to the Euclidean distance is
most representative of their belonging to a sub-genre.</p>
      <p>Since several sets of features are being considered, we also test a slight
modification of the K-Means clustering algorithm, weighted K-Means, that is suitable
for clustering items represented by multiple vectors. While in standard K-Means
the vectors would be concatenated (aggregated into one unique vector) to
calculate an overall Euclidean distance, in weighted K-Means a weight is provided
to represent the contribution of each of the vectors to the Euclidean distance
comparison of each item. This enables putting more or less importance on each
set of features to guide the clustering mechanism.</p>
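      <p>A minimal sketch of the distance underlying this variant is given below; representing the overall distance as a weighted sum of per-feature-set Euclidean distances, and the example weights, are illustrative assumptions.</p>
      <preformat>
# Sketch of the distance used in the weighted K-Means variant: each tune is
# represented by several feature vectors (pitch, timing, beats), and each
# feature set contributes to the overall distance according to its weight.
import numpy as np

def weighted_distance(tune_a, tune_b, weights):
    """tune_a, tune_b: dicts mapping feature-set name to a vector of equal length."""
    return sum(w * np.linalg.norm(tune_a[name] - tune_b[name])
               for name, w in weights.items())

weights = {"pitch": 0.5, "timing": 0.3, "beats": 0.2}  # example weights
</preformat>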
      <p>The most effective clustering leads to minimum intra-cluster variability and
maximum inter-cluster variability. We use Silhouette Analysis to evaluate the
resulting clusters. In silhouette analysis, for every point i belonging to a cluster
C, the mean distance a<sub>i</sub> between i and all the other points in C, and the smallest
mean distance b<sub>i</sub> between i and the points of any other cluster, are calculated. The silhouette
coefficient S(i) of an item i is then given by:</p>
      <p>S(i) = (b<sub>i</sub> - a<sub>i</sub>) / max(a<sub>i</sub>, b<sub>i</sub>)   (1)</p>
      <p>Silhouette coefficients are calculated for all the points present in the dataset
and these values are averaged to get an overall result. Values of silhouette
coefficients can range from -1 to 1. Higher values are representative of better quality
clusters.</p>
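      <p>For instance, the average silhouette coefficient can be computed with scikit-learn as follows (a sketch; the feature matrix and number of clusters are illustrative).</p>
      <preformat>
# Sketch of the evaluation step: average silhouette coefficient over all
# items, computed with scikit-learn (feature matrix is illustrative).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(400, 64)  # one feature vector per tune (illustrative)
labels = KMeans(n_clusters=4, random_state=0).fit_predict(X)
print("average silhouette coefficient:", silhouette_score(X, labels))
</preformat>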
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>Various features were considered as input to the unsupervised learning model,
individually as well as in combination. The features giving the best results were
finally consolidated in a single dataset. Two of the features, note duration and
note pitch, however required to first be transformed into representations that
were suitable for comparison through Euclidean distance.</p>
      <sec id="sec-6-1">
        <title>Representing pitch</title>
        <p>The first set of features considered were the pitch values of the notes. For creating
a vector of pitch values, four different approaches were considered:
Approach 1: Vector of pitch values: In this approach, the pitch values were
selected as is, i.e. the vector consists of pitch values in the way they appeared in
the MIDI files. For example, from a MIDI file including 10 notes ranging from
A4 (69) to E5 (76), the vector representation might be:</p>
        <p>[69, 69, 74, 74, 74, 73, 74, 76, 76, 76]
Approach 2: Vector of differences with the first pitch value: In this approach, to give
more of a notion of the progression of notes rather than of exact values, the first
note (pitch value) was subtracted from all the notes. The vector, always starting
at 0, is therefore a vector of pitch differences with the first note. Considering the
pitch values of the example in Approach 1, the resulting vector would be:
[0, 0, 5, 5, 5, 4, 5, 7, 7, 7]
Approach 3: Vector of differences with the mean pitch value: Similarly to above,
in this approach, the vector is created from the differences between each note's
pitch value and the average pitch value. For the same example vector of pitch
values, the result would therefore be:</p>
        <p>[-4.5, -4.5, 0.5, 0.5, 0.5, -0.5, 0.5, 2.5, 2.5, 2.5]
Approach 4: Vector of differences with the previous note's pitch value: Finally,
to represent a similar notion of progression which is less dependent on the overall
tune, and more on the local changes in pitch in the tune, in this approach, the
vector is made of the differences between the pitch value of the current note and
the pitch value of the previous note. On the same example, the result would be:
[0, 5, 0, 0, -1, 1, 2, 0, 0]</p>
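        <p>The four representations can be derived directly from the raw pitch sequence, as in the following sketch applied to the example above.</p>
        <preformat>
# Sketch of the four pitch-vector representations, applied to the example
# sequence used above.
import numpy as np

pitches = np.array([69, 69, 74, 74, 74, 73, 74, 76, 76, 76], dtype=float)

raw = pitches                         # Approach 1: pitch values as is
diff_first = pitches - pitches[0]     # Approach 2: differences with the first note
diff_mean = pitches - pitches.mean()  # Approach 3: differences with the mean pitch
diff_prev = np.diff(pitches)          # Approach 4: differences with the previous note
</preformat>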
      </sec>
      <sec id="sec-6-2">
        <title>Representing duration and timing</title>
        <p>Similarly to the pitch feature, two different approaches were considered for the
representation of note duration.</p>
        <p>Approach 1: Vector of note durations: In this approach, the duration of the notes
in number of ticks, as directly extracted from the MIDI file, is used as a vector.
For example, for a tune including 10 notes, some being only 1 tick long and
some being up to 8 ticks long, the following vector might be used:
[1, 4, 1, 8, 8, 1, 4, 1, 4, 2]
Approach 2: Binary vector of note hits: Since duration vectors such as the
one above might be difficult to meaningfully compare, we considered a different
approach representing the notes' timing using a binary vector where each value
represents a tick, and is equal to 1 if a note started playing on that tick, or 0
otherwise. Considering the smaller example vector of durations:
[8, 8, 8, 4, 4]
the resulting binary vector would be:
[1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]</p>
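        <p>A minimal sketch of the construction of this binary vector from a vector of durations (in ticks) is given below.</p>
        <preformat>
# Sketch of turning a vector of note durations (in ticks) into the binary
# note-hit vector described above: one entry per tick, 1 where a note starts.
import numpy as np

def binary_note_hits(durations):
    onsets = np.concatenate(([0], np.cumsum(durations)[:-1]))  # start tick of each note
    vector = np.zeros(int(np.sum(durations)), dtype=int)
    vector[onsets] = 1
    return vector

print(binary_note_hits([8, 8, 8, 4, 4]))
# [1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0]
</preformat>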
      </sec>
      <sec id="sec-6-3">
        <title>K-Means clustering</title>
        <p>We applied the K-Means algorithm on 400 tunes from The Session, belonging
to the Jigs, Reels, Polkas, and Barn dances genres, with k = 4, using each of
the features mentioned in the previous sections separately (cropping each vector
to the length of the shortest tune in the collection). The average silhouette
coefficients obtained are presented in Table 1.</p>
        <p>From Table 1, it can be observed that comparing pitch values by finding
the difference with the mean of all notes, the timing of notes in a binary form,
and beats give the best results in terms of average silhouette coefficients. Thus,
these features were used in combination to test on the 32 selected tunes that
overlap between ITMA and The Session, with the results shown in Table 2.
The combination here was achieved by representing each tune as a concatenated
set of vectors, adding 0 padding to the shorter ones in order to have equal weight
for each of the feature sets. As can be seen, results obtained from the combination
of features are consistent, i.e. slightly better than the best feature individually.
They also fall within the same range between the two datasets.</p>
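        <p>A sketch of this combination step is given below; the zero-padding helper and the use of scikit-learn's KMeans are illustrative assumptions.</p>
        <preformat>
# Sketch of the combined-feature clustering: the selected feature vectors
# are zero-padded to a common length, concatenated per tune, and clustered
# with K-Means (k = 4); helper names are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pad_to_common_length(vectors):
    """Zero-pad a list of 1-D vectors to the length of the longest one."""
    length = max(len(v) for v in vectors)
    return np.array([np.pad(v, (0, length - len(v))) for v in vectors])

def combine(pitch_vecs, timing_vecs, beat_vecs):
    """One row per tune: concatenation of the three padded feature sets."""
    return np.hstack([pad_to_common_length(pitch_vecs),
                      pad_to_common_length(timing_vecs),
                      pad_to_common_length(beat_vecs)])

# X = combine(pitch_vecs, timing_vecs, beat_vecs)
# labels = KMeans(n_clusters=4, random_state=0).fit_predict(X)
# print(silhouette_score(X, labels))
</preformat>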
      </sec>
      <sec id="sec-6-4">
        <title>Weighted K-Means clustering</title>
        <p>As mentioned previously, considering the difference in their individual
performances, better results might be obtained by giving more or less importance to
each of the three selected feature sets. We therefore applied a Weighted K-Means
algorithm where each of the feature sets is assigned a weight which corresponds
to the contribution of the Euclidean distance comparison of that feature set to
the overall distance used in the comparison of two items.</p>
        <p>We systematically tested combinations of the weights w<sub>p</sub>, w<sub>t</sub> and w<sub>b</sub> for
the pitch, timing and beats feature sets respectively. Table 3 presents silhouette
scores in the two datasets for a sample of those combinations which demonstrate
the range of results obtained. The best results in this table are also the best
results overall. As can be seen, it is therefore possible to obtain better results
with the weighted K-Means approach than with the base one. While the results
are promising and show a reasonable ability of the clustering mechanism to
distinguish groups of tunes that are expected to correspond to genres, the two
datasets perform differently, and achieve their best results with different sets
of weights. This is especially surprising as the two datasets contain the same
set of tunes, from two different collections. Those tunes however are represented
differently and have been transcribed in different conditions, showing how those
aspects cannot be neglected when using automatic processes in ethnomusicology.</p>
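        <p>A minimal sketch of such a systematic search over weight combinations is given below. Here the overall squared distance is taken as a weighted sum of squared per-feature-set distances, which amounts to scaling each feature block by the square root of its weight before running standard K-Means; this formulation and the weight grid are illustrative assumptions.</p>
        <preformat>
# Sketch of the systematic search over weight combinations. Scaling each
# feature block by the square root of its weight and running standard
# K-Means corresponds to a weighted sum of squared per-block distances.
import itertools
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def weighted_kmeans_score(blocks, weights, k=4):
    """blocks: one (n_tunes, dim) array per feature set (pitch, timing, beats)."""
    X = np.hstack([np.sqrt(w) * b for w, b in zip(weights, blocks)])
    labels = KMeans(n_clusters=k, random_state=0).fit_predict(X)
    return silhouette_score(X, labels)

# grid of weights (w_p, w_t, w_b) summing to 1, in steps of 0.1
grid = [w for w in itertools.product(np.arange(0.1, 1.0, 0.1), repeat=3)
        if round(sum(w), 3) == 1.0]
# best = max(grid, key=lambda w: weighted_kmeans_score([pitch, timing, beats], w))
</preformat>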
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion</title>
      <p>In this paper, we have explored the features, and their representations, that can
support comparing traditional Irish tunes for the purpose of sub-genre identification.
We applied clustering methods to tunes from two large collections,
represented as MIDI files and ABC notation, to show that specific
representations of the timing, beats and pitch envelope of a tune provide better results,
especially when combined and weighted. These promising results provide a
better understanding of approaches that can be used to explore the delineation of
genres and the relation between computable features and the perception of music.
They also provide a basis for music information retrieval applications that apply
to traditional Irish music, and potentially beyond. Indeed, as shown in Figure 2,
as a next step in this direction, we have built a prototype application enabling
the user to retrieve tunes from The Session or ITMA based on their similarity to
a given tune, using the representations established in this paper. Such an
application has great potential for supporting music practitioners and researchers in
the field.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>J Stephen</given-names>
            <surname>Downie</surname>
          </string-name>
          .
          <article-title>Music information retrieval</article-title>
          .
          <source>Annual review of information science and technology</source>
          ,
          <volume>37</volume>
          (
          <issue>1</issue>
          ):
          <fpage>295</fpage>
          -
          <lpage>340</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. John A. Hartigan and Manchek A. Wong.
          <article-title>Algorithm AS 136: A k-means clustering algorithm</article-title>
          .
          <source>Journal of the Royal Statistical Society</source>
          . Series C (Applied Statistics),
          <volume>28</volume>
          (
          <issue>1</issue>
          ):
          <fpage>100</fpage>
          -
          <lpage>108</lpage>
          ,
          <year>1979</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Kyuwon</given-names>
            <surname>Kim</surname>
          </string-name>
          , Wonjin Yun, and
          <string-name>
            <given-names>Rick</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>Clustering music by genres using supervised and unsupervised algorithms</article-title>
          .
          <source>Technical report</source>
          , Stanford University,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Kittler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hatef</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. P. W.</given-names>
            <surname>Duin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          .
          <article-title>On combining classifiers</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          ,
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. M. Lesaffre, L. Voogdt,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Baets</surname>
          </string-name>
          , H. Meyer, and
          <string-name>
            <given-names>J.</given-names>
            <surname>Martens</surname>
          </string-name>
          .
          <article-title>How potential users of music search and retrieval systems describe the semantic quality of music</article-title>
          .
          <source>Journal of the American Society for Information Science and Technology</source>
          ,
          <volume>59</volume>
          (
          <issue>5</issue>
          ),
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>T.</given-names>
            <surname>Lidy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Silla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cornelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Gouyon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rauber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kaestner</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Koerich</surname>
          </string-name>
          .
          <article-title>On the suitability of state-of-the-art music information retrieval methods for analyzing, categorizing and accessing non-western and ethnic music collections</article-title>
          .
          <source>Signal Processing</source>
          ,
          <volume>90</volume>
          (
          <issue>4</issue>
          ),
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Cynthia</given-names>
            <surname>Liem</surname>
          </string-name>
          , Meinard Muller, Douglas Eck, George Tzanetakis, and
          <string-name>
            <given-names>Alan</given-names>
            <surname>Hanjalic</surname>
          </string-name>
          .
          <article-title>The need for music information retrieval with user-centered and multimodal strategies</article-title>
          .
          <source>In MM'11 - Proceedings of the 2011 ACM Multimedia Conference and Co-Located Workshops - MIRUM 2011 Workshop</source>
          , MIRUM'
          <volume>11</volume>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Noris Mohd</given-names>
            <surname>Norowi</surname>
          </string-name>
          , Shyamala Doraisamy, and
          <string-name>
            <given-names>Rahmita</given-names>
            <surname>Wirza</surname>
          </string-name>
          .
          <article-title>Factors affecting automatic genre classification: An investigation incorporating non-western musical forms</article-title>
          .
          <source>In Proc of ISMIR</source>
          <year>2005</year>
          , 6th International Conference on Music Information Retrieval,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>S.</given-names>
            <surname>Oramas</surname>
          </string-name>
          and
          <string-name>
            <given-names>O</given-names>
            <surname>Cornelis</surname>
          </string-name>
          .
          <article-title>Past, present, and future in ethnomusicology: The computational challenge</article-title>
          .
          <source>In International Society for Music Information Retrieval</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Wei</surname>
            <given-names>Peng</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Tao</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>Mitsunori</given-names>
            <surname>Ogihara</surname>
          </string-name>
          .
          <article-title>Music clustering with constraints</article-title>
          .
          <source>In Proc. of ISMIR</source>
          <year>2007</year>
          ,
          <source>the 8th International Conference on Music Information Retrieval</source>
          ,
          ISMIR
          <year>2007</year>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>L.</given-names>
            <surname>Weissenberger</surname>
          </string-name>
          .
          <article-title>When "everything" is information: Irish traditional music and information retrieval</article-title>
          . In iConference,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>