<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Survey on Music Retrieval Systems Using Microphone Input</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ladislav Maršík</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaroslav Pokorný</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Ilčík</string-name>
          <email>ilcik@cg.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Software Engineering, Faculty of Mathematics and Physics, Charles University</institution>
          ,
          <addr-line>Malostranské nám. 25, Prague, Czech Republic</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The Institute of Computer Graphics and Algorithms, Vienna University of Technology</institution>
          ,
          <addr-line>Favoritenstraße 9-11, 1040 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>1343</volume>
      <fpage>131</fpage>
      <lpage>140</lpage>
      <abstract>
        <p>Interactive music retrieval systems using microphone input have become popular, with applications ranging from whistle queries to robust audio search engines capable of retrieving music from a short sample recorded in a noisy environment. Their availability on mobile devices has brought them to millions of users. The underlying methods give promising results when the user provides a short recorded sample and seeks additional information about the piece. The focus now needs to shift to areas where we are still unable to satisfy user needs. One such scenario is choosing a favorite performance from a set of covers or recordings of the same musical piece, e.g. in classical music. Various algorithms have been proposed for both basic retrieval and more advanced use cases. In this paper we provide a survey of the state-of-the-art methods for interactive music retrieval systems, from the perspective of specific user requirements.</p>
      </abstract>
      <kwd-group>
        <kwd>music information retrieval</kwd>
        <kwd>music recognition</kwd>
        <kwd>audio search engines</kwd>
        <kwd>harmonic complexity</kwd>
        <kwd>audio fingerprinting</kwd>
        <kwd>cover song identification</kwd>
        <kwd>whistling query</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Music recognition services have gained significant popularity and large user bases in
recent years. Much of this growth came with mobile devices and the ease of using
them as an input for various retrieval tasks. This has led to the creation of
the Shazam application and its present-day competitors, including SoundHound and
MusicID, which are all capable of retrieving music based on a recording made
with a smartphone microphone. Offering these tools hand in hand with a
convenient listening portal, such as Last.fm or Spotify, brings a
whole new form of entertainment to the users’ portfolio. In the years to come,
the user experience in these applications can be enhanced by advances in
music information retrieval research.</p>
      <sec id="sec-1-1">
        <title>Recent Challenges in Music Retrieval</title>
        <p>
          With each music retrieval system, a database of music has to be chosen to drive
the search and, if possible, satisfy all the different queries. Even though databases
with immense numbers of songs are used, such as the popular Million Song
Dataset [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], they still cannot satisfy the need to search for music in various genres.
At the time of writing, front-runners in the field such as Shazam
Entertainment, Ltd. are working on incorporating more classical and jazz pieces
into their datasets, since at the moment their algorithm is not expected to return
results for these genres [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
        <p>Let us now imagine a particular scenario: a user attending dance classes
wishes their favorite music retrieval application to understand the rhythm of
the music, and to output it as a result along with other information. Can the
application adapt to this requirement?</p>
        <p>Or what if the user wishes to compare different recordings of the same piece of
classical music? Can the result set comprise all such recordings?</p>
        <p>
          There are promising applications of high-level concepts such as music
harmony to aid the retrieval tasks. De Haas et al. [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] have shown how traditional
music theory can help with the problem of extracting the chord progression.
Khadkevich and Omologo [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] showed how the chord progression can lead to
efficient cover identification. Our previous work [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] showed how music harmony
can cluster the data by different music periods. These are just some
examples of how the new approaches can address a wide range of music-related tasks
that users can assign to the system.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>Outline</title>
        <p>In this work we provide a survey of the state-of-the-art methods for music
retrieval using microphone input, characterized by the different user requirements.
In Section 2 we describe the recent methods for retrieving music from a query
created by sample song recording using a smartphone microphone. In Section 3 we
show the methods for the complementary inputs such as humming or whistling.
We also look at the recent advances in cover song identification, in Section 4.
Finally, we form our proposals to improve the recent methods, in Section 5.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Audio Fingerprinting</title>
      <p>We start our survey on music retrieval systems with the most popular use case:
queries made by recording a playback sample from the microphone and looking
for an exact match. This task is known in music retrieval as audio fingerprinting.
Popularized by the Shazam application, it has become a competitive field in both
academic and commercial research in recent years.</p>
      <sec id="sec-2-1">
        <title>Basic Principle of Operation</title>
        <p>
          Patented in 2002 by Wang and Smith [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], the Shazam algorithm is widely used
not only because of its commercial deployment, but mainly due to its
robustness in noisy conditions and its speed. Wang describes the algorithm as a
“combinatorially hashed time-frequency constellation analysis” of the audio [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
This reduces the search for a sound sample in the database to a search
for a graphical pattern.
        </p>
        <p>First, using a discrete-time Fourier transform, a time-frequency
spectrogram is created from the sample, as seen in Figure 1 on the left. Points where
a frequency is present at a given time are marked darker, and the shade
denotes the intensity of that particular frequency. A point with an intensity
considerably higher than any of its neighbors is marked as a peak. Only the
peaks are kept while all the remaining information is discarded, resulting in
a constellation as depicted in Figure 1 on the right. The same technique is also used
in a pre-processing step to extract the constellation for each musical piece in the
database.</p>
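        <p>The peak selection described above can be sketched in a few lines of Python. This is a minimal illustration and not the Shazam implementation: the spectrogram is assumed to be already computed and given as a list of time frames, and the function name, the neighborhood radius and the ratio threshold are hypothetical choices made only for this example.</p>
        <preformat><![CDATA[
```python
def constellation(spec, radius=1, min_ratio=1.5):
    """Return (time, freq) index pairs of local peaks in a magnitude spectrogram.

    spec is a list of time frames, each a list of non-negative magnitudes.
    A cell counts as a peak when it is at least min_ratio times larger than
    every neighbor within `radius` cells (both values are assumptions).
    """
    peaks = []
    n_t = len(spec)
    n_f = len(spec[0]) if spec else 0
    for t in range(n_t):
        for f in range(n_f):
            value = spec[t][f]
            if value <= 0:
                continue  # silence can never be a peak
            neighbors = [
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(n_t, t + radius + 1))
                for ff in range(max(0, f - radius), min(n_f, f + radius + 1))
                if (tt, ff) != (t, f)
            ]
            if all(value >= min_ratio * n for n in neighbors):
                peaks.append((t, f))
    return peaks


# Toy 4x4 "spectrogram" with two clearly dominant cells.
toy = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 9.0, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.4],
    [0.0, 0.1, 0.4, 7.0],
]
print(constellation(toy))  # [(1, 1), (3, 3)]
```
]]></preformat>
        <p>Everything except the two dominant cells is discarded, which is the reduction from a spectrogram to a constellation.</p>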
        <p>The next step is to search for the given sample constellation in the space of
all database constellations using a pattern matching method. Within a single
musical piece, this is like matching a small transparent foil with
dots against the constellation surface. However, in order to find all possible matches,
a large number of database entries must be searched. In our analogy, the
transparent foil has a “width” of several seconds, whereas the width of the surface
constellation is several billion seconds when all pieces are summed together.
Therefore, optimization in the form of combinatorial hashing is necessary to scale
to large databases.</p>
        <p>As seen in Figure 1 on the right, a number of chosen peaks are associated
with an “anchor” peak by using a combinatorial hash function. The motivation
behind using fingerprints is to reduce the information necessary for the search.
Given the frequency f<sub>1</sub> and time t<sub>1</sub> of the anchor peak, the frequency f<sub>2</sub> and
time t<sub>2</sub> of the paired peak, and a hash function h, the fingerprint is produced in the
form:</p>
        <p><disp-formula><tex-math>h(f_1, f_2, t_2 - t_1) \mid t_1</tex-math></disp-formula>
where the operator | is a simple concatenation of strings. The time t<sub>1</sub> is
appended in order to simplify the search and help with later processing,
since it is the offset from the beginning of the piece. Sorting fingerprints in the
database and comparing them instead of the original peak information results in
a vast increase in search speed. To finally find the sample using fingerprint
matching, regression techniques can be used. Even simpler heuristics can be
employed, since the problem can be reduced to finding points that form a linear
correspondence between the sample and the song points in time.</p>
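        <p>The fingerprint construction and the search for a linear time correspondence can be illustrated as follows. This is a sketch under stated assumptions, not the patented algorithm: a plain Python tuple stands in for the hash function h, the fan-out of three paired peaks per anchor is arbitrary, and the helper names (fingerprints, build_index, best_match) and the toy data are hypothetical. Matching fingerprints vote for a pair of song and time offset, and a true match shows up as many votes for a single offset.</p>
        <preformat><![CDATA[
```python
from collections import Counter, defaultdict


def fingerprints(peaks, fan_out=3):
    """Pair each anchor peak with the next few peaks in time.

    peaks is a list of (time, freq) tuples. Each fingerprint is
    ((f1, f2, t2 - t1), t1); the tuple plays the role of h(f1, f2, t2 - t1).
    """
    ordered = sorted(peaks)
    for i, (t1, f1) in enumerate(ordered):
        for t2, f2 in ordered[i + 1 : i + 1 + fan_out]:
            yield (f1, f2, t2 - t1), t1


def build_index(songs):
    """Map each hash key to the (song_id, t1) pairs where it occurs."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for key, t1 in fingerprints(peaks):
            index[key].append((song_id, t1))
    return index


def best_match(index, sample_peaks):
    """Vote for (song, offset) pairs; return (song_id, offset, votes) or None."""
    votes = Counter()
    for key, t_sample in fingerprints(sample_peaks):
        for song_id, t_song in index.get(key, []):
            votes[(song_id, t_song - t_sample)] += 1
    if not votes:
        return None
    (song_id, offset), count = votes.most_common(1)[0]
    return song_id, offset, count


# Made-up constellations of two songs as (time, frequency bin) pairs.
songs = {
    "a": [(0, 10), (1, 20), (2, 15), (3, 30), (4, 25), (5, 12)],
    "b": [(0, 11), (1, 21), (2, 14), (3, 33), (4, 22), (5, 18)],
}
# The sample is an excerpt of song "a" starting at time 2.
sample = [(0, 15), (1, 30), (2, 25), (3, 12)]

print(best_match(build_index(songs), sample))  # ('a', 2, 6)
```
]]></preformat>
        <p>All six fingerprints of the sample agree on the offset 2 within song “a”, which corresponds to the diagonal line of matching points mentioned in the text.</p>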
      </sec>
      <sec id="sec-2-2">
        <title>Summary of Audio Fingerprinting and Benchmarking</title>
        <p>
          Similar techniques have been used by other authors, including Haitsma and
Kalker [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and Yang [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Yang's approach compares indexed
peak sequences using Euclidean distance and then returns a sorted list of
matches. His work shows that exact-match queries achieve the highest retrieval
accuracy, while using covers as input results in a decrease in accuracy of about 20%.
As mentioned earlier, there are many other search engines besides the Shazam
application, each using its own fingerprinting algorithm. We refer the reader to
a survey by Nanopoulos et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for an exhaustive list of such services.
        </p>
        <p>
          To summarize the audio fingerprinting techniques, we highlight three
points:
1. Search time is short, 5-500 milliseconds per query, according to Wang.
2. The algorithms perform well in noisy environments, because
the peaks remain the same even in degraded audio.
3. Although it is not the purpose of this use case, an improved version of
the search algorithms could abstract from other characteristics, such as the
tonal information (tones shifted up or down without affecting the result; we
suggest Schönberg [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] for more information about tonality). At present, however, the
algorithms depend on the sample and the match being exactly the same in
most characteristics, including tempo.
        </p>
        <p>In the end, the algorithms are efficient in the use case they are devoted to,
but are not expected to give results other than the exact match of the sample,
up to noise degradation.</p>
        <p>Interestingly enough, a benchmark dataset and evaluation devoted to audio
fingerprinting has only commenced recently (http://www.music-ir.org/mirex/wiki/2014:Audio_Fingerprinting), although the technology has been
around for years. We attribute this to the fact that most of the applications were
developed commercially.</p>
      </sec>
      <sec id="sec-2-3">
        <title>New Use Cases in Audio Search</title>
        <p>
          Other innovative fields are emerging in audio search.
Notable examples are finding more information about a TV program or advert, and
recommending similar music for listening. Popularized first by the Mufin internet
radio (http://www.mufin.com) and described by Schonfuss [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], these types of applications may soon
become well-known on the application market.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Whistling and Humming Queries</title>
      <p>Interesting applications arose with the introduction of “whistling” or
“humming” queries. In this scenario, the user does not have access to the performance
recording, but remembers the melody of the music she wants to retrieve. The
input is whistling or humming the melody into the smartphone microphone.</p>
      <sec id="sec-3-1">
        <title>Basic Principle of Operation</title>
        <p>
          In their inspiring work, Shen and Lee [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] described how easily a whistle input
can be translated into MIDI format. MIDI records the events at which a musical sound
starts and stops, so it can be obtained easily from a
human whistle, given the nature of the signal. Shen and Lee further argue that whistling
is more suitable for input than humming, since its capture is more
noise-resistant. Whistling has a frequency ranging from 700 Hz to 2.8 kHz, whereas
other sounds fall within a much smaller frequency span. String matching heuristics
are then applied to the segmented MIDI data, featuring a modification of the popular
grep Unix command-line tool, which is capable of searching for regular expressions with
some deviations allowed. Heuristics also exist for extracting the melody from a
song, so the underlying database can be created from real recordings instead
of MIDI. The whole process is shown as a diagram in Figure 2.
        </p>
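        <p>The translation of a whistle into note events can be sketched as follows. The example assumes that a pitch tracker has already produced one frequency estimate per frame (0 for silence); the function names and the sample values are hypothetical and this is not the procedure of Shen and Lee. It relies only on the standard equal-temperament relation between frequency and MIDI note number (A4 = 440 Hz = note 69) and on the 700 Hz to 2.8 kHz whistling band quoted above.</p>
        <preformat><![CDATA[
```python
import math


def hz_to_midi(freq_hz):
    """Nearest MIDI note number for a frequency in Hz (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def in_whistle_band(freq_hz, low=700.0, high=2800.0):
    """True when the frequency lies in the whistling range given in the text."""
    return low <= freq_hz <= high


def notes_from_pitch_track(freqs_hz):
    """Turn per-frame pitch estimates into a sequence of note onsets.

    Frames outside the whistling band are treated as silence or noise;
    a new note starts whenever the rounded MIDI number changes.
    """
    notes = []
    previous = None
    for f in freqs_hz:
        current = hz_to_midi(f) if in_whistle_band(f) else None
        if current is not None and current != previous:
            notes.append(current)
        previous = current
    return notes


# Made-up pitch track: silence, a slightly wobbly A5, silence, a B5, low noise.
track = [0, 880, 882, 879, 0, 988, 990, 120]
print(notes_from_pitch_track(track))  # [81, 83]
```
]]></preformat>
        <p>The band check also acts as a simple noise filter, which mirrors the argument that whistling is easier to capture than humming.</p>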
        <p>
          As in Section 2, the search for the song in the database can be
improved by forming a fingerprint and creating an index. Unal et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]
formed the fingerprint from the relative pitch movements in the melody extracted
from humming, thus increasing the certainty of the algorithm's results.
        </p>
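        <p>Why relative pitch movements make a useful fingerprint can be seen in a small example. The sketch below is not the fingerprint of Unal et al.; it only illustrates the general idea, with hypothetical function names and a made-up melody given as MIDI note numbers. Because only the steps between consecutive notes are kept, a query hummed in a different key yields the same representation.</p>
        <preformat><![CDATA[
```python
def relative_pitch(notes):
    """Semitone steps between consecutive MIDI note numbers."""
    return [b - a for a, b in zip(notes, notes[1:])]


def contour(notes, tolerance=0):
    """Coarse up/down/same string ("U", "D", "S") describing the melody."""
    symbols = []
    for step in relative_pitch(notes):
        if step > tolerance:
            symbols.append("U")
        elif step < -tolerance:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


# The same melody in the database and hummed five semitones higher.
stored = [60, 60, 67, 67, 69, 69, 67]
hummed = [65, 65, 72, 72, 74, 74, 72]

print(relative_pitch(stored))  # [0, 7, 0, 2, 0, -2]
print(relative_pitch(stored) == relative_pitch(hummed))  # True
print(contour(hummed))  # SUSUSD
```
]]></preformat>
        <p>The coarse contour string tolerates inaccurate interval sizes at the cost of being less discriminative, so the tolerance parameter is a trade-off that would need tuning on real queries.</p>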
      </sec>
      <sec id="sec-3-2">
        <title>Benchmarking for Whistling and Humming Queries</title>
        <p>
          Many algorithms are proposed every year for whistling and humming queries,
and there is a natural need to find the one that performs best. Evaluations
of the state-of-the-art methods can be found in annual benchmarking
challenges such as MIREX (Music Information Retrieval Evaluation Exchange,
http://www.music-ir.org/mirex/wiki/MIREX_HOME; see
Downie et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for details). The best performing algorithm for 2014 was the one
from Hou et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The authors used a Hierarchical K-means Tree (HKM)
to enhance the speed and dynamic programming to compute the minimum edit
distance between the note sequences. Another algorithm that outperformed the
competition in past years, while also being commercially deployed, was
MusicRadar (http://www.doreso.com).
        </p>
        <p>Overall, whistling or humming queries are another efficient way of music
retrieval, already having a number of popular applications.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Cover Song Identification Methods</title>
      <p>In recent years, the focus has switched to more specific use cases such as efficient
search for a cover song or choosing from a set of similar performances. As
described earlier, the exact-match result is not satisfying if we, for example,
search for the best performance of Tchaikovsky’s ballet among a vast number of
performances. Although not geared toward microphone input (we are not
aware of applications for such a use case), this section provides an overview of
recent cover song identification methods.</p>
      <sec id="sec-4-1">
        <title>Methods Based on Music Harmony</title>
        <p>
          The task requires the use of high-level concepts. Incorporating music theory
gives us the tools to analyze the music more deeply and to find similarities in its
structure from a higher perspective. The recent work of Khadkevich and Omologo [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]
summarizes the process and shows one way to analyze the
music efficiently and obtain all covers as the query result. The main idea is to segment music
into chords (musical elements in which several tones sound together).
Music theory, as described e.g. by Schönberg [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], provides us with a taxonomy
of chords, as well as rules to translate between chords. Taking this approach,
Khadkevich and Omologo extracted “chord progression” data from a
musical piece, and used Levenshtein’s edit distance [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] to find similarities between
the progressions, as depicted in Figure 3. Locality-sensitive
hashing was used to speed up the process, since the resulting progressions are
high-dimensional [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
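        <p>The comparison of chord progressions by edit distance can be illustrated with a standard dynamic-programming implementation of the Levenshtein distance. The chord sequences below are made up for the example and the sketch omits the chord extraction and the hashing step, so it should be read as an illustration of the distance measure only, not as the system of Khadkevich and Omologo.</p>
        <preformat><![CDATA[
```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit costs)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


# Hypothetical chord progressions: an original, a cover that drops one
# chord, and an unrelated piece.
original = ["C", "G", "Am", "F", "C", "G", "F", "C"]
cover = ["C", "G", "Am", "F", "C", "G", "C"]
other = ["Dm", "G7", "C", "A7", "Dm", "G7", "C", "C"]

print(edit_distance(original, cover))  # 1
print(edit_distance(original, cover) < edit_distance(original, other))  # True
```
]]></preformat>
        <p>A cover that keeps the harmonic skeleton stays close to the original under this measure, while an unrelated progression ends up far away. In practice the chord labels would first be transposed to a common key.</p>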
        <p>
          Another method was previously used by Kim et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] at the University
of Southern California. The difference between the approaches lies in the choice
of fingerprints. Kim et al. used a simple covariance matrix to mark
the co-sounding tones at each point in time. The use of such fingerprints has also
improved the overall speed (approximately 40% search speed improvement
over conventional systems using cross-correlation of data without the use of
fingerprints). In this case, the fingerprints also improved the accuracy of the
algorithm, since they are constructed in a way that respects music harmony.
They also made the algorithm robust to variations from which we need to abstract,
e.g. tempo. This can be attributed to the use of beat synchronization,
described by Ellis and Poliner [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Benchmarking for Cover Song Identification</title>
        <p>
          As in Section 3, cover song identification is another benchmarking category
of the annual MIREX challenge, with around 3-5 algorithms submitted every year.
The best performing algorithm in the past few years came from Academia
Sinica and the team around Hsin-Ming Wang, which favored extracting the
melody from the song and using melody similarity [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. An earlier algorithm that
outperformed the competition was the one made by the Simbals team (http://simbals.labri.fr) from Bordeaux.
The authors used techniques based on local alignment of chroma sequences (see
Hanna et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]), and have also developed techniques capable of identifying
plagiarism in music (see Robine et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]). On certain datasets, the mentioned
algorithms were able to identify the correct covers with 80-90% precision.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Proposals for Improving Music Retrieval Methods</title>
      <p>We see room for improvement in the methods mentioned earlier. Much more
can be accomplished if we use standardized high-level descriptors. If we
conclude that low-level techniques cannot give satisfying results, we are left
with a number of high-level concepts which are, according to music experts and
theoreticians, able to describe the music in an exhaustive manner. Among these,
the most commonly used are Melody, Harmony, Tonality, Rhythm and Tempo.
For some of these elements, it is fairly easy to derive measures (e.g. Tempo,
using a peak analysis similar to the one described in Section 2). For others
this can be a difficult task, and there is no clear indication of which technique is best.
As a consequence, recent applications do not yet take advantage of all of
these music elements.</p>
      <p>
        In our previous work we defined the descriptor of Harmonic complexity
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and described the significance of such descriptors for music similarity. The
aim was to characterize music harmony at a specific time of its play. We have shown
that aggregating these harmony values for the whole piece can improve music
recognition [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The next step, and a possible improvement, is to compare the
time series of such descriptors. Rather than aggregated values, we can
compare the whole series in time and obtain more precise results. Heuristics such
as dynamic time warping can readily be used for this task. We are currently analyzing the
method and its impact on music retrieval. As future work, experiments will
be carried out to validate the proposed method.
      </p>
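      <p>As an illustration of how two descriptor time series could be compared, the following sketch implements the textbook dynamic time warping recurrence with an absolute-difference local cost. The series are made-up numbers standing in for a per-frame descriptor such as harmonic complexity, and the function name is hypothetical; it is not the evaluated method, which remains future work.</p>
      <preformat><![CDATA[
```python
def dtw_distance(x, y):
    """Dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    n, m = len(x), len(y)
    # cost[i][j] is the best cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical descriptor series: a piece, the same piece played at half
# tempo, and a different piece.
piece = [1, 2, 3, 2, 1]
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]
other = [3, 1, 3, 1, 3]

print(dtw_distance(piece, slow))  # 0.0
print(dtw_distance(piece, other) > 0)  # True
```
]]></preformat>
      <p>The half-tempo rendition aligns with zero cost, which is the kind of tempo invariance that a comparison of aggregated values cannot express.</p>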
      <p>We also see the option of combining the general methods for cover song
identification described in Section 4 with the use case of a short audio sample recorded
from the microphone. One possible way is abstracting from tonal
information and other aspects, as described briefly in Section 2.2. Recent benchmarking
challenges for cover song identification focus on analyzing whole songs
rather than a short sample. We believe that a combination of the methods described
in the previous sections can yield interesting results and applications.</p>
    </sec>
    <sec id="sec-6">
      <title>Summary and Conclusion</title>
      <p>We have provided a survey of recent music retrieval methods focusing on
retrieving music based on audio input from recorded music, whistling and humming
queries, and cover song identification. We described how the algorithms
perform efficiently in their use cases, but we also see ways to improve them as
new requirements come from the users.</p>
      <p>In future work we will focus on the use of high-level descriptors, and
we propose stabilizing these descriptors for music retrieval. We also propose
combining the known methods, and focusing not only on mainstream music
but also on other genres, such as Classical, Jazz or Latino music.</p>
      <p>Acknowledgments. The study was supported by the Charles University in
Prague, project GA UK No. 708314.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bertin-Mahieux</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ellis</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Whitman</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lamere</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The Million Song Dataset</article-title>
          .
          <source>In: Proceedings of the 12th International Society for Music Information Retrieval Conference. ISMIR</source>
          <year>2011</year>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>De Haas</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magalhães</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiering</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Improving Audio Chord Transcription by Exploiting Harmonic and Metric Knowledge</article-title>
          .
          <source>In: Proceedings of the 13th International Society for Music Information Retrieval Conference. ISMIR</source>
          <year>2012</year>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Downie</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>West</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ehmann</surname>
            ,
            <given-names>A.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincent</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>The 2005 Music Information Retrieval Evaluation Exchange (MIREX 2005): Preliminary Overview</article-title>
          .
          <source>In: Proceedings of the 6th International Conference on Music Information Retrieval. ISMIR</source>
          <year>2005</year>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ellis</surname>
            ,
            <given-names>D.P.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poliner</surname>
            ,
            <given-names>G.E.</given-names>
          </string-name>
          :
          <article-title>Identifying 'Cover Songs' with Chroma Features and Dynamic Programming Beat Tracking</article-title>
          .
          <source>In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP</source>
          <year>2007</year>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gionis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Indyk</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motwani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Similarity Search in High Dimensions via Hashing</article-title>
          .
          <source>In: Proceedings of the 25th International Conference on Very Large Data Bases. VLDB '99</source>
          , Morgan Kaufmann Publishers Inc., San Francisco, CA, USA (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Haitsma</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kalker</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A Highly Robust Audio Fingerprinting System</article-title>
          .
          <source>In: Proceedings of the 3rd International Society for Music Information Retrieval Conference. ISMIR</source>
          <year>2002</year>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hanna</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferraro</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robine</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>On Optimizing the Editing Algorithms for Evaluating Similarity Between Monophonic Musical Sequences</article-title>
          .
          <source>Journal of New Music Research</source>
          <volume>36</volume>
          (
          <issue>4</issue>
          ) (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hou</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>MIREX2014: Query by Humming/Singing System</article-title>
          .
          <source>In: Music Information Retrieval Evaluation eXchange. MIREX</source>
          <year>2014</year>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Khadkevich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Omologo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Large-Scale Cover Song Identification Using Chord Profiles</article-title>
          .
          <source>In: Proceedings of the 14th International Society for Music Information Retrieval Conference. ISMIR 2013</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Unal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          :
          <article-title>Music Fingerprint Extraction for Classical Music Cover Song Identification</article-title>
          .
          <source>In: Proceedings of the IEEE International Conference on Multimedia and Expo. ICME 2008</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Levenshtein</surname>
            ,
            <given-names>V.I.</given-names>
          </string-name>
          :
          <article-title>Binary Codes Capable of Correcting Deletions, Insertions, and Reversals</article-title>
          .
          <source>Soviet Physics-Doklady</source>
          <volume>10</volume>
          (
          <issue>8</issue>
          ) (
          <year>1966</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Marsik</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pokorny</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ilcik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Improving Music Classification Using Harmonic Complexity</article-title>
          .
          <source>In: Proceedings of the 14th Conference Information Technologies - Applications and Theory. ITAT 2014</source>
          , Institute of Computer Science, AS CR
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Marsik</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pokorny</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ilcik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Towards a Harmonic Complexity of Musical Pieces</article-title>
          .
          <source>In: Proceedings of the 14th Annual International Workshop on Databases, Texts, Specifications and Objects (DATESO 2014)</source>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>1139</volume>
          . CEUR-WS.org
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Nanopoulos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rafailidis</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruxanda</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manolopoulos</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Music Search Engines: Specifications and Challenges</article-title>
          .
          <source>Information Processing and Management: an International Journal</source>
          <volume>45</volume>
          (
          <issue>3</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Robine</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanna</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferraro</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Allali</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adaptation of String Matching Algorithms for Identification of Near-Duplicate Music Documents</article-title>
          .
          <source>In: Proceedings of the International SIGIR Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection. SIGIR-PAN 2007</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Schönberg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>Theory of Harmony</source>
          . University of California Press, Los Angeles (
          <year>1922</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Schönfuss</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Content-Based Music Discovery</article-title>
          .
          <source>In: Exploring Music Contents, Lecture Notes in Computer Science</source>
          , vol.
          <volume>6684</volume>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>H.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Whistle for Music: Using Melody Transcription and Approximate String Matching for Content-Based Query over a MIDI Database</article-title>
          .
          <source>Multimedia Tools and Applications</source>
          <volume>35</volume>
          (
          <issue>3</issue>
          ) (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Tsai</surname>
            ,
            <given-names>W.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.M.</given-names>
          </string-name>
          :
          <article-title>Using the Similarity of Main Melodies to Identify Cover Versions of Popular Songs for Music Document Retrieval</article-title>
          .
          <source>Journal of Information Science and Engineering</source>
          <volume>24</volume>
          (
          <issue>6</issue>
          ) (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Unal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chew</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Georgiou</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Narayanan</surname>
            ,
            <given-names>S.S.</given-names>
          </string-name>
          :
          <article-title>Challenging Uncertainty in Query by Humming Systems: A Fingerprinting Approach</article-title>
          .
          <source>IEEE Transactions on Audio, Speech, and Language Processing</source>
          <volume>16</volume>
          (
          <issue>2</issue>
          ) (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>J.O.</given-names>
          </string-name>
          :
          <article-title>Method for Search in an Audio Database</article-title>
          .
          Patent (February
          <year>2002</year>
          ),
          <source>WO 02/011123A2</source>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          :
          <article-title>An Industrial-Strength Audio Search Algorithm</article-title>
          .
          <source>In: Proceedings of the 4th International Society for Music Information Retrieval Conference. ISMIR 2003</source>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>MACS: Music Audio Characteristic Sequence Indexing for Similarity Retrieval</article-title>
          .
          <source>In: IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics. WASPAA 2001</source>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>