<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BirdCLEF 2015 submission: Unsupervised feature learning from audio</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dan Stowell</string-name>
          <email>dan.stowell@qmul.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centre for Digital Music, Queen Mary University of London</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe our results submitted to BirdCLEF 2015 for classifying among 999 tropical bird species. Our submission attained a MAP score of over 30% in the official results. This note is not a self-contained paper, since our system was largely the same as the one used in BirdCLEF 2014 and described in detail elsewhere. The method uses raw audio, without segmentation and without any auxiliary metadata.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Our unsupervised feature learning scales well with increasing data size: its
cost grows linearly, as described in the main paper. However, given the compute
resources available in the run-up to the competition deadline, we were able to
submit only one run and could not apply model averaging.</p>
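      <p>As a rough illustration of why this kind of feature learning scales linearly, the sketch below implements spherical k-means, the clustering step named in the system flowchart, in Python with NumPy. It is a minimal sketch under our own assumptions, not the submitted code: the random initialisation, the fixed iteration count and the batch update rule are illustrative choices. Each iteration makes one pass over all frames, so for a fixed number of centroids and iterations the running time is proportional to the number of frames.</p>
      <preformat>
```python
import numpy as np


def spherical_kmeans(data, k, n_iter=10, seed=0):
    """Minimal batch spherical k-means with unit-norm centroids (illustrative).

    data: float array of shape (n_frames, n_dims), one whitened frame per row.
    Each iteration makes one pass over all frames, so its cost is
    proportional to n_frames * k * n_dims: linear in the amount of data.
    """
    rng = np.random.default_rng(seed)
    # Assumed initialisation: random directions, normalised to unit length.
    centroids = rng.standard_normal((k, data.shape[1]))
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    rows = np.arange(data.shape[0])
    for _ in range(n_iter):
        # Dot product of every frame with every centroid.
        sims = data @ centroids.T
        # Each frame is assigned to the centroid with the largest |similarity|.
        best = np.argmax(np.abs(sims), axis=1)
        codes = np.zeros_like(sims)
        codes[rows, best] = sims[rows, best]
        # Accumulate the assigned frames (weighted by their code), keep the
        # old centroid as a damping term, then project back to unit length.
        centroids = codes.T @ data + centroids
        centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    return centroids
```
      </preformat>
      <p>Because the centroid update is a sum over frames, it can also be accumulated chunk by chunk, so memory use need not grow with the size of the collection.</p>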
      <p>
        Our own tests using a two-fold split of the training data confirmed an
observation that we made in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]: adding more layers gives a benefit up to a certain
limit, which appears to be related to the size of the available data set. In our
tests (Figure 2) the available data appeared insufficient to support a three-layer
variant, so we submitted a two-layer run.
      </p>
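      <p>The two-fold evaluation described above can be sketched as a small helper. This is a generic illustration under assumptions, not the exact protocol behind Figure 2: we assume a random split at the level of whole recordings, and <monospace>train_and_score</monospace> is a hypothetical callable standing in for the full feature-learning and classification pipeline.</p>
      <preformat>
```python
import random


def two_fold_scores(file_ids, train_and_score, seed=0):
    """Two-fold evaluation: train on one half of the files, test on the
    other, then swap the halves.

    file_ids: identifiers of the training recordings.
    train_and_score: hypothetical callable (train_ids, test_ids) -> score,
        standing in for the whole train/test pipeline.
    Returns the list of the two fold scores and their mean.
    """
    ids = list(file_ids)
    # Fixed seed, so every variant being compared sees the same folds.
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    fold_a, fold_b = ids[:half], ids[half:]
    scores = [train_and_score(fold_a, fold_b), train_and_score(fold_b, fold_a)]
    return scores, sum(scores) / len(scores)
```
      </preformat>
      <p>Running the helper once for a two-layer and once for a three-layer variant, with the same seed, compares them on identical folds. A split stratified by species would be a natural refinement, since a purely random split can leave rare species absent from one fold.</p>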
    </sec>
    <sec id="sec-2">
      <title>Feature learning</title>
      <p>[Figure: feature-learning pipeline: spectrograms → high-pass filtering &amp; RMS normalisation → spectral median noise reduction → PCA whitening → spherical k-means → learnt bases.]</p>
    </sec>
    <sec id="sec-3">
      <title>Classification</title>
      <p>[Figure: classification pipeline: spectrograms → high-pass filtering &amp; RMS normalisation → spectral median noise reduction → feature transformation → temporal summarisation → train/test (Random Forest, with training labels) → decisions.]</p>
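      <p>Several of the pipeline stages above can be sketched in Python with NumPy as below. This is a hedged reconstruction, not the published implementation: we assume that spectral median noise reduction subtracts each frequency band's median over time and clips negative values to zero, that the PCA whitening fitted during feature learning is reapplied before the feature transformation, and that temporal summarisation uses the per-feature mean and standard deviation. The high-pass filtering and RMS normalisation stages are omitted, and all function names are ours.</p>
      <preformat>
```python
import numpy as np


def median_noise_reduction(spec):
    """Subtract each frequency band's median over time, clipping at zero.

    spec: magnitude spectrogram of shape (n_frames, n_bins).
    """
    return np.maximum(spec - np.median(spec, axis=0, keepdims=True), 0.0)


def fit_pca_whitening(frames, eps=1e-5):
    """Estimate a PCA whitening transform from training frames.

    Returns (mean, w) such that (frames - mean) @ w has roughly identity
    covariance; eps regularises directions with very small variance.
    """
    mean = frames.mean(axis=0)
    cov = np.cov(frames - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Divide each eigenvector (column) by the std. dev. along its direction.
    return mean, eigvecs / np.sqrt(np.maximum(eigvals, 0.0) + eps)


def whiten(frames, mean, w):
    """Apply a whitening transform fitted by fit_pca_whitening."""
    return (frames - mean) @ w


def transform_and_summarise(frames, bases):
    """Project whitened frames onto learnt bases, then pool over time.

    bases: shape (k, n_dims), e.g. unit-norm k-means centroids.
    Returns a fixed-length vector (per-feature mean and standard deviation),
    usable as input to a classifier such as a random forest.
    """
    activations = frames @ bases.T
    return np.concatenate([activations.mean(axis=0), activations.std(axis=0)])
```
      </preformat>
      <p>Summarising over time turns recordings of any length into vectors of the same size, which is what allows a standard classifier to be trained on them.</p>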
      <sec id="sec-3-15">
        <title>Decisions</title>
        <p>For this 2015 challenge (999 bird species, 33,203 audio files), our
final MAP score was 30.2% considering only foreground species, and 26.2%
including background species. These results are a few percentage points lower
than those of the similar systems submitted to the 2014 challenge, as one
might expect given that the number of species to identify had increased
from 501 to 999.</p>
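        <p>For reference, MAP is the mean, over test recordings, of the average precision of each recording's ranked list of species. The sketch below computes it under that standard definition; it is an illustration, not the official BirdCLEF scoring tool, whose handling of details such as ties may differ, and the function names are ours.</p>
        <preformat>
```python
def average_precision(ranked, relevant):
    """AP of one ranked list of species against the set of true species."""
    hits, total = 0, 0.0
    for rank, species in enumerate(ranked, start=1):
        if species in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant species
    # True species missing from the ranking contribute zero.
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings, truths):
    """Mean AP over recordings; rankings and truths are parallel lists."""
    aps = [average_precision(r, t) for r, t in zip(rankings, truths)]
    return sum(aps) / len(aps)
```
        </preformat>
        <p>Passing only each recording's foreground species as <monospace>truths</monospace> corresponds to the first score above; adding its background species corresponds to the second. Every additional true species that is ranked low pulls that recording's average precision down.</p>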
        <p>Acknowledgments
We would like to thank the people and projects that made available the
data used for this research (the Xeno Canto website and its many volunteer
contributors), as well as the SABIOD research project for instigating the
contest, and the CLEF contest hosts.</p>
        <p>This work was supported by EPSRC Early Career Fellowship EP/L020505/1.</p>
        <p>[Figure 2: results from a two-fold split of the training data, comparing feature-learning variants with different numbers of layers (panel: lifeclef2015, classifier: binary relevance).]</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cappellato</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferro</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
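          ,
          <string-name>
            <surname>San Juan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (eds.):
          <article-title>CLEF 2015 Labs and Workshops, Notebook Papers</article-title>
          .
          <source>CEUR Workshop Proceedings (CEUR-WS.org)</source>
          ,
          <volume>1391</volume>
          (
          <year>2015</year>
          ),
          <ext-link ext-link-type="uri" xlink:href="http://ceur-ws.org/Vol-1391/">http://ceur-ws.org/Vol-1391/</ext-link>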
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Lakshminarayanan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teh</surname>
            ,
            <given-names>Y.W.</given-names>
          </string-name>
          :
          <article-title>Mondrian forests: Efficient online random forests</article-title>
          .
          <source>arXiv preprint arXiv:1406.2673</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Stowell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plumbley</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning</article-title>
          .
          <source>PeerJ</source>
          <volume>2</volume>
          ,
          <issue>e488</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>