<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LifeCLEF Bird Identification Task 2015</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Herve Goeau</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hervé Glotin</string-name>
          <email>glotin@univ-tln.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Willem-Pier Vellinga</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Planqué</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andreas Rauber</string-name>
          <email>rauber@ifs.tuwien.ac.at</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexis Joly</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aix Marseille Univ., ENSAM, CNRS LSIS, Univ. Toulon, Institut Univ. de France</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Inria ZENITH team</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>LIRMM</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Vienna University of Technology</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Xeno-canto Foundation</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The LifeCLEF bird identification task provides a testbed for a system-oriented evaluation of the identification of 999 bird species. The main originality of this data is that it was specifically built through a citizen science initiative conducted by Xeno-canto, an international social network of amateur and expert ornithologists. This makes the task closer to the conditions of a real-world application than previous, similar initiatives. This overview presents the resources and the assessments of the task, summarizes the retrieval approaches employed by the participating groups, and provides an analysis of the main evaluation results.</p>
      </abstract>
      <kwd-group>
        <kwd>LifeCLEF</kwd>
        <kwd>bird</kwd>
        <kwd>song</kwd>
        <kwd>call</kwd>
        <kwd>species</kwd>
        <kwd>retrieval</kwd>
        <kwd>audio</kwd>
        <kwd>collection</kwd>
        <kwd>identification</kwd>
        <kwd>fine-grained classification</kwd>
        <kwd>evaluation</kwd>
        <kwd>benchmark</kwd>
        <kwd>bioacoustics</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Accurate knowledge of the identity, the geographic distribution and the
evolution of bird species is essential for a sustainable development of humanity as
well as for biodiversity conservation. Unfortunately, such basic information is
often only partially available for professional stakeholders, teachers, scientists
and citizens. In fact, it is often incomplete for ecosystems that possess the
highest diversity, such as tropical regions. A noticeable cause and consequence of
this sparse knowledge is that identifying birds is usually impossible for the
general public, and often a difficult task for professionals such as park rangers, ecology
consultants and, of course, the ornithologists themselves. This "taxonomic gap"
[
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] was actually identified as one of the main ecological challenges to be solved
during the United Nations Conference in Rio de Janeiro, Brazil, in 1992.
      </p>
      <p>
        The use of multimedia identification tools is considered to be one of the most
promising solutions to help bridge this taxonomic gap [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. With the recent advances in digital devices, network bandwidth and
information storage capacities, the collection of multimedia data has indeed become
an easy task. In parallel, the emergence of "citizen science" and social
networking tools has fostered the creation of large and structured communities of nature
observers (e.g. eBird<sup>6</sup>, Xeno-canto<sup>7</sup>, iSpot<sup>8</sup>, etc.) that have started to produce
outstanding collections of audio and/or visual records. Unfortunately, the
performance of state-of-the-art multimedia analysis techniques on such data is
still not well understood and is far from meeting real-world requirements
for identification tools. Most existing studies and available tools typically
identify a few tens of species with moderate accuracy, whereas they should be
scaled up by one, two or three orders of magnitude in terms of number
of species.
      </p>
      <p>The LifeCLEF Bird task proposes to evaluate one of these challenges [?]
based on big and real-world data and defined in collaboration with biologists
and environmental stakeholders so as to reflect realistic usage scenarios.</p>
      <p>
        Using audio records rather than bird pictures is justified by current practices
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Birds are actually not easy to photograph; audio calls and
songs have proven to be easier to collect and sufficiently species-specific.
      </p>
      <p>
        Only three notable previous worldwide initiatives on bird species
identification based on their songs or calls have taken place, all three in 2013. The first
one was the ICML4B bird challenge, held jointly with the International Conference on
Machine Learning in Atlanta, June 2013 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It was initiated by the SABIOD
MASTODONS CNRS group<sup>9</sup>, the University of Toulon and the National
Natural History Museum of Paris [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. It included 35 species, and 76 participants
submitted 400 runs on the Kaggle interface. The second challenge was
conducted by F. Briggs at the MLSP 2013 workshop, with 15 species and 79
participants, in August 2013. The third challenge (NIPS4B), and the biggest in 2013, was organised
by the University of Toulon, SABIOD and Biotope [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], with 80 species from
Provence, France. More than thirty teams participated, reaching an average
AUC of 92%. Descriptions of the best systems of the ICML4B and NIPS4B bird
identification challenges are given in the on-line books [
        <xref ref-type="bibr" rid="ref1 ref2">2,1</xref>
        ] including, in some cases,
references to useful scripts.
      </p>
      <p>In collaboration with the organizers of these previous challenges, BirdCLEF 2014
and 2015 go one step further by (i) increasing the number of species
by almost an order of magnitude, (ii) working on real-world data collected by
hundreds of recordists and (iii) moving to a more usage-driven and system-oriented
benchmark by allowing the use of meta-data and defining information retrieval
oriented metrics. Overall, the task is expected to be much more difficult than
previous benchmarks because of the higher confusion risk between the classes,
the higher background noise and the higher diversity in the acquisition
conditions (devices, recordists' practices, diversity of contexts, etc.). It will therefore probably
produce substantially lower scores and offer a better progression margin towards
building real-world generalist identification tools.</p>
      <p><sup>6</sup> http://ebird.org/</p>
      <p><sup>7</sup> http://www.xeno-canto.org/</p>
      <p><sup>8</sup> http://www.ispotnature.org/communities/global</p>
      <p><sup>9</sup> http://sabiod.univ-tln.fr</p>
    </sec>
    <sec id="sec-2">
      <title>Dataset</title>
      <p>
        The training and test data of the bird task consist of audio recordings
hosted on xeno-canto.org (XC). Xeno-canto is a web-based community of bird
sound recordists worldwide with more than 2300 active contributors who have
already collected more than 240,000 recordings of about 9330 species (May 2015).
The BirdCLEF dataset uses 999 species from Brazil. They are the
species of that country with the highest number of recordings on XC, totalling
33,862 recordings contributed by hundreds of users. The dataset has between 13
and 234 recordings per species, recorded by between 1 and 72 recordists. It
also contains the entire dataset from the 2014 BirdCLEF challenge [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
which contained about 14,000 recordings from 501 species.
      </p>
      <p>
        To avoid any bias in the evaluation related to the audio devices used, each
audio file has been normalized to a constant sampling rate of 44.1 kHz and coded
over 16 bits in .wav mono format (the right channel was selected by default).
The conversion from the original Xeno-canto data set was done using ffmpeg, sox
and matlab scripts. Sixteen optimized Mel Filter Cepstrum Coefficients (MFCC) for bird
identification (according to an extended benchmark [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) have been computed
with their first and second temporal derivatives on the whole set. They were
used in the best systems run in the ICML4B and NIPS4B challenges [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
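      <p>As an illustration of how first and second temporal derivatives extend a 16-coefficient MFCC frame to 48 dimensions, a minimal sketch is given below. It uses a simple central difference with edge replication; the exact derivative formula used to build the distributed features is not stated in this overview, so the scheme and the function names are assumptions.</p>
      <preformat>
```python
def delta(frames):
    """Central-difference temporal derivative of a list of feature frames.

    The first and last frames are replicated at the edges (an assumption).
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def add_derivatives(frames):
    """Append first and second derivatives to each frame (16 -> 48 values)."""
    d1 = delta(frames)
    d2 = delta(d1)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]
```
</preformat>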
      <p>Audio records are associated with various meta-data including the species
of the most active singing bird, the species of the other birds audible in the
background, the type of sound (call, song, alarm, flight, etc.), the date and
location of the observations (from which rich statistics on species distribution
can be derived), common names and collaborative quality ratings. All of them
were produced collaboratively by the Xeno-canto community.</p>
    </sec>
    <sec id="sec-3">
      <title>Task Description</title>
      <p>Participants were asked to determine the species of the most active singing birds
in each query file. The background noise can be used like any other meta-data,
but it is forbidden to correlate the test set of the challenge with the original
annotated Xeno-canto database (or with any external content, as many of them
are circulating on the web). More precisely, the whole BirdCLEF dataset has
been split in two parts, one for training (and/or indexing) and one for testing.
The test set was built by randomly choosing 1/3 of the observations of each
species, whereas the remaining observations were kept in the reference training
set. Recordings of the same species made by the same person on the same day are
considered to be part of the same observation and cannot be split across the
test and training sets. The XML files containing the meta-data of the query
recordings were purged so as to erase the foreground and background species names
(the ground truth), the vernacular names (common names of the birds) and the
collaborative quality ratings (which would not be available at query stage in a
real-world mobile application). Meta-data of the recordings in the training set
were kept unaltered.</p>
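      <p>The observation-level split described above can be sketched as follows. This is a minimal illustration and not the organizers' actual script: the record fields (<monospace>species</monospace>, <monospace>recordist</monospace>, <monospace>date</monospace>), the function name and the fixed random seed are assumptions introduced for the example.</p>
      <preformat>
```python
import random
from collections import defaultdict


def split_by_observation(recordings, test_fraction=1 / 3, seed=0):
    """Split recordings into (train, test) without splitting an observation.

    An observation groups all recordings of one species made by the same
    recordist on the same day. Field names are hypothetical.
    """
    rng = random.Random(seed)
    # species -> observation key -> recordings of that observation
    by_species = defaultdict(lambda: defaultdict(list))
    for rec in recordings:
        key = (rec["recordist"], rec["date"])
        by_species[rec["species"]][key].append(rec)

    train, test = [], []
    for species in sorted(by_species):
        observations = by_species[species]
        keys = sorted(observations)
        rng.shuffle(keys)
        # About one third of the observations of each species are held out.
        n_test = round(len(keys) * test_fraction)
        for key in keys[:n_test]:
            test.extend(observations[key])
        for key in keys[n_test:]:
            train.extend(observations[key])
    return train, test
```
</preformat>
      <p>Because whole observations are assigned to one side, two recordings made by the same recordist of the same species on the same day never end up on both sides of the split.</p>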
      <p>The groups participating in the task were asked to produce up to 4 runs
containing a ranked list of the most probable species for each record of the test
set. Each species had to be associated with a normalized score in the range [0, 1]
reflecting the likelihood that this species was singing in the sample. For each
submitted run, participants had to say whether the run was performed fully
automatically or with human assistance in the processing of the queries, and whether they
used a method based only on audio analysis or also on the metadata.
The metric used to compare the runs was the Mean Average Precision averaged
across all queries. Since the audio records contain a main species and often some
background species belonging to the set of 501 species in the training, we
decided to use two metrics, one focusing on all species (MAP1) and a second one
focusing only on the main species (MAP2).</p>
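      <p>For reference, the Mean Average Precision of a run can be computed as sketched below. This is a generic formulation of the metric and not the organizers' evaluation script; the function names and the dictionary-based input layout are assumptions made for the example, and each ranked list is assumed to contain a species at most once.</p>
      <preformat>
```python
def average_precision(ranked_species, relevant_species):
    """Average precision of one ranked list against a set of true species."""
    relevant = set(relevant_species)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, species in enumerate(ranked_species, start=1):
        if species in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(run, ground_truth):
    """Average the per-query average precision over all queries.

    run: query id -> list of species ranked by decreasing score
    ground_truth: query id -> collection of true species for that query
    """
    scores = [
        average_precision(run.get(query, []), truth)
        for query, truth in ground_truth.items()
    ]
    return sum(scores) / len(scores)
```
</preformat>
      <p>Passing only the main species of each query as ground truth, or the main species together with the background species, yields the two variants of the metric.</p>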
    </sec>
    <sec id="sec-4">
      <title>Participants and methods</title>
      <p>
        137 research groups worldwide registered for the task and downloaded the data
(from a total of 189 groups that registered for at least one of the three
LifeCLEF tasks). This shows the high attractiveness of the challenge in both the
multimedia community (presumably interested in several tasks) and in the
audio and bioacoustics community (presumably registered only for the bird songs
task). Finally, 6 of the registrants crossed the finish line by submitting runs and
5 of them submitted working notes explaining their runs in detail. We list them
hereafter in alphabetical order and give a brief overview of the techniques they
used in their runs. We would like to point out that the LifeCLEF benchmark is
a system-oriented evaluation and not a deep or fine evaluation of the underlying
algorithms. Readers interested in the scientific and technical details of the
implemented methods should refer to the LifeCLEF 2015 working notes or to the
research papers of each participant (referenced below):
      </p>
      <p>
        CHIN. AC. SC., China, 3 runs: This participant experimented with
a baseline audio classification system based on the classification of Mel-band
representations and their scattering refinements [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] using a Gaussian Mixture
Model. The first run used only MFCC features with 128 Gaussian mixtures, the
second run used the scattering refinements with 32 Gaussian mixtures, and the third
run used the scattering refinements with 128 Gaussian mixtures.
      </p>
      <p>
        Golem, Mexico, 3 runs [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]: This participant experimented with a simple yet
highly scalable system based on the classification of Mel-band representations
using a random forest. The Mel bands extracted from each recording were
pooled through simple statistics (i.e. mean, standard deviation, median and
skewness), resulting in time- and space-efficient 320-dimensional features
used to train the classifier.
      </p>
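      <p>The statistical pooling used by Golem can be illustrated with the sketch below, which reduces a variable-length sequence of Mel-band frames to a fixed-length vector. With 80 Mel bands, the four statistics give 4 × 80 = 320 dimensions; the number of bands, the ordering of the output and the function names are assumptions, as this overview only states the final dimensionality.</p>
      <preformat>
```python
import statistics


def pool_band(values):
    """Return (mean, std, median, skewness) of one Mel band over time."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    median = statistics.median(values)
    if std == 0:
        skewness = 0.0  # a constant band has no asymmetry
    else:
        # Skewness as the third standardized moment.
        skewness = statistics.fmean(((v - mean) / std) ** 3 for v in values)
    return mean, std, median, skewness


def pool_mel_bands(frames):
    """Pool a list of frames (each a list of band energies) into one vector.

    The output is grouped by statistic: all means, then all standard
    deviations, all medians and all skewness values.
    """
    bands = list(zip(*frames))  # transpose: one tuple of values per band
    stats = [pool_band(band) for band in bands]
    return [band_stats[k] for k in range(4) for band_stats in stats]
```
</preformat>
      <p>The resulting vector has the same length whatever the duration of the recording, which is what allows a standard classifier such as a random forest to be trained directly on it.</p>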
      <p>
        Inria Zenith, France, 3 runs [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]: Inspired by recent works on fine-grained
image classification, this group introduced a new match kernel based on the
shared nearest neighbors of the low-level audio features extracted at the frame
level. To make this strategy scalable to the tens of millions of MFCC features
extracted from the training set, they used high-dimensional hashing
techniques coupled with an efficient approximate nearest neighbors search algorithm
with controlled quality. Further improvements were obtained by (i) using a sliding
window for the temporal pooling of the raw matches and (ii) weighting each low-level
feature according to the semantic coherence of its nearest neighbors. The final
classification was then performed by a support vector machine trained on
top of the resulting matching-based representations.
      </p>
      <p>
        MNB TSA, Germany, 4 runs [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]: This participant combined two main
categories of features for the classification: parametric acoustic features (see
openSMILE Audio Statistics) and probabilities of species-specific spectrogram
segments (see Segment-Probabilities). This second source of information, which
performs best, consists in extracting, for each species, a set of representative
segments from spectrogram images. These segments are then used to extract
Segment-Probabilities for each file by calculating the maxima of the normalized
cross-correlation between all segments and the target spectrogram image via
template matching. Due to the very large amount of audio data, not all files
belonging to a given species were used as a source for segmentation (i.e. only good
quality files without background species were used). Additionally, to further
reduce the computation time, the spectrogram images were downsampled before
computing the template matching. The classification problem was then
formulated as a multi-label regression task solved by training ensembles of
randomized decision trees with probabilistic outputs. The training was performed
in two passes, one selecting a small subset of the most discriminant features,
and one training the final classifiers on the selected features (Run 1). To further
improve classification results, a bagging approach was used, which consisted in
calculating further Segment-Probabilities from additional segments and combining them
either by averaging (Run 2) or by blending (Run 3 and Run 4 with more blends).
      </p>
      <p>
        QMUL, UK, 1 run [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]: This group focused on unsupervised feature learning
in order to learn regularities in spectro-temporal content without reference to
the training labels and thereby help the classifier generalise to further content
of the same type. MFCC features and several temporal variants were first
extracted from the audio signal after a median-based thresholding pre-processing.
The extracted low-level features were then reduced through PCA whitening and
clustered via spherical k-means (and a two-layer variant of it) to build the
vocabulary. During classification, MFCC features were pooled by projecting them
on the vocabulary with different temporal pooling strategies. The final supervised
classification was performed by a random forest classifier. This method is the
subject of a full-length article which can be read at [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]. The
parameter settings used in each run are detailed in the working note [?].
      </p>
      <p>
        MARF, Canada, 4 runs: These participants mainly attempted to transpose
a speech processing method they developed earlier to the birds case (Modular
Audio Recognition Framework (MARF)'s API, [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]). The first run used
only 20 LPC coefficients as features and the Chebyshev distance. The second
run used only the meta-data features, relying on the MARFCAT approach [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]
to represent the XML meta-data as a wave form without pre-processing, and
using 512-window FFT features and the cosine similarity measure. The third run
was a concatenation of Run 1 and Run 2. The fourth run used the same setup
as Run 1 but split the training data by quality rating attributes.
      </p>
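      <p>The two similarity measures mentioned for the MARF runs, the Chebyshev distance and the cosine similarity, are illustrated below together with a simple nearest-neighbour ranking. This is a generic sketch that does not use the MARF API; the function names and the nearest-neighbour decision rule are assumptions made for the example.</p>
      <preformat>
```python
import math


def chebyshev_distance(a, b):
    """Largest absolute coordinate difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is null)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_species(query, references):
    """Rank species by the Chebyshev distance of their closest reference.

    references: list of (species, feature_vector) pairs.
    """
    best = {}
    for species, vector in references:
        d = chebyshev_distance(query, vector)
        if species not in best or best[species] > d:
            best[species] = d
    return sorted(best, key=best.get)
```
</preformat>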
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>The main outcome of the evaluation is that the use of matching-based scores
as high-dimensional features to be classified by supervised classifiers (as done
by MNB TSA and INRIA ZENITH) provides the best results, with a Mean
Average Precision up to 0.454 for the fourth run of the MNB TSA group. These
approaches notably outperform the unsupervised feature learning framework of
the QMUL group as well as the baseline method of the Golem group. The
matching of all the audio recordings however remains a very time-consuming process
that had to be carefully designed in order to process a large-scale dataset such
as the one deployed within the challenge. The MNB TSA group notably reduced
as much as possible the number of audio segments to be matched thanks to an
effective audio pre-processing and segmentation framework. They also restricted
the extraction of these segments to the files having the best quality according to
the user ratings and having no background species. On the other hand, the
INRIA ZENITH group did not use any segmentation but attempted to speed up
the matching through the use of a hash-based approximate k-nearest neighbors
search scheme (on top of MFCC features). The better performance of the MNB
TSA runs shows that cleaning the vocabulary of audio segments before applying
the matching is clearly beneficial. But using a scalable knn-based matching such as
the one of the INRIA ZENITH runs could be a complementary way to speed up
the matching phase.</p>
      <p>
        It is interesting to notice that the first run of the MNB TSA group is roughly
the same method as the one they used within the BirdCLEF challenge of the
previous year [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and which achieved the best results (with a MAP1 equal to
0.511 vs. 0.424 this year). This shows that the impact of the increasing difficulty
of the challenge (with twice the number of species) is far from negligible. The
performance loss is notably not compensated by the bagging extension of the
method, which resulted in a MAP1 equal to 0.454 for MNB TSA run 4.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>This paper presented the overview and the results of the 2015 LifeCLEF bird
identification challenge. With more than one hundred registrants,
it showed a high interest of the multimedia and the bio-acoustic communities
in applying their technologies to real-world environmental data such as the ones
collected by Xeno-canto. The main outcome of this evaluation is a snapshot of
the performance of state-of-the-art techniques that will hopefully serve as a
guideline for developers interested in building end-user applications. One
important conclusion of the campaign is that the two best performing methods
were based on matching approaches attempting to construct high-dimensional
representations of the audio recordings based on their matching scores in a large
vocabulary of audio segments. The results of the evaluation clearly show the
superiority of these approaches in terms of effectiveness but also point out the
underlying scalability issues in terms of efficiency. The increasing complexity of
the challenge over the previous year, in terms of the number of species and items,
notably led to a consistent loss of the raw identification performance
despite the progress of the underlying methods. Considering that the number of
bird species on earth is more than 10,000 and that the number of singing
insects is even much larger, we believe it is important to continue working on such
large-scale identification issues in the next years.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Proc.
          <article-title>of Neural Information Processing Scaled for Bioacoustics: from Neurons to Big Data, joint to NIPS (</article-title>
          <year>2013</year>
          ), http://sabiod.univ-tln.fr/NIPS4B2013_book.pdf
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <source>Proc. of the first workshop on Machine Learning for Bioacoustics</source>
          , joint to ICML (
          <year>2013</year>
          ), http://sabiod.univ-tln.fr/ICML4B2013_book.pdf
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Anden</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mallat</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Multiscale scattering for audio classification</article-title>
          .
          <source>In: ISMIR</source>
          . pp.
          <volume>657</volume>
          –
          <issue>662</issue>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bas</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dufour</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glotin</surname>
          </string-name>
          , H.:
          <article-title>Overview of the nips4b bird classification</article-title>
          .
          <source>In: Proc. of Neural Information Processing Scaled</source>
          for
          <article-title>Bioacoustics: from Neurons to Big Data, joint to NIPS</article-title>
          . pp.
          <volume>12</volume>
          –
          <issue>16</issue>
          (
          <year>2013</year>
          ), http://sabiod.univ-tln.fr/NIPS4B2013_book.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Briggs</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lakshminarayanan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neal</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fern</surname>
            ,
            <given-names>X.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raich</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hadley</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hadley</surname>
            ,
            <given-names>A.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Betts</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          :
          <article-title>Acoustic classification of multiple simultaneous bird species: A multi-instance multi-label approach</article-title>
          .
          <source>The Journal of the Acoustical Society of America</source>
          <volume>131</volume>
          ,
          <issue>4640</issue>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cai</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ee</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roe</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Zhang, J.:
          <article-title>Sensor network for the monitoring of ecosystem: Bird species recognition</article-title>
          .
          <source>In: Intelligent Sensors, Sensor Networks and Information</source>
          ,
          <year>2007</year>
          .
          <source>ISSNIP</source>
          <year>2007</year>
          . 3rd International Conference on. pp.
          <volume>293</volume>
          –
          <issue>298</issue>
          (Dec
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dufour</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artieres</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giraudet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Clusterized mel filter cepstral coefficients and support vector machines for bird song identification</article-title>
          .
          <source>In: Soundscape Semiotics - Localization and Categorization</source>
          , Glotin (Ed.) (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Gaston</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.J.</surname>
            ,
            <given-names>O</given-names>
          </string-name>
          <string-name>
            <surname>'Neill</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <source>Automated species identification: why not? Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences</source>
          <volume>359</volume>
          (
          <issue>1444</issue>
          ),
          <volume>655</volume>
          –
          <fpage>667</fpage>
          (
          <year>2004</year>
          ), http://rstb.royalsocietypublishing.org/content/359/1444/655.abstract
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sueur</surname>
          </string-name>
          , J.:
          <article-title>Overview of the 1st int'l challenge on bird classification</article-title>
          .
          <source>In: Proc. of the first workshop on Machine Learning for Bioacoustics</source>
          , joint to ICML. pp.
          <volume>17</volume>
          –
          <issue>21</issue>
          (
          <year>2013</year>
          ), http://sabiod.univ-tln.fr/ICML4B2013_book.pdf
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Goeau, H.,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellinga</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Lifeclef bird identification task 2014</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Champ</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buisson</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Shared nearest neighbors match kernel for bird songs identification - lifeclef 2015 challenge</article-title>
          . In:
          <source>Working notes of CLEF 2015 conference</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Goeau, H.,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bakic</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barbe</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Selmi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yahiaoui</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carre</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mouysset</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Molino</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          , et al.:
          <article-title>Interactive plant identification based on social image data</article-title>
          .
          <source>Ecological Informatics</source>
          <volume>23</volume>
          ,
          <fpage>22</fpage>
          –
          <lpage>34</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lasseck</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Improved automatic bird identification through decision tree based feature selection and bagging</article-title>
          .
          In:
          <source>Working notes of CLEF 2015 conference</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schoenberger</surname>
            ,
            <given-names>R.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shiozawa</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Contour matching for a fish recognition and migration-monitoring system</article-title>
          .
          In:
          <source>Optics East</source>
          . pp.
          <fpage>37</fpage>
          –
          <lpage>48</lpage>
          .
          <publisher-name>International Society for Optics and Photonics</publisher-name>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Meza</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Espino-Gamez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solano</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villarreal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Mokhov</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          :
          <article-title>Study of best algorithm combinations for speech processing tasks in machine learning using median vs. mean clusters in marf</article-title>
          .
          In:
          <source>Proceedings of the 2008 C3S2E conference</source>
          . pp.
          <fpage>29</fpage>
          –
          <lpage>43</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Stowell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Birdclef 2015 submission: Unsupervised feature learning from audio</article-title>
          .
          In:
          <source>Working notes of CLEF 2015 conference</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Stowell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plumbley</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          :
          <article-title>Automatic large-scale classification of bird sounds is strongly improved by unsupervised feature learning</article-title>
          .
          <source>arXiv preprint arXiv:1405.6524</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Towsey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Planitz</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nantes</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wimmer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roe</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A toolbox for animal call recognition</article-title>
          .
          <source>Bioacoustics</source>
          <volume>21</volume>
          (
          <issue>2</issue>
          ),
          <fpage>107</fpage>
          –
          <lpage>125</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Trifa</surname>
            ,
            <given-names>V.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kirschel</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>C.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vallejo</surname>
            ,
            <given-names>E.E.</given-names>
          </string-name>
          :
          <article-title>Automated species recognition of antbirds in a mexican rainforest using hidden markov models</article-title>
          .
          <source>The Journal of the Acoustical Society of America</source>
          <volume>123</volume>
          ,
          <fpage>2424</fpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Wheeler</surname>
            ,
            <given-names>Q.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raven</surname>
            ,
            <given-names>P.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilson</surname>
            ,
            <given-names>E.O.</given-names>
          </string-name>
          :
          <article-title>Taxonomy: Impediment or expedient?</article-title>
          <source>Science</source>
          <volume>303</volume>
          (
          <issue>5656</issue>
          ),
          <fpage>285</fpage>
          (
          <year>2004</year>
          ), http://www.sciencemag.org/content/303/5656/285.short
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>