<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LifeCLEF Bird Identification Task 2016</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Herve Goeau</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Herve Glotin</string-name>
          <email>glotin@univ-tln.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Willem-Pier Vellinga</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Planque</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexis Joly</string-name>
          <email>alexis.joly@inria.fr</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aix Marseille Univ., ENSAM, CNRS LSIS, Univ. Toulon, Institut Univ. de France</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IRD, UMR AMAP</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Inria ZENITH team</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>LIRMM</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Xeno-canto Foundation</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The LifeCLEF bird identification challenge provides a large-scale testbed for the system-oriented evaluation of bird species identification based on audio recordings. One of its main strengths is that the data used for the evaluation are collected through Xeno-canto, the largest network of bird sound recordists in the world. This makes the task closer to the conditions of a real-world application than previous, similar initiatives. The main novelty of the 2016 edition of the challenge was the inclusion of soundscape recordings in addition to the usual Xeno-canto recordings that focus on a single foreground species. This paper reports the methodology of the evaluation, an overview of the systems tested by the 6 participating research groups, and a synthetic analysis of the results obtained.</p>
      </abstract>
      <kwd-group>
        <kwd>LifeCLEF</kwd>
        <kwd>bird</kwd>
        <kwd>song</kwd>
        <kwd>call</kwd>
        <kwd>species</kwd>
        <kwd>retrieval</kwd>
        <kwd>audio</kwd>
        <kwd>collection</kwd>
        <kwd>identification</kwd>
        <kwd>fine-grained classification</kwd>
        <kwd>evaluation</kwd>
        <kwd>benchmark</kwd>
        <kwd>bioacoustics</kwd>
        <kwd>ecological monitoring</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Accurate knowledge of the identity, the geographic distribution and the
evolution of bird species is essential for the sustainable development of humanity as
well as for biodiversity conservation. The general public as well as professionals
such as park rangers, ecological consultants and of course ornithologists
themselves are potential users of an automated bird identification system, typically in
the context of wider initiatives related to ecological surveillance or biodiversity
conservation. The LifeCLEF Bird challenge proposes to evaluate the
state of the art of audio-based bird identification systems at a very large scale. Before
LifeCLEF started in 2014, three previous initiatives on the evaluation of acoustic
bird species identification took place, including two from the SABIOD group
(Scaled Acoustic Biodiversity, http://sabiod.univ-tln.fr)
[
        <xref ref-type="bibr" rid="ref1 ref3 ref4">4,3,1</xref>
        ]. In collaboration with the organizers of these previous challenges, the
BirdCLEF 2014, 2015 and 2016 challenges went one step further by (i) significantly
increasing the number of species by an order of magnitude, (ii) working on
real-world social data built from thousands of recordists, and (iii) moving to a more
usage-driven and system-oriented benchmark by allowing the use of meta-data
and defining information-retrieval-oriented metrics. Overall, the task is much
more difficult than previous benchmarks because of the higher confusion risk
between the classes, the higher background noise and the higher diversity in
the acquisition conditions (different recording devices, diversity of contexts, etc.).
It therefore produces substantially lower scores and offers a better progression
margin towards building real-world generalist identification tools.
      </p>
      <p>
The main novelty of the 2016 edition of the challenge with respect to the
two previous years was the inclusion of soundscape recordings in addition to the
usual Xeno-canto recordings that focus on a single foreground species (usually
thanks to mono-directional recording devices). Soundscapes, on the other hand,
are generally based on omnidirectional recording devices that continuously
monitor a specific environment over a long period. This new kind of recording is
better suited to the (possibly crowdsourced) passive acoustic monitoring scenario, which
could increase the number of collected records by several orders of magnitude.
In this paper, we report the methodology of the evaluation as well as
a synthetic analysis of the results achieved by the 6 participating groups.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Dataset</title>
      <p>
        The training and test data of the challenge consist of audio recordings
collected by Xeno-canto (XC). Xeno-canto is a web-based community of bird sound
recordists worldwide with about 3,000 active contributors who have already
collected more than 300,000 recordings of about 9,550 species (numbers for June
2016). Nearly 1,000 (in fact 999) species were used in the BirdCLEF dataset,
representing the 999 species with the highest number of recordings in October 2014
(14 or more) from the combined area of Brazil, French Guiana, Surinam, Guyana,
Venezuela and Colombia, totalling 33,203 recordings produced by thousands of
users. This dataset includes the entire dataset from the 2015 BirdCLEF challenge
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which contained about 33,000 recordings. The test data newly introduced in
2016 contains 925 soundscapes provided by 7 Xeno-canto members, sometimes
working in pairs. Most of the soundscapes are about 10
minutes long, and each often comes from a set of 10-12 successive recordings made at one
location. The total duration of the new test data to process and analyse is thus
equivalent to approximately 6 days of continuous sound recording. The number
of known species (i.e. belonging to the 999 species in the training dataset) varies
from 1 to 25 per soundscape, with an average of 10.1 species per soundscape.
      </p>
      <p>
        To avoid any bias in the evaluation related to the audio devices used, each
audio file was normalized to a constant sampling rate of 44.1 kHz and coded with 16
bits in mono WAV format (the right channel is selected by default). The conversion
from the original Xeno-canto data set (http://www.xeno-canto.org/) was done using ffmpeg, sox and MATLAB
scripts. The optimized 16 Mel Filter Cepstrum Coefficients for bird identification
(according to an extended benchmark [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) were computed with their first
and second temporal derivatives on the whole set. They were used in the best
systems run in the ICML4B and NIPS4B challenges. However, due to some technical
limitations, the soundscapes were not normalized and were provided directly to the
participants in mp3 format (as shared on the Xeno-canto website, the original raw
files not being available).
      </p>
      <p>All audio records are associated with various meta-data including the species
name of the most active singing bird, the species of the other birds audible in the
background, the type of sound (call, song, alarm, flight, etc.), the date and
location of the observations (from which rich statistics on species distribution may
be derived), some textual comments by the authors, multilingual common names
and collaborative quality ratings. All of them were produced collaboratively by
the Xeno-canto community.</p>
    </sec>
    <sec id="sec-3">
      <title>Task Description</title>
      <p>Participants were asked to determine all the actively singing bird species in each
query file. It was forbidden to correlate the test set of the challenge with the
original annotated Xeno-canto database (or with any external content, as many of
the recordings are circulating on the web). The whole data set was split in two parts, one for
training (and/or indexing) and one for testing. The test set was composed of (i)
all the newly introduced soundscape recordings and (ii) the entire test set used
in 2015 (equal to about 1/3 of the observations in the whole 2015 dataset). The
training set was exactly the same as the one used in 2015 (i.e. the remaining 2/3
of the observations). Note that recordings of the same species made by the same
person on the same day are considered as being part of the same observation
and cannot be split across the test and training sets. The XML files containing
the meta-data of the query recordings were purged so as to erase the taxon name
(the ground truth), the vernacular name (common name of the bird) and the
collaborative quality ratings (that would not be available at query stage in a
real-world mobile application). Meta-data of the recordings in the training set
were kept unaltered.</p>
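      <p>The observation-level constraint described above can be sketched as follows. This is only an illustration, not the organizers' actual procedure: the function name <monospace>split_by_observation</monospace>, the dictionary keys <monospace>species</monospace>, <monospace>recordist</monospace> and <monospace>date</monospace>, and the random assignment of observations to the test set are all assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import random
from collections import defaultdict


def split_by_observation(recordings, test_fraction=1 / 3, seed=0):
    """Split recordings so that one observation never spans train and test.

    An observation is assumed to be identified by the triple
    (species, recordist, date); these key names are hypothetical.
    """
    # Group the recordings by observation.
    groups = defaultdict(list)
    for rec in recordings:
        key = (rec["species"], rec["recordist"], rec["date"])
        groups[key].append(rec)

    # Shuffle observations (not individual recordings) reproducibly.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    # Send about test_fraction of the observations to the test set.
    n_test = round(len(keys) * test_fraction)
    test_keys = set(keys[:n_test])

    train, test = [], []
    for key in keys:
        (test if key in test_keys else train).extend(groups[key])
    return train, test
```
]]></preformat>
      <p>Because whole observations are assigned to one side or the other, the resulting proportions hold for observations rather than for individual recordings.</p>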
      <p>The groups participating in the task were asked to produce up to 4 runs,
each containing a ranked list of the most probable species for each query record of
the test set. Each species was associated with a normalized score in the range
[0, 1] reflecting the likelihood that this species is singing in the sample. For each
submitted run, participants had to state whether the run was performed fully
automatically or with human assistance in the processing of the queries, and whether they used
a method based only on audio analysis or one that also used the meta-data.</p>
      <p>The primary metric used was the mean Average Precision (mAP) averaged
across all queries, considering each audio file of the test set as a query and
computed as:
        <disp-formula><tex-math>\mathrm{mAP} = \frac{\sum_{q=1}^{Q} \mathrm{AveP}(q)}{Q},</tex-math></disp-formula>
where Q is the number of test audio files and AveP(q) for a given test file q is
computed as
        <disp-formula><tex-math>\mathrm{AveP}(q) = \frac{\sum_{k=1}^{n} \left( P(k) \times \mathrm{rel}(k) \right)}{\text{number of relevant documents}}.</tex-math></disp-formula>
Here k is the rank in the sequence of returned species, n is the total number of
returned species, P(k) is the precision at cut-off k in the list and rel(k) is an
indicator function equaling 1 if the item at rank k is a relevant species (i.e. one
of the species in the ground truth).</p>
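      <p>As an illustration, the metric defined above can be computed as in the following minimal sketch. The function names, the data layout (a ranked list of species per query and a set of ground-truth species per query) and the convention that a query without ground truth scores 0 are assumptions made for the example; this is not the official evaluation script.</p>
      <preformat><![CDATA[
```python
from typing import Dict, Sequence, Set


def average_precision(ranked_species: Sequence[str], relevant: Set[str]) -> float:
    """AveP(q): sum over ranks k of P(k) * rel(k), divided by the number
    of relevant species. Assumes the ranked list has no duplicates."""
    if not relevant:
        return 0.0  # assumption: a query without ground truth scores 0
    hits = 0
    total = 0.0
    for k, species in enumerate(ranked_species, start=1):
        if species in relevant:  # rel(k) = 1
            hits += 1
            total += hits / k    # P(k): precision at cut-off k
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, Sequence[str]],
                           truth: Dict[str, Set[str]]) -> float:
    """mAP: AveP averaged over the Q queries listed in the ground truth."""
    if not truth:
        return 0.0
    scores = [average_precision(runs.get(q, []), rel) for q, rel in truth.items()]
    return sum(scores) / len(scores)
```
]]></preformat>
      <p>For a query whose ground truth contains two species returned at ranks 1 and 3, this gives AveP = (1/1 + 2/3) / 2 = 5/6.</p>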
    </sec>
    <sec id="sec-4">
      <title>Participants and methods</title>
      <p>
        84 research groups worldwide registered for the task and downloaded the data
(from a total of 130 groups that registered for at least one of the three
LifeCLEF tasks). This shows the high attractiveness of the challenge in both the
multimedia community (presumably interested in several tasks) and in the
audio and bioacoustics community (presumably registered only for the bird song
task). In the end, 6 of the registrants crossed the finish line by submitting runs and
5 of them submitted working notes explaining their runs in detail. We list them
hereafter in alphabetical order and give a brief overview of the techniques they
used in their runs. We would like to point out that the LifeCLEF benchmark is
a system-oriented evaluation and not a deep or fine-grained evaluation of the underlying
algorithms. Readers interested in the scientific and technical details of the
implemented methods should refer to the LifeCLEF 2016 working notes or to the
research papers of each participant (referenced below).
      </p>
      <p>
BME TMIT, Hungary, 4 runs [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]: BME TMIT is one of the three teams, along with CUBE and WUT,
that used a Convolutional Neural Network. As
pre-processing, they first downsampled each audio file to 16 kHz and
applied a low-pass filter with a cutoff frequency of 6250 Hz in order to reduce the
size of the training data. They then subdivided the spectrograms into cells of 0.5
seconds x 10 frequency bands, and removed the cells carrying little information
(according to the mean and variance). After these preprocessing steps, they
assembled and re-split the remaining parts of the spectrograms into five-second-long
pieces, and obtained arrays of 200 x 310 (where 310 samples correspond to five
seconds), used as input of the CNN. They used two distinct CNN architectures:
the well-known AlexNet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] with the addition of batch normalisation (runs 1 and
2), and a CNN more inspired by audio recognition systems, based on 4
convolutional layers, one fully connected layer, ReLU activation functions and batch
normalisation (runs 3 and 4).
      </p>
      <p>
        CUBE, Switzerland, 4 runs: This system is based on a CNN architecture
of 5 convolutional layers, each combined with a rectifier (ReLU) activation function
and followed by a max-pooling layer. Based on spectrogram analysis and
morphological operations, silent and noisy parts were first detected and separated from
the call and song parts. Spectrograms were then split into chunks of 3 seconds
that were used as inputs of the CNN after several data augmentation techniques.
Each chunk identified as a singing bird was first concatenated with 3 randomly
selected chunks of background noise. Time shift, pitch shift and mixes of audio
files from the same species were then used as complementary data augmentation
techniques. For a given test record, the predictions from all its distinct chunks
are finally averaged. Run 1 was an intermediate result obtained after only one
day of training. Run 2 differs from run 3 by using spectrograms 50% smaller in
(pixel) size, which doubles the batch size and thus allows more iterations
in the same training time (4 days). Run 4 is the average of the predictions from runs
2 and 3 and reaches the best performance, showing the benefit of bagging.
      </p>
      <p>
DYNI LSIS, France, 1 run [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]: The algorithm presented here is quite
standard and was initially used on smaller datasets to improve, in a late fusion
scheme, a classifier based on pairs of spectrogram peaks, described in the context
of audio fingerprinting. The method is based on the bag-of-words approach: first,
the 44.1 kHz audio files were split into 0.2 s segments with 50% overlap, and only
the segments having energy values higher than a threshold relative to the whole audio
file and spectral flatness values smaller than an absolute threshold were
kept for Mel Frequency Cepstral Coefficient (MFCC) computation. A k-means
clustering with k=500 was then performed on all the MFCCs and their derivatives, in
order to extract for every file the normalized histogram of MFCC-based words
(i.e. the 500 clusters), using only the segments kept in the filtering step. The resulting feature
vectors were then fed to a random forest classifier.
      </p>
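      <p>The histogram step of the bag-of-words approach described above can be sketched as follows. This is an illustrative sketch only: it assumes that the k-means centroids have already been trained and that each kept segment is represented by one feature vector (MFCCs and their derivatives). The function names and the plain-list data layout are hypothetical and are not taken from the participant's system.</p>
      <preformat><![CDATA[
```python
from typing import List, Sequence


def nearest_centroid(vector: Sequence[float],
                     centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to `vector` (squared Euclidean distance)."""
    best_index, best_dist = 0, float("inf")
    for i, centroid in enumerate(centroids):
        dist = sum((a - b) ** 2 for a, b in zip(vector, centroid))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bag_of_words_histogram(frames: Sequence[Sequence[float]],
                           centroids: Sequence[Sequence[float]]) -> List[float]:
    """Normalized histogram of 'audio words' for one file.

    `frames` holds the feature vectors of the segments kept for this file;
    `centroids` holds the cluster centres (500 of them in the text above).
    """
    counts = [0] * len(centroids)
    for frame in frames:
        counts[nearest_centroid(frame, centroids)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(centroids)  # file with no kept segment
    return [c / total for c in counts]
```
]]></preformat>
      <p>Each file is thus mapped to a fixed-length vector regardless of its duration, which is what allows a standard classifier such as a random forest to be trained on it.</p>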
      <p>
        MNB TSA, Germany, 4 runs [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]: As in 2014 and 2015, this participant
used two kinds of hand-crafted features: parametric acoustic features and probabilities of
species-specific spectrogram segments in a template matching approach. Long segments
extracted during BirdCLEF 2015 were re-segmented with a more sensitive
algorithm. The segments were then used to extract Segment-Probabilities for each file
by calculating the maxima of the normalized cross-correlation between all
segments and the target spectrogram image via template matching. Due to the very
large amount of audio data, not all files were used as a source for segmentation
(i.e. only good quality files without background species were used). The
classification problem was then formulated as a multi-label regression task solved by
training ensembles of randomized decision trees with probabilistic outputs. The
training was performed in 2 passes, one selecting a small subset of the most
discriminant features by optimizing the internal mAP score on the training set, and
one training the final classifiers on the selected features. Run 1 used one single
model on a small but highly optimized selection of Segment-Probabilities. A
bagging approach was also used, which consisted in calculating further Segment-Probabilities
from additional segments and combining them by blending (24 models
in Run 3). Run 4 also used blending to aggregate model predictions, but only
those predictions were included that, after blending, resulted in the highest possible
mAP score calculated on the entire training set (13 models including the best
model from 2015).
      </p>
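      <p>To make the Segment-Probability feature described above more concrete, the following sketch computes the maximum of a normalized cross-correlation between one template segment and a target spectrogram. It is a deliberately naive, pure-Python illustration under stated assumptions: the zero-mean variant of the normalized cross-correlation is used, spectrograms are plain lists of rows, and the function name <monospace>max_normalized_cross_correlation</monospace> is hypothetical. The participant's actual implementation may differ.</p>
      <preformat><![CDATA[
```python
import math
from typing import Sequence


def max_normalized_cross_correlation(image: Sequence[Sequence[float]],
                                     template: Sequence[Sequence[float]]) -> float:
    """Slide `template` over `image` and return the best zero-mean
    normalized cross-correlation score (between -1 and 1).

    Returns -1.0 if the template does not fit in the image or if no
    window has any variance.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])

    # Centre the template once and compute its norm.
    t_vals = [v for row in template for v in row]
    t_mean = sum(t_vals) / len(t_vals)
    t_dev = [v - t_mean for v in t_vals]
    t_norm = math.sqrt(sum(d * d for d in t_dev))

    best = -1.0
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            # Centre the current image window in the same way.
            w_vals = [image[y + dy][x + dx] for dy in range(th) for dx in range(tw)]
            w_mean = sum(w_vals) / len(w_vals)
            w_dev = [v - w_mean for v in w_vals]
            w_norm = math.sqrt(sum(d * d for d in w_dev))
            if t_norm == 0.0 or w_norm == 0.0:
                continue  # flat template or flat window: score undefined
            score = sum(a * b for a, b in zip(t_dev, w_dev)) / (t_norm * w_norm)
            best = max(best, score)
    return best
```
]]></preformat>
      <p>Computing this maximum for every template segment yields one feature per segment for each audio file, which is the kind of vector that can then be passed to the decision-tree ensembles.</p>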
      <p>
        WUT, Poland, 4 runs [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]: Like the Cube and BME TMIT teams, they
used a Convolutional Neural Network learning framework. Starting from
denoised spectrograms, silent parts were removed with percentile thresholding,
giving around 86,000 training segments of varying length, each associated
with a single main species. As a data augmentation technique and to
fit the 5-second fixed input size of the CNN, segments were adjusted by
either trimming or padding. The first 3 runs were produced with successively deeper
and/or wider filters. Run 4 is an ensemble of neural
networks averaging the predictions of the first 3 runs.
      </p>
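      <p>Two of the generic operations mentioned above, adjusting a segment to a fixed input length and averaging the predictions of several models, can be sketched as follows. The function names, the choice of trimming from the end and of padding with a constant value are assumptions made for the illustration; the source does not specify how trimming and padding were actually done.</p>
      <preformat><![CDATA[
```python
from typing import List, Sequence


def fit_to_length(frames: Sequence[float], target_len: int,
                  pad_value: float = 0.0) -> List[float]:
    """Trim or pad a sequence of frames to exactly `target_len` items.

    Assumption: trimming keeps the beginning, padding appends `pad_value`.
    """
    if len(frames) >= target_len:
        return list(frames[:target_len])
    return list(frames) + [pad_value] * (target_len - len(frames))


def average_predictions(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Ensemble by averaging per-species scores across models.

    `predictions` holds one score vector per model, all in the same
    species order and of the same length.
    """
    n_models = len(predictions)
    return [sum(scores) / n_models for scores in zip(*predictions)]
```
]]></preformat>
      <p>The same averaging function also covers the case where the score vectors come from several chunks of one recording rather than from several models.</p>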
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>
        Figure 1 reports the performance measured for the 18 submitted runs. For
each run (i.e. each evaluated system), we report the overall mean Average
Precision (official metric) as well as the mAP for the two categories of queries: the
soundscape recordings (newly introduced) and the common observations (the
same as the ones used in 2015). To measure the progress over last year, we also
plot on the graph the performance of last year's best system [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] (orange dotted
line). The first noticeable conclusion is that, after two years of resistance of bird
song identification systems based on engineered features, convolutional neural
networks finally managed to outperform them (as in many other domains). The
best CNN-based run (Cube Run 4) actually reached an impressive mAP of
0.69 on the 2015 testbed, to be compared to 0.45 and 0.58 respectively for the
best systems based on hand-crafted features evaluated in 2015 and 2016. To our
knowledge, BirdCLEF is the first comparative study reporting such a large
performance gap in large-scale bioacoustic classification. A second important
remark is that this performance of CNNs was achieved without any fine-tuning,
contrary to most computer vision challenges in which the CNN is generally
pretrained on a large training set such as ImageNet. Thus, we could hope for even
better performance, e.g. by transferring knowledge from other bioacoustic contexts
or other domains. That said, it is important to notice that the other systems based
on CNNs (WUT and BME TMIT) did not perform as well as the Cube system
and did not outperform the TSA system based on hand-crafted features.
Looking at the detailed description of the three CNN architectures and their
learning frameworks, it appears that the way in which audio segment extraction
and data augmentation are performed plays a crucial role. The Cube system
notably includes a randomized background noise addition phase, which makes it
much more robust to the diversity of noise encountered in the test data.
      </p>
      <p>
If we now look at the scores achieved by the evaluated systems on the soundscape
recordings only (yellow plot), we can draw very different conclusions. First of
all, we can observe that the performance on the soundscapes is much lower than
on the classical queries, whatever the system. Although the classical recordings
also include multiple species singing in the background, the soundscapes appear
to be much more challenging. Several tens of species, and many more
individual birds, can actually be singing simultaneously. Separating all these sources
seems to be beyond the scope of state-of-the-art audio representation learning
methods. Interestingly, the best system on the soundscape queries was the one
of TSA, based on the extraction of very short species-specific spectrogram
segments and a template matching approach. This very fine-grained approach allows
the extracted audio patterns to be more robust to the species overlap problem.
By contrast, the CNNs of the Cube and WUT systems were optimized for the
mono-species segment classification problem. In particular, the data augmentation method of
the Cube system was designed only for the single-species case. It
addressed the problem of several individual birds of the same species singing
together (by mixing different segments of the same class) but it did not address
the multi-label issue (i.e. several species singing simultaneously).
      </p>
      <p>
To study in more detail how the identification performance varies across
the diversity of species, Figure 2 presents the scores achieved by the best system
of each team on a selection of 3x10 species: (i) the 10 best recognized ones
(according to the performance of the best system, Cube Run 4), (ii) 10 species of
intermediate difficulty and (iii) the 10 worst recognized ones (still based on the
performance of Cube Run 4). For a better interpretation of the chart, we also
included, for each of the 30 selected species, the number of audio recordings in
the training set (ranging from 10 to 37 recordings). The graph first shows that
there is a huge performance gap between the best recognized species and the
worst cases. Some species are actually perfectly classified by 4 of the 6 systems
whereas some others are never recognized by any of the systems. Interestingly,
one can see that the performance does not seem to be correlated with the number
of training samples. In the same way, we observed that it is not correlated
with the average length of the recordings in the class. This means that the high
variability in performance is more related to other factors such as (i) the variability of bird
sounds (some birds are more audible than others), (ii) the acquisition
difficulty (some birds are easier to record than others), and (iii) the degree of
confusion across close species. Another interesting remark is that two of the species
that are not recognized at all by the CNNs are comparatively well
recognized by the template matching kernel approach of MNB TSA. Thus, it would
be interesting to study in more detail the kind of audio patterns that have been
matched by their method, so as to understand what the CNNs missed and how
such patterns could be automatically learned as well.
      </p>
      <p>
This paper presented an overview and the results of the LifeCLEF bird
identification challenge 2016. The main outcome was that, after two years of resistance
of bird song identification systems based on engineered features, convolutional
neural networks finally managed to outperform them by a significant margin.
It is noticeable that the best performing CNN did not use any fine-tuning, so
that it did not benefit from the transfer learning capacities of such techniques.
We could thus expect even better performance. Also, the CNN
architecture used was mostly inspired by the ones that perform best on computer vision
tasks. Our detailed analysis of the results tends to show that some audio
patterns might not be learned accurately by such networks whereas they are
detected by template matching techniques. In any case, it is clear that, as
in many domains beforehand, deep learning is redefining the boundaries of the
state of the art and opens the door to further progress in the coming years.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Briggs</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raich</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eftaxias</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , et al.,
          <string-name>
            <surname>Z.L.</surname>
          </string-name>
          :
          <article-title>The 9th MLSP competition: New methods for acoustic classification of multiple simultaneous bird species in a noisy environment</article-title>
          .
          <source>In: IEEE Workshop on Machine Learning for Signal Processing (MLSP)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dufour</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Artieres</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giraudet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Clusterized mel filter cepstral coefficients and support vector machines for bird song identification</article-title>
          .
          <source>In: Soundscape Semiotics - Localization and Categorization</source>
          , Glotin (Ed.) (
          <year>2014</year>
          ), http://www.intechopen.com/books/soundscape-semiotics-localisation-and-categorisation
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dugan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halkias</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sueur</surname>
          </string-name>
          , J.:
          <article-title>Bioacoustic challenges in icml4b</article-title>
          .
          <source>In: Proc. of 1st workshop on Machine Learning for Bioacoustics. No. USA, ISSN 979-10-90821-02-6</source>
          (
          <year>2013</year>
          ), http://sabiod.org/ICML4B2013_proceedings.pdf
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dufour</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bas</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Overview of the 2nd challenge on acoustic bird classification</article-title>
          .
          <source>In: Proc. Neural Information Processing Scaled for Bioacoustics. NIPS Int. Conf</source>
          ., Ed. Glotin H., LeCun Y.,
          <string-name>
            <surname>Artieres</surname>
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mallat</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tchernichovski</surname>
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halkias</surname>
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>USA</surname>
          </string-name>
          (
          <year>2013</year>
          ), http://sabiod.univ-tln.fr/nips4b
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Goeau, H.,
          <string-name>
            <surname>Glotin</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vellinga</surname>
            ,
            <given-names>W.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Planque</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rauber</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>LifeCLEF bird identification task 2015</article-title>
          . In: CLEF working notes
          <year>2015</year>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
          </string-name>
          , G.E.:
          <article-title>ImageNet classification with deep convolutional neural networks</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>1097</fpage>
          -
          <lpage>1105</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lasseck</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Improved automatic bird identification through decision tree based feature selection and bagging</article-title>
          .
          <source>In: Working notes of CLEF 2015 conference (</source>
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lasseck</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Improving bird identification using multiresolution template matching and feature selection during training</article-title>
          .
          <source>In: Working notes of CLEF conference</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Piczak</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Recognizing bird species in audio recordings using deep convolutional neural networks</article-title>
          .
          <source>In: Working notes of CLEF 2016 conference (</source>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ricard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glotin</surname>
          </string-name>
          , H.:
          <article-title>Bag of MFCC-based words for bird identification</article-title>
          .
          <source>In: Working notes of CLEF 2016 conference (</source>
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Toth</surname>
            ,
            <given-names>B.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Czeba</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for large-scale bird song classification in noisy environment</article-title>
          .
          <source>In: Working notes of CLEF conference</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>