<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Domain Adaptation for Birdcall Recognition: Progressive Knowledge Distillation with Semi-Supervised and Self-Supervised Soundscape Labeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lihang Hong</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Accenture Japan Ltd</institution>
          ,
          <addr-line>Akasaka Intercity 1-11-44 Akasaka, Minato-ku, Tokyo, 107-8672</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <abstract>
        <p>We present working notes for the BirdCLEF 2024 competition, which focuses on recognizing Indian bird species in soundscapes recorded in the Western Ghats. In this study, we first utilize existing off-the-shelf models, BirdNET and the Bird Vocalization Classifier, to address labeling challenges for training soundscapes from the same recording locations as the test soundscapes. Second, with the semi-supervised labeled soundscapes, we execute a cycle of knowledge distillation training, self-supervised re-labeling, and knowledge distillation training again. Our goal is to address the challenge of domain shift between train audio, which focuses on a single species, and test soundscapes, and to maximize model performance. The solution based on this study achieves 7th rank among 974 teams in the BirdCLEF 2024 challenge hosted on Kaggle.</p>
      </abstract>
      <kwd-group>
        <kwd>BirdCLEF2024</kwd>
        <kwd>audio</kwd>
        <kwd>bird species recognition</kwd>
        <kwd>Semi-supervised</kwd>
        <kwd>Self-supervised</kwd>
        <kwd>Knowledge Distillation</kwd>
        <kwd>Domain Adaptation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The rapid decline in global biodiversity has become a significant concern in recent years, putting
numerous species at risk of extinction and threatening the stability of ecosystems. As birds serve as
important indicators of biodiversity change, monitoring their populations is essential. Traditional bird
surveys, which primarily rely on direct observation and human expertise, can be resource-intensive and
face logistical challenges when applied at large scales and high temporal resolutions. This highlights
the need for more efficient, scalable, and cost-effective methods to monitor bird populations.
Advancements in passive acoustic monitoring (PAM) technology, combined with innovative machine learning
algorithms, present a promising solution to these challenges.</p>
      <p>
        The Western Ghats are a biodiversity hotspot, home to diverse ecosystems and bird species, including
those that are endemic and endangered. However, these ecosystems are threatened by landscape and
climate changes. The aim of BirdCLEF 2024[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is to develop conservation technologies to carry out
automated detection and classification of bird species of the Western Ghats from soundscapes.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Domain Shift Challenge in Birdcall Recognition</title>
      <p>
        The BirdCLEF 2024 competition focuses on recognizing Indian bird species in fully annotated 4-minute
test soundscapes recorded in the Western Ghats, which we call the fully-annotated dataset. Two types of
dataset are provided for training. One dataset, which we call the weakly labeled dataset, comprises
short audios with ground-truth labels from Xeno-canto[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The other dataset, which we call the unlabeled
dataset, comprises soundscapes without ground-truth labels, recorded in the same locations as the
fully-annotated dataset.
      </p>
      <p>
        A model trained on the weakly labeled dataset faces domain shift challenges when predicting on the
fully-annotated dataset[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The challenges are:
1. Covariate shift. Short audios usually focus on one certain species, and the bird call appears in
the foreground. In soundscapes, however, several species usually call over each
other in the background. Making a classification model trained on short audios applicable to
soundscapes is very important, because scientists need to identify birds recorded in a relatively
noisy environment, while short audios are cost-effective as training data.
2. Label shift. Label shift can occur for a variety of reasons, such as seasonal variations in bird
species and geographical disparities. For instance, the short audios may include a higher proportion
of certain bird species that are not as prevalent in the fully-annotated dataset. The implications of
label shift are significant, as it can lead to biased predictions and poor model performance. If the
model is trained on a dataset with a high proportion of certain bird species, it might over-predict
these species in the fully-annotated dataset. Conversely, it might under-predict species that were less
prevalent in the training audio but more common in the fully-annotated dataset.
      </p>
      <p>Under the hypothesis that the unlabeled dataset shares a similar distribution with the fully-annotated dataset,
we focus our efforts on addressing the domain shift challenge by labeling the unlabeled dataset with
semi-supervised and self-supervised approaches. After labeling the unlabeled dataset, we train the model on the
union of the weakly labeled dataset and the unlabeled dataset.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <sec id="sec-3-1-1">
          <title>3.1.1. Short Audio from Xeno-canto</title>
          <p>
            As in previous BirdCLEF challenges, training data is provided by the Xeno-canto community. 24459
short audios covering 182 species are provided by the competition host. To further expand the dataset
size, we collect an additional 25710 short audios from the Xeno-canto community. For pretraining, audios from
previous BirdCLEF challenges were included [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ][
            <xref ref-type="bibr" rid="ref6">6</xref>
            ][
            <xref ref-type="bibr" rid="ref7">7</xref>
            ][
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. The total dataset size was 234104 audios covering
992 species. We call the short audios from Xeno-canto the weakly labeled dataset.
          </p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Semi-supervised Labeled Soundscape</title>
          <p>
            In addition to the weakly labeled dataset, 8444 unlabeled soundscapes recorded in the same locations as
the fully-annotated dataset are provided by the competition host, which we call the unlabeled dataset. We
utilize existing off-the-shelf models, BirdNET[
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] and Bird Vocalization Classifier[
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], to extract audio
clips with a high probability of birdcall presence. We call audio clips extracted from soundscapes with
BirdNET and the Bird Vocalization Classifier the semi-supervised unlabeled dataset.
          </p>
          <p>BirdNET is able to predict the presence of all competition species except the Nilgiri Wood Pigeon, while the Bird
Vocalization Classifier is able to predict the presence of the Nilgiri Wood Pigeon. We process every soundscape
using BirdNET to extract a 181-dimensional prediction logit vector for every 3-second interval, and
using the Bird Vocalization Classifier to extract a 1-dimensional prediction logit vector for every 5-second
interval. From the prediction logits, we extract 15-second audio clips with a birdcall presence probability
larger than 30 percent.</p>
          <p>34829 audio clips are extracted from 5162 soundscapes. A comparison of the species distribution between
the weakly labeled dataset and the semi-supervised unlabeled dataset is shown in Figure 1.</p>
          <p>As we can see in Figure 1, the species distribution of the weakly labeled dataset differs from that of the
semi-supervised unlabeled dataset, indicating the existence of label shift between the weakly labeled dataset and
the fully-annotated dataset, under the hypothesis that the unlabeled dataset shares a similar distribution with the
fully-annotated dataset.</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Self-supervised Labeled Soundscape</title>
          <p>After training models with the weakly labeled dataset and the semi-supervised unlabeled dataset, we utilize
the trained models to further extract audio clips with a high probability of birdcall presence. We call audio
clips extracted from soundscapes with the trained models the self-supervised unlabeled dataset.</p>
          <p>67260 audio clips are extracted from 6654 soundscapes. As we can see in Figure 1, the self-supervised
unlabeled dataset shares a similar species distribution with the semi-supervised unlabeled dataset.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Training Details</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Model Architecture</title>
          <p>
            We use two types of model architecture from our work in BirdCLEF 2023[
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. One is a Sound Event
Detection model[
            <xref ref-type="bibr" rid="ref12">12</xref>
            ], which we call the SED model. The other is a CNN with a simple pooling layer, which we
call the Custom CNN[13][14]. Details of the Mel-spectrogram parameters for each model are shown in Table 1.
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Knowledge Distillation and Temperature</title>
          <p>Knowledge distillation is a technique used in deep learning where a smaller, simpler model (the
student model) is trained to mimic the behavior of a larger, more complex model (the teacher model)
[15]. The goal is to transfer knowledge from the teacher, which may be impractical to use in
real-world applications that require fast predictions due to its complexity, to the student. The key idea
behind knowledge distillation is to use the output probabilities of the teacher model, known as soft
targets, to train the student model. These soft targets provide more information than just the correct
class labels (hard targets). This additional information helps the student model learn more effectively.</p>
          <p>To transfer knowledge from the off-the-shelf models, we use the prediction logit vectors extracted by
BirdNET and the Bird Vocalization Classifier as soft targets for model training. Using soft targets is also
an effective way to address the challenge of weak labels in the weakly labeled dataset. In the weakly labeled
dataset, we have no information about where the birdcall appears, and there is a chance that the audio
clip does not contain a birdcall when we clip the audio. In that case, the presence probability of the hard
target is still set to 1 for the species, which introduces noise into the training process. In contrast, the
presence probability of the soft target generated by the teacher model is expected to be a value near 0,
which suppresses the noise in the training process.</p>
          <p>In the context of knowledge distillation, the concept of temperature comes into play when generating
soft targets. Temperature is a parameter that smooths out the probability distribution produced by the
teacher model. When the temperature is high, the differences between the probabilities of the different
classes are smaller, making the distribution softer and more informative. When the temperature is low,
the distribution becomes sharper, with one class having a much higher probability than the others. By
using a higher temperature, the student model can learn more nuanced information from the teacher's
predictions.</p>
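          <p>As a quick numeric illustration of this softening effect (a toy example, not tied to the competition models):</p>
          <preformat>
```python
import numpy as np

def softmax_with_temperature(logits, T):
    # Dividing logits by the temperature T before the softmax flattens
    # the resulting distribution: higher T, softer targets.
    z = np.asarray(logits, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

sharp = softmax_with_temperature([4.0, 1.0, 0.0], T=1)   # peaked distribution
soft = softmax_with_temperature([4.0, 1.0, 0.0], T=20)   # near-uniform
```
          </preformat>
          <p>With T = 1 the top class takes almost all the probability mass, while with T = 20 the three classes receive nearly equal probabilities, so the relative confidences of the non-top classes become visible to the student.</p>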
          <p>For our experiments, we found that using a temperature value of 20 provided a good balance, making
the soft targets informative enough to significantly improve the student model’s performance.</p>
          <p>Models are trained with the following loss function:
loss = 0.1 · hard target loss + 0.9 · soft target loss (1)
hard target loss = BCELoss(model prediction, hard target) (2)
soft target loss = KLDivLoss(model prediction / T, soft target / T) · T² (3)
T = 20 (4)</p>
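          <p>This loss can be sketched in NumPy as follows. It is an illustrative re-implementation, not the authors' training code; in particular, using sigmoid probabilities for the hard-target term and temperature-scaled softmax distributions for the soft-target term are assumptions about the setup.</p>
          <preformat>
```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, hard_target, teacher_logits, T=20.0):
    eps = 1e-7
    # hard target loss: binary cross-entropy against the ground-truth labels
    p = 1.0 / (1.0 + np.exp(-student_logits))
    bce = -np.mean(hard_target * np.log(p + eps)
                   + (1.0 - hard_target) * np.log(1.0 - p + eps))
    # soft target loss: KL divergence between temperature-scaled
    # distributions, rescaled by T^2 to keep gradient magnitudes comparable
    s = softmax(student_logits / T)
    t = softmax(teacher_logits / T)
    kl = np.mean(np.sum(t * (np.log(t + eps) - np.log(s + eps)), axis=-1)) * T * T
    return 0.1 * bce + 0.9 * kl
```
          </preformat>
          <p>When the student matches the teacher exactly, the KL term vanishes and only the hard-target term remains; any disagreement with the teacher increases the loss.</p>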
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Sampling Strategy</title>
          <p>To address the challenge of domain shift, we compare the sampling strategies in Table 2 to find the best
sampling strategy for training.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Macro-average ROC-AUC is the metric used in the BirdCLEF 2024 challenge leaderboard, denoted
as LB, which has two variants: public and private. Table 3 presents the experimental results
of models trained with the knowledge distillation method and the unlabeled dataset. In our experiments, adding
both the unlabeled dataset and knowledge distillation to training significantly improves both the Public LB
and Private LB of a single model. In addition, utilizing the self-supervised unlabeled dataset extracted with a
model trained with the semi-supervised unlabeled dataset further improves both the Public LB and Private LB.
Models with different model types and different CNN encoders share similar LB scores.</p>
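      <p>For reference, macro-average ROC-AUC is the per-class ROC-AUC averaged over classes. A minimal sketch, using the rank-sum formulation and ignoring tied scores (the official metric and library implementations such as scikit-learn handle ties by rank averaging), might look like:</p>
      <preformat>
```python
import numpy as np

def binary_roc_auc(y_true, y_score):
    # ROC-AUC via the rank-sum (Mann-Whitney U) formulation;
    # assumes no tied scores for simplicity
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    n_pos = float(y_true.sum())
    n_neg = float(len(y_true) - n_pos)
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def macro_roc_auc(y_true, y_score):
    # average the per-class AUC over classes that have both
    # positive and negative examples
    aucs = []
    for c in range(y_true.shape[1]):
        col = y_true[:, c]
        n_pos = col.sum()
        if n_pos > 0 and n_pos != len(col):
            aucs.append(binary_roc_auc(col, y_score[:, c]))
    return float(np.mean(aucs))
```
      </preformat>
      <p>A model that ranks every positive above every negative in each class scores 1.0; random scoring hovers around 0.5.</p>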
      <p>From Table 3, we can see that adding the type 2 dataset significantly improves model performance,
which means that a model trained with the unlabeled dataset is more adaptive to the fully-annotated dataset,
implying that the unlabeled dataset shares a similar distribution with the fully-annotated dataset. Applying
knowledge distillation also improves model performance, implying that soft targets are an effective
way to decrease the label noise in the train audio. Further training the model with the self-supervised unlabeled
dataset improves model performance. The self-supervised unlabeled dataset contains more birdcall
samples than the semi-supervised unlabeled dataset, enabling further domain adaptation for the model.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and future work</title>
      <p>In this study, we have presented a novel approach to address the challenge of domain shift in birdcall
recognition by leveraging semi-supervised and self-supervised soundscape labeling. Our method
utilizes existing off-the-shelf models, BirdNET and the Bird Vocalization Classifier, to extract audio clips
with a high probability of birdcall presence from the unlabeled dataset. These semi-supervised
labels are then used to train our models, which are subsequently used to extract more audio clips in a
self-supervised manner.</p>
      <p>Our experimental results demonstrate that this approach significantly improves the performance of
our models, indicating that the unlabeled dataset shares a similar distribution with the fully-annotated
dataset. Furthermore, we find that applying knowledge distillation further enhances performance,
suggesting that soft targets are an effective way to decrease the label noise in the training audio. Our solution
achieves a remarkable 7th rank among 974 teams in the BirdCLEF 2024 challenge hosted on Kaggle,
demonstrating its effectiveness. However, the study also revealed some areas for potential improvement.
We find that while adding the unlabeled dataset significantly improved model performance,
performance varied slightly with different sampling strategies.</p>
      <p>In future work, we plan to conduct further experiments to refine our approach. Specifically, we plan to
further explore and refine our sampling strategies to improve the model’s adaptability to the domain shift
in birdcall recognition. Furthermore, we aim to train our models with the semi-supervised unlabeled
dataset and then extract and train multiple times with the self-supervised unlabeled dataset. This
iterative process is expected to progressively improve the performance of our models by continuously
adapting them to the domain of the fully-annotated dataset.</p>
      <p>Through these efforts, we aim to further advance the field of birdcall recognition and contribute to
the development of more efficient, scalable, and cost-effective methods for monitoring bird populations.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Espitalier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Estopinan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hrúz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          , et al.,
          <source>Overview of lifeclef</source>
          <year>2024</year>
          :
          <article-title>Challenges on species distribution prediction and identification</article-title>
          ,
          <source>in: International Conference of the CrossLanguage Evaluation Forum for European Languages</source>
          , Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Srivathsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Arvind</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>CP</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sawant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. V.</given-names>
            <surname>Robin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          , Overview of BirdCLEF 2024:
          <article-title>Acoustic identification of under-studied bird species in the western ghats</article-title>
          ,
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>[3] Xeno-canto: Sharing bird sounds from around the world</article-title>
          ,
          <year>2022</year>
          . URL: https://xeno-canto.org.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Conde</surname>
          </string-name>
          , U. Choi,
          <article-title>Few-shot long-tailed bird audio recognition</article-title>
          ,
          <source>in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Clapp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hopping</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          , Overview of birdclef 2020:
          <article-title>Bird sound recognition in complex acoustic environments (</article-title>
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          , Overview of birdclef 2021:
          <article-title>Bird call identification in soundscape recordings</article-title>
          ,
          <source>in: Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Navine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          , Overview of birdclef 2022:
          <article-title>Endangered bird species recognition in soundscape recordings</article-title>
          ,
          <source>in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Reers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Cherutich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          , Overview of birdclef 2023:
          <article-title>Automated bird species identification in eastern africa</article-title>
          ,
          <source>in: Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eibl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <article-title>Birdnet: A deep learning solution for avian diversity monitoring</article-title>
          ,
          <source>Ecological Informatics</source>
          <volume>61</volume>
          (
          <year>2021</year>
          )
          <fpage>101236</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <article-title>Google, bird-vocalization-classifier,</article-title>
          <string-name>
            <surname>Kaggle</surname>
          </string-name>
          ,
          <year>2023</year>
          . URL: https://www.kaggle.com/models/google/ bird-vocalization-classifier.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Hong</surname>
          </string-name>
          ,
          <article-title>Acoustic bird species recognition at birdclef 2023: Training strategies for convolutional neural network and inference acceleration using openvino</article-title>
          ,
          <source>in: Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Adavanne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Politis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Nikunen</surname>
          </string-name>
          , T. Virtanen,
          <article-title>Sound event localization and detection of overlapping sources using convolutional recurrent neural networks</article-title>
          ,
          <source>IEEE Journal of Selected Topics in Signal Processing</source>
          <volume>13</volume>
          (
          <year>2018</year>
          )
          <fpage>34</fpage>
          -48. URL: https://ieeexplore.ieee.org/abstract/document/8567942. doi:10.1109/JSTSP.2018.2885636.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Henkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pfeifer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Singer</surname>
          </string-name>
          ,
          <article-title>Recognizing bird species in diverse soundscapes under weak supervision</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2107.07728. doi:10.48550/ARXIV.2107.07728.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>E.</given-names>
            <surname>Martynov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Uematsu</surname>
          </string-name>
          ,
          <article-title>Dealing with class imbalance in bird sound classification</article-title>
          ,
          <source>in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Vinyals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Distilling the knowledge in a neural network</article-title>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>