<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Deep Active Learning in Avian Bioacoustics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lukas Rauch</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Denis Huseljic</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moritz Wirth</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens Decke</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernhard Sick</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph Scholz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IEE</institution>
          ,
          <addr-line>Fraunhofer Institute, Kassel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IES, University of Kassel</institution>
          ,
          <addr-line>Kassel</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>12</fpage>
      <lpage>17</lpage>
      <abstract>
        <p>Passive acoustic monitoring (PAM) in avian bioacoustics enables cost-effective and extensive data collection with minimal disruption to natural habitats. Despite advancements in computational avian bioacoustics, deep learning models continue to encounter challenges in adapting to diverse environments in practical PAM scenarios. This is primarily due to the scarcity of annotations, which requires labor-intensive efforts from human experts. Active learning (AL) reduces annotation cost and speeds up adaptation to diverse scenarios by querying the most informative instances for labeling. This paper outlines a deep AL approach, introduces key challenges, and conducts a small-scale pilot study.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Active Learning</kwd>
        <kwd>Avian Bioacoustics</kwd>
        <kwd>Passive Acoustic Monitoring</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Deep learning (DL) has enhanced bird species recognition from vocalizations in the context of biodiversity monitoring.
Current state-of-the-art (SOTA) approaches such as BirdNET [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], Google’s Perch [
        <xref ref-type="bibr" rid="ref1 ref7">7, 1</xref>
        ], and BirdSet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have set benchmarks
in bird sound classification. While initial studies focused on model performance on focal recordings,
research is increasingly shifting towards practical PAM scenarios [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In such environments, autonomous recording units (ARUs) are
proving effective for edge deployment and continuous soundscape analysis [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Research indicates that
pre-trained models facilitate few-shot and transfer learning in data-scarce environments by providing
valuable feature embeddings for rapid prototyping and efficient inference [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. While deep AL is suited
for quick model adaptation, its application in avian bioacoustics is still emerging. Bellafkir et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] have
integrated AL into edge-based systems for bird species identification, employing reliability scores and
ensemble predictions to refine misclassifications through human feedback. This approach highlights the
necessity for research into the application of deep AL and multi-label classification in avian bioacoustics.
However, comparing these results is challenging because they utilize test datasets that are not publicly
available and employ custom AL strategies [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Active Learning in Bird Sound Classification</title>
      <p>
        Motivation. In PAM, a feature vector x ∈ 𝒳 represents a D-dimensional instance, originating from
either a focal recording where 𝒳 = ℱ, or a soundscape recording with 𝒳 = 𝒮. Focal recordings are
extensively available on the citizen-science platform Xeno-Canto (XC) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] with a global collection of
over 800,000 recordings, making them particularly suitable for model training. Large-scale bird sound
classification models (e.g., BirdNET [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) are primarily trained on focals. These multi-class recordings
feature isolated bird vocalizations where each instance x is associated with a class label y ∈ 𝒴, where
𝒴 = {1, ..., C}. The focal data distribution is denoted as Focal(x, y). However, annotations from XC
often come with weak labels, lacking precise vocalization timestamps. As noted by Van Merriënboer
et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], evaluating on focals does not adequately reflect a model’s generalization performance in
real-world PAM scenarios, rendering them unsuitable for assessing deployment capabilities. Soundscape
recordings are passively recorded in specific regions, capturing the entire acoustic environment for
PAM projects using static ARUs over extended periods. For instance, the High Sierra Nevada (HSN) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
dataset includes long-duration soundscapes with precise labels and timestamps from multiple recording
sites. Soundscapes are treated as multi-label tasks and are valuable for assessing model deployment
in real-world PAM. Each instance x is associated with multiple class labels y ∈ 𝒴, represented by
a multi-hot encoded label vector y = [y_1, . . . , y_C] ∈ {0, 1}^C. An instance can contain no bird
sounds, represented by a zero vector y = 0 ∈ ℝ^C. Soundscapes’ limited scale and the extensive
annotation effort make them less suitable for large-scale model training. We denote the soundscape
data distribution as Scape(x, y). The disparity in data distributions, Scape(x, y) ≠ Focal(x, y), leads
to a distribution shift that impacts the performance of SOTA bioacoustic models trained on focals
when deployed in PAM. Additionally, highly diverse deployment conditions in PAM projects, such as
background noise, recording devices, and their locations, also lead to domain differences within and
between soundscape recordings. These variations further highlight the need for compact models that
can quickly and easily adapt to changing environments. Thus, we argue that using labeled soundscapes
in novel deployment scenarios for fine-tuning the model is vital. Therefore, we propose deep AL to
enable fast model adaptation to various PAM scenarios.
      </p>
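      <p>
        The multi-hot target representation described above can be sketched in a few lines. This is an illustrative example with a hypothetical three-species class list, not code or data from the paper:
      </p>
      <preformat>
```python
import numpy as np

# hypothetical class list with C = 3 species
classes = ["species_a", "species_b", "species_c"]

def encode(labels):
    """Multi-hot target vector y in {0, 1}^C for one soundscape segment."""
    y = np.zeros(len(classes))
    for label in labels:
        y[classes.index(label)] = 1.0
    return y

y_two_birds = encode(["species_a", "species_c"])  # -> [1., 0., 1.]
y_no_bird = encode([])  # zero vector: segment contains no bird sounds
```
      </preformat>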
      <p>
        Our approach. Our approach is detailed in Figure 1. We leverage the BirdSet dataset collection [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to ensure comparability. We consider a multi-label classification problem, where we equip a model with
a pre-trained feature extractor h_θ : 𝒳 → ℝ^D with parameters θ that maps the inputs x to feature
embeddings h_θ(x). Additionally, we utilize a classification head f_θ^(i) : ℝ^D → ℝ^C with parameters
θ^(i) at cycle iteration i that maps the feature embeddings h_θ(x) to class probabilities via the sigmoid
function. The resulting class probabilities are denoted by p̂ = σ(f_θ^(i)(h_θ(x))), where p̂ ∈ ℝ^C represents
the probabilities for each class in a binary classification problem. We introduce a pool-based AL setting
with an unlabeled pool 𝒰^(i) ⊆ 𝒳 and a labeled pool dataset ℒ^(i) ⊆ 𝒳 × 𝒴. The pool consists of
soundscapes from PAM projects, allowing the model to adapt to the unique acoustic features of new
sites and improve performance across various scenarios. During each cycle iteration i, the query
strategy compiles the most informative instances into a batch ℬ^(i) ⊂ 𝒰^(i) of size b. We represent
an annotated batch as ℬ*^(i) ∈ 𝒳 × 𝒴. We update the unlabeled pool 𝒰^(i+1) = 𝒰^(i) ∖ ℬ^(i) and the
labeled pool ℒ^(i+1) = ℒ^(i) ∪ ℬ*^(i) by adding the annotated batch. At each iteration i, the model f_θ^(i) is
retrained using the binary cross-entropy loss L_BCE(x, y), resulting in the updated model parameters
θ^(i+1). The process continues until a budget B is exhausted.
      </p>
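      <p>
        The pool-based cycle can be sketched as follows. This is a toy numpy illustration, not the actual BirdSet pipeline: the data sizes, the gradient-descent linear head on frozen embeddings, and the mean-entropy query are stand-ins; only the epoch count and learning rate mirror the setup in Section 4.
      </p>
      <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_head(X, Y, epochs=200, lr=0.05):
    """Fit a linear multi-label head on frozen embeddings with BCE loss."""
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(epochs):
        P = sigmoid(X @ W)
        W -= lr * X.T @ (P - Y) / len(X)  # BCE gradient for a sigmoid head
    return W

def mean_entropy(P, eps=1e-9):
    """Mean binary entropy over all class probabilities per instance."""
    H = -(P * np.log(P + eps) + (1 - P) * np.log(1 - P + eps))
    return H.mean(axis=1)

# toy pools: 2-D embeddings h(x), C = 3 classes, multi-label oracle targets
X_pool = rng.normal(size=(100, 2))
Y_pool = (rng.random((100, 3)) > 0.7).astype(float)

labeled = list(range(10))  # initial randomly selected instances
unlabeled = [i for i in range(100) if i not in labeled]
b = 10  # acquisition batch size

for cycle in range(3):  # budget B = 10 + 3 * b in this toy run
    W = train_head(X_pool[labeled], Y_pool[labeled])
    scores = mean_entropy(sigmoid(X_pool[unlabeled] @ W))
    # query the b most uncertain instances from the unlabeled pool
    batch = [unlabeled[i] for i in np.argsort(-scores)[:b]]
    labeled += batch  # the oracle annotates the batch
    unlabeled = [i for i in unlabeled if i not in batch]
```
      </preformat>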
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>
        Setup. We employ Google’s Perch as the pre-trained feature extractor with a feature dimensionality of
 = 1280, following Ghani et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Each iteration of the AL cycle involves initializing and training
the last DNN layer for 200 epochs using the Rectified Adam optimizer [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] (batch size: 128, learning
rate: 0.05, weight decay: 0.0001) with a cosine annealing scheduler [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The hyperparameters are
empirically determined with convergence on random train samples as done in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. We utilize the HSN
dataset [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] from BirdSet [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], consisting of 5,280 five-second soundscape segments from the initial
three days of recordings for our unlabeled pool. Thus, we simulate a practical deployment scenario
where we initially collect data from various recording sites to which we want to quickly adapt the model
while reducing annotation effort. Subsequently, we utilize 6,720 segments from the last two days for
testing model performance. Initially, 10 instances are selected randomly, followed by 50 iterations
of b = 10 acquisitions each, totaling a budget of B = 510. We benchmark against Random acquisitions
and use Typiclust [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and Badge [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] as diversity-based and hybrid strategies, respectively. As an
uncertainty-based strategy, we employ the mean Entropy of all binary predictions. The effectiveness
of each strategy is assessed by analyzing the learning curves through a collection of threshold-free
metrics [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]: T1-accuracy, class-based mean average precision (cmAP), and area under the receiver
operating characteristic curve (AUROC). The metrics are computed on the test dataset post-training in
each cycle, with learning curve improvements averaged over ten repetitions for consistency.
      </p>
      <p>
        Results. We present the improvement curves for the metric collection in Figure 2. The results
demonstrate that no single strategy is universally superior across all metrics. However, nearly all
metrics show enhanced performance compared to Random. Notably, Typiclust displays strong
performance across all metrics at the start of the deep AL cycle, supporting the findings of [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] that
a diverse selection is beneficial at the cycle’s onset. However, its effectiveness diminishes over time
when diversity becomes less crucial. Conversely, except for the AUROC metric where Entropy initially
performs poorly but strongly improves over time, Entropy outperforms in all iterations for cmAP and
T1-Acc, showing a consistent improvement over Random of up to 15%.
      </p>
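      <p>
        The threshold-free metric collection can be sketched with plain numpy. These are illustrative re-implementations, not the benchmark code; in particular, the T1-accuracy definition here (top-scoring class is among the true labels, skipping instances without any positive label) is an assumption based on the BirdSet description:
      </p>
      <preformat>
```python
import numpy as np

def average_precision(y, s):
    """AP for one class: precision averaged over the ranked positives."""
    order = np.argsort(-s)
    y_sorted = y[order]
    hits = np.cumsum(y_sorted)
    precision = hits / np.arange(1, len(y) + 1)
    return precision[y_sorted == 1].mean()

def binary_auroc(y, s):
    """Probability that a random positive outranks a random negative."""
    pos, neg = s[y == 1], s[y == 0]
    return (pos[:, None] > neg[None, :]).mean()

def evaluate(y_true, y_prob):
    """cmAP and AUROC are macro-averaged over classes; T1-Acc checks
    whether the top-scoring class is a true label (assumed definition)."""
    cmap = np.mean([average_precision(y_true[:, c], y_prob[:, c])
                    for c in range(y_true.shape[1])])
    auroc = np.mean([binary_auroc(y_true[:, c], y_prob[:, c])
                     for c in range(y_true.shape[1])])
    has_label = y_true.sum(axis=1) > 0
    top1 = y_prob.argmax(axis=1)
    t1_acc = y_true[has_label, top1[has_label]].mean()
    return {"cmAP": cmap, "AUROC": auroc, "T1-Acc": t1_acc}
```
      </preformat>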
    </sec>
    <sec id="sec-5">
      <title>5. Open Challenges and Limitations</title>
      <p>This pilot study explores the use of deep AL to tailor avian bioacoustic models for various deployment
scenarios in PAM. Although the initial results are encouraging, they remain preliminary. Several key
challenges, which are outlined below, need to be addressed to fully realize the potential of deep AL in
this field.</p>
      <p>
        Pool creation. The limited availability of soundscape data, which is primarily used for model evaluation
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], poses challenges in creating pool datasets for deep AL. The process of generating a fine-tuning
training pool can affect class balance and raises concerns about the composition methodology.
Additionally, in scenarios where data are sourced from PAM projects, the variability in recording sites is
often not disclosed in publicly available datasets. This lack of information makes it challenging to create
a diverse and representative training pool that takes recording locations into account. To effectively
investigate deep AL, a transparent approach to dataset generation is essential.
      </p>
      <p>
        Deployment in practice. Deploying deep AL in real-world PAM environments requires addressing
several practical considerations. These include determining optimal batch sizes for data annotation
and effectively allocating the total budget. The labor-intensive and costly process of labeling PAM
recordings, which requires human expertise [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], highlights the need for accurately estimating the
expected annotation effort. Additionally, exploring various deployment settings and tasks can reveal
the versatility and potential challenges of applying deep AL, leading to more effective and scalable
solutions for avian bioacoustics. For instance, tasks might involve not only classifying bird species but
also identifying specific call densities [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], which would require modifications to the model evaluation
process.
      </p>
      <p>
        Evaluation. Traditional metrics such as AUROC, cmAP, and T1-Acc offer a general overview of model
performance but may be inadequate in practice-specific scenarios, such as ensuring a high recall of a
specific species or identifying bird call density [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. A more nuanced approach to evaluating deep AL
models involves customizing metrics to align with practical objectives, such as consistently identifying
specific species. Enhancing evaluation methodologies to capture these specialized requirements is
crucial for advancing the effectiveness of deep AL in real-world PAM applications.
      </p>
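      <p>
        Such a practice-specific metric, for example the recall of one target species at a fixed decision threshold, could be monitored alongside the threshold-free collection. The following is an illustrative sketch; the threshold value and class index are hypothetical:
      </p>
      <preformat>
```python
import numpy as np

def species_recall(y_true, y_prob, class_idx, thr=0.5):
    """Recall for one target species: the fraction of its true occurrences
    the model detects at a fixed decision threshold."""
    y = y_true[:, class_idx]
    detected = (y_prob[:, class_idx] >= thr).astype(float)
    true_positives = (detected * y).sum()
    return true_positives / max(y.sum(), 1.0)
```
      </preformat>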
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>
        In this work, we demonstrated the potential of deep active learning (AL) in computational avian
bioacoustics. We showed how deep AL can be integrated into real-world passive acoustic monitoring
by utilizing BirdSet, where a rapid model adaptation through fine-tuning on soundscape recordings
is advantageous for the identification of bird species. Our results indicate that employing selection
strategies in deep AL enhances model performance and accelerates adaptation compared to random
sampling. For future work, we aim to expand the implementation of deep AL in avian bioacoustics
utilizing all datasets from the BirdSet dataset collection to provide more robust performance insights
and explore additional query strategies [
        <xref ref-type="bibr" rid="ref13 ref20">13, 20</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Triantafillou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Van</given-names>
            <surname>Merriënboer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dumoulin</surname>
          </string-name>
          ,
          <article-title>BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.48550/arXiv.2312.07439.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eibl</surname>
          </string-name>
          , H. Klinck,
          <article-title>BirdNET: A deep learning solution for avian diversity monitoring</article-title>
          ,
          <source>Ecological Informatics</source>
          <volume>61</volume>
          (
          <year>2021</year>
          )
          101236. URL: https://doi.org/10.1016/j.ecoinf.2021.101236.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <article-title>Feature Embeddings from Large-Scale Acoustic Bird Classifiers Enable Few-Shot Transfer Learning</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2023</year>
          ). doi: 10.48550/arXiv.2307.06292.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Decke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gruhl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sick</surname>
          </string-name>
          , DADO --
          <article-title>Low-cost query strategies for deep active design optimization</article-title>
          ,
          <source>in: 2023 International Conference on Machine Learning and Applications (ICMLA)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>1611</fpage>
          -
          <lpage>1618</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schwinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tomforde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Scholz</surname>
          </string-name>
          , Active Bird2Vec:
          <article-title>Towards End-to-End Bird Sound Monitoring with Transformers</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.48550/arXiv.2308.07121.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schwinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huseljic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tomforde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Scholz</surname>
          </string-name>
          ,
          <article-title>Birdset: A dataset and benchmark for classification in avian bioacoustics</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2024</year>
          ). doi: 10.48550/arXiv.2403.10380.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wisdom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Hershey</surname>
          </string-name>
          ,
          <article-title>Improving Bird Classification with Unsupervised Sound Separation</article-title>
          , in: ICASSP 2022
          <article-title>-</article-title>
          2022 IEEE International Conference on Acoustics,
          <source>Speech and Signal Processing (ICASSP)</source>
          , IEEE,
          <year>2022</year>
          , pp.
          <fpage>636</fpage>
          -
          <lpage>640</lpage>
          . URL: https://doi.org/10.1109/ICASSP43922.2022.9747202.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Höchst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Bellafkir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lampe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vogelbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mühling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lindner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rösner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Schabo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Farwig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Freisleben</surname>
          </string-name>
          , Bird@Edge:
          <article-title>Bird Species Recognition at the Edge</article-title>
          ,
          <source>in: Networked Systems</source>
          , volume
          <volume>13464</volume>
          ,
          <string-name>
            <surname>Cham</surname>
          </string-name>
          ,
          <year>2022</year>
          , pp.
          <fpage>69</fpage>
          -
          <lpage>86</lpage>
          . URL: https://doi.org/10.1007/978-3-031-17436-0_6.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bellafkir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vogelbacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mühling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Korfhage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Freisleben</surname>
          </string-name>
          ,
          <article-title>Edge-Based Bird Species Recognition via Active Learning</article-title>
          ,
          <source>in: Networked Systems</source>
          , volume
          <volume>14067</volume>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>17</fpage>
          -
          <lpage>34</lpage>
          . doi: 10.1007/978-3-031-37765-5_2.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>W.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <article-title>The xeno-canto collection and its relation to sound recognition and classification, CEUR-WS</article-title>
          .org,
          <year>2015</year>
          . URL: https://xeno-canto.org/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>B.</given-names>
            <surname>Van Merriënboer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hamer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dumoulin</surname>
          </string-name>
          , E. Triantafillou, T. Denton,
          <article-title>Birds, Bats and beyond: Evaluating generalization in bioacoustic models</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>L.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          , J. Han,
          <article-title>On the variance of the adaptive learning rate and beyond</article-title>
          , in: International Conference on Learning Representations,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Huseljic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Herde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sick</surname>
          </string-name>
          ,
          <article-title>Fast fishing: Approximating bait for efficient and scalable deep active image classification</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2024</year>
          ). doi: 10.48550/arXiv.2404.08981.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Huseljic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Herde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hahn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sick</surname>
          </string-name>
          ,
          <article-title>Role of hyperparameters in deep active learning</article-title>
          ,
          <source>in: Workshop on Interactive Adaptive Learning @ ECML PKDD</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chaon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Z.</given-names>
            <surname>Peery</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <article-title>A collection of fully-annotated soundscape recordings from the Western United States</article-title>
          ,
          <year>2022</year>
          . URL: https://doi.org/10.5281/zenodo.7050014.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>G.</given-names>
            <surname>Hacohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dekel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weinshall</surname>
          </string-name>
          ,
          <article-title>Active learning on a budget: Opposite strategies suit high and low budgets</article-title>
          ,
          <source>in: International Conference on Machine Learning</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. T.</given-names>
            <surname>Ash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Langford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <article-title>Deep batch active learning by diverse, uncertain gradient lower bounds</article-title>
          ,
          <source>in: International Conference on Learning Representations</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Stowell</surname>
          </string-name>
          ,
          <article-title>Computational bioacoustics with deep learning: A review and roadmap</article-title>
          ,
          <source>CoRR</source>
          (
          <year>2021</year>
          ). URL: https://doi.org/10.48550/arXiv.2112.06725.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Navine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Weldy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Hart</surname>
          </string-name>
          ,
          <article-title>All thresholds barred: Direct estimation of call density in bioacoustic data</article-title>
          ,
          <source>Frontiers in Bird Science</source>
          <volume>3</volume>
          (
          <year>2024</year>
          ). doi:10.3389/fbirs.2024.1380636.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aßenmacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huseljic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bischl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sick</surname>
          </string-name>
          ,
          <article-title>ActiveGLAE: A benchmark for deep active learning with transformers</article-title>
          ,
          <source>in: Machine Learning and Knowledge Discovery in Databases: Research Track</source>
          , Springer Nature Switzerland,
          <year>2023</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>74</lpage>
          . URL: https://doi.org/10.1007/978-3-031-43412-9_4.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>