<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Deep Active Learning in Avian Bioacoustics</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Lukas</forename><surname>Rauch</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">IES</orgName>
								<orgName type="institution">University of Kassel</orgName>
								<address>
									<settlement>Kassel</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Denis</forename><surname>Huseljic</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">IES</orgName>
								<orgName type="institution">University of Kassel</orgName>
								<address>
									<settlement>Kassel</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Moritz</forename><surname>Wirth</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">IES</orgName>
								<orgName type="institution">University of Kassel</orgName>
								<address>
									<settlement>Kassel</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jens</forename><surname>Decke</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">IES</orgName>
								<orgName type="institution">University of Kassel</orgName>
								<address>
									<settlement>Kassel</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Bernhard</forename><surname>Sick</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">IES</orgName>
								<orgName type="institution">University of Kassel</orgName>
								<address>
									<settlement>Kassel</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Christoph</forename><surname>Scholz</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">IES</orgName>
								<orgName type="institution">University of Kassel</orgName>
								<address>
									<settlement>Kassel</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">IEE</orgName>
								<orgName type="institution">Fraunhofer Institute</orgName>
								<address>
									<settlement>Kassel</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Deep Active Learning in Avian Bioacoustics</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">51FF743B8BA1B33065386B078418D1CC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:23+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Deep Active Learning</term>
					<term>Avian Bioacoustics</term>
					<term>Passive Acoustic Monitoring</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Passive acoustic monitoring (PAM) in avian bioacoustics enables cost-effective and extensive data collection with minimal disruption to natural habitats. Despite advancements in computational avian bioacoustics, deep learning models continue to encounter challenges in adapting to diverse environments in practical PAM scenarios. This is primarily due to the scarcity of annotations, which require labor-intensive efforts from human experts. Active learning (AL) reduces annotation cost and speeds up adaptation to diverse scenarios by querying the most informative instances for labeling. This paper outlines a deep AL approach, introduces key challenges, and conducts a small-scale pilot study.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Avian diversity is a key indicator of environmental health. Passive acoustic monitoring (PAM) in avian bioacoustics leverages mobile autonomous recording units (ARUs) to gather large volumes of soundscape recordings with minimal disruption to avian habitats. While this method is cost-effective and minimally invasive, the analysis of these recordings is labor-intensive and requires expert annotation. Recent deep learning (DL) approaches primarily process these passive recordings by classifying bird vocalizations. In particular, feature embeddings from large bird sound classification models (e.g., Google's Perch <ref type="bibr" target="#b0">[1]</ref> or BirdNET <ref type="bibr" target="#b1">[2]</ref>) have effectively enabled few-shot learning in scenarios with limited training data <ref type="bibr" target="#b2">[3]</ref>. These state-of-the-art (SOTA) models are trained using supervised learning on nearly 10,000 bird species from multi-class focal recordings that isolate individual bird sounds. However, practical PAM scenarios involve processing diverse multi-label soundscapes with overlapping sounds and varying background noise. Obtaining suitable feature embeddings for edge deployment therefore requires fine-tuning, which relies on labeled training data that is both time-consuming and costly to obtain for soundscapes.</p><p>Deep active learning (AL) addresses this challenge by actively querying the most informative instances to maximize performance gains <ref type="bibr" target="#b3">[4]</ref>. However, research on deep AL in avian bioacoustics is still limited, and the problem needs to be contextualized with comparable datasets <ref type="bibr" target="#b4">[5]</ref>. Additionally, the domain presents unique practical challenges, including adapting models from focals to soundscapes (i.e., multi-class to multi-label) in imbalanced and highly diverse scenarios <ref type="bibr" target="#b5">[6]</ref>. 
Consequently, we introduce the problem of deep AL in avian bioacoustics and propose an efficient fine-tuning approach for model deployment. Our contributions are: (1) We introduce deep active learning (AL) to avian bioacoustics, highlighting challenges and proposing a practical framework. (2) We conduct an initial feasibility study based on the dataset collection BirdSet <ref type="bibr" target="#b5">[6]</ref>, showcasing the benefits of deep AL. Additionally, we release the dataset and code.</p><p>IAL@ECML-PKDD'24: 8th Intl. Worksh. &amp; Tutorial on Interactive Adaptive Learning, Sep. 9th, 2024, Vilnius, Lithuania. lukas.rauch@uni-kassel.de (L. Rauch)</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>DL has enhanced bird species recognition from vocalizations in the context of biodiversity monitoring. Current SOTA approaches such as BirdNET <ref type="bibr" target="#b1">[2]</ref>, Google's Perch <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b0">1]</ref>, and BirdSet <ref type="bibr" target="#b5">[6]</ref> have set benchmarks in bird sound classification. While initial studies focused on model performance on focal recordings, research is increasingly shifting towards practical PAM scenarios <ref type="bibr" target="#b5">[6]</ref>. In such environments, ARUs are proving effective for edge deployment and continuous soundscape analysis <ref type="bibr" target="#b7">[8]</ref>. Research indicates that pre-trained models facilitate few-shot and transfer learning in data-scarce environments by providing valuable feature embeddings for rapid prototyping and efficient inference <ref type="bibr" target="#b2">[3]</ref>. While deep AL is suited for quick model adaptation, its application in avian bioacoustics is still emerging. Bellafkir et al. <ref type="bibr" target="#b8">[9]</ref> have integrated AL into edge-based systems for bird species identification, employing reliability scores and ensemble predictions to refine misclassifications through human feedback. This approach highlights the necessity for research into the application of deep AL and multi-label classification in avian bioacoustics. However, comparing these results is challenging because the authors use test datasets that are not publicly available and employ custom AL strategies <ref type="bibr" target="#b8">[9]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Active Learning in Bird Sound Classification</head><p>Motivation. In PAM, a feature vector $\mathbf{x} \in \mathcal{X}$ represents a $D$-dimensional instance originating from either a focal recording, where $\mathcal{X} = \mathcal{F}$, or a soundscape recording, where $\mathcal{X} = \mathcal{S}$. Focal recordings are extensively available on the citizen-science platform Xeno-Canto (XC) <ref type="bibr" target="#b9">[10]</ref>, which hosts a global collection of over 800,000 recordings, making them particularly suitable for model training. Large-scale bird sound classification models (e.g., BirdNET <ref type="bibr" target="#b1">[2]</ref>) are primarily trained on focals. These multi-class recordings feature isolated bird vocalizations, where each instance $\mathbf{x}$ is associated with a class label $y \in \mathcal{Y}$ with $\mathcal{Y} = \{1, \dots, C\}$. The focal data distribution is denoted as $p_{\text{Focal}}(\mathbf{x}, y)$. However, annotations from XC often come with weak labels that lack precise vocalization timestamps. As noted by Van Merriënboer et al. <ref type="bibr" target="#b10">[11]</ref>, evaluating on focals does not adequately reflect a model's generalization performance in real-world PAM scenarios, rendering them unsuitable for assessing deployment capabilities. Soundscape recordings are passively recorded in specific regions, capturing the entire acoustic environment for PAM projects using static ARUs over extended periods. For instance, the High Sierra Nevada (HSN) <ref type="bibr" target="#b1">[2]</ref> dataset includes long-duration soundscapes with precise labels and timestamps from multiple recording sites. Soundscapes are treated as multi-label tasks and are valuable for assessing model deployment in real-world PAM. Each instance $\mathbf{x}$ can be associated with multiple class labels $y \in \mathcal{Y}$, represented by a one-hot encoded multi-label vector $\mathbf{y} = [y_1, \dots, y_C] \in [0, 1]^C$. An instance can contain no bird sounds, represented by the zero vector $\mathbf{y} = \mathbf{0} \in \mathbb{R}^C$. 
Soundscapes' limited scale and the extensive annotation effort make them less suitable for large-scale model training. We denote the soundscape data distribution as $p_{\text{Scape}}(\mathbf{x}, \mathbf{y})$. The disparity in data distributions, $p_{\text{Scape}}(\mathbf{x}, \mathbf{y}) \neq p_{\text{Focal}}(\mathbf{x}, y)$, leads to a distribution shift that impacts the performance of SOTA bioacoustic models trained on focals when deployed in PAM. Additionally, highly diverse deployment conditions in PAM projects, such as background noise, recording devices, and their locations, also lead to domain differences within and between soundscape recordings. These variations further highlight the need for compact models that can quickly and easily adapt to changing environments. Thus, we argue that using labeled soundscapes from novel deployment scenarios to fine-tune the model is vital. Therefore, we propose deep AL to enable fast model adaptation to various PAM scenarios.</p><p>Our approach. Our approach is detailed in Figure <ref type="figure" target="#fig_0">1</ref>. We leverage the BirdSet dataset collection <ref type="bibr" target="#b5">[6]</ref> to ensure comparability. We consider a multi-label classification problem, where we equip a model with a pre-trained feature extractor $\mathbf{h}_{\omega} : \mathcal{X} \to \mathbb{R}^D$ with parameters $\omega$ that maps the inputs $\mathbf{x}$ to feature embeddings $\mathbf{h}_{\omega}(\mathbf{x})$. Additionally, we utilize a classification head $\mathbf{f}_{\theta_t} : \mathbb{R}^D \to \mathbb{R}^C$ with parameters $\theta_t$ at cycle iteration $t$ that maps the feature embeddings $\mathbf{h}_{\omega}(\mathbf{x})$ to class logits, which the sigmoid function $\sigma$ converts to class probabilities. The resulting class probabilities are denoted by $\hat{\mathbf{p}} = \sigma(\mathbf{f}_{\theta_t}(\mathbf{h}_{\omega}(\mathbf{x})))$, where each entry of $\hat{\mathbf{p}} \in \mathbb{R}^C$ is the probability of one class, treated as a separate binary classification problem. We introduce a pool-based AL setting </p></div>
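<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the prediction step described above concrete, the following minimal sketch applies a classification head to a fixed embedding and passes each logit through the sigmoid independently. It is an illustration under stated assumptions, not the implementation used in this work: the head is assumed to be a single linear layer, the dimensions are toy values, and all function names are hypothetical.</p>
<p>
```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def init_head(dim, num_classes, seed=0):
    """Hypothetical linear head f_theta: R^D -> R^C (weights and biases)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.01) for _ in range(dim)]
               for _ in range(num_classes)]
    biases = [0.0] * num_classes
    return weights, biases


def predict_proba(embedding, head):
    """Return one independent probability per class: sigma(f_theta(h_omega(x)))."""
    weights, biases = head
    logits = [sum(w * e for w, e in zip(row, embedding)) + b
              for row, b in zip(weights, biases)]
    return [sigmoid(z) for z in logits]


# Toy stand-in for an embedding h_omega(x); in practice it would come from
# the frozen pre-trained feature extractor (assumption: D = 8, C = 3 here).
D, C = 8, 3
embedding = [0.5] * D
probs = predict_proba(embedding, init_head(D, C))
```
</p><p>Because each class has its own sigmoid, the entries of the output need not sum to one, so several species, or none, can be predicted for the same segment.</p></div>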
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>Setup. We employ Google's Perch as the pre-trained feature extractor with a feature dimensionality of $D = 1280$, following Ghani et al. <ref type="bibr" target="#b2">[3]</ref>. Each iteration of the AL cycle involves initializing and training the last DNN layer for 200 epochs using the Rectified Adam optimizer <ref type="bibr" target="#b11">[12]</ref> (batch size: 128, learning rate: 0.05, weight decay: 0.0001) with a cosine annealing scheduler <ref type="bibr" target="#b12">[13]</ref>. The hyperparameters are determined empirically by checking convergence on randomly sampled training data, as done in <ref type="bibr" target="#b13">[14]</ref>. We utilize the HSN dataset <ref type="bibr" target="#b14">[15]</ref> from BirdSet <ref type="bibr" target="#b5">[6]</ref>, using 5,280 5-second soundscape segments from the initial three days of recordings as our unlabeled pool. Thus, we simulate a practical deployment scenario in which we initially collect data from various recording sites, to which we want to adapt the model quickly while reducing annotation effort. Subsequently, we utilize 6,720 segments from the last two days for testing model performance. Initially, 10 instances are selected randomly, followed by 50 iterations of $b = 10$ acquisitions each, totaling a budget of $B = 510$. We benchmark against Random acquisitions and use Typiclust <ref type="bibr" target="#b15">[16]</ref> and Badge <ref type="bibr" target="#b16">[17]</ref> as diversity-based and hybrid strategies, respectively. As an uncertainty-based strategy, we employ the mean Entropy of all binary predictions. The effectiveness of each strategy is assessed by analyzing the learning curves through a collection of threshold-free metrics <ref type="bibr" target="#b5">[6]</ref>: T1-accuracy (T1-Acc), class-based mean average precision (cmAP), and area under the receiver operating characteristic curve (AUROC). 
The metrics are computed on the test dataset after training in each cycle, with learning curve improvements averaged over ten repetitions for consistency.</p><p>Results. We present the improvement curves for the metric collection in Figure <ref type="figure" target="#fig_2">2</ref>. The results demonstrate that no single strategy is universally superior across all metrics. However, nearly all metrics show enhanced performance compared to Random. Notably, Typiclust displays strong performance across all metrics at the start of the deep AL cycle, supporting the findings of <ref type="bibr" target="#b15">[16]</ref> that a diverse selection is beneficial at the cycle's onset. However, its effectiveness diminishes over time as diversity becomes less crucial. Conversely, Entropy outperforms Random in all iterations for cmAP and T1-Acc, with a consistent improvement of up to 15%; for AUROC, it initially performs poorly but improves strongly over time.</p></div>
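<div xmlns="http://www.tei-c.org/ns/1.0"><p>The uncertainty-based strategy and the budget arithmetic from the setup can be sketched as follows. We assume here that the mean Entropy of all binary predictions is the Bernoulli entropy averaged over the class probabilities of an instance, and that the instances with the highest scores are queried in each cycle. The function names and the toy probabilities are hypothetical and only serve the illustration.</p>
<p>
```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def mean_entropy(probs):
    """Average the per-class binary entropies of one multi-label prediction."""
    return sum(binary_entropy(p) for p in probs) / len(probs)


def select_batch(pool_probs, b):
    """Return indices of the b pool instances with the highest mean entropy."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: mean_entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:b]


# Toy pool of predicted class probabilities (three instances, two classes).
pool_probs = [[0.99, 0.01], [0.50, 0.50], [0.80, 0.30]]
queried = select_batch(pool_probs, b=2)

# Budget from the setup: 10 random initial labels plus 50 cycles of b = 10.
total_budget = 10 + 50 * 10
```
</p><p>In this toy pool, the confidently predicted first instance is ranked last, while the instance with both probabilities at 0.5 is queried first. A full AL cycle would retrain the classification head on the enlarged labeled set after every such query.</p></div>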
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Open Challenges and Limitations</head><p>This pilot study explores the use of deep AL to tailor avian bioacoustic models for various deployment scenarios in PAM. Although the initial results are encouraging, they remain preliminary. Several key challenges, outlined below, need to be addressed to fully realize the potential of deep AL in this field.</p><p>Pool creation. The limited availability of soundscape data, which is primarily used for model evaluation <ref type="bibr" target="#b5">[6]</ref>, poses challenges in creating pool datasets for deep AL. The process of generating a fine-tuning training pool can affect class balance and raises concerns about the composition methodology. Additionally, in scenarios where data are sourced from PAM projects, the variability in recording sites is often not disclosed in publicly available datasets. This lack of information makes it challenging to create a diverse and representative training pool that takes recording locations into account. To effectively investigate deep AL, a transparent approach to dataset generation is essential.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Deployment in practice.</head><p>Deploying deep AL in real-world PAM environments requires addressing several practical considerations. These include determining optimal batch sizes for data annotation and effectively allocating the total budget. The labor-intensive and costly process of labeling PAM recordings, which requires human expertise <ref type="bibr" target="#b17">[18]</ref>, highlights the need for accurately estimating the expected annotation effort. Additionally, exploring various deployment settings and tasks can reveal the versatility and potential challenges of applying deep AL, leading to more effective and scalable solutions for avian bioacoustics. For instance, tasks might involve not only classifying bird species but also identifying specific call densities <ref type="bibr" target="#b18">[19]</ref>, which would require modifications to the model evaluation process.</p><p>Evaluation. Traditional metrics such as AUROC, cmAP, and T1-Acc offer a general overview of model performance but may be inadequate in practice-specific scenarios, such as ensuring a high recall of a specific species or identifying bird call density <ref type="bibr" target="#b18">[19]</ref>. A more nuanced approach to evaluating deep AL models involves customizing metrics to align with practical objectives, such as consistently identifying specific species. Enhancing evaluation methodologies to capture these specialized requirements is crucial for advancing the effectiveness of deep AL in real-world PAM applications.</p></div>
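<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the difference between a general metric and a practice-specific one, the following sketch computes a class-based mean average precision next to the recall of a single species at a fixed threshold. The non-interpolated definition of average precision, the choice to skip classes without positives, the threshold of 0.5, and all names and toy values are assumptions made for this example; the metric implementations used in BirdSet may differ.</p>
<p>
```python
def average_precision(scores, labels):
    """Average precision for one class; returns None if it has no positives."""
    num_pos = sum(labels)
    if num_pos == 0:
        return None
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / num_pos


def class_mean_average_precision(score_rows, label_rows):
    """cmAP: mean of per-class AP over classes with at least one positive."""
    num_classes = len(label_rows[0])
    aps = []
    for c in range(num_classes):
        ap = average_precision([r[c] for r in score_rows],
                               [r[c] for r in label_rows])
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps)


def recall_for_class(score_rows, label_rows, c, threshold=0.5):
    """Recall of a single species at a fixed decision threshold.

    Assumes class c has at least one positive instance.
    """
    positives = [s[c] for s, y in zip(score_rows, label_rows) if y[c] == 1]
    detected = sum(1 for s in positives if s >= threshold)
    return detected / len(positives)


# Toy predictions and labels: four segments, two species.
scores = [[0.9, 0.2], [0.6, 0.7], [0.3, 0.4], [0.1, 0.8]]
labels = [[1, 0], [0, 1], [1, 0], [0, 1]]
cmap = class_mean_average_precision(scores, labels)
recall_species_0 = recall_for_class(scores, labels, c=0)
```
</p><p>In this toy example, the ranking-based cmAP is high, yet half of the vocalizations of the first species are missed at the chosen threshold. This is the kind of gap that a metric tailored to a practical objective can expose.</p></div>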
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>In this work, we demonstrated the potential of deep active learning (AL) in computational avian bioacoustics. We showed how deep AL can be integrated into real-world passive acoustic monitoring by utilizing BirdSet, where rapid model adaptation through fine-tuning on soundscape recordings is advantageous for the identification of bird species. Our results indicate that employing selection strategies in deep AL enhances model performance and accelerates adaptation compared to random sampling. For future work, we aim to expand the implementation of deep AL in avian bioacoustics by utilizing all datasets from the BirdSet dataset collection to provide more robust performance insights and to explore additional query strategies <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b19">20]</ref>. </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Proposed deep AL cycle in avian bioacoustics with exemplary tasks from BirdSet [6].</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Improvement curves of deep AL selection strategies Badge, Entropy, and Typiclust over Random with the metric collection a) AUROC, b) cmAP and c) T1-Acc. The results are averaged over ten randomly initialized repetitions to ensure consistency and the standard deviation is displayed.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>Lukas Rauch et al. CEUR Workshop Proceedings 12-17</figDesc><table /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">BIRB: A Generalization Benchmark for Information Retrieval in Bioacoustics</title>
		<author>
			<persName><forename type="first">J</forename><surname>Hamer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Triantafillou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Van Merriënboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Klinck</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Denton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dumoulin</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2312.07439</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2312.07439" />
		<imprint>
			<date type="published" when="2023">2023</date>
			<publisher>CoRR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">BirdNET: A deep learning solution for avian diversity monitoring</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Wood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eibl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Klinck</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ecoinf.2021.101236</idno>
		<ptr target="https://doi.org/10.1016/j.ecoinf.2021.101236" />
	</analytic>
	<monogr>
		<title level="j">Ecological Informatics</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="page">101236</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Feature Embeddings from Large-Scale Acoustic Bird Classifiers Enable Few-Shot Transfer Learning</title>
		<author>
			<persName><forename type="first">B</forename><surname>Ghani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Denton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Klinck</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2307.06292</idno>
	</analytic>
	<monogr>
		<title level="j">CoRR</title>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">DADO - Low-cost query strategies for deep active design optimization</title>
		<author>
			<persName><forename type="first">J</forename><surname>Decke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gruhl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rauch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sick</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2023 International Conference on Machine Learning and Applications (ICMLA)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1611" to="1618" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<author>
			<persName><forename type="first">L</forename><surname>Rauch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schwinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wirth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tomforde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Scholz</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2308.07121</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2308.07121" />
		<title level="m">Active Bird2Vec: Towards End-to-End Bird Sound Monitoring with Transformers</title>
				<imprint>
			<publisher>CoRR</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">BirdSet: A dataset and benchmark for classification in avian bioacoustics</title>
		<author>
			<persName><forename type="first">L</forename><surname>Rauch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Schwinger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wirth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Heinrich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Huseljic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lange</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sick</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Tomforde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Scholz</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2403.10380</idno>
	</analytic>
	<monogr>
		<title level="j">CoRR</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Improving Bird Classification with Unsupervised Sound Separation</title>
		<author>
			<persName><forename type="first">T</forename><surname>Denton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wisdom</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">R</forename><surname>Hershey</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICASSP43922.2022.9747202</idno>
		<ptr target="https://doi.org/10.1109/ICASSP43922.2022.9747202" />
	</analytic>
	<monogr>
		<title level="m">ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="636" to="640" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Bird@Edge: Bird Species Recognition at the Edge</title>
		<author>
			<persName><forename type="first">J</forename><surname>Höchst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Bellafkir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lampe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vogelbacher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mühling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lindner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rösner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">G</forename><surname>Schabo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Farwig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Freisleben</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-17436-0_6</idno>
		<ptr target="https://doi.org/10.1007/978-3-031-17436-0_6" />
	</analytic>
	<monogr>
		<title level="m">Networked Systems</title>
				<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">13464</biblScope>
			<biblScope unit="page" from="69" to="86" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Edge-Based Bird Species Recognition via Active Learning</title>
		<author>
			<persName><forename type="first">H</forename><surname>Bellafkir</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Vogelbacher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schneider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mühling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Korfhage</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Freisleben</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-37765-5_2</idno>
	</analytic>
	<monogr>
		<title level="m">Networked Systems</title>
				<meeting><address><addrLine>Switzerland, Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer Nature</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="volume">14067</biblScope>
			<biblScope unit="page" from="17" to="34" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">The xeno-canto collection and its relation to sound recognition and classification</title>
		<author>
			<persName><forename type="first">W</forename><surname>Vellinga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Planqué</surname></persName>
		</author>
		<idno>CEUR-WS.org</idno>
		<ptr target="https://xeno-canto.org/" />
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">B</forename><surname>Van Merriënboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hamer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dumoulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Triantafillou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Denton</surname></persName>
		</author>
		<title level="m">Birds, bats and beyond: Evaluating generalization in bioacoustic models</title>
		<imprint>
			<publisher>CoRR</publisher>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">On the variance of the adaptive learning rate and beyond</title>
		<author>
			<persName><forename type="first">L</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jiang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Fast fishing: Approximating BAIT for efficient and scalable deep active image classification</title>
		<author>
			<persName><forename type="first">D</forename><surname>Huseljic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Herde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rauch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sick</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2404.08981</idno>
	</analytic>
	<monogr>
		<title level="j">CoRR</title>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Role of hyperparameters in deep active learning</title>
		<author>
			<persName><forename type="first">D</forename><surname>Huseljic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Herde</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hahn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sick</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop on Interactive Adaptive Learning @ ECML PKDD</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="19" to="24" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">A collection of fully-annotated soundscape recordings from the Western United States</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kahl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Wood</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Chaon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Z</forename><surname>Peery</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Klinck</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.7050014</idno>
		<ptr target="https://doi.org/10.5281/zenodo.7050014" />
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Active learning on a budget: Opposite strategies suit high and low budgets</title>
		<author>
			<persName><forename type="first">G</forename><surname>Hacohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Dekel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Weinshall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Machine Learning</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Deep batch active learning by diverse, uncertain gradient lower bounds</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Ash</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Krishnamurthy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Langford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Agarwal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Computational bioacoustics with deep learning: A review and roadmap</title>
		<author>
			<persName><forename type="first">D</forename><surname>Stowell</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2112.06725</idno>
		<ptr target="https://doi.org/10.48550/arXiv.2112.06725" />
		<imprint>
			<date type="published" when="2021">2021</date>
			<publisher>CoRR</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">All thresholds barred: Direct estimation of call density in bioacoustic data</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Navine</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Denton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J</forename><surname>Weldy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Hart</surname></persName>
		</author>
		<idno type="DOI">10.3389/fbirs.2024.1380636</idno>
	</analytic>
	<monogr>
		<title level="j">Frontiers in Bird Science</title>
		<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="volume">3</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">ActiveGLAE: A benchmark for deep active learning with transformers</title>
		<author>
			<persName><forename type="first">L</forename><surname>Rauch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Aßenmacher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Huseljic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wirth</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bischl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sick</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-031-43412-9_4</idno>
		<ptr target="https://doi.org/10.1007/978-3-031-43412-9_4" />
	</analytic>
	<monogr>
		<title level="m">Machine Learning and Knowledge Discovery in Databases: Research Track</title>
		<imprint>
			<publisher>Springer Nature Switzerland</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="55" to="74" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
