<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Symposium of the Norwegian AI Society, June</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Model Adapters to Enable Robust and Semantic Underwater Exploration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Changkyu Choi</string-name>
          <email>changkyu.choi@uit.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arangan Subramaniam</string-name>
          <email>arangan.subramaniam@fys.uio.no</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nils Olav Handegard</string-name>
          <email>nilsolav@hi.no</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ali Ramezani-Kebrya</string-name>
          <email>ali@ifi.uio.no</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Jenssen</string-name>
          <email>robert.jenssen@uit.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Marine Research</institution>
          ,
          <addr-line>Bergen</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Machine Learning Group, Department of Physics and Technology, UiT The Arctic University of Norway</institution>
          ,
          <addr-line>Tromsø</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Norwegian Computing Center</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Pioneer Centre for AI, Department of Computer Science, University of Copenhagen</institution>
          ,
          <addr-line>Copenhagen</addr-line>
          ,
          <country country="DK">Denmark</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Oslo</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>1</volume>
      <fpage>7</fpage>
      <lpage>18</lpage>
      <abstract>
<p>This position paper presents a framework for intelligent underwater exploration by marrying foundation models (FMs) with multi‑frequency echosounder data. Echosounder data capture backscattered acoustic signals across a range of frequencies, providing rich insights into underwater environments by exploiting the frequency‑dependent scattering properties of underwater targets. However, their heterogeneity and complex structure complicate analysis. To address these challenges, the paper introduces four key innovations aimed at improving echosounder data analysis under dynamic ocean conditions: (1) aligning multi‑frequency echosounder data with FMs via lightweight FM adapters, (2) enabling continual adaptation to temporal distribution shifts in dynamic marine environments, (3) designing semantic tokenizers that preserve spatial structures, and (4) effectively leveraging sparse annotations to minimize dependence on costly labeled data. For each research direction, we map recent artificial intelligence (AI) methodologies to marine acoustic challenges and outline concrete pathways for technology transfer. Preliminary experiments demonstrate that a Vision Transformer (ViT), pretrained on natural images in a self-supervised manner, can segment sandeel schools from multi‑frequency echosounder data without task‑specific retraining. These results substantiate the proposed framework and illustrate the potential of cross‑disciplinary AI methods for ecologically informative underwater exploration.</p>
      </abstract>
      <kwd-group>
        <kwd>Marine intelligence</kwd>
        <kwd>foundation models</kwd>
        <kwd>distribution shifts</kwd>
        <kwd>semantic tokenizers</kwd>
        <kwd>learning with limited labels</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        As the geopolitical and environmental importance of the ocean continues to rise, the demand for
intelligent, scalable, and autonomous marine monitoring systems is becoming increasingly urgent.
Despite their pivotal role in Earth’s environmental [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] and economic stability [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], oceans remain
among the least explored environments. Recognizing their strategic value, maritime nations are
accelerating efforts to develop advanced monitoring systems. In this context, AI has emerged as a
transformative enabler of marine science, offering new capabilities for processing, interpreting, and
integrating complicated and heterogeneous marine data.
      </p>
      <p>
        Multi-frequency echosounders [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] represent one of the most effective technologies for underwater
observation, and the use of AI to analyze such data is an emerging and rapidly evolving field. By emitting
and receiving acoustic signals across a broad spectrum of frequencies, these instruments capture highly
detailed information about underwater targets. They are now widely deployed across diverse monitoring
platforms, providing valuable information across diverse underwater applications [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ]. Despite
their growing importance, echosounder data present substantial analytical challenges that require
specific adaptations for AI-driven workflows. Interpretation remains heavily reliant on specialized
domain expertise and time-intensive manual processes [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], highlighting the need for intelligent systems
that can effectively support expert decision-making. However, the data are typically heterogeneous,
high-dimensional, and sparsely annotated, presenting serious limitations for the direct application of
conventional AI methodologies. Addressing these challenges calls for a holistic approach that builds
upon established AI methodologies, while extending them to capture complex structures and derive
context-aware representations tailored to the unique characteristics of echosounder data.
      </p>
      <p>
        In addition, while training models from scratch on echosounder data is a viable approach, it is often
neither efficient nor generalizable in practice. The inherent variability of underwater environments
makes it challenging for such models to generalize and perform robustly across diverse contexts [
        <xref ref-type="bibr" rid="ref10 ref11">10,
11</xref>
        ]. Compounding this issue, energy consumption presents a growing concern in AI research, with
Transformer-based models [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] serving as a prominent example. Their well-known data-hungry nature
requires large datasets [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], leading to substantial computational overhead and, consequently, elevated
energy demands. This raises critical questions about the environmental footprint of AI systems [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ],
which is especially important in marine science, where the imperative for sustainable practices extends
beyond the ocean itself and into the computational methods we use to study it.
      </p>
      <p>In light of these challenges, recent breakthroughs in FMs [15, 16] offer a more sustainable and scalable
alternative for analyzing echosounder data. Pretrained via self-supervised learning (SSL) on vast
and diverse datasets [17, 18], FMs have demonstrated a remarkable ability to extract general-purpose
representations that transfer well across domains [19, 20]. Leveraging such models off-the-shelf can
dramatically reduce the need for energy-intensive pretraining, aligning with global efforts to reduce
the environmental footprint of AI systems [21].</p>
      <p>To fully unlock the potential of FMs for marine acoustic analysis, an essential step is the development
of a lightweight adapter [22, 23, 24] that aligns multi-frequency echosounder data with the input
space of these pretrained FMs. Such an adapter would serve as an efficient intermediary, translating
domain-specific acoustic signals into a representation compatible with FMs originally trained on natural
image data. Crucially, by being significantly smaller and more adaptable than the backbone FM itself,
this adapter could enable continual learning [25, 26, 27], allowing the system to incrementally adjust
to varying environmental conditions without the need for full retraining of FMs. This paradigm
holds particular promise for real-world underwater monitoring, where environmental variability and
computational constraints demand adaptable, energy-efficient solutions. However, adapting
multi-frequency echosounder data to the input space of FMs remains a non-trivial challenge. The fundamental
differences between visual and acoustic modalities necessitate new strategies for aligning these data
types while preserving their semantic content. As such, developing effective adaptation methods
that bridge this modality gap constitutes a vital research direction toward scalable, intelligent, and
environmentally responsible underwater environment monitoring.</p>
      <p>
        This position paper presents a unified framework grounded in the FM paradigm, aimed at addressing
key challenges in echosounder data analysis. The framework introduces innovations in
(1) Aligning multi-frequency echosounder data with FMs using lightweight adapters,
(2) Enabling learning under limited label scenarios [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
(3) Capturing temporal variability in dynamic marine environments [28], and
(4) Designing semantic tokenizers that preserve the spatial characteristics of echosounder data [29].
Together, these components lay the groundwork for scalable and environmentally conscious AI systems
for advanced underwater environment monitoring. In the following sections, we provide a conceptual
overview of each component, with preliminary results for (1), the FM adapters, included in Sec. 2.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Foundation Model Adapters</title>
      <p>ViT-based FMs have shown the ability to extract general-purpose representations that enable high
performance on downstream tasks with minimal supervision [15, 17, 18]. However, these models
are primarily trained on natural images with fixed spatial and channel structures, typically assuming
three-channel RGB inputs with consistent visual semantics. Applying FMs directly to multi-frequency
echosounder data presents several fundamental challenges due to key differences in both modality and
representation compared to natural images.</p>
      <p>First, the input is inherently multi-channel and frequency-dependent. A single echosounder
observation may consist of multiple channels, each corresponding to a different acoustic frequency band that
captures distinct physical properties of underwater targets. For example, the Simrad EK80 wideband
echosounder system [30], commonly used in fisheries research and ecosystem monitoring, operates
across a frequency range of 10 to 500 kHz and is often configured to emit continuous wave (CW) pulses
at six distinct frequencies, 18, 38, 70, 120, 200, and 333 kHz. These frequency-specific responses cannot
be trivially collapsed without losing important semantic variation across channels. Second, the data
representation and intensity scaling differ significantly from RGB images. Echosounder intensities are
typically log-transformed into decibel (dB) scale and clipped to suppress outliers, resulting in pixel
values that commonly range from –100 to 0 dB. However, the actual intensity distribution varies with
environmental factors and sensor configurations, leading to inconsistent dynamic ranges across datasets.
To ensure compatibility with ViT-based FMs, which are sensitive to input distribution, the acoustic
data must be carefully standardized through normalization and intensity scaling. These discrepancies
motivate the need for a dedicated preprocessing strategy before applying FMs to echosounder data.</p>
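      <p>As a concrete illustration, the standardization step described above can be sketched as follows; the clipping window of –100 to 0 dB follows the ranges mentioned in the text, while the function and variable names are our own illustrative choices rather than part of any published pipeline.</p>
      <preformat>
```python
import numpy as np

def preprocess_sv(sv_db, db_floor=-100.0, db_ceil=0.0):
    """Clip volume-backscatter values (dB) and rescale to [0, 1].

    sv_db: array of shape (channels, height, width) in decibels.
    """
    clipped = np.clip(sv_db, db_floor, db_ceil)
    # Map the fixed dB window to [0, 1] so the dynamic range is
    # consistent across surveys and sensor configurations.
    return (clipped - db_floor) / (db_ceil - db_floor)

# Six CW frequencies of an EK80-style setup (kHz), as listed above.
FREQS_KHZ = (18, 38, 70, 120, 200, 333)
```
</preformat>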
      <p>To address these challenges, we explore the use of lightweight FM adapter modules designed to
align multi-frequency echosounder inputs with the input space of ViT. Our ultimate goal is to enable
semantic segmentation of echosounder data by combining a lightweight adapter module with FMs,
without relying on manually labeled datasets. In particular, we aim to adapt specific FMs for semantic
segmentation such as Segment Anything Model (SAM) [15], which typically consists of a ViT encoder that
extracts dense visual embeddings and a decoder that produces pixel-level segmentations conditioned
on flexible prompts. As a first step, we adapt an unchanged ViT-DINO encoder [17], pretrained on natural
images, to multi-frequency echosounder data. Specifically, we keep the ViT-DINO encoder frozen and
train only a lightweight FM adapter module that aligns multi-frequency inputs with the model’s
three-channel input requirement. To adapt the six-channel echosounder data for input into the ViT-DINO
encoder, we explore both non-parametric and parametric dimensionality reduction methods. For the
non-parametric approach, we apply principal component analysis (PCA). The parametric approach
involves training convolutional autoencoders without pooling layers to preserve spatial resolution
across frequency bands, with the three-channel latent space serving as the input to the ViT-DINO
encoder. This adapter is trained independently, and the resulting three-channel latent representation is
fed directly to the frozen ViT-DINO encoder.</p>
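      <p>A minimal numpy sketch of the non-parametric adapter: per-pixel PCA across the six frequency channels yields a three-channel image compatible with the frozen encoder. The convolutional autoencoder variant is omitted here, and the function name and signature are illustrative, not part of a published implementation.</p>
      <preformat>
```python
import numpy as np

def pca_adapter(x, out_channels=3):
    """Project multi-frequency pixels onto the top principal components.

    x: array of shape (channels, height, width), e.g. 6 frequency bands.
    Returns an (out_channels, height, width) array suitable as input to
    a frozen three-channel ViT encoder.
    """
    c, h, w = x.shape
    pixels = x.reshape(c, -1).T            # (h*w, c): one sample per pixel
    pixels = pixels - pixels.mean(axis=0)
    # Eigen-decomposition of the channel covariance matrix.
    cov = np.cov(pixels, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    top = vecs[:, ::-1][:, :out_channels]  # leading components first
    projected = pixels @ top               # (h*w, out_channels)
    return projected.T.reshape(out_channels, h, w)
```
</preformat>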
      <p>
        For evaluation, we leverage the patch embeddings from the ViT-DINO encoder to construct a weighted
graph in the form of an affinity matrix, where pairwise relationships (edges) between embeddings (nodes)
are computed using a radial basis function (RBF) kernel [31]. We then apply Louvain’s community
detection algorithm [32] to partition the graph into semantically coherent subgraphs. Unlike k-means,
this graph-based approach requires no preset number of clusters, enabling adaptive region discovery,
which makes it especially well suited to the extreme class imbalance in echosounder data, where over
99% of observations may correspond to empty water [
        <xref ref-type="bibr" rid="ref10 ref11 ref4 ref33">10, 11, 4, 33</xref>
        ]. Figure 1 illustrates the graph clustering structures.
      </p>
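      <p>The evaluation pipeline can be sketched as below. For brevity, the Louvain step is replaced by a simple thresholded connected-components grouping, which likewise needs no preset number of clusters; a full implementation would substitute a proper Louvain community detection routine, and all names here are illustrative.</p>
      <preformat>
```python
import numpy as np

def rbf_affinity(embeddings, gamma=1.0):
    """Pairwise RBF-kernel affinities between patch embeddings.

    embeddings: (n_patches, dim) array.
    """
    sq = np.sum(embeddings ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * embeddings @ embeddings.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def connected_groups(affinity, threshold=0.5):
    """Stand-in for Louvain: connected components of the thresholded graph."""
    n = affinity.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for seed in range(n):
        if labels[seed] >= 0:
            continue
        stack = [seed]
        while stack:
            i = stack.pop()
            if labels[i] >= 0:
                continue
            labels[i] = current
            neighbors = np.nonzero(affinity[i] > threshold)[0]
            stack.extend(j for j in neighbors if labels[j] == -1)
        current += 1
    return labels
```
</preformat>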
    </sec>
    <sec id="sec-3">
      <title>3. Toward Continual, Semantic, and Label-efficient Learning with Foundation Model Adapters</title>
      <p>
        FM adapters represent a flexible and powerful mechanism to bridge the gap between FMs and the unique
characteristics of echosounder data. As independent modules, FM adapters can also be specialized
to address domain-specific challenges, offering the potential for broad applicability across diverse
settings. Drawing inspiration from relevant literature, we outline three promising research directions
to enhance the utility of FM adapters: (1) enabling continual adaptation to temporal distribution
shifts [28, 26, 34], (2) developing semantically cohesive tokenization strategies [29, 35], and (3) leveraging
limited annotated data effectively for prediction tasks [
        <xref ref-type="bibr" rid="ref11 ref4">4, 11</xref>
        ].
      </p>
      <sec id="sec-4-1">
        <title>3.1. Continual Adaptation to Temporal Variability</title>
        <p>Marine environments are inherently dynamic, with echosounder data distributions evolving over time
due to seasonal, climatic, and ecological shifts. Conventional AI methods, which often assume stationary
data distributions, struggle to maintain performance in such non-stationary settings. To address this,
we propose FM adapters equipped with continual learning capabilities to adapt to temporal distribution
shifts [28, 34] in echosounder data. Continual learning [26] refers to the ability of a model to learn from
a stream of incoming data while preserving knowledge from previously seen distributions. A major
challenge in this setting is catastrophic forgetting [36], a phenomenon in which adapting to new data
significantly impairs the model’s ability to perform on previously learned tasks.</p>
        <p>Incorporating continual learning into the FM adapters offers a modular and lightweight solution,
allowing FM adapters to evolve alongside changing data distributions without altering the pretrained
FM backbone. Importantly, continual learning in this context can be framed within the importance-
weighted empirical risk minimization (IW-ERM) framework [37], where the importance weights serve
as a quantitative measure of temporal distribution shift. These importance weights allow the model to
adjust its learning process dynamically, providing greater flexibility to adapt to temporal variability in
the data. This is especially relevant to echosounder data analysis, where longitudinal survey practices,
conducted at regular intervals, naturally induce temporal distribution shifts. In such scenarios, the
underlying data distribution can vary between survey campaigns due to changing environmental
conditions, often causing static models to underperform.</p>
        <p>In estimating importance weights within the IW-ERM framework, density ratio estimation (DRE) [38]
has emerged as a promising approach. DRE offers a principled way to quantify the discrepancy between
past and current data distributions by estimating the ratio r(x, y) = p_new(x, y) / p_old(x, y). Recent DRE
works [34, 28] provide effective strategies for estimating these ratios under distributional shifts, particularly in
distributed learning settings. Applied to FM adapter training, DRE improves the model’s ability to
handle data exhibiting temporal shifts, supporting resilient and adaptive learning in evolving underwater
environments.</p>
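        <p>As an illustrative sketch of these ideas, the snippet below estimates a one-dimensional density ratio from shared histograms and applies the result as importance weights in an empirical risk. Practical DRE methods, including those in [34, 28], are considerably more sophisticated; every function and variable name here is our own.</p>
        <preformat>
```python
import numpy as np

def histogram_density_ratio(x_new, x_old, bins=10, eps=1e-6):
    """Estimate r(x) = p_new(x) / p_old(x) on a shared 1-D histogram grid."""
    lo = min(x_new.min(), x_old.min())
    hi = max(x_new.max(), x_old.max())
    edges = np.linspace(lo, hi, bins + 1)
    p_new, _ = np.histogram(x_new, bins=edges, density=True)
    p_old, _ = np.histogram(x_old, bins=edges, density=True)
    ratio = (p_new + eps) / (p_old + eps)
    # Look up the ratio for each old-distribution training point.
    idx = np.clip(np.searchsorted(edges, x_old) - 1, 0, bins - 1)
    return ratio[idx]

def iw_erm_loss(losses, weights):
    """Importance-weighted empirical risk: mean of w_i * loss_i."""
    return float(np.mean(weights * losses))
```
</preformat>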
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Semantic Tokenizer Design for Spatial Coherence</title>
        <p>
          Tokenization is a critical component of ViT, defining how input data is divided into discrete units for
downstream processing. Standard ViTs use fixed, grid-based patch tokenization [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], which assumes
uniform spatial structure and is agnostic to content. While effective for structured natural images, this
approach is ill-suited for echosounder data, where meaningful acoustic patterns, such as fish schools,
tend to be sparse, morphologically irregular, and weakly localized [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. As a result, square patches may
split or dilute semantically important regions, limiting the interpretability and efectiveness of learned
representations [29].
        </p>
        <p>To address this challenge, we propose replacing the fixed patch tokenizer in ViT-based FMs with a
semantic tokenizer integrated into the FM adapter. This strategy preserves compatibility with the frozen
FMs, allowing us to retain their representation power while adapting tokenization to the structure of
echosounder data. Recent works [29, 35] offer promising approaches for learning meaningful token
boundaries directly from the input. Aasan et al. [29] introduce a superpixel-based tokenizer that treats
tokenization as a modular, pluggable component, decoupled from the transformer backbone. Their
superpixel token merger enables efficient, online, content-aware tokenization with strong attribution
faithfulness and pixel-level granularity. Chen et al. [35] propose a subobject-level tokenizer using
boundary detection and watershed segmentation [39, 40] to generate compact, arbitrarily shaped tokens
aligned with part-level structures. These semantic tokens are then processed by the ViT encoder to
produce patch embeddings, enabling tasks such as zero-shot segmentation without retraining the
ViT. Ongoing research aims to develop a pipeline in which the adapter simultaneously performs
dimensionality reduction and semantic tokenization while learning positional embeddings, facilitating
seamless integration with FM encoders.</p>
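        <p>One way to picture such a tokenizer: given a segment map from any superpixel or watershed routine, features are pooled over each arbitrarily shaped region to form one token per segment. The sketch below assumes the segment map is already computed; the function name is illustrative.</p>
        <preformat>
```python
import numpy as np

def segment_tokens(features, segment_map):
    """Pool features into one token per irregular segment.

    features: (channels, height, width) array.
    segment_map: (height, width) integer array, e.g. from a superpixel or
    watershed algorithm; each id marks one arbitrarily shaped region.
    Returns an (n_segments, channels) token matrix.
    """
    c = features.shape[0]
    flat_feat = features.reshape(c, -1)   # (c, h*w)
    flat_seg = segment_map.reshape(-1)
    ids = np.unique(flat_seg)
    tokens = np.zeros((ids.size, c))
    for row, seg_id in enumerate(ids):
        mask = flat_seg == seg_id
        # Average the features of all pixels belonging to this segment.
        tokens[row] = flat_feat[:, mask].mean(axis=1)
    return tokens
```
</preformat>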
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Efficient Learning from Limited Annotations</title>
        <p>
          Although semantic tokenizers can produce embeddings with pixel-level precision, downstream
interpretation still requires aligning these representations with target semantic classes. This is challenging
in echosounder data, where annotations are limited, costly, and often ambiguous [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Semi-supervised
learning [
          <xref ref-type="bibr" rid="ref11 ref4">11, 4</xref>
          ] offers a promising direction by leveraging both sparse labels and abundant unlabeled
data to learn task-relevant decision boundaries. Our goal is to guide the clustering of patch embeddings
using minimal annotation, so that the resulting structure aligns with meaningful semantic categories.
One published method for semi-supervised learning with echosounder data [
          <xref ref-type="bibr" rid="ref11 ref4">11, 4</xref>
          ] alternates between
unsupervised clustering and refinement using limited labeled samples. However, it relies on offline
pseudo-label assignment. All embeddings are first clustered to generate pseudo-labels, and the model
is then trained to fit these fixed assignments. This limits adaptability and could be improved through
online learning strategies [41].
        </p>
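        <p>The offline pseudo-label scheme described above can be sketched as: cluster all embeddings, then map each cluster to the majority class among its few labeled members. The plain k-means routine and all names below are illustrative simplifications, not the published method.</p>
        <preformat>
```python
import numpy as np

def kmeans(x, k, iters=20):
    """Plain k-means over embeddings x of shape (n, dim)."""
    # Naive deterministic init: evenly strided data points as centers.
    centers = x[:: max(1, len(x) // k)][:k].copy()
    for _ in range(iters):
        d = np.linalg.norm(x[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return assign

def pseudo_labels(assign, labeled_idx, labeled_y):
    """Map each cluster to the majority class among its labeled members."""
    mapping = {}
    for c in np.unique(assign):
        ys = [y for i, y in zip(labeled_idx, labeled_y) if assign[i] == c]
        mapping[c] = max(set(ys), key=ys.count) if ys else -1
    return np.array([mapping[c] for c in assign])
```
</preformat>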
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion</title>
      <p>This position paper outlines a unified framework for leveraging FMs in echosounder data analysis
through lightweight and modular FM adapters. We address core challenges in modality adaptation,
temporal variability, semantic tokenization, and annotation efficiency, each of which is critical for
enabling scalable and adaptive underwater monitoring. Our preliminary experiments demonstrate that
ViT-based FMs, when combined with minimal adaptation, can perform meaningful analysis of acoustic
signals without task-specific supervision. The proposed research directions highlight the versatility of
FM adapters and their potential to support continual learning, spatially coherent representations, and
label-efficient learning in marine environments.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is funded by the Research Council of Norway (RCN) through two grants: Visual Intelligence
(309439) and CRIMAC–Marine Acoustic Abundance Estimation and Backscatter Classification (309512),
both of which are Norwegian Centres for Research-based Innovation (SFI).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT to check grammar and spelling
and to paraphrase and reword text. After using this service, the author(s) reviewed and edited the content
as needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-8">
      <title>References</title>
      <p>[15] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg,
W.-Y. Lo, et al., Segment anything, in: Proceedings of the IEEE/CVF International Conference on
Computer Vision, 2023, pp. 4015–4026.
[16] M. Awais, M. Naseer, S. Khan, R. M. Anwer, H. Cholakkal, M. Shah, M.-H. Yang, F. S. Khan,
Foundation models defining a new era in vision: a survey and outlook, IEEE Transactions on
Pattern Analysis and Machine Intelligence (2025).
[17] M. Caron, H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, A. Joulin, Emerging properties
in self-supervised vision transformers, in: Proceedings of the IEEE/CVF International Conference
on Computer Vision, 2021, pp. 9650–9660.
[18] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza,
F. Massa, A. El-Nouby, et al., DINOv2: Learning robust visual features without supervision,
Transactions on Machine Learning Research (2023).
[19] R. J. Chen, T. Ding, M. Y. Lu, D. F. Williamson, G. Jaume, A. H. Song, B. Chen, A. Zhang, D. Shao,
M. Shaban, et al., Towards a general-purpose foundation model for computational pathology,
Nature Medicine 30 (2024) 850–862.
[20] S. Qiu, B. Han, D. C. Maddix, S. Zhang, Y. Wang, A. G. Wilson, Transferring knowledge from large
foundation models to small downstream models, International Conference on Machine Learning
(2024).
[21] Q. Wang, Y. Li, R. Li, Ecological footprints, carbon emissions, and energy transitions: the impact
of artificial intelligence (AI), Humanities and Social Sciences Communications 11 (2024) 1–18.
[22] S. Chen, G. Long, J. Jiang, C. Zhang, Personalized adapter for large meteorology model on devices:
Towards weather foundation models, Advances in Neural Information Processing Systems 37
(2024) 84897–84943.
[23] S. Ahmadi, A. Cheraghian, M. Saberi, M. T. Abir, H. Dastmalchi, F. Hussain, S. Rahman, Foundation
model-powered 3d few-shot class incremental learning via training-free adaptor, in: Proceedings
of the Asian Conference on Computer Vision, 2024, pp. 2282–2299.
[24] F. Chen, M. V. Giuffrida, S. A. Tsaftaris, Adapting vision foundation models for plant phenotyping,
in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 604–613.
[25] Q. Wang, R. Wang, Y. Wu, X. Jia, D. Meng, CBA: Improving online continual learning via continual
bias adaptor, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023,
pp. 19082–19092.
[26] L. Wang, X. Zhang, H. Su, J. Zhu, A comprehensive survey of continual learning: Theory, method
and application, IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).
[27] S. A. Bidaki, A. Mohammadkhah, K. Rezaee, F. Hassani, S. Eskandari, M. Salahi, M. M. Ghassemi,
Online continual learning: A systematic literature review of approaches, challenges, and
benchmarks, arXiv preprint arXiv:2501.04897 (2025).
[28] Z. Wu, C. Choi, X. Cao, V. Cevher, A. Ramezani-Kebrya, Addressing label shift in distributed
learning via entropy regularization, International Conference on Learning Representations (2025).
[29] M. Aasan, O. Kolbjørnsen, A. S. Solberg, A. R. Rivera, A spitting image: Modular superpixel
tokenization in vision transformers, European Conference on Computer Vision MELEX Workshop
(2024).
[30] D. A. Demer, L. N. Andersen, C. Bassett, L. Berger, D. Chu, J. Condiotty, B. Hutton, R. Korneliussen,
N. Le Bouffant, G. Macaulay, et al., 2016 USA–Norway EK80 Workshop Report: Evaluation of a
wideband echosounder for fisheries and marine ecosystem science, ICES Cooperative Research
Reports (CRR), 2017.
[31] B. Schölkopf, A. J. Smola, Learning with kernels: support vector machines, regularization,
optimization, and beyond, MIT Press, 2018.
[32] P. De Meo, E. Ferrara, G. Fiumara, A. Provetti, Generalized Louvain method for community
detection in large networks, in: 2011 11th International Conference on Intelligent Systems Design
and Applications, IEEE, 2011, pp. 88–93.
[33] C. Choi, S. Yu, M. Kampfmeyer, A.-B. Salberg, N. O. Handegard, R. Jenssen, DIB-X: Formulating
explainability principles for a self-explainable model through information theoretic learning, in:
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing
(ICASSP), IEEE, 2024, pp. 7170–7174.
[34] A. Ramezani-Kebrya, F. Liu, T. Pethick, G. Chrysos, V. Cevher, Federated learning under covariate
shifts with generalization guarantees, Transactions on Machine Learning Research (2023).
[35] D. Chen, S. Cahyawijaya, J. Liu, B. Wang, P. Fung, Subobject-level image tokenization, arXiv
preprint arXiv:2402.14327 (2024).
[36] J. L. McClelland, B. L. McNaughton, R. C. O’Reilly, Why there are complementary learning systems
in the hippocampus and neocortex: insights from the successes and failures of connectionist
models of learning and memory, Psychological Review 102 (1995) 419.
[37] A. Gretton, A. Smola, J. Huang, M. Schmittfull, K. Borgwardt, B. Schölkopf, Covariate shift by
kernel mean matching, Dataset Shift in Machine Learning 3 (2009) 5.
[38] M. Sugiyama, T. Suzuki, T. Kanamori, Density ratio estimation in machine learning, Cambridge
University Press, 2012.
[39] S. Beucher, Use of watersheds in contour detection, in: Proc. Int. Workshop on Image Processing,
Sept. 1979, 1979, pp. 17–21.
[40] L. Vincent, P. Soille, Watersheds in digital spaces: an efficient algorithm based on immersion
simulations, IEEE Transactions on Pattern Analysis &amp; Machine Intelligence 13 (1991) 583–598.
[41] M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, A. Joulin, Unsupervised learning of visual
features by contrasting cluster assignments, Advances in Neural Information Processing Systems
33 (2020) 9912–9924.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Kuhlisch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shemi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Barak-Gavish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schatz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Vardi</surname>
          </string-name>
          ,
          <article-title>Algal blooms in the ocean: hot spots for chemically mediated microbial interactions</article-title>
          ,
          <source>Nature Reviews Microbiology</source>
          <volume>22</volume>
          (
          <year>2024</year>
          )
          <fpage>138</fpage>
          -
          <lpage>154</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N. E.</given-names>
            <surname>Bosch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Espino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tuya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Haroun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bramanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Otero-Ferrer</surname>
          </string-name>
          ,
          <article-title>Black coral forests enhance taxonomic and functional distinctiveness of mesophotic fishes in an oceanic island: implications for biodiversity conservation</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>13</volume>
          (
          <year>2023</year>
          )
          <fpage>4963</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gaines</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cabral</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Free</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Golbuu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Arnason</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Battista</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bradley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cheung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fabricius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hoegh-Guldberg</surname>
          </string-name>
          , et al.,
          <article-title>The expected impacts of climate change on the ocean economy</article-title>
          , in:
          <source>The Blue Compendium: From Knowledge to Action for a Sustainable Ocean Economy</source>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>15</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kampfmeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. O.</given-names>
            <surname>Handegard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-B.</given-names>
            <surname>Salberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jenssen</surname>
          </string-name>
          ,
          <article-title>Deep semisupervised semantic segmentation in multifrequency echosounder data</article-title>
          ,
          <source>IEEE Journal of Oceanic Engineering</source>
          <volume>48</volume>
          (
          <year>2023</year>
          )
          <fpage>384</fpage>
          -
          <lpage>400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Oleynik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Malde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. O.</given-names>
            <surname>Handegard</surname>
          </string-name>
          ,
          <article-title>Self-supervised feature learning for acoustic data analysis</article-title>
          ,
          <source>Ecological Informatics</source>
          <volume>84</volume>
          (
          <year>2024</year>
          )
          <fpage>102878</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. K.</given-names>
            <surname>Horne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Swan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. J.</given-names>
            <surname>Tracy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Holtgrieve</surname>
          </string-name>
          ,
          <article-title>Automated acoustic monitoring of fish for near-real-time resource management</article-title>
          ,
          <source>ICES Journal of Marine Science</source>
          <volume>81</volume>
          (
          <year>2024</year>
          )
          <fpage>1412</fpage>
          -
          <lpage>1423</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L. N.</given-names>
            <surname>Andersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. O.</given-names>
            <surname>Handegard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Heimvoll</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Korneliussen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Macaulay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Ona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <article-title>Quantitative processing of broadband data as implemented in a scientific split-beam echosounder</article-title>
          ,
          <source>Methods in Ecology and Evolution</source>
          <volume>15</volume>
          (
          <year>2024</year>
          )
          <fpage>317</fpage>
          -
          <lpage>328</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ntouskos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mertikas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mallios</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Karantzalos</surname>
          </string-name>
          ,
          <article-title>Seabed classification from multispectral multibeam data</article-title>
          ,
          <source>IEEE Journal of Oceanic Engineering</source>
          <volume>48</volume>
          (
          <year>2023</year>
          )
          <fpage>874</fpage>
          -
          <lpage>887</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Pedersen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Johnsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Khodabandeloo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. O.</given-names>
            <surname>Handegard</surname>
          </string-name>
          ,
          <article-title>Broadband backscattering by Atlantic herring (Clupea harengus L.) differs when measured from a research vessel vs. a silent uncrewed surface vehicle</article-title>
          ,
          <source>ICES Journal of Marine Science</source>
          <volume>81</volume>
          (
          <year>2024</year>
          )
          <fpage>1362</fpage>
          -
          <lpage>1370</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>O.</given-names>
            <surname>Brautaset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. U.</given-names>
            <surname>Waldeland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Johnsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Malde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Eikvil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-B.</given-names>
            <surname>Salberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. O.</given-names>
            <surname>Handegard</surname>
          </string-name>
          ,
          <article-title>Acoustic classification in multifrequency echosounder data using deep convolutional neural networks</article-title>
          ,
          <source>ICES Journal of Marine Science</source>
          <volume>77</volume>
          (
          <year>2020</year>
          )
          <fpage>1391</fpage>
          -
          <lpage>1400</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kampfmeyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. O.</given-names>
            <surname>Handegard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-B.</given-names>
            <surname>Salberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Brautaset</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Eikvil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Jenssen</surname>
          </string-name>
          ,
          <article-title>Semi-supervised target classification in multi-frequency echosounder data</article-title>
          ,
          <source>ICES Journal of Marine Science</source>
          <volume>78</volume>
          (
          <year>2021</year>
          )
          <fpage>2615</fpage>
          -
          <lpage>2627</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Beyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kolesnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weissenborn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Unterthiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Minderer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Heigold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gelly</surname>
          </string-name>
          , et al.,
          <article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>
          ,
          <source>International Conference on Learning Representations</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wood</surname>
          </string-name>
          ,
          <article-title>Are vision transformers more data hungry than newborn visual systems?</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>36</volume>
          (
          <year>2023</year>
          )
          <fpage>73104</fpage>
          -
          <lpage>73121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Resource-efficient algorithms and systems of foundation models: A survey</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>57</volume>
          (
          <year>2025</year>
          )
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>