<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Individual Wildlife Recognition via Hybrid Global-Local Matching and Segmentation-Aware Filtering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roman Pakhomov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grigory Demidov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kristian Bogdan</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Svyatoslav Lanskikh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danis Dinmuhametov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrey Khlopotnykh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Central University (CU University)</institution>
          ,
          <addr-line>7 Gasheka Street, Moscow, 123056, Russian Federation</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National Research University Higher School of Economics (HSE University)</institution>
          ,
          <addr-line>11 Pokrovsky Bulvar, Moscow, 109028, Russian Federation</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Animal re-identification enables non-invasive, scalable monitoring of wildlife by matching visual cues specific to each individual. In this paper, we present a matching-oriented, open-set system that progressively filters and validates candidate images. We took the step-by-step baseline solution provided in the competition and improved each of its stages. First, we compute global embeddings to select a small subset of potentially matching images for each query. Then we apply a set of local feature matchers, each consisting of a separate detector-descriptor pair and a matching algorithm, to obtain additional similarity estimates that capture fine visual details such as unique markings and morphological patterns. We then combine these estimates for each query using a learned, weighted fusion mechanism that identifies the most reliable features for different species and shooting conditions. Finally, a calibrated confidence threshold separates previously seen individuals from new ones, ensuring reliable recognition when new animals are encountered. We evaluate on the AnimalCLEF 2025 collections (loggerhead sea turtles, salamanders, and Eurasian lynx), and our system achieves highly balanced accuracy on both known and unknown classes. The modular design makes it easy to plug in additional matchers or embedding models, and the system demonstrates robustness to background clutter, occlusions, and variable shooting conditions. With this solution, we took first place in the AnimalCLEF 2025 competition.</p>
      </abstract>
      <kwd-group>
        <kwd>Animal Re-identification</kwd>
        <kwd>Open-Set Identification</kwd>
        <kwd>Wildlife Conservation</kwd>
        <kwd>Computer Vision</kwd>
        <kwd>LifeCLEF 2025</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Animal re-identification is a critical task in wildlife research and conservation, enabling the tracking of
individuals over time to study population dynamics, habitat use, migration patterns, and behavior. By
recognizing unique traits—such as markings, color patterns, or morphological features—researchers can
monitor species in a non-invasive, scalable manner. Automating this process not only accelerates data
collection but also enhances the consistency and scale at which individuals can be reliably tracked. These
capabilities are vital for identifying biodiversity threats and supporting evidence-based conservation
strategies.</p>
      <p>Despite recent progress in computer vision and machine learning, reliably identifying individual
animals remains challenging. Models often overfit to environmental cues such as background, lighting,
or camera angle rather than focusing on species-specific, individual characteristics. This results in
poor generalization to new environments or image conditions, limiting the practical effectiveness of
many re-identification systems in real-world conservation settings.</p>
      <p>
        The AnimalCLEF 2025 challenge, part of the LifeCLEF 2025 evaluation campaign, addresses this
problem through the task of individual animal identification for three wildlife species: loggerhead sea
turtles (Caretta caretta [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]) sourced from Zakynthos, Greece; salamanders (Salamandra salamandra)
from the Czech Republic; and Eurasian lynxes (Lynx lynx) also from the Czech Republic. For each
test image, the objective is to determine whether the animal has been seen before (i.e., is present
in the reference database) or is a new, previously unseen individual. If known, the correct identity
must be assigned. To aid generalization, participants are allowed to augment their models using the
WildlifeReID-10k dataset—a large-scale benchmark comprising over 140,000 images across 10,000+
individuals from a diverse set of species [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ].
      </p>
      <p>In this technical report, we present our solution to the AnimalCLEF 2025 challenge. Our pipeline
consists of four core stages: global candidate selection, local visual matching, score aggregation (bagging),
and novelty filtering. This modular structure combines coarse-to-fine similarity evaluation with a
confidence-based thresholding mechanism to distinguish between known and novel individuals. Our
approach balances precision and generalization while remaining robust across the three target species.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Several prior works address fine-grained wildlife classification and open-set identification. CNN
ensembles and metadata fusion prove effective for discriminating visually similar species. Local feature
matchers (e.g., SuperPoint[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], DISK, ALIKED[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] + LightGlue[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) combined with global descriptors yield
robust re-identification under varying viewpoints and background clutter. Calibration methods like
isotonic regression help distinguish known from unseen individuals. Our pipeline integrates these ideas
to handle both coarse filtering and fine-grained matching in an open-set setting.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation</title>
      <sec id="sec-3-1">
        <title>3.1. Evaluation Metrics</title>
        <p>To properly evaluate the performance of our re-identification pipeline on both known and novel
individuals, we employ three metrics: Balanced Accuracy on Known Samples (BAKS), Balanced Accuracy
on Unknown Samples (BAUS), and their Geometric Mean (GeoMean).</p>
        <sec id="sec-3-1-1">
          <title>3.1.1. Balanced Accuracy on Known Samples (BAKS)</title>
          <p>The Balanced Accuracy on Known Samples (BAKS) quantifies the model’s performance on individuals
that are present in the training dataset. Unlike standard accuracy, BAKS is computed in a class-balanced
manner to mitigate the effects of class imbalance.</p>
          <p>Let \(\mathcal{K}\) be the set of known classes. Then:
\[ \mathrm{BAKS} = \frac{1}{|\mathcal{K}|} \sum_{c \in \mathcal{K}} \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FN}_c}, \]
where \(\mathrm{TP}_c\) is the number of true positive predictions for known class \(c\), and \(\mathrm{FN}_c\) is the
number of false negatives for class \(c\).</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Balanced Accuracy on Unknown Samples (BAUS)</title>
          <p>The Balanced Accuracy on Unknown Samples (BAUS) measures the model’s ability to correctly recognize
individuals from unseen classes, i.e., classes not present in the training dataset.</p>
          <p>Let \(\mathcal{U}\) be the set of unknown classes. Then:
\[ \mathrm{BAUS} = \frac{1}{|\mathcal{U}|} \sum_{c \in \mathcal{U}} \frac{\mathrm{TP}_c}{\mathrm{TP}_c + \mathrm{FN}_c}. \]</p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Overall Metric: Geometric Mean (GeoMean)</title>
          <p>To combine performance on both known and unknown samples, we compute the geometric mean of
BAKS and BAUS:</p>
          <p>\[ \mathrm{GeoMean} = \sqrt{\mathrm{BAKS} \times \mathrm{BAUS}}. \]</p>
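          <p>For concreteness, the following minimal Python sketch computes BAKS, BAUS, and GeoMean from
predicted and ground-truth identities; the use of a single "new_individual" label for unseen identities is
our labeling assumption:</p>
          <preformat>
import numpy as np

NEW = "new_individual"  # assumed label for unseen identities

def per_class_recall(y_true, y_pred, cls, target):
    """TP / (TP + FN) for one true class, counting hits on `target`."""
    mask = y_true == cls
    return float(np.mean(y_pred[mask] == target))

def baks_baus_geomean(y_true, y_pred, known_classes):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    known = [c for c in known_classes if (y_true == c).any()]
    unknown = [c for c in np.unique(y_true) if c not in set(known_classes)]
    # BAKS: class-balanced recall over known identities.
    baks = np.mean([per_class_recall(y_true, y_pred, c, c) for c in known])
    # BAUS: class-balanced recall of flagging each unknown identity as new.
    baus = np.mean([per_class_recall(y_true, y_pred, c, NEW) for c in unknown])
    return float(baks), float(baus), float(np.sqrt(baks * baus))
          </preformat>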
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Approach</title>
      <sec id="sec-4-1">
        <title>4.1. Data Preprocessing and Segmentation Strategy</title>
        <p>Effective individual identification in wildlife datasets is often hindered by visual noise, background
clutter, and inconsistencies in object localization. These issues are particularly prominent in the
AnimalCLEF2025 dataset, where species such as salamanders are often partially occluded by human
hands, and segmentation annotations are absent for several species (notably salamanders and sea turtles).
To address this, we implement a targeted data preprocessing strategy centered on segmentation-aware
data augmentation. For sea turtles, for instance, the ocean background in most images naturally isolates
the subject, effectively acting as an implicit segmentation mask. Lynx images are already segmented in
the provided data.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Segmentation Model Fine-Tuning</title>
          <p>
            We construct a supplementary annotated dataset[
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] of 817 salamander images to compensate for the lack
of precise segmentation masks. This dataset includes pixel-level annotations of the target individuals
and is used to fine-tune a YOLOv11m-seg model[
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] —an instance-aware object segmentation network
with a modern convolutional backbone.
          </p>
          <p>The fine-tuned YOLOv11m-seg model significantly improves the localization quality of salamander
regions by learning to suppress irrelevant background areas, particularly human hands and other visual
artifacts. Qualitative inspection confirms more compact and accurate segmentation masks, which are
then used to crop or mask the input images during downstream embedding extraction.</p>
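          <p>A minimal sketch of the fine-tuning and background-masking steps with the ultralytics API is shown
below; the dataset path, epoch count, and image size are illustrative assumptions:</p>
          <preformat>
import cv2
import numpy as np
from ultralytics import YOLO

# Fine-tune the pretrained segmentation checkpoint on the annotated
# salamander dataset (path and hyperparameters are illustrative).
model = YOLO("yolo11m-seg.pt")
model.train(data="salamander_seg/data.yaml", epochs=50, imgsz=640)

def mask_background(image_path, model):
    """Zero out pixels outside the union of predicted instance masks."""
    result = model(image_path)[0]
    img = cv2.imread(image_path)
    if result.masks is None:
        return img  # no detection: fall back to the original image
    masks = result.masks.data.cpu().numpy()         # (n, h, w), model scale
    union = (masks.max(axis=0) > 0.5).astype(np.uint8)
    union = cv2.resize(union, (img.shape[1], img.shape[0]))
    return img * union[:, :, None]                  # broadcast over channels
          </preformat>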
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Impact on Re-identification Performance</title>
          <p>The implemented segmentation strategy significantly reduces the negative impact of background noise in
species where it is dominant. Most notably, the use of segmentation for salamanders yields a measurable
improvement in downstream matching metrics, confirming the hypothesis that accurate localization
of the individual is a crucial prerequisite for reliable embedding computation and similarity-based
identification.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Baseline Description</title>
        <p>Our baseline system integrates global deep descriptors and local keypoint-based matching to
address open-set individual identification. The implementation relies on the wildlife-datasets and
wildlife-tools libraries, designed to facilitate data loading, feature extraction, and similarity
computation for wildlife-related tasks. Throughout this work, the term baseline refers to the publicly available
solution Baseline with WildFusion.</p>
        <sec id="sec-4-2-1">
          <title>4.2.1. Pipeline Overview</title>
          <p>
            The pipeline extracts global descriptors using the
MegaDescriptor-L-384[
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] model from the timm library[
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], performs local keypoint detection and
matching with ALIKED features and MatchLightGlue, fuses the resulting global and local similarity
scores via the WildFusion module, and finally calibrates the combined scores using isotonic regression.
          </p>
        </sec>
        <sec id="sec-4-2-2">
          <title>4.2.2. Global Descriptor Extraction</title>
          <p>Global features are extracted from a high-capacity vision transformer model MegaDescriptor-L-384,
pre-trained and fine-tuned to generate robust embeddings. These embeddings capture coarse-grained
appearance information and are compared using cosine similarity.</p>
          <p>Given two embeddings \(\mathbf{e}_i, \mathbf{e}_j \in \mathbb{R}^d\), cosine similarity is defined as:
\[ \mathrm{sim}_{\cos}(\mathbf{e}_i, \mathbf{e}_j) = \frac{\mathbf{e}_i \cdot \mathbf{e}_j}{\lVert \mathbf{e}_i \rVert \, \lVert \mathbf{e}_j \rVert}. \]</p>
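          <p>A minimal sketch of descriptor extraction and cosine scoring with timm follows; the hf-hub
checkpoint identifier is an assumption:</p>
          <preformat>
import timm
import torch
import torch.nn.functional as F

# Load MegaDescriptor-L-384 as a headless feature extractor
# (checkpoint id assumed; input resolution follows the model config).
model = timm.create_model(
    "hf-hub:BVRA/MegaDescriptor-L-384", pretrained=True, num_classes=0
).eval()
cfg = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**cfg)

@torch.no_grad()
def embed(pil_image):
    x = transform(pil_image).unsqueeze(0)
    return F.normalize(model(x), dim=-1)  # L2-normalized embedding

def cosine_sim(e1, e2):
    # For L2-normalized embeddings, cosine similarity is a dot product.
    return float((e1 * e2).sum(-1))
          </preformat>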
        </sec>
        <sec id="sec-4-2-3">
          <title>4.2.3. Local Feature Extraction and Matching</title>
          <p>The baseline uses ALIKED as the local keypoint detector and descriptor. It is designed for accurate and
efficient keypoint extraction with descriptor computation. The extractor outputs dense descriptors \(d_i\)
and confidence scores.</p>
          <p>These local descriptors are matched using MatchLightGlue, an attention-based matcher inspired
by LightGlue. Let \(a_{ij}\) denote the attention affinity between descriptor \(i\) in image \(A\) and descriptor \(j\) in image \(B\):
\[ a_{ij} = \frac{\exp(W d_i \cdot W d_j)}{\sum_{j'} \exp(W d_i \cdot W d_{j'})}. \]
After filtering matches via mutual nearest neighbors, the local similarity score \(\ell(A, B)\) is:
\[ \ell(A, B) = \frac{1}{|\mathcal{M}(A, B)|} \sum_{(i, j) \in \mathcal{M}(A, B)} a_{ij}. \]</p>
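          <p>The baseline drives these components through wildlife_tools; the following minimal sketch instead
uses the standalone lightglue package (a substitution we make for illustration), scoring a pair by the
confidence-weighted sum of its mutual matches:</p>
          <preformat>
import torch
from lightglue import ALIKED, LightGlue
from lightglue.utils import load_image, rbd

extractor = ALIKED(max_num_keypoints=1024).eval()
matcher = LightGlue(features="aliked").eval()

@torch.no_grad()
def local_score(path_a, path_b):
    """Confidence-weighted sum of mutual matches between two images."""
    feats_a = extractor.extract(load_image(path_a))
    feats_b = extractor.extract(load_image(path_b))
    out = rbd(matcher({"image0": feats_a, "image1": feats_b}))
    return float(out["scores"].sum())  # higher = more consistent keypoints
          </preformat>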
        </sec>
        <sec id="sec-4-2-4">
          <title>4.2.4. WildFusion Fusion Strategy</title>
          <p>
            The WildFusion[
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] module computes the final similarity as the average of global and local scores.
          </p>
        </sec>
        <sec id="sec-4-2-5">
          <title>4.2.5. Score Calibration</title>
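          <p>A minimal sketch with scikit-learn's IsotonicRegression follows (wildlife_tools ships its own
calibration wrapper; the toy scores and labels here are illustrative):</p>
          <preformat>
import numpy as np
from sklearn.isotonic import IsotonicRegression

# raw_scores: fused similarities on a held-out calibration split;
# labels: 1 if the pair depicts the same individual, else 0.
raw_scores = np.array([0.12, 0.35, 0.48, 0.71, 0.90])
labels = np.array([0, 0, 1, 1, 1])

calibrator = IsotonicRegression(out_of_bounds="clip")
calibrator.fit(raw_scores, labels)

# Monotonic mapping from raw similarity to calibrated confidence.
calibrated = calibrator.predict(np.array([0.5, 0.8]))
          </preformat>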
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Disadvantages</title>
        <p>The baseline has a number of limitations: it relies on a single matching method, its embedding model
may not suit all species equally, its similarity matrices are combined by plain averaging, and the lack
of segmentation for salamanders introduces background noise.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Methodology</title>
      <p>The proposed solution encompasses a four-stage process: (1) identify the top-\(k\) global embedding
matches; (2) compare candidates using multiple extractor-matcher pairs; (3) combine per-pair scores via
learned weights \(w_{e,m}\); (4) apply a threshold \(\tau\) to the calibrated scores to flag new individuals.</p>
      <sec id="sec-5-1">
        <title>5.1. Selection of Most Relevant Instances</title>
        <p>For each test sample \(q\), we compute its embedding \(\mathbf{e}_q \in \mathbb{R}^d\) using the pre-trained embedding model.
Denote the set of all training embeddings as \(\{\mathbf{e}_i^{\mathrm{train}}\}_{i=1}^{N}\). We compute cosine distances:
\[ d_{\cos}(\mathbf{e}_q, \mathbf{e}_i^{\mathrm{train}}) = 1 - \frac{\mathbf{e}_q \cdot \mathbf{e}_i^{\mathrm{train}}}{\lVert \mathbf{e}_q \rVert \, \lVert \mathbf{e}_i^{\mathrm{train}} \rVert}. \]
We then select the top-\(k\) training samples with the smallest distances:
\[ \mathcal{C}(q) = \operatorname*{arg\,top\text{-}k}_{i \in \{1, \dots, N\}} \; d_{\cos}(\mathbf{e}_q, \mathbf{e}_i^{\mathrm{train}}). \]</p>
        <p>These \(k\) candidates form the candidate set for local matching.</p>
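        <p>A minimal NumPy sketch of this selection step (\(k\) and array shapes are illustrative):</p>
        <preformat>
import numpy as np

def topk_candidates(query_emb, train_embs, k=10):
    """Indices of the k training embeddings closest in cosine distance."""
    q = query_emb / np.linalg.norm(query_emb)
    t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    dists = 1.0 - t @ q                    # cosine distances to the query
    return np.argpartition(dists, k)[:k]   # unordered top-k in O(N)
        </preformat>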
        <p>
          We have replaced MegaDescriptor-L-384 with a higher-quality miewid-msv3 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Local Feature Matching</title>
        <p>Each test image \(q\) is compared against each candidate \(c \in \mathcal{C}(q)\) using several “extractor + matcher”
pairs. Let</p>
        <p>
          \(\mathcal{E}\) = {KeyNetAffNetHardNet, DISK (Kornia[
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]), SuperPoint, ALIKED, DISK (wildlife_tools)}
be the set of feature extractors, and
        </p>
        <p>\(\mathcal{M}\) = {AdaLAM, GADMatcher, MatchLightGlue}
be the set of matchers. For each extractor \(e \in \mathcal{E}\) and matcher \(m \in \mathcal{M}\), we compute a local similarity
matrix:
\[ S_{e,m}(q, c) = \mathrm{LocalSim}_{e,m}(q, c), \]
where \(\mathrm{LocalSim}_{e,m}\) denotes the local matching procedure with extractor \(e\) and matcher \(m\). Each entry
of the local similarity matrix corresponds to a confidence-weighted sum of matched keypoints.</p>
        <p>The selection of the Kornia library was motivated by its provision of models capable of handling affine
transformations, which expanded the comparison capabilities beyond the single matcher
(MatchLightGlue) and limited set of extractors (SIFT, SuperPoint, ALIKED, DISK) available in wildlife_tools.</p>
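        <p>A minimal sketch of one such pair (KeyNetAffNetHardNet + AdaLAM) via Kornia is given below;
treating one minus the descriptor distance as a match confidence is our own heuristic assumption:</p>
        <preformat>
import torch
import kornia.feature as KF

feature = KF.KeyNetAffNetHardNet(num_features=2048).eval()

@torch.no_grad()
def kornia_pair_score(img_a, img_b):
    """img_a, img_b: grayscale tensors of shape (1, 1, H, W) in [0, 1]."""
    lafs_a, resps_a, descs_a = feature(img_a)
    lafs_b, resps_b, descs_b = feature(img_b)
    # AdaLAM filters tentative matches by local affine consistency.
    dists, idxs = KF.match_adalam(descs_a[0], descs_b[0], lafs_a, lafs_b)
    if len(idxs) == 0:
        return 0.0
    return float((1.0 - dists).clamp(min=0).sum())  # weighted match count
        </preformat>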
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Bagging (Aggregation)</title>
        <p>We replaced the inefficient plain averaging of the similarity matrices with a weighted vote.</p>
        <p>Let \(S_{e,m}(q, c)\) be the similarity score for a given pair \((q, c)\) obtained from the local similarity matrix
corresponding to each \((e, m)\).</p>
        <p>We introduce a set of coefficients \(w_{e,m}\), where each coefficient is associated with a unique combination
of \(e\) and \(m\). These coefficients allow us to assign different weights to the similarity scores from each
local matrix. The final similarity score \(S(q, c)\) is then computed as a weighted mean of these per-pair
scores:
\[ S(q, c) = \frac{\sum_{e \in \mathcal{E}} \sum_{m \in \mathcal{M}} w_{e,m} \, S_{e,m}(q, c)}{\sum_{e \in \mathcal{E}} \sum_{m \in \mathcal{M}} w_{e,m}}. \]</p>
        <p>The optimal values of these coefficients \(w_{e,m}\) are determined through an optimization process
performed on a training dataset using scipy.optimize.minimize [<xref ref-type="bibr" rid="ref14">14</xref>] with the
Powell method. The objective of this optimization is to maximize the GeoMean metric defined by
BAKS and BAUS (implemented as minimizing its negative), ensuring that the final similarity score \(S(q, c)\) is
well-calibrated for the given task.</p>
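        <p>A minimal sketch of this weight search is shown below; evaluate_geomean is a placeholder we assume
computes the GeoMean of BAKS and BAUS from a fused query-by-candidate similarity matrix:</p>
        <preformat>
import numpy as np
from scipy.optimize import minimize

def fuse(weights, sim_matrices):
    """Weighted mean over the (P, Q, C) stack of per-pair similarity matrices."""
    w = np.abs(weights)          # keep weights non-negative
    w = w / w.sum()
    return np.tensordot(w, sim_matrices, axes=1)  # fused matrix, shape (Q, C)

def optimize_weights(sim_matrices, evaluate_geomean):
    """Powell search over the weights, with negative GeoMean as the loss."""
    neg = lambda w: -evaluate_geomean(fuse(w, sim_matrices))
    w0 = np.ones(sim_matrices.shape[0]) / sim_matrices.shape[0]
    res = minimize(neg, w0, method="Powell")
    w = np.abs(res.x)
    return w / w.sum()
        </preformat>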
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Novelty Detection</title>
        <p>Based on the aggregated similarity \(S(q, c)\), the most probable identity (predicted individual) and its
corresponding confidence score are identified for each test image. To detect new individuals (i.e.,
those not present in the training dataset), a confidence threshold \(\tau\) is established, and we assign:
\[ \hat{y}(q) = \begin{cases} \arg\max_{c \in \mathcal{C}(q)} S(q, c), &amp; \text{if } \max_{c \in \mathcal{C}(q)} S(q, c) \ge \tau, \\ \text{new\_individual}, &amp; \text{otherwise.} \end{cases} \]
The threshold \(\tau\) is chosen to maximize GeoMean on the validation set:
\[ \tau^{*} = \operatorname*{arg\,max}_{\tau \in [0, 1]} \sqrt{\mathrm{BAKS}(\tau) \times \mathrm{BAUS}(\tau)}. \]</p>
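        <p>A minimal sketch of the threshold sweep, reusing the metric helpers sketched in Section 3.1 and
assuming per-image maximum similarities and predicted identities are precomputed:</p>
        <preformat>
import numpy as np

def pick_threshold(max_sims, pred_ids, y_true, known_classes, grid=101):
    """Sweep tau over [0, 1] and keep the value maximizing GeoMean."""
    best_tau, best_gm = 0.0, -1.0
    for tau in np.linspace(0.0, 1.0, grid):
        y_pred = np.where(max_sims >= tau, pred_ids, "new_individual")
        _, _, gm = baks_baus_geomean(y_true, y_pred, known_classes)
        if gm > best_gm:
            best_tau, best_gm = tau, gm
    return best_tau, best_gm
        </preformat>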
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>Our system achieved first place in the AnimalCLEF 2025 competition, providing balanced accuracy on
both known and unknown individuals across the three target species.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Discussion</title>
      <p>We explored alternative embeddings (CLIP [<xref ref-type="bibr" rid="ref16">16</xref>], DINOv2 [<xref ref-type="bibr" rid="ref17">17</xref>]) and dense matchers (LoFTR [<xref ref-type="bibr" rid="ref18">18</xref>]). While
these showed promise, our species-specific pipeline remains the most consistent. Future work includes
species-routed hybrid matchers for further gains.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Acknowledgments</title>
      <p>We thank the LifeCLEF organizers for the AnimalCLEF 2025 datasets and the WildlifeReID-10k auxiliary
data. We also acknowledge developers of wildlife-datasets and wildlife-tools.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Declaration on Generative AI</title>
      <p>In preparing this work, the authors used GPT-4o to check grammar and spelling. In addition, the authors
use GPT-4o to translate text into English. After using these tools/services, the authors reviewed and
edited the content as needed and are solely responsible for the content of the publication.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Čermák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Papafitsoros</surname>
          </string-name>
          , L. Picek,
          <article-title>Seaturtleid2022: A long-span dataset for reliable sea turtle re-identification</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>7146</fpage>
          -
          <lpage>7156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Čermák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          , K. Papafitsoros,
          <article-title>WildlifeDatasets: An Open-Source Toolkit for Animal Re-Identification</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>5953</fpage>
          -
          <lpage>5963</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Čermák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Papafitsoros</surname>
          </string-name>
          , L. Picek, Wildlifereid-10k:
          <article-title>Wildlife re-identification dataset with 10k individual animals</article-title>
          ,
          <source>arXiv preprint arXiv:2406.09211</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>DeTone</surname>
          </string-name>
          , T. Malisiewicz,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rabinovich</surname>
          </string-name>
          , Superpoint:
          <article-title>Self-supervised interest point detection and description</article-title>
          ,
          <source>in: CVPR Deep Learning for Visual SLAM Workshop</source>
          ,
          <year>2018</year>
          . URL: http://arxiv.org/abs/1712.07629.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. C. Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Aliked: A lighter keypoint and descriptor extraction network via deformable transformation</article-title>
          ,
          <source>IEEE Transactions on Instrumentation and Measurement</source>
          <volume>72</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . URL: https://arxiv.org/pdf/2304.03608.pdf. doi:10.1109/TIM.2023.3271000.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lindenberger</surname>
          </string-name>
          , P.-E. Sarlin, M. Pollefeys, LightGlue: Local Feature Matching at Light Speed, in: ICCV,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] GG, remove background dataset,
          <year>2025</year>
          . URL: https://universe.roboflow.com/gg-gcd3a/remove-background-nvy0p-kyxht, visited on 2025-06-05.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Jocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Qiu</surname>
          </string-name>
          , Ultralytics yolo11, https://github.com/ultralytics/ultralytics,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Čermák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Papafitsoros</surname>
          </string-name>
          ,
          <article-title>Wildlifedatasets: An open-source toolkit for animal re-identification</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>5953</fpage>
          -
          <lpage>5963</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wightman</surname>
          </string-name>
          , Pytorch image models, https://github.com/rwightman/pytorch-image-models,
          <year>2019</year>
          . doi:10.5281/zenodo.4414861.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cermak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          , Wildfusion:
          <article-title>Individual animal identification with calibrated similarity fusion</article-title>
          ,
          <source>arXiv preprint arXiv:2408.12934</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12] Conservation X Labs,
          <article-title>miewid-msv3: Multi-species vertebrate identification model</article-title>
          , https://huggingface.co/conservationxlabs/miewid-msv3,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Riba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ponsa</surname>
          </string-name>
          , E. Rublee, G. Bradski,
          <article-title>Kornia: An open source differentiable computer vision library for pytorch</article-title>
          ,
          <source>in: Winter Conference on Applications of Computer Vision</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henriksen, E. A. Quintero, C. R. Harris, A. M. Archibald, A. H. Ribeiro, F. Pedregosa, P. van Mulbregt, SciPy 1.0 Contributors, SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, Nature Methods 17 (2020) 261-272. doi:10.1038/s41592-019-0686-2.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] The solution is available online at https://github.com/XXXM1R0XXX/AnimalCLEF2025.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] G. Ilharco, M. Wortsman, R. Wightman, C. Gordon, N. Carlini, R. Taori, A. Dave, V. Shankar, H. Namkoong, J. Miller, H. Hajishirzi, A. Farhadi, L. Schmidt, OpenCLIP, 2021. URL: https://doi.org/10.5281/zenodo.5143773. doi:10.5281/zenodo.5143773.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] M. Oquab, T. Darcet, T. Moutakanni, H. V. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, R. Howes, P.-Y. Huang, H. Xu, V. Sharma, S.-W. Li, W. Galuba, M. Rabbat, M. Assran, N. Ballas, G. Synnaeve, I. Misra, H. Jegou, J. Mairal, P. Labatut, A. Joulin, P. Bojanowski, DINOv2: Learning robust visual features without supervision, 2023.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] J. Sun, Z. Shen, Y. Wang, H. Bao, X. Zhou, LoFTR: Detector-free local feature matching with transformers, in: CVPR, 2021.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>