<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Open-set Animal Re-identification via Multilevel Feature Fusion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jingyin Tan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aiguo Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan, Guangdong</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The AnimalCLEF2025 task aims to train a recognizer on the training set with good generalization ability to predict whether an animal image in the test dataset belongs to a known class or an unknown class, which is an open-set recognition problem. In this study, the fusion of deep features and low-level features is utilized to build an open-set animal recognition model. Specifically, the pretraining and fine-tuning scheme is adopted to learn high-level features: the swin-base-patch4-window7-224 model is fine-tuned on the training set, and deep features are then extracted from its feature representation layer. Second, we utilize two keypoint detection and descriptor extraction networks to obtain two sets of low-level features. Afterwards, the nearest-neighbor classification rule is applied to the weighted multilevel features to infer the label of a test sample. In particular, if the maximal similarity between the test sample and the training samples is lower than a threshold, the test sample is predicted as an unknown class. Finally, experimental results show that the proposed model obtains 0.55826 and 0.56044 balanced accuracy on the 31% and 69% splits of the test dataset, respectively. Our source code is available at https://github.com/NickyTan8899/tjy.</p>
      </abstract>
      <kwd-group>
        <kwd>open-set recognition</kwd>
        <kwd>multilevel feature</kwd>
        <kwd>pretrained model</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Open-set recognition (OSR) refers to the problem where a model is trained on a set of known classes
(the closed set), but during testing, it should correctly classify known categories and identify instances
from unknown classes as unknown. This setting better reflects real-world applications than traditional
closed-set recognition, which assumes that all test samples belong to known classes[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In many
real-world scenarios, machine learning systems frequently encounter inputs from previously unseen
categories, such as the AnimalCLEF2025 task[
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] and image classification systems that encounter objects
or species not seen during training.
      </p>
      <p>
        Accordingly, researchers have explored and designed a variety of models towards enhanced open-set
recognition accuracy. Unlike traditional closed-set classifiers, OSR methods incorporate mechanisms
to detect novel instances. Approaches to OSR can be broadly categorized into discriminative models,
generative models, and distance-based methods[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Discriminative methods, such as energy-based
models, modify neural networks to estimate the confidence of a sample belonging to a known class.
Generative approaches, including variational autoencoders and GAN-based[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] techniques
techniques,
attempt to model the data distribution of known classes and use reconstruction errors or synthetic
unknowns to identify unfamiliar inputs. Distance-based methods operate in the feature space and use
metrics such as nearest neighbors[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and class prototypes to detect out-of-distribution samples.
      </p>
      <p>
        For the problem of open-set animal recognition, though great progress has been made, most
existing methods still suffer from degraded performance due to complex spatial dependencies and
background information[
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ]. To this end, in this study we propose a distance-based open-set animal
recognition model via multilevel feature fusion to better capture multi-view information of an image.
The main contributions of our work are as follows.
      </p>
      <p>(1) Deep features and low-level features are extracted. In particular, deep features are learnt under the
pretraining and fine-tuning scheme, where the swin-base-patch4-window7-224 model is utilized and
fine-tuned. Keypoint detection and descriptor extraction networks are also used to extract low-level features.</p>
      <p>(2) Open-set recognition is performed with the multilevel features. The similarity between a test
sample and the training samples is measured. Specifically, we first calculate the similarity from the view
of deep features and low-level features, respectively, and then use the WildFusion strategy to obtain
the final similarity score[<xref ref-type="bibr" rid="ref9">9</xref>]. The prediction is made according to the similarity score: if the maximal
similarity is lower than a threshold, we categorize the sample into the novel classes; otherwise, we report the class
of the training sample corresponding to the maximal similarity.</p>
      <p>(3) Comparative experiments are conducted against three baseline models on the competition datasets.
Results show that the proposed model performs better than the baselines and obtains 0.55826 and
0.56044 balanced accuracy on the 31% and 69% splits of the test dataset, respectively, ranking 34th on
the leaderboard (Team Name: Already mygo).</p>
      <p>The structure of this paper is as follows. Section 2 introduces the proposed open-set animal recognition
model. Section 3 presents the experimental datasets and setup. Section 4 presents the experimental
results, followed by the conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Extraction of Features</title>
        <p>As for deep features, we adopt the pretraining and fine-tuning scheme to learn latent features. Specifically,
a Swin Transformer-based pretrained model (i.e., the swin-base-patch4-window7-224 model)[<xref ref-type="bibr" rid="ref10">10</xref>] is adopted.
We replace its cross-entropy loss with the ArcFace loss, which enhances the angular separation between
different classes in the embedding space[<xref ref-type="bibr" rid="ref11">11</xref>]. Afterwards, we fine-tune the pretrained model on the
training set. Finally, we drop its classification layer to obtain a feature learning network, which enables
us to obtain the feature representation of an image. For clarity, we denote this set of features as fs1.</p>
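        <p>For illustration, the following is a minimal sketch of this branch, assuming PyTorch and timm are available; the ArcFace head and its scale and margin values here are illustrative choices, not the exact competition configuration.</p>
        <preformat>
# Minimal sketch of the deep-feature branch: fine-tune a Swin backbone with an
# ArcFace margin head, then drop the head and keep the embedding network (fs1).
# Assumes torch and timm; s, m, and num_classes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm

class ArcFaceHead(nn.Module):
    """Additive angular margin loss (ArcFace)."""
    def __init__(self, embed_dim, num_classes, s=30.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalized embeddings and class centers.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin m only to the target-class logit.
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = self.s * torch.where(target, torch.cos(theta + self.m), cos)
        return F.cross_entropy(logits, labels)

# num_classes=0 removes the classifier, so the model outputs embeddings.
backbone = timm.create_model("swin_base_patch4_window7_224",
                             pretrained=True, num_classes=0)
head = ArcFaceHead(embed_dim=backbone.num_features, num_classes=100)

x = torch.randn(4, 3, 224, 224)            # dummy batch of resized images
y = torch.randint(0, 100, (4,))            # dummy identity labels
loss = head(backbone(x), y)                # fine-tuning objective
loss.backward()

with torch.no_grad():                      # inference: one 1 x 1024 vector each
    fs1 = F.normalize(backbone(x))
</preformat>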
        <p>Besides, we directly apply two different keypoint and descriptor extraction networks (i.e., ALIKED
and DISK) to the image to obtain two sets of features. Specifically, ALIKED[<xref ref-type="bibr" rid="ref12">12</xref>] utilizes a sparse
deformable descriptor head to extract deformable descriptors, where a neural reprojection error loss
is used to measure the discrepancy between reprojection and descriptor-matching probabilities[<xref ref-type="bibr" rid="ref13">13</xref>].
DISK[<xref ref-type="bibr" rid="ref14">14</xref>] uses reinforcement learning to train an end-to-end pipeline for keypoint detection and matching.
We denote the two sets of features as fs2 and fs3.</p>
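        <p>As a sketch of how such local features can be obtained, the snippet below uses the DISK implementation shipped with kornia (an assumption for illustration; the ALIKED branch is obtained analogously from its released code). The keypoint budget is an illustrative choice.</p>
        <preformat>
# Sketch of the DISK branch (fs3) via the pretrained model in kornia; the
# ALIKED branch (fs2) is obtained analogously from its released implementation.
import torch
import kornia.feature as KF

disk = KF.DISK.from_pretrained("depth").eval()

images = torch.rand(1, 3, 256, 256)        # dummy 256 x 256 RGB batch in [0, 1]
with torch.no_grad():
    feats = disk(images, n=512)            # one DISKFeatures object per image

keypoints = feats[0].keypoints             # (N, 2) pixel coordinates
descriptors = feats[0].descriptors         # (N, 128) local descriptors
</preformat>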
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Open-set Recognition Procedure</title>
        <p>In classifying a test sample, we adopt the nearest-neighbor classification rule to make predictions. First,
we encode the test sample and each of the training samples with fs1, fs2, and fs3. Second, we measure
the similarity between the test sample and each training sample. In particular, to reflect the importance of
the three different types of features, the WildFusion strategy is adopted, with which we obtain the fused
similarity between the test sample and each training sample and, further, the maximal similarity. If
the maximal similarity is greater than a predefined threshold, the predicted result of the test sample
is the label of the associated training sample; otherwise, the predicted label is “unknown classes” (or
new_individual, as used in the competition).</p>
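        <p>A minimal sketch of this decision rule follows, assuming the per-feature similarity matrices have already been computed; the predict helper, fusion weights, and threshold are illustrative placeholders, whereas the actual pipeline calibrates and combines the scores with WildFusion[<xref ref-type="bibr" rid="ref9">9</xref>].</p>
        <preformat>
# Sketch of the open-set decision rule: fuse per-feature similarity matrices,
# take the nearest database sample, and output "new_individual" when the best
# fused similarity falls below the threshold. Weights are illustrative.
import numpy as np

def predict(sim_deep, sim_aliked, sim_disk, db_labels,
            weights=(0.5, 0.25, 0.25), threshold=0.4):
    # Each sim_* has shape (n_query, n_database); higher means more similar.
    fused = (weights[0] * sim_deep
             + weights[1] * sim_aliked
             + weights[2] * sim_disk)
    nearest = fused.argmax(axis=1)         # index of the nearest database sample
    best = fused.max(axis=1)               # maximal fused similarity per query
    labels = np.asarray(db_labels)[nearest]
    return np.where(best >= threshold, labels, "new_individual")

# Toy usage: 2 query images against a database of 3 labeled images.
rng = np.random.default_rng(0)
s1, s2, s3 = (rng.random((2, 3)) for _ in range(3))
print(predict(s1, s2, s3, db_labels=["a", "b", "c"]))
</preformat>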
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Setup</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>The AnimalCLEF2025 dataset, containing 15209 images, is divided into a database set (13074
images) and a query set (2135 images). The database plays the role of the training set and the
query serves as the test set. Table 1 presents a summary of the experimental data.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Experimental Setup</title>
        <p>To obtain deep features, we first resize the images to 224 × 224 to make them suitable for the pretrained
Swin-based model. We then fine-tune it on the database set with the SGD optimizer. An initial
learning rate of 0.001 is used together with a cosine annealing scheduler, which decays the learning
rate from the initial value to 1e-6 following a cosine curve. The model is
trained for 100 epochs, with a batch size of 64 and 2 data-loading workers. Finally, the deep features
of each sample are embedded into a 1 × 1024 vector. As
for low-level features, we resize the images to 256 × 256 and then obtain two sets of features, returned
by ALIKED and DISK, respectively.</p>
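        <p>The optimization setup described above can be sketched in PyTorch as follows; the model and dataset are stand-in placeholders, while the optimizer, scheduler, batch size, and epoch count follow the values reported here.</p>
        <preformat>
# Sketch of the fine-tuning schedule: SGD starting at 0.001, cosine-annealed
# to 1e-6 over 100 epochs, batch size 64, and 2 data-loading workers.
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader, TensorDataset

model = torch.nn.Linear(8, 4)              # stand-in for the Swin network
train_set = TensorDataset(torch.randn(64, 8), torch.randint(0, 4, (64,)))
loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=2)

optimizer = SGD(model.parameters(), lr=0.001)
scheduler = CosineAnnealingLR(optimizer, T_max=100, eta_min=1e-6)

for epoch in range(100):
    for x, y in loader:
        loss = torch.nn.functional.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                       # follow the cosine decay per epoch
</preformat>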
        <p>Besides, for comparison, three baseline methods are utilized to demonstrate the effectiveness of our
proposed model: MegaDescriptor-L-384 (the swin-large-patch4-window12-384 model),
MegaDescriptor-L-384-ALIKED (a combination of MegaDescriptor-L-384 and ALIKED), and
MegaDescriptor-L-384-ALIKED-DISK (a combination of MegaDescriptor-L-384, ALIKED, and DISK).</p>
        <p>As for the performance metric, balanced accuracy (denoted by score in formula (1)) is used, which is
calculated from the balanced accuracy on known samples (BAKS) and the balanced accuracy on unknown
samples (BAUS):</p>
        <p>score = √(BAKS × BAUS), (1)</p>
        <p>where BAKS is the accuracy on individuals that are known in the database, and BAUS is the accuracy
on individuals that are unknown in the database. This formulation avoids misleadingly high scores from
trivial models, such as one that predicts all samples as unknown, which would score 0% BAKS and 100%
BAUS. Unlike the arithmetic mean, the geometric mean penalizes such imbalance, providing a more
robust metric.</p>
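        <p>The metric can be computed as in the following sketch (the open_set_score helper is ours for illustration); it assumes that the only correct prediction for an unknown individual is the unknown label itself.</p>
        <preformat>
# Sketch of the evaluation metric: the geometric mean of balanced accuracy on
# known individuals (BAKS) and on unknown individuals (BAUS).
import numpy as np
from sklearn.metrics import balanced_accuracy_score

def open_set_score(y_true, y_pred, unknown="new_individual"):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    known = y_true != unknown
    baks = balanced_accuracy_score(y_true[known], y_pred[known])
    # For unknowns, the only correct answer is the unknown label itself.
    baus = np.mean(y_pred[~known] == unknown)
    return np.sqrt(baks * baus)

y_true = ["a", "b", "new_individual", "new_individual"]
y_pred = ["a", "a", "new_individual", "b"]
print(open_set_score(y_true, y_pred))      # sqrt(0.5 * 0.5) = 0.5
</preformat>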
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Results</title>
      <p>[Table: comparison of the proposed model (Ours) with MegaDescriptor-L-384, MegaDescriptor-L-384-ALIKED, and MegaDescriptor-L-384-ALIKED-DISK in terms of score]</p>
      <p>Furthermore, to investigate the impact of different thresholds on the recognition performance, we
conduct experiments with candidate thresholds ranging from 0.3 to 0.8. Figure 2 presents the results,
from which we can observe a general trend that the score first increases and then decreases as the
threshold increases. We also observe that a threshold of 0.4 generally leads to better performance.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Towards more accurate image-based open-set animal recognition, in this study we propose a distance-based
recognition model that utilizes several sets of features. The deep features are learnt with the pretraining
and fine-tuning scheme, and the low-level features are obtained by keypoint detection and descriptor
extraction networks. To reflect the importance of the different types of features in calculating the similarity
between samples, a weighting scheme is used. Afterwards, a threshold-based method is adopted to
infer the label of a test sample. Finally, the proposed model is evaluated on the test set and the results
demonstrate its effectiveness.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used OpenAI-GPT-4o in order to check grammar and
spelling. After using this tool, the authors reviewed and edited the content as needed and take
full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Thelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Beyerer</surname>
          </string-name>
          ,
          <article-title>Poison-aware open-set fungi classification: Reducing the risk of poisonous confusion</article-title>
          , Working Notes of CLEF (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Papafitsoros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kovář</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Čermák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          , Overview of AnimalCLEF 2025:
          <article-title>Recognizing individual animals in images</article-title>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Janoušková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Čermák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Papafitsoros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Cañas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Martellucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vinatier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          , Overview of lifeclef 2025:
          <article-title>Challenges on species presence prediction and identification, and individual animal identification</article-title>
          ,
          <source>in: International Conference of the Cross-Language Evaluation Forum for European Languages (CLEF)</source>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Geng</surname>
          </string-name>
          , S.-j. Huang,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Recent advances in open set recognition: A survey</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>43</volume>
          (
          <year>2020</year>
          )
          <fpage>3614</fpage>
          -
          <lpage>3631</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pouget-Abadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warde-Farley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ozair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Generative adversarial networks</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>63</volume>
          (
          <year>2020</year>
          )
          <fpage>139</fpage>
          -
          <lpage>144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Generalized out-of-distribution detection: A survey</article-title>
          ,
          <source>International Journal of Computer Vision</source>
          <volume>132</volume>
          (
          <year>2024</year>
          )
          <fpage>5635</fpage>
          -
          <lpage>5662</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Čermák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Papafitsoros</surname>
          </string-name>
          , L. Picek,
          <article-title>Seaturtleid2022: A long-span dataset for reliable sea turtle re-identification</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>7146</fpage>
          -
          <lpage>7156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <article-title>Animal identification with independent foreground and background modeling</article-title>
          ,
          <source>in: DAGM German Conference on Pattern Recognition</source>
          , Springer,
          <year>2024</year>
          , pp.
          <fpage>241</fpage>
          -
          <lpage>257</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] V. Cermak, L. Picek, L. Adam, L. Neumann, J. Matas, <article-title>Wildfusion: Individual animal identification with calibrated similarity fusion</article-title>, <source>in: European Conference on Computer Vision</source>, Springer, <year>2025</year>, pp. <fpage>18</fpage>-<lpage>36</lpage>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, <article-title>Swin transformer: Hierarchical vision transformer using shifted windows</article-title>, <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>, <year>2021</year>, pp. <fpage>10012</fpage>-<lpage>10022</lpage>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] J. Deng, J. Guo, N. Xue, S. Zafeiriou, <article-title>Arcface: Additive angular margin loss for deep face recognition</article-title>, <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>, <year>2019</year>, pp. <fpage>4690</fpage>-<lpage>4699</lpage>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] X. Zhao, X. Wu, W. Chen, P. C. Chen, Q. Xu, Z. Li, <article-title>Aliked: A lighter keypoint and descriptor extraction network via deformable transformation</article-title>, <source>IEEE Transactions on Instrumentation and Measurement</source> <volume>72</volume> (<year>2023</year>) <fpage>1</fpage>-<lpage>16</lpage>.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] H. Germain, V. Lepetit, G. Bourmaud, <article-title>Neural reprojection error: Merging feature learning and camera pose estimation</article-title>, <source>in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</source>, <year>2021</year>, pp. <fpage>414</fpage>-<lpage>423</lpage>.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] M. Tyszkiewicz, P. Fua, E. Trulls, <article-title>Disk: Learning local features with policy gradient</article-title>, <source>Advances in Neural Information Processing Systems</source> <volume>33</volume> (<year>2020</year>) <fpage>14254</fpage>-<lpage>14265</lpage>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>