<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fine-Grained Classification for Poisonous Fungi Identification with Transfer Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christopher Chiu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilian Heil</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Teresa Kim</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anthony Miyaguchi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgia Institute of Technology</institution>
          ,
          <addr-line>North Ave NW, Atlanta, GA 30332</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>9</fpage>
      <lpage>12</lpage>
      <abstract>
<p>FungiCLEF 2024 addresses the fine-grained visual categorization (FGVC) of fungi species, with a focus on identifying poisonous species. This task is challenging due to the size and class imbalance of the dataset, subtle inter-class variations, and significant intra-class variability amongst samples. In this paper, we document our approach to tackling this challenge through the use of ensemble classifier heads on pre-computed image embeddings. Our team (DS@GT) demonstrates that state-of-the-art self-supervised vision models can be utilized as robust feature extractors for downstream computer vision tasks without the need for task-specific fine-tuning of the vision backbone. Our approach achieved the best Track 3 score (0.345), accuracy (78.4%), and macro-F1 (0.577) on the private test set in post-competition evaluation. Our code is available at https://github.com/dsgt-kaggle-clef/fungiclef-2024.</p>
      </abstract>
      <kwd-group>
        <kwd>Fine-Grained Visual Categorization (FGVC)</kwd>
        <kwd>Poisonous Fungi Identification</kwd>
        <kwd>Transfer Learning</kwd>
        <kwd>Vision Transformers</kwd>
        <kwd>CEUR-WS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Dataset Overview</title>
        <p>
          The featured dataset for the FungiCLEF competition [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] is the Danish Fungi dataset [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. This dataset
comprises a training set (DF20), which includes 356,770 images over 1,604 different classes of fungi, and a
validation / testing dataset (DF21), consisting of 60,832 images over 2,713 species of fungi, covering
a year’s worth of observations. Species in the validation dataset that were not present in the training
dataset were marked as an "unknown" class. The dataset provides both full-sized images (110GB)
and downsized images (300px max dimension, 5.6GB). It also provides metadata for the fungi images,
including date, location, substrate and metasubstrate of the fungi growth, and the full taxonomic
ranks of the classified fungi species, including phylum, class, order, family, and genus.
        </p>
        <p>These two datasets do not have the same distribution of classes (Figure 2). Moreover, there was
significant class imbalance in both datasets, with the most common class having 1,913 images, and
the least common class only ~30 images in DF20, and down to only one observation for some species
in DF21. There were also significant variations in terms of lighting, background, and clarity, due to
the real-world conditions under which fungi were photographed (Figure 1). This adds another layer of
complexity to the task: fungi classes are hard to distinguish not only due to subtle inter-class variations
and high intra-class variance, but also due to varying image quality and image features. This highlights the need
for a robust model to effectively perform fine-grained classification on this rich and varied dataset.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Related Work</title>
        <p>
          State-of-the-art work on this dataset primarily utilizes models such as Swin Transformer [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and
MetaFormer [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. However, results from FungiCLEF 2023 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] underscore limitations in current research,
where the best accuracy among participants has not improved significantly since the competition’s
inception in 2022 [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Last year’s winner incorporated metadata into the model with MetaFormer as
the vision model [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], and utilized Seesaw Loss [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] to handle class imbalance. This led to a macro F1
of 0.571, with a poisonous and edible species confusion rate of 5.31% and 2.05% respectively [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. To
handle unknown classes, the team also introduced an entropy-based approach to identify unknown,
out-of-distribution species [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>
          Beyond FungiCLEF, Wei et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] provide a comprehensive examination of fine-grained visual
categorization (FGVC) challenges, such as accurately localizing object parts, selecting informative
features under varied conditions, and integrating segmentation with classification. They emphasize the
need for models to generalize across species, maintain efficiency, and handle real-world issues like
occlusions. Other directions that demonstrated promise on FGVC datasets such as CUB-200-2011 [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]
include Mask-CNN [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], which outperformed other methods by better capturing subtle differences
between species, and SR-GNNs [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], which extracted context-aware features from relevant image regions
to discriminate between object classes.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <p>Our overall approach to this challenge of fine-grained classification of fungi species was to:
1. Incorporate metadata as additional inputs / prediction targets for the model.
2. Learn the concept of unknown classes by incorporating the validation dataset into training.
3. Experiment with objective functions to induce model capability on the fine-grained classification task.
4. Train only on metadata and image embeddings for rapid prototyping and model optimization.</p>
      <p>Cloud computing resources were supplied by Data Science @ Georgia Tech. Data was hosted on
Google Cloud Storage, and models were developed on virtual instances with NVIDIA L4 GPUs. For more
memory-intensive experiments, we also used an NVIDIA RTX 4090 and a distributed cluster with
2× NVIDIA V100 GPUs.</p>
      <p>
        Libraries used include pandas [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], PaCMAP [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], scikit-learn [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] for data exploration; PySpark [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
PyArrow [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], Luigi [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] for data processing; PyTorch [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], timm [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], Lightning [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], and transformers
[
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] for model development. Evaluation functions for the FungiCLEF competition were referenced in
the development of internal model benchmarks [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>2.1. Dataset Preparation</title>
        <p>To improve the efficiency of experiments, we built a data preprocessing pipeline with PySpark (Figure 3).
We joined the 300-pixel and full-sized versions of the image data with their associated metadata, and stored them as
parquet files for faster I/O. Embeddings were also precomputed and stored separately as parquet files. A
custom PyTorch dataset object was created to serve the image and embedding data alongside metadata.</p>
        <p>
          The metadata columns were grouped based on their potential use as either model inputs or prediction
targets. For the validation set / public test set, only substrate, metasubstrate, habitat, date, and location
were provided [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]. As such, these columns were used as additional inputs to the model. Categorical
columns were expanded into one-hot vectors. For date information, we converted the month and day
into cyclical encoding using sine / cosine transformation [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ]. For location data, we converted longitude
and latitude into a Geohash, which preserves spatial ordinality and unifies location inputs along a
Z-order curve [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. Levels 2-5 of the resultant Geohash were extracted and converted from base-32 to
normalized base-10 integers. The toxicity and one-hot vectors of the taxonomical levels of fungi classes
were included as additional prediction targets. Other metadata columns were excluded.
        </p>
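<p>As an illustrative sketch (the function names and exact normalization are our assumptions, not the competition code), the cyclical date encoding and the Geohash level extraction described above might look like:</p>

```python
import math

# Hypothetical sketch of the metadata encodings described above.

def encode_cyclical(value: int, period: int) -> tuple[float, float]:
    """Map a cyclic quantity (e.g. month 1-12, day 1-31) onto the unit circle,
    so the end of a cycle sits next to its beginning."""
    angle = 2 * math.pi * (value - 1) / period
    return math.sin(angle), math.cos(angle)

# Standard Geohash base-32 alphabet (digits, then letters minus a, i, l, o).
GEOHASH_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_levels(geohash: str, levels: range = range(2, 6)) -> list[float]:
    """Convert the Geohash character at each precision level (2-5) from
    base-32 to a normalized base-10 value in [0, 1)."""
    return [GEOHASH_BASE32.index(geohash[lvl - 1]) / 32 for lvl in levels]
```

<p>With this encoding, December and January map to adjacent points on the circle, so the model sees them as nearby dates rather than opposite ends of a numeric range.</p>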
        <p>Given that unknown classes were only present in the validation dataset, we divided DF21 into three
equal sections of 20,000 cases, stratified by species. One section was designated as the held-out
test set, and the remaining two sections were used as the validation set or added to the training set
in two-fold cross-validation. By percentage, this gives a ratio of 90.4%, 4.8%, and 4.8% for
training, validation, and testing over the entire dataset. Importantly, the test and validation sets in each
validation fold have the same class distribution.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Embeddings for Transfer Learning</title>
        <p>
          Embeddings are the learned intermediate representation of deep learning models that capture structure
about the input domain. We experimented with two models as the vision model backbone to generate
embeddings - DINOv2 [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] and ResNet [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. DINOv2 was chosen as a state-of-the-art
vision model, noted for its richness and robustness as a visual feature extractor [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. ResNet was chosen due
to its widespread adoption in downstream applications [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]; it serves as a representative of the CNN family,
in contrast to the transformer family from which DINOv2 originates. For ResNet18, we generated
embeddings by extracting the output features from the last hidden state before the classification head.
This resulted in embeddings of shape (1000,) per image. For DINOv2, we utilized the [CLS] token from
the last hidden state of the model output. In initial experiments and ablation studies, dinov2-small [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]
was used, which resulted in an embedding shape of (768,). In our optimized model for the competition
submission, dinov2-large with registers [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] was used, which had an embedding shape of (1024,).
        </p>
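<p>A shape-only illustration of the [CLS] extraction (no real model call; the token count and dimension below are assumptions for a ViT-style backbone, not measured values):</p>

```python
import numpy as np

# A ViT-style backbone such as DINOv2 returns a last hidden state of shape
# (batch, tokens, dim); the [CLS] token is the first token in the sequence.
batch_size, num_tokens, dim = 4, 257, 1024   # illustrative dinov2-large-sized output
last_hidden_state = np.random.rand(batch_size, num_tokens, dim)

# Take the [CLS] token per image as the embedding served to the classifier.
cls_embeddings = last_hidden_state[:, 0, :]
print(cls_embeddings.shape)  # (4, 1024)
```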
        <p>For training, the image embeddings were precomputed. For the testing set and for our competition
submission, the vision backbone model was frozen, and embeddings were generated during inference
and fed into our trained classifier heads.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Model Development</title>
        <p>
          We explored two separate approaches in model development: (1) training a computer vision model
end-to-end, and (2) training a classifier head only on precomputed embeddings. While approach (1) is
the more traditional method for computer vision tasks, it is much more compute-intensive due to the
number of parameters to be trained [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. In comparison, approach (2) had significantly lower memory
requirements and faster training times (Table 1). While using precomputed image embeddings implies
that the training data could not undergo traditional computer vision augmentation techniques such as
flipping and random cropping, we hypothesize that modern vision models encode a sufficient amount of
information in the feature representation for the downstream model to be robust and generalizable.
        </p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Model Training</title>
          <p>
            For transfer learning with the embedding model, we use a traditional MLP classifier head with a hidden
dimension of 4096, with metadata directly concatenated to the embedding. Inspired by Diao et al. [
            <xref ref-type="bibr" rid="ref31">31</xref>
            ],
we also experiment with using a transformer block for better integration of metadata into the classifier.
We transformed metadata into the same dimensions as the embedding with a separate MLP layer, and
added them to the image embeddings before feeding the combined data into a transformer block for image
classification. To leverage the benefits of cross-fold validation, we utilized an ensemble model approach
[
            <xref ref-type="bibr" rid="ref32">32</xref>
            ]. Output logits of our model are averaged over all the classifier heads.
          </p>
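<p>The ensemble step can be sketched as follows (a minimal illustration of logit averaging; the logit values are made up):</p>

```python
import numpy as np

# Minimal sketch of the ensemble: average the output logits of the classifier
# heads trained on each cross-validation fold, then pick the top class.
def ensemble_predict(logits_per_head: list[np.ndarray]) -> np.ndarray:
    """Average logits over heads, then take the argmax class per sample."""
    mean_logits = np.mean(logits_per_head, axis=0)
    return mean_logits.argmax(axis=1)

head_a = np.array([[2.0, 0.5, 0.1]])   # illustrative logits for one sample
head_b = np.array([[0.0, 2.5, 0.1]])
print(ensemble_predict([head_a, head_b]))  # → [1]
```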
          <p>All models were first trained on a smaller, exploratory development set, before undergoing training
runs on the full dataset. For experiments that appeared promising in their initial outcomes, training
parameters were further tuned using Optuna to generate a full model for benchmarking. Training
performance was logged on Weights &amp; Biases, with the top 2 performing models saved as checkpoints.
A two-fold cross-validation was used, where each fold had 1/3rd of DF21 dataset as the validation
set, and another 1/3rd incorporated into the training data. Our experiment logs can be viewed at
https://wandb.ai/chiu/FungiClef.</p>
          <p>
            All experiments were trained for 20 to 50 epochs, with batch sizes of 64 to 512. Initial learning rates
ranged from 1 × 10⁻⁵ to 1 × 10⁻³, with AdamW [
            <xref ref-type="bibr" rid="ref33">33</xref>
            ] as optimizer. Learning rate schedulers experimented
with include cosine scheduler with restarts [
            <xref ref-type="bibr" rid="ref34">34</xref>
            ], and ReduceLROnPlateau [
            <xref ref-type="bibr" rid="ref35">35</xref>
            ].
          </p>
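<p>As a simplified sketch of cosine annealing with warm restarts [34] (the actual runs used library schedulers; the cycle length and bounds here are illustrative assumptions):</p>

```python
import math

# Simplified cosine learning-rate schedule with warm restarts: the rate
# anneals from lr_max to lr_min within each cycle, then resets.
def cosine_restart_lr(step: int, cycle_len: int,
                      lr_max: float = 1e-4, lr_min: float = 1e-6) -> float:
    """Learning rate at a given step under a fixed-length restart cycle."""
    t = step % cycle_len
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / cycle_len))
```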
          <p>
            Metrics recorded during training include training / validation loss, top-1, top-3 accuracy, macro F1
score, and accuracy for correct identification of poisonous species. Calculations for specific track scores
were adapted from the FungiCLEF competition [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] for model benchmark. This includes classification
error (Track 1), cost for poisonousness confusion (Track 2), and user-specific cost (Track 3) [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ].
          </p>
        </sec>
        <sec id="sec-2-3-2">
          <title>2.3.2. Loss Function</title>
          <p>
            The baseline loss function for model development was unweighted, multi-class cross entropy loss. We
also explored incorporating class weights in cross-entropy loss, and other loss functions such as focal
loss [
            <xref ref-type="bibr" rid="ref36">36</xref>
            ] and seesaw loss [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], which was used by last year’s winner [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] to overcome class imbalance.
Additionally, we experimented with using various metadata such as the higher level taxonomy of the
fungi class and the toxicity of the fungi class as additional prediction targets.
          </p>
          <p>Our benchmark model was trained with a custom loss function:</p>
          <p>ℒ<sub>composite</sub> = ℒ<sub>seesaw</sub> + λ · ℒ<sub>poison</sub></p>
          <p>where ℒ<sub>seesaw</sub> is the seesaw loss of the class prediction, ℒ<sub>poison</sub> is the binary cross-entropy loss of
the model’s prediction of whether the fungus is poisonous, and λ is an adjustable weighting factor for the
composite loss function.</p>
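<p>A minimal sketch of this composite loss (the seesaw term is abstracted as a given class-loss value, since its full definition is beyond this sketch; function names are illustrative):</p>

```python
import math

def binary_cross_entropy(p: float, y: int) -> float:
    """BCE for a predicted poisonous-probability p and a label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def composite_loss(seesaw_loss: float, poison_prob: float,
                   is_poisonous: int, lam: float = 0.1) -> float:
    """L_composite = L_seesaw + lam * L_poison, as in the equation above."""
    return seesaw_loss + lam * binary_cross_entropy(poison_prob, is_poisonous)
```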
        </sec>
        <sec id="sec-2-3-3">
          <title>2.3.3. Weighted Sampling</title>
          <p>
            While a weighted sampler is usually used to overcome class imbalance [
            <xref ref-type="bibr" rid="ref37">37</xref>
            ], we utilized this technique
in our data loader to ameliorate the diference in class distribution between the training and validation
set. Instead of adjusting class weights such that each class is evenly represented, we derived the
per-sample weight by dividing the class frequency of the validation set over the training set:
 =
          </p>
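<p>This weighting scheme can be sketched as follows (function and variable names are illustrative, not the competition code):</p>

```python
from collections import Counter

# Weight each training sample by its class frequency in the validation set
# divided by its class frequency in the training set, so the sampler draws
# the training data to match the validation class distribution.
def sampling_weights(train_labels: list, val_labels: list) -> list[float]:
    train_freq = Counter(train_labels)
    val_freq = Counter(val_labels)
    n_train, n_val = len(train_labels), len(val_labels)
    return [
        (val_freq[c] / n_val) / (train_freq[c] / n_train)
        for c in train_labels
    ]
```

<p>Classes over-represented in the validation set relative to the training set receive weights above 1, so the sampler draws them more often.</p>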
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <sec id="sec-3-1">
        <title>3.1. Training Results</title>
        <p>
          Our best performing model was an ensemble model on DINOv2 embeddings consisting of two classifier
heads (180MB each) from the two-folds of cross-validation training. The model was trained on image
embeddings precomputed from DINOv2-large. The weighting λ for the poison loss was 0.1. The initial
learning rate was 1 × 10⁻⁴, with AdamW [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] as optimizer, and cosine learning rate scheduler with
warm restarts [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Leaderboard Results</title>
        <p>
          The results of our team’s experiments are outlined in Table 4. For our first submission during the competition,
we used a pre-trained MetaFormer model from the previous year’s competition as a baseline. In
post-competition evaluation, our best model achieved an accuracy of 78.4% and a macro F1 score of 0.577 on
the private test set. Our model’s performance was comparable to previous years’ winners [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], and was
the best performing model in this year’s competition in terms of Track 1, Track 3, and accuracy. Our
Track 2 and F1 scores ranked 2nd among the competitors<sup>1</sup>. The inference time
across the full public test set (40,216 images) was 25:26 minutes, averaging 0.126s per image on an
RTX 4090.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Discussion</title>
      <p>
        We initially experimented with vision models including EfficientNet [38], VisionTransformer [39],
and MetaFormer [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Due to training time and memory overhead, we opted to focus our efforts on
developing a lightweight classifier on precomputed embeddings instead.
      </p>
      <sec id="sec-4-1">
        <title>4.1. ResNet vs. DINOv2 as Vision Backbone for Embedding Generation</title>
        <p>
          Overall, while DINOv2 embeddings proved to be a good input for image classification, our embedding
model using ResNet embeddings did not perform well, with a best validation accuracy of 25%. This was
likely because DINOv2 is a class-agnostic, self-supervised model, whereas ResNet was trained on
ImageNet with specific classification targets. As such, the features extracted from ResNet would be
more tailored to its training dataset, whereas DINOv2 features were more representative of the underlying
image [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ]. To further investigate this, we visualized the embeddings with UMAP [40] in Figure 4,
which showed that ResNet embeddings did not separate well, whereas there was a clear separation in
the DINOv2 embeddings.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Incorporation of Metadata</title>
        <p>We experimented with using metadata as additional prediction targets, as seen in our ablation in Table
3. The inclusion of metadata appeared to provide some marginal benefits in validation accuracy and F1
score, echoing findings from previous research on this dataset, where incorporating metadata as input
contributed positively to model performance. However, these gains did not justify the additional
overhead required to tune the weighting of the various targets. As such, we did not utilize metadata in
our final model.</p>
        <p><sup>1</sup>Due to numerous issues with the HuggingFace platform, our best results were not recorded in the official competition. Our
post-competition evaluation was performed under the same constraints as the official competition. Post-competition results
were provided and verified by the organiser of FungiCLEF.</p>
        <p><sup>2</sup>Official competition results are from test submissions with an under-tuned vision model. These results are included for
completeness.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Future Work</title>
      <p>Whilst using embeddings allowed for much faster model development, there remains a
gap in performance between the embedding classifier and traditional image-based models. It
is likely that the information loss in the transformation of image to embeddings was too significant
for the simple classifier architecture to overcome. It would be interesting to further fine-tune DINOv2
on the DanishFungi dataset, and repeat our experiments. Moreover, a more rigorous incorporation of
metadata into our models could provide a more holistic understanding of the data, leading to more
accurate and reliable classification systems.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In summary, we addressed the complex task of fine-grained visual categorization (FGVC) for identifying
poisonous fungi using transfer learning and advanced deep learning methodologies. The Danish Fungi
2020 dataset presented significant challenges such as class imbalance, subtle inter-class variations,
and high intra-class variability, necessitating a comprehensive data preprocessing and augmentation
pipeline.</p>
      <p>Our experiments with various deep learning models, including vision transformers, convolutional
neural networks, and linear classifiers with embeddings, highlighted the potential of DINOv2
embeddings combined with a multi-layer perceptron. Integrating multimodal metadata further enhanced
classification performance, emphasizing the value of auxiliary information. Despite promising results,
embedding-based classifiers faced limitations due to potential information loss, suggesting the need for
fine-tuning self-supervised models on domain-specific datasets and improved metadata incorporation.
Overall, our research advances FGVC technical capabilities, providing valuable methodologies for
mycological safety and educational applications, and contributes to the broader field of fine-grained
classification tasks.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>We thank the DS@GT CLEF team for providing the development and research environment for our
machine learning experiments, as well as valuable comments and suggestions.</p>
      <p>[38] M. Tan, Q. V. Le, EfficientNetV2: Smaller models and faster training, CoRR abs/2104.00298 (2021).
URL: https://arxiv.org/abs/2104.00298. arXiv:2104.00298.
[39] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani,
M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, N. Houlsby, An image is worth 16x16 words:
Transformers for image recognition at scale, CoRR abs/2010.11929 (2020). URL: https://arxiv.org/abs/2010.11929.
arXiv:2010.11929.
[40] L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for
dimension reduction (2018). arXiv:1802.03426.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          , Overview of FungiCLEF 2024:
          <article-title>Revisiting fungi species recognition beyond 0-1 cost</article-title>
          ,
          <source>in: Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Espitalier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Estopinan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hrúz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          , et al.,
          <source>Overview of lifeclef</source>
          <year>2024</year>
          :
          <article-title>Challenges on species distribution prediction and identification</article-title>
          ,
          <source>in: International Conference of the CrossLanguage Evaluation Forum for European Languages</source>
          , Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Jeppesen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heilmann-Clausen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Laessøe</surname>
          </string-name>
          , T. Frøslev,
          <article-title>Danish fungi 2020 - not just another image recognition dataset</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1525</fpage>
          -
          <lpage>1535</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <article-title>Swin transformer: Hierarchical vision transformer using shifted windows (</article-title>
          <year>2021</year>
          ). arXiv:
          <volume>2103</volume>
          .
          <fpage>14030</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
<article-title>Metaformer is actually what you need for vision</article-title>
(
<year>2021</year>
). arXiv:2111.11418.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chamidullin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
,
<article-title>Overview of FungiCLEF 2023: Fungi recognition beyond 1/0 cost</article-title>
          ,
          <source>Proceedings of the Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum (CLEF)</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Heilmann-Clausen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
,
<article-title>Overview of FungiCLEF 2022: Fungi recognition as an open set classification problem</article-title>
          ,
          <source>in: CLEF 2022 Conference and Labs of the Evaluation Forum</source>
          , volume
          <volume>3180</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Meng</surname>
          </string-name>
,
<string-name>
  <given-names>T.</given-names>
  <surname>Zhang</surname>
</string-name>
,
          <article-title>Entropy-guided open-set fine-grained fungi recognition</article-title>
          ,
          <source>Proceedings of the Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum (CLEF)</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. C.</given-names>
            <surname>Loy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
<article-title>Seesaw loss for long-tailed instance segmentation</article-title>
(
<year>2020</year>
). arXiv:2008.10032.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Macêdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. I.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zanchettin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L. I.</given-names>
            <surname>Oliveira</surname>
          </string-name>
,
<string-name>
  <given-names>T.</given-names>
  <surname>Ludermir</surname>
</string-name>
,
<article-title>Entropic out-of-distribution detection: Seamless detection of unknown examples</article-title>
(
<year>2020</year>
). arXiv:2006.04005.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-W.</given-names>
            <surname>Xie</surname>
          </string-name>
,
<string-name>
  <given-names>J.</given-names>
  <surname>Wu</surname>
</string-name>
,
<article-title>Mask-CNN: Localizing parts and selecting descriptors for fine-grained bird species categorization</article-title>
(
<year>2016</year>
). arXiv:1605.06878.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Branson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Welinder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perona</surname>
          </string-name>
          ,
<string-name>
  <given-names>S.</given-names>
  <surname>Belongie</surname>
</string-name>
,
<article-title>The Caltech-UCSD Birds-200-2011 dataset</article-title>
,
<year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
,
<string-name>
  <given-names>G.</given-names>
  <surname>Gkioxari</surname>
</string-name>
,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
<article-title>Mask R-CNN</article-title>
(
<year>2017</year>
). arXiv:1703.06870.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wharton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bessis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Behera</surname>
          </string-name>
          ,
<article-title>SR-GNN: Spatial relation-aware graph neural network for fine-grained image categorization</article-title>
(
<year>2022</year>
). arXiv:2209.02109v1.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>W.</given-names>
            <surname>McKinney</surname>
          </string-name>
          ,
          <article-title>Data structures for statistical computing in python</article-title>
          ,
<source>in: Proceedings of the 9th Python in Science Conference (SciPy 2010)</source>
,
<year>2010</year>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shaposhnik</surname>
          </string-name>
          ,
          <article-title>Understanding how dimension reduction tools work: An empirical approach to deciphering t-sne, umap, trimap, and pacmap for data visualization</article-title>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
,
<string-name>
  <given-names>É.</given-names>
  <surname>Duchesnay</surname>
</string-name>
,
          <article-title>Scikit-learn: Machine learning in python</article-title>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
<string-name>
  <surname>Apache Spark Developers</surname>
</string-name>
,
<article-title>PySpark: Python API for Apache Spark</article-title>
,
<year>2024</year>
. URL: https://spark.apache.org/docs/latest/api/python/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>W.</given-names>
            <surname>McKinney</surname>
          </string-name>
          ,
<article-title>PyArrow: Python API for Apache Arrow</article-title>
,
<year>2024</year>
. URL: https://arrow.apache.org/docs/latest/api/python/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>E.</given-names>
            <surname>Bernhardsson</surname>
          </string-name>
          , E. Freider,
          <article-title>Luigi: A python package for building complex pipelines of batch jobs</article-title>
          ,
          <year>2024</year>
          . URL: https://luigi.readthedocs.io.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
,
<string-name>
  <given-names>G.</given-names>
  <surname>Chanan</surname>
</string-name>
,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desmaison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>DeVito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tejani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chilamkurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chintala</surname>
          </string-name>
          ,
<article-title>PyTorch: An imperative style, high-performance deep learning library</article-title>
(
<year>2019</year>
). arXiv:1912.01703.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>R.</given-names>
            <surname>Wightman</surname>
          </string-name>
,
<article-title>timm: PyTorch image models</article-title>
,
<year>2019</year>
. URL: https://github.com/rwightman/pytorch-image-models.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>W.</given-names>
            <surname>Falcon</surname>
          </string-name>
,
<string-name>
  <surname>The PyTorch Lightning team</surname>
</string-name>
,
<article-title>PyTorch Lightning</article-title>
,
<year>2024</year>
. URL: https://lightning.ai/docs/pytorch/stable/.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delangue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Moi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cistac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rault</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Louf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Funtowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Davison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shleifer</surname>
          </string-name>
,
<string-name>
  <given-names>P.</given-names>
  <surname>von Platen</surname>
</string-name>
,
<string-name>
  <given-names>C.</given-names>
  <surname>Ma</surname>
</string-name>
,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jernite</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Plu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. L.</given-names>
            <surname>Scao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gugger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Drame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Lhoest</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Rush</surname>
          </string-name>
,
<article-title>Transformers: State-of-the-art natural language processing</article-title>
          ,
          <source>in: 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heilmann-Clausen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Jeppesen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Laessøe</surname>
          </string-name>
,
<string-name>
  <given-names>T.</given-names>
  <surname>Frøslev</surname>
</string-name>
,
          <article-title>Danish fungi 2020 - not just another image recognition dataset</article-title>
          ,
          <year>2021</year>
. arXiv:2103.10107.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
[26]
<string-name>
  <given-names>I.</given-names>
  <surname>London</surname>
</string-name>
,
<article-title>Encoding cyclical continuous features - 24-hour time</article-title>
,
<year>2016</year>
. URL: https://ianlondon.github.io/blog/encoding-cyclical-features-24hour-time/.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Suwardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Satya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Lestari</surname>
          </string-name>
          ,
          <article-title>Geohash index based spatial data model for corporate</article-title>
          ,
          <source>in: 2015 International Conference on Electrical Engineering and Informatics (ICEEI)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>478</fpage>
          -
          <lpage>483</lpage>
. doi:10.1109/ICEEI.2015.7352548.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Oquab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darcet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Moutakanni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Vo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Szafraniec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Khalidov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Haziza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Nouby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Assran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ballas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Galuba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Howes</surname>
          </string-name>
,
<string-name>
  <given-names>P.-Y.</given-names>
  <surname>Huang</surname>
</string-name>
,
          <string-name>
            <given-names>S.-W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
<string-name>
  <given-names>I.</given-names>
  <surname>Misra</surname>
</string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rabbat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharma</surname>
          </string-name>
,
<string-name>
  <given-names>G.</given-names>
  <surname>Synnaeve</surname>
</string-name>
,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jegou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mairal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Labatut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
,
<string-name>
  <given-names>P.</given-names>
  <surname>Bojanowski</surname>
</string-name>
,
<article-title>DINOv2: Learning robust visual features without supervision</article-title>
          ,
          <year>2024</year>
. arXiv:2304.07193.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
,
<string-name>
  <given-names>S.</given-names>
  <surname>Ren</surname>
</string-name>
,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
<article-title>Deep residual learning for image recognition</article-title>
(
<year>2015</year>
). arXiv:1512.03385.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Z.-Y.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
,
<string-name>
  <given-names>L.</given-names>
  <surname>Yuan</surname>
</string-name>
,
          <string-name>
            <given-names>N.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
<article-title>An empirical study of training end-to-end vision-and-language transformers</article-title>
(
<year>2021</year>
). arXiv:2111.02387.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Diao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <article-title>MetaFormer: A unified meta framework for fine-grained recognition (</article-title>
          <year>2022</year>
          ). arXiv:
          <volume>2203</volume>
          .
          <fpage>02751</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Ganaie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tanveer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. N.</given-names>
            <surname>Suganthan</surname>
          </string-name>
          ,
          <article-title>Ensemble deep learning: A review (</article-title>
          <year>2021</year>
          ). arXiv:
          <volume>2104</volume>
          .
          <fpage>02395</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Decoupled weight decay regularization (</article-title>
          <year>2017</year>
          ). arXiv:
          <volume>1711</volume>
          .
          <fpage>05101</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>SGDR: Stochastic gradient descent with warm restarts (</article-title>
          <year>2017</year>
          ). arXiv:
          <volume>1608</volume>
          .
          <fpage>03983</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Kababji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bensaali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Dakua</surname>
          </string-name>
          ,
          <article-title>Scheduling techniques for liver segmentation: ReduceLROnPlateau vs OneCycleLR (</article-title>
          <year>2022</year>
          ). arXiv:
          <volume>2202</volume>
          .
          <fpage>06373</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>T.-Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <article-title>Focal loss for dense object detection (</article-title>
          <year>2017</year>
          ). arXiv:
          <volume>1708</volume>
          .
          <fpage>02002</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>A new weighted sampling method to handle class imbalance</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>