<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Few-Shot Fungi Classification with Prototypical Networks Using Multiple Pretrained Embedding Models</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jack N. Etheredge</string-name>
          <email>jack.etheredge@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Embedding, FungiCLEF, Fungi Classification, Few-shot, FungiTastic</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Twosense</institution>
          ,
          <addr-line>New York, New York</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The FungiCLEF 2025 challenge encourages improvement on few-shot fine-grained classification at large scale (2,427 classes). This represents a real-world application with a dataset of rare species from the Atlas of Danish Fungi. In this paper, we present our approach to the challenge, which aims to classify images of fungi given few example images per species. This method utilizes the pretrained embedding models DINOv2, BEiT, and SAM. Simple image augmentations are applied at both train and test time. Embeddings from each model are concatenated into a single embedding along the feature dimension per augmented version of the image. A simple projection network is trained to improve the discriminative performance of the embeddings on the training samples. Cosine similarity between the class centroid and the observation centroid is used for class prediction, as in Prototypical Networks. Finally, an ensemble of these pipelines is utilized to further boost performance. Image augmentation is shown to be the largest contributor to the performance of the solution, followed by learning an embedding projection and utilizing multiple embedding models. Our method secured 1st place in the FungiCLEF 2025 competition on the private leaderboard. Code is available at https://github.com/Jack-Etheredge/fungiclef2025.</p>
      </abstract>
      <kwd-group>
        <kwd>Embedding</kwd>
        <kwd>FungiCLEF</kwd>
        <kwd>Fungi Classification</kwd>
        <kwd>Few-shot</kwd>
        <kwd>FungiTastic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Few-shot image classification is commonly benchmarked with small episodes (e.g.,
5-way, 5-shot). ImageNet [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] (21,841 classes, but most commonly used with 1,000 classes), Omniglot [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]
(1,623 classes), Meta-Dataset [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] (which comprises 10 datasets including ImageNet and Omniglot), and
iNaturalist [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] with 5,089 classes are among the few datasets commonly used
for large-scale few-shot image classification with more than 1,000 classes.
      </p>
      <p>
        Last year’s FungiCLEF 2024 challenge [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] focused on open-set recognition and minimizing confusion
between poisonous and edible species. The average number of training and validation images per class
was comparatively much greater, with 1,604 known species and 1,629 unknown species represented
across a combined 222,191 observations with 387,169 total instances. The training set for FungiCLEF
2024 was from Danish Fungi 2020 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], while the validation set was collected from 2022.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>
          The FungiCLEF 2025 challenge [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] tasked participants with classifying fungi species from images. The
dataset is created from images and metadata submitted to the Atlas of Danish Fungi before the end of
2023. Each species label was assigned by mycologists. The challenge dataset is drawn from the few-shot
dataset from [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], which describes the dataset in depth.
        </p>
        <p>An observation refers to a real-world occurrence of fungi, which may include, but is not limited to,
an individual mushroom, a cluster of mushrooms, or mold growing on a surface, either in a natural
environment or as a collected sample. Each observation comprises one or more instances. An instance is
an individual data point associated with an observation and consists of an image, its associated metadata,
and a generated caption. For example, an individual mushroom may constitute an observation, but
multiple images of this mushroom might be captured from different angles. Each of these images (along
with its metadata and caption) would represent a distinct instance linked to the same observation. The
solution proposed in this paper only utilizes the images, since initial experiments with captions and
metadata were not promising (data not shown).</p>
        <p>The dataset contained 2,427 classes with 5,392 observations comprising 10,104 instances between
the training and validation sets. Most classes have a single observation, most observations have a
single instance, and all classes have fewer than 5 observations. Combining the training and validation sets
into a single dataset, the largest class by instance count has 39 instances. Though not as extreme as the
parent FungiTastic dataset, the challenge dataset still exhibits severe class imbalance and a
long-tailed distribution.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Competition objective and evaluation metrics</title>
        <p>The objective of FungiCLEF 2025 was to achieve the best average performance predicting the class of
each test observation given one or more instances per observation. The public and private leaderboards
for the competition both used average recall at rank $k = 5$ (recall@5), which we refer to as Top-5
accuracy or simply Top-5 hereafter.</p>
        <p>For each test observation $o$, let $y_o$ denote its true class label, and $\{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_5\}$ be the top 5 predicted
classes. The recall@5 for observation $o$ is defined as:</p>
        <p>$$r_o = \mathbf{1}\big[\, y_o \in \{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_5\} \,\big], \tag{1}$$</p>
        <p>i.e., 1 if the true class appears among the top 5 predictions and 0 otherwise. The average recall@5 over the entire test set $T$ is then computed as:</p>
        <p>$$\mathrm{recall@5} = \frac{1}{|T|} \sum_{o \in T} r_o. \tag{2}$$</p>
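        <p>For concreteness, the following minimal Python sketch (an illustrative assumption, not the official evaluation code) computes Equations (1) and (2):</p>
        <preformat>
# Minimal sketch of recall@5 (Eq. 1) and its average over a test set (Eq. 2).
def recall_at_5(true_label, predicted_classes):
    # 1 if the true class appears among the top 5 predictions, else 0.
    return 1.0 if true_label in predicted_classes[:5] else 0.0

def mean_recall_at_5(true_labels, predictions):
    # Average per-observation recall@5 over the whole test set.
    scores = [recall_at_5(y, preds) for y, preds in zip(true_labels, predictions)]
    return sum(scores) / len(scores)
        </preformat>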
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Overall solution architecture</title>
        <p>The overall solution is illustrated in Figure 1. Figure 1A shows that training and test time differ only by
the level of hierarchy that the embeddings are averaged to. For creating the prototype embeddings, all
augmented versions of images belonging to each class are averaged together. Since the competition
expects observation-level predictions, all instances belonging to each test observation and all
augmentations of the instance images are averaged into a single embedding. For training of the projection
network, all the augmented versions of the training images are used with their class labels as targets.
Predictions are made by calculating the cosine similarity between the class-level prototype embeddings
and the observation-level test embeddings. Hereafter, this series of functions to transform a collection
of training images into prototype embeddings and test images into observation embeddings to produce
class-wise cosine similarities through the use of a specific combination of image augmentations, frozen
embedding models, and a trained projection network will be referred to as an embedding pipeline.</p>
        <p>Let $f_\theta$ denote the embedding function of a pipeline, which concatenates the frozen embedding model
outputs along the embedding dimension, followed by a projection via a multilayer perceptron
parameterized by $\theta$. Let $S_c$ denote the support set of (augmented) training images for class $c$. The class
prototype $\mathbf{p}_c \in \mathbb{R}^d$ is defined as:</p>
        <p>$$\mathbf{p}_c = \frac{1}{|S_c|} \sum_{x \in S_c} f_\theta(x) \tag{3}$$</p>
        <p>Each observation $o$ consists of a set of images $\mathcal{I}_o = \{x_1, x_2, \ldots, x_n\}$, and each image $x_i$ has a set of
augmentations $A(x_i) = \{x_i^{(0)}, x_i^{(1)}, \ldots, x_i^{(m_i)}\}$, where $x_i^{(0)}$ is the original image. Let $N_o$ denote the total
number of augmented images in the observation:</p>
        <p>$$N_o = \sum_{x_i \in \mathcal{I}_o} |A(x_i)| \tag{4}$$</p>
        <p>The final observation embedding $\mathbf{z}_o \in \mathbb{R}^d$ is computed as the mean of all augmented instance image
embeddings belonging to the observation:</p>
        <p>$$\mathbf{z}_o = \frac{1}{N_o} \sum_{x_i \in \mathcal{I}_o} \sum_{x_i^{(j)} \in A(x_i)} f_\theta\big(x_i^{(j)}\big) \tag{5}$$</p>
        <p>Per embedding pipeline, the embeddings used for the prototype embeddings and the test observation
embeddings were projected using the same trained projection network. An ensemble of 5 embedding
pipelines was used to generate the final predictions. These embedding pipelines differed only by the
training-validation split and initialization of the projection network. The validation portion was used
for early stopping during training of the projection network. The softmax probabilities over the classes
for each model were generated from the cosine similarities between each test observation and the
prototype embedding for each class. The ensemble average softmax probability of the cosine similarities
was used to rank the classes per observation. The top 10 classes were returned per observation, as was
expected of the participants. Only the first 5 of these 10 classes factored into the leaderboard ranking,
however, since the competition evaluation metric was the recall at top-5.</p>
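        <p>The following PyTorch sketch illustrates Equations (3)-(5) and the cosine-similarity scoring; the function and variable names are illustrative assumptions rather than the author's released implementation:</p>
        <preformat>
# Sketch of prototype construction (Eq. 3), observation embeddings (Eqs. 4-5),
# and class scoring by cosine similarity, assuming all augmented images have
# already been passed through the frozen backbones and the projection network.
import torch
import torch.nn.functional as F

def class_prototypes(embeddings, labels, num_classes):
    # embeddings: [num_augmented_support_images, d]; labels: [num_augmented_support_images]
    protos = torch.zeros(num_classes, embeddings.size(1))
    for c in range(num_classes):
        protos[c] = embeddings[labels.eq(c)].mean(dim=0)  # Eq. (3)
    return protos

def observation_embedding(instance_embeddings):
    # instance_embeddings: [num_augmented_images_in_observation, d]
    return instance_embeddings.mean(dim=0)                # Eqs. (4)-(5)

def class_similarities(observation_emb, protos):
    # Cosine similarity between one observation embedding and every class prototype.
    return F.cosine_similarity(observation_emb.unsqueeze(0), protos, dim=1)
        </preformat>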
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Image augmentation</title>
        <p>The same image augmentations were applied to the training samples and as test-time
augmentations. This was done both for simplicity and to maximize agreement between the prototypes and
the test embeddings. Only geometric augmentations were used in the winning solution. The specific
augmentations utilized were: 80% center crop, 80% top left crop, 80% top right crop, 80% bottom left
crop, 80% bottom right crop, horizontal flip, 90-degree rotation, 270-degree rotation, 15-degree rotation,
and 345-degree rotation.</p>
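        <p>A minimal sketch of these ten geometric augmentations using torchvision functional transforms is shown below; the crop coordinates and the choice of functional transforms are assumptions for illustration, not the exact implementation. The original image is kept alongside the augmented views, matching the notation in Section 3.3:</p>
        <preformat>
# Sketch of the ten geometric augmentations (plus the original image).
import torchvision.transforms.functional as TF
from PIL import Image

def augment(img: Image.Image) -> list:
    w, h = img.size
    cw, ch = int(w * 0.8), int(h * 0.8)            # 80% crop size (assumed per side)
    return [
        img,                                        # original image
        TF.center_crop(img, [ch, cw]),              # 80% center crop
        TF.crop(img, 0, 0, ch, cw),                 # 80% top-left crop
        TF.crop(img, 0, w - cw, ch, cw),            # 80% top-right crop
        TF.crop(img, h - ch, 0, ch, cw),            # 80% bottom-left crop
        TF.crop(img, h - ch, w - cw, ch, cw),       # 80% bottom-right crop
        TF.hflip(img),                              # horizontal flip
        TF.rotate(img, 90),                         # 90-degree rotation
        TF.rotate(img, 270),                        # 270-degree rotation
        TF.rotate(img, 15),                         # 15-degree rotation
        TF.rotate(img, 345),                        # 345-degree rotation
    ]
        </preformat>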
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Embedding models</title>
        <p>
          All experiments were performed on a machine with a single NVIDIA RTX 3090 graphics card and all
models were trained using PyTorch [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Embeddings were generated from augmented images using
pretrained models. A simple two-layer network was trained to project the embeddings from these
models into a new embedding space as described in the following section, but the pretrained models were
not fine-tuned.
        </p>
        <p>
          For all models, after the geometric augmentations were performed, the augmented image was resized
with bicubic interpolation to 1.14x the final image size used for that model and then center cropped to
the final image size. 1.14 was taken from the widely adopted practice of resizing to 256 before taking a
square crop of 224. This is common in ImageNet [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] pre-processing and can be seen in AlexNet [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>The final image sizes used are as follows:
- BEiT-Base/p16: 384 × 384
- DINOv2-Base: 434 × 434
- DINOv2-Large: 518 × 518
- SAM-ViT-Huge: 1024 × 1024</p>
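        <p>A minimal sketch of this pre-processing is given below (an assumption for illustration; the exact resize policy, e.g., shorter-side versus both-sides resizing, may differ from the author's code):</p>
        <preformat>
# Sketch: resize with bicubic interpolation to 1.14x the final size, then center crop.
from PIL import Image
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def preprocess(img: Image.Image, final_size: int) -> Image.Image:
    resize_size = int(round(final_size * 1.14))  # e.g., 438 for a 384 crop
    img = TF.resize(img, resize_size, interpolation=InterpolationMode.BICUBIC)  # shorter side
    return TF.center_crop(img, final_size)

# Example: preprocess(img, 384) for BEiT-Base/p16, preprocess(img, 1024) for SAM-ViT-Huge.
        </preformat>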
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Projection network training</title>
        <p>
          A two-layer network was trained to project the concatenated embeddings into a new embedding that
better discriminated between the classes. Using the labels for each augmented image per instance, the
network was trained using PyTorch to project the concatenated embeddings into an embedding with
dimensionality of 768. The model consists of an input layer mapped to a hidden layer with dimensionality
2048, followed by an output layer with dimensionality 768. Both layers are fully connected, with ReLU
activation after the first layer. A batch size of 64 was used. The AdamW optimizer [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] was used
with a learning rate of 1e-4 and a weight decay of 1e-4. Early stopping was used with a patience of 5
along with a random validation split. Training was stopped when the projection model validation loss
did not improve for 5 consecutive epochs and the weights with the best validation loss were restored.
The model was trained with cross-entropy and InfoNCE [16] losses, with a temperature of 0.07. The InfoNCE
implementation from [17] was used. The per-class probability for cross-entropy was determined based
on the softmax of the cosine similarity. The balance between the cross-entropy and InfoNCE losses
was determined through two additional learned loss weighting parameters, as in [18]. Across 5 random
seeds, we report the final learned weights immediately before early stopping was triggered, as well
as the range of values both weights explored during training (Table 1). These results indicate that
while both weights are learned dynamically, they converge to stable values with modest variation
across seeds. We observe a consistent upward trend in the InfoNCE weight over training, while the
cross-entropy weight first decreases and then increases again over the course of training. The mean
projection network wall-clock training time for 5 seeds was 294 seconds. For an ensemble, this scales
linearly with the number of pipelines.
        </p>
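        <p>The following PyTorch sketch outlines the projection network and the combined objective; the exact InfoNCE formulation, the form of the learned loss weighting, and all names are illustrative assumptions consistent with the description above and with [16, 17, 18], not the released code:</p>
        <preformat>
# Sketch of the two-layer projection network and the combined cross-entropy +
# InfoNCE objective with learned uncertainty-based loss weights (as in [18]).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectionNet(nn.Module):
    def __init__(self, in_dim, hidden_dim=2048, out_dim=768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, out_dim),
        )
        # Two learned parameters that weight the two losses (uncertainty weighting).
        self.log_vars = nn.Parameter(torch.zeros(2))

    def forward(self, x):
        return self.net(x)

def supervised_info_nce(z, labels, temperature=0.07):
    # Supervised InfoNCE over a batch: embeddings sharing a label are positives.
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).logical_and(self_mask.logical_not())
    sim = sim.masked_fill(self_mask, float("-inf"))       # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(pos_mask.logical_not(), 0.0)
    per_row = pos_log_prob.sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1)
    return -per_row[pos_mask.any(dim=1)].mean()           # rows with at least one positive

def combined_loss(model, projected, class_logits, labels):
    # class_logits are cosine similarities to the class prototypes.
    ce = F.cross_entropy(class_logits, labels)
    nce = supervised_info_nce(projected, labels)
    s = model.log_vars
    # Uncertainty weighting from [18]: exp(-s_i) * L_i + s_i for each loss term.
    return torch.exp(-s[0]) * ce + torch.exp(-s[1]) * nce + s[0] + s[1]
        </preformat>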
      </sec>
      <sec id="sec-3-7">
        <title>3.7. Embedding pipeline ensemble</title>
        <p>Multiple embedding pipelines are combined into an ensemble for the final predictions. For each
embedding pipeline in the ensemble, the softmax of the cosine similarities between the prototype
embedding for each class and the test embedding was calculated. The softmax probabilities per
embedding pipeline in the ensemble were then averaged to get the final class probabilities. The mean
inference wall-clock time for 5 seeds was 7.83 milliseconds per observation. For an ensemble, this scales
linearly with the number of pipelines unless inference is performed in parallel.</p>
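        <p>A minimal sketch of the ensembling step is shown below; the function name and tensor shapes are illustrative assumptions:</p>
        <preformat>
# Sketch: per-pipeline softmax over class-wise cosine similarities, averaged
# across pipelines, then the top-k classes are returned per observation.
import torch

def ensemble_predict(similarities_per_pipeline, k=10):
    # similarities_per_pipeline: list of [num_observations, num_classes] tensors.
    probs = [torch.softmax(sims, dim=1) for sims in similarities_per_pipeline]
    mean_probs = torch.stack(probs, dim=0).mean(dim=0)
    return mean_probs.topk(k, dim=1).indices  # ranked class indices per observation
        </preformat>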
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation performance and ablation studies</title>
      <p>The solution described in this study achieved 1st place on the private leaderboard for FungiCLEF 2025.
This section details the results of ablations for the various components of the solution described in the
previous section. For ablation experiments, models were trained using a split of the official training
set into new training and validation subsets (used for early stopping), and evaluated on the official
validation set (treated as a test set). Unless explicitly stated otherwise (e.g., Table 8 showing the private
leaderboard performance for the top teams), all results are reported on this official validation set. The
baseline for each of these ablations is a single embedding pipeline (instead of the final ensemble)
with the same seed for the training-validation split and projection network initialization. All ablation
experiments use deterministic seeding as described in Section 4.4.</p>
      <sec id="sec-4-1">
        <title>4.1. Image augmentation</title>
        <p>As shown in Table 2 and Table 3, the inclusion of train and test time augmentations is the largest
contributor to the performance of this solution. The inclusion of train time augmentations without test
time augmentations results in a Top-5 accuracy reduction of 26.4 percentage points while the removal
of both train time and test time augmentations results in a reduction of 27.1 percentage points.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Learned Projection</title>
        <p>Learning a projection of the concatenated embeddings improves model performance as shown in
Table 4. The projection networks utilized by our top-ranking solution were trained with a combination
of cross-entropy and InfoNCE losses. Table 5 shows that this combined loss outperforms either loss
alone.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Combining multiple embedding models</title>
        <p>Both combining models at the feature level and ensembling predictions from multiple learned
projections of those embeddings improve performance. Table 6 shows that concatenating the embeddings
from multiple pretrained models outperforms using the embedding from a single pretrained model.
DINOv2-Large proves to be a particularly strong performer as a single model. Conversely, SAM-ViT-H
performs quite poorly without the context of the other embedding models. It appears that SAM-ViT-H
can be removed from the embedding model combination to decrease the computational demands of the
solution without degrading performance.</p>
        <p>Table 7 shows that an ensemble of embedding pipelines outperforms a single embedding pipeline. As
previously described, each member of the ensemble differed only by the training-validation split used
to train the projection model and the random initialization of the projection model. For this ensemble,
the seed for the training-validation split and the projection network initialization were different for
each member of the ensemble, since otherwise predictions from the ensemble would be identical to that
of a single pipeline.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Seeding and Replicability</title>
        <p>To ensure reproducibility and statistical robustness, all ablation experiments used deterministic seeding.
Each configuration was run with 5 independent replicates, and we report mean ± standard deviation.</p>
        <p>Seeds were computed hierarchically as</p>
        <p>$$s_{r,m} = 1000 \cdot r + m, \tag{6}$$</p>
        <p>where $r \in \{0, 1, 2, 3, 4\}$ indexes the experimental replicate and $m \in \{0, 1, \ldots, M-1\}$ indexes the ensemble
member ($m = 0$ for single pipelines, and $M$ is the ensemble size). This structure ensures non-overlapping
seeds across replicates and ensemble members while maintaining reproducibility.</p>
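        <p>Equation (6) corresponds to the following one-line helper (an illustrative sketch):</p>
        <preformat>
# Sketch of the hierarchical seeding scheme in Eq. (6).
def seed(replicate, member):
    # Non-overlapping as long as the ensemble has fewer than 1000 members.
    return 1000 * replicate + member

seeds = [seed(r, m) for r in range(5) for m in range(5)]  # 5 replicates x 5-member ensemble
        </preformat>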
        <p>For single pipelines, the same seed was used for both the training-validation split and the projection
model initialization. In ensembles, each member differed only by its corresponding seed, ensuring
diversity through variation in both data splits and projection model initializations.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Leaderboard performance</title>
        <p>Private leaderboard performance for the top 10 ranking teams is shown in Table 8. Our models achieved
the best performance for the competition metric (Top-5 accuracy).</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>Simple methods are sufficient to achieve state-of-the-art performance for few-shot classification of fungi
from image data. In this study, we described our winning approach for the FungiCLEF 2025 challenge.
Using pretrained image classification and feature extraction networks, embeddings can be cached and
subsequently used to train lightweight projection networks. These networks can be ensembled to
further boost performance. Concatenation of embeddings from multiple frozen embedding models
and averaging embeddings from multiple image augmentations perform well despite their simplicity.
Importantly, we show that test-time augmentation is critical to the performance of this method.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author used Anthropic Claude Sonnet 4 in order to: paraphrase
and reword. After using this tool/service, the author reviewed and edited the content as needed and
takes full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Janouskova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          , Overview of FungiCLEF 2025:
          <article-title>Few-shot classification with rare fungi species</article-title>
          ,
          <source>in: Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Janoušková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Čermák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Papafitsoros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Cañas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Martellucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vinatier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          , Overview of lifeclef 2025:
          <article-title>Challenges on species presence prediction and identification, and individual animal identification</article-title>
          ,
          <source>in: International Conference of the Cross-Language Evaluation Forum for European Languages (CLEF)</source>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Janouskova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cermak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <article-title>Fungitastic: A multi-modal dataset and benchmark for image categorization</article-title>
          ,
          <year>2025</year>
          . URL: https://arxiv.org/abs/2408.13632. arXiv:2408.13632.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <article-title>ImageNet: A large-scale hierarchical image database</article-title>
          ,
          <source>in: 2009 IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          . URL: https://ieeexplore.ieee.org/document/5206848. doi:10.1109/CVPR.2009.5206848, ISSN: 1063-6919.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Lake</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Tenenbaum</surname>
          </string-name>
          ,
          <article-title>Human-level concept learning through probabilistic program induction</article-title>
          ,
          <source>Science</source>
          <volume>350</volume>
          (
          <year>2015</year>
          )
          <fpage>1332</fpage>
          -
          <lpage>1338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Triantafillou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dumoulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lamblin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Evci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Goroshin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gelada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Swersky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.-A.</given-names>
            <surname>Manzagol</surname>
          </string-name>
          , H. Larochelle,
          <article-title>Meta-dataset: A dataset of datasets for learning to learn from few examples</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/1903.03096. arXiv:1903.03096.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Van Horn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. Mac</given-names>
            <surname>Aodha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shepard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Perona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <article-title>The inaturalist species classification and detection dataset</article-title>
          ,
          <source>in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          , Overview of FungiCLEF 2024:
          <article-title>Revisiting fungi species recognition beyond 0-1 cost</article-title>
          ,
          <source>in: Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Heilmann-Clausen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Jeppesen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Laessøe</surname>
          </string-name>
          , T. Frøslev, Danish Fungi 2020 -
          <article-title>Not Just Another Image Recognition Dataset</article-title>
          ,
          <source>in: 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3281</fpage>
          -
          <lpage>3291</lpage>
          . URL: http://arxiv.org/abs/2103.10107. doi:10.1109/WACV51458.2022.00334, arXiv:2103.10107 [cs, eess].
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Piao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          , BEiT:
          <article-title>BERT pre-training of image transformers</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2106.08254. arXiv:2106.08254.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Oquab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darcet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Moutakanni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Vo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Szafraniec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Khalidov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Haziza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Nouby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Howes</surname>
          </string-name>
          , P.-Y. Huang,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Galuba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rabbat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Assran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ballas</surname>
          </string-name>
          , G. Synnaeve, I. Misra,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jegou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mairal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Labatut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , P. Bojanowski,
          <article-title>Dinov2: Learning robust visual features without supervision</article-title>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kirillov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mintun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ravi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rolland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Gustafson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Whitehead</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Berg</surname>
          </string-name>
          , W.-Y. Lo,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollár</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          , Segment anything,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2304.02643. arXiv:2304.02643.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          , G. Chanan,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desmaison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>DeVito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tejani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chilamkurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          , S. Chintala,
          <article-title>PyTorch: An Imperative Style, High-Performance Deep Learning Library</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          , volume
          <volume>32</volume>
          ,
          Curran Associates, Inc.,
          <year>2019</year>
          . URL: https://papers.nips.cc/paper_files/paper/2019/hash/bdbca288fee7f92f2bfa9f7012727740-Abstract.html.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Imagenet classification with deep convolutional neural networks</article-title>
          , in: F. Pereira,
          <string-name>
            <given-names>C.</given-names>
            <surname>Burges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Weinberger</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>25</volume>
          ,
          <string-name>
            <surname>Curran</surname>
            <given-names>Associates</given-names>
          </string-name>
          , Inc.,
          <year>2012</year>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          , Decoupled Weight Decay Regularization,
          <year>2019</year>
          . URL: http://arxiv.org/abs/1711.05101. doi:10.48550/arXiv.1711.05101, arXiv:1711.05101 [cs, math].
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] A. van den Oord, Y. Li, O. Vinyals, Representation learning with contrastive predictive coding, 2019. URL: https://arxiv.org/abs/1807.03748. arXiv:1807.03748.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] K. Musgrave, S. J. Belongie, S.-N. Lim, PyTorch Metric Learning, ArXiv abs/2008.09164 (2020).</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] A. Kendall, Y. Gal, R. Cipolla, Multi-task learning using uncertainty to weigh losses for scene geometry and semantics, 2018. URL: https://arxiv.org/abs/1705.07115. arXiv:1705.07115.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>