<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transfer Learning and Mixup for Fine-Grained Few-Shot Fungi Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jason Kahei Tam</string-name>
          <email>jtam30@gatech.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Murilo Gustineli</string-name>
          <email>murilogustineli@gatech.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anthony Miyaguchi</string-name>
          <email>acmiyaguchi@gatech.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Georgia Institute of Technology</institution>
          ,
          <addr-line>North Ave NW, Atlanta, GA 30332</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Accurate identification of fungi species presents a unique challenge in computer vision due to fine-grained inter-species variation and high intra-species variation. This paper presents our approach for the FungiCLEF 2025 competition, which focuses on few-shot fine-grained visual categorization (FGVC) using the FungiTastic Few-Shot dataset. Our team (DS@GT) experimented with multiple vision transformer models, data augmentation, weighted sampling, and incorporating textual information. We also explored generative AI models for zero-shot classification using structured prompting but found them to significantly underperform relative to vision-based models. Our final model outperformed both competition baselines and highlighted the effectiveness of domain-specific pretraining and balanced sampling strategies. Our approach ranked 35/74 on the private test set in post-competition evaluation, which suggests additional work can be done on metadata selection and domain-adapted multi-modal learning. Our code is available at https://github.com/dsgt-arc/fungiclef-2025.</p>
      </abstract>
      <kwd-group>
<kwd>LifeCLEF</kwd>
        <kwd>FungiCLEF</kwd>
        <kwd>Fine-Grained Visual Categorization (FGVC)</kwd>
        <kwd>Vision Transformers</kwd>
        <kwd>fungi</kwd>
        <kwd>species identification</kwd>
        <kwd>machine learning</kwd>
        <kwd>computer vision</kwd>
        <kwd>CEUR-WS</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Dataset Overview</title>
        <p>
          The dataset provided for the competition is the few-shot subset of the FungiTastic dataset, a collection
of fungal records continuously collected over a twenty-year span [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Each observation in the dataset
contains associated images, metadata, and a caption generated by the Molmo [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] vision language model (VLM).
The metadata contains information such as date, location, substrate, and full taxonomic ranks. It is
important to note that the task is to classify images based on category_id, which has a slightly different
count than species. The training dataset contains 7,819 images with 2,413 unique species and 2,427
unique category_id values. The validation dataset contains 2,285 images with 569 species and 570 unique
category_id values. The test dataset contains 1,911 images with no taxonomic ranks. The provided image
dataset contains the training, validation, and testing sub-datasets. Each sub-dataset contains images at
different maximum pixel sizes, ranging from 300p to full-size images.
        </p>
        <p>The datasets do not have the same category_id distribution (Figure 3). In the chart, the category_id is
mapped to class ID and then sorted by frequency, with category_id 2383 appearing most frequently.
Both datasets exhibit class imbalance, with the most common class having approximately 30 images
and multiple classes having only 1 image.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Previous work by the DS@GT group for FungiCLEF 2024 demonstrated the strong performance of
DINOv2 vision transformers in image classification [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Last year’s winner, Team IES, combined
image embeddings from Swin Transformer V2 [
        <xref ref-type="bibr" rid="ref8">8</xref>
          ] with metadata features from a multi-layer perceptron
for species classification [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>
        Our benchmark approach uses PlantCLEF 2024 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] embeddings, weighted sampling [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and Mixup
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. We also explored off-the-shelf generative AI models, a multi-modal approach combining text
embeddings with image embeddings, and a multi-objective loss. The competition evaluation metric
is top-k accuracy, with k = 5:
      </p>
      <p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>\text{Top-}k\ \text{Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[\, y_i \in \hat{Y}_i^{(k)} \,\right]</tex-math>
        </disp-formula>
        where N is the number of test observations, y<sub>i</sub> is the ground-truth label of observation i, and Ŷ<sub>i</sub><sup>(k)</sup> is the set of the k highest-ranked predicted labels for observation i.
      </p>
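      <p>As a reference point for how this metric behaves, the following is a minimal sketch of a top-k accuracy computation over a matrix of class logits (a plain NumPy illustration, not the competition scoring code):</p>
      <preformat>
import numpy as np

def top_k_accuracy(logits, labels, k=5):
    """Fraction of samples whose true label appears among the k highest-scoring classes."""
    # Indices of the k largest logits per row; order inside the top-k does not matter.
    top_k = np.argsort(-logits, axis=1)[:, :k]
    hits = [label in row for label, row in zip(labels, top_k)]
    return float(np.mean(hits))

# Toy usage: 3 samples, 4 classes, true labels 0, 2, 3.
logits = np.array([[0.9, 0.1, 0.0, 0.0],
                   [0.2, 0.3, 0.1, 0.4],
                   [0.5, 0.4, 0.05, 0.05]])
print(top_k_accuracy(logits, np.array([0, 2, 3]), k=2))
</preformat>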
      <p>
        The cloud computing resources were funded by Data Science at Georgia Tech (DS@GT). Data
and computing were hosted by the Partnership for an Advanced Computing Environment (PACE) at
Georgia Tech [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>3.1. Benchmark Methodology</title>
        <p>Our benchmark methodology can be summarized in a few steps:</p>
        <list list-type="order">
          <list-item><p>Image embeddings from the PlantCLEF 2024 model</p></list-item>
          <list-item><p>Weighted sampling to balance the training dataset</p></list-item>
          <list-item><p>Mixup on batches during training</p></list-item>
          <list-item><p>Linear classifier</p></list-item>
          <list-item><p>Mixup loss with cross-entropy [<xref ref-type="bibr" rid="ref14">14</xref>]</p></list-item>
        </list>
        <sec id="sec-3-1-1">
          <title>3.1.1. Dataset Preparation</title>
          <p>
            We pre-computed both the image and text embeddings and stored them in parquet files for a modular
experimentation workflow. We encountered a "Premature End of JPEG file" error when reading images
because some images did not end with the standard JPEG end-of-image marker, which may cause
unintended side effects during training. This error was resolved by loading the affected images with
OpenCV and saving them again [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. There was one corrupted image in the validation 720p set; we did not use this image since we only used
the full-size images for our pipeline.
          </p>
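          <p>A minimal sketch of that fix (the directory path and file pattern are illustrative): scan for JPEG files whose byte stream does not end with the end-of-image marker 0xFFD9 and re-encode them with OpenCV.</p>
          <preformat>
import glob
import cv2  # opencv-python

def fix_truncated_jpegs(image_dir):
    """Re-save JPEGs that do not end with the JPEG end-of-image marker (0xFFD9)."""
    for path in glob.glob(f"{image_dir}/*.jpg"):
        with open(path, "rb") as f:
            data = f.read()
        if not data.endswith(b"\xff\xd9"):
            image = cv2.imread(path)      # decodes the readable portion of the file
            if image is not None:
                cv2.imwrite(path, image)  # re-encode with a proper end-of-image marker

# Illustrative usage on a hypothetical directory of full-size training images.
fix_truncated_jpegs("data/images/train/fullsize")
</preformat>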
          <p>
            All approaches except the generative AI approach used the PyTorch Lightning library to train the
classifier. The hyperparameters used are as follows: a batch size of 256, a maximum of 50 epochs with an
early-stopping patience of 3, the Adam [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] optimizer, a learning rate of 5 · 10<sup>−4</sup>, and no learning rate scheduler.
          </p>
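          <p>A condensed sketch of this training setup is shown below; the class and variable names are ours, and the data loaders are assumed to yield pre-computed embedding and label batches (this is not the exact competition code):</p>
          <preformat>
import torch
import pytorch_lightning as pl
from pytorch_lightning.callbacks import EarlyStopping

class LinearClassifier(pl.LightningModule):
    """Single linear layer over pre-computed 768-dimensional embeddings."""
    def __init__(self, embed_dim=768, num_classes=2427, lr=5e-4):
        super().__init__()
        self.linear = torch.nn.Linear(embed_dim, num_classes)
        self.loss_fn = torch.nn.CrossEntropyLoss()
        self.lr = lr

    def forward(self, x):
        return self.linear(x)

    def training_step(self, batch, batch_idx):
        emb, label = batch
        loss = self.loss_fn(self(emb), label)
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        emb, label = batch
        self.log("val_loss", self.loss_fn(self(emb), label))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.lr)

# train_loader / val_loader are assumed to yield (embedding, label) batches of size 256.
trainer = pl.Trainer(max_epochs=50,
                     callbacks=[EarlyStopping(monitor="val_loss", patience=3)])
# trainer.fit(LinearClassifier(), train_loader, val_loader)
</preformat>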
        </sec>
        <sec id="sec-3-1-2">
          <title>3.1.2. Image Embeddings</title>
          <p>
            We experimented with multiple transformer models: Facebook DINOv2 [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ], PlantCLEF 2024
pretrained model [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], FungiTastic BEiT [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ], and FungiTastic ViT [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ]. A summary of the models used is
shown in Table 1 and Table 2.
          </p>
          <p>
            DINOv2 was selected for its state-of-the-art performance in computer vision tasks and its strong
results in FungiCLEF 2024 [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ]. The PlantCLEF model was selected due to its foundation in DINOv2 and
its pre-training on 1.4 million plant images from the Pl@ntNet database, offering the potential benefits
of transfer learning. The two FungiTastic models were selected because they were pre-trained on
fungi images.
          </p>
          <p>
            The image embeddings from the PlantCLEF 2024 model used in our benchmark methodology have a
size of 768.
          </p>
        </sec>
        <sec id="sec-3-1-3">
          <title>3.1.3. Weighted Sampling and Mixup</title>
          <p>
            To mitigate the effects of the class imbalance observed in Figure 3, we experimented with
PyTorch’s WeightedRandomSampler [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] using weights calculated with inverse class frequency via
compute_sample_weight from the sklearn library on the training dataset. This sampling strategy
was implemented in the data loader to ensure that minority classes were sampled more frequently
during training.
          </p>
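          <p>A sketch of this sampling setup is shown below, with toy stand-ins for the embedding dataset; the column and variable names are illustrative rather than the competition code.</p>
          <preformat>
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler
from sklearn.utils.class_weight import compute_sample_weight

# Toy stand-ins for the real data: 6 embeddings (dim 4) with an imbalanced label column.
train_df = pd.DataFrame({"category_id": [0, 0, 0, 0, 1, 2]})
train_dataset = TensorDataset(torch.randn(6, 4),
                              torch.tensor(train_df["category_id"].values))

# Inverse-class-frequency ("balanced") weight for every training example.
sample_weights = compute_sample_weight("balanced", train_df["category_id"])

sampler = WeightedRandomSampler(
    weights=torch.as_tensor(sample_weights, dtype=torch.double),
    num_samples=len(sample_weights),
    replacement=True,  # rare classes can be drawn multiple times per epoch
)

# The sampler replaces shuffle=True so minority classes are seen more often.
train_loader = DataLoader(train_dataset, batch_size=256, sampler=sampler)
</preformat>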
          <p>
            We also experimented with Mixup [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] to increase the influence of minority classes. Mixup was
implemented in the classifier and applied to batches provided by the data loader. Mixup encourages
the classifier model to generalize better by interpolating features and labels between classes. In our
implementation, the embeddings extracted from the training dataset were linearly combined with a
shuffled version to generate an augmented set using the equations:
            <disp-formula id="eq-2">
              <label>(2)</label>
              <tex-math>\tilde{x} = \lambda x_i + (1 - \lambda) x_j</tex-math>
            </disp-formula>
            <disp-formula id="eq-3">
              <label>(3)</label>
              <tex-math>\mathcal{L}_{\mathrm{Mixup}} = \lambda \cdot \mathcal{L}(f(\tilde{x}), y_i) + (1 - \lambda) \cdot \mathcal{L}(f(\tilde{x}), y_j)</tex-math>
            </disp-formula>
            where λ ∼ Beta(α, α), x denotes the image embeddings, y denotes the label targets, i indexes the original
mini-batch, and j indexes a randomly shuffled version of the same mini-batch. In the competition
approach, α = 2.0, a batch size of 256, and 10 epochs were used to evaluate the impact of Mixup. The
choice of α = 2.0 was inspired by Manifold Mixup [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ] to encourage greater generalization given the small
dataset.
          </p>
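          <p>A minimal sketch of feature-level Mixup as described in Equations (2) and (3), written as it might appear inside the classifier's training step (the function and variable names are illustrative):</p>
          <preformat>
import torch
import torch.nn.functional as F
from torch.distributions import Beta

def mixup_step(model, emb, labels, alpha=2.0):
    """Apply Mixup to a batch of embeddings and return the interpolated loss."""
    lam = Beta(alpha, alpha).sample().item()     # lambda ~ Beta(alpha, alpha)
    perm = torch.randperm(emb.size(0))           # j: indices of a shuffled mini-batch
    mixed = lam * emb + (1.0 - lam) * emb[perm]  # Equation (2)
    logits = model(mixed)
    # Equation (3): interpolate the cross-entropy loss between the two label sets.
    return lam * F.cross_entropy(logits, labels) + \
           (1.0 - lam) * F.cross_entropy(logits, labels[perm])
</preformat>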
          <p>
            In our post-competition evaluation, we increased the epochs used in the Mixup [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]-only approach
to 50 to make its results more comparable with the other approaches discussed in Section 4, as
all other approaches used a maximum of 50 epochs. Here, we evaluated the results using α values ranging from
[0.1, 2.0], which encompasses the recommended ranges from the Mixup [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] paper and the Manifold
Mixup [
            <xref ref-type="bibr" rid="ref20">20</xref>
            ] paper (Figure 5). α = 1.20 and α = 1.45 achieved the highest public scores, and these
two values were used to run additional experiments combining a fine-tuned Mixup with weighted sampling.
          </p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Additional Methodologies</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Text Embeddings</title>
          <p>
            We used ModernBERT-Large [
            <xref ref-type="bibr" rid="ref21">21</xref>
            ], a state-of-the-art BERT variant optimized for efficiency, to compute
1024-dimensional text embeddings. We concatenated text from categories present in the test metadata
file with the generated captions to form a single string. The results were saved in a parquet file.
          </p>
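          <p>A sketch of how such text embeddings can be computed with the Hugging Face transformers library; the checkpoint identifier and the choice of [CLS] pooling are our assumptions rather than a prescription.</p>
          <preformat>
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "answerdotai/ModernBERT-large"  # assumed checkpoint identifier
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

def embed_text(texts):
    """Return one 1024-dimensional [CLS] embedding per input string."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    return hidden[:, 0, :]  # [CLS] token embedding

# Example: metadata fields concatenated with the Molmo-generated caption.
rows = ["substrate: dead wood; habitat: mixed forest; "
        "caption: a cluster of small brown mushrooms on a mossy log"]
embeddings = embed_text(rows)  # shape (1, 1024)
</preformat>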
          <p>
            In post-competition evaluation, we also used BioBERT-Large [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ], a domain-specific BERT-based
model pre-trained on biomedical corpora, to compute 1024-dimensional text embeddings. There is a
potential for transfer learning between biomedical texts and fungi textual information since both fall
under the domain of biology. Again, we concatenated text from categories present in the test metadata
file with the generated captions to form a single string.
          </p>
          <p>
            In a multi-modal classifier, the image and text embeddings are fed through their own linear layers
with the same output size of 256. The image and text embeddings are then concatenated, normalized,
and fed into a linear classification layer with an input size of 512.
          </p>
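          <p>A sketch of this fusion head; the dimensions follow the description above, while the use of L2 normalization on the concatenated vector is our assumption.</p>
          <preformat>
import torch
import torch.nn.functional as F

class MultiModalClassifier(torch.nn.Module):
    """Project image and text embeddings to 256-d each, concatenate, and classify."""
    def __init__(self, image_dim=768, text_dim=1024, num_classes=2427):
        super().__init__()
        self.image_proj = torch.nn.Linear(image_dim, 256)
        self.text_proj = torch.nn.Linear(text_dim, 256)
        self.classifier = torch.nn.Linear(512, num_classes)

    def forward(self, image_emb, text_emb):
        fused = torch.cat([self.image_proj(image_emb), self.text_proj(text_emb)], dim=1)
        fused = F.normalize(fused, dim=1)  # assumed L2 normalization of the fused vector
        return self.classifier(fused)
</preformat>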
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Multi-Objective Loss GradNorm</title>
          <p>
            Inspired by the evaluation metrics used in FungiCLEF 2024 [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ], we experimented with a multi-objective
classification framework to jointly predict category_id, poisonous, genus, and species. Each objective
has its own classification head and loss function: cross-entropy loss for category_id, genus, and species,
and binary cross-entropy loss for poisonous. To prevent a single objective from dominating classification
and to encourage balanced learning, we implemented GradNorm [
            <xref ref-type="bibr" rid="ref23">23</xref>
            ], which dynamically assigns a weight to each objective when calculating the loss. We introduced a
learnable weight for each objective and computed the gradient norms with respect to the shared parameters.
          </p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Generative AI</title>
          <p>
We explored the use of generative AI techniques to predict species in the dataset. Many commercially
available multi-modal large language models are vision-language models, where vision and language
modalities are fused through an attention mechanism. We implement a zero-shot prompting method
across three API providers using the OpenRouter platform and leverage structured output to enforce
the structural regularity of the results.
        </p>
        <table-wrap>
          <table>
            <thead>
              <tr>
                <th>Model Name</th>
                <th>Release Date</th>
                <th>Context (tokens)</th>
                <th>Input ($/M)</th>
                <th>Output ($/M)</th>
                <th>Vision Input ($/K images)</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>google/gemini-2.0-flash-001</td>
                <td>2025-02-05</td>
                <td>1,048,576</td>
                <td>$0.10</td>
                <td>$0.40</td>
                <td>$0.026</td>
              </tr>
              <tr>
                <td>openai/gpt-4.1-mini-2025-04-14</td>
                <td>2025-04-14</td>
                <td>1,047,576</td>
                <td>$0.40</td>
                <td>$1.60</td>
                <td>N/A</td>
              </tr>
              <tr>
                <td>google/gemini-2.5-flash-preview-04-17</td>
                <td>2025-04-17</td>
                <td>1,048,576</td>
                <td>$0.15</td>
                <td>$0.60</td>
                <td>$0.619</td>
              </tr>
              <tr>
                <td>mistralai/mistral-medium-3</td>
                <td>2025-05-07</td>
                <td>131,072</td>
                <td>$0.40</td>
                <td>$2.00</td>
                <td>N/A</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>We perform three rounds of prompting across family, genus, and species per test image to
logarithmically reduce the search space and ensure that only species within the training set are used. Each
round of prompting relies on the prompt shown in the listing below. We append a YAML list of all the candidate
items to rank. We append all available images for a single image ID (which can range from one to a
dozen images) as context to the completion. We request a list of 20 ranked candidates, including an
item name and a corresponding confidence score. The results are validated against the candidate list
and accepted if at least half of the results are valid, i.e., there exists an item that is within 90% of the
string by normalized edit distance. For human debugging purposes, we also have the LLM generate a
reason for the decision.</p>
        <preformat>
Accurately identify and assign the correct {class_type} label to each image of
fungi, protozoa, or chromista utilizing all provided image views and associated
metadata (location, substrate, season) to ensure precision, especially for
fine-grained distinctions. Choose the top twenty most relevant labels ranked in
order from the available class labels, a confidence on the Likert scale between
1-5 on not-confident to confident and provide short reasoning (in under 50
words) for your selection.
</preformat>
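        <p>A condensed sketch of one prompting round through the OpenRouter chat-completions endpoint is shown below. The JSON schema, field names, and environment variable are illustrative, and the actual pipeline additionally attaches every image of an observation and validates the returned labels by normalized edit distance against the candidate list.</p>
        <preformat>
import os
import requests

PROMPT = "Accurately identify and assign the correct {class_type} label ..."  # see the listing above

def rank_candidates(class_type, candidates, image_urls,
                    model="google/gemini-2.0-flash-001"):
    """Ask the model for a ranked list of candidate labels as structured JSON."""
    schema = {
        "type": "object",
        "properties": {
            "ranked": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "label": {"type": "string"},
                        "confidence": {"type": "integer"},  # Likert scale 1-5
                        "reason": {"type": "string"},
                    },
                    "required": ["label", "confidence", "reason"],
                },
            }
        },
        "required": ["ranked"],
    }
    text = PROMPT.format(class_type=class_type) + "\n" + "\n".join(candidates)
    content = [{"type": "text", "text": text}]
    content += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": content}],
            "response_format": {
                "type": "json_schema",
                "json_schema": {"name": "ranking", "strict": True, "schema": schema},
            },
        },
        timeout=120,
    )
    return response.json()["choices"][0]["message"]["content"]
</preformat>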
        <p>In the first round of prompting, we provide a list of all families and ask the LLM to rank the top 20
families relevant to the test images. We use the most relevant families to generate a candidate list of
genera. We then use this to generate a candidate list of species. We provide the top 10 species as the
final result of the competition.</p>
      </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Image Embeddings Results</title>
        <p>
          The best performing models to pre-compute the image embeddings were the PlantCLEF 2024 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] model
and the FungiTastic ViT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] model. The embeddings from each model were passed into a linear layer
to generate the predictions. The top-5 accuracy public score is then used to select the model to use in
our benchmark methodology (Table 4). PlantCLEF 2024 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] was selected as our baseline classifier and
incorporated into our best performing approach.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Ablation Study</title>
        <p>
          The results from our various approaches are compiled in Table 5. Our best in-competition approach
was with Mixup [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] with α = 2.0 and weighted sampling, with a private top-5 accuracy score of 40.75. In
post-competition evaluation, a fine-tuned α = 1.20 achieved the highest private score when combined
with weighted sampling, and α = 1.45 achieved the highest private score when used by itself.
        </p>
        <p>
          We found that Mixup with a tuned α is the single technique with the greatest positive impact, with an
increase of 4.27% on the private score. Weighted sampling provided a much smaller increase in accuracy
on its own and had minimal effect when combined with a tuned α. Lastly, we found that incorporating
metadata + caption and a multi-objective GradNorm [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] approach of classifying category_id, poisonous,
species, and genus had a negative impact on the prediction accuracy.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Leaderboard Results</title>
        <p>
          Our team’s result beat both competition baselines: BioCLIP [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] + FAISS + Prototypes and BioCLIP +
FAISS + NN. However, our result falls short of the leaders in the competition. We are ranked 37/74 on
the public leaderboard and 35/74 on the private leaderboard. These results are summarized in Table 6.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Validation Dataset Performance</title>
        <p>In Figure 6, we plot the class frequency versus the top-5 accuracy on the validation dataset. The class
frequency is calculated on a per-image basis, not a per-observation basis. There can be multiple
images under an observation. The concentration of points at accuracy 1.0 and 0.0 (shown as darker
points) at the rarer classes shows that the classifier often achieves perfect or zero Top-5 accuracy due to
the small sample size. This highlights the volatility in the classification accuracy in class imbalanced
datasets.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <sec id="sec-5-1">
        <title>5.1. Weighted Sampling and Mixup</title>
        <p>
          As seen in Table 5, Mixup [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] with a tuned α = 1.20 and α = 1.45 had the greatest positive impact on
our baseline accuracy. This differs from the recommended range of [0.1, 0.4] for α suggested by the
Mixup paper, possibly because Mixup is applied at the feature level instead of at the raw inputs.
Applying Mixup at the feature level is closer to the Manifold Mixup [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] approach, which applies Mixup
between intermediate layers of a neural network and uses α = 2.00. From Figure 5, there is no clear
monotonic trend as α changes; however, all values of α except for two resulted in an improvement over
the baseline. The lack of a monotonic trend may suggest that adding more learnable layers to our classifier
is needed to flatten class boundaries and reduce volatility as seen in Manifold Mixup [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
        </p>
        <p>
          Weighted sampling [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] was another approach that had a positive impact on our baseline accuracy,
albeit to a smaller extent than a tuned Mixup. The modest increase in accuracy indicates that while
it helps the model see rare classes more often, it alone does not sufficiently address the challenges of
learning robust patterns for underrepresented classes. The mixed results when combined with Mixup
show that there are diminishing returns in applying multiple sampling approaches.
        </p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Image Embeddings Generation</title>
        <p>
          Among the different models evaluated for image embedding generation, the PlantCLEF 2024 and
FungiTastic ViT models performed the best (Table 4). These two models slightly edged out general-use
DINOv2. Although the improvement is small, this suggests that domain-adapted models can offer an
advantage over general-use models in few-shot fine-grained species classification. This finding is similar
to the few-shot results presented in the FungiTastic paper [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], in which BioCLIP [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ] outperformed
DINOv2 and CLIP [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Text Embeddings</title>
        <p>
          The inclusion of metadata and captions had a negative impact on our classifier performance. This is
contrary to the findings presented in the FungiTastic paper [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], where the incorporation of metadata
did improve performance. This discrepancy is likely due to our inclusion of extraneous or weakly
informative metadata such as district, countryCode, and hasCoordinate.
        </p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Generative AI Results</title>
        <p>Current-generation multi-modal LLMs are not effective at generalizing to the domain-specific task of
labeling fungus images, at least within our price range. Our best model is from the Gemini family of
models, scoring around 13% on the private leaderboard. Our choice of models is dictated by cost. For
example, Gemini Pro is about 10 times as expensive as the Flash series of models. Gemini Flash was at a
level of cost-effectiveness that we were willing to experiment with, and initial experimentation led us
to hold off on trying models with higher token usage, which is generally associated with "thinking" or
"reasoning" capabilities. We then chose GPT-4.1-mini and Mistral as models that had both structured
output and image inputs. The set of models that accepted both of these constraints is much smaller
than we would have liked and precluded models such as Anthropic Claude. We summarize the costs
associated with this approach in Table 7, which total close to $30 over 15k requests.</p>
        <p>Note that while we limited models to structured outputs, we can simulate this in a two-pass
methodology, where a stronger model generates results in a particular semi-structured shape, and a second,
smaller but cheaper model converts this into a structured output via JSON Schema. However, this
requires more boilerplate code and effort than we were willing to explore at this point, given the
performance relative to stronger vision-first approaches. We also note that our three-round approach
was necessary because there are limits to the structured schema API. For example, one of the first things
we tried was to return a list of strings where a string must be part of a particular enumeration. However,
enumerations are supported only up to a certain number of elements, which is undocumented, if
supported at all. Another reason is that the context window significantly influences which elements
are recalled from the list of available class elements. If we were to include all species in one big list,
there is a good chance that not every species would be considered from that list due to limitations of
context locality. This behavior is challenging to describe quantitatively due to the accelerated pace of
development of these models in production and the associated cost of running experiments.</p>
        <p>We also note a few limitations in our methodology. First, LLMs are strongly afected by the amount of
stochasticity introduced at token generation time (i.e., temperature). As such, the runs of our algorithm
will change significantly over time, making it challenging to reproduce our results exactly. However,
there are two approaches to mitigate reproduction issues, given that the cost of a single Gemini test run
is about $2. The first option is to lower the temperature of the model, which is often supported. Another
approach is to run several iterations of a model and aggregate the final results. The ideal solution would
take on a Monte-Carlo tree search flavor, where we would sample the top-k elements many times and
produce some probabilistic taxonomic tree based on knowledge embedded in the LLM.</p>
        <p>
          FungiCLEF is a domain-specific task that is relatively resource-poor compared to the general task of
information recall from large pools of publicly available text. However, it is impressive that these
models can get any results at all. It would be interesting to gain a deeper understanding of the
vision-question-based capabilities of these models, perhaps by using a smaller subset that considers the general
challenges of the fungi dataset while managing costs. What might make the most sense is fine-tuning
a smaller VLM, such as Gemma [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], Phi [27], or Llama [28], on the FungiCLEF dataset and seeing
whether these smaller models can be effectively tuned for domain-specific language queries.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Future Work</title>
      <p>
        Improvements can be made to address the disparity in class distributions between the training and
validation datasets as observed in Figure 3. One approach, seen in previous research, is to combine the
provided datasets [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and then re-split them to obtain similar distributions. In addition, improvements
could be made to the selection and processing of textual data. Rather than incorporating all available
metadata fields, future work should prioritize informative features. As a starting point, we propose
using the three metadata attributes highlighted in the FungiTastic paper [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which have been shown to
be effective in improving model performance. In addition, more learnable layers can be added to the
classifier with Mixup [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] applied at a random layer to more closely follow the approach proposed in
Manifold Mixup [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Continuing with our generative AI approach, one can experiment with costlier
models, or implement a Monte-Carlo tree search as discussed in Section 5.4.
      </p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>In this paper, we present our approach to tackle the challenge of FungiCLEF 2025 few-shot fine-grained
visual classification (FGVC) using vision transformer embeddings. We explored a range of models, such
as DINOv2, PlantCLEF 2024, and FungiTastic pre-trained models, ultimately selecting the PlantCLEF
2024 model for our benchmark approach due to its strong performance and transfer-learning benefits. To mitigate
the class imbalance in the dataset, we implemented weighted sampling and Mixup, with Mixup providing
the most significant performance gain. We also experimented with incorporating textual metadata and
multi-objective learning with GradNorm, but found these approaches to be detrimental, likely due to noisy or
weakly informative inputs. Our final competition and post-competition classifiers outperformed both
competition baselines and demonstrated the importance of domain-specific embeddings and balancing
strategies. However, a significant performance gap with the leaders in the competition indicates the
need for further exploration of alternate classifier architectures and improved metadata integration.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>
        We thank the Data Science at Georgia Tech (DS@GT) CLEF competition group for their support. This
research was supported in part through research cyberinfrastructure resources and services provided by
the Partnership for an Advanced Computing Environment (PACE) at the Georgia Institute of Technology,
Atlanta, Georgia, USA [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
    </sec>
    <sec id="sec-9">
      <title>8. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used ChatGPT-4 for drafting content and for grammar
and spelling checks. After using these tools/services, the authors reviewed and edited the content as
needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Lücking</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aime</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Robbertse</surname>
          </string-name>
          , et al.,
          <article-title>Unambiguous identification of fungi: where do we stand and how accurate and precise is fungal dna barcoding?</article-title>
          ,
          <source>IMA Fungus</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Klarka</surname>
          </string-name>
          , picekl,
          <source>FungiCLEF</source>
          <year>2025</year>
          @
          <article-title>CVPR-FGVC &amp; LifeCLEF, 2025</article-title>
          . URL: https://kaggle.com/competitions/fungi-clef-2025.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Janoušková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Čermák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Papafitsoros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>Lifeclef 2025 teaser: Challenges on species presence prediction and identification, and individual animal identification</article-title>
          ,
          <source>Advances in Information Retrieval</source>
          <year>2025</year>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          , Overview of Fungiclef 2024:
          <article-title>Revisiting fungi species recognition beyond 0-1 cost</article-title>
          ,
          <source>CLEF 2024 Working Notes CEUR-WS</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Janoušková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cermak</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Matas,</surname>
          </string-name>
          <article-title>FungiTastic: A multi-modal dataset and benchmark for image categorization</article-title>
          ,
          <source>arXiv:2408.13632</source>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Deitke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tripathi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Salehi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Muennighof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Soldaini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bransom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Ehsani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ngo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yatskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Callison-Burch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Head</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hendrix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bastani</surname>
          </string-name>
          , E. VanderBilt, N. Lambert,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chheda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sparks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Skjonsberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarnat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bischof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Walsh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Newell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wolters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.-H. Zeng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Borchardt</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Groeneveld</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Nam</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Lebrecht</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Wittlif</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Schoenick</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Krishna</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Weihs</surname>
            ,
            <given-names>N. A.</given-names>
          </string-name>
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Hajishirzi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Farhadi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kembhavi</surname>
          </string-name>
          ,
          <article-title>Molmo and pixmo: Open weights and open data for state-of-the-art vision-language models</article-title>
          ,
          <source>arXiv preprint arXiv:2409.17146</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Miyaguchi</surname>
          </string-name>
          ,
          <article-title>Fine-grained classification for poisonous fungi identification with transfer learning</article-title>
          ,
          <source>CLEF 2024 Working Notes CEUR-WS</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhuliang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Guo</surname>
          </string-name>
          , Swin transformer v2:
          <article-title>Scaling up capacity and resolution</article-title>
          ,
          <source>CVPR</source>
          <year>2022</year>
          , arXiv:
          <fpage>2111</fpage>
          .09883 (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Thelen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Beyerer</surname>
          </string-name>
          ,
          <article-title>Poison-aware open-set fungi classification: Reducing the risk of poisonous confusion</article-title>
          ,
          <source>CLEF 2024 Working Notes CEUR-WS</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Lombardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Afouard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Espitalier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Joly,
          <article-title>PlantCLEF 2024 Pretrained Models on the Flora of Southwestern Europe Based on a Subset of Pl@ntNet Collaborative Images and</article-title>
          a
          <source>ViT Base Patch 14 DINOv2</source>
          ,
          <year>2024</year>
          . URL: https://zenodo.org/records/10848263.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hughes</surname>
          </string-name>
          ,
          <article-title>Demystifying PyTorch's WeightedRandomSampler by example</article-title>
          ,
          <year>2024</year>
          . URL: https://medium.com/data-science/demystifying-pytorchs-weightedrandomsampler-by-example-a68aceccb452.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cisse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <article-title>Lopez-Paz, mixup: Beyond empirical risk minimization</article-title>
          ,
          <source>ICLR</source>
          <year>2018</year>
          , arXiv:
          <fpage>1710</fpage>
          .09412 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>PACE</surname>
          </string-name>
          ,
          <article-title>Partnership for an Advanced Computing Environment (PACE</article-title>
          ),
          <year>2017</year>
          . URL: http://www.pace.gatech.edu.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14] PyTorch, CrossEntropyLoss,
          <year>2025</year>
          . URL: https://docs.pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Poulinakis</surname>
          </string-name>
          , Img_Premature_Ending-Detect_Fix.py,
          <year>2021</year>
          . URL: https://github.com/Poulinakis-Konstantinos/ML-util-functions/blob/master/scripts/Img_Premature_Ending-Detect_Fix.py.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ba</surname>
          </string-name>
          ,
          <article-title>Adam: A method for stochastic optimization</article-title>
          ,
          <source>ICLR</source>
          <year>2015</year>
          , arXiv:
          <fpage>1412</fpage>
          .6980 (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>M.</given-names>
            <surname>Oquab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Darcet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Moutakanni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. V.</given-names>
            <surname>Vo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Szafraniec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Khalidov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Haziza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>El-Nouby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Assran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ballas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Galuba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Howes</surname>
          </string-name>
          , P.-Y. Huang,
          <string-name>
            <given-names>S.-W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Misa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rabbat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , G. Synnaeve,
          <string-name>
            <given-names>H.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jegou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mairal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Labatut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          , P. Bojanowski,
          <article-title>DINOv2: Learning robust visual features without supervision</article-title>
          ,
          <source>Transactions on Machine Learning Research, arXiv:2304.07193</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Bao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Piao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          , Beit:
          <article-title>Bert pre-training of image transformers</article-title>
          ,
          <source>ICLR</source>
          <year>2022</year>
          , arXiv:
          <fpage>2106</fpage>
          .08254 (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dosovitskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Beyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kolesnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Weissenborn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Unterthiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dehghani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Minderer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Heigold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gelly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Uszkoreit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Houlsby</surname>
          </string-name>
          ,
          <article-title>An image is worth 16x16 words: Transformers for image recognition at scale</article-title>
          ,
          <source>ICLR</source>
          <year>2021</year>
          , arXiv:2010.11929 (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lamb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Beckham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Najafi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Mitliagkas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lopez-Paz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Manifold mixup: Better representations by interpolating hidden states</article-title>
          ,
          <source>ICML</source>
          <year>2019</year>
          , arXiv:1806.05236 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Warner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chaffin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Clavié</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Weller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hallström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Taghadouini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gallagher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ladhak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Aarsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Poli</surname>
          </string-name>
          ,
          <article-title>Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference</article-title>
          ,
          <source>arXiv:2412.13663</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>So</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kang</surname>
          </string-name>
          ,
          <article-title>BioBERT: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          ,
          <source>Bioinformatics</source>
          <year>2019</year>
          , arXiv:1901.08746 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Badrinarayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.-Y.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rabinovich</surname>
          </string-name>
          ,
          <article-title>GradNorm: Gradient normalization for adaptive loss balancing in deep multitask networks</article-title>
          ,
          <source>ICML</source>
          <year>2018</year>
          , arXiv:1711.02257 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>S.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Thompson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. G.</given-names>
            <surname>Campolongo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Carlyn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. M.</given-names>
            <surname>Dahdul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stewart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berger-Wolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-L.</given-names>
            <surname>Chao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <article-title>BioCLIP: A vision foundation model for the tree of life</article-title>
          ,
          <source>CVPR</source>
          <year>2024</year>
          , arXiv:2311.18803 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>A.</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hallacy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramesh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Goh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Mishkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krueger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <article-title>Learning transferable visual models from natural language supervision</article-title>
          ,
          <source>ICML</source>
          <year>2021</year>
          , arXiv:2103.00020 (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>G.</given-names>
            <surname>Team</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mesnard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hardin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dadashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhupatiraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rivière</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Kale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Love</surname>
          </string-name>
          , et al.,
          <article-title>Gemma: Open models based on Gemini research and technology</article-title>
          ,
          <source>arXiv preprint</source>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>