<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Few-Shot Classification of Fungi Species Using Contrastive Representation Learning and Multimodal Fusion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lianping Lu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heng Yang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shuo Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fang Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Puhua Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wenping Ma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Workshop</string-name>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>FungiCLEF, Few-Shot Learning, Dynamic Weighting Contrastive Loss, Feature Fusion, Fine-grained Classification</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Intelligent Perception and Image Understanding Lab, Xidian University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>The FungiCLEF2025 challenge pioneers few-shot fungi species classification through the integration of multimodal observational data, specifically targeting the critical bottleneck of identifying rare and under-documented taxa in practical biodiversity conservation scenarios. In this work, we present a novel two-stage framework that synergizes (1) feature-space optimization via a Dynamic Weighting Contrastive Loss (DWCL), and (2) cross-modal fusion of visual characteristics with ecological metadata to achieve a joint representation of environmental context and fine-grained morphological patterns. Through these technical innovations, the framework secured 2nd place on the competition leaderboard. The code is publicly available at https://github.com/Looploop555/fungi.</p>
      </abstract>
      <kwd-group>
        <kwd>FungiCLEF</kwd>
        <kwd>Few-Shot Learning</kwd>
        <kwd>Dynamic Weighting Contrastive Loss</kwd>
        <kwd>Feature Fusion</kwd>
        <kwd>Fine-grained Classification</kwd>
        <kwd>Multimodal Fusion</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>encoded using BERT and subsequently fused with visual features through Q-Former [9] based cross-modal interaction.</p>
      <p>• Two-Stage Decoupled Pipeline: By separating feature extraction and contrastive learning from multimodal fusion and final classification, each phase can be optimized independently. The first stage focuses on crafting highly discriminative visual embeddings, and the second stage integrates complementary modal signals.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Fine-grained classification of Fungi</title>
        <p>The participating teams in FungiCLEF2023 [5, 10, 11, 12] primarily employed Transformer-based [13] architectures for multimodal data processing, effectively combining visual features with metadata through advanced fusion strategies. To address critical challenges in fungi classification, the solutions incorporated specialized techniques, including customized loss functions (such as the Seesaw loss [14] and a poisonous-classification loss) for handling class imbalance and long-tailed distributions.</p>
        <p>The methods in FungiCLEF2024 [4, 15, 16, 17] primarily focused on multimodal fusion of visual and metadata features using architectures like Swin Transformer V2 [18] and DINOv2, combined with dynamic MLPs [19] or attention mechanisms for fine-grained species classification. To handle open-set recognition, teams employed entropy-based rejection or generative adversarial approaches like OpenGAN [15] to detect unknown species. Safety-critical optimization was emphasized through poisonous-aware loss functions (e.g., heavily penalizing toxic misclassifications) and post hoc re-ranking to minimize dangerous errors. Auxiliary supervision (e.g., genus-level losses) and techniques like the Seesaw loss improved robustness against class imbalance.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Contrastive Learning</title>
        <p>In the field of fine-grained classification, contrastive learning loss functions demonstrate unique advantages. Triplet Loss [20] constructs anchor-positive-negative triplets and enforces the distance between the anchor and the positive example to be smaller than that between the anchor and the negative example plus a margin. It aims to bring samples of the same class closer while pushing apart those of different classes, but its sampling efficiency is constrained by the negative-sample selection strategy. N-pair Loss [21] extends Triplet Loss by adopting a multi-negative parallel optimization mechanism, establishing a “1-positive-N-negative” contrast relationship within a single batch. However, when certain fungi categories have too few samples, their contribution as negative samples diminishes. Supervised Contrastive Loss [22] leverages label information to treat multiple samples from the same class as positives and those from different classes as negatives. It pulls same-class samples closer in the embedding space while pushing apart different-class samples through contrastive learning. This approach is particularly suitable for supervised learning scenarios, excelling especially in few-shot learning and fine-grained classification tasks.</p>
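        <p>For illustration, a minimal PyTorch-style sketch of a supervised contrastive loss in the spirit of [22] is given below; the function name, temperature value, and masking details are illustrative assumptions rather than the exact formulation used by the works cited above.</p>
        <preformat>
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """Minimal supervised contrastive (SupCon-style) loss sketch.

    features: (N, D) embeddings, labels: (N,) integer class ids.
    """
    z = F.normalize(features, dim=1)                       # unit-norm embeddings
    sim = (z @ z.T) / temperature                          # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)                 # exclude self-similarity

    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_mask = torch.logical_and(same_class, ~self_mask)   # same-class pairs are positives
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                                 # anchors with at least one positive
    mean_log_prob_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()
        </preformat>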
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>We propose a two-stage framework for fine-grained fungi classification. In the first stage, foundational visual embeddings are extracted via DINOv2 and refined through a single-layer Transformer encoder, then optimized with our Dynamic Weighting Contrastive Loss, which incorporates entropy-based sample weighting and adaptive positive/negative pair construction to enhance intra-class compactness and inter-class separation even under scarce-data regimes. In the second stage, we generate structured text from each specimen’s metadata, encode it with BERT, and fuse the resulting text embeddings with the refined visual features using a Q-Former with a set of learnable queries q. This multimodal representation is trained with a cross-entropy loss to produce habitat-aware classification outputs, achieving competitive performance in FungiCLEF2025.</p>
      <sec id="sec-3-1">
        <title>DINOv2</title>
      </sec>
      <sec id="sec-3-2">
        <title>Vanilla ViT</title>
        <p>...</p>
      </sec>
      <sec id="sec-3-3">
        <title>Meta Data</title>
        <p>date: 2010-10-1
habitat: natural grassland
substrate: soil</p>
        <p>Template
“This fungi specimen was collected on
2020–10–1 in a natural grassland area,
growing on a soil substrate.”
g
n
i
an Fine-grained Feature Embedding
ir
T
g
n
ir
n
a
e
ievL ...
t
s
a
tr
n
o
C</p>
      </sec>
      <sec id="sec-3-4">
        <title>Q-Former</title>
        <p>...</p>
      </sec>
      <sec id="sec-3-5">
        <title>Queries</title>
        <p>q
Stage Ⅰ
Stage Ⅱ</p>
      </sec>
      <sec id="sec-3-6">
        <title>Classification</title>
      </sec>
      <sec id="sec-3-7">
        <title>Head</title>
        <sec id="sec-3-7-1">
          <title>3.1. Model Architecture</title>
          <p>We design a two-stage model as shown in Figure 1. In the first stage, we concentrate on extracting and refining visual features; in the second stage, we carry out multimodal fusion and classification.</p>
          <p>In the first stage, we extract initial visual features from each fungi image using DINOv2 and feed them into a Transformer-based contrastive learning framework. This framework operates on pre-extracted features from a standard ViT and employs a single-layer Transformer encoder with a 16-head self-attention mechanism to build a high-dimensional attention space, effectively capturing fine-grained visual cues. In the second stage, for every fungi image, we construct a structured textual description from its observation metadata (year, month, day, habitat, and substrate) using the following template: “This fungi specimen was collected on [year]–[month]–[day] in a [habitat] area, growing on a [substrate] substrate.”</p>
          <p>Subsequently, we employ BERT to encode the descriptions; the generated text embeddings and the first-stage visual features are then jointly fed into the Q-Former module. The Q-Former serves as the core component for cross-modal fusion, establishing semantic relationships between image features and ecological text descriptors. A set of learnable query tokens q is introduced to facilitate cross-modal interaction between textual and visual features. Through iterative updates via multi-head self-attention, the Q-Former generates query representations that fuse habitat semantics with visual information. These representations are then projected through a classification head and optimized with a cross-entropy loss to produce the final species classification results.</p>
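          <p>The following is a simplified sketch of this fusion step, using a single multi-head attention block with learnable queries as a stand-in for the full Q-Former; the dimensions, the number of queries, and the initialization are illustrative assumptions, while the number of classes (2,427) follows the training set described in Section 4.1.</p>
          <preformat>
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Simplified Q-Former-style fusion: learnable queries attend over text and visual tokens."""
    def __init__(self, dim=768, num_queries=32, heads=8, num_classes=2427):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)  # learnable query tokens q
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)                            # classification head

    def forward(self, text_emb, visual_emb):
        # text_emb: (batch, T, dim) BERT token embeddings; visual_emb: (batch, V, dim) Stage-I features
        context = torch.cat([text_emb, visual_emb], dim=1)
        q = self.queries.unsqueeze(0).expand(context.size(0), -1, -1)
        fused, _ = self.attn(q, context, context)     # queries fuse habitat semantics with visual cues
        return self.head(fused.mean(dim=1))           # species logits, trained with cross-entropy

# training-step sketch (labels: (batch,) species ids)
# logits = model(text_emb, visual_emb)
# loss = nn.functional.cross_entropy(logits, labels)
          </preformat>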
        </sec>
        <sec id="sec-3-7-2">
          <title>3.2. Training Strategy</title>
          <p>In the first stage, we design the Dynamic Weighting Contrastive Loss, an enhanced supervised contrastive loss function [22] that incorporates an entropy-based uncertainty-weighted sampling mechanism to prioritize hard examples during training. Our improvements over the standard loss function are as follows. First, uncertainty-aware weighting: during loss calculation, samples with higher prediction uncertainty are assigned greater weights, ensuring the model focuses on ambiguous instances that are critical for fine-grained discrimination. Second, adaptive pair construction: positive pairs are formed by randomly sampling up to 4 instances per category, with a strict requirement of at least 2 samples per category to form valid pairs; for categories with fewer than 2 samples, new instances are generated via data augmentation to meet this constraint. Negative pairs are generated across distinct categories using a uniform class sampling strategy to avoid model bias. This design stabilizes the contrastive learning process by balancing positive and negative pairs while dynamically emphasizing the samples that contribute most to reducing model uncertainty.</p>
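          <p>The adaptive pair-construction strategy can be sketched as follows; the per-class cap of 4, the minimum of 2 samples per class, and the augmentation of sparse classes follow the description above, while the function names and data structures are illustrative assumptions.</p>
          <preformat>
import random
from collections import defaultdict

def build_contrastive_batch(samples, augment, max_per_class=4):
    """Adaptive positive/negative pair construction for the Stage-I contrastive objective.

    samples: list of (feature_or_image, label); augment: callable creating a new instance.
    Classes with a single sample are augmented so that every class can form valid positive pairs.
    """
    by_class = defaultdict(list)
    for x, y in samples:
        by_class[y].append(x)

    batch = []
    for y, items in by_class.items():
        if len(items) >= 2:
            chosen = random.sample(items, min(len(items), max_per_class))
        else:
            chosen = [items[0], augment(items[0])]   # synthesize a second instance for sparse classes
        batch.extend((x, y) for x in chosen)

    # negatives arise across distinct classes; uniform class sampling keeps them balanced
    random.shuffle(batch)
    return batch
          </preformat>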
          <p>Given a batch of N samples, let z_i denote the feature vector of the i-th sample (including augmented instances for sparse categories). We first normalize the features:</p>
          <disp-formula id="eq1">
            <label>(1)</label>
            <tex-math>\hat{z}_i = \frac{z_i}{\lVert z_i \rVert_2}</tex-math>
          </disp-formula>
          <p>The pairwise similarity matrix is computed as:</p>
          <disp-formula id="eq2">
            <label>(2)</label>
            <tex-math>S = \hat{Z} \hat{Z}^{\top}, \quad s_{ij} = \hat{z}_i \cdot \hat{z}_j</tex-math>
          </disp-formula>
          <p>The enhanced loss function is defined as:</p>
          <disp-formula id="eq3">
            <label>(3)</label>
            <tex-math>\mathcal{L} = \frac{1}{\sum_{i \in A} w_i} \sum_{i \in A} \frac{w_i}{|P(i)|} \sum_{p \in P(i)} -\log \frac{\exp(s_{ip}/\tau)}{\sum_{a \notin \mathcal{I}(i)} \exp(s_{ia}/\tau)}</tex-math>
          </disp-formula>
          <p>where:</p>
          <p>• A = {i ∣ |P(i)| ≥ 2} is the set of valid anchors.</p>
          <p>• P(i) = {p ∣ y_p = y_i, p ≠ i} denotes the set of positive samples for anchor i, with |P(i)| ≥ 2 (augmented instances are included for sparse categories).</p>
          <p>• ℐ(i) = {i} ∪ {j ∣ mask_j = 0} represents the invalid indices excluded by the triple masking mechanism (self-similarity and invalid pairs).</p>
          <p>• w_i = σ(H(p_i)) is the uncertainty weight for anchor i, where H(p_i) = −∑_{c=1}^{C} p_{i,c} log p_{i,c} is the entropy of the predicted probabilities p_i and σ is the sigmoid function.</p>
          <p>• τ is the temperature parameter.</p>
          <p>In the second stage, the text embeddings and visual features are integrated and fed into the Q-Former module, and the learnable query tokens are initialized as q. The Q-Former performs interactive fusion between the textual and visual features through multi-head self-attention, progressively updating the query tokens across multiple layers and representation subspaces to capture the fused multimodal information. The output query representations from the Q-Former are then passed through a classification head for species prediction, and the final classification results are supervised with a cross-entropy loss.</p>
        </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiment</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Settings</title>
        <p>Dataset. The FungiCLEF2025 challenge dataset is built from fungi observations submitted to the Atlas of Danish Fungi before the end of 2023, with labels provided by mycologists. It includes not only multiple photographs of the same specimen but also a wealth of supplementary data such as satellite imagery, meteorological records, and structured metadata. The vast majority of observations have been annotated with most of these attributes. As shown in Table 1, the training set contains 4,293 observations, 7,819 images, and 2,427 classes, while the validation set has 1,099 observations, 2,285 images, and 570 classes. All of the images are also accompanied by tabular metadata and automatically generated text descriptions of the images. Each class in the training set has between 1 and 4 observations. Training uses a learning rate scheduler, with the initial learning rate set to 0.0002 and a batch size of 1024.</p>
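        <p>A minimal training-configuration sketch consistent with these settings is shown below; only the initial learning rate of 0.0002 and the batch size of 1024 are taken from the text, while the optimizer and scheduler choices are assumptions for illustration.</p>
        <preformat>
import torch
from torch.utils.data import DataLoader

def make_training_setup(model, train_dataset):
    """Training-configuration sketch: only lr=0.0002 and batch size 1024 come from the paper;
    the optimizer and scheduler choices below are assumptions, not the reported setup."""
    loader = DataLoader(train_dataset, batch_size=1024, shuffle=True, num_workers=8)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)  # assumed schedule
    return loader, optimizer, scheduler
        </preformat>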
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Evaluation Metric</title>
        <p>The evaluation metric for this competition is the standard Top-k accuracy, defined as the proportion of instances whose true label is within the top k predicted labels:</p>
        <disp-formula id="eq4">
          <label>(4)</label>
          <tex-math>\mathrm{Top\text{-}k\ Accuracy} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\,(y_i \in \hat{Y}_i)</tex-math>
        </disp-formula>
        <p>where:</p>
        <p>• N is the total number of samples.</p>
        <p>• y_i is the true label for the i-th sample.</p>
        <p>• Ŷ_i is the set of top k predicted labels for the i-th sample.</p>
        <p>• 𝟙(⋅) is the indicator function.</p>
        <p>We set k = 5 for the main evaluation metric.</p>
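        <p>This metric can be computed directly from the model logits, as in the short sketch below; the tensor names are illustrative.</p>
        <preformat>
import torch

def topk_accuracy(logits, targets, k=5):
    """Top-k accuracy of Eq. (4): fraction of samples whose true label is among the k highest-scoring predictions."""
    topk = logits.topk(k, dim=1).indices               # (N, k) predicted label sets
    hits = (topk == targets.unsqueeze(1)).any(dim=1)   # indicator 1(y_i in Y_hat_i)
    return hits.float().mean().item()
        </preformat>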
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Fungi Dataset Experiments</title>
        <p>As detailed in Table 2, when using only DINOv2 pretrained visual features, the model achieves relatively low Top-5 accuracy, indicating that global visual features alone are insufficient for distinguishing morphologically similar fungi species. Incorporating the Transformer encoder led to a significant improvement in accuracy, primarily attributable to the self-attention mechanism’s dynamic focus on locally discriminative features. Further integration of habitat metadata boosted the model’s accuracy to 76.991%, as the metadata provided complementary ecological constraints on the visual features.</p>
        <p>As detailed in Table 3, our enhanced loss function ensures numerical robustness during training and delivers the strongest performance on the fine-grained fungi classification task. The Dynamic Weighting Contrastive Loss enhances the model’s discriminative capability by focusing on challenging samples near decision boundaries, thereby improving classification performance for ambiguous cases.</p>
        <p>As shown in Table 4, when training on small-scale datasets, excessively deep architectures may lead to overfitting, thereby reducing test-set performance. The multi-head attention mechanism, as a core component of the Transformer, captures richer feature information by simultaneously attending to different segments of the input sequence across multiple representation subspaces. In our experiments, the 16-head configuration outperformed the 32-head setup. The results in Table 4 also show that the model achieved high scores at 50, 100, and 150 training epochs. Building on these three best checkpoints, we adopted a weighted voting ensemble approach [25] to integrate their predictions as our final competition submission. The aggregated final score reached 78.137%.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>The proposed two-stage framework secured 2nd place in the FungiCLEF2025 competition. This
achievement was accomplished through the integration of pretrained DINOv2 feature embeddings, a customized
Transformer architecture, Dynamic Weighting Contrastive Loss, and metadata fusion strategies. Future
research will focus on exploring satellite data augmentation and explainable attention mechanisms to
facilitate practical field applications.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Declaration on Generative AI</title>
      <p>During the preparation of this work, we did not use generative AI tools or services for writing assistance, figure generation, or data analysis. All text, figures, and results were produced solely by the authors.</p>
    </sec>
    <sec id="sec-7">
      <title>References</title>
      <p>[5] L. Picek, M. Sulc, R. Chamidullin, J. Matas, Overview of FungiCLEF 2023: Fungi recognition beyond 1/0 cost, in: CLEF (Working Notes), 2023, pp. 1943–1953.</p>
      <p>[6] L. Picek, M. Šulc, J. Heilmann-Clausen, J. Matas, Overview of FungiCLEF 2022: Fungi recognition as an open set classification problem, in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, 2022.</p>
      <p>[7] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020).</p>
      <p>[8] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, et al., DINOv2: Learning robust visual features without supervision, arXiv preprint arXiv:2304.07193 (2023).</p>
      <p>[9] J. Li, D. Li, S. Savarese, S. Hoi, BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models, in: International Conference on Machine Learning, PMLR, 2023, pp. 19730–19742.</p>
      <p>[10] H. Ren, H. Jiang, W. Luo, M. Meng, T. Zhang, Entropy-guided open-set fine-grained fungi recognition, in: CLEF (Working Notes), 2023, pp. 2122–2136.</p>
      <p>[11] S. Wolf, J. Beyerer, Optimizing fine-grained fungi classification for diverse application-oriented open-set metrics, in: CLEF (Working Notes), 2023, pp. 2159–2167.</p>
      <p>[12] F. Hu, P. Wang, Y. Li, C. Duan, Z. Zhu, Y. Li, X.-S. Wei, A deep learning based solution to FungiCLEF2023, in: CLEF (Working Notes), 2023, pp. 2051–2059.</p>
      <p>[13] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).</p>
      <p>[14] J. Wang, W. Zhang, Y. Zang, Y. Cao, J. Pang, T. Gong, K. Chen, Z. Liu, C. C. Loy, D. Lin, Seesaw loss for long-tailed instance segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9695–9704.</p>
      <p>[15] J. Etheredge, OpenWGAN-GP for fine-grained open-set fungi classification, Working Notes of CLEF (2024).</p>
      <p>[16] B.-F. Tan, Y.-Y. Li, P. Wang, L. Zhao, X.-S. Wei, Say no to the poisonous fungi: An effective strategy for reducing 0-1 cost in FungiCLEF2024, Training 1 (2024) 295–938.</p>
      <p>[17] S. Wolf, P. Thelen, J. Beyerer, Poison-aware open-set fungi classification: Reducing the risk of poisonous confusion, Working Notes of CLEF (2024).</p>
      <p>[18] Z. Liu, H. Hu, Y. Lin, Z. Yao, Z. Xie, Y. Wei, J. Ning, Y. Cao, Z. Zhang, L. Dong, et al., Swin Transformer V2: Scaling up capacity and resolution, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 12009–12019.</p>
      <p>[19] L. Yang, X. Li, R. Song, B. Zhao, J. Tao, S. Zhou, J. Liang, J. Yang, Dynamic MLP for fine-grained image classification by leveraging geographical and temporal information, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10945–10954.</p>
      <p>[20] F. Schroff, D. Kalenichenko, J. Philbin, FaceNet: A unified embedding for face recognition and clustering, IEEE (2015).</p>
      <p>[21] K. Sohn, Improved deep metric learning with multi-class n-pair loss objective, in: Advances in Neural Information Processing Systems, volume 29, Curran Associates, Inc., 2016, pp. 1857–1865. URL: https://proceedings.neurips.cc/paper/2016/file/6b180037abbebea991d8b1232f8a8ca9-Paper.pdf.</p>
      <p>[22] P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, D. Krishnan, Supervised contrastive learning, Advances in Neural Information Processing Systems 33 (2020) 18661–18673.</p>
      <p>[23] A. Paszke, PyTorch: An imperative style, high-performance deep learning library, arXiv preprint arXiv:1912.01703 (2019).</p>
      <p>[24] I. Loshchilov, F. Hutter, et al., Fixing weight decay regularization in Adam, arXiv preprint arXiv:1711.05101 5 (2017) 5.</p>
      <p>[25] L. Breiman, Bagging predictors, Machine Learning 24 (1996) 123–140.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>X.-S.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-Z.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. Mac</given-names>
            <surname>Aodha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Belongie</surname>
          </string-name>
          ,
          <article-title>Fine-grained image analysis with deep learning: A survey</article-title>
          ,
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>44</volume>
          (
          <year>2021</year>
          )
          <fpage>8927</fpage>
          -
          <lpage>8948</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Janouskova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          , Overview of FungiCLEF 2025:
          <article-title>Few-shot classification with rare fungi species</article-title>
          ,
          <source>in: Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Adam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Janoušková</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Čermák</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Papafitsoros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Klinck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Denton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Cañas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Martellucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Vinatier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          , Overview of LifeCLEF 2025:
          <article-title>Challenges on species presence prediction and identification, and individual animal identification</article-title>
          ,
          <source>in: International Conference of the Cross-Language Evaluation Forum for European Languages (CLEF)</source>
          , Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Šulc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          , Overview of FungiCLEF 2024:
          <article-title>Revisiting fungi species recognition beyond 0-1 cost</article-title>
          , in:
          <source>CLEF 2024</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>