<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, Madrid, Spain</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of FungiCLEF 2025: Few-Shot Classification With Rare Fungi Species</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Klara Janouskova</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiří Matas</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukas Picek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Cybernetics, Faculty of Applied Sciences, University of West Bohemia</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Inria, LIRMM, University of Montpellier</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Visual Recognition Group, Faculty of Electrical Engineering, Czech Technical University in Prague</institution>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>FungiCLEF 2025, the 4th edition of the FungiCLEF challenge, was organized as part of the LifeCLEF and the FGVC workshops. This year's edition targeted few-shot classification of rare fungi species. Participants were tasked with identifying species from multimodal observations, including images, structured metadata, and environmental data. The data was collected through citizen science and underwent expert-based labeling. Building upon the FungiTastic dataset, FungiCLEF 2025 emphasized real-world constraints such as limited training samples, high intra-class variability, fine-grained inter-class similarities, and distribution shift. The competition attracted 74 teams, with the leading submissions demonstrating significant gains over the provided baselines, showcasing the potential of pretrained vision transformers, contrastive learning, and ensemble techniques. This overview summarizes the challenge setup, dataset, baselines, participant strategies, and key findings, and outlines directions for future work. The winning team achieved a top-5 accuracy of 78.9%, outperforming the baselines by more than 52 percentage points.</p>
      </abstract>
      <kwd-group>
        <kwd>LifeCLEF</kwd>
        <kwd>FungiCLEF</kwd>
        <kwd>fine-grained</kwd>
        <kwd>classification</kwd>
        <kwd>multi-modal</kwd>
        <kwd>fungi</kwd>
        <kwd>species</kwd>
        <kwd>machine learning</kwd>
        <kwd>computer vision</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The reliable identification of
these species lies in comprehensive and time-consuming techniques such as DNA sequencing or the
examination of microscopic structures like spores.</p>
      <p>
        To address these challenges, FungiCLEF [10, 11, 12] has been organized annually, now in its 4th
edition, with the goal of advancing the state of the art in fine-grained recognition, fostering collaboration
between machine learning researchers and domain experts, raising public awareness about fungal
biodiversity, and promoting real-world impact through applied ML solutions. This year, the focus is on
few-shot fine-grained recognition, which is especially important in biological datasets, where species
exhibit a long-tail distribution and many classes have very limited annotated data. In the context of
species prediction, and fungi in particular, advances in few-shot recognition support tasks such as
biodiversity assessment, ecological monitoring, and the development of tools for citizen scientists,
where collecting large training datasets is often infeasible [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4, 13</xref>
        ].
      </p>
      <p>The competition attracted 74 teams, of which 6 submitted working notes detailing their solutions.
The remainder of this overview outlines the challenge, including its organization, data, and evaluation
protocol. This is followed by the overall results and a summary of the submitted working notes. The
conclusions highlight key findings from the competition and suggest directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Challenge Description</title>
      <p>The FungiCLEF 2025 competition, hosted on Kaggle as part of the LifeCLEF [14, 15] and FGVC workshops,
focused on few-shot classification of rare fungi species with multimodal data. Participants were asked
to build models to recognize species from very limited training examples (&lt;5) taken from
expert-verified records in the Atlas of Danish Fungi, reflecting real-world biodiversity monitoring scenarios.
Submissions were CSV files matching observation IDs to predicted labels, and evaluation was based on
top-5 accuracy to account for the difficulty of distinguishing visually similar species.</p>
      <p>The challenging nature of the task is demonstrated by the results of the few-shot baselines presented
in Table 1, which clearly show the impact of the number of training samples: for species with 4
training observations, the baselines achieve top-1 accuracies of 40% and 29%, respectively, while for species
with only one training observation, performance drops dramatically to 13% for both methods.</p>
      <sec id="sec-2-1">
        <title>2.1. Evaluation Protocol</title>
        <p>To account for the difficulty of distinguishing visually similar fungi species, we used top-5 accuracy as
the primary evaluation metric for the FungiCLEF 2025 challenge. It measures the percentage of test
samples where the correct label is among the model’s top-5 predicted labels. Top-5 accuracy accounts
for reasonable alternatives, reflecting real-world scenarios where multiple plausible predictions
can be useful, e.g., as a shortlist for experts. More generally, the standard top-k accuracy metric is
defined as:</p>
        <p>
          <disp-formula>
            <tex-math>\text{top-}k \;=\; \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left( y_i \in \hat{Y}_i^{k} \right),</tex-math>
          </disp-formula>
        </p>
        <p>where N is the total number of test samples, y_i is the true label for the i-th sample, Ŷ_i^k is the set of top-k
predicted labels for the i-th sample, and 1(·) is the indicator function, returning 1 if the condition is
true and 0 otherwise.</p>
        <p>While the top-5 accuracy determines the official leaderboard rankings, we also report top-k accuracies
for a range of different cutoff thresholds (k = 1, 2, . . . , 10) to provide a more detailed evaluation of
the models’ performance.</p>
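        <p>For concreteness, the metric can be computed from a matrix of per-class scores as in the sketch below. This is illustrative code only, not the official evaluation script; the function and variable names are ours.</p>
        <preformat>
import numpy as np

def top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """scores: (N, C) class scores per test sample; labels: (N,) true class ids."""
    # Indices of the k highest-scoring classes for every sample.
    top_k = np.argsort(-scores, axis=1)[:, :k]
    # A sample counts as correct if its true label appears among these k classes.
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())
        </preformat>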
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Baselines</title>
        <p>We provided a Kaggle starter notebook that implements two few-shot classification baselines from [6]
with the objective of supporting participants and reducing entry barriers. The baselines rely on BioCLIP
[16] embeddings and FAISS [17] for efficient nearest-neighbor search. The notebook includes all core
components: data loading, pre-processing, feature extraction and storage, classification on top of
precomputed features, and evaluation. It enables participants to run reproducible experiments and
extend the code with minimal computational and coding overhead. It encourages broad participation
across both machine learning and biodiversity research communities and beyond by offering a functional
and modular starting point.</p>
        <p>The first baseline is a centroid-based classifier, which computes a prototype (mean embedding) for
each class using the BioCLIP representations of the training data. At inference, each query image is
embedded and then classified by assigning it to the nearest class centroid using cosine similarity.</p>
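        <p>A minimal sketch of such a centroid classifier, assuming precomputed (e.g., BioCLIP) embeddings stored as NumPy arrays, is given below. The snippet is ours and is not the starter-notebook code itself.</p>
        <preformat>
import numpy as np

def build_prototypes(train_emb: np.ndarray, train_labels: np.ndarray):
    """One prototype (mean, L2-normalized embedding) per class."""
    classes = np.unique(train_labels)
    protos = np.stack([train_emb[train_labels == c].mean(axis=0) for c in classes])
    protos /= np.linalg.norm(protos, axis=1, keepdims=True)
    return protos, classes

def rank_classes_by_centroid(query_emb: np.ndarray, protos: np.ndarray,
                             classes: np.ndarray, k: int = 5) -> np.ndarray:
    """Rank classes by cosine similarity between each query and the class prototypes."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sims = q @ protos.T                               # cosine similarities (queries x classes)
    return classes[np.argsort(-sims, axis=1)[:, :k]]  # top-k class ids per query
        </preformat>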
        <p>The second baseline is a nearest neighbors (NN) classifier using FAISS [17], an efficient similarity
search library optimized for high-dimensional vector spaces. Test samples are embedded and compared
against all training embeddings, and the label is assigned based on the label of the nearest neighbor.
FAISS accelerates the nearest-neighbor search and thus supports scalable evaluation on larger datasets.</p>
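        <p>The following sketch illustrates this kind of FAISS-based lookup on precomputed embeddings; it is a simplified illustration rather than the exact baseline implementation, and it returns the labels of the k nearest neighbors so that they can also serve as top-k candidates (k=1 corresponds to the plain nearest-neighbor baseline).</p>
        <preformat>
import faiss
import numpy as np

def nearest_neighbor_candidates(train_emb: np.ndarray, train_labels: np.ndarray,
                                query_emb: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the labels of the k nearest training embeddings for each query."""
    train_emb = np.ascontiguousarray(train_emb, dtype="float32")
    query_emb = np.ascontiguousarray(query_emb, dtype="float32")
    faiss.normalize_L2(train_emb)                  # in-place L2 normalization, so that
    faiss.normalize_L2(query_emb)                  # inner product equals cosine similarity
    index = faiss.IndexFlatIP(train_emb.shape[1])  # exact (brute-force) inner-product index
    index.add(train_emb)
    _, nn_idx = index.search(query_emb, k)         # indices of the k nearest neighbors
    return train_labels[nn_idx]                    # their labels, shape (num_queries, k)
        </preformat>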
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Timeline</title>
        <p>The FungiCLEF 2025 competition launched on March 7, 2025, around one week earlier than last year. It
was open to submissions for about 10 weeks, until May 19. The competition was hosted on Kaggle and
promoted through LifeCLEF [14, 15] and FGVC, as well as on social media.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Working Notes</title>
        <p>Participants were strongly encouraged to submit both their code and a detailed technical report (Working
Notes) to ensure that their results are fully reproducible. The working notes provide an in-depth analysis of the
techniques employed, including hyperparameter tuning, model ensembling, and loss function selection,
offering valuable insights into the development of methods for fungal image classification. The
Working Notes underwent a thorough review; two experts with strong publication records in Computer
Vision and Machine Learning provided detailed feedback. The review process was primarily designed
to guarantee reproducibility and maintain quality standards. The review was single-blind, allowing
participants to respond with up to two rebuttals to address any concerns raised by the reviewers.
New in 2025: AI–Assisted Reviewing. To explore whether large language models can improve
review consistency and coverage, we introduced a third, automatic reviewer based on ChatGPT. Each
Working Note therefore received an additional LLM-generated review alongside the two expert reviews.
The ChatGPT review was produced with the prompt shown below, constraining the model to the
workshop’s emphasis on reproducibility and clarity.</p>
        <p>Prompt used for the ChatGPT reviewer:
You are acting as a peer reviewer for a scientific workshop in computer science:
LifeCLEF 2025, part of CLEF. The workshop focuses on biodiversity informatics challenges
involving machine learning, computer vision, and related techniques. The emphasis of the
submissions is on reproducibility, careful experimental evaluation, and thoughtful
analysis, rather than purely on novelty.</p>
        <p>Please carefully read the following paper submission and write a professional, constructive
review. Your review should include the following sections:
1. Summary:
* Briefly summarize the task, methods, datasets, and key findings.
* State clearly what problem the authors are addressing, which LifeCLEF challenge it
pertains to, and what their main contributions are.
2. Strengths: List the strong aspects of the paper, such as:
* Reproducibility (e.g., availability of code, data, clear methodology)
* Careful experimental design
* Well-performed ablation studies or error analyses
* Insightful discussions of results
* Clarity of writing and presentation
3. Weaknesses / Areas for Improvement: Identify any weaknesses or limitations, such as:
* Missing details that would prevent reproduction
* Lack of ablations or sensitivity analyses
* Incomplete or unclear description of the method
* Insufficient discussion or interpretation of the results
* Missing comparison to appropriate baselines
4. Detailed Comments:
* Provide actionable, constructive feedback that the authors can use to improve their paper.
* You may point out specific sections, figures, or tables that need clarification, expansion,
or correction.</p>
        <p>* Comment on both scientific and presentational aspects.
5. Overall Evaluation: Please provide your overall recommendation, choosing one of:</p>
        <p>* Strong Accept | Accept | Weak Accept | Borderline | Weak Reject | Reject | Strong Reject
Important Reviewing Guidelines:
* Focus on scientific rigor, reproducibility, and clarity rather than novelty alone.
* Do not hallucinate or infer information not present in the submission.
* Be neutral, unbiased, and professional.</p>
        <p>* If some required information is missing, state it explicitly.</p>
      </sec>
    </sec>
    <sec id="sec-results">
      <title>3. Results</title>
      <p>The final ranking depends on the evaluation metric chosen. The second and third place submissions,
for example, would shift under top-1 or top-10 metrics.
This suggests that the choice of metric has a strong influence on leaderboard outcomes. Some teams
may have specifically optimized for top-5, while others aimed for more general performance across
different k values.</p>
        <p>Baseline Performance. We implemented two baselines, described in Subsection 2.2: a Centroid
prototype-based method and a Nearest Neighbor (NN) classifier, both based on image-level features only.
The Centroid baseline achieved an accuracy of 33.2% on the public test set and 26.7% on the private test
set. The NN baseline performed slightly worse, with 28.3% and 24.7% on the public and private test sets,
respectively. In comparison, the winning team’s approach outperformed the stronger Centroid baseline
by more than 52 percentage points.</p>
    </sec>
    <sec id="sec-3">
      <title>4. Participants and Methods</title>
      <p>Out of 74 participating teams, 53 outperformed both baselines, and six teams submitted working notes
detailing their approach, including the winning team and the runner-up. Each team’s method and its
performance on both public and private leaderboards are listed in Table 2. All working note submissions
utilize embeddings from strong pre-trained vision transformers, including DINOv2 [19], BEIT [20],
BioCLIP [16], SigLIP [21], and SAM [22]. While all teams use similar building blocks, their approaches
differ in the choice of the backbone, the way embeddings are used (e.g., projection heads, contrastive
losses), and whether or how additional modalities are incorporated.</p>
      <p>Notably, the top two teams, Jack Etheredge and hard_work, rely solely on the image modality,
outperforming the next-best working-note method by more than 20%. This highlights that there is
still significant headroom in optimizing image-based pipelines, particularly when using strong data
augmentations and ensemble strategies.</p>
      <p>Several teams show that when integrated effectively, multimodal information can bring substantial
improvements. For example, Yang Tuan Anh improved performance from 30% to 46.2% by incorporating
textual captions and metadata. I2C–UHU–Pegasus achieved a 7.5% gain from modeling ecological
context. In contrast, DS@GT LifeCLEF saw no improvement from metadata, indicating that the
effectiveness of multimodal integration depends heavily on implementation details.</p>
      <sec id="sec-3-1">
        <title>Rank</title>
        <p>Team
1 Jack Etheredge
2 hard_work
17 Embia
22 I2C-UHU-Pegasus
26 Yang Tuan Anh
35 DS@GT LifeCLEF
- baseline
- baseline</p>
      </sec>
      <sec id="sec-3-2">
        <title>Method core</title>
      </sec>
      <sec id="sec-3-3">
        <title>Aug+Proj+Ensemble</title>
      </sec>
      <sec id="sec-3-4">
        <title>Contrastive Transformer</title>
      </sec>
      <sec id="sec-3-5">
        <title>Context Fusion</title>
      </sec>
      <sec id="sec-3-6">
        <title>Multimodal Proto</title>
      </sec>
      <sec id="sec-3-7">
        <title>BioCLIP ProtoNet Transfer + Mixup</title>
        <p>centroid-based classifier
nearest neighbors</p>
        <sec id="sec-3-7-1">
          <title>4.1. Working Notes</title>
          <p>Team Jack Etheredge [23] (Top1) developed an ensemble of prototypical networks that integrates
embeddings from several pretrained models (e.g., SAM [22], BEIT [20], DINOv2 [19]). For each
image, multiple geometric augmentations are applied to increase variability, both during training
and testing. The embeddings from these models are concatenated and passed through a two-layer
projection network to enhance class separation. Predictions are made using cosine similarity between
observation-level and class-level prototype embeddings. An ensemble of multiple independently trained
pipelines ensures stable and robust performance. Test-time data augmentation (specifically five-crop,
horizontal flip, and 90° rotations) is a key component; removing it leads to a performance drop from 63%
to 37.3%. The two-layer projection network is also critical: without it, accuracy drops from 63% to 54.6%.
Optimal results require the use of both cross-entropy and InfoNCE losses [24], which together improve
the quality of the projected embeddings. Individually, BEIT and DINOv2 achieve top-1 accuracies in
the range of 58–60%, while SAM performs poorly at 11.7%; however, including SAM embeddings still
boosts ensemble performance, suggesting they provide valuable complementary information.</p>
          <p>Team hard_work [25] (Top2) designed a supervised contrastive learning approach that combines
DINOv2 features with a customized Transformer model. The approach helped to create a rich feature
space that better captures subtle visual differences between species, even with limited data. A specially
designed supervised contrastive loss helps the model learn more distinct class representations by
efficiently organizing positive and negative training samples. Experiments revealed that contrastive
learning benefits significantly from larger batch sizes, with a minimum of 256 needed for good
performance. While increasing the batch size beyond this point provides some improvement, the gains
are marginal.</p>
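          <p>To illustrate the general idea behind such a supervised contrastive objective, the following PyTorch sketch shows a simplified batch-wise formulation in the spirit of SupCon. It is our illustration, not the team's exact loss or code.</p>
          <preformat>
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Batch-wise supervised contrastive loss: for each anchor, all other samples
    with the same label act as positives, the rest as negatives."""
    z = F.normalize(embeddings, dim=1)                       # (B, D) unit-norm features
    sim = z @ z.T / temperature                              # (B, B) scaled cosine similarities
    b = z.size(0)
    self_mask = torch.eye(b, dtype=torch.bool, device=z.device)
    # Log-softmax over all other samples in the batch (self-pairs excluded).
    denom = torch.logsumexp(sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    log_prob = sim - denom
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) &amp; ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    # Mean negative log-probability over the positives of each anchor.
    loss_per_anchor = -(log_prob * pos_mask).sum(dim=1) / pos_counts
    valid = pos_mask.any(dim=1)                              # anchors with at least one positive
    return loss_per_anchor[valid].mean()
          </preformat>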
          <p>Team Embia [26] (Top17) focused on enhancing prototype quality for few-shot classification. They used
BioCLIP embeddings and prototypical networks, carefully constructing class prototypes by averaging
embeddings from both the training and validation datasets. This approach makes the prototypes
more representative, especially for species with few samples, resulting in over 30 percentage points
improvement in top-5 accuracy compared to the baseline. A comparison between domain-specific
BioCLIP and the general-purpose SigLIP embeddings shows a clear advantage for BioCLIP, with
accuracies of 61.1% and 53.5%, respectively. Further, extending the training set with validation images
increases performance from 61.1% to 64.2%.</p>
          <p>Team I2C-UHU-Pegasus [27] (Top22) developed a multimodal pipeline that integrates visual features
from fine-tuned BioCLIP and DINOv2 with structured textual descriptions, ecological metadata, and
hierarchical taxonomic information. They fine-tuned the BioCLIP model while keeping most of its
layers frozen, adding a specialized multimodal classifier that weighs visual and textual information.
Ecological context modeling significantly improved performance, and the ensemble combines multiple
sources of information for better rare species recognition. Ablation studies show that ecological context
delivers the largest single gain, adding 7.5% to top-1 accuracy. Addressing class imbalance contributes
a further 5.0%, while fine-tuning the visual backbone yields an additional 4.0%. A failure analysis
reveals a small set of species that are always misclassified (100% error), with most errors occurring
between taxa of the same genus or family. Finally, DINOv2-based ensemble experiments highlight a
domain shift: the ensemble lowers accuracy on the validation set but improves it on the hidden test set,
underscoring the challenges posed by evolving ecological data.</p>
          <p>Team Yang Tuan Anh [28] (Top26) introduced a multimodal few-shot learning system that merges
image embeddings from BioCLIP, SigLIP, and DINOv2 with metadata and automatically extracted
textual features. The training involves two stages: first, supervised pretraining of the multimodal
encoder; second, few-shot fine-tuning using a prototypical network under episodic training. They also
use an observation-level re-ranking method to consolidate predictions across multiple images of the
same observation. Compared to the image-only performance of 30%, adding textual captions improves
accuracy to 44.2%, while including metadata boosts it to 45.8%. Using all three modalities together results
in 46.2%, demonstrating their complementarity. The single-stage supervised pretrained model achieves
46.2%, the two-stage fine-tuned model reaches 50.7%, and combining both with observation-level
re-ranking leads to 55.5%.</p>
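          <p>The exact re-ranking procedure is described in the team's working note [28]; the sketch below only illustrates the simpler, generic idea of consolidating per-image predictions within an observation by averaging class scores (all names are ours).</p>
          <preformat>
from collections import defaultdict
import numpy as np

def consolidate_observation_predictions(image_scores, image_obs_ids, k=5):
    """Average per-image class scores within each observation and return the
    top-k class indices per observation."""
    grouped = defaultdict(list)
    for scores, obs_id in zip(image_scores, image_obs_ids):
        grouped[obs_id].append(scores)
    return {
        obs_id: np.argsort(-np.mean(stack, axis=0))[:k]
        for obs_id, stack in grouped.items()
    }
          </preformat>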
          <p>Team DS@GT LifeCLEF [29] (Top35) combines transformer-based embeddings (e.g., DINOv2, PlantCLEF
[30]), class-balanced sampling, Mixup, and a linear classifier. While generative AI and multi-objective
loss were explored, the best results were achieved with vision-only, domain-pretrained embeddings and
class-balanced training strategies. Interestingly, experiments with advanced models, such as Gemini,
Mistral, and ChatGPT, led to negative results. Their results show the dominance of tailored,
domain-specific solutions, where the proposed method with Mixup and class weighting scores 45.4%, while the
best-performing generative model, Gemini 2.5 Flash, reaches a significantly lower accuracy of 13.6%.
OpenAI GPT-4.1 Mini and MistralAI Mistral Medium 3 lagged behind even further with 6.2% and 3.1%,
respectively.</p>
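          <p>Class-balanced sampling and Mixup are standard techniques; a generic PyTorch sketch of both (ours, not the team's code) is shown below.</p>
          <preformat>
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def class_balanced_sampler(labels: torch.Tensor) -> WeightedRandomSampler:
    """Draw samples with probability inversely proportional to their class frequency."""
    counts = torch.bincount(labels)
    weights = 1.0 / counts[labels].float()
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

def mixup(x: torch.Tensor, y_onehot: torch.Tensor, alpha: float = 0.2):
    """Mixup: convex combination of each sample (and its one-hot label) with a
    randomly permuted partner from the same batch."""
    lam = float(np.random.beta(alpha, alpha))
    perm = torch.randperm(x.size(0))
    return lam * x + (1.0 - lam) * x[perm], lam * y_onehot + (1.0 - lam) * y_onehot[perm]
          </preformat>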
        </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusions</title>
      <p>The paper presented an overview and results evaluation of the 4th edition of the FungiCLEF 2025
challenge, organized in conjunction with the CLEF LifeCLEF lab [14] and CVPR-FGVC11 — The 11th
Workshop on Fine-Grained Visual Categorization, held within the CVPR conference. FungiCLEF 2025
built upon previous editions [10, 11, 12], with a continued focus on the challenging task of few-shot
fungi species recognition. In this year’s edition, participants were tasked with classifying observations
of species with a limited number of training samples from the FungiTastic dataset using multimodal
inputs such as images, machine-generated captions, and environmental metadata.</p>
      <p>In contrast to previous years, the 2025 edition removed constraints on model size and reverted
to an open competition format. This change increased participation to more than 70 teams and
sparked a diverse range of methodological innovations across vision-only, multimodal, and metric-based
approaches. Several top-performing teams demonstrated that well-optimized image-only pipelines can
still achieve state-of-the-art performance, while others explored the potential of structured multimodal
fusion strategies. The main outcomes from this year’s evaluation are:
• A highly-optimized vision-only method wins. The top-performing solutions in the challenge
demonstrate that carefully tuned strong vision-only pipelines, built on pretrained transformers
and enhanced with effective data augmentation and ensemble strategies, can outperform more
complex multimodal systems. For example, the winning solution combined embeddings from
several pretrained models with strong augmentations and a simple projection network.
• Multimodality is promising but tricky to get right. Teams that integrated metadata and text
saw modest to strong improvements, but only with carefully designed pipelines. Straightforward
fusion, even with strong general-purpose models, lagged far behind structured text prompts,
learned weighting between modalities, or explicit taxonomic hierarchies. Simply adding more
modalities is not enough.
• Prototypes over neighbours. Prototypical networks remained a popular choice, with teams
refining the basic prototype idea using multiple embeddings, reranking, or ensemble voting. Even
without gradient-based fine-tuning, class averaging plus a smart similarity function can go a long
way in these settings, and nearest prototypes were preferred over nearest-neighbor approaches.
• Contrastive learning is finally pulling its weight. Supervised contrastive loss showed clear
gains this year. Especially when using transformer heads or class-specific projection layers, this
training objective encouraged better separation in embedding space, crucial when fine-grained
differences are subtle and data is sparse.
• Ensembles help, but at what cost? Top teams improved their performance by combining
predictions from multiple embedding models, data augmentation variants, or even entirely
separate pipelines. Ensembling helps to smooth noise and improve generalization, proving once
again to be a good strategy for a competition. However, these gains come with a steep price in
computational overhead, which, in practice, is often not worth it.
• Generative models? Not yet. Some teams tried large vision-language models for zero-shot
inference via prompting. While creative, the results fell far short of specialized models and more
handcrafted approaches. For now, it seems generative multimodal models still lack the resolution
and structure to handle fine-grained multimodal biological classification.
• Data imbalance. As in previous editions, rare species dominated the dataset. The teams reported
that performance dropped sharply for species with fewer observations. Mixup, class-aware
sampling, focal loss, and contrastive training all helped.</p>
      <p>Directions for future work. An important direction is to investigate whether the strong performance
of vision-only systems can be further enhanced through principled multimodal integration. This
includes exploring unified multimodal pretraining that jointly leverages visual, ecological, taxonomic,
and textual inputs. Robustness to domain shift remains a challenge. Promising future exploration
strategies include test-time adaptation, distribution-aware ensembling, and synthetic data augmentation
to better generalize across ecological and seasonal variation. Additionally, prototype refinement during
inference and uncertainty-guided active learning may support more reliable recognition of rare or
ambiguous species.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling checks
and ChatGPT for improving clarity and rewording sentences. After using these tools/services, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This research was supported by the Technology Agency of the Czech Republic, project No. SS73020004.
We extend our sincere gratitude to the mycologists from the Danish Mycological Society, particularly
Jacob Heilmann-Clausen, Thomas Laessøe, Thomas Stjernegaard Jeppesen, Tobias Guldberg Frøslev,
Ulrik Søchting, and Jens Henrik Petersen, for their contributions and expertise. We also thank the
dedicated citizen scientists whose data and efforts have been instrumental to this competition.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. L.</given-names>
            <surname>Hawksworth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lücking</surname>
          </string-name>
          ,
          <article-title>Fungal diversity revisited: 2.2 to 3.8 million species</article-title>
          ,
          <source>Microbiology spectrum 5</source>
          (
          <year>2017</year>
          )
          <fpage>10</fpage>
          -
          <lpage>1128</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Stephenson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. C.</given-names>
            <surname>Londoño-Murcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Borges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Claassens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Frisch-Nwakanma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>McMullan-Fisher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Meeuwig</surname>
          </string-name>
          ,
          <string-name>
            <surname>K. M. M. Unter</surname>
            ,
            <given-names>J. L.</given-names>
          </string-name>
          <string-name>
            <surname>Walls</surname>
          </string-name>
          , et al.,
          <article-title>Measuring the impact of conservation: The growing importance of monitoring fauna, flora and funga</article-title>
          ,
          <source>Diversity</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>824</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Lofgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Stajich</surname>
          </string-name>
          ,
          <article-title>Fungal biodiversity and conservation mycology in light of new technology, big data, and changing attitudes</article-title>
          ,
          <source>Current Biology</source>
          <volume>31</volume>
          (
          <year>2021</year>
          )
          <fpage>R1312</fpage>
          -
          <lpage>R1325</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Warnasuriya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Udayanga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Manamgoda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Biles</surname>
          </string-name>
          ,
          <article-title>Fungi as environmental bioindicators</article-title>
          ,
          <source>Science of The Total Environment</source>
          <volume>892</volume>
          (
          <year>2023</year>
          )
          <fpage>164583</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Niskanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lücking</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dahlberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. M.</given-names>
            <surname>Suz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mikryukov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Liimatainen</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Druzhinina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Westrip</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Mueller</surname>
          </string-name>
          , et al.,
          <article-title>Pushing the frontiers of biodiversity research: Unveiling the global diversity, distribution, and conservation of fungi</article-title>
          ,
          <source>Annual Review of Environment and Resources</source>
          <volume>48</volume>
          (
          <year>2023</year>
          )
          <fpage>149</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] L. Picek, K. Janouskova, V. Cermak, J. Matas, FungiTastic: A multi-modal dataset and benchmark for image categorization, in: Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops, 2025, pp. 2046–2056.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] Y. Wang, Q. Yao, J. T. Kwok, L. M. Ni, Generalizing from a few examples: A survey on few-shot learning, ACM Computing Surveys (CSUR) 53 (2020) 1–34.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] Y. Song, T. Wang, P. Cai, S. K. Mondal, J. P. Sahoo, A comprehensive survey of few-shot learning: Evolution, applications, challenges, and opportunities, ACM Computing Surveys 55 (2023) 1–40.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] L. Picek, M. Šulc, J. Matas, T. S. Jeppesen, J. Heilmann-Clausen, T. Laessøe, T. Frøslev, Danish Fungi 2020 - Not just another image recognition dataset, in: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022, pp. 1525–1535.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] L. Picek, M. Šulc, J. Heilmann-Clausen, J. Matas, Overview of FungiCLEF 2022: Fungi recognition as an open set classification problem, in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum, 2022.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] L. Picek, M. Šulc, J. Heilmann-Clausen, J. Matas, Overview of FungiCLEF 2023: Fungi recognition beyond 0-1 cost, in: Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum, 2023.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] L. Picek, M. Šulc, J. Matas, Overview of FungiCLEF 2024: Revisiting fungi species recognition beyond 0-1 cost, in: Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, 2024.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] M. Lotfian, J. Ingensand, M. A. Brovelli, The partnership of citizen science and machine learning: benefits, risks, and future challenges for engagement, data collection, and data quality, Sustainability 13 (2021) 8087.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] L. Picek, S. Kahl, H. Goëau, L. Adam, T. Larcher, C. Leblanc, M. Servajean, K. Janoušková, J. Matas, V. Čermák, K. Papafitsoros, R. Planqué, W.-P. Vellinga, H. Klinck, T. Denton, J. S. Cañas, G. Martellucci, F. Vinatier, P. Bonnet, A. Joly, Overview of LifeCLEF 2025: Challenges on species presence prediction and identification, and individual animal identification, in: International Conference of the Cross-Language Evaluation Forum for European Languages (CLEF), Springer, 2025.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>[15] A. Joly, L. Picek, S. Kahl, H. Goëau, L. Adam, C. Botella, M. Servajean, D. Marcos, C. Leblanc, T. Larcher, et al., LifeCLEF 2025 teaser: Challenges on species presence prediction and identification, and individual animal identification, in: European Conference on Information Retrieval, Springer, 2025, pp. 373–381.</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] S. Stevens, J. Wu, M. J. Thompson, E. G. Campolongo, C. H. Song, D. E. Carlyn, L. Dong, W. M. Dahdul, C. Stewart, T. Berger-Wolf, et al., BioCLIP: A vision foundation model for the tree of life, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 19412–19424.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>[17] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini, H. Jégou, The Faiss library (2024). arXiv:2401.08281.</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>[18] M. Deitke, C. Clark, S. Lee, R. Tripathi, Y. Yang, J. S. Park, M. Salehi, N. Muennighoff, K. Lo, L. Soldaini, et al., Molmo and PixMo: Open weights and open data for state-of-the-art vision-language models, in: Proceedings of the Computer Vision and Pattern Recognition Conference, 2025, pp. 91–104.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby, et al., DINOv2: Learning robust visual features without supervision, arXiv preprint arXiv:2304.07193 (2023).</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] H. Bao, L. Dong, F. Wei, BEiT: BERT pre-training of image transformers, arXiv preprint arXiv:2106.08254 (2021).</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] M. Tschannen, A. Gritsenko, X. Wang, M. F. Naeem, I. Alabdulmohsin, N. Parthasarathy, T. Evans, L. Beyer, Y. Xia, B. Mustafa, et al., SigLIP 2: Multilingual vision-language encoders with improved semantic understanding, localization, and dense features, arXiv preprint arXiv:2502.14786 (2025).</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo, et al., Segment Anything, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 4015–4026.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] J. Etheredge, Few-shot fungi classification with prototypical networks using multiple pretrained embedding models, in: Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, 2025.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] A. v. d. Oord, Y. Li, O. Vinyals, Representation learning with contrastive predictive coding, arXiv preprint arXiv:1807.03748 (2018).</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] L. Lu, H. Yang, S. Li, F. Liu, P. Chen, W. Ma, Few-shot fine-grained classification of fungi species using contrastive representation learning, in: Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, 2025.</mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>[26] A. Traore, É. Hervet, A. Couturier, Improving fungi prototype representations for few-shot classification, in: Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, 2025.</mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>[27] F. C. García, V. P. Álvarez, J. M. Vázquez, M. G. García, I2C-UHU-Pegasus at FungiCLEF 2025: Multimodal pipeline for rare fungal species classification using fine-tuned VLMs and ecological context, in: Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, 2025.</mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>[28] T.-A. Yang, M.-Q. Nguyen, Mushroom for improvement: Prototypical few-shot learning with multimodal fungal features, in: Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, 2025.</mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>[29] J. K. Tam, M. Gustineli, A. Miyaguchi, Transfer learning and mixup for fine-grained few-shot fungi classification, in: Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, 2025.</mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>[30] H. Goëau, J.-C. Lombardo, A. Afouard, V. Espitalier, P. Bonnet, A. Joly, PlantCLEF 2024 pretrained models on the flora of the south western Europe based on a subset of Pl@ntNet collaborative images and a ViT base patch 14 dinoV2, 2024. doi:10.5281/zenodo.10848263.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>