<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Madrid, Spain
lukas.picek@inria.fr (L. Picek); cesar.leblanc@inria.fr (C. Leblanc); alexis.joly@inria.fr (A. Joly)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of GeoLifeCLEF 2025: Plant Species Presence Prediction with Environmental and High-resolution Remote Sensing Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lukas Picek</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>César Leblanc</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Théo Larcher</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilien Servajean</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Bonnet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexis Joly</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AMAP, Univ Montpellier</institution>
          ,
          <addr-line>CIRAD, CNRS, INRAE, IRD, Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Cybernetics, FAV, University of West Bohemia in Pilsen</institution>
          ,
          <addr-line>Czechia</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>INRIA, LIRMM, Univ Montpellier</institution>
          ,
          <addr-line>CNRS, Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>LIRMM, AMIS, Univ Paul Valéry Montpellier, Univ Montpellier</institution>
          ,
          <addr-line>CNRS</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The GeoLifeCLEF 2025 competition, organized as part of the LifeCLEF and FGVC workshops, challenges participants to predict plant species composition at high spatial resolution across Europe using multimodal environmental data. The task builds on a large-scale dataset that combines 5 million Presence-Only (PO) observations and approximately 100,000 standardized Presence-Absence (PA) surveys, paired with Sentinel-2 imagery, Landsat time series, climate rasters, and soil descriptors. This year's edition introduced two major challenges: a geographically shifted test set, with plots from previously unseen regions that have different species distributions, and, as a result, a larger share of rare species that are under-reported by citizen scientists. These changes increased the modeling difficulty and emphasized the need for generalization under spatial shift and class imbalance. In this paper, we summarize the task design, dataset characteristics, evaluation protocol, participant approaches, and competition results, and discuss implications for scalable species distribution modeling and biodiversity monitoring.</p>
      </abstract>
      <kwd-group>
        <kwd>LifeCLEF</kwd>
        <kwd>biodiversity</kwd>
        <kwd>environmental data</kwd>
        <kwd>species distribution</kwd>
        <kwd>prediction</kwd>
        <kwd>evaluation</kwd>
        <kwd>benchmark</kwd>
        <kwd>methods comparison</kwd>
        <kwd>presence-only data</kwd>
        <kwd>presence-absence</kwd>
        <kwd>model performance</kwd>
        <kwd>remote sensing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Monitoring plant species distributions at high spatial resolution is essential for understanding ecosystem
dynamics and informing conservation efforts. However, collecting standardized species observations
over large areas remains resource-intensive and geographically constrained. Species distribution models
(SDMs) offer a scalable solution by learning to predict species presence from a combination of species
occurrence data and environmental predictors. These occurrence data include Presence-Absence (PA)
records, which systematically document whether a species is detected or not at surveyed locations, and
Presence-Only (PO) records, which opportunistically record only where a species has been observed,
without information on absences.</p>
      <p>
        In recent years, deep learning–based SDMs (deep-SDMs) have demonstrated improved accuracy
by leveraging heterogeneous environmental data sources, including multi-spectral satellite imagery,
climatic time series, and edaphic (soil) variables [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ]. Despite these advances, several challenges
remain. PO data (available at large scale from platforms such as Pl@ntNet and iNaturalist) are relatively
sparse, spatially extensive, and subject to sampling bias and annotation noise [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4, 5, 6</xref>
        ].
      </p>
      <p>
        In contrast, standardized PA data are less affected by labeling noise but are geographically
concentrated in a few well-sampled regions. Additionally, plant species distributions are highly imbalanced,
with most taxa being rare, making model training under limited supervision difficult [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Finally,
environmental inputs vary widely in terms of resolution, format, and temporal depth, requiring models that
can integrate multi-source, multi-scale data.
      </p>
      <p>
        Despite these limitations, the increasing availability of multimodal environmental data and
large-scale biodiversity observations opens opportunities to evaluate and improve SDMs in realistic settings.
To support this, the GeoLifeCLEF challenge [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref8 ref9">8, 9, 10, 11, 12, 13</xref>
        ] was created as part of the LifeCLEF
[
        <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16, 17</xref>
        ] and FGVC workshop series. Its objective is to benchmark SDMs under operational
constraints, such as label imbalance, biased sampling, and spatial generalization, while promoting
reproducible, scalable modeling approaches grounded in real ecological data.
      </p>
      <p>The 2025 edition continues this effort by focusing on multi-species prediction across geolocated
vegetation plots in Europe. Participants were tasked with predicting plant species composition using
high-resolution Sentinel-2 imagery, Landsat time series, climate variables, and edaphic predictors.
The training set combines approximately 90,000 PA surveys from the European Vegetation Archive
(EVA) [18] and over 5 million PO observations from GBIF. Each plot is represented by multimodal
environmental descriptors with variable spatial resolution, ranging from 10 m to 1 km.</p>
      <p>This year’s edition introduces two key challenges. First, the test set includes more than 14,000 test
vegetation plots primarily sampled from regions not represented in the PA training data, resulting in a
strong spatial distribution shift. Second, the label space includes a larger proportion of rare species,
increasing the difficulty of generalization under limited supervision. Together, these conditions reflect
real-world limitations of field survey coverage and taxonomic imbalance, making the task a more
realistic benchmark compared to prior SDM benchmarks that often rely on data with more uniform
geographic sampling and less representation of rare species.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset and Evaluation Protocol</title>
      <p>The dataset for GeoLifeCLEF 2025 is built directly upon the GeoPlant dataset [19], which was used in the
2024 edition [20]. The training occurrence data remain the same and include ∼5M PO observations
from GBIF and related repositories and ∼100K standardized PA surveys from EVA, covering roughly
10K species. The dataset continues to provide multimodal inputs, including: (i) Sentinel-2 image
patches (RGB+NIR, 64×64, 10-meter resolution), (ii) Landsat-based satellite time series (6 spectral bands,
spanning 84 seasons from 2000–2020), (iii) monthly climatic time series (CHELSA, 2000–2019), and
(iv) raster-derived scalar and spatial predictors (e.g., elevation, land cover, soil, bioclimatic variables,
human footprint). However, several notable updates and improvements have been introduced:
1. A new set of significantly more detailed human footprint rasters was added, now at a
30-meter resolution (compared to 1 km in previous editions). Derived from OpenStreetMap data
(2021), these updated rasters capture fine-grained features such as roads, railways, and built
environments with greater spatial and temporal precision (see Figure 1) than the previously used
global rasters (e.g., Venter et al. 2016 [21]).
2. Last year’s test set was enriched with more than 9,000 surveys from new geographical
origins (i.e., eastern and northern Europe), making it possible to test geospatial generalization (see Figure 2).
3. The SoilGrids data, which were incorrectly exported in the 2024 dataset, were corrected and
re-extracted. In the previous version, the land cover and soil features contained identical,
non-informative values due to a processing error, which significantly reduced their utility for species
prediction.
4. The Sentinel-2 satellite data underwent a major upgrade in format and processing. Instead
of the previously used compressed JPEG images, this year’s edition provides raw multi-band TIFF
files, significantly improving radiometric fidelity and spatial integrity for geospatial modeling.
These TIFFs include all four bands (RGB + NIR) at 10-meter resolution. Updated preprocessing
and normalization techniques were provided in official tutorial notebooks, enabling more accurate
and flexible use of the remote sensing inputs.</p>
      <p>[Figure caption, opening truncated] ... and test sites from GeoLifeCLEF 2024 are primarily concentrated in Western and Central Europe, including
France, Denmark, Switzerland, Czechia, and Italy. In contrast, the new test sites in 2025 extend into previously
unseen regions, particularly in Eastern and Southeastern Europe, thereby introducing a significant spatial
distribution shift relative to the training data. Presence-Only (PO) training data span the majority of habitable
Europe, providing broad spatial context.</p>
      <sec id="sec-2-1">
        <title>2.1. Evaluation Metric</title>
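        <p>
          As in the previous editions [
          <xref ref-type="bibr" rid="ref13">13, 20</xref>
          ], we use the sample-averaged F1-score (F1) as the main evaluation
metric. The F1 measures the degree of agreement between the predicted and actual species composition
observed within a specific geographical area and timeframe. In the context of ecological surveys, such
as those conducted in protected areas, each survey instance <inline-formula><tex-math>i</tex-math></inline-formula> is associated with a ground-truth set of
labels <inline-formula><tex-math>Y_i</tex-math></inline-formula>, representing the plant species found by experts within a defined grid. Given this setup, and
a list of predicted labels <inline-formula><tex-math>\hat{Y}_{i,1}, \hat{Y}_{i,2}, \dots, \hat{Y}_{i,R_i}</tex-math></inline-formula>, the F1 can be computed by averaging the per-instance F1
scores over all samples. Let <inline-formula><tex-math>N</tex-math></inline-formula> denote the total number of evaluation samples; then the F1 is computed
as follows:
        </p>
        <disp-formula id="eq-1">
          <label>(1)</label>
          <tex-math>\mathrm{F1} = \frac{1}{N} \sum_{i=1}^{N} \frac{2 \cdot \mathrm{TP}_i}{2 \cdot \mathrm{TP}_i + \mathrm{FP}_i + \mathrm{FN}_i}</tex-math>
        </disp-formula>
        <p>where, with <inline-formula><tex-math>\hat{Y}_i</tex-math></inline-formula> denoting the set of labels predicted for instance <inline-formula><tex-math>i</tex-math></inline-formula>,
<inline-formula><tex-math>\mathrm{TP}_i = |\hat{Y}_i \cap Y_i|</tex-math></inline-formula> is the number of correctly predicted species,
<inline-formula><tex-math>\mathrm{FP}_i = |\hat{Y}_i \setminus Y_i|</tex-math></inline-formula> is the number of species predicted but not observed, and
<inline-formula><tex-math>\mathrm{FN}_i = |Y_i \setminus \hat{Y}_i|</tex-math></inline-formula> is the number of species present but not predicted.</p>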
        <p>This formulation encapsulates the precision and recall elements crucial for assessing the accuracy of
predictive models in ecological studies.</p>
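        <p>As an illustration only, the metric can be sketched in a few lines of Python. The function names, the use of integer species identifiers, and the convention that a survey with no predicted and no observed species scores 0 are assumptions made for this sketch, not part of the official evaluation code.</p>
        <preformat>
```python
from typing import Iterable, List, Sequence, Set


def sample_f1(predicted: Iterable[int], observed: Iterable[int]) -> float:
    """Per-survey F1 = 2*TP / (2*TP + FP + FN), computed on species sets."""
    pred: Set[int] = set(predicted)
    obs: Set[int] = set(observed)
    tp = len(pred.intersection(obs))  # correctly predicted species
    fp = len(pred.difference(obs))    # predicted but not observed
    fn = len(obs.difference(pred))    # observed but not predicted
    denom = 2 * tp + fp + fn
    # Assumption: an empty prediction for an empty survey scores 0.0.
    return 2 * tp / denom if denom else 0.0


def sample_averaged_f1(predictions: Sequence[Iterable[int]],
                       ground_truth: Sequence[Iterable[int]]) -> float:
    """Average the per-survey F1 over all evaluation samples."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have equal length")
    if not predictions:
        raise ValueError("at least one evaluation sample is required")
    scores: List[float] = [
        sample_f1(pred, obs) for pred, obs in zip(predictions, ground_truth)
    ]
    return sum(scores) / len(scores)
```
        </preformat>
        <p>For example, predicting species {1, 2, 3} for a survey in which {2, 3, 4} were observed gives TP = 2, FP = 1, and FN = 1, hence a per-survey F1 of 4/6.</p>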
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Baselines</title>
        <p>This year, we provided the same set of baselines as in the 2024 edition, covering multiple modalities. All
baselines were trained exclusively on Presence–Absence (PA) data and released as executable Kaggle
notebooks, complete with training and inference code. The baselines include:
1. Naive frequency-based baselines. This model ranked species by their frequency in the PA
training data, either globally or within administrative or biogeographic regions, and it served
as a simple lower bound. While this approach achieved a sample-averaged F1 of 0.20 in 2024, it
performed poorly in 2025 (0.08), reflecting the impact of a shift in spatial distribution.
2. CNN for bioclimatic and Landsat time series. These baselines use 3D convolutional networks
derived from ResNet-18 [22] to process time series cubes: 19×12×4 for bioclimatic data and
21×4×6 for Landsat. They provide efficient, modality-specific baselines, with sample-averaged
F1 scores of 0.12779 (Bioclim) and 0.14415 (Landsat).
3. CNN for Sentinel-2 imagery. Unlike last year’s Swin-v2-t baseline, the 2025 edition uses a
lightweight ResNet-18 backbone to process Sentinel-2 patches (32×32, 4 channels: R, G, B + NIR).
This change simplifies the model and aligns it with the other single-modality baselines. The
sample-averaged F1 score for this baseline was 0.12213.
4. Multimodal fusion model. A simple MLP-based model that combines the outputs of the three
ResNet-18 backbones (bioclimatic, Landsat, and Sentinel-2), illustrating the performance gains
from integrating multiple environmental data sources.</p>
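        <p>The global variant of the naive frequency-based baseline can be sketched as follows. This is a minimal illustration, not the released notebook: the data layout (a mapping from survey identifier to species identifiers), the function names, and the parameter k (how many species to predict per site) are assumptions introduced here.</p>
        <preformat>
```python
from collections import Counter
from typing import Dict, Iterable, List


def top_k_frequent_species(pa_surveys: Dict[int, Iterable[int]], k: int) -> List[int]:
    """Rank species by the number of PA training surveys in which they occur."""
    counts: Counter = Counter()
    for species in pa_surveys.values():
        counts.update(set(species))  # count each species once per survey
    # Most frequent first; ties broken by species id so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [species_id for species_id, _ in ranked[:k]]


def predict_constant(test_survey_ids: Iterable[int],
                     top_species: List[int]) -> Dict[int, List[int]]:
    """Predict the same top-k species list for every test survey."""
    return {survey_id: list(top_species) for survey_id in test_survey_ids}
```
        </preformat>
        <p>Because every test site receives the same species list, such a predictor cannot adapt to regions whose flora differs from that of the PA training plots, which is consistent with the drop reported above under spatial shift.</p>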
        <p>The only baseline modification this year is related to Sentinel-2 preprocessing. A new notebook
was released for exploratory analysis and standardized normalization. It includes per-band statistics,
handling of missing pixels, and min–max scaling across all four channels. This preprocessing was also
provided in the updated Sentinel-2 baseline model and in a separate data-processing notebook. For full
model architecture and training configurations, we refer the reader to the GeoLifeCLEF 2024 overview
paper [20].</p>
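        <p>A minimal sketch of per-band min–max scaling with missing-pixel handling is given below. It is not the code of the official notebook: the nested-list patch layout, the encoding of missing pixels as None or NaN, the fill value, and the clipping to [0, 1] are all assumptions made for illustration, and the per-band minimum and maximum are taken as given inputs.</p>
        <preformat>
```python
import math
from typing import List, Optional, Sequence, Tuple

Band = Sequence[Sequence[Optional[float]]]  # rows of pixels; None or NaN = missing


def is_missing(value: Optional[float]) -> bool:
    """Assumed encoding: a pixel is missing if it is None or NaN."""
    return value is None or math.isnan(value)


def min_max_scale_band(band: Band, lo: float, hi: float,
                       fill_value: float = 0.0) -> List[List[float]]:
    """Scale one band to [0, 1] using a given per-band minimum and maximum."""
    if not hi > lo:
        raise ValueError("hi must be greater than lo")
    span = hi - lo
    scaled: List[List[float]] = []
    for row in band:
        # Missing pixels get fill_value; valid pixels are scaled and clipped to [0, 1].
        scaled.append([
            fill_value if is_missing(v) else min(1.0, max(0.0, (v - lo) / span))
            for v in row
        ])
    return scaled


def min_max_scale_patch(patch: Sequence[Band],
                        band_ranges: Sequence[Tuple[float, float]],
                        fill_value: float = 0.0) -> List[List[List[float]]]:
    """Apply per-band scaling to a multi-band patch (e.g., R, G, B, NIR)."""
    if len(patch) != len(band_ranges):
        raise ValueError("one (min, max) pair is required per band")
    return [
        min_max_scale_band(band, lo, hi, fill_value)
        for band, (lo, hi) in zip(patch, band_ranges)
    ]
```
        </preformat>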
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Competition Results</title>
      <p>GeoLifeCLEF 2025 drew 41 participating teams, submitting a total of 750 entries. The final leaderboard,
computed on approximately 77% of the test set, revealed a substantial drop in absolute performance
compared to last year. Only 17 teams outperformed the best provided baselines on
the private leaderboard. The top-performing team, webmaking [23], achieved an F1 score of 0.2302,
followed by PredComX [24] (0.2215) and Miss Qiu [25] (0.2169). The overall performance of the top 25
teams is visualized in Figure 3.</p>
      <p>In comparison, the best-performing team in 2024 (also webmaking) achieved a much higher F1 score
of 0.4089, with over 20 teams surpassing the 0.30 mark on the final leaderboard. In 2025, however, no
team exceeded an F1 of 0.24, reflecting a considerably more challenging evaluation scenario. Several
factors likely contributed to the overall lower scores, but we attribute the largest impact to the expanded
and more ecologically diverse test set, which significantly increased the need for model generalization.
Unlike the 2024 edition, where the test samples were geographically closer to the training data and
many teams relied primarily on PA data, this year’s setup required effective use of PO data to succeed,
a task that remains difficult due to its inherent biases and lack of negative labels. Overall, while the
absolute performance dropped, the technical quality and competitiveness remained high. The challenge
successfully pushed participants to develop more generalizable, scalable, and multimodal solutions.
Further technical details are available below.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Participants’ Methods</title>
      <p>Out of 41 teams that participated in the GeoLifeCLEF 2025 challenge, 5 submitted working notes
reports for peer review. The submitted approaches reflect a diverse set of strategies, including
multimodal fusion architectures, rare species handling, spatial post-processing, ensemble learning, and
confidence-based filtering. Many top-performing solutions are built upon the baseline models while
incorporating additional mechanisms to address class imbalance and spatial shift. Below, we summarize
the core techniques used by the participants who submitted their working notes. Full implementation
details are available in the respective working notes [23, 24, 25, 26, 27].</p>
      <p>Team webmaking [23] (Top1) developed a four-component ensemble designed to address the strong
class imbalance and spatial shift present in the test data. The approach integrated (i) a multimodal MLP-R
+ ResNet-18 + EfficientNet-B4 classifier trained on all species, (ii) a rare-species version of the same
classifier trained only on infrequent taxa, and (iii) a GeoCLIP model [28] leveraging satellite imagery
and metadata. These classifiers were combined with a CatBoost [29] regressor predicting the number of
plant species per location. Spatially-aware post-processing using Jaccard-based similarity on a 0.1° grid
further improved predictions. The final ensemble, which combined all three classifiers and applied
multiple filters, achieved F1 scores of 0.2302 and 0.2714 on the private and public leaderboards, respectively.</p>
      <p>Team PredComX [24] (Top2) introduced a hybrid framework integrating Joint Species Distribution
Modeling (JSDM) and deep learning. A ResNet-based deep-SDM extracted features from remote
sensing inputs, which were then used to train a Hierarchical Model of Species Communities (HMSC)
that modeled interspecies correlations and accounted for study design structure. Their final ensemble
included three models: a pure deep-SDM, a pure JSDM, and a combined MLP+HMSC model using
features from the deep-SDMs as input to the hierarchical JSDM. The method achieved strong spatial
generalization and interpretability, securing second place with an F1 score of 0.2215.</p>
      <p>Team Miss Qiu [25] (Top3): This team proposed Tighnari v2, an improved multimodal framework based
on their solution to the previous edition of the challenge [30]. Their approach addresses label noise in
PO data through a novel pseudo-label aggregation strategy and mitigates geographic distribution shifts
using a mixture-of-experts inference scheme. The model integrates satellite imagery, temporal data,
and tabular features via a stackable tri-modal cross-attention module, and employs asymmetric loss
[31] to handle class imbalance. Their solution, which achieved 3rd place in this year’s edition of the
challenge with an F1 of 0.218 on the private test set, also outperformed the 2nd-place score from 2024 [20].</p>
      <p>Team Lonan Syayf [26] (Top6): This participant proposed a multimodal deep learning approach based
on three separate Swin-T transformer encoders [32], each specialized for a different input modality:
Sentinel-2 imagery, Landsat time series, and bioclimatic rasters. The modality-specific features were
projected, concatenated, and passed through an MLP for multi-label species presence prediction [33].
They filtered the label space to plant species with at least 5 PA occurrences and applied a hybrid
inference strategy combining a tuned probability threshold (0.18) with a fallback minimum of 14
predictions per site. Their model, trained exclusively on PA data, achieved an F1 of 0.192 on the private
test set.</p>
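      <p>The hybrid inference rule reported by Team Lonan Syayf (a probability threshold of 0.18 with a fallback minimum of 14 predictions per site) can be sketched roughly as below. Only the two numbers come from the working note summary; the function name, the dictionary input, the inclusive comparison, and the choice to fall back to the highest-probability species are assumptions of this sketch.</p>
      <preformat>
```python
from typing import Dict, List


def select_species(probabilities: Dict[int, float],
                   threshold: float = 0.18,
                   min_predictions: int = 14) -> List[int]:
    """Keep species above the threshold, but never fewer than min_predictions."""
    # Rank species by decreasing probability; ties broken by species id.
    ranked = sorted(probabilities, key=lambda s: (-probabilities[s], s))
    above = [s for s in ranked if probabilities[s] >= threshold]
    if len(above) >= min_predictions:
        return above
    # Fallback (assumed behaviour): top-ranked species up to the minimum count.
    return ranked[:min_predictions]
```
      </preformat>
      <p>The fallback matters for sites where the model is uncertain everywhere: returning very few species would drive recall, and therefore the per-site F1, towards zero.</p>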
      <p>Team BernGron [27] (Top8): This team combined the PO and PA data in a two-stage deep learning
pipeline. They first pre-trained a ResNet-18 model [22] on the PO observations to learn general
environmental patterns and then fine-tuned it on the PA records for more accurate absence modeling.
They tested this strategy across three environmental data modalities: Sentinel-2 imagery, Landsat time
series, and bioclimatic variables. Their approach showed that PO-based pretraining improved predictive
performance of PA-only baselines, with 7% absolute gains in F1 (0.173 on the private test set). They also
performed spatial bias analyses using Jensen–Shannon divergence [34] and permutation tests.</p>
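      <p>For reference, the Jensen–Shannon divergence between two discrete distributions can be computed as in the sketch below. How Team BernGron applied it is described in their working note; here we only assume, for illustration, two histograms over the same bins (for instance, hypothetical counts of PO and PA records per spatial cell) and use base-2 logarithms so that the result lies in [0, 1].</p>
      <preformat>
```python
import math
from typing import List, Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in bits; bins with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """JSD(p, q) = 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2."""
    if len(p) != len(q):
        raise ValueError("both histograms must use the same bins")
    total_p, total_q = sum(p), sum(q)
    if not (total_p > 0 and total_q > 0):
        raise ValueError("each histogram must have positive total mass")
    # Normalize raw counts to probability distributions.
    p_norm: List[float] = [x / total_p for x in p]
    q_norm: List[float] = [x / total_q for x in q]
    # The mixture is positive wherever p or q is, so the KL terms are finite.
    mixture = [(pi + qi) / 2 for pi, qi in zip(p_norm, q_norm)]
    return 0.5 * kl_divergence(p_norm, mixture) + 0.5 * kl_divergence(q_norm, mixture)
```
      </preformat>
      <p>Identical histograms give a divergence of 0, and histograms with no overlapping bins give 1.</p>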
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Conclusion</title>
      <p>This paper presented an overview and evaluation of the GeoLifeCLEF 2025 challenge, hosted within the
LifeCLEF [35, 36] and FGVC12 workshops. Building on previous editions, this year’s edition kept its
focus on large-scale species distribution modeling using multimodal remote sensing and environmental
data. Participants were tasked to predict the presence of plant assemblages at geolocated survey sites
using satellite imagery, climatic time series, and tabular environmental descriptors.</p>
      <p>GeoLifeCLEF 2025 introduced two significant changes to the task formulation. First, the geographic
distribution of the test data was explicitly shifted relative to the training set, emphasizing the need for
spatial generalization. Second, the evaluation placed increased weight on detecting rare taxa, many of
which had few training observations or were restricted to novel biogeographic regions. These changes
increased the modeling complexity compared to previous years, forcing participants to adopt new
strategies for spatial extrapolation, confidence calibration, and predicting rare species.</p>
      <p>Despite the increased difficulty, participation remained high, with over 40 teams submitting solutions
during the competition. A wide variety of modeling pipelines were explored, ranging from multimodal
transformer-based architectures and ensemble learning to ecological modeling frameworks such as
Joint Species Distribution Models (JSDMs). The main technical outcomes of the challenge are as follows:
• Generalization across space is a primary limitation for current SDMs. The introduction
of geographically shifted test data revealed substantial performance degradation in models that
lacked spatial understanding. Approaches that explicitly accounted for location, through either
pre-processing, architecture, or inference, were more robust to larger geographic shifts.
• Multimodal data integration improves accuracy. Consistent with previous editions, models
that used more than one environmental modality, e.g., remote sensing imagery, climate time
series, and topographic or land-use variables, outperformed single-source baselines. The challenge
confirms the utility of combining complementary data types to model ecological patterns.
• Ensemble methods provide a practical way to improve performance. Many top-performing
solutions combined multiple specialized models to capture different aspects of the prediction
task. Ensembles helped mitigate overfitting, balance predictions for general and rare species, and
smooth uncertainty under distributional shifts.
• Data quality and annotation type still dictate model performance. Methods trained
solely on Presence-Absence data consistently outperformed those relying only on Presence-Only
data. Nevertheless, the selective use of PO data, e.g., for pretraining or pseudo-labeling, proved
beneficial when handled carefully.
• Ecologically informed modeling is gaining prominence. Some participants incorporated
principles from community ecology and biogeography, such as species co-occurrence structure
and region-specific species pools. These approaches showed promising results and reflect a shift
toward more interpretable, hypothesis-driven models in the competition setting.
• Baselines and community engagement accelerate progress and improve performance.
The continued development of strong open-source baselines and active discourse through the
competition platform (Kaggle) enabled participants to iterate quickly, test new hypotheses,
and contribute improvements, highlighting the importance of open benchmarking ecosystems
in ecological machine learning. Participants extended the baselines by experimenting with
alternative architectures, data augmentations, and fusion strategies, demonstrating how shared
starting points can accelerate progress and improve overall performance.</p>
      <p>Future Directions. While the increasing scale and complexity of the GeoLifeCLEF dataset unlock
new research frontiers, it also raises barriers to entry and experimentation. Future editions could
explore more modular and accessible task designs, such as regional tracks, taxon-specific subtasks,
or single-modality-focused challenges, to maintain broad participation. At the same time, several
research directions remain underexplored. First, the development of uncertainty-aware models capable
of expressing epistemic uncertainty under geographic or temporal shift would improve both robustness
and interpretability. Second, supporting hierarchical taxonomic prediction (e.g., genus-level fallback)
could improve performance on rare species. Third, the integration of foundation models trained on
environmental data (e.g., SatCLIP [37], BioCLIP [38], GeoCLIP [28]) may offer substantial gains in
representation quality. Finally, incorporating ecological priors and spatial constraints, such as species
pool filtering or dispersal limitations, could promote more biologically grounded and generalizable
model behavior.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling checks
and ChatGPT for improving clarity and rewording sentences. After using these tools, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>The research described in this paper was funded by the European Commission via the MAMBO
(http:doi.org/10.3030/101060639) and GUARDEN (http:doi.org/10.3030/101060693) projects, which have
received funding from the European Union’s Horizon Europe research and innovation program under
grant agreements 101060693 and 101060639.
[17] A. Joly, L. Picek, S. Kahl, H. Goëau, V. Espitalier, C. Botella, B. Deneu, D. Marcos, J. Estopinan,
C. Leblanc, T. Larcher, M. Šulc, M. Hrúz, M. Servajean, et al., Overview of LifeCLEF 2024:
Challenges on species distribution prediction and identification, in: International Conference of
the Cross-Language Evaluation Forum for European Languages, Springer, 2024.
[18] M. Chytry`, S. M. Hennekens, B. Jiménez-Alfaro, I. Knollová, J. Dengler, F. Jansen, F. Landucci, J. H.</p>
      <p>Schaminée, S. Aćić, E. Agrillo, et al., European vegetation archive (eva): an integrated database of
european vegetation plots, Applied vegetation science 19 (2016) 173–180.
[19] L. Picek, C. Botella, M. Servajean, C. Leblanc, R. Palard, T. Larcher, B. Deneu, D. Marcos, P. Bonnet,
a. joly, Geoplant: Spatial plant species prediction dataset, in: A. Globerson, L. Mackey, D. Belgrave,
A. Fan, U. Paquet, J. Tomczak, C. Zhang (Eds.), Advances in Neural Information Processing Systems,
volume 37, Curran Associates, Inc., 2024, pp. 126653–126676.
[20] L. Picek, C. Botella, M. Servajean, C. Leblanc, R. Palard, T. Larcher, B. Deneu, D. Marcos, J. Estopinan,
P. Bonnet, et al., Overview of geolifeclef 2024: Species composition prediction with high spatial
resolution at continental scale using remote sensing, in: CLEF 2024-Working Notes of the 25th
Conference and Labs of the Evaluation Forum, 186, CEUR, 2024, pp. 1966–1977.
[21] O. Venter, E. W. Sanderson, A. Magrach, J. R. Allan, J. Beher, K. R. Jones, H. P. Possingham, W. F.</p>
      <p>Laurance, P. Wood, B. M. Fekete, et al., Global terrestrial human footprint maps for 1993 and 2009,
Scientific data 3 (2016) 1–10.
[22] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of
the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[23] N. Semenova, Addressing class imbalance and spatial shift in geolifeclef 2025, in: Working Notes
of CLEF 2025 - Conference and Labs of the Evaluation Forum, 2025.
[24] G. Tikhonov, D. Tikhonov, Synthesizing joint and deep species distribution modeling to enhance
spatial prediction of plant communities at continental scale, in: Working Notes of CLEF 2025
Conference and Labs of the Evaluation Forum, 2025.
[25] H. Liu, Y. Wang, C. Shi, T. Xu, H. Xing, Tighnari v2: Mitigating label noise and distribution shift
in multimodal plant distribution prediction via mixture of experts, in: Working Notes of CLEF
2025 - Conference and Labs of the Evaluation Forum, 2025.
[26] A. Syayfetdinov, Swin-t based multimodal networks for geolifeclef 2025, in: Working Notes of</p>
      <p>CLEF 2025 - Conference and Labs of the Evaluation Forum, 2025.
[27] D. Rawlings, T. Chopard, Enhancing presence-absence identification models using presence-only
data, in: Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, 2025.
[28] V. Vivanco Cepeda, G. K. Nayak, M. Shah, Geoclip: Clip-inspired alignment between locations
and images for efective worldwide geo-localization, Advances in Neural Information Processing
Systems 36 (2023) 8690–8701.
[29] L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, A. Gulin, Catboost: unbiased boosting
with categorical features, Advances in neural information processing systems 31 (2018).
[30] H. Liu, Z. Tao, P. Jiang, Q. Sun, M. Wan, Tighnari: Multi-modal plant species prediction based
on hierarchical cross-attention using graph-based and vision backbone-extracted features, in:
Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, 2024.
[31] T. Ridnik, E. Ben-Baruch, N. Zamir, A. Noy, I. Friedman, M. Protter, L. Zelnik-Manor, Asymmetric
loss for multi-label classification, in: Proceedings of the IEEE/CVF international conference on
computer vision, 2021, pp. 82–91.
[32] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision
transformer using shifted windows, in: Proceedings of the IEEE/CVF international conference on
computer vision, 2021, pp. 10012–10022.
[33] C. Leblanc, A. Joly, T. Lorieul, M. Servajean, P. Bonnet, Species distribution modeling based
on aerial images and environmental features with convolutional neural networks, in: CLEF
2022 Working Notes-23rd Conference and Labs of the Evaluation Forum, volume 3180, 2022, pp.
2123–2150.
[34] J. Lin, Divergence measures based on the shannon entropy, IEEE Transactions on Information
theory 37 (2002) 145–151.
[35] L. Picek, S. Kahl, H. Goëau, L. Adam, T. Larcher, C. Leblanc, M. Servajean, K. Janoušková, J. Matas,
V. Čermák, K. Papafitsoros, R. Planqué, W.-P. Vellinga, H. Klinck, T. Denton, J. S. Cañas, G.
Martellucci, F. Vinatier, P. Bonnet, A. Joly, Overview of lifeclef 2025: Challenges on species presence
prediction and identification, and individual animal identification, in: International Conference of
the Cross-Language Evaluation Forum for European Languages (CLEF), Springer, 2025.
[36] A. Joly, L. Picek, S. Kahl, H. Goëau, L. Adam, C. Botella, M. Servajean, D. Marcos, C. Leblanc,
T. Larcher, et al., LifeCLEF 2025 teaser: Challenges on species presence prediction and identification,
and individual animal identification, in: European Conference on Information Retrieval, Springer,
2025, pp. 373–381.
[37] K. Klemmer, E. Rolf, C. Robinson, L. Mackey, M. Rußwurm, SatCLIP: Global, general-purpose
location embeddings with satellite imagery, in: Proceedings of the AAAI Conference on Artificial
Intelligence, volume 39, 2025, pp. 4347–4355.
[38] S. Stevens, J. Wu, M. J. Thompson, E. G. Campolongo, C. H. Song, D. E. Carlyn, L. Dong, W. M.</p>
      <p>Dahdul, C. Stewart, T. Berger-Wolf, et al., BioCLIP: A vision foundation model for the tree of life,
in: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp.
19412–19424.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Monestiez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Munoz</surname>
          </string-name>
          ,
          <article-title>A deep learning approach to species distribution modelling</article-title>
          ,
          <source>Multimedia Tools and Applications for Environmental &amp; Biodiversity Informatics</source>
          (
          <year>2018</year>
          )
          <fpage>169</fpage>
          -
          <lpage>199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Munoz</surname>
          </string-name>
          ,
          <article-title>Very high resolution species distribution modeling based on remote sensing imagery: how to capture fine-grained and large-scale vegetation ecology with convolutional neural networks?</article-title>
          ,
          <source>Frontiers in Plant Science</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>839279</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Estopinan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Munoz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <article-title>Deep species distribution modeling from Sentinel-2 image time-series: a global scale analysis on the orchid family</article-title>
          ,
          <source>Frontiers in Plant Science</source>
          <volume>13</volume>
          (
          <year>2022</year>
          )
          <fpage>839327</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Boakes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>McGowan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Fuller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chang-qing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. E.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>O'Connor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Mace</surname>
          </string-name>
          ,
          <article-title>Distorted views of biodiversity: spatial and temporal bias in species occurrence data</article-title>
          ,
          <source>PLoS Biology</source>
          <volume>8</volume>
          (
          <year>2010</year>
          )
          <fpage>e1000385</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Isaac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Pocock</surname>
          </string-name>
          ,
          <article-title>Bias and information in biological records</article-title>
          ,
          <source>Biological Journal of the Linnean Society</source>
          <volume>115</volume>
          (
          <year>2015</year>
          )
          <fpage>522</fpage>
          -
          <lpage>531</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mesaglio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Callaghan</surname>
          </string-name>
          ,
          <article-title>An overview of the history, current contributions and future outlook of iNaturalist in Australia</article-title>
          ,
          <source>Wildlife Research</source>
          <volume>48</volume>
          (
          <year>2021</year>
          )
          <fpage>289</fpage>
          -
          <lpage>303</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Garcin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Lombardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Afouard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chouet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lorieul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Salmon</surname>
          </string-name>
          ,
          <article-title>Pl@ntNet-300K: a plant image dataset with high label ambiguity and a long-tailed distribution</article-title>
          ,
          <source>in: NeurIPS 2021-35th Conference on Neural Information Processing Systems</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lorieul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cole</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <article-title>Overview of LifeCLEF location-based species prediction task 2020 (GeoLifeCLEF)</article-title>
          ,
          <source>CEUR-WS</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <article-title>Overview of GeoLifeCLEF 2018: location-based species recommendation</article-title>
          ,
          <source>in: CLEF task overview 2018, CLEF: Conference and Labs of the Evaluation Forum, Sep. 2018, Avignon, France</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lorieul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cole</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <article-title>Overview of GeoLifeCLEF 2021: Predicting species distribution from 2 million remote sensing images</article-title>
          ,
          <source>in: Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lorieul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cole</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <article-title>Overview of GeoLifeCLEF 2022: Predicting species presence from multi-modal remote sensing, bioclimatic and pedologic data</article-title>
          ,
          <source>in: Working Notes of CLEF 2022 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <article-title>Overview of GeoLifeCLEF 2019: plant species prediction using environment and animal occurrences</article-title>
          ,
          <source>CLEF: Conference and Labs of the Evaluation Forum</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Estopinan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <article-title>Overview of GeoLifeCLEF 2023: Species composition prediction with high spatial resolution at continental scale using remote sensing</article-title>
          ,
          <source>in: CLEF 2023 Working Notes-24th Conference and Labs of the Evaluation Forum</source>
          , volume
          <volume>3497</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>1954</fpage>
          -
          <lpage>1971</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Spampinato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rauber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fisher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>LifeCLEF 2014: Multimedia Life Species Identification Challenges</article-title>
          ,
          <source>in: CLEF: Cross-Language Evaluation Forum, number 8685 in Information Access Evaluation. Multilinguality, Multimodality, and Interaction</source>
          , Springer International Publishing, Sheffield, UK,
          <year>2014</year>
          , pp.
          <fpage>229</fpage>
          -
          <lpage>249</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Spampinato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-P.</given-names>
            <surname>Vellinga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Lombardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Planqué</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Palazzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <article-title>LifeCLEF 2017 lab overview: multimedia species identification challenges</article-title>
          ,
          <source>in: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 8th International Conference of the CLEF Association, CLEF 2017, Dublin, Ireland, September 11-14, 2017, Proceedings 8</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>255</fpage>
          -
          <lpage>274</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Goëau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lorieul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cole</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Durso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Glotin</surname>
          </string-name>
          , et al.,
          <article-title>Overview of LifeCLEF 2022: an evaluation of machine-learning based species identification and species distribution prediction</article-title>
          ,
          <source>in: International Conference of the Cross-Language Evaluation Forum for European Languages</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>257</fpage>
          -
          <lpage>285</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>