<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Enhancing Presence-Absence Prediction Models Using Presence-Only Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tim Chopard</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Darren Rawlings</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Groningen</institution>
          ,
          <addr-line>Broerstraat 5, 9712 CP Groningen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Leeds</institution>
          ,
          <addr-line>Woodhouse, Leeds, LS2 9JT</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Accurate Species Distribution Modelling (SDM) is essential for biodiversity conservation; however, the limited and spatially biased nature of Presence-Absence (PA) data poses a challenge. In contrast, Presence-Only (PO) datasets are abundant but lack explicit absence records. This paper examines a two-step deep learning approach that combines PO and PA data to generate an SDM. In the first step, the model was trained on a larger PO dataset; in the second, it was fine-tuned on a smaller PA dataset. Results indicate that pre-training with PO data improved performance by 7% when the model was subsequently fine-tuned with PA data, as measured by the samples-averaged F1-score. This approach demonstrates the potential of combining diverse data types to create more reliable species distribution models for plant biodiversity conservation.</p>
      </abstract>
      <kwd-group>
        <kwd>Species Distribution Modelling</kwd>
        <kwd>Presence-Only Data</kwd>
        <kwd>Presence-Absence Data</kwd>
        <kwd>Environmental Predictors</kwd>
        <kwd>Climatic Data</kwd>
        <kwd>Biodiversity Conservation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Understanding the complex patterns of plant species distribution can help in managing and protecting
species that are rare [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], climatically sensitive [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], or economically important [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Species Distribution
Models (SDMs) are used to predict likely locations for these plants. These models function by correlating
known species occurrences with environmental factors such as climate, terrain, and land cover [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
Through understanding these relationships, it is possible to predict the full occurrence rates of a species,
even in areas that have not been surveyed.
      </p>
      <p>Deep learning techniques, especially Convolutional Neural Networks (CNNs), have become popular
for this purpose. CNNs are particularly good at identifying complex patterns in large, high-dimensional
environmental data.</p>
      <p>
        These models typically rely on two kinds of data. The first is Presence-Only (PO) data, which is
widely available from sources such as citizen science projects and museum collections. However, this
data only tells us where a species has been found, not where it is absent, and it can be geographically
skewed [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. The second is Presence-Absence (PA) data, gathered from systematic surveys. While PA
data provides reliable information on both where a species is present and where it is truly absent, it is
much less common and covers smaller areas. The geographical distributions of the PA and PO data
used in this paper are shown in Figure 1.
      </p>
      <p>
        Combining PA and PO data makes it possible to develop more powerful models, known as
Integrated Distribution Models. Developing these models is, however, both technically and
computationally complex, and often requires considerable time and resources [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
      <p>
        CNNs are a natural fit for SDMs because they excel at analyzing high-resolution data from remote
sensing. Architectures such as ResNet have been used effectively in modeling species distribution, and
served as key benchmarks in previous GeoLifeCLEF labs [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ].
      </p>
      <p>
        In previous GeoLifeCLEF labs, multi-modal models, which include more than one predictor per
model, have been shown to perform well on similar tasks [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ]. This paper, however, focuses on the
utilization of PO and PA data, rather than on producing the highest-performing solution. In this
research, single-modal models are used and tested against each other.
      </p>
      <sec id="sec-1-1">
        <title>1.1. Integrating PO Pretraining with PA Fine-Tuning</title>
        <p>
          To take advantage of both Presence-Only (PO) and Presence-Absence (PA) data, a two-step deep learning
approach was used, with the goal of improving on previous models that exclusively used PA data [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
First, a ResNet18 model was trained on a large subset of the PO data, allowing it to learn general features
about the environment. In the second step, the model was fine-tuned using the more complete PA data,
enabling more accurate identification of places in which a species is truly absent. The goal of this
transfer learning strategy is to create a more reliable model that benefits from both the greater availability of
PO data and the accuracy of PA data.
        </p>
        <p>
          This approach was tested with the GeoLifeCLEF 2025 dataset [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], which provides over five million
PO records and ninety thousand PA survey sites in Europe, along with environmental data such as
satellite imagery and climate trends. The ResNet18 architecture was chosen because of its previous
success in similar tasks.
        </p>
        <p>Ultimately, this research investigates whether pre-training a model on PO data can improve predictions
when it is later trained on PA data. This paper demonstrates that this combined method produces
more accurate results than using PA data alone.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Data</title>
      <sec id="sec-2-1">
        <title>2.1. Summary</title>
        <p>
          The data for this study [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] was sourced from the GeoLifeCLEF 2025 competition [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. These datasets
comprised both tabular and image data.
        </p>
        <p>The provided data contained two main types of species information: approximately 5 million
Presence-Only (PO) sightings and 90,000 Presence-Absence (PA) survey records, all linked to specific GPS
coordinates and survey ID values.</p>
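        <p>Each record is paired with a comprehensive set of environmental variables, including:</p>
        <list list-type="bullet">
          <list-item><p>Satellite imagery (from Sentinel-2 and Landsat)</p></list-item>
          <list-item><p>Climate data (CHELSA time series and bioclimatic variables)</p></list-item>
          <list-item><p>Land cover maps</p></list-item>
          <list-item><p>Human footprint indexes</p></list-item>
          <list-item><p>Soil data</p></list-item>
        </list>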
        <p>The dataset is organized to frame the task as a multi-label classification problem, where the goal is to
predict the presence of various species. This research drew specifically upon three data sources
within this collection. These were selected because they were provided as PyTorch tensors
in a structure that could be input into a CNN such as ResNet18 without additional data
manipulation:</p>
        <list list-type="bullet">
          <list-item><p>Sentinel-2 (sen): Pre-processed raster files scaled to the European continent, representing
Sentinel-2 Level-2A observations. Each TIFF file corresponds to a unique observation location
(surveyId).</p></list-item>
          <list-item><p>Landsat (lan): Over 20 years of Landsat satellite imagery extracted from Ecodatacube, aggregated
into CSV files and data cubes representing mean spectral band values for the three months preceding
each observation date. Cubes are structured as (n_bands, n_quarters, n_years), where n_bands =
6, n_quarters = 4, and n_years = 21.</p></list-item>
          <list-item><p>Bioclimatic Cubes (bio): Four monthly CHELSA climatic rasters (precipitation, maximum
temperature, minimum temperature, and mean temperature) with a resolution of 30 arc seconds,
spanning January 2000 to June 2019. Cubes are structured as (n_year, n_month, n_bio), where
n_year = 19, n_month = 12, and n_bio = 4.</p></list-item>
        </list>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Geospatial Distribution</title>
        <p>Statistical analyses were performed to determine whether the PO and PA training and test datasets
exhibit similar geographic spreads. This was done to ensure the test data’s spatial distribution is an
accurate representation of the training data.</p>
        <p>
          The test compared the spatial density of GPS coordinates between the Presence-Absence training data,
the Presence-Only training data, and the Presence-Absence test data. The Jensen-Shannon Divergence
(JSD) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] was employed, which measures how different two probability distributions are.
        </p>
        <p>The following hypotheses were tested:</p>
        <p>Null Hypothesis: The training and test datasets originate from the same geographic area, and observed
spatial differences are attributable to random chance.</p>
        <p>Alternative Hypothesis: The datasets come from
different geographic areas, meaning there is a significant spatial mismatch between them.</p>
        <p>First, a Gaussian Kernel Density Estimate (KDE) was created for the coordinates of each dataset
to visualize its spatial distribution. The JSD score was then calculated to quantify the difference between
these distributions. The resulting KDE heatmaps for the PA and PO datasets are shown in Figure 2.</p>
        <p>Second, to evaluate the statistical significance of the observed JSD, a permutation test was
conducted under the null hypothesis that the training and test samples originate from the same underlying
spatial distribution. The combined dataset was randomly permuted, and samples were reassigned into
surrogate training and test groups matching the original group sizes. For each permutation, the KDEs and
the corresponding JSD were recomputed, generating a null distribution of divergence values.</p>
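        <p>The permutation procedure described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the code used in the study: densities are isotropic Gaussian KDEs evaluated on a coarse regular grid, the bandwidth, grid and number of permutations are illustrative values, the coordinates are synthetic, and the function names (kde_on_grid, jsd, permutation_p_value) are hypothetical.</p>
        <preformat>
```python
import math
import random


def kde_on_grid(points, grid, bandwidth):
    """Evaluate an isotropic Gaussian KDE on grid cells and normalise to sum to 1."""
    dens = []
    for gx, gy in grid:
        total = 0.0
        for px, py in points:
            d2 = (gx - px) ** 2 + (gy - py) ** 2
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        dens.append(total)
    norm = sum(dens)
    return [d / norm for d in dens]


def jsd(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(a, b):
        # Cells with zero mass in a contribute nothing to the sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def permutation_p_value(train, test, grid, bandwidth, n_perm, seed=0):
    """Observed JSD and the share of permutations with a JSD at least as large."""
    rng = random.Random(seed)
    observed = jsd(kde_on_grid(train, grid, bandwidth),
                   kde_on_grid(test, grid, bandwidth))
    pooled = list(train) + list(test)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Surrogate groups keep the original group sizes.
        surrogate_train = pooled[:len(train)]
        surrogate_test = pooled[len(train):]
        null_jsd = jsd(kde_on_grid(surrogate_train, grid, bandwidth),
                       kde_on_grid(surrogate_test, grid, bandwidth))
        if null_jsd >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic coordinates standing in for training and test survey locations.
rng = random.Random(42)
grid = [(x, y) for x in range(0, 11, 2) for y in range(0, 11, 2)]
train = [(rng.gauss(3, 1), rng.gauss(3, 1)) for _ in range(40)]
test = [(rng.gauss(7, 1), rng.gauss(7, 1)) for _ in range(20)]
observed, p_value = permutation_p_value(train, test, grid, bandwidth=1.5, n_perm=99)
print(observed, p_value)
```
        </preformat>
        <p>A small p-value indicates that the observed divergence is larger than would be expected if both samples came from the same spatial distribution. With real GPS coordinates, the grid extent and bandwidth would need to be chosen to suit the study area.</p>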
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Species Occurrence</title>
        <p>Occurrence counts for each species were calculated in order to remove low-occurrence
outlier species. These counts were performed on the PA training data, as this provides a comprehensive
overview of each survey site, mitigating the possibility of certain harder-to-identify species being
overlooked, which might occur in the PO data.</p>
        <p>The 1,000 most frequent species were selected for use. These represent 94.12% of the PA training data,
and each occurs at least 130 times within this dataset.</p>
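        <p>The selection of the most frequent species can be illustrated with a short sketch. The record layout (pairs of survey ID and species ID), the function names top_species and filter_records, and the toy data are assumptions made for illustration; they are not taken from the study's code.</p>
        <preformat>
```python
from collections import Counter


def top_species(pa_records, n_keep):
    """Return the n_keep most frequent species IDs and their share of all PA records."""
    counts = Counter(species_id for _, species_id in pa_records)
    kept = [species_id for species_id, _ in counts.most_common(n_keep)]
    coverage = sum(counts[s] for s in kept) / sum(counts.values())
    return set(kept), coverage


def filter_records(records, kept_species):
    """Drop every record whose species is not in the retained set."""
    return [(sid, sp) for sid, sp in records if sp in kept_species]


# Toy (surveyId, speciesId) pairs; the study used n_keep = 1000 on the PA training data.
pa_records = [(1, "a"), (1, "b"), (2, "a"), (3, "a"), (3, "c"), (4, "b")]
po_records = [(10, "a"), (11, "c"), (12, "d")]

kept, coverage = top_species(pa_records, n_keep=2)
pa_filtered = filter_records(pa_records, kept)
po_filtered = filter_records(po_records, kept)
print(sorted(kept), round(coverage, 3), po_filtered)
```
        </preformat>
        <p>Deriving the retained set from the PA records and then applying it to both tables keeps the label space identical across the PO and PA data.</p>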
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>A CNN based on the ResNet18 architecture was trained to predict species presence from satellite imagery.
The model was designed to classify species presence or absence at specified geographic locations,
utilizing image tiles centered on survey coordinates as input. To optimize spatial generalization and
reduce overfitting, a data strategy was implemented that combined different types of occurrence data while
accounting for their respective strengths and limitations.</p>
      <sec>
        <title>3.1. Incorporating Presence-Only Data to Supplement Sparse Presence-Absence Observations</title>
        <p>Presence-Absence data were used as the primary target for model training and evaluation, as they
provided both positive and negative labels necessary for supervised learning. However, the available
PA dataset was relatively limited in size and spatial coverage. Many regions were underrepresented,
including regions covered in the test data. To address this, the PA data were supplemented with a larger
and more spatially uniform Presence-Only (PO) dataset, which, although lacking explicit absence
information and label balance, offered broader geographic coverage.</p>
        <p>The PO data were used to augment the spatial diversity of the training inputs, exposing the model to
a wider range of environmental and landscape contexts associated with species presence. While PO
data could not be used for direct training on absence, they contributed to the pretraining and representation
learning stages, improving the model’s ability to extract ecologically meaningful spatial features. This
combined data approach allowed the model to benefit from both the accuracy of PA labels and the
spatial representativeness of the PO observations.</p>
      </sec>
      <sec id="sec-3-1">
        <title>3.2. Pipeline</title>
        <p>
          This study employs a multi-label classification approach to predict species presence based on the
environmental predictors described above. ResNet18 [16], a commonly used convolutional neural
network architecture, was selected as the base model for all experiments. This model was chosen for its
efficiency and the ability to train it on consumer hardware. The code developed for this experiment
was adapted from the baselines provided as part of the Kaggle competition [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The following steps
were undertaken:
        </p>
        <list list-type="order">
          <list-item><p>Data Preprocessing: The training data were filtered to include only the 1,000 most common
species from the PA training data. This filtering was applied to both the PO and PA
datasets to ensure a consistent label space.</p></list-item>
          <list-item><p>Model Creation: Three instances of the ResNet18 model were created, each utilizing a different
input data source (Sentinel-2, Bioclimatic Cubes, Landsat). These models were adapted to suit the
input format of the data sources.</p></list-item>
          <list-item><p>Pre-training Phase: The experimental models were pre-trained for 10 epochs on the
Presence-Only (PO) dataset. This initial training phase aimed to enable the models to learn generalizable
features from the broader PO distribution and to address the disparity observed between the PA
training and test datasets. This approach was selected for simplicity and was constrained by the
limited resources available.</p></list-item>
          <list-item><p>Fine-tuning Phase: Following pre-training, the models were fine-tuned for an additional 10
epochs on the PA training data. This fine-tuning stage adapts the pre-trained weights to the
specific task of predicting species presence in the PA survey records. A set of three baseline
models was also trained for 10 epochs on only the PA training data.</p></list-item>
          <list-item><p>Prediction and Evaluation: The 25 most likely species to appear at each site in the test data
were selected, to ensure consistency across models and with the baseline. The performance of the
models was evaluated using the samples-averaged F1-score, which measures the overlap between
the predicted and actual sets of species present at each location.</p></list-item>
        </list>
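        <p>The prediction step can be sketched as follows, assuming that a model outputs one score per retained species for each survey. The function name top_k_species, the species IDs and the scores are illustrative; the study selected k = 25 species per site.</p>
        <preformat>
```python
def top_k_species(scores, species_ids, k):
    """Return the IDs of the k highest-scoring species for one survey."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [species_ids[i] for i in ranked[:k]]


# Toy scores for five species at two survey sites.
species_ids = [101, 102, 103, 104, 105]
site_scores = {
    "survey_a": [0.10, 0.80, 0.30, 0.05, 0.60],
    "survey_b": [0.90, 0.20, 0.70, 0.40, 0.10],
}
predictions = {sid: top_k_species(s, species_ids, k=3) for sid, s in site_scores.items()}
print(predictions)
```
        </preformat>
        <p>Using a fixed k for every site makes the prediction sets directly comparable across models and with a most-frequent-species baseline, at the cost of ignoring differences in species richness between sites.</p>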
      </sec>
      <sec id="sec-3-2">
        <title>3.3. Models</title>
        <p>All models were trained using an RTX 3090 GPU with 24 GB of VRAM. The use of consumer-grade
hardware motivated the choice of a smaller network architecture.</p>
        <p>For this research, it was necessary to make some adjustments to the ResNet18 model.</p>
        <p>Model for processing the Sentinel-2 data:</p>
        <list list-type="bullet">
          <list-item><p>Used pre-trained weights IMAGENET1K_V1</p></list-item>
          <list-item><p>The first convolutional layer was modified from 3 to 4 channels to accommodate the infrared
spectrum imagery.</p></list-item>
        </list>
        <p>Model for processing the Bioclimatic Cubes data:</p>
        <list list-type="bullet">
          <list-item><p>Used randomized initial weights</p></list-item>
          <list-item><p>Input shape was set to [4, 19, 12]</p></list-item>
          <list-item><p>The first convolutional layer was modified from 3 to 4 channels to accommodate cube dimensions.</p></list-item>
          <list-item><p>The final classification layer was changed to two linear layers: fc1 (in: 1,000 parameters, out:
2,056 parameters), and fc2 (in: 2,056 parameters, out: 1,000 parameters) in an attempt to improve
classification.</p></list-item>
        </list>
        <p>Model for processing the Landsat data:</p>
        <list list-type="bullet">
          <list-item><p>Used randomized initial weights</p></list-item>
          <list-item><p>Input shape was set to [6, 4, 21]</p></list-item>
          <list-item><p>The first convolutional layer was modified from 3 to 6 channels to accommodate cube dimensions.</p></list-item>
          <list-item><p>The final classification layer was changed to two linear layers: fc1 (in: 1,000 parameters, out:
2,056 parameters), and fc2 (in: 2,056 parameters, out: 1,000 parameters) in an attempt to improve
classification.</p></list-item>
        </list>
        <p>The performance was assessed using the samples-averaged F1-score. This metric quantifies the degree
of overlap between the species predicted to be present at each location and the actual species observed
(ground truth). Submissions were then created consisting of a list of predicted species IDs for each test
survey. These were compared against the known set of species present at that location and time, which
was stored on the Kaggle platform. The F1-score was calculated to provide an evaluation of the model’s
predictive accuracy.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.4. Metric</title>
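        <disp-formula>
          <label>(1)</label>
          <tex-math><![CDATA[F_1 = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + (FP_i + FN_i)/2}]]></tex-math>
        </disp-formula>
        <p>Where, for each of the N test samples:</p>
        <list list-type="bullet">
          <list-item><p>TP_i = Number of predicted species truly present</p></list-item>
          <list-item><p>FP_i = Number of predicted species that are absent</p></list-item>
          <list-item><p>FN_i = Number of present species not predicted</p></list-item>
        </list>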
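        <p>The samples-averaged F1-score can be computed with a few lines of Python. The sketch below is illustrative: the function name samples_averaged_f1 and the toy species IDs are hypothetical, and scoring a survey with no predicted and no observed species as 0 is an assumption, not something stated for the competition metric.</p>
        <preformat>
```python
def samples_averaged_f1(predicted, actual):
    """Mean over surveys of TP / (TP + (FP + FN) / 2), using sets of species IDs."""
    scores = []
    for survey_id, truth in actual.items():
        pred = set(predicted.get(survey_id, ()))
        truth = set(truth)
        tp = len(pred.intersection(truth))  # predicted and truly present
        fp = len(pred.difference(truth))    # predicted but absent
        fn = len(truth.difference(pred))    # present but not predicted
        denom = tp + (fp + fn) / 2
        # Assumption: an empty prediction for an empty survey scores 0.
        scores.append(tp / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Two toy surveys: the first shares two of three species, the second shares none.
actual = {"s1": [1, 2, 3], "s2": [4, 5]}
predicted = {"s1": [1, 2, 9], "s2": [6, 7]}
score = samples_averaged_f1(predicted, actual)
print(score)
```
        </preformat>
        <p>In this example the first survey scores 2 / (2 + 1) and the second scores 0, so the average is one third. Averaging per survey gives every site equal weight regardless of how many species it contains.</p>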
      </sec>
      <sec id="sec-3-4">
        <title>3.5. Hyperparameters</title>
        <p>
          The hyperparameters used in training the models are shown in Table 2. These were adopted from
previous research [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] on similar data, performed in GeoLifeCLEF 2024. The learning rates were
increased from the original value of 0.00025 to accommodate the decision to reduce the initial epochs
from 20 down to 10. The learning rate of the Sentinel model was kept lower than that of the Bioclimatic
and Landsat models as pre-trained weights were used.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>As shown in Figure 3, all three models pre-trained on PO data outperformed their equivalent models
trained on only the PA training data with random weight initialization. Training solely on PO data (the first
10 epochs of the 10 PO + 10 PA results in Figure 3) proved insufficient to produce a useful model: the
PO-trained Bioclimatic Cubes and Sentinel-2 models did not outperform a simple prediction of the 25 most
frequent species (0.08372), and only the Landsat model exceeded this baseline, with a samples-averaged
F1-score of 0.09374. However, when subsequently fine-tuned with the PA training data, the three models
demonstrated an approximate 7% increase in performance, as measured by the F1-score.</p>
      <p>Looking at the individual models, the Landsat model has the highest F1-score in both PO+PA and
PA-only training. Further, the Landsat PA-only model outperforms both the Bioclimatic and Sentinel
PO+PA models. Between the two versions of the Landsat model, an improvement is also seen in the
PO+PA model, which outperforms the PA-only model, resulting in an F1-score of 0.1784.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>Pre-training our three baseline ResNet18 models with PO data resulted in improved performance for
all three models. This improvement was observed despite the demonstrated differences between the
PO and PA datasets. Notably, substantial geographical areas existed where the PA training and test
data exhibited no overlap. The different data coverage for Switzerland, a country with very distinctive
terrain, is shown in Figure 4. It is largely covered in the PA test data, but is only sparsely covered in the
PA training data. It is, however, well covered in the PO data.</p>
      <p>Due to the disparity in data size, training the model on PO data took longer per epoch than training
on PA data. This time cost would be a factor in choosing to use PO data; however, due to the efficiency
of ResNet18, it did not push the hardware requirements beyond what modern
consumer hardware can deliver. Furthermore, as seen in Figure 3, the models did not exhibit improvement in
performance over extended training, so the number of epochs could be reduced without sacrificing
performance.</p>
      <p>The PO data could, in theory, allow these trained models to generalize better to areas not adequately
covered by PA training data, but this would require further analysis of the data, and access to the ground
truth labels for the test set.</p>
      <p>Due to resource limitations, limited hyperparameter tuning was performed. Further tuning is likely
to result in improved performance, and when combined with more complex pre-training/fine-tuning
methods, could result in a more accurate model.</p>
      <p>Future research would be assisted by the addition of more data related to the plant species, such as
family or genus information. This would allow for a deeper analysis of plant groups where the models
underperform.</p>
      <p>This research has shown that Presence-Only data can still improve predictive multi-label classification
models, when supported by accompanying Presence-Absence data. Improved predictions could allow
for better planning in protecting biodiversity.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Gemini 2.5 for grammar and spelling
checks and to improve writing style. Further, ChatGPT-4o was used for initial reviews in the form of peer review
simulation, the results of which were taken into account in later drafts. After using these tools, the
authors reviewed and edited the content as needed and take full responsibility for the publication’s
content.</p>
      <p>[16] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, 2015. URL: https://arxiv.org/abs/1512.03385. arXiv:1512.03385.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. L.</given-names>
            <surname>Pimm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. N.</given-names>
            <surname>Jenkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Abell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Brooks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Gittleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. N.</given-names>
            <surname>Joppa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Raven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Roberts</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. O.</given-names>
            <surname>Sexton</surname>
          </string-name>
          ,
          <article-title>The biodiversity of species and their rates of extinction, distribution, and protection</article-title>
          ,
          <source>Science</source>
          <volume>344</volume>
          (
          <year>2014</year>
          )
          <fpage>1246752</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Parmesan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Hanley</surname>
          </string-name>
          ,
          <article-title>Plants and climate change: complexities and surprises</article-title>
          ,
          <source>Annals of Botany</source>
          <volume>116</volume>
          (
          <year>2015</year>
          )
          <fpage>849</fpage>
          -
          <lpage>864</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname>Seebens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Essl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Dawson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Fuentes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pergl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pyšek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Kleunen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Winter</surname>
          </string-name>
          , et al.,
          <article-title>Global trade will accelerate plant invasions in emerging economies under climate change</article-title>
          ,
          <source>Global Change Biology</source>
          <volume>21</volume>
          (
          <year>2015</year>
          )
          <fpage>4128</fpage>
          -
          <lpage>4140</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Elith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Leathwick</surname>
          </string-name>
          ,
          <article-title>The art of modelling range-shifting species</article-title>
          ,
          <source>Methods in Ecology and Evolution</source>
          <volume>1</volume>
          (
          <year>2009</year>
          )
          <fpage>330</fpage>
          -
          <lpage>342</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Guisan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Thuiller</surname>
          </string-name>
          ,
          <article-title>Predicting species distribution: offering more than simple habitat models</article-title>
          ,
          <source>Ecology Letters</source>
          <volume>8</volume>
          (
          <year>2005</year>
          )
          <fpage>993</fpage>
          -
          <lpage>1009</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Phillips</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Elith</surname>
          </string-name>
          ,
          <article-title>Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data</article-title>
          ,
          <source>Ecological Applications</source>
          <volume>19</volume>
          (
          <year>2009</year>
          )
          <fpage>181</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Royle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Chandler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yackulic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Nichols</surname>
          </string-name>
          ,
          <article-title>Integrating presence-absence and presence-only data to model species distribution</article-title>
          ,
          <source>Methods in Ecology and Evolution</source>
          <volume>3</volume>
          (
          <year>2012</year>
          )
          <fpage>349</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fithian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          , et al.,
          <article-title>Bias correction in species distribution models: pooling survey and collection data for multiple species</article-title>
          ,
          <source>Methods in Ecology and Evolution</source>
          <volume>6</volume>
          (
          <year>2015</year>
          )
          <fpage>424</fpage>
          -
          <lpage>438</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. Marcos</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Palard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Estopinan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <article-title>Overview of GeoLifeCLEF 2024: Species presence prediction based on occurrence data and high-resolution remote sensing images</article-title>
          , in:
          <source>Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R.</given-names>
            <surname>Chen</surname>
          </string-name>
          , et al.,
          <article-title>Transfer learning with CNNs for predicting plant distributions in novel environments</article-title>
          ,
          <source>Remote Sensing of Environment</source>
          (
          <year>2024</year>
          ). In press.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lorieul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <article-title>Species distribution modeling based on aerial images and environmental features with convolutional neural networks</article-title>
          ,
          <source>CEUR-WS</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Rawlings</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chopard</surname>
          </string-name>
          ,
          <article-title>Exploring biodiversity: A multi-model approach to multi-label plant species prediction</article-title>
          , in:
          <source>25th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2024</source>
          , CEUR Workshop Proceedings,
          <year>2024</year>
          , pp.
          <fpage>2188</fpage>
          -
          <lpage>2200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Botella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Palard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deneu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Marcos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <article-title>Geoplant: Spatial plant species prediction dataset</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Globerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Mackey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Belgrave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Paquet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tomczak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          (Eds.),
          <source>Advances in Neural Information Processing Systems</source>
          , volume
          <volume>37</volume>
          ,
          <publisher-name>Curran Associates, Inc.</publisher-name>
          ,
          <year>2024</year>
          , pp.
          <fpage>126653</fpage>
          -
          <lpage>126676</lpage>
          . URL: https://proceedings.neurips.cc/paper_files/paper/2024/file/e4e7de47202bda8133dd3e8b46205cf2-Paper-Datasets_and_Benchmarks_Track.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Picek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Leblanc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Larcher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Servajean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joly</surname>
          </string-name>
          ,
          <article-title>Overview of GeoLifeCLEF 2025: Plant species presence prediction with environmental and high-resolution remote sensing data</article-title>
          , in:
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>B.</given-names>
            <surname>Fuglede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Topsoe</surname>
          </string-name>
          ,
          <article-title>Jensen-Shannon divergence and Hilbert space embedding</article-title>
          , in:
          <source>International Symposium on Information Theory, 2004. ISIT 2004. Proceedings.</source>
          , IEEE,
          <year>2004</year>
          , p.
          <fpage>31</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>