Exploring Biodiversity: A Multi-Model Approach to Multi-Label Plant Species Prediction

Darren Rawlings1,*,†, Tim Chopard2,*,†
1 University of Groningen, Netherlands
2 University of Edinburgh, United Kingdom

Abstract
The research aims to develop a multi-label classification method for predicting plant species based on environmental factors such as satellite images, climate data, and other environmental variables. Utilizing a dataset of plant surveys in Europe, the study addresses challenges such as collating image, tabular, and time series data from a variety of sources, multi-modal learning, and variable label counts. The approach employs dimensionality reduction via PCA and an ensemble of machine learning models to predict both species pseudo-probabilities and the counts of species present in a given area. The potential contributions of this research include advancing our understanding of ecological systems, informing conservation efforts, and promoting the preservation of plant diversity — all essential components of a comprehensive approach to safeguarding the natural world.

Keywords
Multi-Label classification, Principal component analysis, ResNet, Vision Transformer, Swin Transformer, Gradient Boosting, XGBoost, GeoLifeCLEF 2024, CEUR-WS

1. Introduction
In this research, we aim to develop a model for predicting plant species in a specific location and time using various environmental factors as predictors. These predictors include satellite images, climatic time series, and other rasterized environmental data such as land cover, human footprint, bioclimatic variables, and soil characteristics. Our motivation behind this challenge is the potential usefulness of accurate plant species prediction in various scenarios related to biodiversity management and conservation, species identification and inventory tools, and education.

We were provided a large-scale training dataset of approximately 5 million plant occurrences in Europe, as well as train and test sets with 88987 and 4716 surveys, respectively. The predicted output is multi-label, presence-absence data for all species present at each plot. The data covered 11255 different plant species, which created significant challenges for this task, including learning from single positive labels, dealing with strong class imbalance, multi-modal learning, and handling large-scale datasets.

The potential applications of accurate plant species prediction are numerous. High-resolution maps of species composition and related biodiversity indicators can be created to aid scientific ecology studies and conservation efforts. The accuracy of species identification tools can be improved by reducing the list of candidate species observable at a given site. Additionally, location-based recommendation services and educational applications with features such as quests or contextualized educational pathways can be developed to facilitate biodiversity inventories and promote environmental education. We believe that our research will contribute to the advancement of plant species prediction and its practical applications in various fields.

The research was conducted as part of the GeoLifeCLEF 2024 competition on Kaggle [1], which is a part of the LifeCLEF initiative [2]. The competition aims to develop models for predicting plant species in a specific location and time using various environmental factors as predictors.
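To make the prediction target concrete, the short sketch below builds the kind of multi-label presence-absence matrix described above from occurrence records. The column names (surveyId, speciesId) and the toy values are illustrative assumptions, not the exact competition schema.

import numpy as np
import pandas as pd

# Hypothetical occurrence table: one row per (survey, species) observation.
pa = pd.DataFrame({
    "surveyId": [101, 101, 102, 103, 103, 103],
    "speciesId": [7, 42, 7, 3, 42, 99],
})

surveys = np.sort(pa["surveyId"].unique())
species = np.sort(pa["speciesId"].unique())
survey_idx = {s: i for i, s in enumerate(surveys)}
species_idx = {s: j for j, s in enumerate(species)}

# Multi-hot presence-absence matrix: rows are surveys, columns are candidate species.
Y = np.zeros((len(surveys), len(species)), dtype=np.int8)
for sid, spid in zip(pa["surveyId"], pa["speciesId"]):
    Y[survey_idx[sid], species_idx[spid]] = 1

print(Y)  # each row marks which candidate species were recorded at that plot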
CLEF 2024: Conference and Labs of the Evaluation Forum, September 09–12, 2024, Grenoble, France
* GeoLifeCLEF Kaggle team BernIgen https://github.com/timchopard/ml-geolife
† These authors contributed equally.
d.rawlings@student.rug.nl (D. Rawlings); timchopard@pm.me (T. Chopard)
https://startung.github.io (D. Rawlings); https://cloudberries.io (T. Chopard)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. Background
The GeoLifeCLEF challenge has been running for several years. Each year, participants are tasked with predicting species distribution, but the challenge has evolved over time, with new datasets, evaluation metrics, and research questions introduced each year. Here, we provide an overview of some of the recent submissions to the GeoLifeCLEF challenge and summarize their key contributions.

In 2021, the GeoLifeCLEF challenge [3] focused on fine-grained visual categorization using remote sensing data. The winning submission by [4] leveraged contrastive learning to improve species distribution modeling (SDM) from remote sensing imagery. The authors explored the effectiveness of using only RGB imagery and the impact of adding altitude imagery on the model's performance. They introduced a new consistency-based model selection metric to enhance the model's generalization capabilities. The paper outlined potential areas for further research, including the impact of transformations and the utility of the consistency metric.

In 2022, the GeoLifeCLEF challenge [5] shifted its focus to predicting species distribution across the U.S. and France using remote sensing data and other covariates. The second-place submission by [6] proposed a classification approach with a spatial block-label swap regularization during training and an ensemble of deep learning models. Their method achieved a top-30 accuracy of 31.22% on the private test set, securing second place in the competition. The authors reflected on the results, suggested potential improvements, and highlighted the importance of species distribution modeling for ecological research.

In 2023, the GeoLifeCLEF challenge [7] introduced a new dataset with single positive labels for each location, making multi-label prediction challenging. The winning submission by [8] proposed a three-step training strategy to leverage the single positive labels effectively. The authors introduced several CNN-based models and demonstrated their effectiveness compared to a simple baseline. The paper discussed the challenges of the new dataset and the proposed model's performance, providing detailed results and comparisons.

Figure 1: Survey latitude and longitude with PO surveys in green, PA training surveys in orange and PA test surveys in red.

3. Data
The training data comprises species observations and environmental data. Below, we explain the data in detail.

3.1. Observations data
The species-related training data for this study was gathered using two different methods: Presence-Absence (PA) surveys and Presence-Only (PO) occurrences. The different methodologies resulted in datasets that differed in both scale and utility.

The PA surveys included 88987 surveys covering 5016 species from the European flora. This data was provided to address the issue of false absences in PO data and to calibrate models to avoid the associated biases.
The PO occurrences consisted of approximately five million observations gathered from various datasets available through the Global Biodiversity Information Facility (GBIF). This data covered all countries in the study area and constituted the larger portion of the training data. The PO data also contained 11255 species, notably more than the PA data. However, it was collected opportunistically, without a standardized sampling protocol, leading to various sampling biases. It is important to note that the local absence of a species in PO data does not necessarily indicate its true absence. An observer may have failed to report a species due to difficulties in seeing or identifying it at a particular time of year, or because it was not the target of monitoring efforts.

The testing data comprised only PA surveys. The geographical spread of the PA, PO and test data can be seen in Figure 1.

3.2. Environmental data
Besides species data, we had access to spatialized geographic and environmental data, which were used as additional input variables.

Satellite image patches consist of 3-band (RGB) and 1-band (NIR) 128x128 JPEG images at 10m resolution, provided as a color JPEG file for the RGB data and a grayscale one for the Near-Infrared data. The source for these images is Sentinel-2 remote sensing data pre-processed by the Ecodatacube platform. An example of this data can be seen in Figure 2.

Figure 2: Satellite image patches. Left: Red, Green, Blue (RGB); Right: Infra-Red (IR).

Satellite time series comprise up to 20 years of values for six satellite bands (R, G, B, NIR, SWIR1, and SWIR2). Each observation is associated with the time series of the median point values for these bands over each season since the winter of 1999. This data carries a high-resolution local signature of the past 20 years' succession of seasonal vegetation changes, potential extreme natural events (fires), or land use changes. The original satellite data has a resolution of 30m per pixel. The source is Landsat remote sensing data pre-processed by the Ecodatacube platform.

Environmental rasters comprise various climatic, pedologic, land use, and human footprint variables at the European scale. These were provided as scalar values, time series, and the original GeoTIFF rasters. For each observation, scalar values already extracted from the rasters were supplied as CSV files, one per raster type, i.e., Climate, Elevation, Human Footprint, LandCover, and SoilGrids. Further details can be found in Appendix A.

4. Method
For the creation of our proposed solution, a multi-stage development process was adopted. Initially, we performed data pre-processing. Next, we trained and tested individual models. In order to improve performance, an ensemble approach was then implemented by combining an XGBoost model with a Multi-Modal model. The output was tuned to predict the number of species per survey using an additional XGBoost regression model. Finally, additional tests were performed using different weightings for each model in the ensemble step and different weightings added to the species count prediction. We also tested combinations of the same Multi-Modal model trained with different seeds.

4.1. Data Processing
In order to improve model performance and reduce noise [9], the data was processed prior to use in the models. The flow of the data processing is shown in Figure 3.
4.1.1. Initial Processing
The raw PA and Environmental Rasters data included some missing or infinite values that needed to be processed prior to model fitting. When a column was both deemed important to model fitting and contained such values, these values were replaced by the column median so as to avoid excessive influence from outliers without overly affecting the shape of the data [10]. Both the Country and Region columns were one-hot encoded to remove the categorical variables. The following columns were then dropped from the data:

• Environmental Rasters from before 2008, retaining a 12-year window for further processing.
• All countries and regions with fewer than 250 occurrences, in order to minimize overfitting to the few occurrences present in the data.
• The combined Human Footprint data as well as that for Navigable Water and Roads, as these were each missing more than 5% of their values.

4.1.2. Principal Component Analysis
Principal Component Analysis (PCA) is a method for reducing dimensionality [11]. It functions by projecting the high-dimensional data onto the direction of maximum variance, thus retaining key features and reducing noise. Single-component PCAs were used on several of the Environmental Rasters:

• Monthly precipitation values were combined for the years 2008-2019, giving a single value per month.
• Monthly mean, minimum and maximum temperature values were combined for the years 2008-2019, giving a single value per month.
• Other high-covariance columns (listed in Appendix Table 6) were combined using PCA into single values.

Figure 3: Pre-processing of the PA and Environmental Rasters data to create a combined dataset for the XGBoost models.

4.1.3. Species Reduction
The number of viable output species was reduced from 11255 (the total species present across all data) down to 1141 (the species with more than 100 occurrences in the PA data). This reduction served to remove several edge cases and focus the models on the more likely species.
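As an illustration of the steps in Sections 4.1.1-4.1.3, the sketch below applies median imputation, one-hot encoding, per-month single-component PCA, and the species-frequency filter using pandas and scikit-learn. The column names (Country, Region, Bio-pr_MM_YYYY, speciesId) follow Table 6 and Appendix A but are assumptions about the exact CSV schema, and the real pipeline additionally drops the sparse country, region and Human Footprint columns listed above.

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

def preprocess_tabular(env: pd.DataFrame) -> pd.DataFrame:
    """Illustrative pipeline: median imputation, one-hot encoding and
    per-month single-component PCA over the 2008-2019 window."""
    df = env.replace([np.inf, -np.inf], np.nan)

    # One-hot encode the categorical location columns.
    df = pd.get_dummies(df, columns=["Country", "Region"], dtype=np.int8)

    # Replace remaining missing values with the column median.
    num_cols = df.select_dtypes(include=[np.number]).columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())

    # Collapse each month's precipitation columns (2008-2019) into one PCA value.
    for month in range(1, 13):
        cols = [f"Bio-pr_{month:02d}_{year}" for year in range(2008, 2020)]
        cols = [c for c in cols if c in df.columns]  # hypothetical column names
        if cols:
            df[f"pr_pca_{month:02d}"] = PCA(n_components=1).fit_transform(df[cols])[:, 0]
            df = df.drop(columns=cols)
    return df

def keep_frequent_species(pa: pd.DataFrame, min_occurrences: int = 100) -> pd.DataFrame:
    """Species reduction: keep only species with more than min_occurrences PA records."""
    counts = pa["speciesId"].value_counts()
    return pa[pa["speciesId"].isin(counts[counts > min_occurrences].index)]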
4.2. Models
To handle the multi-label classification necessary for this study, we employed an ensemble approach. Ensemble methods have been utilized to enhance neural network performance on challenging tasks [12]. Our ensemble consists of various model types, including an XGBoost model and a Multi-Modal model comprising multiple neural networks, as well as multiple instances of the same Multi-Modal model trained with different seeds. The complex nature and stochastic initialization of the Multi-Modal model can result in its loss function settling into local minima during training. By using multiple instances of the same Multi-Modal model trained with different seeds and aggregating their outputs, we aim to improve performance at the cost of increased training time.

The pseudo-probability outputs of the models that formed the ensemble were then weighted and combined. The species with the highest combined scores were then selected, with the number of species determined by a further XGBoost model referred to as the Count model. This technique of combining models into an ensemble has been used in Kaggle competitions to optimize performance [13].

4.2.1. XGBoost
XGBoost is an open-source gradient tree boosting package [14]. For this research, we used the XGBoost regression model. It has shown broad success in a wide range of tasks, performing on par with or better than most equivalent and automated machine learning approaches [15].

A multi-label regression XGBoost model was chosen as it generates predictions in the form of pseudo-probabilities for every species class. This also allowed the model to be used in an ensemble with other models, as the pseudo-probabilities could be combined in a weighted sum. The XGBoost regression model was trained on a combination of the processed data and the Landsat data, as shown in Figure 4. Hyperparameter tuning was performed using a grid search cross-validation approach [16].

4.2.2. Multi-Modal
The Multi-Modal model is the Sentinel+Landsat+Bioclim baseline model [17]. This model uses ResNet18 to process all data except the satellite images, which are processed by a Swin Transformer. It was trained on the full PA data.

Figure 4: The flow of data into the models.

ResNet (Residual Network) is an architecture for deep neural networks that speeds up the training process through the use of residual connections [18]. For the purposes of this research we focused on ResNet18, the 18-layer-deep variation, with some modifications to accommodate the shape of the input data and the required output.

Swin Transformers, which were introduced in 2021 by researchers at Microsoft [19], are a type of vision transformer architecture that uses a hierarchical structure to process images. Unlike traditional convolutional neural networks (CNNs), which use fixed-size filters to scan an image and extract features, Swin Transformers use multi-head self-attention mechanisms to dynamically learn relationships between different parts of an image. This allows them to capture more complex patterns and dependencies in the data.

Swin Transformers divide an image into smaller patches and process them in a hierarchical manner, starting with small patches and gradually merging them into larger ones. This allows the network to learn features at different scales and improve its performance on tasks that require fine-grained detail as well as coarse-grained context. Swin Transformers have been shown to outperform other state-of-the-art vision transformer architectures on several benchmark datasets, making them a promising approach for computer vision applications.

4.3. Count Prediction
In order to select the number of species per survey, two methods were used. Firstly, a top-K approach using fixed values in the range $[\lfloor \mu_{\mathrm{count}} \rfloor, 25]$. This range was selected as the mean count $\mu_{\mathrm{count}}$ provided a good lower bound, and 25 allowed us to test against the baseline models provided by the GeoLifeCLEF organizers [1].

Secondly, a dynamic approach was used based on an XGBoost regression model trained on the pre-processed PA metadata and Environmental Rasters. The training labels used were the counts of species recorded in each PA survey. The predicted count $\hat{y}$ then had a weight added and was rounded to the nearest integer, as shown in equation (1):

$c = \lfloor \hat{y} + n + 0.5 \rfloor$   (1)

The weight $n$ was consistent across all surveys, and the values used were [0, 2, 2.5, 3, 3.5, 4, 4.5, 5]. These values were selected because, during initial testing, there was a consistent drop-off in performance for values above 4.5 or below 2.5. The raw predicted count was also included to demonstrate the value of adding an $n$.

4.4. Scoring Metric
The main scoring metric used was the micro-averaged F1 score, as shown in equation (2), which is calculated from the precision $\mathrm{TP}/(\mathrm{TP}+\mathrm{FP})$ and the recall $\mathrm{TP}/(\mathrm{TP}+\mathrm{FN})$ for each individual class $i$, where TP are the true positives, FP the false positives, and FN the false negatives:

$\mathrm{F1}_{\mathrm{micro}} = \frac{1}{N}\sum_{i=1}^{N}\frac{2\,\mathrm{TP}_i}{2\,\mathrm{TP}_i + \mathrm{FP}_i + \mathrm{FN}_i}$   (2)
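To make the combination step concrete before presenting results, the sketch below implements the weighted pseudo-probability ensemble of Section 4.2, the count adjustment of equation (1), and a per-survey reading of equation (2) in which the index $i$ runs over the test surveys. The array layout (surveys x species) and the function names are our own illustration rather than the exact competition code.

import numpy as np

def select_species(p_multimodal, p_xgb, count_pred, w_mm=5.0, w_xgb=4.0, n=3.5):
    """Weighted ensemble of pseudo-probability arrays (surveys x species), keeping
    per survey the c highest-scoring species with c = floor(y_hat + n + 0.5)."""
    scores = w_mm * p_multimodal + w_xgb * p_xgb
    counts = np.floor(count_pred + n + 0.5).astype(int)   # equation (1)
    ranked = np.argsort(-scores, axis=1)                  # best species first
    return [set(ranked[i, :counts[i]]) for i in range(scores.shape[0])]

def mean_f1(true_sets, pred_sets):
    """Equation (2), reading the sum as running over the N test surveys."""
    scores = []
    for truth, pred in zip(true_sets, pred_sets):
        tp, fp, fn = len(truth & pred), len(pred - truth), len(truth - pred)
        scores.append(2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0)
    return float(np.mean(scores))

The default weights of 5:4 and n = 3.5 correspond to the best-performing single Multi-Modal configuration reported in Section 5.2.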
5. Results

Table 1
F1 scores per model using the top-K species selection.

Model                 K=15      K=17      K=19      K=21      K=23      K=25
XGB Regression        0.30664   0.30986   0.31035   0.31025   0.30958   0.30826
Multi-Modal Model     0.30876   0.31216   0.31420   0.31531   0.31577   0.31514
Ensemble Model 1:1    0.32509   0.32838   0.32967   0.32972   0.32859   0.32699

Table 2
F1 scores per model using the Count Prediction model with additional weighting.

Model                 Raw       +2        +2.5      +3        +3.5      +4        +4.5      +5
XGB Regression        0.31957   0.32178   0.32214   0.32234   0.32217   0.32233   0.32198   0.32184
Multi-Modal Model     0.32165   0.32523   0.32566   0.32664   0.32721   0.32762   0.32750   0.32751
Ensemble Model 1:1    0.34078   0.34302   0.34335   0.34378   0.34323   0.34329   0.34316   0.34338
Ensemble Model 2:3    0.34077   0.34306   0.34297   0.34304   0.34303   0.34341   0.34289   0.34093
Ensemble Model 5:4    0.34116   0.34349   0.34340   0.34396   0.34407   0.34402   0.34378   0.34372

5.1. Top-K Species Counts
The Top-K approach was used to provide a benchmark for each model approach. Here the base XGB Regression and Multi-Modal models were tested, as well as an equally weighted combination of the two. The results are shown in Table 1. The optimal configuration was the equally weighted ensemble model with a K value of 21. This led to an F1 score of 0.32972.

5.2. XGBoost Species Counts
The second round of testing used the dynamic XGBoost Count model in place of Top-K species selection. In addition to the models tested with Top-K, different weightings of the XGB Regression and Multi-Modal models were tested, as shown in Table 2. Here the model weighting is denoted as X:Y, where the X weight is applied to the Multi-Modal model and the Y weight to the XGB Regression model. An additional weight was applied on top of the species count predicted by the XGBoost Count model. The best performance was achieved with the Ensemble Model with a 5:4 weighting, using a species count weight of +3.5. This resulted in an F1 score of 0.34407, an increase of 0.01435 over the best Top-K model result.

5.3. Multiple Multi-Modal Models
Using a combination approach, with a mean prediction across multiple Multi-Modal models trained with different seeds, as shown in Table 3, outperformed the other approaches. The best result was achieved with the mean output of 6 different seeds equally weighted against the XGB Regression model, using a species count weight of +3. This gave a final F1 score of 0.355177, an increase of 0.011107 over the best single Multi-Modal model result and 0.025457 over the best Top-K model result. A more detailed breakdown of these results, including seeds, can be found in Appendix Table 7.

Table 3
Mean F1 scores across 3 tests per ensemble using a 1:1 split of the mean output of the Multi-Modal models and the output of the XGB Regression model, with additional species count weighting.

Multi-Modal Models    Count +3    Count +3.5    Count +4
2                     0.349850    0.349743      0.350037
3                     0.353183    0.353063      0.353303
4                     0.353797    0.353547      0.353637
5                     0.355133    0.354817      0.354873
6                     0.355177    0.354803      0.354643

6. Conclusion
The data presented in Tables 1 and 2 suggest that using values higher than the mean species count of 15.396 and the predicted species count of 15.762 consistently resulted in better performance. In our tests, a value of 3.5 added to the XGB Count prediction values provided the best result. This can in part be attributed to the use of the micro F1 score as the evaluation metric.
This metric prioritizes models that are able to predict a higher number of true positives even if this comes at the cost of introducing additional false positives, as the number of false negatives decreases. The results also show how combining multiple independent models can be better than the sum of their parts.

Increasing the number of instances (trained with different seeds) of the Multi-Modal model in the ensemble further improved the results. Each additional model improved the score, albeit with diminishing returns. The addition of further models comes at a cost in time and computational complexity. The resources required for training scale linearly with each additional model, but the results only improve marginally. Whilst testing higher numbers of additional models may improve the score further, we would suggest that testing different optimizers and tuning hyper-parameters may allow a single model to achieve similar or better results without the computational penalty. There is also the possibility of improvements through adjusting the sub-models within the Multi-Modal model. Alternatively, using larger ResNet models in the Multi-Modal model trained over a greater number of epochs could potentially enhance performance.

Another potential area for further investigation is the acquisition of additional data. In this study, the plant data was anonymized in order to protect sensitive information about rare species, which may have limited the ability to analyze species-specific performance. Future datasets that include information about plant families or genera could provide valuable context for improving model performance in specific areas. Additionally, the Presence-Only data was not utilized in this investigation. With proper pre-processing, this information could be used to refine and optimize our models. This research has laid the groundwork for future studies with more time and access to greater computational or data resources to build upon.

References
[1] L. Picek, C. Botella, M. Servajean, B. Deneu, D. Marcos Gonzalez, R. Palard, T. Larcher, C. Leblanc, J. Estopinan, P. Bonnet, A. Joly, Overview of GeoLifeCLEF 2024: Species presence prediction based on occurrence data and high-resolution remote sensing images, in: Working Notes of CLEF 2024 - Conference and Labs of the Evaluation Forum, 2024.
[2] A. Joly, L. Picek, S. Kahl, H. Goëau, V. Espitalier, C. Botella, B. Deneu, D. Marcos, J. Estopinan, C. Leblanc, T. Larcher, M. Šulc, M. Hrúz, M. Servajean, et al., Overview of LifeCLEF 2024: Challenges on species distribution prediction and identification, in: International Conference of the Cross-Language Evaluation Forum for European Languages, Springer, 2024.
[3] A. Joly, T. Lorieul, E. Cole, B. Deneu, M. Servajean, P. Bonnet, Overview of GeoLifeCLEF 2021: Predicting species distribution from 2 million remote sensing images, in: CLEF (Working Notes), 2021, pp. 1451–1462.
[4] S. Seneviratne, Contrastive representation learning for natural world imagery: Habitat prediction for 30,000 species, in: CLEF 2021, 2021, pp. 1639–1648. URL: http://ceur-ws.org/Vol-2936/#paper-140.
[5] A. Joly, T. Lorieul, E. Cole, B. Deneu, M. Servajean, P. Bonnet, Overview of GeoLifeCLEF 2022: Predicting species presence from multi-modal remote sensing, bioclimatic and pedologic data, in: CLEF (Working Notes), 2022, pp. 1940–1956.
[6] B. Kellenberger, D. Tuia, Block label swap for species distribution modelling, in: CLEF 2022, 2022, pp. 2103–2114.
URL: http://ceur-ws.org/Vol-3180/#paper-167.
[7] C. Botella, B. Deneu, D. M. Gonzalez, M. Servajean, T. Larcher, C. Leblanc, J. Estopinan, P. Bonnet, A. Joly, Overview of GeoLifeCLEF 2023: Species composition prediction with high spatial resolution at continental scale using remote sensing, in: CLEF 2023: Conference and Labs of the Evaluation Forum, 2023.
[8] H. Q. Ung, R. Kojima, S. Wada, Leverage samples with single positive labels to train CNN-based models for multi-label plant species prediction, in: CLEF 2023, 2023, pp. 2149–2158. URL: http://ceur-ws.org/Vol-3497/#paper-181.
[9] A. Famili, W.-M. Shen, R. Weber, E. Simoudis, Data preprocessing and intelligent data analysis, Intelligent Data Analysis 1 (1997) 3–23.
[10] E. Acuna, C. Rodriguez, The treatment of missing values and its effect on classifier accuracy, in: Classification, Clustering, and Data Mining Applications: Proceedings of the Meeting of the International Federation of Classification Societies (IFCS), Illinois Institute of Technology, Chicago, 15–18 July 2004, Springer, 2004, pp. 639–647.
[11] K. Pearson, On lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2 (1901) 559–572.
[12] M. P. Perrone, L. N. Cooper, When networks disagree: Ensemble methods for hybrid neural networks, in: How We Learn; How We Remember: Toward an Understanding of Brain and Neural Systems: Selected Papers of Leon N Cooper, World Scientific, 1995, pp. 342–358.
[13] N. Ketkar, E. Santana, Deep Learning with Python, volume 1, Springer, 2017.
[14] T. Chen, C. Guestrin, XGBoost: A scalable tree boosting system, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '16, ACM, New York, NY, USA, 2016, pp. 785–794. URL: http://doi.acm.org/10.1145/2939672.2939785. doi:10.1145/2939672.2939785.
[15] L. Ferreira, A. Pilastri, C. M. Martins, P. M. Pires, P. Cortez, A comparison of AutoML tools for machine learning, deep learning and XGBoost, in: 2021 International Joint Conference on Neural Networks (IJCNN), 2021, pp. 1–8. doi:10.1109/IJCNN52387.2021.9534091.
[16] M. Adnan, A. A. S. Alarood, M. I. Uddin, I. ur Rehman, Utilizing grid search cross-validation with adaptive boosting for augmenting performance of machine learning models, PeerJ Computer Science 8 (2022) e803.
[17] L. Picek, Sentinel+Landsat+Bioclim baseline [0.31626], 2024. URL: https://www.kaggle.com/code/picekl/sentinel-landsat-bioclim-baseline-0-31626.
[18] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, CoRR abs/1512.03385 (2015). URL: http://arxiv.org/abs/1512.03385. arXiv:1512.03385.
[19] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin Transformer: Hierarchical vision transformer using shifted windows, CoRR abs/2103.14030 (2021). URL: https://arxiv.org/abs/2103.14030. arXiv:2103.14030.

A. Appendix — Data

Country & Region Data

Table 4
The number of surveys per country in the training and test data. * Data removed for the XGBoost model.
Country                   Training Data    Test Data
Andorra*                  59               0
Austria                   43821            204
Belgium                   31620            71
Bosnia and Herzegovina*   1428             7
Bulgaria*                 2738             2
Croatia*                  4837             42
Czech Republic            24481            10
Denmark                   791993           1453
France                    247739           1386
Germany                   24701            2
Greece*                   2268             4
Hungary*                  1355             0
Ireland*                  107              0
Italy                     46625            433
Latvia                    11723            0
Luxembourg*               38               0
Monaco*                   150              0
Montenegro                20652            19
Netherlands               179457           846
North Macedonia*          122              0
Norway*                   36               0
Poland                    13697            75
Portugal*                 79               0
Romania                   8114             0
Serbia*                   1803             0
Slovakia*                 4850             4
Slovenia*                 1104             39
Spain                     11574            16
Switzerland               6466             103

Table 5
The number of surveys per region in the training and test data. * Data removed for the XGBoost model. ** Removed through country filtering.

Region           Training Data    Test Data
ALPINE           1838             773
ATLANTIC         36357            1402
BOREAL**         73               0
BLACK SEA*       8                0
CONTINENTAL      43147            1449
MEDITERRANEAN    7333             1092
PANNONIAN*       219              0
STEPPIC*         12               0

Environmental Rasters

Bioclimatic rasters: 19 low-resolution rasters covering Europe, commonly used in species distribution modeling. Provided in longitude/latitude coordinates (WGS84) as GeoTIFF files with compression and a CSV file with extracted values, at a resolution of 30 arcsec (∼1 kilometer). The source for these rasters is the CHELSA climate dataset.

Soil rasters: Nine pedologic low-resolution rasters covering Europe. The provided variables describe soil properties from 5 to 15cm depth, which are determinant of plant species distributions. These include numerous values such as pH, clay, organic carbon and nitrogen contents. The format is GeoTIFF files with compression and a CSV file with extracted values, at a resolution of ∼1 kilometer. The source for these rasters is SoilGrids.

Elevation: A high-resolution raster covering Europe. Provided as a GeoTIFF file and a CSV file with extracted values, at a resolution of 1 arc second (∼30 meters). The source for this raster is the ASTER Global Digital Elevation Model V3.

Land Cover: A medium-resolution multi-band land cover raster covering Europe. Each band describes either the land cover class prediction or its confidence under various classifications. The format is a GeoTIFF file with compression and a CSV file with extracted values, at a resolution of ∼500 meters. The source for this raster is MODIS Terra+Aqua 500m.

Human footprint: Several low-resolution rasters describing the human footprint, encapsulating seven pressures on the environment (e.g., nightlight level, population density) induced by human presence and activity, provided for two time periods, the early 1990s (∼1993) and the late 2000s (∼2009). The format is GeoTIFF files with compression and a CSV file with extracted values, at a resolution of ∼1 kilometer.

PCA Columns

Table 6
High-covariance columns combined with PCA. For the columns with months denoted by MM, PCA was applied per month with year values YYYY in the range 2008-2019, giving 12 precipitation and 12 temperature columns of PCA output values.

Column
HumanFootprint-Built1994
HumanFootprint-Built2009
HumanFootprint-croplands1992
HumanFootprint-croplands2005
HumanFootprint-Lights1994
HumanFootprint-Lights2009
HumanFootprint-Pasture1993
HumanFootprint-Pasture2009
HumanFootprint-Popdensity1990
HumanFootprint-Popdensity2010
HumanFootprint-Railways
Bio-tas_MM_YYYY
Bio-tasmax_MM_YYYY
Bio-tasmin_MM_YYYY
Bio-pr_MM_YYYY

B. Appendix — Extended Results

Table 7
F1 scores of the mean outputs of Multi-Modal models weighted 1:1 with the XGB Regression output.
Addition    Number of Networks    Seeds                                        F1-Score
3           2                     489304 556925                                0.34836
3.5         2                     489304 556925                                0.34802
4           2                     489304 556925                                0.34836
3           2                     481765 356592                                0.35082
3.5         2                     481765 356592                                0.35076
4           2                     481765 356592                                0.35084
3           2                     489304 356592                                0.35037
3.5         2                     489304 356592                                0.35045
4           2                     489304 356592                                0.35091
3           3                     356592 20438 489304                          0.35337
3.5         3                     356592 20438 489304                          0.35351
4           3                     356592 20438 489304                          0.35366
3           3                     278872 481765 831651                         0.35346
3.5         3                     278872 481765 831651                         0.35352
4           3                     278872 481765 831651                         0.35346
3           3                     356592 937905 216510                         0.35272
3.5         3                     356592 937905 216510                         0.35216
4           3                     356592 937905 216510                         0.35279
3           4                     278872 937905 556925 216510                  0.35349
3.5         4                     278872 937905 556925 216510                  0.35329
4           4                     278872 937905 556925 216510                  0.35330
3           4                     278872 216510 831651 556925                  0.35306
3.5         4                     278872 216510 831651 556925                  0.35269
4           4                     278872 216510 831651 556925                  0.35290
3           4                     216510 356592 489304 831651                  0.35484
3.5         4                     216510 356592 489304 831651                  0.35466
4           4                     216510 356592 489304 831651                  0.35471
3           5                     20438 489304 831651 729571 216510            0.35502
3.5         5                     20438 489304 831651 729571 216510            0.35452
4           5                     20438 489304 831651 729571 216510            0.35427
3           5                     481765 937905 216510 356592 489304           0.35512
3.5         5                     481765 937905 216510 356592 489304           0.35478
4           5                     481765 937905 216510 356592 489304           0.35534
3           5                     216510 831651 481765 356592 489304           0.35526
3.5         5                     216510 831651 481765 356592 489304           0.35515
4           5                     216510 831651 481765 356592 489304           0.35501
3           6                     356592 729571 937905 831651 278872 20438     0.35598
3.5         6                     356592 729571 937905 831651 278872 20438     0.35540
4           6                     356592 729571 937905 831651 278872 20438     0.35509
3           6                     556925 356592 831651 481765 489304 20438     0.35544
3.5         6                     556925 356592 831651 481765 489304 20438     0.35507
4           6                     556925 356592 831651 481765 489304 20438     0.35493
3           6                     556925 356592 278872 20438 216510 937905     0.35411
3.5         6                     556925 356592 278872 20438 216510 937905     0.35394
4           6                     556925 356592 278872 20438 216510 937905     0.35391

Addition: the addition to the count prediction used to select species.
Number of Networks: the number of unique seeds of the same network used in combination.

C. Appendix — Hyper-parameters

All models were trained on GPUs using CUDA.

Table 8
Hyper-parameters used to train the XGBoost models.

Hyper-Parameter    Pseudo-Probability    Count
reg lambda         12                    10
learning rate      0.1                   0.1
min split loss     0                     0
reg alpha          0                     0

Table 9
Hyper-parameters used to train the Multi-Modal model.

Hyper-Parameter           Value
learning rate             0.00025
epochs                    10
positive weight factor    1.0
training batch size       64
optimizer                 AdamW
scheduler                 CosineAnnealingLR (T_max = 25)
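As an illustration of how the settings in Tables 8 and 9 map onto common library calls, the following sketch instantiates them with xgboost and PyTorch. The model object, data loading and loss computation are placeholders, and the pos_weight line is only one plausible reading of the positive weight factor rather than the exact baseline implementation.

import torch
import xgboost as xgb

# Table 8: the two XGBoost regressors (gamma is XGBoost's min_split_loss).
xgb_pseudo_prob = xgb.XGBRegressor(reg_lambda=12, learning_rate=0.1, gamma=0, reg_alpha=0)
xgb_count = xgb.XGBRegressor(reg_lambda=10, learning_rate=0.1, gamma=0, reg_alpha=0)

# Table 9: optimisation setup for the Multi-Modal network (placeholder module).
model = torch.nn.Linear(128, 1141)                # stand-in for the real multi-modal network
criterion = torch.nn.BCEWithLogitsLoss(
    pos_weight=torch.full((1141,), 1.0))          # positive weight factor = 1.0 (assumed usage)
optimizer = torch.optim.AdamW(model.parameters(), lr=0.00025)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=25)

for epoch in range(10):                           # epochs = 10
    # ... iterate over DataLoader batches of size 64, compute criterion, backpropagate ...
    optimizer.step()
    scheduler.step()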