=Paper=
{{Paper
|id=Vol-2936/paper-124
|storemode=property
|title=Overview of GeoLifeCLEF 2021: Predicting species distribution from 2 million remote sensing images
|pdfUrl=https://ceur-ws.org/Vol-2936/paper-124.pdf
|volume=Vol-2936
|authors=Titouan Lorieul,Elijah Cole,Benjamin Deneu,Maximilien Servajean,Pierre Bonnet,Alexis Joly
|dblpUrl=https://dblp.org/rec/conf/clef/LorieulCDSBJ21
}}
==Overview of GeoLifeCLEF 2021: Predicting species distribution from 2 million remote sensing images==
Overview of GeoLifeCLEF 2021: Predicting species distribution from 2 million remote sensing images

Titouan Lorieul¹, Elijah Cole², Benjamin Deneu¹, Maximilien Servajean³, Pierre Bonnet⁴ and Alexis Joly¹

¹ Inria, LIRMM, Univ Montpellier, CNRS, Montpellier, France
² Department of Computing and Mathematical Sciences, Caltech, USA
³ LIRMM, AMI, Univ Paul Valéry Montpellier, Univ Montpellier, CNRS, Montpellier, France
⁴ CIRAD, UMR AMAP, Montpellier, France

CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
Contact: titouan.lorieul@inria.fr (T. Lorieul); ecole@caltech.edu (E. Cole); benjamin.deneu@inria.fr (B. Deneu); servajean@lirmm.fr (M. Servajean); pierre.bonnet@cirad.fr (P. Bonnet); alexis.joly@inria.fr (A. Joly)
ORCID: 0000-0001-5228-9238 (T. Lorieul); 0000-0001-6623-0966 (E. Cole); 0000-0003-0640-5706 (B. Deneu); 0000-0002-9426-2583 (M. Servajean); 0000-0002-2828-4389 (P. Bonnet); 0000-0002-2161-9940 (A. Joly)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Understanding the geographic distribution of species is a key concern in conservation. By pairing species occurrences with environmental features, researchers can model the relationship between an environment and the species which may be found there. To advance research in this area, a large-scale machine learning competition called GeoLifeCLEF 2021 was organized. It relied on a dataset of 1.9 million observations from 31K species, mainly animals and plants. These observations were paired with high-resolution remote sensing imagery, land cover data, and altitude, in addition to traditional low-resolution climate and soil variables. The main goal of the challenge was to better understand how to leverage remote sensing data to predict the presence of species at a given location. This paper presents an overview of the competition, synthesizes the approaches used by the participating groups, and analyzes the main results. In particular, we highlight the ability of remote sensing imagery and convolutional neural networks to improve predictive performance, complementary to traditional approaches.

Keywords

LifeCLEF, evaluation, benchmark, biodiversity, presence-only data, environmental data, remote sensing imagery, species distribution, species distribution models

1. Introduction

In order to make informed conservation decisions, it is essential to understand where different species live. Citizen science projects now generate millions of geo-located species observations every year, covering tens of thousands of species. But how can these point observations be used to predict what species might be found at a new location? A common approach is to build a species distribution model (SDM) [1], which uses a location's environmental covariates (e.g. temperature, elevation, land cover) to predict whether a species is likely to be found there. Once trained, the model can be used to make predictions for any location where those covariates are available.

Figure 1: Each species observation is paired with high-resolution covariates (clockwise from top left: RGB imagery, IR imagery, altitude, land cover).

Developing an SDM requires a dataset where each species observation is paired with a collection of environmental covariates.
However, many existing SDM datasets are both highly specialized and not readily accessible, having been assembled by scientists studying particular species or regions. In addition, the provided environmental covariates are typically coarse, with resolutions ranging from hundreds of meters to kilometers per pixel.

In this work, we present the results of the GeoLifeCLEF 2021 competition, which is part of the LifeCLEF evaluation campaign [2] and was co-hosted with the Eighth Workshop on Fine-Grained Visual Categorization (FGVC8, https://sites.google.com/view/fgvc8) at CVPR 2021. This competition is the fourth GeoLifeCLEF challenge. In the first two editions, GeoLifeCLEF 2018 [3] and GeoLifeCLEF 2019 [4], each observation was associated only with environmental features given as vectors or patches extracted around the observation. Like last year's competition, GeoLifeCLEF 2020 [5], GeoLifeCLEF 2021 aims at bridging the previously mentioned gaps by (i) sharing a large-scale dataset of observations paired with high-resolution covariates and (ii) defining a common evaluation methodology to measure the predictive performance of models trained on this dataset. The dataset is based on over 1.9 million observations of plant and animal species. Each observation is paired with high-resolution remote sensing imagery – see Figure 1 – as well as traditional environmental covariates (e.g. climate, altitude and soil variables). To the best of our knowledge, this is the first publicly available dataset to pair remote sensing imagery with species observations. Our hope is that this analysis-ready dataset and associated evaluation methodology will (i) make SDM and related problems more accessible to machine learning researchers and (ii) facilitate novel research in large-scale, high-resolution, and remote-sensing-based species distribution modeling.

2. Dataset and evaluation protocol presentation

Data collection. The data for this year's challenge is the same as last year's, reorganized in an easier-to-use and more compact format. A detailed description of the GeoLifeCLEF 2020 dataset is provided in [6]. In a nutshell, it consists of 1,921,123 observations covering 31,435 species (mainly plants and animals) distributed across the US (1,097,640 observations) and France (823,483 observations), as shown in Figure 2. Each species observation is paired with high-resolution covariates (RGB-IR imagery, land cover and altitude) as illustrated in Figure 1. These high-resolution covariates are resampled to a spatial resolution of 1 meter per pixel and provided as 256 × 256 images covering a 256 m × 256 m square centered on each observation. RGB-IR imagery comes from the 2009-2011 cycle of the National Agriculture Imagery Program (NAIP, https://www.fsa.usda.gov) for the US, and from the BD-ORTHO® 2.0 and ORTHO-HR® 1.0 databases from the IGN (https://geoservices.ign.fr) for France. Land cover data originates from the National Land Cover Database (NLCD) [7] for the US and from CESBIO (http://osr-cesbio.ups-tlse.fr/~oso/posts/2017-03-30-carte-s2-2016/) for France. All elevation data comes from the NASA Shuttle Radar Topography Mission (SRTM, https://lpdaac.usgs.gov/products/srtmgl1v003/). In addition, the dataset also includes traditional coarser-resolution covariates: 19 bio-climatic rasters (30 arcsec/pixel, i.e. about 1 km/pixel, from WorldClim [8]) and 8 pedologic rasters (250 m/pixel, from SoilGrids [9]). The details of these rasters are given in Table 1.

Table 1: Summary of the low-resolution environmental variable rasters provided. The first 19 rows correspond to the bio-climatic variables from WorldClim [8]. The last 8 rows correspond to the pedologic variables from SoilGrids [9].

Name      Description                                                    Resolution
bio_1     Annual Mean Temperature                                        30 arcsec
bio_2     Mean Diurnal Range (mean of monthly (max temp - min temp))     30 arcsec
bio_3     Isothermality (bio_2/bio_7) (× 100)                            30 arcsec
bio_4     Temperature Seasonality (standard deviation × 100)             30 arcsec
bio_5     Max Temperature of Warmest Month                               30 arcsec
bio_6     Min Temperature of Coldest Month                               30 arcsec
bio_7     Temperature Annual Range (bio_5 - bio_6)                       30 arcsec
bio_8     Mean Temperature of Wettest Quarter                            30 arcsec
bio_9     Mean Temperature of Driest Quarter                             30 arcsec
bio_10    Mean Temperature of Warmest Quarter                            30 arcsec
bio_11    Mean Temperature of Coldest Quarter                            30 arcsec
bio_12    Annual Precipitation                                           30 arcsec
bio_13    Precipitation of Wettest Month                                 30 arcsec
bio_14    Precipitation of Driest Month                                  30 arcsec
bio_15    Precipitation Seasonality (Coefficient of Variation)           30 arcsec
bio_16    Precipitation of Wettest Quarter                               30 arcsec
bio_17    Precipitation of Driest Quarter                                30 arcsec
bio_18    Precipitation of Warmest Quarter                               30 arcsec
bio_19    Precipitation of Coldest Quarter                               30 arcsec
orcdrc    Soil organic carbon content (g/kg at 15 cm depth)              250 m
phihox    pH × 10 in H2O (at 15 cm depth)                                250 m
cecsol    Cation exchange capacity of soil (cmolc/kg at 15 cm depth)     250 m
bdticm    Absolute depth to bedrock (cm)                                 250 m
clyppt    Clay (0-2 micrometer) mass fraction at 15 cm depth             250 m
sltppt    Silt mass fraction at 15 cm depth                              250 m
sndppt    Sand mass fraction at 15 cm depth                              250 m
bldfie    Bulk density (kg/m3 at 15 cm depth)                            250 m

Train-test split. The full set of occurrences was split into a training and a testing set using a spatial block holdout procedure, as illustrated in Figure 2. This limits the effect of spatial auto-correlation in the data [10]: under this splitting procedure, a model cannot perform well by simply interpolating between training samples. The split was based on a global grid of 5 km × 5 km quadrats. 2.5% of these quadrats were randomly sampled and the observations falling within them formed the test set. 10% of those observations were used for the public leaderboard on Kaggle while the remaining 90% were used to compute the private leaderboard providing the final results of the challenge. Similarly, another 2.5% of the quadrats were randomly sampled to provide an official validation set. The remaining quadrats and their associated observations were assigned to the training set.
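To make the spatial block holdout concrete, the sketch below reproduces the gist of the procedure in Python. This is not the organizers' code: it assumes planar coordinates in meters over a toy region (the actual dataset uses a global grid over latitude/longitude) and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for the occurrence table: planar coordinates in meters.
n_obs = 100_000
x = rng.uniform(0, 1_000_000, n_obs)  # easting
y = rng.uniform(0, 1_000_000, n_obs)  # northing

# Assign each observation to a 5 km x 5 km quadrat.
quadrat_id = (np.floor(x / 5_000).astype(int) * 1_000_000
              + np.floor(y / 5_000).astype(int))

# Randomly hold out 2.5% of the quadrats for the test set
# and another 2.5% for the official validation set.
quadrats = rng.permutation(np.unique(quadrat_id))
n_heldout = int(0.025 * len(quadrats))
is_test = np.isin(quadrat_id, quadrats[:n_heldout])
is_val = np.isin(quadrat_id, quadrats[n_heldout:2 * n_heldout])
is_train = ~(is_test | is_val)

print(is_train.sum(), is_val.sum(), is_test.sum())
```

Because entire quadrats are held out, every test observation lies at some minimum distance from all training observations, which is what prevents trivial spatial interpolation.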
Evaluation metric. For each occurrence in the test set, the goal of the task was to return a candidate set of species likely to be present at that location. Due to the presence-only [11] nature of the observation data used during the evaluation of the methods, for each location in the test set we only know the presence of one species – the one observed – among the different ones which may actually be found together at that point. To measure the precision of the predicted sets while accommodating this limited knowledge, a simple set-valued classification [12] metric was chosen as the main evaluation criterion: the top-30 error rate. Each observation i is associated with a single ground-truth label y_i corresponding to the observed species. For each observation, the submissions provided 30 candidate labels ŷ_{i,1}, ŷ_{i,2}, ..., ŷ_{i,30}. The top-30 error rate is then computed as

\[
\text{Top-30 error rate} = \frac{1}{N} \sum_{i=1}^{N} e_i
\quad \text{where} \quad
e_i =
\begin{cases}
1 & \text{if } \forall k \in \{1, \dots, 30\},\ \hat{y}_{i,k} \neq y_i \\
0 & \text{otherwise.}
\end{cases}
\]
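For concreteness, the metric can be computed directly from a matrix of predicted scores; below is a minimal NumPy sketch (the function and variable names are ours, not part of the challenge kit).

```python
import numpy as np

def top_k_error_rate(y_true, scores, k=30):
    """Fraction of observations whose true label is absent from
    the k highest-scoring candidate labels."""
    # Indices of the k largest scores per row (the order within
    # the top k does not matter for the metric).
    top_k = np.argpartition(-scores, k, axis=1)[:, :k]
    hits = (top_k == y_true[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Toy usage: 5 test observations, 100 species.
rng = np.random.default_rng(0)
scores = rng.normal(size=(5, 100))
labels = rng.integers(0, 100, size=5)
print(top_k_error_rate(labels, scores, k=30))
```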
Note that this evaluation metric does not try to correct the sampling bias [13] inherent to presence-only observation data (linked to population density, etc.). The absolute values of the resulting figures should thus be interpreted with care. Nevertheless, this metric does allow comparison of the different approaches and helps determine which types of input data and models are useful for the species presence detection task.

Figure 2: Distribution of observations over (a) the US and (b) France. Training observation data points are shown in blue while test data points are shown in red.

Course of the challenge. The training and test data were publicly shared in early March 2021 through the Kaggle platform (https://www.kaggle.com/c/geolifeclef-2021/). Any research team wishing to participate in the evaluation could register on the platform and download the data. Each team could submit up to 3 submissions per day to compete on the public leaderboard. A submission (also called a run in the next sections) takes the form of a CSV file containing the top-30 predictions of the method being evaluated for all observations in the test set. For each submission, the top-30 error rate was first computed only on a subset of the test set to produce the public leaderboard, which was visible to all participants while the competition was still running. Once the submission phase was closed (mid-May), only 5 submissions per team were retained to compute the private leaderboard using the rest of the test set. These submissions were either hand-picked by the team or automatically chosen as the 5 best performing submissions on the public leaderboard. The participants could then see the final scores of all the other participants on the private leaderboard as well as their final ranking. Each participant was asked to provide a working note, i.e. a detailed report containing all technical information required to reproduce the results of the submissions. All LifeCLEF working notes were reviewed by at least two members of the LifeCLEF organizing committee to ensure a sufficient level of quality and reproducibility.

3. Baseline methods

Based on last year's competition, three baselines were provided by the organizers of the challenge to serve as comparison references for the participants while developing their own methods. They consisted of:

• Top-30 most present species: a constant predictor always returning the same list of the most present species, i.e. the ones having the most occurrences in the training set.
• RF on environmental variables: a Random Forest model trained on environmental feature vectors only, i.e. on the 27 climatic and soil variables extracted at the position of the observation (a minimal sketch of this kind of baseline follows this list).
• CNN on 6-channel patches: this method [14] is the one that obtained the best result during the GeoLifeCLEF 2020 competition [5]. It is based on a Convolutional Neural Network trained on all the high-resolution image covariates, i.e. on 6-channel tensors composed of the RGB-IR images, the land cover image, and the altitude image.
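As an illustration of the RF on environmental variables baseline, here is a minimal sketch using scikit-learn. It is not the organizers' implementation: the hyperparameters, data shapes, and species count are placeholder assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy stand-in: 27 environmental variables (19 bio-climatic + 8
# pedologic) extracted at each occurrence; labels are species ids.
X_train = rng.normal(size=(5_000, 27))
y_train = rng.integers(0, 300, size=5_000)  # 300 toy species
X_test = rng.normal(size=(100, 27))

clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X_train, y_train)

# Top-30 most probable species per test observation.
proba = clf.predict_proba(X_test)
top_30 = clf.classes_[np.argsort(-proba, axis=1)[:, :30]]
print(top_30.shape)  # (100, 30)
```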
4. Participants and methods

Seven teams participated in the GeoLifeCLEF 2021 challenge and submitted a total of 26 submissions: University of Melbourne, DUTH (Democritus University of Thrace), CONABIO (Comisión Nacional para el Conocimiento y Uso de la Biodiversidad), UTFPR (Federal University of Technology – Paraná), as well as three participants for which we could not identify the affiliation and which we denote here, respectively, as Team Alpha, Team Beta and Team Gamma. The results are shown in Figure 3. Only the winning team (University of Melbourne) submitted a working notes paper with a detailed description of the methodology used and the models trained [15]. We summarize hereafter the two main models they employed (a sketch of the bi-modal architecture follows this list):

• CNN on RGB patches (University of Melbourne - Run 1): this method is based on a Convolutional Neural Network (ResNet50) trained on the RGB remote sensing images only. The model was first pre-trained on the dataset in an unsupervised way (for 20 epochs) using MoCo, a contrastive representation learning framework. The resulting model was then fine-tuned entirely (i.e., end-to-end) with supervision using 7 epochs of stochastic gradient descent (SGD) on the whole training set (validation set included). Three types of data augmentation were used to reduce overfitting: (i) random horizontal flip, (ii) random vertical flip, and (iii) RandAugment [16], an automated data augmentation optimizer.
• Bi-modal CNN on RGB patches & altitude (University of Melbourne - Run 2): the CNN on RGB patches was combined with another CNN on the altitude patches (using the same ResNet50 architecture). This second CNN was also pre-trained in an unsupervised way using MoCo, similarly to the RGB-based model. The two models are then combined by concatenating the final bottleneck layers of the two ResNet50s (i.e. the new layer contains 4096 neurons instead of 2048 in the mono-modal CNN on RGB patches) and the global model was fine-tuned as before on fewer epochs. Most data augmentation was removed during training.
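The sketch below illustrates the bottleneck-concatenation idea behind the bi-modal model, using untrained torchvision ResNet50 backbones. It is not the winning team's code: the MoCo pre-training and fine-tuning schedule are omitted, the species count in the demo is reduced, and feeding altitude as a single-channel input via a modified stem convolution is our assumption.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

class BiModalNet(nn.Module):
    """Two ResNet50 backbones (RGB and altitude) whose 2048-d
    bottleneck features are concatenated into a 4096-d vector
    feeding a shared species classifier."""

    def __init__(self, n_species):
        super().__init__()
        self.rgb_backbone = resnet50()
        self.rgb_backbone.fc = nn.Identity()  # expose 2048-d features
        self.alt_backbone = resnet50()
        # Altitude patches have a single channel: swap the stem conv.
        self.alt_backbone.conv1 = nn.Conv2d(
            1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.alt_backbone.fc = nn.Identity()
        self.classifier = nn.Linear(2048 + 2048, n_species)

    def forward(self, rgb, altitude):
        feats = torch.cat(
            [self.rgb_backbone(rgb), self.alt_backbone(altitude)], dim=1)
        return self.classifier(feats)

# Demo with 100 toy species (the challenge has 31,435).
model = BiModalNet(n_species=100)
logits = model(torch.randn(2, 3, 256, 256), torch.randn(2, 1, 256, 256))
print(logits.shape)  # torch.Size([2, 100])
```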
5. Global competition results analysis

The global results of GeoLifeCLEF 2021 are shown in Figure 3.

Figure 3: Results of the GeoLifeCLEF 2021 task. The top-30 error rates of the submissions of each participant are shown in blue. The provided baselines are shown in orange.

Generally speaking, the CNN-based models trained on high-resolution patches, used in the runs by University of Melbourne and Team Alpha as well as in the baseline CNN (high resolution patches), compare very favorably to the traditional model (RF (environmental vectors)). This observation tends to show that (i) important information explaining the species composition is contained in the high-resolution patches, and (ii) convolutional neural networks are able to capture and exploit this information. One question raised by the challenge is how to properly aggregate the different variables provided as input. Adding altitude data to the model (University of Melbourne - Run 2) provides an improvement in prediction accuracy, backing the intuition that this variable is informative of the species distribution. However, aggregating all the variables does not mechanically lead to higher performance: CNN (high resolution patches) makes use of the additional land cover data but its performance is not as good as that of the two runs from University of Melbourne. It seems important not to aggregate the feature representations of those variables too early in the network architecture: concatenation of higher-level features (University of Melbourne - Run 2) is more efficient than early aggregation (CNN (high resolution patches)).

6. Complementary analysis

In this section, we provide complementary analyses of the submitted results. These analyses are based on the complete predicted scores (the logits for every class of every test occurrence) kindly provided by the University of Melbourne team [15] after the competition for both of their models. As the dataset used this year is the same as for GeoLifeCLEF 2020, some of the conclusions of last year's overview paper [5] still hold and we refer the reader to it. Here, we focus on other aspects of the dataset not considered previously.

Comparison of set-valued classification metrics. As stated previously, the task considered here fits in the framework of set-valued classification [12]: a set of species is present at each location and the model tries to predict this set, but we are only given a single one of those present species to evaluate the performance of the model. This year, the top-30 error rate was chosen as the main evaluation metric. This metric, however, assumes that the same number of species is present at each location, i.e. that species richness is uniform over the geographical extent considered. To overcome this limitation, we can relax the constraint of always predicting sets of size 𝐾 and instead predict adaptive sets whose size is equal to 𝐾 on average. This type of set-valued classification is known as average-𝐾 classification [17]. In Figure 4, we compare the top-𝐾 error rate to the average-𝐾 error rate for all values of 𝐾. The average-𝐾 error rate is computed by finding the threshold on the scores which results in sets of average size 𝐾 on the test set. We refer the reader to [17, 12] for more details.

Figure 4: Comparison of the top-𝐾 error rate (blue) and average-𝐾 error rate (orange) metrics on the predictions of the bi-modal CNN of [15]. The dashed red vertical line marks 𝐾 = 30, the value of 𝐾 used in this year's challenge. Both metrics are nearly indistinguishable with the provided predicted scores.

On the comparison plot, the average-𝐾 error rate can only improve upon the top-𝐾 error rate. However, in our case, both curves are indistinguishable, meaning that either average-𝐾 does not capture the proper heterogeneity of the samples between locations [17] or the predicted scores are not good enough to estimate it properly. We tried additional probability calibration procedures such as temperature scaling [18] to improve the average-𝐾 predictions but they did not provide a significant edge. Further investigation is required to understand why average-𝐾 set classifiers do not perform better on this dataset.
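As a sketch of how the average-𝐾 error rate can be computed, the snippet below picks a single score threshold shared across all test observations so that the predicted sets have size 𝐾 on average (ties and probability calibration are ignored; all names are illustrative).

```python
import numpy as np

def average_k_error_rate(y_true, scores, k):
    """Error rate of adaptive sets whose size is k on average,
    obtained by thresholding the scores globally."""
    n = scores.shape[0]
    # The (n*k)-th largest score over the whole matrix yields sets
    # with n*k labels in total, i.e. k per observation on average.
    flat = np.sort(scores, axis=None)[::-1]
    threshold = flat[n * k - 1]
    misses = [y not in np.flatnonzero(row >= threshold)
              for y, row in zip(y_true, scores)]
    return float(np.mean(misses))

rng = np.random.default_rng(0)
scores = rng.normal(size=(1_000, 500))
labels = rng.integers(0, 500, size=1_000)
print(average_k_error_rate(labels, scores, k=30))
```

For this global thresholding to be meaningful, the scores must be comparable across observations, which is why calibration procedures such as temperature scaling were also tried.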
Kingdoms analysis. Most species in the dataset are animals (16,328) and plants (13,035). Due to an omission during the construction of the dataset, other kingdoms are also represented in the US, accounting for 2,072 species (mainly Fungi) but representing only 1.5% of the observations; they are thus negligible. In Figure 5, we detail the performance of the bi-modal CNN of [15] for these different kingdoms. The main evaluation metric mixes all the kingdoms together. For the separate per-kingdom metrics, only the test points whose observed species belongs to the specific kingdom are considered.

Figure 5: Comparison of the top-30 accuracy on the different kingdoms of species present in the dataset, based on the predictions of the bi-modal CNN of [15]. In Figure 5a, a single top-30 set is predicted for each test observation and the accuracy is then computed separately over the observations belonging to each kingdom. In Figure 5b, for each test observation, only the species belonging to the kingdom of the observed species are used to build the top-30 sets, resulting in different top-30 sets for each kingdom.

If we directly compute the metrics on the top-30 sets predicted as in Figure 5a, it would seem that the task is harder for animals than for plants. However, this gives a biased image of the difficulty of the task. A better approach is to compute separate top-30 sets for each kingdom by retaining only the species belonging to it. In this case, the predicted top-30 sets at the same location are different for each kingdom. The resulting accuracies are shown in Figure 5b. In that case, animals, although comprising more species and fewer observations, seem easier to predict than plants. Surprisingly, although the dataset contains few (28,573) observations from other kingdoms, those seem to be even easier to predict. This is likely to be an artefact of the metric: as these kingdoms contain relatively few species compared to the others, a top-30 set of the most probable species is probably too large to be very informative. This analysis suggests that predicting separate top-𝐾 sets for each kingdom and using these sets to derive a new evaluation metric is worth considering (a minimal sketch of such per-kingdom sets follows below). Note that this might not be as straightforward as it seems: it could be important to use different values of 𝐾 for the different kingdoms, depending on the number of species they contain and the likelihood of finding several of them present at the same location. This would thus require some tweaking.
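A minimal sketch of such per-kingdom top-30 sets: out-of-kingdom species are masked out before ranking, so that each test observation gets a different candidate set per kingdom (the species-to-kingdom mapping below is a random toy assignment).

```python
import numpy as np

def per_kingdom_top_k(scores, species_kingdom, kingdom, k=30):
    """Top-k sets restricted to one kingdom: the scores of all
    other species are masked out before ranking."""
    masked = np.where(species_kingdom[None, :] == kingdom,
                      scores, -np.inf)
    return np.argsort(-masked, axis=1)[:, :k]

rng = np.random.default_rng(0)
n_obs, n_species = 100, 1_000
scores = rng.normal(size=(n_obs, n_species))
# Toy mapping: 0 = animals, 1 = plants, 2 = other kingdoms.
species_kingdom = rng.integers(0, 3, size=n_species)

plant_sets = per_kingdom_top_k(scores, species_kingdom, kingdom=1)
print(plant_sets.shape)  # (100, 30)
```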
Late-fusion of models. We test the complementarity of the predictions made by the different methods by aggregating their outputs (the logits) in a late-fusion manner: the outputs are averaged before computing the top-30 most confident species (a minimal sketch of this procedure follows below). Here, we focus on the runs submitted by the University of Melbourne team as well as the baseline CNN (high resolution patches). The results are shown in Figure 6. As explained previously, one important difference between the baseline and the submitted runs is the way the variables are aggregated.

Figure 6: Comparison of the top-30 accuracy of the late-fusion of the CNN-based models using different covariates and different aggregation methods.

Surprisingly, fusing the predictions of the baseline method with either the uni-modal or the bi-modal model from University of Melbourne results in top-30 sets of very similar accuracy. This seems to suggest that, beyond the aggregation method, the unsupervised pre-training used by University of Melbourne might be particularly helpful for RGB data but less so for altitude data, which is already fairly well exploited by the baseline. Another interesting observation is that combining the uni-modal and bi-modal models from [15] provides a small additional gain of about 2%. This might come as a surprise: the bi-modal model consists essentially of the uni-modal model with additional altitude data. This additional gain might come from three factors:

• the slight differences in the training of the uni-modal and bi-modal models, i.e. data augmentation and training time;
• the late aggregation of the altitude data with the RGB images might not be perfectly suited, and adding a second layer of aggregation via late-fusion might allow the recovery of additional complementary information;
• the variance reduction due to ensembling similar models with high bias.

Finally, late-fusing all three models provides yet another non-negligible gain, highlighting the complementarity of those different models. Further investigation could help better understand where this complementarity originates and help build better models.
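A minimal sketch of this late-fusion procedure, with random placeholder logits standing in for the outputs of the three models:

```python
import numpy as np

def late_fusion_top_k(list_of_logits, k=30):
    """Average the logits of several models, then take the k most
    confident species per observation."""
    fused = np.mean(np.stack(list_of_logits, axis=0), axis=0)
    return np.argsort(-fused, axis=1)[:, :k]

rng = np.random.default_rng(0)
shape = (200, 1_000)  # (test observations, species)
logits_rgb = rng.normal(size=shape)       # e.g. uni-modal RGB model
logits_bimodal = rng.normal(size=shape)   # e.g. bi-modal RGB + altitude
logits_baseline = rng.normal(size=shape)  # e.g. early-aggregation CNN

top_30 = late_fusion_top_k([logits_rgb, logits_bimodal, logits_baseline])
print(top_30.shape)  # (200, 30)
```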
7. Conclusion

The main questions raised by this year's challenge are twofold:

• How to properly aggregate the different covariates provided?
• How much of the information contained in the high-resolution patches is complementary to, or redundant with, the information captured by the bioclimatic and soil variables?

Moreover, there remains considerable room for improvement on this challenge: the winning solution does not make use of all the different patches provided and its top-30 error rate is still high, near 75%.

Acknowledgement

This project has received funding from the French National Research Agency under the Investments for the Future Program, referred to as ANR-16-CONV-0004, and from the European Union's Horizon 2020 research and innovation program under grant agreement No 863463 (Cos4Cloud project).

References

[1] J. Elith, J. R. Leathwick, Species Distribution Models: Ecological Explanation and Prediction Across Space and Time, Annual Review of Ecology, Evolution, and Systematics (2009).
[2] A. Joly, H. Goëau, S. Kahl, L. Picek, T. Lorieul, E. Cole, B. Deneu, M. Servajean, A. Durso, I. Bolon, H. Glotin, R. Planqué, R. Ruiz De Castañeda, W.-P. Vellinga, H. Klinck, T. Denton, I. Eggel, P. Bonnet, H. Müller, Overview of LifeCLEF 2021: an evaluation of machine-learning based species identification and species distribution prediction, in: Proceedings of the Twelfth International Conference of the CLEF Association (CLEF 2021), 2021.
[3] C. Botella, P. Bonnet, F. Munoz, P. Monestiez, A. Joly, Overview of GeoLifeCLEF 2018: location-based species recommendation, CLEF: Conference and Labs of the Evaluation Forum (2018).
[4] C. Botella, M. Servajean, P. Bonnet, A. Joly, Overview of GeoLifeCLEF 2019: plant species prediction using environment and animal occurrences, CLEF: Conference and Labs of the Evaluation Forum (2019).
[5] B. Deneu, T. Lorieul, E. Cole, M. Servajean, C. Botella, D. Morris, N. Jojic, P. Bonnet, A. Joly, Overview of LifeCLEF location-based species prediction task 2020 (GeoLifeCLEF), in: CLEF task overview 2020, CLEF: Conference and Labs of the Evaluation Forum, Sep. 2020, Thessaloniki, Greece, 2020.
[6] E. Cole, B. Deneu, T. Lorieul, M. Servajean, C. Botella, D. Morris, N. Jojic, P. Bonnet, A. Joly, The GeoLifeCLEF 2020 dataset, arXiv preprint arXiv:2004.04192 (2020).
[7] C. Homer, J. Dewitz, L. Yang, S. Jin, P. Danielson, G. Xian, J. Coulston, N. Herold, J. Wickham, K. Megown, Completion of the 2011 National Land Cover Database for the conterminous United States – representing a decade of land cover change information, Photogrammetric Engineering & Remote Sensing 81 (2015) 345–354.
[8] R. J. Hijmans, S. E. Cameron, J. L. Parra, P. G. Jones, A. Jarvis, Very high resolution interpolated climate surfaces for global land areas, International Journal of Climatology: A Journal of the Royal Meteorological Society 25 (2005) 1965–1978.
[9] T. Hengl, J. M. de Jesus, G. B. Heuvelink, M. R. Gonzalez, M. Kilibarda, A. Blagotić, W. Shangguan, M. N. Wright, X. Geng, B. Bauer-Marschallinger, et al., SoilGrids250m: Global gridded soil information based on machine learning, PLoS ONE 12 (2017).
[10] D. R. Roberts, V. Bahn, S. Ciuti, M. S. Boyce, J. Elith, G. Guillera-Arroita, S. Hauenstein, J. J. Lahoz-Monfort, B. Schröder, W. Thuiller, et al., Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure, Ecography 40 (2017) 913–929.
[11] J. L. Pearce, M. S. Boyce, Modelling distribution and abundance with presence-only data, Journal of Applied Ecology 43 (2006) 405–412.
[12] E. Chzhen, C. Denis, M. Hebiri, T. Lorieul, Set-valued classification – overview via a unified framework, arXiv preprint arXiv:2102.12318 (2021).
[13] S. J. Phillips, M. Dudík, J. Elith, C. H. Graham, A. Lehmann, J. Leathwick, S. Ferrier, Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data, Ecological Applications 19 (2009) 181–197.
[14] B. Deneu, M. Servajean, A. Joly, Participation of LIRMM / Inria to the GeoLifeCLEF 2020 challenge, in: CLEF working notes 2020, CLEF: Conference and Labs of the Evaluation Forum, Sep. 2020, Thessaloniki, Greece, 2020.
[15] S. Seneviratne, Contrastive representation learning for natural world imagery: Habitat prediction for 30,000 species, in: CLEF working notes 2021, CLEF: Conference and Labs of the Evaluation Forum, Sep. 2021, Bucharest, Romania, 2021.
[16] E. D. Cubuk, B. Zoph, J. Shlens, Q. V. Le, RandAugment: Practical automated data augmentation with a reduced search space, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 702–703.
[17] T. Lorieul, Uncertainty in predictions of deep learning models for fine-grained classification, Ph.D. thesis, Université de Montpellier (UM), FRA, 2020.
[18] C. Guo, G. Pleiss, Y. Sun, K. Q. Weinberger, On calibration of modern neural networks, in: International Conference on Machine Learning, PMLR, 2017, pp. 1321–1330.