<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of LifeCLEF location-based species prediction task 2020 (GeoLifeCLEF)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benjamin Deneu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Titouan Lorieul</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elijah Cole</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilien Servajean</string-name>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christophe Botella</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Bonnet</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexis Joly</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AMAP, Univ Montpellier</institution>
          ,
          <addr-line>CIRAD, CNRS, INRAE, IRD, Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CIRAD, UMR AMAP</institution>
          ,
          <addr-line>F-34398 Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>CNRS</institution>
          ,
          <addr-line>LECA</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Caltech</institution>
          ,
          <addr-line>Pasadena</addr-line>
          ,
          <country country="US">US</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>INRIA, UMR LIRMM</institution>
          ,
          <addr-line>Univ Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>LIRMM, Universite Paul Valery, University of Montpellier</institution>
          ,
          <addr-line>CNRS, Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Understanding the geographic distribution of species is a key concern in conservation. By pairing species occurrences with environmental features, researchers can model the relationship between an environment and the species which may be found there. To advance the state-of-the-art in this area, a large-scale machine learning competition called GeoLifeCLEF 2020 was organized. It relied on a dataset of 1.9 million species observations paired with high-resolution remote sensing imagery, land cover data, and altitude, in addition to traditional low-resolution climate and soil variables. This paper presents an overview of the competition, synthesizes the approaches used by the participating groups, and analyzes the main results. In particular, we highlight the ability of remote sensing imagery and convolutional neural networks to improve predictive performance, complementary to traditional approaches.</p>
      </abstract>
      <kwd-group>
        <kwd>LifeCLEF</kwd>
        <kwd>biodiversity</kwd>
        <kwd>environmental data</kwd>
        <kwd>species distribution</kwd>
        <kwd>evaluation</kwd>
        <kwd>benchmark</kwd>
        <kwd>species distribution models</kwd>
        <kwd>methods comparison</kwd>
        <kwd>presence-only data</kwd>
        <kwd>model performance</kwd>
        <kwd>prediction</kwd>
        <kwd>predictive power</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In order to make informed conservation decisions it is essential to understand
where different species live. Citizen science projects now generate millions of
geolocated species observations every year, covering tens of thousands of species.
But how can these point observations be used to predict what species might be
found at a new location?</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0)</p>
      <p>
        A common approach is to build a species distribution model (SDM) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which
uses a location's environmental covariates (e.g. temperature, elevation, land
cover) to predict which species may be found there. Once trained, the model
can be used to make predictions for any location where those covariates are
available.
      </p>
      <p>Developing an SDM requires a dataset where each species occurrence is
paired with a collection of environmental covariates. However, many existing
SDM datasets are both highly specialized and not readily accessible, having been
assembled by scientists studying particular species or regions. In addition, the
provided environmental covariates are typically coarse, with resolutions ranging
from hundreds of meters to kilometers per pixel.</p>
      <p>In this work, we present the results of the GeoLifeCLEF 2020 competition.
This competition aimed to bridge these gaps by (i) sharing a large-scale dataset
of observations paired with high-resolution covariates and (ii) defining a common
evaluation methodology to measure the predictive performance of models trained
on this dataset. The dataset is based on over 1.9 million observations of plant and
animal species. Each observation is paired with high-resolution satellite imagery
as well as traditional environmental covariates (e.g. climate, altitude and soil
variables). To the best of our knowledge, this is the first publicly available dataset
to pair remote sensing imagery with species observations. Our hope is that this
analysis-ready dataset and associated evaluation methodology will (i) make the
SDM problem more accessible to machine learning researchers and (ii) facilitate
novel research in large-scale, high-resolution, and remote-sensing-based species
distribution modeling.</p>
    </sec>
    <sec id="sec-2">
      <title>Dataset and Evaluation Protocol</title>
      <p>
        Data collection: A detailed description of the GeoLifeCLEF 2020 dataset is
provided in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For completeness, we give a brief description here. The
GeoLifeCLEF 2020 dataset consists of 1,921,123 observations from the US (1,097,640)
and France (823,483) covering 31,435 plant and animal species. Each species
observation is paired with high-resolution covariates (RGB-IR imagery, land cover
and altitude) as illustrated in Figure 1. These high-resolution covariates are
resampled to a spatial resolution of 1 meter per pixel and provided as 256×256
images covering a 256 m × 256 m square centered on each observation. RGB-IR
imagery comes from the 2009-2011 cycle of the National Agriculture Imagery
Program (NAIP) for the U.S.<sup>1</sup>, and from the BD-ORTHO® 2.0 and
ORTHO-HR® 1.0 databases from the IGN for France<sup>2</sup>. Land cover data originates from
the National Land Cover Database (NLCD) [7] for the U.S. and from CESBIO<sup>3</sup>
for France. All elevation data comes from the NASA Shuttle Radar Topography
Mission (SRTM)<sup>4</sup>. In addition, the dataset also includes traditional coarser-resolution
covariates: 19 bio-climatic rasters (30 arcsec²/pixel, i.e. above 1 km²/pixel,
from WorldClim [6]) and 8 pedologic rasters (250 m²/pixel, from SoilGrids [5]).
The details on these rasters are given in Table 1.
      <p>Train-test split: The full set of occurrences was split into training and
testing sets using a spatial block holdout procedure (see Figure 2). This limits the
effect of spatial auto-correlation in the data, as explained in [8]: a model cannot
achieve high performance by simply interpolating between
training samples. The split was based on a global grid of 5 km × 5 km quadrats.
2.5% of the quadrats were randomly sampled for the test set, and the remaining
quadrats were assigned to the training set.</p>
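      <p>The quadrat-based split described above can be sketched in a few lines of code. The function below is our own illustrative construction, not the organizers' code: it approximates the 5 km grid with coarse latitude/longitude bins and randomly holds out 2.5% of the quadrats.</p>

```python
import numpy as np

def spatial_block_split(lat, lon, block_km=5.0, test_frac=0.025, seed=42):
    """Assign occurrences to train/test by quadrat (spatial block holdout).

    Sketch only: quadrat ids are derived from coarse lat/lon bins using
    ~111 km per degree of latitude, an approximation of a true 5 km grid.
    Returns two boolean masks (is_train, is_test) over the occurrences.
    """
    cell = block_km / 111.0                       # cell size in degrees
    qx = np.floor(np.asarray(lon) / cell).astype(int)
    qy = np.floor(np.asarray(lat) / cell).astype(int)
    quadrat = qx * 200_000 + qy                   # unique id per grid cell
    rng = np.random.default_rng(seed)
    unique_q = np.unique(quadrat)
    n_test = max(1, int(round(test_frac * unique_q.size)))
    test_q = set(rng.choice(unique_q, size=n_test, replace=False).tolist())
    is_test = np.array([q in test_q for q in quadrat])
    return ~is_test, is_test
```

Because entire quadrats are held out, no test occurrence shares a 5 km cell with any training occurrence, which is what prevents the interpolation shortcut.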
      <p>
        Evaluation metric: For each occurrence in the test set, the goal of the task
was to return a candidate set of species with associated con dence scores. The
main evaluation criterion is an adaptive variant of the top-K accuracy. Contrary
to a classical top-K accuracy, this metric accounts for the fact that the number of
species K may not be the same at each location. It is computed by thresholding
the con dence score of the predictions and keeping only the species above that
threshold. The threshold is determined automatically so as to have K = 30
1 National Agriculture Image Program, https://www.fsa.usda.gov
2 https://geoservices.ign.fr
3 http://osr-cesbio.ups-tlse.fr/~oso/posts/2017-03-30-carte-s2-2016/
4 https://lpdaac.usgs.gov/products/srtmgl1v003/
results per occurrence on average on the test set. Traditional top-K accuracy
with K = 30 is used as secondary evaluation metric. See [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for full details and
justi cation.
      </p>
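      <p>The adaptive metric can be sketched as follows. This is a simplified reading of the description above (a single global threshold chosen so that 30 species per occurrence are kept on average); the official implementation may differ in details such as tie handling.</p>

```python
import numpy as np

def adaptive_top_k_accuracy(scores, labels, k=30):
    """Adaptive top-K accuracy (sketch).

    scores: (n_obs, n_species) matrix of confidence scores
    labels: (n_obs,) indices of the true species
    A single threshold is chosen so that, over the whole test set,
    k species are retained per occurrence on average; an occurrence is
    correct if its true species is among the retained ones.
    """
    scores = np.asarray(scores)
    n_obs = scores.shape[0]
    flat = np.sort(scores.ravel())[::-1]      # all scores, descending
    thresh = flat[n_obs * k - 1]              # keep ~n_obs * k scores
    kept = scores >= thresh
    correct = kept[np.arange(n_obs), np.asarray(labels)]
    return float(correct.mean())
```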
      <p>Course of the challenge: The training data was publicly shared in early
April 2020 through the AICrowd platform<sup>5</sup>. Any research team wishing to
participate in the evaluation could register on the platform and download the data.
The test data was shared a few weeks later but without the species labels, which
were kept secret. Each team could then submit up to 10 submissions
corresponding to different methods or different settings of the same method. A submission
(also called a run) takes the form of a CSV file containing the predictions of the
method being evaluated for all observations in the test set. For each submission,
the evaluation metrics are computed and made visible to the participant. Once
the submission phase was closed (mid-June), the participants could also see the
evaluation metric values of the other participants. As a last important step, each
participant was asked to provide a working note, i.e. a detailed report containing
all technical information required to reproduce the results of the submissions.
All LifeCLEF working notes are reviewed by at least two members of the LifeCLEF
organizing committee to ensure a sufficient level of quality and reproducibility.
        <sup>5</sup> https://www.aicrowd.com/
      </p>
    </sec>
    <sec id="sec-2b">
      <title>Participants and methods</title>
      <p>40 participants registered for the GeoLifeCLEF 2020 challenge and downloaded
the dataset. Only two teams succeeded in submitting results in the end:
Stanford and LIRMM. A major obstacle to participation was the volume of data
as well as the computing power needed to train a model. For instance, it took
the LIRMM team almost two weeks to train a convolutional neural network on
the full dataset using 8 GPUs. The details of the methods and systems used by
the two participants are summarized below.</p>
      <p>LIRMM: This research team uploaded four submissions to the AICrowd
platform but reported problems with two of them afterwards, so we only
report the correct ones here:</p>
      <p>Submission 1 - Random forest trained on environmental feature vectors
only (i.e. on the 27 climatic and soil variables).</p>
      <p>Submission 3 - Convolutional neural network trained on high-resolution
image covariates (i.e. on 6-channel tensors composed of RGB-IR images,
a land cover image and an altitude image).</p>
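      <p>Assembling such a 6-channel input can be sketched as below. The stacking follows the description above (4 RGB-IR channels plus land cover and altitude); the per-channel standardization is our own assumption for illustration, not necessarily the team's preprocessing.</p>

```python
import numpy as np

def make_input_tensor(rgb_ir, land_cover, altitude):
    """Stack the high-resolution patches into one 6-channel tensor.

    rgb_ir:     (4, H, W) RGB-IR patch
    land_cover: (H, W) land cover class map
    altitude:   (H, W) altitude patch
    Returns a (6, H, W) array, each channel standardized to zero mean
    and unit variance (an illustrative choice).
    """
    lc = land_cover.astype(float)[None, :, :]    # (1, H, W)
    alt = altitude.astype(float)[None, :, :]     # (1, H, W)
    x = np.concatenate([rgb_ir.astype(float), lc, alt], axis=0)
    mu = x.mean(axis=(1, 2), keepdims=True)
    sd = x.std(axis=(1, 2), keepdims=True) + 1e-8
    return (x - mu) / sd
```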
      <p>
        More information about the methods used can be found in the individual
working note of this team [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Stanford: This research team uploaded five submissions to the AICrowd
platform. Most of the submissions were based on deep neural networks, but
the authors informed us that they encountered convergence issues resulting
in poorer performance than expected. Thus, it was mutually agreed that
they would not provide a working note describing their method in detail.
Only one of their submissions (referred to as Submission 3) was valid. This was
a baseline method that always predicted the list of the most frequent species
in the training set.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Global results of the evaluation</title>
      <p>In Table 2, we report the performance measured for each of the 8
submissions. The main outcome is that the method achieving the best results (LIRMM
Submission 3) was based solely on a convolutional neural network (CNN) trained
on the high-resolution covariates (RGB-IR imagery, land cover, and altitude). It
did not make use of any bioclimatic or soil variables, which are often considered
to be the most informative in the ecological literature. In contrast, LIRMM
Submission 1 was a machine learning method classically used for species
distribution models [4], trained solely on the climatic and soil variables. This shows
two things: (i) important information explaining the species composition is
contained in the high-resolution covariates, and (ii) convolutional neural networks
are able to capture this information. The performance achieved by the baseline
predictor of Stanford shows that the other methods are consistently better than
just returning the most common species everywhere.</p>
    </sec>
    <sec id="sec-4">
      <title>Complementary analysis</title>
      <p>In this section we provide complementary analyses of the submitted results,
focusing on certain aspects of the dataset. In particular, we consider the two
main methods submitted, which we denote:
RF (env.): the model from LIRMM Submission 1, consisting of a random
forest trained solely on environmental variables;
CNN (high res.): the model from LIRMM Submission 3, consisting of a
CNN trained on the high-resolution covariates.
[Figure: cumulative proportion of occurrences as a function of the proportion of species]</p>
      <p>Species-wise performance: First, we study the performance of the models
depending on species frequencies. The dataset contains 31,435 species distributed
according to a long-tailed distribution, which can be seen in Figure 3. For example,
the curve shows that the most common 10% of species represent more than 80%
of the occurrences. On the other hand, the least common 70% of species account
for fewer than 5% of the occurrences. In such long-tailed contexts, metrics that
are averaged over all observations can seem satisfactory while the model only
performs well on a few common species. To evaluate model performance across
all species, we computed the top-30 accuracy for each species independently and
then averaged the resulting scores.</p>
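      <p>This per-species (macro) averaging can be sketched as follows; the function is our own illustration of the procedure described above, not the organizers' code.</p>

```python
import numpy as np

def macro_top_k_accuracy(scores, labels, k=30):
    """Top-K accuracy averaged per species rather than per occurrence,
    so that rare species weigh as much as common ones (sketch).

    scores: (n_obs, n_species) confidence matrix
    labels: (n_obs,) true species indices
    """
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    order = np.argsort(-scores, axis=1)                  # species by descending confidence
    ranks = np.argmax(order == labels[:, None], axis=1)  # rank of the true species
    hit = np.less(ranks, k)                              # true species within the top K?
    per_species = [hit[labels == s].mean() for s in np.unique(labels)]
    return float(np.mean(per_species))
```

Averaging per species instead of per occurrence is exactly what exposes the gap discussed next: a model that is right only on the few common species scores well per occurrence but poorly per species.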
      <p>The results, presented in Figure 4, show that the top-30 accuracy averaged
over the species is much lower than the one averaged over the occurrences. This
is expected because the average over species gives much more weight to less
represented species. However, it is interesting to compare how this affects each
of the two models. The CNN drops from an accuracy of 23.5% to an accuracy
of 13.2%, a 44% relative loss. The random forest falls from 20.4% to 6.9%,
a 66% relative loss. RF (env.) is thus more affected by the average over
species. To better understand this, Figure 5 shows the performance of the two
models on each species of the test set ranked by their frequency in the training
set (with an adaptive sliding average for display purposes). It shows that the
CNN outperforms the random forest on nearly every species. The random forest
is slightly better only for the most frequent species.</p>
      <p>Figure 6 shows a similar graph but instead of ranking the species by their
frequency, species are ranked by the average top-30 accuracy achieved by the
model. In this figure we can see how many species are predicted by the model
with a top-30 accuracy over a given threshold. We can see in particular that, for
the CNN (high res.) model, 2,300 of the 7,794 species are predicted
with a top-30 accuracy greater than 0. In comparison, only 1,100 have a
top-30 accuracy greater than 0 when using the RF (env.) model. Moreover, we can
observe that the CNN (high res.) model is able to perfectly predict (with a
score of 1) more species than the RF (env.) model. The CNN (high res.) model
perfectly predicts nearly 400 species of the test set, whereas the RF (env.) model
only predicts 150 species perfectly.</p>
      <p>Analysis by kingdom and geographic area: As the dataset covers France
and the US and contains both plant and animal species, it is interesting to compare
the accuracy obtained by the models along these criteria. The resulting top-30
accuracy values are provided in Figure 7. Concerning the prediction of plants
vs. animals, both models have similar results, as shown in Figure 7a. Both have
slightly better predictions on plants, which can in part be explained by the
greater number of occurrences per plant species than per animal species. Concerning the
geographical area, Figure 7b shows that the performance is globally lower in
France, in particular for animals, where the performance is dramatically low.
Table 3 gives the average number of occurrences per species by kingdom and
region. It can be noted that the very poor prediction performance on animals
in France may be due to the low number of occurrences per animal species,
on average 12.6. The lower performance on plants in France, however, is more
difficult to explain since the average number of occurrences per species is fairly
high (238). A possible interpretation could be that the lack of animal occurrences
globally degrades the performance of the model, for instance because animals
could be stronger markers of the habitat. But this could be due to other reasons
such as the distribution of occurrences over species or to macro-ecological differences
between the US and France.</p>
      <p>Fusion of the predictions of the high-resolution CNN and classical
punctual environmental model: As discussed in previous sections, an
important outcome of this evaluation is that the CNN (high res.) model performs
better than the more classical RF (env.) model, which means that the CNN (high
res.) model is able to capture important information explaining the species
distribution from the high-resolution covariates. Now, an important question is to
what extent this information is complementary to the information extracted by
the RF (env.) model. The RF (env.) model is based solely on climatic and soil
variables, which are classically used in ecology to model the niche of a species,
i.e. its environmental preferences. It is likely that these ecological preferences
may also be partially inferred by the CNN (high res.) from the high-resolution
covariates. For instance, it can recognize a particular habitat from the topology,
landscape, or the forest's canopy. But it may also miss some important
bioclimatic factors that cannot be inferred from the chosen high-resolution covariates.
To assess how much complementary information is captured by the CNN (high
res.) and RF (env.) models, we computed the accuracy of a simple fusion
approach consisting of averaging the predicted probabilities of each model. In
practice, we first had to renormalize the probability values provided in the
submitted CSV files because the number of predictions was limited to 150 per
observation due to storage issues. The result of the fusion approach compared to
each model alone is presented in Figure 8. It shows that the top-30 accuracy of
the fusion approach is consistently better (25.5%). This result suggests that the
models capture some distinct information and that more advanced methods for
combining the high-resolution variables with the environmental rasters should
be explored.</p>
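      <p>The late fusion just described amounts to a per-row renormalization followed by an average. A minimal sketch, assuming dense per-observation probability vectors (species missing from a truncated submission would be treated as probability 0):</p>

```python
import numpy as np

def fuse_predictions(probs_a, probs_b):
    """Late fusion by averaging two models' predicted probabilities.

    Each row is renormalized to sum to 1 first, mirroring the
    renormalization needed when submissions are truncated (e.g. to the
    top 150 species per observation). Illustrative sketch only.
    """
    a = np.asarray(probs_a, dtype=float)
    b = np.asarray(probs_b, dtype=float)
    a = a / a.sum(axis=1, keepdims=True)
    b = b / b.sum(axis=1, keepdims=True)
    return 0.5 * (a + b)
```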
    </sec>
    <sec id="sec-5">
      <title>Discussion and Conclusion</title>
      <p>The main outcomes of the evaluation conducted in this paper are related to the
comparison of two radically different approaches: one approach based on
high-resolution satellite imagery and convolutional networks, CNN (high res.), and
one more classical approach based on bioclimatic and soil variables processed
with a random forest model, RF (env.). Beyond the slightly better performance
of the CNN (high res.) model, a more in-depth study of the species
predictions reveals important differences between the models. The top-30 accuracy
per species, in particular, reveals a much larger performance gap. Moreover, by
comparing the predictions species by species according to their frequency, we
observed that the CNN is actually much better for the large majority of species, and
particularly for the less represented ones. This raises several questions
related to model evaluation. In the context of species distribution studies, it seems
particularly important to be able to predict the distribution of rare species,
especially for protection and conservation purposes. However, a raw evaluation
of the performance of the models on occurrences is biased by the long-tailed
distribution of occurrences per species. The CNN (high res.) model trained on
high-resolution covariates is both better on less represented species and able to
predict more species perfectly than the RF learned on environmental variables,
while it is slightly less efficient on the more frequent species. The random forest tends to
predict mostly the most frequent species. Even if the random forest's prediction
is good on average over the test occurrences, it is clearly less relevant than the
predictions of the CNN. Unfortunately, as both models were trained on different
data, it is difficult to determine whether the origin of this difference lies in the
model structure or the input data. It is important to note, however, that the
high spatial resolution data has made it possible to learn a model capable of
rivaling and even surpassing a model derived from a more classical approach
learned on environmental variables. In addition, combining the two approaches with
late fusion produces a gain in performance, indicating that the models have
captured complementary information. The use of high-resolution data seems to
be an interesting way to learn models with high predictive power. None of the
models submitted by participants were able to use the high- and low-resolution
data together. However, if even simple late fusion improves performance, it is
likely that a model trained on both data sources simultaneously will provide
even better performance.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This project has received funding from the French National Research Agency
under the Investments for the Future Program, referred to as ANR-16-CONV-0004,
and from the European Union's Horizon 2020 research and innovation program
under grant agreement No 863463 (Cos4Cloud project).</p>
      <p>4. Evans, J.S., Murphy, M.A., Holden, Z.A., Cushman, S.A.: Modeling species
distribution and change using random forest. In: Predictive species and habitat modeling
in landscape ecology, pp. 139-159. Springer (2011)</p>
      <p>5. Hengl, T., de Jesus, J.M., Heuvelink, G.B., Gonzalez, M.R., Kilibarda, M., Blagotic,
A., Shangguan, W., Wright, M.N., Geng, X., Bauer-Marschallinger, B., et al.:
SoilGrids250m: Global gridded soil information based on machine learning. PLoS ONE
12(2) (2017)</p>
      <p>6. Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G., Jarvis, A.: Very high
resolution interpolated climate surfaces for global land areas. International Journal
of Climatology: A Journal of the Royal Meteorological Society 25(15), 1965-1978
(2005)</p>
      <p>7. Homer, C., Dewitz, J., Yang, L., Jin, S., Danielson, P., Xian, G., Coulston, J.,
Herold, N., Wickham, J., Megown, K.: Completion of the 2011 National Land Cover
Database for the conterminous United States - representing a decade of land cover
change information. Photogrammetric Engineering &amp; Remote Sensing 81(5), 345-354
(2015)</p>
      <p>8. Roberts, D.R., Bahn, V., Ciuti, S., Boyce, M.S., Elith, J., Guillera-Arroita, G.,
Hauenstein, S., Lahoz-Monfort, J.J., Schroder, B., Thuiller, W., et al.:
Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic
structure. Ecography 40(8), 913-929 (2017)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cole</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deneu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lorieul</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Servajean</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Botella</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morris</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jojic</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bonnet</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The GeoLifeCLEF 2020 dataset</article-title>
          . arXiv preprint arXiv:2004.04192 (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Deneu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Servajean</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joly</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Participation of LIRMM / Inria to the GeoLifeCLEF 2020 challenge</article-title>
          . In: CLEF working notes
          <year>2020</year>
          ,
          <article-title>CLEF: Conference and Labs of the Evaluation Forum</article-title>
          , Sep.
          <year>2020</year>
          , Thessaloniki, Greece
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Elith</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leathwick</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          :
          <article-title>Species Distribution Models: Ecological Explanation and Prediction Across Space and Time</article-title>
          . Annual Review of Ecology, Evolution, and Systematics
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>