<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Run</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Participation of LIRMM / Inria to the GeoLifeCLEF 2020 challenge</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Benjamin Deneu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilien Servajean</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Bonnet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francois Munoz</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alexis Joly</string-name>
        </contrib>
        <aff id="aff3">
          <label>3</label>
          <institution>INRIA, Zenith Team, UMR LIRMM, Univ Montpellier</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Univ Montpellier, CIRAD, INRAE</institution>
          ,
          <addr-line>Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff0">
          <label>0</label>
          <institution>CIRAD, UMR AMAP</institution>
          ,
          <addr-line>F-34398 Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIRMM, Université Paul Valéry, Univ Montpellier</institution>
          ,
          <addr-line>CNRS, Montpellier</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Université Grenoble Alpes</institution>
          ,
          <addr-line>38400 Saint-Martin-d'Hères</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>3</volume>
      <issue>0</issue>
      <abstract>
        <p>This paper describes the methods that we have implemented in the context of the GeoLifeCLEF 2020 machine learning challenge. The goal of this challenge is to advance the state of the art in location-based species recommendation on a very large dataset of 1.9 million species observations, paired with high-resolution remote sensing imagery, land cover data, and altitude. We provide a detailed description of the algorithms and methodology developed by the LIRMM / Inria team, in order to facilitate the understanding and reproducibility of the obtained results.</p>
      </abstract>
      <kwd-group>
        <kwd>LifeCLEF</kwd>
        <kwd>biodiversity</kwd>
        <kwd>environmental data</kwd>
        <kwd>species distribution</kwd>
        <kwd>evaluation</kwd>
        <kwd>benchmark</kwd>
        <kwd>Species Distribution Models</kwd>
        <kwd>methods comparison</kwd>
        <kwd>presence-only data</kwd>
        <kwd>model performance</kwd>
        <kwd>prediction</kwd>
        <kwd>predictive power</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        in a geographical area where they could not develop. The growing demand for
geolocation-based biodiversity services has led to the development of the
GeoLifeCLEF challenge [
        <xref ref-type="bibr" rid="ref4 ref6">6, 4</xref>
        ], in the context of the LifeCLEF evaluation campaign.
      </p>
      <p>This challenge is linked to a particular type of species distribution model
(SDM), whose objective is to predict the species most likely to be present
at a given location. The models predict neither presence, nor probability of presence,
nor densities, but rather a relative probability of presence of each species with respect
to the others.</p>
      <p>
        In recent years, and in previous participations in the GeoLifeCLEF
challenge [
        <xref ref-type="bibr" rid="ref5 ref7">7, 5</xref>
        ], convolutional neural networks have been shown to outperform more
traditional strategies by producing better predictions, especially for rare species.
Their performance seems to be due to two key points: (i) the possibility of
processing large datasets that are inaccessible to other approaches; (ii) the
learning of a representation space common to all species, which stabilizes predictions
from one species to another, especially for the less represented ones.
      </p>
      <p>
        In this paper, we detail the participation of LIRMM / Inria in the
GeoLifeCLEF 2020 challenge [
        <xref ref-type="bibr" rid="ref2 ref3">3, 2</xref>
        ]. The particularity of this year's challenge is its focus on
a large, common, high-resolution dataset covering France and the USA. In addition
to providing, for the first time, an international dataset thanks to the addition of the USA,
the dataset this year contains for each occurrence a high-resolution tensor including
remote sensing imagery, elevation and land cover at one meter per pixel. Another
improvement of this year's edition is the evaluation protocol: the metric
has been adapted for more flexibility, and the split between train and test is now
done spatially to avoid biases due to spatial auto-correlation.
      </p>
      <p>
        A detailed description of the challenge methodology, data and results is
provided in [
        <xref ref-type="bibr" rid="ref1 ref2">2, 1</xref>
        ]. Figure 1 provides an overview of the results of the challenge. In
a nutshell, LIRMM / Inria submitted three categories of runs. The first one is based
on a random forest (RF) trained on environmental vectors. The second one uses
a convolutional neural network (CNN) based only on high-resolution patches
(this CNN has been trained with a cross-entropy loss and validated using
top-1, top-10 and top-30 accuracy). Finally, a fusion model combining the outputs of the
convolutional neural network and the random forest has been submitted.
      </p>
      <p>The remainder of this manuscript is organized as follows:
Section 2 gives an overview of the data we used to build our models
and of the official metric used in the challenge. Section 3 provides a detailed
description of our implemented methods. Section 4 discusses the results.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Data and metric</title>
      <p>
        This section briefly introduces the data used by our models. A more detailed
presentation of the dataset used for the GeoLifeCLEF challenge 2020 is available
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>[Figure 1: overview of the results of the GeoLifeCLEF 2020 challenge; the runs shown include LIRMM/Inria Run 1, Run 2, Run 3 and Run 4.]</p>
      <sec id="sec-2-4">
        <title>Occurrences, covariates and metric</title>
        <p>
          The occurrence dataset is a large presence-only dataset containing 1.9M
occurrences of plants and animals covering the USA and France. The occurrences come
from two citizen science programs, iNaturalist and Pl@ntNet. All 1.1M
American occurrences (animals and plants) come from iNaturalist, as do the French
animal occurrences. The French plant occurrences come from the Pl@ntNet platform [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
The split between train and test sets was made through a spatial block
holdout: all occurrences were placed on a grid of 5 km by 5 km quadrats, and
all occurrences falling in 2.5% of randomly selected quadrats were kept for the test
set.
        </p>
        <p>
          High-resolution patches are composed of 6 layers of 256 by 256 pixels at a
resolution of 1 meter per pixel: 4 layers of aerial view (red, green,
blue and near-infrared channels of remote sensing imagery), 1 layer for altitude and
1 layer for land cover. Both altitude and land cover had a lower raw resolution
and have been re-sampled. For the remote sensing imagery, US raw data were
already at 1 m/px, and French data were down-sampled from 0.5 m/px to 1 m/px.
Patches were extracted for each train and test occurrence and made directly
accessible to participants. Land cover is a categorical layer with 34 categories; for
practical use, it must be unstacked into 34 binary layers, each coding the presence or
absence of the corresponding category. Code for handling the dataset was also provided to facilitate
finding and reading patches; it also includes the possibility to unstack the
land cover categorical layer. More details about these patches are given in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          Twenty-seven environmental rasters were also provided to the participants. This
dataset combines 19 bioclimatic rasters from WorldClim and 8 pedologic rasters
from SoilGrids covering France and the USA. Their spatial resolution is
approximately 1 kilometer. This dataset makes it possible to learn more classical environmental
models in order to compare them with deep learning approaches. An extraction
code was provided to the participants at https://github.com/maximiliense/GLC.
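        </p>
        <p>
          The challenge metric is an adaptive top-30 accuracy. Let <italic>s</italic> denote the set of <italic>N</italic> predictions over <italic>K</italic> classes, where <italic>s<sub>i</sub>(j)</italic> is the score given to class <italic>j</italic> for the <italic>i</italic>-th occurrence. The first step of the adaptive top-30 is the definition of a threshold <italic>t</italic> such that:
          <disp-formula>
            <tex-math>t = \arg\min_{\tau \in \mathbb{R}} \tau \quad \text{s.t.} \quad \frac{1}{N} \sum_{i} \sum_{j} \mathbb{1}_{s_i(j) &gt; \tau} \le 30,</tex-math>
          </disp-formula>
          where <inline-formula><tex-math>\mathbb{1}_{s_i(j) &gt; \tau}</tex-math></inline-formula> equals 1 if the inequality is verified and 0 otherwise. In other words, <italic>t</italic> is the smallest threshold such that, on average, at most 30 classes have a score above <italic>t</italic>.
        </p>
        <p>
          Finally, the score is computed as:
          <disp-formula>
            <tex-math>\frac{1}{N} \sum_{i} \mathbb{1}_{s_i(y_i) &gt; t},</tex-math>
          </disp-formula>
          where <italic>y<sub>i</sub></italic> is the species actually observed for the <italic>i</italic>-th occurrence.
        </p>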
        <p>Such a metric makes it possible to adapt the number of species considered at a given
location depending on the confidence returned by the model.</p>
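        <p>The adaptive top-30 metric described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the official evaluation script: the function names and the parameter <italic>k</italic> (30 in the challenge) are hypothetical, the scores are assumed to be stored in a dense (N, K) array, and ties receive no special treatment. The threshold follows from a simple observation: any value below the (kN+1)-th largest score leaves more than kN scores above it, so that score is the smallest admissible threshold.</p>
        <preformat preformat-type="code">
```python
import numpy as np


def adaptive_top_k_threshold(scores: np.ndarray, k: int = 30) -> float:
    """Smallest t such that, on average over the N rows, at most k scores are strictly above t."""
    n_rows, n_classes = scores.shape
    if k >= n_classes:
        # Every class can be kept: no finite minimum, any threshold below all scores works
        return -np.inf
    # Sort all N * K scores in decreasing order; the (k * N + 1)-th largest value
    # is the smallest threshold leaving at most k * N scores strictly above it
    flat = np.sort(scores, axis=None)[::-1]
    return float(flat[k * n_rows])


def adaptive_top_k_accuracy(scores: np.ndarray, labels: np.ndarray, k: int = 30) -> float:
    """Fraction of occurrences whose observed species has a score strictly above t."""
    t = adaptive_top_k_threshold(scores, k)
    true_scores = scores[np.arange(scores.shape[0]), labels]
    return float(np.mean(true_scores > t))


# Toy example with k = 1: on average, one species is retained per occurrence
scores = np.array([[0.5, 0.3, 0.1, 0.1],
                   [0.6, 0.2, 0.1, 0.1]])
labels = np.array([0, 1])
print(adaptive_top_k_threshold(scores, k=1))  # 0.3
print(adaptive_top_k_accuracy(scores, labels, k=1))  # 0.5
```
        </preformat>
        <p>In the toy example, the threshold keeps two scores in total (0.6 and 0.5), i.e. one per occurrence on average: the first occurrence keeps its observed species, while the second keeps only a species that was not observed.</p>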
      </sec>
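      <p>Unstacking the categorical land cover layer into 34 binary layers, as mentioned in the description of the high-resolution patches, amounts to a one-hot encoding along a new channel axis. The sketch below assumes, hypothetically, that the 34 categories are coded as consecutive integers starting at 0; the actual coding, and the helper shipped with the dataset code, may differ.</p>
      <preformat preformat-type="code">
```python
import numpy as np

N_LAND_COVER_CLASSES = 34  # number of land cover categories stated in the text


def unstack_land_cover(land_cover: np.ndarray, n_classes: int = N_LAND_COVER_CLASSES) -> np.ndarray:
    """Turn an (H, W) integer land cover patch into an (n_classes, H, W) stack of binary layers.

    Assumption (hypothetical): category codes are 0 .. n_classes - 1.
    """
    categories = np.arange(n_classes).reshape(n_classes, 1, 1)
    # Broadcasting compares every pixel with every category code
    return (land_cover[np.newaxis, :, :] == categories).astype(np.uint8)


# Hypothetical 256 x 256 patch filled with random category codes
rng = np.random.default_rng(0)
patch = rng.integers(0, N_LAND_COVER_CLASSES, size=(256, 256))
binary_layers = unstack_land_cover(patch)
print(binary_layers.shape)  # (34, 256, 256)
```
      </preformat>
      <p>Each pixel is set to 1 in exactly one of the 34 layers, which is why the unstacked representation multiplies the memory footprint of this layer by 34.</p>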
    </sec>
    <sec id="sec-3">
      <title>3 Implemented Methods</title>
      <p>For reproducibility purposes, we share a repository containing the trained models
and their parameters, available at the following web address: https://gitlab.inria.fr/bdeneu/glc20-participation.</p>
      <p>
        <bold>3.1 Run 1 - Random forest trained on environmental feature vectors</bold>
      </p>
      <p>
        This method was used for the run entitled LIRMM / Inria Run 1 in Figure 1.
This model has been trained with environmental rasters (bio_1 to bio_19, orcdrc,
phihox, cecsol, bdticm, clyppt, sltppt, sndppt, bldfie). For each occurrence, an
environmental vector of size 27 is extracted from the rasters at the occurrence location.
The model was learnt on all occurrences (France and US), keeping 0.5% for validation
(note that the final model has not been re-trained on these occurrences). In addition,
a few occurrences fell outside of our environmental rasters and were put aside.
The implementation used is the random forest classifier from scikit-learn [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] with a number of trees (n_estimators) of
100 and a maximum depth (max_depth) of 10 (all other parameters are kept at their default
values).
      </p>
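      <p>The random forest configuration described above can be sketched with scikit-learn as follows. Only <monospace>n_estimators=100</monospace>, <monospace>max_depth=10</monospace> and the 0.5% validation share come from the text; the data are random stand-ins (27 features per occurrence, 50 hypothetical species), rows with missing values play the role of occurrences falling outside the rasters, and the top-30 ranking at the end is our own illustration of how species lists can be read from <monospace>predict_proba</monospace>.</p>
      <preformat preformat-type="code">
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: one 27-dimensional environmental vector per occurrence
rng = np.random.default_rng(0)
n_occurrences, n_features, n_species = 2000, 27, 50
X = rng.normal(size=(n_occurrences, n_features))
y = rng.integers(0, n_species, size=n_occurrences)
X[:5, 0] = np.nan  # simulate occurrences falling outside the environmental rasters

# Put aside occurrences without a complete environmental vector
complete = np.isfinite(X).all(axis=1)
X, y = X[complete], y[complete]

# Keep 0.5% of the occurrences for validation
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.005, random_state=0)

# Hyper-parameters from the text; all others are scikit-learn defaults
rf = RandomForestClassifier(n_estimators=100, max_depth=10)
rf.fit(X_train, y_train)

# One probability column per species seen during training (ordered as rf.classes_)
proba = rf.predict_proba(X_val)
top30 = rf.classes_[np.argsort(-proba, axis=1)[:, :30]]
print(proba.shape, top30.shape)
```
      </preformat>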
      <p>
        <bold>3.2 Runs 2/3 - Convolutional neural networks trained on high-resolution image covariates</bold>
      </p>
      <p>The method described here was used for the runs entitled LIRMM / Inria Run 3
and LIRMM / Inria Run 2 in Figure 1. Both runs correspond to the same model
(architecture and training); they differ only in the prediction procedure on the
test set. The model has been trained on the following patches: red imagery, green
imagery, blue imagery, near-infrared imagery, land cover and altitude. Thus, all
patches have shape (c × w × h) = 6 × 256 × 256, where c stands for channels, and w and h for width and height.</p>
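      <p>For illustration, the 6 × 256 × 256 input can be assembled by stacking the individual layers along a leading channel axis, with land cover kept as a single channel, as the 6-channel shape implies. The layer names and their order below are hypothetical; the text does not specify how the channels were ordered.</p>
      <preformat preformat-type="code">
```python
import numpy as np

# Hypothetical channel order, following the list of layers given in the text
LAYER_NAMES = ("red", "green", "blue", "near_ir", "land_cover", "altitude")


def build_patch_tensor(layers: dict) -> np.ndarray:
    """Stack six (256, 256) layers into a (c, w, h) = (6, 256, 256) float32 array."""
    stacked = np.stack([np.asarray(layers[name], dtype=np.float32) for name in LAYER_NAMES], axis=0)
    if stacked.shape != (6, 256, 256):
        raise ValueError(f"unexpected patch shape {stacked.shape}")
    return stacked


# Hypothetical patch made of constant layers (layer i filled with the value i)
patch = build_patch_tensor({name: np.full((256, 256), i) for i, name in enumerate(LAYER_NAMES)})
print(patch.shape)  # (6, 256, 256)
```
      </preformat>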
      <p>The 1.9 million occurrences have been pooled in order to train the
model on all samples. The occurrences have then been split into three sets:
train (90% of all occurrences), validation (5%) and test (5%). The validation set
is used to select the best model, while the test set is used once at the end to
estimate its performance. This split is done completely at random, unlike the
challenge test split, which relies on spatial information.</p>
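      <p>The contrast between the two splitting strategies can be sketched as follows: <monospace>random_split</monospace> reproduces the 90/5/5 random split used for model selection, and <monospace>spatial_block_holdout</monospace> mimics the challenge split described in Section 2 (5 km × 5 km quadrats, 2.5% of them held out). Both are hypothetical helpers written for illustration; they assume coordinates in a projected reference system expressed in meters, which the text does not specify.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def random_split(n, fractions=(0.90, 0.05, 0.05), seed=0):
    """Randomly partition n occurrence indices into train / validation / test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(fractions[0] * n))
    n_val = int(round(fractions[1] * n))
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


def spatial_block_holdout(x_m, y_m, cell_size_m=5000.0, test_fraction=0.025, seed=0):
    """Boolean test mask: every occurrence lying in a randomly drawn grid cell."""
    cell_x = np.floor(np.asarray(x_m) / cell_size_m).astype(np.int64)
    cell_y = np.floor(np.asarray(y_m) / cell_size_m).astype(np.int64)
    # One integer key per cell (assumes abs(cell_y) stays below 500 000)
    keys = cell_x * 1_000_000 + cell_y
    cells = np.unique(keys)
    rng = np.random.default_rng(seed)
    n_test_cells = max(1, int(round(test_fraction * cells.size)))
    test_cells = rng.choice(cells, size=n_test_cells, replace=False)
    # All occurrences of a selected cell go to the test set, so close points stay together
    return np.isin(keys, test_cells)


# Usage on hypothetical coordinates: the first two points share the same 5 km cell
train_idx, val_idx, test_idx = random_split(1000)
test_mask = spatial_block_holdout(np.array([0.0, 100.0, 6000.0, 12000.0]), np.zeros(4))
```
      </preformat>
      <p>With the random split, a validation occurrence often has near-duplicates in the training set a few meters away, which is the spatial auto-correlation bias that the challenge's block holdout is designed to avoid.</p>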
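      <p>
        The model used in both runs is an Inception V3 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] that has been slightly
customized: its lower layers have been adapted to our tensor shape. In particular,
the first two layers have been constructed as follows in PyTorch<sup>6</sup>:
        <preformat preformat-type="code">from torchvision.models.inception import BasicConv2d  # Conv2d + BatchNorm2d + ReLU

n_input = 6  # input channels, consistent with the 6 x 256 x 256 patches
# Inside the model's __init__: stride 1 and padding 1, whereas the stock
# torchvision stem uses stride 2 in the first layer and no padding
self.Conv2d_1a_3x3 = BasicConv2d(n_input, 32, kernel_size=3, stride=1, padding=1)
self.Conv2d_2a_3x3 = BasicConv2d(32, 32, kernel_size=3, stride=1, padding=1)</preformat>
        In addition, Inception v3 models typically have an auxiliary output, which we
have not used, for simplicity. In more detail, the training procedure is
as follows:
        <list list-type="bullet">
          <list-item>
            <p>Batch size: 128</p>
          </list-item>
          <list-item>
            <p>Dropout: 0.5</p>
          </list-item>
          <list-item>
            <p>Learning rate: 0.1</p>
          </list-item>
          <list-item>
            <p>180 epochs, with the learning rate multiplied by a decay factor of 0.1 at epochs 90, 130, 150, 170 and 180</p>
          </list-item>
          <list-item>
            <p>One validation every 5 epochs, in which top-1, top-10 and top-30 accuracy are evaluated.</p>
          </list-item>
        </list>
      </p>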
      <p>The representation layer has width 2048 and is followed by a fully connected output layer with one unit per species and a softmax, which turns the output scores <italic>z</italic> into probabilities:
        <disp-formula>
          <tex-math>p_i = \frac{\exp(z_i)}{\sum_k \exp(z_k)}.</tex-math>
        </disp-formula>
      </p>
      <p>The loss is the cross-entropy, also known as the negative log-likelihood when the outputs of the softmax layer are interpreted as probabilities:
        <disp-formula>
          <tex-math>\ell(y, p) = -\sum_k y_k \log(p_k),</tex-math>
        </disp-formula>
        where <italic>y</italic> is the one-hot encoding of the sample label and <italic>p</italic> the output of the model interpreted as a probability distribution.
      </p>
      <p>The model has been trained on a machine with 190 GB of memory and 8 V100 GPUs
with 32 GB of memory each. Due to a lack of time, the model has not been re-trained on
100% of the occurrences before predicting on the test set. Run 2 was produced with
dropout kept active during prediction, whereas Run 3 was produced in standard prediction
mode, without dropout. The latter obtained the best score in the challenge.</p>
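      <p>The difference between Runs 2 and 3 lies only in whether dropout stays active at prediction time. A minimal way to obtain both behaviours in PyTorch is sketched below: the network is put in evaluation mode, so that batch normalization uses its running statistics, and for a Run 2-style prediction only the <monospace>nn.Dropout</monospace> modules are switched back to training mode. The small head is a hypothetical stand-in for the CNN, the sketch assumes dropout is implemented with <monospace>nn.Dropout</monospace> modules, and the text does not say whether Run 2 used a single stochastic pass or averaged several.</p>
      <preformat preformat-type="code">
```python
import torch
import torch.nn as nn


def predict(model: nn.Module, x: torch.Tensor, keep_dropout: bool) -> torch.Tensor:
    """Softmax scores from one forward pass, with or without active dropout."""
    model.eval()  # batch normalization uses its running statistics in both cases
    if keep_dropout:
        for module in model.modules():
            if isinstance(module, nn.Dropout):
                module.train()  # only the dropout layers become stochastic again
    with torch.no_grad():
        return torch.softmax(model(x), dim=1)


# Hypothetical stand-in for the end of the CNN: 2048-d representation -> species scores
torch.manual_seed(0)
head = nn.Sequential(nn.BatchNorm1d(2048), nn.Dropout(p=0.5), nn.Linear(2048, 10))
features = torch.randn(4, 2048)

run3_style = predict(head, features, keep_dropout=False)  # deterministic
run3_again = predict(head, features, keep_dropout=False)
run2_style = predict(head, features, keep_dropout=True)   # stochastic
run2_again = predict(head, features, keep_dropout=True)
```
      </preformat>
      <p>Calling <monospace>model.train()</monospace> on the whole network instead would also switch batch normalization to mini-batch statistics, which changes the predictions for reasons unrelated to dropout.</p>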
      <p>
        <bold>3.3 Run 4 - Fusion</bold>
      </p>
      <p>This method was used for the run entitled LIRMM / Inria Run 4 in Figure 1.
The late fusion is based on the predictions of the CNN and of the random forest
on the test set. The predictions of both models, normalized as probabilities, have been
averaged, and the predicted species of each occurrence have then been re-ranked.</p>
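      <p>The late fusion described above can be sketched as an unweighted average of two probability matrices followed by a re-ranking. The function name, the equal weights and the assumption that both matrices share the same species columns (for the random forest, this requires mapping <monospace>classes_</monospace> onto the CNN's label indices beforehand) are ours.</p>
      <preformat preformat-type="code">
```python
import numpy as np


def late_fusion(p_cnn: np.ndarray, p_rf: np.ndarray, top_k: int = 30):
    """Average two (N, K) score matrices with aligned species columns, then re-rank species."""
    # Normalize each row to sum to one so that both models contribute on the same scale
    p_cnn = p_cnn / p_cnn.sum(axis=1, keepdims=True)
    p_rf = p_rf / p_rf.sum(axis=1, keepdims=True)
    fused = 0.5 * (p_cnn + p_rf)
    # Species indices sorted by decreasing fused probability, truncated to top_k
    ranking = np.argsort(-fused, axis=1, kind="stable")[:, :top_k]
    return fused, ranking


# Hypothetical predictions for one occurrence and three species
p_cnn = np.array([[0.6, 0.3, 0.1]])
p_rf = np.array([[0.0, 0.2, 0.8]])
fused, ranking = late_fusion(p_cnn, p_rf, top_k=2)
print(ranking)  # [[2 0]]
```
      </preformat>
      <p>In this example, species 2 is ranked first after fusion although the CNN placed it last, which illustrates how the two models can correct each other when they are complementary.</p>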
      <p>Unfortunately, a bug during the decompression of the US patches affected
this run, degrading the predictions of the CNN on these patches. Consequently, this late
fusion is not representative of the maximum potential score of this method.</p>
      <p><sup>6</sup> http://pytorch.org</p>
    </sec>
    <sec id="sec-4">
      <title>4 Results analysis</title>
      <p>
        As expected, the CNNs beat the random forest, which confirms previous results
comparing these two methods [
        <xref ref-type="bibr" rid="ref5 ref7">7, 5</xref>
        ]. Here, the comparison went further, since the two methods were not
trained on the same data. The random forest has been learned on
an environmental vector, as is the case in most classical SDM approaches, while
the neural network was learned on high-resolution tensors at 1 m per pixel
containing the aerial views in R, G, B and near-IR, altitude and land cover. The
neural network does not use the environmental rasters and therefore does not use
explicit environmental data. This result suggests that convolutional neural
networks are capable of extracting rich ecological information from high-resolution
aerial view data, which is particularly interesting from the point of view of
producing high spatial resolution SDMs.
      </p>
      <p>The two models can also be compared from a practical point of view, as both have advantages and
disadvantages. Firstly, as the neural network uses high-dimensional data at fine
resolution, the volume of data to be manipulated is very large (nearly 1 TB). This
can be a blocking point depending on the computational resources
available. Moreover, learning the model on such a volume of data requires access
to large computing resources over a long period of time (several weeks). In
contrast, the random forest takes less than 15 minutes to train and requires fewer
resources; the limiting point is the amount of RAM required (in this
case nearly 50 GB). If we now look at the learned models, the neural network is
much lighter than the random forest (656 MB vs. 41 GB). This is an important
advantage of the CNN because, combined with its other strengths such
as high spatial resolution and the possibility of transfer learning, it makes it a
lightweight, reusable model with high predictive power at fine resolution. It is
important to note, however, that methods using environmental rasters
(available at a territorial scale), such as the RF, allow predictions to be made at any point
by easily extracting the environmental vector. This is much more complicated
for the CNN, for which the tensors (especially the
aerial views) must first be extracted at the point where the prediction is to be made. This extraction is
time-consuming, as it is very difficult to quickly access this information over the
entire territory (the volume of data being far too large).
      </p>
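      <p>The orders of magnitude quoted above can be checked with a back-of-the-envelope computation. Assuming, hypothetically, that each of the 6 layers of a 256 × 256 patch is stored with one byte per pixel, the 1.9 million patches already amount to roughly 0.75 TB, in line with the figure of nearly 1 TB. The second function gives an upper bound on the per-node class values stored by scikit-learn decision trees (float64, one value per class for every node of a full tree), one reason why the size of a random forest grows with the number of species; the 1000 species used below is a placeholder, not the number of species in the challenge.</p>
      <preformat preformat-type="code">
```python
def patch_volume_bytes(n_patches: int, channels: int = 6, size: int = 256, bytes_per_value: int = 1) -> int:
    """Raw size of all patches, assuming a fixed number of bytes per pixel value."""
    return n_patches * channels * size * size * bytes_per_value


def rf_node_values_upper_bound(n_trees: int, max_depth: int, n_classes: int) -> int:
    """Upper bound (bytes) on float64 per-node class values for trees of depth max_depth."""
    max_nodes_per_tree = 2 ** (max_depth + 1) - 1  # full binary tree
    return n_trees * max_nodes_per_tree * n_classes * 8


print(patch_volume_bytes(1_900_000) / 1e12)  # about 0.75 (TB)
print(rf_node_values_upper_bound(100, 10, n_classes=1000) / 1e9)  # about 1.64 (GB), hypothetical 1000 species
```
      </preformat>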
      <p>Regarding the late fusion, the idea was to merge a more classical
environmental model (the RF) with the fine-resolution CNN in order to study the complementarity
of the two methods. Unfortunately, the extraction of the US patches from the archives
on the machine where the late fusion was performed was corrupted, resulting in
nearly half of the patches (most of the US ones) being damaged. We did
not have time to re-download and extract the patches within the allocated time, so the
results of this fusion are not exploitable as such. We can, however, note
that despite the number of patches concerned, the performance remains high
and close to that of the CNN (and above the RF), which suggests a probable
gain had these problems not occurred.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusion and future work</title>
      <p>Our results show that the method achieving the best prediction of species in the
context of the GeoLifeCLEF 2020 challenge was a convolutional neural network
trained solely on the high-resolution covariates (RGB-IR imagery, land cover,
and altitude). It outperformed the more classical species distribution
modelling approach based solely on environmental variables extracted at the occurrence location at a coarse
resolution. This suggests two things: (i) important information explaining the
species composition is contained in the high-resolution covariates, and (ii)
convolutional neural networks are able to capture this information. An important
follow-up question is whether the information captured by the
high-resolution CNN is complementary to that captured from the bioclimatic
and soil variables. This was the purpose of one of the methods we implemented,
but unfortunately it was not conclusive because of a corruption in the data.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>This project has received funding from the European Union's Horizon 2020
research and innovation programme under grant agreement No 863463 (Cos4Cloud
project) and the support of #DigitAG.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Elijah</given-names>
            <surname>Cole</surname>
          </string-name>
          , Benjamin Deneu, Titouan Lorieul, Maximilien Servajean, Christophe Botella, Dan Morris, Nebojsa Jojic, Pierre Bonnet, and Alexis Joly. "
          <article-title>The GeoLifeCLEF 2020 Dataset</article-title>
          ". In:
          <source>arXiv preprint arXiv:2004.04192</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Deneu</surname>
          </string-name>
          , Titouan Lorieul, Elijah Cole, Maximilien Servajean, Christophe Botella, Dan Morris, Nebojsa Jojic, Pierre Bonnet, and Alexis Joly. "
          <article-title>Overview of LifeCLEF location-based species prediction task 2020 (GeoLifeCLEF)</article-title>
          ". In:
          <source>CLEF task overview 2020, CLEF: Conference and Labs of the Evaluation Forum</source>
          , Sep. 2020, Thessaloniki, Greece (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Joly</surname>
          </string-name>
          , Hervé Goëau, Stefan Kahl, Benjamin Deneu, Maximilien Servajean, Elijah Cole, Lukas Picek, Rafael Ruiz De Castañeda, Titouan Lorieul, Christophe Botella, Hervé Glotin, Julien Champ,
          <string-name>
            <given-names>Willem-Pier</given-names>
            <surname>Vellinga</surname>
          </string-name>
          , Fabian-Robert Stöter, Andrew Dorso, Pierre Bonnet, Ivan Eggel, and Henning Müller. "
          <article-title>Overview of LifeCLEF 2020: a System-oriented Evaluation of Automated Species Identification and Species Distribution Prediction</article-title>
          ". In:
          <source>Proceedings of CLEF 2020, CLEF: Conference and Labs of the Evaluation Forum</source>
          , Sep. 2020, Thessaloniki, Greece (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Christophe</given-names>
            <surname>Botella</surname>
          </string-name>
          , Maximilien Servajean, Pierre Bonnet, and Alexis Joly. "
          <article-title>Overview of GeoLifeCLEF 2019: plant species prediction using environment and animal occurrences</article-title>
          ". In:
          <source>CLEF task overview 2019, CLEF: Conference and Labs of the Evaluation Forum</source>
          , Sep. 2019, Lugano, Switzerland (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Mathilde</given-names>
            <surname>Negri</surname>
          </string-name>
          , Maximilien Servajean, Benjamin Deneu, and Alexis Joly. "
          <article-title>Location-Based Plant Species Prediction Using A CNN Model Trained On Several Kingdoms - Best Method Of GeoLifeCLEF 2019 Challenge</article-title>
          ". In:
          <source>CLEF working notes 2019, CLEF: Conference and Labs of the Evaluation Forum</source>
          , Sep. 2019, Lugano, Switzerland (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Christophe</given-names>
            <surname>Botella</surname>
          </string-name>
          , Pierre Bonnet, Francois Munoz, Pascal Monestiez, and Alexis Joly. "
          <article-title>Overview of GeoLifeCLEF 2018: location-based species recommendation</article-title>
          ". In:
          <source>CLEF task overview 2018, CLEF: Conference and Labs of the Evaluation Forum</source>
          , Sep. 2018, Avignon, France (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Christophe</given-names>
            <surname>Botella</surname>
          </string-name>
          , Alexis Joly, Pierre Bonnet, Pascal Monestiez, and Francois Munoz. "
          <article-title>A deep learning approach to species distribution modelling</article-title>
          ". In:
          <source>Multimedia Tools and Applications for Environmental &amp; Biodiversity Informatics</source>
          . Springer,
          <year>2018</year>
          , pp.
          <fpage>169</fpage>
          –
          <lpage>199</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Antoine</given-names>
            <surname>Affouard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jean-Christophe</given-names>
            <surname>Lombardo</surname>
          </string-name>
          , Hervé Goëau, Pierre Bonnet, and Alexis Joly. "
          <article-title>Pl@ntNet app in the era of deep learning</article-title>
          ". In:
          <source>ICLR 2017 Workshop Track - 5th International Conference on Learning Representations</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Szegedy</surname>
          </string-name>
          , Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and Zbigniew Wojna. "
          <article-title>Rethinking the inception architecture for computer vision</article-title>
          ". In:
          <source>Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          .
          <year>2016</year>
          , pp.
          <fpage>2818</fpage>
          –
          <lpage>2826</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          . "
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          ". In:
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ), pp.
          <fpage>2825</fpage>
          –
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>