<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>ACM Conference on Recommender Systems, Amsterdam, The Netherlands</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Recommender systems meet species distribution modelling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Indre Zliobaite</string-name>
          <email>indre.zliobaite@helsinki.fi</email>
          <uri>https://www.zliobaite.com/</uri>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Helsinki</institution>
          ,
          <country country="FI">Finland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Recommender systems techniques can naturally lend themselves to species distribution modelling if biological species are treated as items and places where they occur are treated as users. In this setting recommendation scores can reflect which habitats are suited for which species. Recommendation scores can also be used for reconstructing relative abundances of species, and analysing their rises and declines over millions of years in the past. Analysis of such predictions can shed light on the effects of changing environments on the biosphere now and in the past, as well as help to make predictions for the future. The major potential advantage of the recommender systems treatment over many existing solutions is the large spatial and temporal scale at which such analysis can be done within a single model. A single model makes predictions easier to compare globally in space and over time. While algorithmic application of recommender systems techniques to species distribution modelling is relatively straightforward, model selection and evaluation are particularly challenging, as there is no possibility for online tests or on-demand sampling, since the past worlds are long gone. Explainability is paramount in these tasks. Here we highlight the main challenges and promising directions of evaluation of such modelling, which is still in early stages of development. We show how aggregated prediction statistics and constraints may help with reliable model selection and evaluation. We illustrate the approaches on a case study of the mammalian fossil record from Europe around 8 to 17 million years ago.</p>
      </abstract>
      <kwd-group>
        <kwd>matrix factorization</kwd>
        <kwd>implicit feedback</kwd>
        <kwd>species distribution modeling</kwd>
        <kwd>NOW database</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        tems. While multispecies distribution models, modelling several species at a time, are coming
about [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ], their primary focus is still on modelling ecological niches across environmental
gradients. This relies on high-resolution climatic data, which realistically are not available for the
record of the distant past.
      </p>
      <p>Collaborative filtering techniques of recommender systems can be used to model preferences
of organisms for different habitats without explicitly characterizing those habitats. This approach
extracts patterns of species co-occurrence and extrapolates them over large spatial and
temporal scales. Modelling the dynamics of an ecosystem with a single model makes the
predictions directly comparable across different species and different biodiversity spots. The
inferred model can then be used:
1. for identifying species that would do well but are likely to be missing at sites;
2. for reconstructing relative population sizes from species lists (this is similar to predicting
product ratings from transactional data);
3. for tracking co-occurrences and co-evolution over time and space, and for analyzing
macroevolutionary processes.</p>
      <p>
        Technical research in recommender systems for species distribution modeling is in early
stages, but has already shown promising results [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Recommender systems for species distribution modelling</title>
      <p>The recommender system task naturally lends itself to analysis of ecosystems if we consider
species as items and sites where they occur as users. The relationship is not the other way
around, because each user can consume only a limited number of products, but each product
can be consumed by a potentially infinite number of users. Similarly, each site can accommodate
a limited number of species, but each species can potentially occur in an infinite number of
sites. While both species and sites can potentially be described by features, here we consider
the simplest scenario where only occurrence information is available. Thus, we are in the
collaborative filtering task setting.</p>
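      <p>The mapping of sites to users and species to items can be made concrete with a small sketch. The code below is an illustration only: the function name, the toy records and the choice of sites as rows and taxa as columns are assumptions made here, not something specified in the text. It turns a list of (site, taxon) observations into the binary occurrence matrix that collaborative filtering operates on.</p>
      <preformat><![CDATA[
```python
import numpy as np

def build_occurrence_matrix(records):
    """Turn (site, taxon) pairs into a binary sites-by-taxa matrix.

    Assumption for illustration: rows are sites (users), columns are taxa (items).
    """
    sites = sorted({site for site, _ in records})
    taxa = sorted({taxon for _, taxon in records})
    site_index = {s: i for i, s in enumerate(sites)}
    taxon_index = {t: j for j, t in enumerate(taxa)}
    D = np.zeros((len(sites), len(taxa)), dtype=int)
    for site, taxon in records:
        # Repeated reports of the same taxon at a site collapse to one presence.
        D[site_index[site], taxon_index[taxon]] = 1
    return D, sites, taxa

# Hypothetical toy records: site names are made up for this example.
records = [
    ("site_A", "Gomphotherium"),
    ("site_A", "Anchitherium"),
    ("site_B", "Anchitherium"),
    ("site_C", "Micromeryx"),
    ("site_C", "Gomphotherium"),
]
D, sites, taxa = build_occurrence_matrix(records)
print(taxa)
print(D)
```
]]></preformat>
      <p>Only presences are stored explicitly; every zero in the matrix is an unconfirmed absence, which is what places the task in the implicit-feedback setting.</p>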
      <p>
        Data for such analysis of ecosystems can come from many sources. Many biodiversity
databases maintain records of species diversity today, and the majority of them aggregate data from
many sources (a list is available at https://en.wikipedia.org/wiki/List_of_biodiversity_databases) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Fossil databases, describing ecosystems of the past, are also widely available
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. There are also databases that aggregate biodiversity databases, for instance, GBIF (https://www.gbif.org/). Some
data may come from expert surveys, research expeditions or professional wildlife monitoring
projects, while another portion may come from citizen observations. Naturally, even within a
single database data can be of varying quality, which is not unlike user data typically used for
recommender systems. Most notably, the uncertainty of absences is higher than the uncertainty
of presences. If a species has not been reported at a place, it is uncertain whether it does
not occur there or has simply not been encountered yet. This applies to species occurrence data
at present as well as in the fossil record. Similarly, the presence of a transaction in typical data for
recommender systems signals that a user preferred a particular product, while the absence of a
transaction might mean either that the user did not like the product or has never come across it.
      </p>
      <sec id="sec-2-1">
        <p>Further challenges arise due to sparsity of data, especially that of the past ecosystems. Each
fossil site presents only a tiny fraction of all species that have ever lived. Similarly, one user can
realistically watch only a small fraction of movies that have ever been made. The number
of movies watched varies from user to user, just as the number of species varies from site to site.</p>
        <p>Last but not least, synonymy is a challenge for modelling user preferences as it is for analysing
the fossil record. Just as the same movie may appear under different titles in different contexts (or
countries), the same species can appear under different names in different research communities.
Recommender system techniques are generally designed to be robust to these shared challenges
and hopefully can lend their perspectives to species distribution modelling.</p>
        <p>
          Sometimes information on abundances of species at sites might be available. This corresponds
to availability of user ratings. Yet availability of relative abundances at large scales is rare.
Typically, only lists of species that have occurred at sites are given. This corresponds to
transactional user data without ratings. The latter setting calls for recommender systems
solutions with implicit feedback [
          <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
          ]. Such solutions may draw on the repetitiveness of a
transaction or the certainty associated with a transaction in general. In the species modelling
world certainty can be quantified via qualifiers associated with species identification. Fortunately,
neither certainty nor presence-absence information has to be complete; recommender systems
typically operate on incomplete information and such is the nature of information about species
occurrences.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Evaluation challenges and evaluation criteria</title>
      <p>
        Latent factor models [
        <xref ref-type="bibr" rid="ref12 ref14 ref15 ref16">12, 14, 15, 16</xref>
        ] have largely dominated collaborative filtering research for
over a decade due to their simplicity and effectiveness. For our case study, we consider a weighted
latent factor model (WFM) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for collaborative filtering with implicit feedback.
      </p>
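      <p>Let <inline-formula><tex-math>D_{n \times m}</tex-math></inline-formula> be a binary matrix of observed presence or absence of taxa at sites. WFM defines
a confidence matrix as <inline-formula><tex-math>C = 1 + \alpha D</tex-math></inline-formula>, where <inline-formula><tex-math>\alpha</tex-math></inline-formula> is a parameter that accounts for asymmetry of
uncertainties about presence and absence. The higher <inline-formula><tex-math>\alpha</tex-math></inline-formula>, the more certainty is put on presences
in contrast to absences.</p>
      <p>WFM factorizes the occurrence matrix into two preference matrices taking into account
confidences of the transactions: <inline-formula><tex-math>D \xrightarrow{C} X_{n \times k} \times Y^{T}_{k \times m}</tex-math></inline-formula>. Here <inline-formula><tex-math>k</tex-math></inline-formula> is a parameter specifying the
dimensionality of the projection.</p>
      <p>WFM minimizes the cost function
<disp-formula><tex-math>\min_{x_\star, y_\star} \sum_{i,j} c_{ij} \left( d_{ij} - x_i^{T} y_j \right)^2 + \lambda \left( \sum_i \|x_i\|^2 + \sum_j \|y_j\|^2 \right).</tex-math></disp-formula>
Here <inline-formula><tex-math>c_{ij}</tex-math></inline-formula> and <inline-formula><tex-math>d_{ij}</tex-math></inline-formula> are elements of matrices <inline-formula><tex-math>C</tex-math></inline-formula> and <inline-formula><tex-math>D</tex-math></inline-formula> (defined earlier), and <inline-formula><tex-math>x_i</tex-math></inline-formula> and <inline-formula><tex-math>y_j</tex-math></inline-formula> are rows of
matrices <inline-formula><tex-math>X</tex-math></inline-formula> and <inline-formula><tex-math>Y</tex-math></inline-formula>; <inline-formula><tex-math>\lambda</tex-math></inline-formula> is a regularisation parameter.</p>
      <p>All in all, WFM requires setting four parameters: <inline-formula><tex-math>\alpha</tex-math></inline-formula>, <inline-formula><tex-math>k</tex-math></inline-formula>, <inline-formula><tex-math>\lambda</tex-math></inline-formula> and the number of iterations for
minimizing the cost function. Next we need to define quantitative evaluation criteria and a
testing procedure for choosing the parameter values.</p>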
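      <p>The weighted cost above can be minimised by alternating least squares, as is common for implicit-feedback factorisation. The sketch below is a minimal NumPy illustration under stated assumptions, not the R implementation used in the case study: the function names, the toy matrix and the parameter values (rank 2, confidence weight 10, regularisation 0.1) are hypothetical, and rows are taken to be sites and columns taxa.</p>
      <preformat><![CDATA[
```python
import numpy as np

def fit_wfm(D, k=5, alpha=10.0, lam=0.1, n_iter=10, seed=0):
    """Alternating least squares for the weighted cost
    sum_ij c_ij (d_ij - x_i . y_j)^2 + lam (sum_i |x_i|^2 + sum_j |y_j|^2),
    with confidences c_ij = 1 + alpha * d_ij."""
    D = np.asarray(D, dtype=float)
    n, m = D.shape
    C = 1.0 + alpha * D
    rng = np.random.default_rng(seed)
    # Factors start from a standard normal distribution.
    X = rng.standard_normal((n, k))
    Y = rng.standard_normal((m, k))
    reg = lam * np.eye(k)
    for _ in range(n_iter):
        for i in range(n):
            W = Y * C[i][:, None]        # rows of Y scaled by c_ij, shape (m, k)
            X[i] = np.linalg.solve(Y.T @ W + reg, W.T @ D[i])
        for j in range(m):
            W = X * C[:, j][:, None]     # rows of X scaled by c_ij, shape (n, k)
            Y[j] = np.linalg.solve(X.T @ W + reg, W.T @ D[:, j])
    return X, Y

def wfm_cost(D, X, Y, alpha, lam):
    """Value of the weighted cost function for given factors."""
    D = np.asarray(D, dtype=float)
    C = 1.0 + alpha * D
    resid = D - X @ Y.T
    return float((C * resid ** 2).sum() + lam * ((X ** 2).sum() + (Y ** 2).sum()))

# Hypothetical toy occurrence matrix: 6 sites by 5 taxa.
D_toy = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0],
])
X, Y = fit_wfm(D_toy, k=2, alpha=10.0, lam=0.1, n_iter=10)
scores = X @ Y.T   # preference scores for every site-taxon pair
print(np.round(scores, 2))
print(wfm_cost(D_toy, X, Y, alpha=10.0, lam=0.1))
```
]]></preformat>
      <p>Each inner update solves a small regularised least-squares problem exactly, so the cost cannot increase from one sweep to the next; this is why a fixed, small number of iterations is usually workable.</p>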
      <p>
        While many indirect evaluation approaches exist for recommender systems [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], usually the
most reliable is online testing, where users are exposed to different recommender solutions at
random. Yet, since species occurrence data is almost always exclusively observational, online
evaluation is not an option and we are left to evaluate the model fit based on the observational
data used for modelling.
      </p>
      <p>If we wanted the model to reconstruct the observational data as closely as possible, the best
approach would be to set the number of internal dimensions <inline-formula><tex-math>k</tex-math></inline-formula> as high as possible and to set the
regularisation parameter <inline-formula><tex-math>\lambda</tex-math></inline-formula> to zero. Such a model would memorise and reconstruct the underlying
data perfectly, but it would not have predictive power, since it would overfit.</p>
      <p>Cross-validation, which would normally be used in predictive modelling to avoid overfitting,
is not an option since there is no easy way to hold out a separate testing set. Variants of
cross-validation have been used for testing autoencoder-based collaborative filtering [18, 19].
They leave out some users for testing, which is possible with autoencoders, since they
have explicit inputs to the model and outputs. That does not straightforwardly apply to latent
factor models, however.</p>
      <p>
        For latent factor models we can do pseudo-cross-validation, where individual occurrences
are nullified at random [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], and check which parameter settings best reproduce the nullified
occurrences. Yet, this is not sufficient either. If we were simply to maximise this leave-one-out
accuracy, the optimal solution would be to predict everything as ones, that is, to predict all
species to occur everywhere. Clearly, this is not an informative outcome either.
      </p>
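      <p>The weakness of this pseudo-cross-validation can be seen in a few lines. The sketch below is illustrative only; the helper names, the toy matrix and the 0.5 threshold are assumptions. It nullifies a few presences at random and shows that a degenerate predictor that scores every species as present everywhere recovers all of them.</p>
      <preformat><![CDATA[
```python
import numpy as np

def nullify_occurrences(D, n_holdout, seed=0):
    """Return a copy of D with n_holdout presences set to zero, plus their indices."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(D)
    pick = rng.choice(len(rows), size=n_holdout, replace=False)
    D_train = D.copy()
    D_train[rows[pick], cols[pick]] = 0
    return D_train, rows[pick], cols[pick]

def holdout_recall(scores, held_rows, held_cols, threshold=0.5):
    """Fraction of nullified presences whose score reaches the threshold."""
    return float(np.mean(scores[held_rows, held_cols] >= threshold))

# Hypothetical toy occurrence matrix: 4 sites by 4 taxa.
D_toy = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
])
D_train, held_rows, held_cols = nullify_occurrences(D_toy, n_holdout=3)

# A degenerate "model" that predicts every taxon at every site.
all_ones = np.ones(D_toy.shape)
print(holdout_recall(all_ones, held_rows, held_cols))
```
]]></preformat>
      <p>Because the held-out cells contain only presences, recall on them cannot penalise over-prediction, which is why additional criteria are needed.</p>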
      <p>
        Ideally, we want the model not only to reproduce observed occurrences but also to identify the
species that are most likely to be missing at sites, as well as to flag potential misidentifications. Thus,
predictions must deviate to some extent from the training data in order to be
meaningful. Our approach [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is thus to push the model (1) to predict more occurrences than
in the original data while at the same time (2) reproducing the occurrences in the original data
reasonably accurately.
      </p>
      <p>The first criterion, pushing the model to predict more positives than in the original data, is
easy to achieve by increasing <inline-formula><tex-math>\alpha</tex-math></inline-formula>. However, at the same time it is important not to overshoot
the carrying capacity of the environments. Species-area relationships are restrictive in the sense
that environments can accommodate only a limited number of species [20], which is somewhat
predictable from the climatic conditions [21]. While a movie recommender system could
potentially keep recommending highly scored movies to the user for as long as the user keeps
watching them, an informative species distribution model should recommend a finite number
of species that can exist at a site, and this number will certainly vary from site to site. An
evaluation criterion that can be used to control for model realism from this perspective could
be requiring that the total number of recommended species does not exceed, say, 20% of the
species that are already there.</p>
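      <p>One possible reading of this criterion can be sketched as a per-site check. The code below is an assumption-laden illustration: it counts as "recommended" any taxon that is absent in the data but scores at or above a hypothetical 0.5 threshold, and it flags sites where such additions exceed 20% of the observed species count. The function name and the toy data are invented for the example.</p>
      <preformat><![CDATA[
```python
import numpy as np

def capacity_violations(D, scores, threshold=0.5, max_extra=0.2):
    """Indices of sites whose newly recommended taxa exceed max_extra times
    the number of taxa already observed there (rows are sites)."""
    observed = D.sum(axis=1)
    predicted_present = scores >= threshold
    extra = (predicted_present & (D == 0)).sum(axis=1)
    return np.flatnonzero(extra > max_extra * observed)

# Hypothetical toy data: 2 sites by 7 taxa.
D_toy = np.array([
    [1, 1, 1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
])
scores_toy = np.array([
    [0.9, 0.8, 0.9, 0.7, 0.6, 0.7, 0.1],   # one addition to five observed taxa
    [0.9, 0.8, 0.6, 0.1, 0.1, 0.1, 0.1],   # one addition to two observed taxa
])
print(capacity_violations(D_toy, scores_toy))
```
]]></preformat>
      <p>In this toy case only the second site is flagged: a single added taxon is within 20% of five observed taxa but not of two. The share of flagged sites could then serve as an aggregated statistic during model selection.</p>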
      <p>The second criterion, keeping the occurrences in the original data accurately reconstructed,
should rely on a subset of data points for which we have high confidence in both presences
and absences. Repeated presences (if any are reported) can be considered
true positives. Absences outside the time range in which the species is known to have existed
(that is, after it went extinct or before it originated) can be considered true negatives. The latter requires
temporal information in the metadata and is thus primarily suitable for fossil data.</p>
      <p>With these two targets in mind, one can aim at maximising a conventional evaluation metric,
for example the area under the ROC curve (AUC), on a subset of the data that only includes true positives
and true negatives, and let the aggregated statistics of positive predictions over sites and species
take care of not deviating too far from the carrying capacity limits.</p>
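      <p>The restricted evaluation described above can be sketched as follows. This is a hedged illustration, not the author's procedure: the function names, the convention that ages are given in millions of years ago, and all toy values are assumptions. True negatives are derived from temporal ranges, and AUC is computed as the probability that a trusted presence outscores a trusted absence.</p>
      <preformat><![CDATA[
```python
import numpy as np

def temporal_true_negatives(D, site_age, first_appearance, last_appearance):
    """Mask of absences at sites dated outside a taxon's known time range.

    Ages are in millions of years ago, so larger values are older.
    Rows of D are sites and columns are taxa (an assumption of this sketch).
    """
    age = np.asarray(site_age, dtype=float)[:, None]
    first = np.asarray(first_appearance, dtype=float)[None, :]
    last = np.asarray(last_appearance, dtype=float)[None, :]
    outside = (age > first) | (age < last)
    return outside & (np.asarray(D) == 0)

def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))

def trusted_auc(scores, true_pos_mask, true_neg_mask):
    """AUC restricted to cells we trust as presences or absences."""
    return auc(scores[true_pos_mask], scores[true_neg_mask])

# Hypothetical toy data: 2 sites by 3 taxa.
D_toy = np.array([[1, 1, 0],
                  [1, 0, 1]])
scores_toy = np.array([[0.9, 0.7, 0.2],
                       [0.4, 0.6, 0.8]])
site_age = [16.0, 9.0]
first_appearance = [17.0, 17.0, 10.0]
last_appearance = [8.0, 12.0, 8.0]

true_neg = temporal_true_negatives(D_toy, site_age, first_appearance, last_appearance)
# Hypothetical mask of repeatedly reported presences.
true_pos = np.array([[True, False, False],
                     [True, False, False]])
print(trusted_auc(scores_toy, true_pos, true_neg))
```
]]></preformat>
      <p>Cells that fall in neither mask are simply ignored by the metric, which leaves the model free to recommend species there without being penalised.</p>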
    </sec>
    <sec id="sec-4">
      <title>4. A case study</title>
      <p>
        Our case study shows an application of matrix factorisation with implicit feedback to
reconstructing relative abundances of large plant-eating mammals in Europe from about 17 to about
8 million years ago. This time range captures sites assigned to the European Land Mammal
biozones from MN4 to MN12 [22], which include a major faunal turnover. Species occurrence
data come from a public fossil mammal database called NOW [23]. The database records sites
where fossils have been found. Age information is assigned to sites, not to individual fossils.
Each site has a list of species that have been found there. Some identifications of species may be
uncertain; the database records uncertainty qualifiers. The database also records features that
characterize each species, but we have not used this information in this study. We aggregated
the data at the genus level rather than analyzing it at the species level. There is no difference
from the algorithmic perspective, but this way the results are easier to interpret from the ecology
perspective. Details of preprocessing can be found in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The preprocessed dataset contains
104 genera (items), 351 sites (users) and 2616 occurrences (transactions). Sparsity is 93%. The
preprocessed dataset used for this case study is available on GitHub (https://github.com/zliobaite/fossilrec).
      </p>
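      <p>The sparsity figure follows directly from the dataset dimensions quoted above; the short check below uses only those three numbers, and the variable names are chosen here for illustration.</p>
      <preformat><![CDATA[
```python
# Dataset dimensions as reported in the text.
n_genera, n_sites, n_occurrences = 104, 351, 2616

n_cells = n_genera * n_sites            # all site-genus pairs
density = n_occurrences / n_cells       # share of cells with a reported presence
sparsity = 1.0 - density                # share of cells without one

print(f"cells = {n_cells}, sparsity = {sparsity:.1%}")
```
]]></preformat>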
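      <p>Following the principles outlined in the previous section, we monitored the following
quantitative performance measures:
1. <inline-formula><tex-math>\hat{p}_{\mathrm{all}}</tex-math></inline-formula>: mean prediction score over all the data (the prior probability over all the data is
<inline-formula><tex-math>p_{\mathrm{all}} = 0.063</tex-math></inline-formula>) [we want <inline-formula><tex-math>\hat{p}_{\mathrm{all}}</tex-math></inline-formula> to be slightly higher than <inline-formula><tex-math>p_{\mathrm{all}}</tex-math></inline-formula>, but not too much higher];
2. <inline-formula><tex-math>\mathrm{MAE}_{\mathrm{all}}</tex-math></inline-formula>: mean absolute error over all the data [we want it to be small, but not zero];
3. <inline-formula><tex-math>\mathrm{MAE}_{\mathrm{animals}}</tex-math></inline-formula>: mean absolute error over the numbers of occurrences for the animals [we
want it to be small, but not zero];
4. <inline-formula><tex-math>\mathrm{MAE}_{\mathrm{sites}}</tex-math></inline-formula>: mean absolute error over how many animals each site hosts [this relates to the
carrying capacity, we want the error to be small, but not zero];
5. <inline-formula><tex-math>\mathrm{AUC}_{\mathrm{all}}</tex-math></inline-formula>: area under ROC on all training data [we want it to be close to one];
6. <inline-formula><tex-math>\hat{p}_{\mathrm{pos}+}</tex-math></inline-formula>: mean prediction score over true positives [ideally, we want <inline-formula><tex-math>\hat{p}_{\mathrm{pos}+} = 1</tex-math></inline-formula>];
7. <inline-formula><tex-math>\sigma(\hat{p}_{\mathrm{pos}+})</tex-math></inline-formula>: its standard deviation;
8. <inline-formula><tex-math>\hat{p}_{\mathrm{neg}+}</tex-math></inline-formula>: mean prediction score over true negatives [ideally, we want <inline-formula><tex-math>\hat{p}_{\mathrm{neg}+} = 0</tex-math></inline-formula>];
9. <inline-formula><tex-math>\sigma(\hat{p}_{\mathrm{neg}+})</tex-math></inline-formula>: its standard deviation;
10. <inline-formula><tex-math>\mathrm{AUC}_{+}</tex-math></inline-formula>: area under ROC on selected true positives and true negatives [we want it to be
close to one];
11. <inline-formula><tex-math>\sigma(\mathrm{AUC}_{+})</tex-math></inline-formula>: its standard deviation.</p>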
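      <p>The aggregated measures in items 1 to 4 can be computed from the occurrence matrix and the score matrix alone. The sketch below shows one plausible way to do so; it assumes that rows are sites and columns are taxa, and that per-taxon and per-site totals are obtained by summing raw scores rather than thresholded predictions. Both choices, as well as the function and key names, are assumptions of this example.</p>
      <preformat><![CDATA[
```python
import numpy as np

def monitoring_measures(D, scores):
    """Aggregated prediction statistics (rows are sites, columns are taxa)."""
    D = np.asarray(D, dtype=float)
    scores = np.asarray(scores, dtype=float)
    return {
        # Mean prediction score and the prior it is compared against.
        "p_hat_all": float(scores.mean()),
        "p_all": float(D.mean()),
        # Cell-wise error.
        "mae_all": float(np.abs(scores - D).mean()),
        # Error in the number of occurrences per taxon (column totals).
        "mae_animals": float(np.abs(scores.sum(axis=0) - D.sum(axis=0)).mean()),
        # Error in the number of taxa per site (row totals).
        "mae_sites": float(np.abs(scores.sum(axis=1) - D.sum(axis=1)).mean()),
    }

# Hypothetical toy data: 2 sites by 2 taxa.
D_toy = np.array([[1, 0],
                  [1, 1]])
scores_toy = np.array([[0.8, 0.2],
                       [0.6, 0.9]])
measures = monitoring_measures(D_toy, scores_toy)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```
]]></preformat>
      <p>Comparing the mean score with the prior shows at a glance whether a parameter setting predicts more or fewer occurrences than the data contain.</p>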
      <p>We tested around 200 parameter setting variants via a grid search in the 3-dimensional model
parameter space (<inline-formula><tex-math>\alpha</tex-math></inline-formula>, <inline-formula><tex-math>k</tex-math></inline-formula>, <inline-formula><tex-math>\lambda</tex-math></inline-formula>). We kept the number of model fitting iterations fixed at 10. Instead of
testing on all true positives and true negatives, we randomly selected 10 of each and repeated
this 10 times for each model. This saved computational costs and sidestepped the challenge of
class imbalance. We initialised the factor matrices by drawing random values from the normal
distribution with zero mean and unit variance. It took a couple of minutes to fit one model
using an ad hoc implementation in R on a commodity laptop.</p>
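      <p>The search and the repeated balanced subsampling can be outlined as below. This is a sketch under stated assumptions: the grid values are hypothetical (the text gives only the approximate number of variants), the score vectors are synthetic, and the helper names are invented for the example.</p>
      <preformat><![CDATA[
```python
import itertools
import numpy as np

# Hypothetical grids; the exact values searched are not given in the text.
alphas = [5, 10, 20, 30]
ks = [2, 3, 5, 10, 15, 20, 30]
lams = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
grid = list(itertools.product(alphas, ks, lams))
print(len(grid), "parameter settings")

def pairwise_auc(pos, neg):
    """Share of positive-negative pairs ranked correctly."""
    return float(np.mean(pos[:, None] > neg[None, :]))

def subsampled_mean_std(pos_scores, neg_scores, metric,
                        n_each=10, n_repeats=10, seed=0):
    """Mean and standard deviation of a metric over repeated balanced subsamples."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(n_repeats):
        pos = rng.choice(pos_scores, size=n_each, replace=False)
        neg = rng.choice(neg_scores, size=n_each, replace=False)
        values.append(metric(pos, neg))
    return float(np.mean(values)), float(np.std(values))

# Synthetic scores standing in for one fitted model's predictions on
# trusted presences and trusted absences.
rng = np.random.default_rng(1)
pos_scores = rng.uniform(0.3, 1.0, size=50)
neg_scores = rng.uniform(0.0, 0.7, size=200)
auc_mean, auc_std = subsampled_mean_std(pos_scores, neg_scores, pairwise_auc)
print(auc_mean, auc_std)
```
]]></preformat>
      <p>Drawing the same number of positives and negatives in every repeat keeps the metric comparable across models even when trusted absences greatly outnumber trusted presences.</p>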
      <sec id="sec-4-1">
        <title>3https://github.com/zliobaite/fossilrec</title>
        <p>where <inline-formula><tex-math>\hat{p}_{ij}</tex-math></inline-formula> is the preference score for species <inline-formula><tex-math>j</tex-math></inline-formula> to occur at site <inline-formula><tex-math>i</tex-math></inline-formula>, coming from the model; <inline-formula><tex-math>D</tex-math></inline-formula>
is the presence-absence matrix, where <inline-formula><tex-math>d_{ij} &gt; 0</tex-math></inline-formula> means that we only sum taxa that are reported
to be present at site <inline-formula><tex-math>i</tex-math></inline-formula>. The subtraction of 0.5 from the probability score is an arbitrary cutoff
implying that preference scores below 0.5 signal absence. In this study we only analyse the
relative abundances of animals that are present, but in principle, the recommender systems
approach would allow the estimation of potential relative abundances of animals that are absent
as well. The challenge is how to keep the total number of recommended animals contained
and in line with the carrying capacity of the environment, as discussed earlier. This is an open
question for further research.</p>
        <p>Table 1 shows the results. We see that the order from the most abundant to the least abundant
animals (genera) is not too far off, but the predictions for rare animals are too high.</p>
        <p>Table 1. Columns: animal genus, % fragments found, recommendation score, % predicted. Genera listed: Gomphotherium, Anchitherium, Prosantorhinus, Tethytragus, Micromeryx, Heteroprox.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Concluding remarks</title>
      <p>Recommender systems approaches open new perspectives for analysing ecosystems and species
distribution modelling. Reliable evaluation of such approaches is an open challenge. Here we
outline several evaluation criteria that are based on domain knowledge about ecosystems.
Hopefully, similar solutions can also be useful in user modelling applications of recommender
systems. It is curious to learn that the two settings are more similar than they may look at
first sight.</p>
      <p>Our case study showed that an off-the-shelf matrix factorisation approach already works
reasonably well for fossil species distribution modelling, but many methodological challenges
remain. Open directions for future research include incorporating time and energy constraints into
the models. As not all species are alive at all times, models could take constraints of species
being alive into their optimisation criteria. As ecosystems vary in energy (for example, tropical
forests produce much more edible biomass than semi-deserts), models could incorporate such
constraints as well. In the product world this would correspond to one customer having much
more purchasing power than another. At a larger scale, different epochs with different climates
may be considered as different contexts, where context-aware recommender systems [26] can
offer better treatment. Finally, there is a large potential for blending occurrence information
with descriptive features of animals and sites, drawing on recent works in reconstructing past
environments [27, 28, 29, 30, 31]. The ultimate purpose is to understand how the living world
was in the past, when it is livable, and how it works in general.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>The author is grateful to two anonymous reviewers for insightful feedback. Research leading to
these results was partially supported by the Academy of Finland (grants no. 314803, 341623).</p>
      <p>[18] D. Liang, R. Krishnan, M. Hoffman, T. Jebara, Variational autoencoders for collaborative filtering, in: Proc. of the 2018 World Wide Web Conference, WWW’18, 2018, pp. 689–698.</p>
      <p>[19] H. Steck, Embarrassingly shallow autoencoders for sparse data, in: Proc. of the 2019 World Wide Web Conference, WWW’19, 2019, pp. 3251–3257.</p>
      <p>[20] S. Cain, The species-area curve, The American Midland Naturalist 19 (1938) 573–581.</p>
      <p>[21] H. Hillebrand, On the generality of the latitudinal diversity gradient, The American Naturalist 163 (2004).</p>
      <p>[22] F. Hilgen, L. Lourens, J. van Dam, The Neogene period, in: F. Gradstein, J. Ogg, M. Schmitz, G. Ogg (Eds.), The Geologic Time Scale 2012, Elsevier, 2012, pp. 923–978.</p>
      <p>[23] The NOW Community, New and Old Worlds database of fossil mammals (NOW), Licensed under CC BY 4.0, http://www.helsinki.fi/science/now/, 2020.</p>
      <p>[24] M. Domingo, D. Martin-Perea, L. Domingo, E. Cantero, J. Cantalapiedra, B. Garcia-Yelo, A. Gomez-Cano, G. Alcalde, O. Fesharaki, M. Hernandez-Fernandez, Taphonomy of mammalian fossil bones from the debris-flow deposits of Somosaguas-North (Middle Miocene, Madrid Basin, Spain), Palaeogeography, Palaeoclimatology, Palaeoecology 465 (2017) 103–121.</p>
      <p>[25] A. K. Behrensmeyer, S. M. Kidwell, R. A. Gastaldo, Taphonomy and paleobiology, Paleobiology 26 (2000) 103–147.</p>
      <p>[26] G. Adomavicius, B. Mobasher, F. Ricci, A. Tuzhilin, Context-aware recommender systems, AI Magazine 32 (2011) 67–80.</p>
      <p>[27] L. Liu, K. Puolamaki, J. Eronen, M. Mirzaie Ataabadi, E. Hernesniemi, M. Fortelius, Dental functional traits of mammals resolve productivity in terrestrial ecosystems past and present, Proceedings of the Royal Society B 279 (2012) 2793–2799.</p>
      <p>[28] M. Fortelius, I. Zliobaite, F. Kaya, F. Bibi, R. Bobe, L. Leakey, et al., An ecometric analysis of the fossil mammal record of the Turkana Basin, Philosophical Transactions of the Royal Society: Biological Sciences 371 (2016).</p>
      <p>[29] A. Barr, Ecomorphology: Reconstructing Cenozoic terrestrial environments and ecological communities, in: D. Croft, D. Su, S. Simpson (Eds.), Methods in Paleoecology: Reconstructing Cenozoic Terrestrial Environments and Ecological Communities, Springer, 2018, pp. 339–349.</p>
      <p>[30] W. Vermillion, J. Head, P. Polly, J. Eronen, A. Lawing, Ecometrics: A trait-based approach to paleoclimate and paleoenvironmental reconstruction, in: D. Croft, D. Su, S. Simpson (Eds.), Methods in Paleoecology: Reconstructing Cenozoic Terrestrial Environments and Ecological Communities, Springer, Cham, 2018, pp. 373–394.</p>
      <p>[31] T. Faith, A. Du, J. Rowan, Addressing the effects of sampling on ecometric-based paleoenvironmental reconstructions, Palaeogeography, Palaeoclimatology, Palaeoecology 528 (2019) 175–185.</p>
      <p>Performance evaluation of models with different parameter settings. 10 best results within each
evaluation criterion are highlighted in bold.</p>
      <p>[Table residue. First column (parameter value): 5, 10, 20, 30. Surviving column labels, in order: all, all, animals, sites, all. Surviving values: all: 0.173, 0.182; all: 0.235, 0.163, 0.162; animals: 9.74, 17.769, 19.846; sites: 5.194, 5.265, 5.869; all: 0.9894, 0.9999; unlabelled: 0.529, 0.579, 0.106, 0.068, 0.028, 0.027, -0.006, 0.05, 0.045, 0.941, 0.976, 0.066, 0.029.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Elith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leathwick</surname>
          </string-name>
          ,
          <article-title>Species distribution models: Ecological explanation and prediction across space and time</article-title>
          ,
          <source>Annual Review of Ecology, Evolution, and Systematics</source>
          <volume>40</volume>
          (
          <year>2009</year>
          )
          <fpage>677</fpage>
          -
          <lpage>697</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pearson</surname>
          </string-name>
          ,
          <article-title>Species' distribution modeling for conservation educators and practitioners</article-title>
          ,
          <source>Lessons in Conservation</source>
          <volume>3</volume>
          (
          <year>2010</year>
          )
          <fpage>54</fpage>
          -
          <lpage>89</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Pollock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tingley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Morris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Golding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>O'Hara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Parris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Vesk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>McCarthy</surname>
          </string-name>
          ,
          <article-title>Understanding co-occurrence by modelling species simultaneously with a joint species distribution model (JSDM)</article-title>
          ,
          <source>Methods in Ecology and Evolution</source>
          <volume>5</volume>
          (
          <year>2014</year>
          )
          <fpage>397</fpage>
          -
          <lpage>406</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Tikhonov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Opedal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Abrego</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lehikoinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Jonge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Oksanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ovaskainen</surname>
          </string-name>
          ,
          <article-title>Joint species distribution modelling with the R-package Hmsc</article-title>
          ,
          <source>Methods in Ecology and Evolution</source>
          <volume>11</volume>
          (
          <year>2020</year>
          )
          <fpage>442</fpage>
          -
          <lpage>447</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Myers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Stigall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lieberman</surname>
          </string-name>
          ,
          <article-title>PaleoENM: applying ecological niche modeling to the fossil record</article-title>
          ,
          <source>Paleobiology</source>
          <volume>41</volume>
          (
          <year>2015</year>
          )
          <fpage>226</fpage>
          -
          <lpage>244</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Varela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lobo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hortal</surname>
          </string-name>
          ,
          <article-title>Using species distribution models in paleobiogeography: A matter of data, predictors and concepts</article-title>
          ,
          <source>Palaeogeography, Palaeoclimatology, Palaeoecology</source>
          <volume>310</volume>
          (
          <year>2011</year>
          )
          <fpage>451</fpage>
          -
          <lpage>463</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dunstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Darnell</surname>
          </string-name>
          ,
          <article-title>Model based grouping of species across environmental gradients</article-title>
          ,
          <source>Ecological Modelling</source>
          <volume>222</volume>
          (
          <year>2011</year>
          )
          <fpage>955</fpage>
          -
          <lpage>963</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Taskinen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pledger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Foster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warton</surname>
          </string-name>
          ,
          <article-title>Model-based approaches to unconstrained ordination</article-title>
          ,
          <source>Methods in Ecology and Evolution</source>
          <volume>6</volume>
          (
          <year>2015</year>
          )
          <fpage>399</fpage>
          -
          <lpage>411</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.</given-names>
            <surname>Zliobaite</surname>
          </string-name>
          ,
          <article-title>Recommender systems for fossil species distribution modelling</article-title>
          , under review (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ball-Damerow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Brenskelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Barve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Soltis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sierwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bieler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>LaFrance</surname>
          </string-name>
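          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guralnick</surname>
          </string-name>
          ,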
          <article-title>Research applications of primary biodiversity databases in the digital age</article-title>
          ,
          <source>PLoS ONE</source>
          <volume>14</volume>
          (
          <year>2019</year>
          )
          <elocation-id>e0215794</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Uhen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barnosky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bills</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Blois</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carrano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Carrasco</surname>
          </string-name>
          , et al.,
          <article-title>From card catalogs to computers: databases in vertebrate paleontology</article-title>
          ,
          <source>Journal of Vertebrate Paleontology</source>
          <volume>33</volume>
          (
          <year>2013</year>
          )
          <fpage>13</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Koren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Volinsky</surname>
          </string-name>
          ,
          <article-title>Collaborative filtering for implicit feedback datasets</article-title>
          ,
          in:
          <source>Proceedings of the 8th IEEE International Conference on Data Mining</source>
          , IEEE ICDM,
          <year>2008</year>
          , pp.
          <fpage>263</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>K.</given-names>
            <surname>Verstrepen</surname>
          </string-name>
          ,
          <article-title>Collaborative Filtering with Binary, Positive-only Data</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Universiteit Antwerpen,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>P.</given-names>
            <surname>Gopalan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hofman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <article-title>Scalable recommendation with hierarchical Poisson factorization</article-title>
          ,
          in:
          <source>Proc. of the Thirty-First Conference on Uncertainty in Artificial Intelligence</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>326</fpage>
          -
          <lpage>335</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mnih</surname>
          </string-name>
          ,
          <article-title>Probabilistic matrix factorization</article-title>
          ,
          in:
          <source>Advances in Neural Information Processing Systems</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1257</fpage>
          -
          <lpage>1264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Karypis</surname>
          </string-name>
          ,
          <article-title>SLIM: Sparse linear methods for top-N recommender systems</article-title>
          ,
          in:
          <source>IEEE International Conference on Data Mining</source>
          , ICDM,
          <year>2011</year>
          , pp.
          <fpage>497</fpage>
          -
          <lpage>506</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Herlocker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Konstan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Terveen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Riedl</surname>
          </string-name>
          ,
          <article-title>Evaluating collaborative filtering recommender systems</article-title>
          ,
          <source>ACM Trans. Inf. Syst.</source>
          <volume>22</volume>
          (
          <year>2004</year>
          )
          <fpage>5</fpage>
          -
          <lpage>53</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>