=Paper= {{Paper |id=Vol-3632/ISWC2023_paper_458 |storemode=property |title=Towards preserving Biodiversity using Nature FIRST Knowledge Graph with Crossovers |pdfUrl=https://ceur-ws.org/Vol-3632/ISWC2023_paper_458.pdf |volume=Vol-3632 |authors=Albin Ahmeti,Jan-Kees Schakel,Robert David,Artem Revenko |dblpUrl=https://dblp.org/rec/conf/semweb/AhmetiSDR23 }} ==Towards preserving Biodiversity using Nature FIRST Knowledge Graph with Crossovers== https://ceur-ws.org/Vol-3632/ISWC2023_paper_458.pdf
                                Towards Preserving Biodiversity using Nature FIRST
                                Knowledge Graph with Crossovers
                                Albin Ahmeti1,2,∗ , Jan-Kees Schakel3 , Robert David1 and Artem Revenko1
                                1
                                  Semantic Web Company, Austria
                                2
                                  Vienna University of Technology (TU Wien), Austria
                                3
                                  Sensing Clues Foundation, Netherlands.


                                                                         Abstract
                                                                         Preserving biodiversity, encompassing species and their habitats, is gaining significant attention and
                                                                         becoming a central concern, alongside the focus on climate change. Climate change directly impacts
                                                                         biodiversity and is a prominent aspect of Environmental, Social, and Governance (ESG) criteria. At the
                                                                         EU level, designated areas called Natura 2000 sites have been established for protection and conservation,
                                                                         aimed at safeguarding habitats and species. However, the data regarding these sites, habitats, and species
                                                                         is currently dispersed and isolated, resulting in limited usefulness. To address this issue, we introduce
                                                                         our work on a Knowledge Graph (KG) for biodiversity, known as Nature First KG. This KG aims to connect
                                                                         various data silos, including information about sites, species, and habitats, through cross-references called
                                                                         crossovers. Combining it with a digital twin, we empower recommender use cases such as: preventing
                                                                         human-wildlife conflicts, facilitating species reproduction, and combating illegal poaching to name a few.

                                                                         Keywords
                                                                         knowledge graphs, biodiversity, data integration, linked open data, FAIR




                                1. Introduction
                                Climate change is one of the main challenges that has preoccupied mankind in the recent
                                decades. The effects of climate change have critically altered ecosystems and biodiversity all
                                around the world, changes in ecosystem range and distribution, ecosystem composition, local
                                species extinctions or mass mortality events of plants and animals have been observed [1].
                                Human-induced land cover change has led to environmental impacts such as the decline of
                                biodiversity and other ecosystem services1 . Similar negative results have been observed with
                                anthropogenic change of land use, i.e., replacing nature with architectural buildings for humans
                                to live in enclosed spaces, as reported in this survey [2]. In order to tackle the issue – in the
                                broad level of climate change – the United Nations (UN) have identified a number of goals to be
                                achieved on the topic of Environmental, Social and Governance (ESG).
                                   Furthermore, at the European Union (EU) level, designated areas called Natura 2000 sites

                                ISWC 2023 Posters and Demos: 22nd International Semantic Web Conference, November 6–10, 2023, Athens, Greece
                                ∗
                                    Corresponding author.
                                Envelope-Open albin.ahmeti@semantic-web.com (A. Ahmeti); jankees.schakel@sensingclues.org (J. Schakel);
                                robert.david@semantic-web.com (R. David); artem.revenko@semantic-web.com (A. Revenko)
                                                                       © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                    CEUR
                                    Workshop
                                             CEUR Workshop Proceedings (CEUR-WS.org)
                                    Proceedings
                                                  http://ceur-ws.org
                                                  ISSN 1613-0073




                                1
                                    Land cover accounts – an approach to geospatial environmental accounting, European Environment Agency,
                                    https://www.eea.europa.eu/themes/landuse/land-accounting




CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
have been established for protection and conservation, aimed at safeguarding habitats and
species. However, the data regarding these sites, habitats, and species is currently dispersed
and isolated, resulting in limited usefulness. Data is provided by different organizations and
classification systems such as EUNIS 2 and IUCN 3 in different structure, format and completeness;
with IUCN reporting mainly on threatened species aka. “Red List Species.” The problem is
further exacerbated due to existence of different versions of habitat taxonomies alongside
habitat names and codes that have changed over time but in fact are equivalent, namely habitat
“Subarctic and alpine dwarf Salix scrub” with code S21 in EUNIS ver. 2021 versus “Subarctic and
alpine dwarf willow scrub” with code F2.1 in EUNIS ver. 2012. Domain experts have created
spreadsheets maintaining the relationships between habitats – describing how one habitat
maps to another using crossovers, i.e., if they are equivalent (=), superset (>), subset (<) or
overlap (#) to the designated habitats in other versions. Despite those efforts, the data is not
linked and contextualized to a larger context, and the semantics of such relations are only
known to those experts. In addition, the occurrences of species that are known to exist in
habitats are written using latin names as strings (Ursus arctos), without further connection
to the source of truth species URIs for looking up and dereferencing them as things (https:
//eunis.eea.europa.eu/species/1568). Similarly, data about sites have connections to habitats
and species, and are also of spatial form that are provided in shapefiles along with geometric
coordinates. This opens new challenges in terms of querying and performing geo-calculations
with polygons, in addition to having relations to habitats and species as triple patterns by using
GeoSPARQL. For a better presentation, we have summarised all the discussed problems and
challenges in Table 1.
    In this paper, we present our work towards a Knowledge Graph (KG) for biodiversity in
the context of Nature FIRST research project4 —dubbed Nature FIRST KG—that connects si-
los of data, namely sites, species and habitats by using cross-references, so-called crossovers.
The imported data is heterogeneous, ranging from shapefiles to tabular data, which are then
mapped, integrated and consolidated in a KG; afterwards the entities in KG are linked using
relations based on crossovers, constituting Linked Open Data (LOD). The crossover relations are
based on SKOS relations with well-defined meaning, namely for habitat mapping e x a c t M a t c h ,
b r o a d M a t c h , n a r r o w M a t c h , or c l o s e M a t c h ; in other cases we use a bespoke OWL (object) property
e.g., h a s D i a g n o s t i c S p e c i e s specifying indicator species for a habitat. The relation from a site to
a habitat also contains the coverage information in percentage that has motivated us to use the
RDF*5 (RDF-star) data model for representing the percentages in the relation itself in a compact
way akin to property graphs. We summarize our contributions:
     • Provide a KG that semantically links disparate information, allowing to traverse and get
       new insights powering new use cases pertaining to biodiversity;
     • Methodology for creating relations from crossovers in KG;
     • Publish and Consume KG using LOD frontends, graph view and SPARQL endpoint to
       comply with FAIR principles.
2
  European Nature Information System of the European Environment Agency (EUNIS/EEA), https://eunis.eea.europa.
  eu/
3
  The International Union for Conservation of Nature, https://www.iucn.org/
4
  https://www.naturefirst.info/
5
  https://www.w3.org/2021/12/rdf-star.html
                             Problem                   Authority sources                      Challenges
                           Silo-ed data      EUNIS/EEA, IUCN habitats & species        completeness, mapping
                        Dataset versions            EUNIS/EEA habitats                   mapping, semantics
                       String occurrences     EUNIS/EEA sites, habitats, species          entity extraction
                           Shape files               EUNIS/EEA sites                  GeoSPARQL computations
                        Natura 2000 sites            EUNIS/EEA sites                     reified statements



Table 1
Problems and challenges summarized.



2. Methodology
The data ingested in the KG comprises of habitats, species and Natura 2000 sites. There are
various data authorities when it comes to habitat and species data, such as EUNIS and IUCN.
The data sources are in different formats, schemas and completeness (c.f. Table 2). In addition,
within EUNIS there exist different version of habitats (ver. 2017, 2021) that map to a legacy one
(ver. 2012). The requirement is to consolidate the data into a KG, with each version having a
crossover link to the source of truth or legacy version. The advantage of this approach is that one
can report data that is already described using a taxonomy by specifying another taxonomy that
is interlinked. Each version has its own description, codes and granularity in terms of broader
relationships in the SKOS hierarchy. It is worth mentioning that Red List Species contained the
taxonomic rank (S p e c i e s - > G e n u s - > F a m i l y - > O r d e r - > C l a s s - > P h y l u m - > K i n g d o m ), whereas EUNIS
only contained ‘genus’ as a parent relationship. In both cases, s k o s : b r o a d e r relationships were
created in order to create the hierarchy.
   As seen from Table 2, some of the data is already in RDF, while others are non-RDF and
need to be transformed using ETL (Extract-Transform-Load). For the transformation, we used
UnifiedViews [3] tool that is able to do transformation of tabular data (CSV, XLS) using respective
Data Processing Units (DPUs). Each DPU contains a logic where one can configure the mappings
of how each column is mapped to a property in the ontology. Regarding the shapefiles, we
used GeoTriples [4] application that generates RML6 mappings from the provided Natura2000
shapefiles7 . We can distinguish three cases when building crossovers:
      • EUNIS vs IUCN habitats, species resp. by using the common labels (latin names);
      • EUNIS habitats with links to different versions by using the expert spreadsheet8 ,
        which uses codes such as =, <, > and #; A SPARQL query generates s k o s : e x a c t M a t c h ,
        s k o s : n a r r o w M a t c h , s k o s : b r o a d M a t c h and s k o s : c l o s e M a t c h after mappings are run;
      • Species mentioned only in latin name that we apply concept annotation via NLP tech-
        niques to determine their URI (EUNIS Species taxonomy), using relations such as
        : h a s D o m i n a n t S p e c i e s , : h a s D i a g n o s t i c S p e c i e s , or : h a s C o n s t a n t S p e c i e s .

  Regarding URI management — by following the Linked Data principles [5] — we reused source
authority URIs, e.g., http://eunis.eea.europa.eu/habitats/409 and in cases where we ought to
6
  https://rml.io/specs/rml/
7
  https://www.eea.europa.eu/data-and-maps/data/natura-14/natura-2000-spatial-data
8
  https://www.eea.europa.eu/data-and-maps/data/eunis-habitat-classification/eunis-habitat-classification-review-2017
generate the URI, we made sure that it conforms to our Linked Data frontend so that it becomes
dereference-able, e.g., https://sensingclues.poolparty.biz/HabitatClassificationScheme/237.


3. Nature FIRST KG
In Table 2 is shown the current snapshot of Nature First KG. Per each project (taxonomy) are
given the stats such as the input data, number of total concepts, the crossovers with respect to
other projects, and the total number of relations with respect to other projects. The relations
are only materialized in direct relationships without storing the inverse relations - as also seen
from the ‘no value’ (-) for #1 EUNIS Species.
                       No #     Project (Taxonomy)        Input data      # Concepts     Crossovers      # Crossovers
                        1         EUNIS Species             RDF              315316           -                -
                        2      EUNIS Habitats 2012          RDF               7495         #1 #10         38306 ; 388
                        3      EUNIS Habitats 2017           XLS              2214         #1 #2          1777 ; 2231
                        4      EUNIS Habitats 2021           XLS              3558         #1 #2          4869 ; 3765
                        5        Habitats Annex I            XLS              264            #4               586
                        6        General habitats            XLS               54             -                -
                        7          IUCN Species             RDF              15139           #1              2655
                        8         IUCN Habitats             CSV               252             -                -
                        9           Natura 2000         CSV, shapefile       27054         #1 #6        240790 ; 139802
                        10      Corine Land Cover           RDF                65             -                -

Table 2
Nature First KG in numbers.

    The Linked Data frontend can be used to browse the projects and is accessible here9 , whereas
the SPARQL endpoint for a specific project, e.g. for ‘EUNIS Habitats 2012’ can be accessed
here10 . Moreover, the graph visualisation for all the projects is accessible using GraphViews
application11 . The aforementioned URIs ensure that the approach complies with FAIR.
    We created explicit g e o n a m e s : n e a r b y relationships between Natura 2000 sites that in addi-
tion have relations to EUNIS species and ‘General habitats’ via the ontological relationships
: s i t e H a s S p e c i e s and : s i t e H a s H a b i t a t resp. Moreover, the percentage coverage has been in-
cluded to specify the percentage of habitat that the site contains using RDF*. We provide such
a query in the following that combines n e a r b y relations and percentage of habitats in RDF*,
which computes the TOP 5 largest habitats that are close to : A T 1 1 0 1 1 1 2 area12 .
PREFIX geonames : < h t t p s : / / www. geonames . o r g / o n t o l o g y # >
PREFIX s i t e : < h t t p s : / / s e n s i n g c l u e s . p o o l p a r t y . b i z / S i t e O n t o l o g y / >
PREFIX : < h t t p s : / / s e n s i n g c l u e s . p o o l p a r t y . b i z / N a t u r a 2 0 0 0 S i t e / >

SELECT ? l a b e l ( SUM ( ? p e r c e n t a g e ) a s ? sum ) ( g r o u p _ c o n c a t ( ? p e r c e n t a g e ) a s ? c n t )
WHERE
{
   : AT1101112 geonames : n e a r b y ? s i t e s .
  <> s i t e : p e r c e n t a g e C o v e r ? p e r c e n t a g e .

9
 https://sensingclues.poolparty.biz/
10
   https://sensingclues.poolparty.biz/PoolParty/sparql/Habitats
11
   https://sensingclues.poolparty.biz/GraphViews/
12
   https://sensingclues.poolparty.biz/PoolParty/sparql/Natura2000Site
}
group by ? l a b e l order by desc ( ? sum ) l i m i t 5

   Similarly, one can exploit : s i t e H a s S p e c i e s relations in order to build recommender systems
that can predict Ursus arctos movement in respect to sites, based on preferred habitats and
species. On top of this, one can use SPARQL query federation using SERVICE keyword in order
to query different SPARQL endpoints and join results based on common variables. This system
combined with a digital twin [6] is useful as it provides observations and reasoning that can be
leveraged in order to prevent a human-wildlife conflict.


4. Conclusions & Future work
We have created a first version of Nature FIRST KG that can be used to power different use cases
that pertain biodiversity, addressing the problems reported in Table 1. We plan to enrich our
KG and ingest new sources that are related to site conservation, threats, treatment actions and
plans. Similarly, we are planning to add Ecological Networks [7] as a backbone to our KG in order
to perform different reasoning tasks. This infrastructure will be used to build recommender
systems that predict the movement of Ursus arctos and other relevant species in the context of
Nature FIRST research project. We will also study related knowledge graphs on biodiversity
such as Ozymandias13 and relevant parts of Wikidata, in order to reuse, link and query those
KGs in conjunction with the Nature FIRST KG.


Acknowledgments
We thank Boris Hinojo and Emil Zegers for their assistance and constructive feedback.

References
[1] H.-O. Pörtner, D. Roberts, M. Tignor, E. Poloczanska, K. Mintenbeck, A. Alegría, M. Craig, S. Langsdorf, S. Löschke, V. Möller,
    A. Okem, B. Rama, D. Belling, W. Dieck, S. Götze, T. Kersher, P. Mangele, B. Maus, A. Mühle, N. Weyer, Climate Change 2022:
    Impacts, Adaptation and Vulnerability Working Group II Contribution to the Sixth Assessment Report of the Intergovern-
    mental Panel on Climate Change, 2022. doi:1 0 . 1 0 1 7 / 9 7 8 1 0 0 9 3 2 5 8 4 4 .
[2] C. Pruski, D. S. Hensel, The Role of Information Modelling and Computational Ontologies to Support the Design, Planning
    and Management of Urban Environments: Current Status and Future Challenges, Springer International Publishing, Cham,
    2022, pp. 51–70. URL: https://doi.org/10.1007/978-3-031-03803-7_4. doi:1 0 . 1 0 0 7 / 9 7 8 - 3 - 0 3 1 - 0 3 8 0 3 - 7 _ 4 .
[3] T. Knap, P. Hanecák, J. Klímek, C. Mader, M. Necaský, B. V. Nuffelen, P. Skoda, Unifiedviews: An ETL tool for RDF data
    management, Semantic Web 9 (2018) 661–676. URL: https://doi.org/10.3233/SW-180291. doi:1 0 . 3 2 3 3 / S W - 1 8 0 2 9 1 .
[4] K. Kyzirakos, D. Savva, I. Vlachopoulos, A. Vasileiou, N. Karalis, M. Koubarakis, S. Manegold, Geotriples: Transforming
    geospatial data into rdf graphs using r2rml and rml mappings, Journal of Web Semantics 52-53 (2018) 16–32. URL: https:
    //www.sciencedirect.com/science/article/pii/S1570826818300428. doi:h t t p s : / / d o i . o r g / 1 0 . 1 0 1 6 / j . w e b s e m . 2 0 1 8 . 0 8 . 0 0 3 .
[5] T. Heath, C. Bizer, Linked Data: Evolving the Web into a Global Data Space, Synthesis Lectures on the Seman-
    tic Web, Morgan & Claypool Publishers, 2011. URL: https://doi.org/10.2200/S00334ED1V01Y201102WBE001. doi:1 0 . 2 2 0 0 /
    S00334ED1V01Y201102WBE001.
[6] K. de Koning, J. Broekhuijsen, I. Kühn, O. Ovaskainen, F. Taubert, D. Endresen, D. Schigel, V. Grimm, Digital twins: dynamic
    model-data fusion for ecology, Trends in Ecology and Evolution (2023). doi:1 0 . 1 0 1 6 / j . t r e e . 2 0 2 3 . 0 4 . 0 1 0 .
[7] G. Torta, L. Ardissono, L. L. Riccia, A. Savoca, A. Voghera, Representing ecological network specifications with semantic web
    techniques, in: D. Aveiro, J. L. G. Dietz, J. Filipe (Eds.), Proceedings of the 9th International Joint Conference on Knowledge
    Discovery, Knowledge Engineering and Knowledge Management - (Volume 2), Funchal, Madeira, Portugal, November 1-3,
    2017, SciTePress, 2017, pp. 86–97. URL: https://doi.org/10.5220/0006573500860097. doi:1 0 . 5 2 2 0 / 0 0 0 6 5 7 3 5 0 0 8 6 0 0 9 7 .

13
     https://ozymandias-demo.herokuapp.com