=Paper= {{Paper |id=Vol-2969/paper5-s4biodiv |storemode=property |title=A Data-driven Approach for Core Biodiversity Ontology Development |pdfUrl=https://ceur-ws.org/Vol-2969/paper5-s4biodiv.pdf |volume=Vol-2969 |authors=Nora Abdelmageed,Alsayed Algergawy,Sheeba Samuel,Birgitta König-Ries |dblpUrl=https://dblp.org/rec/conf/jowo/AbdelmageedASK21 }} ==A Data-driven Approach for Core Biodiversity Ontology Development== https://ceur-ws.org/Vol-2969/paper5-s4biodiv.pdf
A Data-driven Approach for Core Biodiversity Ontology Development
Nora Abdelmageed1,2, Alsayed Algergawy1, Sheeba Samuel1,2 and Birgitta König-Ries1,2
1 Heinz Nixdorf Chair for Distributed Information Systems
2 Michael Stifel Center Jena
Friedrich Schiller University Jena, Germany


Abstract
The biodiversity research domain is composed of diverse scientific subdisciplines resting on various
conceptual models developed over time, which results in a large number of biodiversity domain ontologies,
each representing a part of the domain. On the one hand, these parts overlap to some degree. On the
other hand, the meaning of the concepts used often depends on the particular interpretation according to
the background. In this paper, we propose BiodivOnto, a core ontology including a well-defined and
limited set of concepts within the biodiversity domain. This core ontology provides a basis for linking
different sub-ontologies. To this end, we develop a semi-automatic data-driven approach that uses clear
links between domain experts and knowledge engineers. In particular, the proposed method uses the
fusion/merge strategy by reusing existing ontologies and is guided by data from several data resources
in the biodiversity domain. The data driving the proposed approach has been collected from various
resources, including tabular data, unstructured data, and metadata extracted from
diverse open data repositories.

Keywords
Biodiversity, Knowledge Representation, Core Ontology




1. Introduction
Understanding biodiversity and the mechanisms underlying it is crucial to preserve this important
foundation of human well-being. This demands the management and integration of
biodiversity data [1]. A large amount of heterogeneous data is collected and generated in
biodiversity research, and integrating these heterogeneous data remains a big challenge.
The Semantic Web in general and ontologies in particular play a vital role in coping with the

S4BioDiv 2021: 3rd International Workshop on Semantics for Biodiversity, held at JOWO 2021: Episode VII The Bolzano
Summer of Knowledge, September 11–18, 2021, Bolzano, Italy
Email: nora.abdelmageed@uni-jena.de (N. Abdelmageed); alsayed.algergawy@uni-jena.de (A. Algergawy);
sheeba.samuel@uni-jena.de (S. Samuel); birgitta.koenig-ries@uni-jena.de (B. König-Ries)
URL: https://fusion.cs.uni-jena.de/fusion/members/nora-abdelmageed/ (N. Abdelmageed);
https://fusion.cs.uni-jena.de/fusion/members/alsayed-algergawy (A. Algergawy);
https://fusion.cs.uni-jena.de/fusion/members/sheeba-samuel (S. Samuel);
https://fusion.cs.uni-jena.de/fusion/members/birgitta-konig-ries (B. König-Ries)
ORCID: 0000-0002-1405-6860 (N. Abdelmageed); 0000-0002-8550-4720 (A. Algergawy); 0000-0002-7981-8504
(S. Samuel); 0000-0002-2382-9722 (B. König-Ries)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org
integration and management of these heterogeneous data by allowing the relevant
concepts and relations of a considered domain to be represented in a machine-readable format [2].
As a result, several domain-specific ontologies have been developed. For example, statistics on BioPortal1 show
that more than 890 ontologies with 13,387,405 concepts have been developed. Several domain
ontologies like ENVO2 and IOBC3 exist to model specific areas in the biodiversity domain [3].
However, there is a growing need to bridge the more refined biodiversity concepts and the general
concepts provided by the foundational ontologies. Foundational ontologies span many fields,
modeling the basic concepts and relations that make up the world [4]. Core ontologies provide
a precise definition of structural knowledge in a specific field that spans different application
domains [5]. Hence, core ontologies provide a bridge between the foundational and subdomain
ontologies. Several efforts have been made in different domains to represent the basic categories
of the domain knowledge using core ontologies. Several approaches exist for the development of
core ontologies, including manual and (semi-)automatic ones.
   In this paper, we present the design of a core ontology, BiodivOnto, for the biodiversity
domain. We use a semi-automatic approach that applies the fusion/merge strategy
[6] for the core ontology development. We developed a four-phase pipeline with biodiversity
experts and computer scientists involved at different stages. We collected and analyzed a set
of heterogeneous biodiversity data sources, including tabular data, unstructured data, and
metadata. To extract keywords from the collected data repositories, we used existing ontologies
from BioPortal4 and AgroPortal5. We applied biodiversity experts’ recommendations to filter the
keywords of interest. We generated the core concepts using automated clustering approaches.
The relations between these core concepts were discussed and determined by the domain experts.
   The rest of the paper is structured as follows: In Section 2, we discuss related work. We
describe the methodology of developing our core ontology in Section 3. We present our
evaluation plan and discuss open issues and future work in the development of the core
ontology in the biodiversity domain in Section 4. Finally, we conclude in Section 5.


2. Related Work
Biodiversity research studies the totality and variability of organisms, their morphology and
genetics, life history and habitats, and geographical ranges. Biodiversity is strongly related to ecosystem
services, such as provision of water and food, and climate regulation. Therefore, it is critically
important to understand and conserve it properly [1]. Core ontologies provide a precise
definition of structural knowledge in a specific field that connects different application domains
[7, 8, 5]. They are located between upper-level (foundational) and domain-specific ontologies,
defining the core concepts of a specific field. They aim at linking general concepts of a top-level
ontology to more domain-specific concepts from a sub-field.
   There is a large number of available foundational ontologies [9], such as BFO [10], GFO [11],
SUMO [12], and PROTON [13]. At the same time, there is extensive work to formalize
   1 https://bioportal.bioontology.org/, visited on 05.07.2021
   2 https://bioportal.bioontology.org/ontologies/ENVO
   3 https://bioportal.bioontology.org/ontologies/IOBC
   4 https://bioportal.bioontology.org/
   5 http://agroportal.lirmm.fr
[Figure 1 diagram: Data Acquisition -> Term Extraction -> Term Filtration -> Concepts & Properties Determination]

Figure 1: Proposed four-phase pipeline.


knowledge in the biodiversity domain, which results in many domain-specific ontologies. For
example, of the 890 ontologies in BioPortal, ten are titled core ontologies. The
core ontology for biology and biomedicine (COB)6 and the ontology for core ecological entities
(ECOCORE)7 are the only two relevant biodiversity core ontologies. The COB ontology has 73
concepts and 30 relations, while the ECOCORE ontology has more than 2400 concepts.
Development of both ontologies started in 2020, which indicates a growing interest in developing
such core ontologies. However, for both of them, detailed information on how these ontologies
have been developed is missing.
   Only a few core ontologies have been introduced in the biodiversity domain; however, several core
ontologies have been developed in other related domains. The work introduced in [14] proposes the design
of a core ontology to deal with the different types of research activities performed in empirical
research, encompassing (physical) sampling, sample preparation, and measurement. SemSur is
a core ontology for the semantic representation of research findings [7]. The GeoCore ontology
has been developed as a core ontology for general use in the geology domain [8]. It
makes use of the BFO ontology as an upper-level ontology.
   According to [5], core ontologies should combine various features, such as axiomatization,
modularity, extensibility, and reusability. Developing a core ontology following these features
leads to an elegant way to achieve good interoperability in a complex domain, such as the
biodiversity domain. There are different strategies to develop ontologies considering these
features, such as the fusion/merge and composition/integration strategies [6]. In this work, we use
the fusion/merge strategy, which builds an ontology by bringing together knowledge from source
ontologies.


3. Methodology
The proposed data-driven approach is implemented using the pipeline shown in Figure 1. In
the following, we describe the main steps of the proposed pipeline.

3.1. Data Acquisition
A first and crucial step is collecting and preparing a sufficient and relevant set of data sources
from which we can extract core terms in the biodiversity domain. These data sources should be
diverse, including structured data (tabular) and unstructured data (publications). To achieve
this goal, we have developed a crawling method, as shown in Figure 2. We have considered
two important factors during this step: (i) data resources, from which data sources will be
   6 http://purl.obolibrary.org/obo/cob.owl
   7 http://purl.obolibrary.org/obo/ecocore.owl
[Figure 2 diagram: QEMP (ENVIRONMENT, MATERIAL, PROCESS, QUALITY) -> Select 20 Keywords -> Crawl (Semedico) -> 100 Abstracts;
Manual Search + License Checks (BEFChina, data.world) -> 50 Datasets -> > 50 Tables, 50 Metadata, 50 Summaries]

Figure 2: Crawling phase [17].


extracted and (ii) a set of keywords that will be used to query these data resources. For
the first point, we consider two well-known data portals with very different characteristics
(BEFChina8 and data.world9) to get tabular data. PubMed10, with more than 32 million abstracts,
serves as the data resource for unstructured data. Once the data resources are identified, the
next step is to collect a set of domain-specific keywords that will be used to query these data
resources. To this end, we start from a relaxed version of the QEMP corpus [15], from which a number of keywords,
such as ‘abundance’, ‘benthic’, ‘biomass’, ‘carbon’, ‘climate change’, ‘decomposition’, ‘earthworms’,
and ‘ecosystem’, are selected. The selected set of keywords is later used as input to the Semedico
search engine [16] to get relevant publications from PubMed. Among them, 100 abstracts reflecting
the biodiversity domain have been chosen, as shown in Figure 2, by applying an iterative
manual process of revision and cleaning to the crawled data. The result of this phase is a
data repository11 which contains 100 abstracts, more than 50 tables (some datasets comprise
multiple tables), and 50 metadata files. The selected number of data sources strikes a
balance between biodiversity domain coverage and reasonable human labor time.

3.2. Term Extraction
Once relevant data sources have been collected, the next step is to process them to extract
domain-specific terms. To this end, we manually annotated the collected data using the GATE tool12
for document annotation. We have followed the annotation guidelines in [15], making use of
the same ontologies and adding further important ontologies and knowledge bases, like IOBC,


    8 https://china.befdata.biow.uni-leipzig.de/
    9 https://data.world/
   10 https://www.nlm.nih.gov/bsd/licensee/baselinestats.html
   11 https://github.com/fusion-jena/BiodivOnto/tree/main/data
   12 https://gate.ac.uk/documentation.html
SWEET13, ECOCORE14, ECSO15, CBO16, BCO17, and the Biodiversity A-Z dictionary18 to cover
a wider range of terms. We also make use of the BioPortal Annotator19 with the ontologies
selected above to fetch the possible annotations for a given term. The extraction and annotation
process is not a simple task, as several challenges have to be addressed. On the one hand, some
keywords are ambiguous, and we could not decide whether to include them. We keep those keywords in a
separate list as Open Issues. On the other hand, our main challenge is the handling of compound
words. For example, photosynthetic O2 production is expanded into the following keyword list:
[“photosynthetic”, “O2”, “O2 production”, “photosynthetic O2 production”]. We have enriched
the extracted list of terms using other existing resources: 1) annotated keywords in the QEMP
corpus, 2) keywords from the AquaDiva20 project, and 3) soil-related keywords [18]. These existing
resources have 578, 222, and 410 keywords, respectively.
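
The compound-word expansion described above can be illustrated with a minimal sketch. The paper does not state the rule used to choose sub-terms, so the enumeration of all contiguous word n-grams below is an assumption that only generates candidates; the function name `contiguous_ngrams` is hypothetical, and the final selection is taken verbatim from the example in the text.

```python
def contiguous_ngrams(term):
    """Return every contiguous word n-gram of a compound term, shortest first.

    Assumption: this only enumerates candidates; which ones are kept is a
    manual decision and is not specified in the paper.
    """
    words = term.split()
    ngrams = []
    for n in range(1, len(words) + 1):
        for start in range(len(words) - n + 1):
            ngrams.append(" ".join(words[start:start + n]))
    return ngrams


candidates = contiguous_ngrams("photosynthetic O2 production")

# Expansion reported in the text; it is a subset of the generated candidates.
paper_expansion = [
    "photosynthetic",
    "O2",
    "O2 production",
    "photosynthetic O2 production",
]
kept = [c for c in candidates if c in paper_expansion]
```

Candidates such as "production" or "photosynthetic O2" are generated but are not part of the reported expansion, which suggests that an annotator's judgment is still needed after candidate generation.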

3.3. Term Filtration
To get the final relevant terms, we have discussed the Open Issues list with domain (biodiversity)
experts. Based on their votes on each term, we have decided whether to include it or not.
Some keywords are already filtered out manually at this stage. We applied an automatic filtration
step for consistency, where we normalized keywords to be case insensitive and in singular
form. Furthermore, we manually revised the final list of keywords to exclude spelling mistakes.
At the end of this step, we have 1107 unique keywords, which is 1.8 times the size of the QEMP corpus and
covers a broader range of biodiversity. Figure 3 illustrates the effect of this phase on the original
keywords per data source of our work. The figure shows that the largest
number of unique keywords is collected from PubMed abstracts using the Semedico
search engine, whereas BEFChina yields the smallest number of unique
keywords. In addition, we have calculated the number of simple and compound keywords, as shown
in Figure 4. The used subset of the AquaDiva project has only simple keywords, whereas the
soil-related keywords are only compound. QEMP and our work have a mixture of both, but our
work achieves a better balance.
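
The automatic filtration step (case-insensitive, singular form, unique keywords) can be sketched as follows. This is not the authors' implementation: the suffix rules in `singularize` are a deliberately rough, hypothetical stand-in for a proper lemmatizer, and only the last word of a compound keyword is singularized.

```python
# Words whose singular and plural forms coincide (small illustrative list).
INVARIANT = {"species", "series"}


def singularize(word):
    """Rough English singularization; a hypothetical stand-in for a lemmatizer."""
    if word in INVARIANT:
        return word
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"          # communities -> community
    if word.endswith(("ches", "shes", "xes", "sses")):
        return word[:-2]                # grasses -> grass
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]                # earthworms -> earthworm
    return word


def normalize(keyword):
    """Lowercase a keyword and singularize its last word."""
    words = keyword.lower().split()
    if not words:
        return ""
    words[-1] = singularize(words[-1])
    return " ".join(words)


def unique_keywords(keywords):
    """Return the sorted set of normalized, non-empty keywords."""
    return sorted({normalize(k) for k in keywords} - {""})


sample = ["Earthworms", "earthworm", "Biomass", "biomass", "Species", "Climate change"]
filtered = unique_keywords(sample)
```

Rule-based singularization mishandles irregular plurals, which is one reason a manual revision pass, as described in the text, remains useful after the automatic step.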

3.4. Concepts and Relations Determination
In this section, we cover how we have reached our core concepts and their interlinks.

3.4.1. Concepts Determination
Given the vast output list from the previous step, we have automatically calculated the intersection
among our work, QEMP, and AquaDiva lists. This intersection yields a narrowed list

   13 https://bioportal.bioontology.org/ontologies/SWEET
   14 https://bioportal.bioontology.org/ontologies/ECOCORE
   15 https://bioportal.bioontology.org/ontologies/ECSO
   16 https://bioportal.bioontology.org/ontologies/CBO
   17 https://bioportal.bioontology.org/ontologies/BCO
   18 https://www.biodiversitya-z.org/
   19 https://bioportal.bioontology.org/annotator
   20 http://www.aquadiva.uni-jena.de/
                                  1750                                                All
                                                                                      Filtered
                                  1500

                                  1250


                       Keywords
                                  1000

                                   750

                                   500

                                   250

                                     0
                                           Semedico          BEFChina          data.world

Figure 3: Our extracted keywords vs. external data sources.


                                           Simple
                                  600      Compound

                                  500

                                  400
                       Keywords




                                  300

                                  200

                                  100

                                    0
                                         AquaDiva     QEMP              Soil     Our Work

Figure 4: Simple vs. compound keywords in our work and compared to existing data sources.


of keywords, which we define as Seed Candidates21, for example carbon, climate, composition,
forest, and size. We have considered those 30 terms, as they are the most critical keywords
and are common among various projects dealing with biodiversity. We have then applied a
distance-based clustering technique to assign each of the remaining words to the closest seed.
Word embeddings [19], [20], [21] are a good representation of words that captures their semantic
meaning. For example, grassland is similar to habitat in the embedding space, so this pair
of words could be grouped in one cluster. The same applies to abundance and size. Word
embeddings are commonly used in applications that involve word-word similarity. Seeds and
   21 https://github.com/fusion-jena/BiodivOnto/blob/main/outcome/seeds.md
Figure 5: A sample of WordNet similarity among seeds; TRUE denotes a threshold >= 0.7.


words are represented by 300-dimensional word embedding vectors using word2vec. Our selected metric is
the cosine similarity. Afterwards, we have manually revised the created clusters multiple times.
In each revision iteration, we check how the remaining keywords are grouped, discuss the
results with biodiversity experts, and modify the selected seeds, tending towards more general
concepts. In the last iteration, we computed the WordNet [22] similarity among the remaining
seeds (the cluster centroids): if the similarity is 0.0, i.e., the seed is unique, we pick it as a
core concept. Figure 5 illustrates a sample of our seeds with WordNet similarity > 0.7. If a seed
shows some similarity with other seeds, we have checked BioPortal for those seeds and have
picked their common ancestor. For this step, we have used the PATO22 and SWEET
ontologies to look up a common ancestor (Abstract Seeds23). We have discussed our final list
of seeds, Seeds (Final - Expert)24, i.e., the core concepts, with biodiversity experts. We have based our
naming on their recommendation; for example, characteristic is changed to trait.
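
The seed selection and the distance-based assignment described above can be sketched in a few lines. This is an illustrative sketch only: the keyword lists and the 3-dimensional vectors are hypothetical toy values (the paper uses 300-dimensional word2vec vectors), and `assign_to_seeds` is a hypothetical helper name, not part of the authors' code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_to_seeds(vectors, seeds):
    """Assign every non-seed word to the seed with the highest cosine similarity."""
    clusters = {seed: [] for seed in seeds}
    for word, vec in vectors.items():
        if word in clusters:
            continue
        best = max(seeds, key=lambda s: cosine_similarity(vec, vectors[s]))
        clusters[best].append(word)
    return clusters


# Hypothetical keyword lists; the seed candidates are their intersection.
our_keywords = {"habitat", "size", "grassland", "abundance", "forest"}
qemp_keywords = {"habitat", "size", "carbon"}
aquadiva_keywords = {"habitat", "size", "depth"}
seed_candidates = sorted(our_keywords & qemp_keywords & aquadiva_keywords)

# Toy 3-dimensional embeddings standing in for word2vec vectors.
toy_vectors = {
    "habitat": [1.0, 0.1, 0.0],
    "size": [0.0, 0.2, 1.0],
    "grassland": [0.9, 0.3, 0.1],
    "forest": [0.8, 0.0, 0.2],
    "abundance": [0.1, 0.1, 0.9],
}
clusters = assign_to_seeds(toy_vectors, seed_candidates)
```

With these toy vectors, grassland lands in the habitat cluster and abundance in the size cluster, mirroring the two examples given in the text. The manual revision iterations and the WordNet-based check are not modeled here.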
   Figure 6 shows the members of the cluster for the Quality core concept. It correctly captures terms
related to measurements and attributes like width, depth, size, organic nitrogen content, space, and
speed. However, it also includes non-characteristic terms like tree community and experimental
site. The scope of this paper does not yet cover a more detailed and quantitative evaluation.
The results of the remaining clusters are available in our GitHub repository25.

3.4.2. Final Outcome
We have discussed the possible relations that could occur among our core concepts. Figure 7
represents our core categories; domain experts have validated their core links (relations).
Compared to the previous version [17], we have changed the relation between Quality and Trait:
we have involved more biodiversity experts, and they all agreed on the new relation. Each
category has a set of terms as a result of the clustering algorithm. To implement the fusion/merge
strategy, we make use of the ontology modularization and selection tool (JOYCE) [23] to extract
   22 https://bioportal.bioontology.org/ontologies/PATO
   23 second column in seeds.md file
   24 the last column in seeds.md file
   25 https://github.com/fusion-jena/BiodivOnto/tree/main/outcome/clusters
[Figure 6 scatter plot: members of the “Quality” cluster, e.g. width, depth, abundance, size, speed, ratio, tree community; axes: Word X0 (horizontal), Word X1 (vertical)]

Figure 6: “Quality” cluster in the final iteration. The X and Y axes represent the word vectors after
dimensionality reduction.


relevant modules from each category. Table 1 shows the results of this process. The next step is
to combine (merge) the set of modules in each category to get a core ontology representing
the category. All the resources related to the design of the core ontology as well as the current
preliminary results are publicly available26 .

Category       Ontology modules              Sample terms in category
Environment    ENVO, ECOCORE, ECSO, PATO     groundwater, garden
Organism       ENVO, ECOCORE, ECSO, BCO      mammal, insect
Phenomena      ENVO, PATO, BCO               decomposition, colonization
Quality        ENVO, PATO, CBO, ECSO         volume, age
Landscape      ENVO                          grassland, forest
Trait          BCO                           texture, structure
Ecosystem      ENVO, ECOCORE, ECSO, PATO     biome, habitat
Matter         ENVO, ECSO                    carbon, H2O

Table 1: Core concepts in existing ontologies with examples [17].
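
As a small illustration of how the module assignment in Table 1 can be inspected before merging, the following sketch transcribes the table into a dictionary and counts how many categories each source ontology contributes to. The function names `ontology_coverage` and `shared_sources` are illustrative and not part of BiodivOnto, and the Organism row is read as "ENVO, ECOCORE, ECSO, BCO" on the assumption that a comma is missing in the original.

```python
from collections import Counter

# Table 1, transcribed: category -> ontologies that contributed modules.
# Assumption: the Organism row is "ENVO, ECOCORE, ECSO, BCO".
MODULES = {
    "Environment": ["ENVO", "ECOCORE", "ECSO", "PATO"],
    "Organism": ["ENVO", "ECOCORE", "ECSO", "BCO"],
    "Phenomena": ["ENVO", "PATO", "BCO"],
    "Quality": ["ENVO", "PATO", "CBO", "ECSO"],
    "Landscape": ["ENVO"],
    "Trait": ["BCO"],
    "Ecosystem": ["ENVO", "ECOCORE", "ECSO", "PATO"],
    "Matter": ["ENVO", "ECSO"],
}


def ontology_coverage(modules):
    """Count, for each source ontology, how many categories it contributes to."""
    counts = Counter()
    for ontologies in modules.values():
        counts.update(set(ontologies))  # set() guards against duplicate entries
    return counts


def shared_sources(modules, cat_a, cat_b):
    """Return the ontologies that contribute modules to both categories."""
    return sorted(set(modules[cat_a]) & set(modules[cat_b]))


coverage = ontology_coverage(MODULES)
for name, n in coverage.most_common():
    print(f"{name}: {n} of {len(MODULES)} categories")
```

Such a count only describes where modules come from. It says nothing about how the modules within a category should actually be merged.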



26 https://github.com/fusion-jena/BiodivOnto
[Figure 7 diagram: the core concepts Landscape, Ecosystem, Matter, Organism, Environment, Phenomena, Trait, and Quality, connected by relations labeled contain, part_of, is_a, have, and occur.]

Figure 7: Core concepts and their relations.


4. Discussion and Open Issues
We used a novel data-driven, semi-automatic approach involving both domain experts and
computer scientists to develop a core ontology. This differs from the traditional approach of
developing ontologies manually: it reduces the manual effort of building a core ontology, and
it extracts the crucial concepts from the existing biodiversity domain ontologies. However,
many questions remain open regarding the development, quality, and evaluation of the resulting
core ontology. In the current state, we have determined only the core concepts of BiodivOnto.
The relations between the core concepts in the conceptual BiodivOnto core model are, at
present, suggested by the domain expert. We still need to determine how the core concepts
should be connected. One option is to derive the relations with the same approach that was used
to determine the core categories, reusing the existing properties from the current ontologies.
The other option is to use the relations validated by the domain experts.
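
To make the first option more concrete, the sketch below shows one possible way to aggregate properties from existing ontologies into candidate relations between core concepts. It is not the authors' procedure. The term-to-category mapping uses sample terms from Table 1, while the triples are hypothetical toy data invented for this illustration (only the property labels `contain` and `part_of` are borrowed from Figure 7), and `candidate_relations` is an assumed helper name.

```python
from collections import Counter, defaultdict

# Sample terms and their core categories, taken from Table 1.
TERM_CATEGORY = {
    "mammal": "Organism", "insect": "Organism",
    "grassland": "Landscape", "forest": "Landscape",
    "biome": "Ecosystem", "habitat": "Ecosystem",
    "carbon": "Matter",
}

# HYPOTHETICAL (subject, property, object) triples, made up for this example.
# In practice they would be read from the reused ontology modules.
TRIPLES = [
    ("forest", "contain", "habitat"),
    ("grassland", "contain", "habitat"),
    ("forest", "part_of", "biome"),
    ("habitat", "contain", "carbon"),
    ("mammal", "eats", "unknown_term"),  # skipped: object has no category
]


def candidate_relations(triples, term_category):
    """Group property usage by (subject category, object category)."""
    votes = defaultdict(Counter)
    for subject, prop, obj in triples:
        src = term_category.get(subject)
        dst = term_category.get(obj)
        if src is None or dst is None:
            continue  # ignore triples whose terms are not categorized
        votes[(src, dst)][prop] += 1
    return dict(votes)


candidates = candidate_relations(TRIPLES, TERM_CATEGORY)
for (src, dst), props in candidates.items():
    prop, count = props.most_common(1)[0]
    print(f"{src} --{prop}--> {dst} (supported by {count} triple(s))")
```

The most frequent property per category pair is only a suggestion. Under the second option, domain experts would still validate or reject each candidate.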
   The involvement of domain experts is required for developing a high-quality ontology. In our
methodology, a biodiversity domain expert has been involved in each stage of our pipeline.
Other domain experts were included only after the core concepts had been created, and only
for final evaluation and validation. Based on their opinion, we treat Quality and Trait as
synonyms. We plan to evaluate the ontology with more domain experts to make the core
ontology more concrete. The clusters have correctly captured terms related to their core
concepts. However, many clusters also contain terms that are not relevant to the core concept.
As a result, a detailed quantitative evaluation is required in addition to the domain expert
evaluation. We also need to compare the data-driven engineering approach to ontology
development with manual ontology development by domain experts. In our next phase, we
need to bring together the collected modules as an ontology. Currently, BiodivOnto is a
conceptual data model with modules from existing ontologies put together. Last but not least,
after the complete development of BiodivOnto, we plan to use this model in different
biodiversity applications.
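
The quantitative evaluation of the clusters mentioned above could take several forms, and the paper does not name a metric. As one assumed example, the sketch below computes the silhouette coefficient in plain Python for points that stand in for dimensionality-reduced word vectors. The points and labels are toy values, not data from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_scores(points, labels):
    """Return the silhouette coefficient s(i) = (b - a) / max(a, b) per point.

    a is the mean distance to the other members of the point's own cluster,
    b is the lowest mean distance to the members of any other cluster.
    Points in singleton clusters get a score of 0.
    """
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return scores


def mean_silhouette(points, labels):
    """Average silhouette over all points; values near 1 mean compact clusters."""
    scores = silhouette_scores(points, labels)
    return sum(scores) / len(scores)


# Toy 2-D points standing in for reduced word vectors (not from the paper).
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = mean_silhouette(points, ["Quality", "Quality", "Matter", "Matter"])
bad = mean_silhouette(points, ["Quality", "Matter", "Quality", "Matter"])
print(f"well separated: {good:.2f}, mixed up: {bad:.2f}")
```

A geometric score of this kind measures how compact and well separated the clusters are. It does not replace the expert judgment of whether a term is relevant to its core concept.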


5. Conclusions and Future Work
In this paper, we present a semi-automatic approach to build BiodivOnto, a core ontology model
for the biodiversity domain. Our proposed method uses the fusion/merge strategy by
reusing existing ontologies, and it is guided by data from several data resources in the
biodiversity domain. It consists of four steps: data acquisition, term extraction, term filtration,
and finally concept and relation determination.
   Since the qualitative evaluation has so far been done by a domain expert, our future plan is to
involve more domain experts. In addition, we plan a quantitative evaluation of our approach,
for example, of the quality of the automatically created clusters. Moreover, after the complete
development of BiodivOnto, we plan to use it in various biodiversity applications.


Acknowledgments
The authors thank the Carl Zeiss Foundation for the financial support of the project “A Virtual
Werkstatt for Digitization in the Sciences (K3, P5)” within the scope of the program line
“Breakthroughs: Exploring Intelligent Systems for Digitization - explore the basics, use applications”.
Alsayed Algergawy’s work has been funded by the Deutsche Forschungsgemeinschaft (DFG) as
part of CRC 1076 AquaDiva. Our sincere thanks go to Tina Heger (Berlin-Brandenburg Institute
of Advanced Biodiversity Research (BBIB)), Anahita Kazem (German Centre for Integrative
Biodiversity Research (iDiv)), and Jitendra Gaikwad (FUSION, Friedrich Schiller University Jena),
who served as the domain experts.


References
 [1] L. M. R. Gadelha, et al., A survey of biodiversity informatics: Concepts, practices, and
     challenges, Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 11 (2021). URL:
     https://doi.org/10.1002/widm.1394. doi:10.1002/widm.1394.
 [2] R. Studer, V. Benjamins, D. Fensel, Knowledge engineering: Principles and methods,
     Data & Knowledge Engineering 25 (1998) 161–197. URL:
     http://www.sciencedirect.com/science/article/pii/S0169023X97000566.
     doi:10.1016/S0169-023X(97)00056-6.
 [3] V. Senderov, K. Simov, N. Franz, P. Stoev, T. Catapano, D. Agosti, G. Sautter, R. A. Morris,
     L. Penev, OpenBiodiv-O: ontology of the OpenBiodiv knowledge management system,
     Journal of Biomedical Semantics 9 (2018) 1–15.
 [4] N. Guarino, Formal ontology in information systems: Proceedings of the first international
     conference (FOIS’98), June 6-8, Trento, Italy, volume 46, IOS press, 1998.
 [5] A. Scherp, C. Saathoff, T. Franz, S. Staab, Designing core ontologies, Applied Ontology 6
     (2011) 177–221.
 [6] H. S. Pinto, J. P. Martins, Ontologies: How can they be built?, Knowledge and information
     systems 6 (2004) 441–464.
 [7] S. Fathalla, S. Vahdati, S. Auer, C. Lange, SemSur: A core ontology for the semantic
     representation of research findings, in: SEMANTICS, 2018.
 [8] L. F. Garcia, et al., The GeoCore ontology: A core ontology for general use in geology,
     Computers & Geosciences 135 (2020).
 [9] C. Trojahn, R. Vieira, D. Schmidt, A. Pease, G. Guizzardi, Foundational ontologies meet
     ontology matching: A survey, Semantic Web (2021).
[10] R. Arp, B. Smith, A. D. Spear, Building ontologies with Basic Formal Ontology, MIT Press,
     2015.
[11] H. Herre, General Formal Ontology (GFO): A foundational ontology for conceptual modelling,
     in: Theory and applications of ontology: computer applications, 2010, pp. 297–345.
[12] I. Niles, A. Pease, Towards a standard upper ontology, in: Proceedings of the international
     conference on Formal Ontology in Information Systems-Volume 2001, 2001, pp. 2–9.
[13] I. Terziev, A. Kiryakov, D. Manov, et al., Base upper-level ontology (BULO) guidance, SEKT
     deliverable 1 (2005).
[14] P. M. Campos, C. C. Reginato, J. P. A. Almeida, Towards a core ontology for scientific
     research activities, in: ER, 2019.
[15] F. Löffler, N. Abdelmageed, S. Babalou, P. Kaur, B. König-Ries, Tag me if you can! Semantic
     annotation of biodiversity metadata with the QEMP corpus and the BiodivTagger, in:
     Proceedings of The 12th Language Resources and Evaluation Conference, 2020,
     pp. 4557–4564.
[16] E. Faessler, U. Hahn, Semedico: A comprehensive semantic search engine for the life
     sciences, in: Proceedings of ACL 2017, System Demonstrations, Association for Computa-
     tional Linguistics, 2017, pp. 91–96. URL: https://www.aclweb.org/anthology/P17-4016.
[17] N. Abdelmageed, A. Algergawy, S. Samuel, B. König-Ries, BiodivOnto: Towards a core
     ontology for biodiversity (2021).
[18] V. Udovenko, A. Algergawy, Entity extraction in the ecological domain–a practical guide,
     BTW 2019–Workshopband (2019).
[19] Y. Goldberg, O. Levy, word2vec explained: deriving Mikolov et al.’s negative-sampling
     word-embedding method, arXiv preprint arXiv:1402.3722 (2014).
[20] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for word representation, in:
     Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing
     (EMNLP), 2014, pp. 1532–1543.
[21] M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L. Zettlemoyer, Deep
     contextualized word representations, arXiv preprint arXiv:1802.05365 (2018).
[22] T. Pedersen, S. Patwardhan, J. Michelizzi, et al., WordNet::Similarity - measuring the
     relatedness of concepts, in: AAAI, volume 4, 2004, pp. 25–29.
[23] E. Faessler, F. Klan, A. Algergawy, B. König-Ries, U. Hahn, Selecting and tailoring ontologies
     with JOYCE, in: Knowledge Engineering and Knowledge Management - EKAW 2016
     Satellite Events, EKM and Drift-an-LOD, Bologna, Italy, November 19-23, 2016, Revised
     Selected Papers, volume 10180, 2016, pp. 114–118.