BioOnto: Towards an Integration of
       Biological and Biogeographic Data
       Marcos ZARATE a,b,1 , Agustina BUCCELLA c and Pablo FILLOTTRANI d,e
  a Centro para el Estudio de Sistemas Marinos, Centro Nacional Patagónico, Consejo

                    Nacional de Investigaciones Cientı́ficas y Técnicas
b LINVI, Faculty of Engineering, Universidad Nacional de la Patagonia San Juan Bosco
  c GIISCO Research Group, Computer Science Department, Universidad Nacional del

                                        Comahue
     d Computer Science and Engineering Department, Universidad Nacional del Sur
        e Comisión de Investigaciones Cientı́ficas de la provincia de Buenos Aires


           Abstract In this work we present the preliminary design of BioOnto, an ontology-
           based system designed to integrate two important databases for Argentine science,
           the National System of Biological Data (SNDB) and the Ocean Biogeographic In-
           formation System (OBIS). BioOnto uses Web Ontology Language (OWL) to make
           assertions about classes in the underlying model and to define object properties
           that are used to link instances therein. To accomplish the integration, we follow the
           Single Ontology Approach. We illustrate the usefulness of BioOnto presenting frag-
           ments of the ontology that supports inference about (1) the relationship between
           preys and predators (2) which species coexist in a marine region (in particular the
           Argentine region is of our interest).
           Keywords. Database Integration, Ontology, Biogeography, Biodiversity, Darwin
           Core


1. Introduction

Biogeography [1] is a scientific discipline that studies the distribution of living beings on
earth, as well as the processes that have originated it, that modify it and that can make
it disappear. It is an interdisciplinary science, which is both a branch of geography and
biology, receiving its foundations of specialties such as botany, zoology, ecology or evol-
utionary biology and other sciences such as geology. Currently there is a steadily grow-
ing wealth of biogeographic and biological data from a wide range of disciplines that
being available from on-line information systems [2,3,4] around the world. In particu-
lar, as the number of species and geographic regions under study and also the particip-
ating institutions and research groups continue growing, addressing elaborate research
issues about the complex interrelationships among these data becomes increasingly diffi-
cult. In the case that the information sources cannot be properly coordinated, data access
and information discovery can sometimes become a daunting task. This drives the need
 1 Corresponding Author: (CESIMAR-CENPAT-CONICET), Boulevard Brown 2915 (U9120ACD), Puerto

Madryn, Chubut, Argentina; E-mail: zarate@cenpat-conicet.gob.ar
to further facilitate the information integration across data sources, taking into account
different querying interests in a diversity of fields and disciplines.
     SNDB2 supported by the Ministry of Science, Technology and Productive Innova-
tion of Argentina and OBIS3 supported by the Intergovernmental Oceanographic Com-
mission of UNESCO, are currently two of main reference databases for ocean biolo-
gists, ecologists, and other researchers in Argentina. Both use Darwin Core Standard
(DwC) [5] which is a general purpose vocabulary designed to facilitate the transfer and
integration of biodiversity data. Though the DwC is defined in an RDF [6] document4 ,
integration of biodiversity data in the Semantic Web (SW) [7] is in its early stages. One
of the major challenges for DwC in the SW context is the lack of a well-defined on-
tology. Without rigorous relationships between concepts and the properties that define
them, connections between biodiversity data and related semantically rich information,
such as literature and genomes, are difficult to traverse [8,9,10,11].
     To facilitate the integration of these two databases, we present the design of BioOnto,
an ontology that enables the integration following the Single Ontology approaches which
uses one global ontology providing a shared vocabulary for the specification of the se-
mantics and all information sources are related to one global ontology. The choice of this
approach is due to the fact that it is ratified for many years as demonstrated in [12] its
implementation is successful in cases where the nature of the data is similar. In addition,
BioOnto is designed to link data together with other systems compliant with the RDF
which follows the principles established by Linked Open Data initiative (LOD)5 .
     The paper is organized as follows: Section 2 shows the state of art and the prob-
lems of integration that exist today. Section 3 describes the architecture, design consid-
erations, and features of BioOnto. Section 4 presents a use case that demonstrates how
the information from both data sets are integrated to better understand the behavior of a
particular specie. Finally Section 5 presents the conclusions and the future work.


2. Related Work

In this section we will review the work related to the integration of biodiversity and
biogeography domain. The first work consulted is [13] developed by OBIS-ENV-DATA,
which proposes an extension of DwC to expand OBIS with environmental data to ef-
fectively manage combined datasets, although this generates advantages in the incorpor-
ation of environmental data, still persists the integration problem which has the DwC
standard itself. In MarineTLO [14] the authors defined a core ontology for publishing
marine data concerning to the European iMarine project6 , which is suitable for setting up
warehouses that can serve complex queries. The main objective is to assemble informa-
tion from different datasets to give more details of a particular marine species. However
the DwC standard is not used to build the underlying ontology, limiting the possibility
of integrating biodiversity data that respects the DwC standard. BiSciCol [15] describes
an architecture to convert biodiversity data (in standard tabular formats) such as Darwin
  2 http://datos.sndb.mincyt.gob.ar/
  3 http://beta.iobis.org/
  4 http://rs.tdwg.org/dwc/rdf/dwcterms.rdf
  5 http://linkeddata.org/
  6 http://www.i-marine.eu/Pages/Home.aspx
Core-Archives [16] into RDF representations. In this case an ontology is described us-
ing the terms of the DwC but there are no examples of inferences or queries that may
be of help to researchers interested in integrating different databases; on the other hand,
currently the project page is not accessible. We plan to use DwC standard to capture
complex aspect of biodiversity domain. In [17] the authors describe RDF-based data
structures that are to be employed in the creation of a centralized repository of metadata
from the heterogeneous data sources in the RITMARE research network7 . A different
approach is taken in [18] where the authors propose the creation of micro-linked open
data clouds formed by oceanographic LOD-compliant datasets. A number of ontology
design patterns called GeoLink Modular Oceanography Ontology are designed in [19]
for the Oceanography domain. The resulting patterns are sufficiently modular, and thus
arguably easier to extend than foundational top-level ontologies. Currently, the GeoLink
project is in the middle of populating the patterns with actual data and a very preliminary
evaluation demonstrated that the patterns together can serve as an integrating layer of
heterogeneous oceanographic data repositories.
     Current research is trying to address the existing gap to integrate biological, ocean-
ographic and biogeographic data. However, critical look at the available literature in-
dicates that there are limitations related to: (1) the absence of robust, standardized, and
widely-accepted vocabularies and ontologies for linkable biodiversity and biogeography
data. (2) the disagreements presently governing the use of identifiers in biodiversity and
biogeography data is a major impediment to integrate these data.


3. Proposed Architecture and Ontology Details

The proposed architecture, showed in Figure 1, is divided into three layers: (1) In-
put Data representing the databases OBIS and SNDB in which the data can be down-
loaded in CSV format, then this information is converted to RDF using OpenRe-
fine tool [20] where data inside these files are cleaned and converted to standard-
ized data types (dates, numerical values, etc.). Data are converted to RDF triples us-
ing RDF Refine8 , which is an extension of OpenRefine. The columns of the CSV
file are mapped as instances of DwC classes. Every resource must have a URI that
can be used to link that resource to other resources both within this dataset and oth-
ers anywhere on the web. The base URI that is common to the main classes is:
bio-onto:http://www.cenpat-conicet.gob.ar/ontology/. In the CSV file we
have a column with unique identifier, “ID”, to use as unique values in identity URIs. We
use GREL (General Refine Expression Language9 ) to generate a new URI, the expres-
sion that we defined is "occurrence/"+value.urlify() this concatenates to the base
URI the string "ocurrence/" along with the value of the ID, then a URI generated after
applying the expression would look like this: bio-onto:occurrence/valueID. The
resulting RDF to describe an occurrence is:

           SUBJECT                                PREDICATE                  OBJECT
bio - onto : occurrence / f6bbf85d -85 ea -4605   rdf : type           dwc : Occurrence

  7 http://www.ritmare.it/en/
  8 http://refine.deri.ie/
  9 https://github.com/OpenRefine/OpenRefine/wiki/GREL-Functions
dwc: is an abbreviation for the real namespace http://rs.tdwg.org/dwc/terms/,
Table 1 describes the mapping performed together with the columns of CSV file used to
generate the main URIs. The complete mapping can be consulted at10 . (2) Global Onto-
logy and Storage in which the RDF triples are stored in GraphDB11 . This is a highly ef-
ficient and robust graph database with OWL inferences and SPARQL [21] support. Then
the users can access the data resources easily using the different visualizations provided
by GraphDB12 . (3) Information Retrieval which provides a transparent interface to the
non-expert user, so that they can perform queries without needing to know details of the
SPARQL query language.


Figure 1. Proposed Architecture: the data of both datasets are converted to RDF, then the mapping corres-
ponding to the classes defined in the ontology is performed. Finally the data can be queried through SPARQL
queries.


3.1. A Multidisciplinary Approach

The development of the ontology is initiated within CENPAT13 . The different research
lines at CENPAT include marine biology, aquatic resource management, oceanography,
paleontology, and biological diversity, among others. A working group at CENPAT is
focused on working with issues related to the management and conservation of marine
resources, this research group called CESIMAR acronym for (Center for the Study of
Marine Systems) have a long tradition on studying the biology and behavior of marine
mammals. Because most of the research done generates a large volume of data, but de-
pending on its nature, they are published in SNDB (if they are biological data) or OBIS
(if they are biogeographic data). Up to date there has not been developed a tool which
allows integrating both databases. For example it is of great interest to know which spe-
cies coexist in a certain marine region, or to establish trophic relations between marine
mammals.
  10 https://github.com/cenpat-gilia/BioOnto/blob/master/scripts/mapping.json
  11 http://www.ontotext.com/products/ontotext-graphdb/
  12 http://web.cenpat-conicet.gob.ar:7200/sparql
  13 Patagonian National Research Centre (CENPAT), http://www.cenpat-conicet.gob.ar/
Table 1. The first part of the table shows the main classes corresponding to the categories of the DwC standard.
You can also see the columns of the CSV file that were used to generate the corresponding URIs. The second
part of the table shows the properties used and an example of the literals that are obtained from the columns
of the file. For simplicity only the main ones are shown. geo-pos: is an abbreviation for the real namespace
https://www.w3.org/2003/01/geo/ and foaf:http://xmlns.com/foaf/spec/ .
 Class                      Columns used to create URI        URI example
 dwc:Taxon                  genus + specificEpithet           <bio-onto:taxon/Mirounga leonina>
 dwc:Occurrence             id                                <bio-onto:occurrence/f6bbf85d-85ea-4605>
 dwc:Event                  id                                <bio-onto:event/f6bbf85d-85ea-4605>
 dwc:Dataset                dataset                           <bio-onto:dataset/dwca-mamcenpat-v1.1>
 dc:Location                id                                <bio-onto:location/f6bbf85d-85ea-4605>
 foaf:Agent                 institutionCode                   <bio-onto:agent/cenpat-conicet>

 Property              Columns used                           Example
 dwc:class             class                                  ”Mammalia”∧∧xsd:string
 dwc:family            family                                 ”Phocidae”∧∧xsd:string
 dwc:genus             genus                                  ”Mirounga”∧∧xsd:string
 dwc:kingdom           kingdom                                ”Animalia”∧∧xsd:string
 dwc:order             order                                  ”Carnivora”∧∧xsd:string
 dwc:phylum            phylum                                 ”Chordata”∧∧xsd:string
 dwc:scientificName    scientificName                         ”Mirounga leonina”∧∧xsd:string
 dwc:basisOfRecord     basisOfRecord                          ”PreservedSpecimen”∧∧xsd:string
 dwc:occurrenceRemarks occurrenceRemarks                      ”craneo completo”∧∧xsd:string
 dwc:individualCount   individualCount                        1∧∧xsd:int
 dwc:CatalogNumber     CatalogNumber                          ”100751-1”∧∧xsd:string
 geo-pos:lat           decimalLatitude                        -42.53∧∧xsd:decimal
 geo-pos:long          decimalLongitude                       -63.6∧∧xsd:decimal
 geo-ont:countryCode   country                                ”Argentina”∧∧xsd:string
 dwc:verbatimEventDate verbatimEventDate                      ”2004-10-22”∧∧xsd:date
 foaf:name             recordedBy                             ”CENPAT-CONICET”@en .


     After several interviews with the domain-experts, the key targets of the ontology
model are specified to: (1) Establish marine regions, determined by maximum and min-
imum latitudes. (2) Establish which species of marine mammals are prey or predator and
(3) Determine which species coexist in the economic areas pertaining to Argentina, in
particular to see whether the commercial fishing does alter the ecological balance of the
sea species.

3.2. Ontology Development Life-cycle

Regarding to the methodological strategy of our approach, we keep in line with the tra-
dition that considers ontology as an engineering artifact that is useful to model some as-
pects of the world. That is why we adopted the methodology defined in [22]. The main
stages are: (1) Analysis: We planned to use the ontology first with the DwC classes, we
decided to first try and represent these objects as concepts and then see if some con-
cepts were lacking or inadequate and eventually adjust the structure. (2) Building the
ontology: The building process is iterative. Basically it can be broken down: finding
conditions to constrain the concepts, introducing the properties and/or concepts needed
to build the conditions and building the subsumption hierarchies of concepts and prop-
erties. (3) Evaluation: For the evaluation we take into account two important aspects
highlighted in [23], consistency of the ontology since an inconsistent ontology would
yield questionable results, this task is performed using Ontology Debugger14 a plugin for
Protégé and Pellet reasoner15 , we test the consistency after each set of changes we make,
even if the changes are supposed to be simple. Another factor to evaluate is the complex-
ity, one way to evaluate this is to ask the reasoner to classify the ontology. If this test
takes too much time, it is likely that the ontology will not be usable in real conditions.
If such is the case, corrections are to be made. Since usually the ontology size cannot be
reduced, the general idea is to write simpler restrictions on properties. This means using
a less complicated logic if possible. For instance, using existential restrictions instead
of qualified cardinality restrictions helps keeping the complexity lower for the reasoner.
(4) Maintenance: After complexity test is performed with adequate performance, we
check the ontology’s performance in real use. This is done by testing the applications
exploiting the ontology and evaluating the performance, both in terms of execution speed
and results quality. The analysis of the results help us fine tune the ontology to our exact
needs.

3.3. The Ontology Structure

At this point we might think that the integration of both databases is solved since both
are converted to RDF, their integration should be simple given that a URI can indicate
what kind of entity it identifies. But this is not enough, it is necessary to impose a layer
of computationally tractable meaning, where the relationships that hold between them
can be accurately interpreted and used in an automated way. For example, establishing
relationships between instances of classes through an inverse property, defining cardin-
ality restrictions or establish subsumption relationships. Description Logics (DL) [24] is
an adequate means of representing ontologies. Furthermore, OWL is based on DL, so
we decided to describe our ontology using DL and to implement it in OWL-DL [25].
Both of these flavors are well-supported by existing reasoners and is particularly suit-
able for the type of reasoning we intend to perform at this stage of the work. BioOnto is
based primarily on the DwC classes and terms, but also classes and properties defined by
domain experts.
     As we mentioned earlier DwC [5] is a body of standards for biodiversity inform-
atics. It provides stable terms and vocabularies for sharing biodiversity data. DwC is
maintained by TDWG16 (Biodiversity Information Standards, formerly The International
Working Group on Taxonomic Databases). These terms are organized into nine categor-
ies (often referred to as classes), six of which cover broad aspects of the biodiversity do-
main. Occurrence refers to existence of an organism at a particular place at a particular
time, Location is the place where the organism were observed (normally a geographical
region or place), the Event class is the relationship between Occurrence and Location
this registers sampling protocols and methods, dates, time and field notes, Taxon refers
to scientific names, vernacular names, etc. of the organism observed. The remaining cat-
egories cover relationships to other resources, measurements, and generic information

  14 https://git-ainf.aau.at/interactive-KB-debugging/debugger/wikis/onto-debugger
  15 https://github.com/stardog-union/pellet
  16 http://www.tdwg.org/
about records which in our case are not used. Specifically for the record level, DwC
recommends the use of a number of terms from Dublin Core17 .

3.3.1. Classes and Properties

The main classes of the ontology are taken from the vocabulary specified in DwC, as
well as some of classes and properties therein. The BioOnto structure and properties
assigned to its classes provide a series of useful reasoning tasks that can be formed by
SPARQL queries (see Section 4 for examples). The general model of BioOnto can be
seen in Figure 2. The class hierarchy of the OWL ontology consists of seven key classes:
Occurrence represents instances where the presence of an organism at a Location is
observed. It functions primarily as a node that connects Taxon to Events. Location is
a spatial region or named place. For DwC, a set of terms describing a place, whether
named or not. Event functions primarily as a node to connect one or more Occurrences
to an Event, and one or more Events at a Location. Taxon represents a unit of biodiversity
e.g. species, genus, family, etc. Agent foaf:Agent class is imported directly into BioOnto.
An agent (e.g. person, group, software or physical artifact). Region is an area of the
ocean delimited by a latitude and longitude. Dataset is an identification of the data set
belonging to OBIS or SNDB.


       Figure 2. Entity-relationship diagram of main classes and relationships defined in BioOnto.


      BioOnto defines a number of properties used to link classes (see Table 2), and
also reuses some properties of some commonly used terms like Basic Geo Vocabulary
(WGS84 lat/long18 ) and FOAF19 . In this way BioOnto accrues most of the scientific
and technical vocabulary required to achieve a semantic understanding of the most com-
monly used terms in biogeography and biodiversity, the ontology in OWL can be seen
in20 .

  17 http://dublincore.org/documents/dcmi-terms/
  18 https://www.w3.org/2003/01/geo/
  19 http://xmlns.com/foaf/spec/
  20 https://github.com/cenpat-gilia/BioOnto/blob/master/ontology/BioOnto.owl
Table 2. Object properties that link the main classes in the BioOnto model. Some properties like
is predator of has an inverse property linked by an owl:inverseOf relationship. For each property, the in-
tended subject class is declared the rdfs:domain of the term and the intended object class is declared the
rdfs:range of the term.
                 Object Property      Domain          Range            Inverse
                 has event            Occurrence      Event            event of
                 has location         Event           Location         location of
                 has taxon            Occurrence      Taxon            taxon of
                 into eez             Location        Region           has inside
                 part of              Occurrence      Dataset          has part
                 is predator of       Taxon           Taxon            is prey of
                 recorded by          Occurrence      Agent            -


4. Case study: Behavior of Marine Mammals

This section develops use cases that are relevant to researchers dedicated to the conser-
vation of the marine species and the study of animal behavior. One of the most common
problems faced by the scientists is to determine which species coexist in the Argentinean
Exclusive Economic Zone (AEEZ21 ). One of the species that is of particular interest to
scientists in the Argentine sea littoral is Mirounga Leonina or (Southern Elephant Seal),
because of the relevant information that provides with respect to the sea. For this reason,
Mirounga Leonina are used as oceanographic sampling platforms [26], they are ideal
carriers of electronic devices that register parameters like salinity, position, depth and
temperature, providing huge amounts of information associated with the key habitats.
     The data collected by different individuals during their migration is captured by
these devices, and are available in OBIS. In some cases, new biodiversity related know-
ledge may be discovered by the scientists using SNDB, (e.g. determine which fish species
live in the same region than the elephant seals). This information may not exist in OBIS
records, and therefore it is highly desirable to be able to integrate this information from
SNDB in a transparent way. The following subsections show examples where the axioms
that were defined in the ontology to enable them to answer these questions.

4.1. Inferring Predator/Prey Relationship

Scientists recently discovered that a large percentage of Southern Elephant Seal feeding
is the Lanternfish [27]. Then it is possible to define axioms in OWL to establish the
relationship predator/prey, a knowledge that was only implicit in the ontology. Using
rolification and the chain axiom property it can be expressed the above inference in OWL
with the following three axioms expressed in Manchester OWL [28] syntax:

(1) R_Elephant some Self
(2) R_Lanternfish some Self
(3) R_Elephant o owl : t o p O b j e c t P r o p e r t y o R_Lanternfish SubPropertyOf
    is_p redator_ of


  21 http://www.marineregions.org/gazetteer.php?p=details&id=8466
     Where R Elephant and R LanternFish are object properties, and it is re-
quired to add atoms of the form U(t,u), where U is the universal property, (i.e.,
owl:topObjectProperty). In this way after executing the reasoner, all instances of
the class ElephantSeal will be related through the property is predator of with all
instances of the class LanternFish. This approach was taken from [29] where instead
of using SWRL to define rules, we use OWL axioms.

4.2. Inferring Species into Argentinean Exclusive Economic Zone

To determine the species that inhabit the AEEZ, we can use (since the data are geore-
ferenced by latitude and longitude) axioms OWL to infer that a certain specie is present
in the AEEZ, for this we use the object property into eez and the following axioms
expressed in Manchester OWL syntax.

(4) Location and ( Lat some xsd : double [ >= " -58.4046 " ^^ xsd : double ,
    <= " -32.4483 " ^^ xsd : double ]) and ( long some xsd : double
    [ >= " -69.6095 " ^^ xsd : double , <= " -52.631 " ^^ xsd : double ])
    SubClassOf into_eez value AEEZ


    Any instance of the Location class that is within the maximum and minimum val-
ues of latitude and longitude, will be related by the property into eez with an instance
of ArgentineanEEZ class called AEEZ which represents the exclusive economic zone of
Argentina.

4.3. Querying and Reasoning Facilitated by BioOnto

This section shows an example of simple SPARQL queries that allow to explore the
data of both datasets, in particular we will make use of the information associated to the
species as well as to their location. The following query allows to retrieve the location
of all mammals containing the two datasets. This particular query is important because it
allows in the future to work with interfaces that allow the visualization of georeferenced
data. To run this query through the SPARQL interface of GraphDB, see22 .

PREFIX geo - pos : < http :// www . w3 . org /2003/01/ geo / wgs84_pos # >
PREFIX bio - onto : < http :// www . cenpat - conicet . gob . ar / ontology / >
PREFIX dwc : < http :// rs . tdwg . org / dwc / terms / >

SELECT ? s_name ? lat ? long
WHERE {
         ? s a dwc : Occurrence .
         ? s bio - onto : has_taxon ? taxon .
         ? taxon dwc : scient ificName ? s_name .
         ? taxon dwc : class ? class .
         ? s bio - onto : has_event ? event .
         ? event bio - onto : has_location ? loc .
         ? loc geo - pos : lat ? lat .
         ? loc geo - pos : long ? long .

  22 http://web.cenpat-conicet.gob.ar:7200/sparql?savedQueryName=bio-Loc-of-mammals
            FILTER regex ( STR (? class ) , " Mammalia " )
 }


     In this section we show examples that respond to the questions raised by researchers
studying animal behavior. In the case of predator/prey relations is interesting because it
allows to determine by reasoning, the species that are part of the food chain obtaining
the species of OBIS and SNDB. In addition to this reasoning, we can determine when
a species is considered a top predator, if it is not prey to any other species. Regarding
to the marine regions, we offer the possibility that the user can define a specific marine
region to perform some type of analysis. Reasoning allows to identify the species that
were observed there and determine, for example if the existence of a specie in that region
is due to the fact that it is feeding from another one.


5. Conclusions and Future Work

We have presented the initial steps performed in the ontology design called BioOnto
developed with researchers from CENPAT. The proposed model enables interoperability
and a common knowledge representation among databases SNDB and OBIS, allowing
the retrieval of information that cannot be gathered by any of the individual information
sources alone. Specifically, we presented the preliminary results of the ontology that (1)
allows establishing predator/prey relationships between marine species, (2) define marine
geographic regions and determine what species live there.
     We are currently working on the integration of other databases (e.g. FishBase23 )
which are required in research tasks within the CENPAT to answer questions about the
balance of species and their relationship with commercial fishing.
     We know that this databases could be integrated into the input data layer since the
proposed architecture is flexible to integrate data in structured or semi-structured format.
Furthermore in future works we will consider to link our ontology with relevant marine
ontologies24 .


References

 [1] Mark V Lomolino and James H Brown. Biogeography. Number QH84 L65 2006.
 [2] PN Halpin, Andrew J Read, BD Best, KD Hyrenbach, E Fujioka, MS Coyne, Larry B Crowder, SA Free-
     man, and C Spoerri. Obis-seamap: developing a biogeographic research data commons for the ecological
     studies of marine mammals, seabirds, and sea turtles. Marine Ecology Progress Series, 316:239–246,
     2006.
 [3] Vishwas Chavan and Lyubomir Penev. The data paper: a mechanism to incentivize data publishing in
     biodiversity science. BMC bioinformatics, 12(15):S2, 2011.
 [4] Vishwas S Chavan and Peter Ingwersen. Towards a data publishing framework for primary biod-
     iversity data: challenges and potentials for the biodiversity informatics community. BMC bioinformatics,
     10(14):S2, 2009.
 [5] John Wieczorek, David Bloom, Robert Guralnick, Stan Blum, Markus Döring, Renato Giovanni, Tim
     Robertson, and David Vieglais. Darwin core: An evolving community-developed biodiversity data
     standard. PLoS ONE, 2012.

  23 http://www.fishbase.org/
  24 http://mmisw.org/
 [6]   Ora Lassila and Ralph R Swick. Resource description framework (rdf) model and syntax specification.
       1999.
 [7]   Tim Berners-Lee, James Hendler, Ora Lassila, et al. The semantic web. Scientific american, 284(5):28–
       37, 2001.
 [8]   Steven J Baskauf, John Wieczorek, John Deck, and Campbell O Webb. Lessons Learned from Adapting
       the Darwin Core Vocabulary Standard for Use in RDF.
 [9]   Benjamin M Good and Mark D Wilkinson. The life sciences semantic web is full of creeps! Briefings
       in bioinformatics, 7(3):275–286, 2006.
[10]   O James Reichman, Matthew B Jones, and Mark P Schildhauer. Challenges and opportunities of open
       data in ecology. Science, 331(6018):703–705, 2011.
[11]   Anne E Thessen and David J Patterson. Data issues in the life sciences. 2011.
[12]   Yigal Arens, Chun-Nan Hsu, and Craig A Knoblock. Query processing in the sims information mediator.
       Advanced Planning Technology, 32:78–93, 1996.
[13]   Daphnis De Pooter, Ward Appeltans, Nicolas Bailly, Sky Bristol, Klaas Deneudt, Menashè Eliezer,
       Ei Fujioka, Alessandra Giorgetti, Philip Goldstein, Mirtha Lewis, et al. Toward a new data standard
       for combined marine biological and environmental datasets-expanding obis beyond species occurrences.
       Biodiversity Data Journal, (5), 2017.
[14]   Yannis Tzitzikas, Carlo Allocca, Chryssoula Bekiari, Yannis Marketakis, Pavlos Fafalios, Martin Doerr,
       Nikos Minadakis, Theodore Patkos, and Leonardo Candela. Integrating Heterogeneous and Distrib-
       uted Information about Marine Species through a Top Level Ontology. In Metadata and Semantics Re-
       search: 7th Research Conference, MTSR 2013, Thessaloniki, Greece, November 19-22, 2013. Proceed-
       ings, pages 289–301. Springer International Publishing, 2013.
[15]   Brian J Stucky, John Deck, Tom Conlin, Lukasz Ziemba, Nico Cellinese, and Robert Guralnick. The
       biscicol triplifier: bringing biodiversity data to the semantic web. BMC bioinformatics, 15(1):257, 2014.
[16]   K Döring M Robertson T Remsen D, Braak. Darwin Core Archive How-To Guide. 2011.
[17]   Cristiano Fugazza, Anna Basoni, Stefano Menegon, Alessandro Oggioni, Fabio Pavesi, Monica Pepe,
       Alessandro Sarretta, and Paola Carrara. RITMARE: Semantics- Aware harmonisation of data in Italian
       marine research. In Procedia Computer Science, 2014.
[18]   Adam Leadbetter, Robert Arko, Cynthia Chandler, Adam Shepherd, and Roy Lowry. Linked Data An
       Oceanographic Perspective. The Journal of ocean Technology, 8(3), 2013.
[19]   Adila Krisnadhi, Yingjie Hu, Krzysztof Janowicz, Pascal Hitzler, Robert Arko, Suzanne Carbotte,
       Cynthia Chandler, Michelle Cheatham, Douglas Fils, Timothy Finin, Peng Ji, Matthew Jones, Nazifa
       Karima, Kerstin Lehnert, Audrey Mickle, Thomas Narock, Margaret OBrien, Lisa Raymond, Adam
       Shepherd, Mark Schildhauer, and Peter Wiebe. The GeoLink modular oceanography ontology. In Lec-
       ture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture
       Notes in Bioinformatics), 2015.
[20]   Ruben Verborgh and Max De Wilde. Using OpenRefine. Packt Publishing Ltd, 2013.
[21]   Eric Prud’hommeaux and Andy Seaborne. SPARQL query language for RDF – W3C recommendation.
       Technical report, W3C, 2008.
[22]   Natalya F Noy, Deborah L McGuinness, et al. Ontology development 101: A guide to creating your first
       ontology, 2001.
[23]   Steffen Staab and Rudi Studer. Handbook on ontologies. Springer Science & Business Media, 2010.
[24]   Franz Baader. The description logic handbook: Theory, implementation and applications. Cambridge
       university press, 2003.
[25]   Ian Horrocks, Peter F Patel-Schneider, and Frank Van Harmelen. From shiq and rdf to owl: The making
       of a web ontology language. Web semantics: science, services and agents on the World Wide Web,
       1(1):7–26, 2003.
[26]   Fabien Roquet, Carl Wunsch, Gael Forget, Patrick Heimbach, Christophe Guinet, Gilles Reverdin,
       Jean Benoit Charrassin, Frederic Bailleul, Daniel P. Costa, Luis A. Huckstadt, Kimberly T. Goetz, Kit M.
       Kovacs, Christian Lydersen, Martin Biuw, Ole A. Nøst, Horst Bornemann, Joachim Ploetz, Marthan N.
       Bester, Trevor McIntyre, Monica C. Muelbert, Mark A. Hindell, Clive R. McMahon, Guy Williams,
       Robert Harcourt, Iain C. Field, Leon Chafik, Keith W. Nicholls, Lars Boehme, and Mike A. Fedak. Es-
       timates of the Southern Ocean general circulation improved by animal-borne instruments. Geophysical
       Research Letters, 2013.
[27]   Jade Vacquié-Garcia, Christophe Guinet, Cécile Laurent, and Frédéric Bailleul. Delineation of the south-
       ern elephant seal s main foraging environments defined by temperature and light conditions. Deep Sea
       Research Part II: Topical Studies in Oceanography, 113:145–153, 2015.
[28]   Matthew Horridge, Nick Drummond, John Goodwin, Alan L Rector, Robert Stevens, and Hai Wang.
       The manchester owl syntax. In OWLed, volume 216, 2006.
[29]   Md Kamruzzaman Sarker, David Carral, Adila Alfa Krisnadhi, and Pascal Hitzler. Modeling owl with
       rules: The rowl protege plugin. In International Semantic Web Conference (Posters & Demos), 2016.