=Paper= {{Paper |id=Vol-2042/paper36 |storemode=property |title=Leveraging Logical Rules for Efficacious Representation of Large Orthology Datasets |pdfUrl=https://ceur-ws.org/Vol-2042/paper36.pdf |volume=Vol-2042 |authors=Tarcisio M. Farias,Hirokazu Chiba,Jesualdo T. Fernández-Breis |dblpUrl=https://dblp.org/rec/conf/swat4ls/FariasCF17 }} ==Leveraging Logical Rules for Efficacious Representation of Large Orthology Datasets== https://ceur-ws.org/Vol-2042/paper36.pdf
          Leveraging logical rules for efficacious
        representation of large orthology datasets

 Tarcisio M. de Farias1,2, Hirokazu Chiba3, and Jesualdo T. Fernández-Breis4

     1 Department of Computational Biology, University of Lausanne, Switzerland
     2 SIB Swiss Institute of Bioinformatics, Switzerland
       tarcisio.mendesdefarias@unil.ch
     3 Database Center for Life Science (DBCLS), ROIS, Japan
       chiba@dbcls.rois.ac.jp
     4 Departamento de Informática y Sistemas, Universidad de Murcia, IMIB-Arrixaca,
       30100 Murcia, Spain
       jfernand@um.es



        Abstract. In the semantic web applied to life sciences, ontologies provide
        a basis to define concepts and to describe data in biological databases,
        thereby facilitating data interoperability across multiple resources. In the
        context of evolutionary genetics, the best corresponding genes across dif-
        ferent species (e.g. the insulin genes in the pig and the human) are called
        “orthologs”. Dozens of bioinformatic resources identify and describe such
        orthologs. To represent the orthology content, an OWL-based orthology
        ontology (ORTH) was recently proposed. However, the ORTH ontology lacks
        a basis for inferring pairwise relations between genes, as well as more specific
        and accurate definitions of class restrictions, property domains and property
        ranges, which hampers wider adoption by orthology resources. To
        address this issue, we present in this paper our common efforts to define
        a release candidate of a second version of ORTH ontology. By using this
        ontology, we propose a logical rule-based approach to infer information
        which is not explicitly defined in the primary data. As a benefit of our
        approach, for example, we can avoid the materialization of several billion
        triples to represent the “is orthologous to” relation when considering the
        Orthologous Matrix (OMA) dataset.

        Keywords: ORTH ontology, OWL, Horn-like rule, ortholog, paralog,
        orthology database


1   Introduction
The shared genes among different species are evidence of evolution from a com-
mon ancestor. For example, we share approximately 90% of our genes with mice.
These related genes are called orthologs. Orthologs are genes in different species
that evolved from a common ancestral gene by a speciation event. These genes
are normally thought to retain the same function. The functional conservation
of related genes across species explains the success of model organism-based re-
search, which enables knowledge on human biology and medicine to be gained
from other species, such as mice, fruit fly, or yeast. In this context, knowledge
of the orthologs between, say, mice and humans allows for studying biological
processes in mice, and then transferring the knowledge to humans.
    In the field of life sciences, ontologies have been identified as a key funda-
mental technology to achieve data interoperability across multiple resources and
to annotate data, the Gene Ontology [3] being the most popular and successful
one. The interest in ontologies in biomedicine can be illustrated by the fact that
repositories such as BioPortal [12] contain at the time of writing more than six
hundred biomedical ontologies, terminologies and controlled vocabularies. The
community of orthology researchers has shown growing interest in ontologies in
recent years, since the creation of the Quest for Orthologs (QfO) consortium1.
QfO pursues the standardization and interoperability of orthology resources and
methods, including the development of common standards and formats for the
representation of orthology information and knowledge. The 2013 QfO meeting
[14] identified the potential benefits of semantic web technologies for the inter-
operability of orthology information. Since then, QfO researchers developed the
first version of the Orthology Ontology (ORTH)2 , which served to demonstrate
the feasibility of creating semantically interoperable orthology resources [6].
     The experience with the ORTH has shown some limitations for the activities
needed by the QfO community. More concretely, new orthology-related concepts
need to be formalized in the ontology and some aspects of the current represen-
tation need to be improved in order to permit a more powerful, reasoning-based
exploitation of orthology data. In this paper, we justify why such changes
are necessary in ORTH and present our common efforts to define a release
candidate (RC) of a second version of ORTH. In addition, we examine and
compare the performance of two ways of executing queries that require inferencing.
The main goal of this evaluation is to find the most appropriate approach
to infer pairwise orthology relations without needing to materialize them, since
materialization would significantly increase the number of triples to store in the
already large orthology datasets. Therefore, the main contribution of this paper
is how to efficaciously store orthology information using the Resource Description
Framework (RDF). The extension and re-engineering of the ORTH ontology are only
a step to achieve this goal.
    The structure of the rest of the paper is described next. In Section 2, we will
provide some background on orthology and on inferencing using semantic web
content. Section 3 will present the changes made to the ORTH. The method
for inferring pairwise orthology relations will be explained in Section 4. The
experimental results of comparing the execution of inference-based queries to
obtain pairwise orthology relations will be shown and discussed in Section 5.
Finally, some conclusions will be put forward in Section 6.


1 https://questfororthologs.org/
2 http://purl.org/net/orth

2     Background

2.1    Basic concepts about orthology

Definition 1. Homologs are genes related to each other by descent from a common
ancestor. Homology is the more general term: it covers both the relationship between
genes separated by a speciation event (see Definition 2 for Ortholog) and the
relationship between genes separated by a genetic duplication event (see Definition
3 for Paralog).
Definition 2. Orthologs are genes in different species that evolved from a com-
mon ancestral gene by speciation. The orthologs are normally thought to retain
the same function in the course of evolution [7].
Definition 3. Paralogs are genes related by duplication. Unlike the general thought
for orthologs (see Definition 2), paralogs are more likely to evolve new functions.
Paralogs can be classified as inparalog and outparalog [7].
Definition 4. Xenologs are homologous genes that are neither orthologs nor
paralogs according to the above definitions, but appear to be orthologous in genome
comparisons [7]. They occur due to horizontal gene transfer [15].
Definition 5. Hierarchical Orthologous Groups (HOGs) are defined as sets of
genes that have descended from a single common ancestor within a taxonomic
range of interest [2]. In the computer science context, the data structure used to
represent a HOG is a tree.
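
To make Definition 5 concrete, the following is a minimal Python sketch of a HOG stored as a tree. The class name HogNode, its fields and the gene labels are hypothetical and chosen only for illustration; they are not part of the ORTH ontology or of any orthology database.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class HogNode:
    """One node of a HOG tree (hypothetical class, not part of ORTH)."""
    label: str
    # "speciation" or "duplication" for internal nodes, None for a gene leaf.
    event: Optional[str] = None
    children: List["HogNode"] = field(default_factory=list)

    def genes(self) -> Iterator[str]:
        """Yield the labels of all gene leaves below (or at) this node."""
        if not self.children:
            yield self.label
        for child in self.children:
            yield from child.genes()


# Toy HOG: a speciation at the root and one duplication in the mouse lineage.
hog = HogNode("root", "speciation", [
    HogNode("HUMAN_g1"),
    HogNode("mouse_dup", "duplication", [HogNode("MOUSE_g1"), HogNode("MOUSE_g2")]),
])

print(list(hog.genes()))
```

Each gene appears once as a leaf, so the number of stored nodes grows with the number of genes rather than with the number of gene pairs.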


2.2    Inference-based exploitation of orthology content

There is little experience in the optimization of queries on large RDF orthology
datasets. In [6], SPARQL queries were used for obtaining pairwise orthology
relations, and those queries required the use of some properties defined in the
ORTH in a transitive way. Such inferencing capability has to be provided by
a triple store supporting SPARQL 1.1. In previous work, such queries were
executed over a series of graphs available in the same triple store. In [4], the
authors use the ORTH to compose conjunctive queries over various knowledge bases
(KBs) such as Microbial Genome Database (MBGD)3 and Universal Protein
Resource (UniProt)4, although they did not investigate possible optimizations
for executing inference-based SPARQL queries.
    SPARQL query rewriting is a query optimization approach whose popularity
has increased significantly in recent years, and it is especially useful when infer-
encing is an important component in the execution of the queries [9]. SPARQL
query rewriting is based on changing the graph pattern included in the query,
ensuring that the semantics of the query is preserved by using mappings be-
tween the query elements and the ontology. The rewriting can affect the subject,
predicate or object of the triples of the query patterns.
3 http://mbgd.genome.ad.jp
4 http://www.uniprot.org

    Languages such as SWRL5, RIF6 or SPIN7 also support inferencing
in data exploitation. SWRL and RIF permit the definition and the execution of
Horn-like rules, and SPIN is built on top of SPARQL. However, to the best of our
knowledge, neither SPARQL query rewriting nor the other languages mentioned
have been explored as solutions for the exploitation of large orthology datasets.


3   Constructing the updated ontology

One of the main advantages of a DL-based ontology for knowledge representation
is the ability to leverage Horn-like rules to infer information which is not explicitly
described in the primary data. In the context of recent genomics, leveraging
inference enables us to store a large dataset in a compact form by retrieving
implicit information on demand (see Section 4 for further details). However, the
previously published ORTH ontology has several issues to be addressed in order
to take advantage of the DL-based ontological representation:

 1. The ORTH ontology is not fully compliant with OWL 2 DL due to imported
    ontologies.
 2. There are no properties to describe pairwise relations between genes.
 3. Definitions of property domains and ranges are missing.
 4. Class restrictions need to be reviewed.
 5. Several species are missing from the imported taxonomy ontology.

    In the following paragraphs, we present how we solve those issues. For the
sake of simplicity, in the rest of this paper we omit the namespace prefixes
whenever it does not compromise the understandability.
    DL compliance. The first release of the ORTH ontology8 asserts that
rdfs:Resource ⊑ ⊤ (i.e. rdfs:Resource a owl:Class) and ⊤ ⊑ ∀hasSource.rdfs:Resource.
Nevertheless, in the OWL 2 DL profile, for the sake of decidability, an entity
cannot be an instance and a class at the same time. As a reminder,
rdfs:Resource is an instance of rdfs:Class and owl:Class is a subclass of rdfs:Class.
Therefore, not all RDFS classes are legal OWL DL classes. Although this issue is
not a significant problem in terms of data modeling, without fixing it we
cannot take advantage of the available reasoning tools. These tools are
fundamentally important to our Horn-like rule-based approach presented in Section
4. To address this first issue, we removed the axioms rdfs:Resource ⊑ ⊤ and
⊤ ⊑ ∀hasSource.rdfs:Resource.
    Pairwise relations. In genetics, we can relate genes according to a common
ancestral DNA sequence through relationships such as homolog, ortholog, paralog,
xenolog, inparalog and outparalog. The first version of the ORTH ontology makes it
possible to obtain the pairwise relations by means of SPARQL queries over the
5 https://www.w3.org/Submission/SWRL/
6 https://www.w3.org/TR/rif-overview/
7 http://spinrdf.org/
8 https://bioportal.bioontology.org/ontologies/ORTH

semantic representation of the HOGs, but does not contain properties to assert
these relations between genes. However, being able to represent, persist
and exploit such relations is needed for some exploitation scenarios. To be
able to represent the pairwise relations, we include the axioms in Listing 3.1.
                              ⊤ ⊑ ∀hasHomolog.SequenceUnit
                              ∃hasHomolog.⊤ ⊑ SequenceUnit
                              hasOrtholog ⊑ hasHomolog
                              hasParalog ⊑ hasHomolog
                              hasXenolog ⊑ hasHomolog

       Listing 3.1. The axioms added to describe homologous pairwise relations.

Properties similar to hasHomolog, hasOrtholog and hasParalog already exist in
the Semanticscience Integrated Ontology (SIO). However, SIO does not
specify the domain and range of these properties. Moreover, SIO is a more
general-purpose ontology that has been reused in ORTH. Nonetheless, for the sake
of interoperability, we can state that the ORTH pairwise relations are
subproperties of their corresponding SIO properties, where these exist.
    Property and class restrictions. As an example of a property range
modification, we changed the range of the hasCluster property from GeneTreeNode
to the HomologsCluster class. This is because the property value must not be a
gene but a cluster. Further details of changes in class restrictions and property
domains and ranges in the ORTH ontology are available at the following URL:
https://github.com/qfo/OrthologyOntology.
    Species taxonomy ontology. The NCBI organismal taxonomy ontology
used in the first version of the ORTH ontology refers to a view of the NCBITaxon
ontology9. Thus, it does not describe an exhaustive list of species. Because of
this, we replaced the NCBI_1 class with NCBITaxon_1, which is the root taxonomy
class in the NCBITaxon ontology.
    Several classes in life science ontologies are not supposed to be
instantiated, or they are singleton classes (i.e. the class is only instantiated once).
Some examples are the classes of the following life science ontologies: Gene
Ontology [3], UBERON ontology [11], SIO ontology and also NCBITaxon ontology.
Therefore, when importing the NCBITaxon ontology along with the new version
of the ORTH ontology, one instance must be created for each species class
to assign the ‘in taxon’ property to a SequenceUnit instance, which is done
using the Punning10 feature of OWL 2. This class instantiation is necessary to
be DL compliant because an NCBITaxon class cannot be directly assigned to
the ‘in taxon’ property. As a reminder, only an instance can be a value of an
object property. Further analysis of the drawbacks of defining a large
Terminological Box (TBox) with singleton classes instead of having a smaller TBox with
a relevant Assertional Box (ABox) is beyond the scope of this paper. For
information, the NCBITaxon ontology contains about 1,600,000 classes.
    To build the new RC ORTH Ontology, we made 27 modifications in the previ-
ous ORTH ontology version that include adding and removing properties, prop-
9 http://www.obofoundry.org/ontology/ncbitaxon.html
10 https://www.w3.org/TR/owl2-new-features/#F12:_Punning

erty domain, property range, classes and class restrictions. A full description of
these modifications is available on https://github.com/qfo/OrthologyOntology.
The RC ORTH ontology is available to download on the following URL:
http://purl.org/net/orth_rc.


4    Inferring pairwise relations from hierarchical structures

End-users are typically interested in pairwise relationships such as “is orthologous
to”. With the DL-based RC ORTH ontology described in Section 3, we can now
assert pairwise relations between genes. However, today’s orthology information
providers store all pairwise relationships, which grow quadratically with the
number of genes or genomes. To address this problem, we capture the implicit
information of pairwise relationships with an inference engine. This information
is implicitly structured in HOGs (see Section 2 for further details). In doing so,
the data to be stored and retrieved scales linearly. For example, we do not need
to store pairwise orthologs between species because they can be inferred by
applying the R1 Horn-like rule shown in Listing 4.1. Thus, with our approach we
can infer new information instead of materializing it. For example, we can avoid
the materialization of 6,464,814,646 triples to explicitly define orthologous
relationships when considering solely 1,048,561 out of 4,172,982 orthologous
clusters in the latest Orthologous Matrix (OMA) database (DB) release. By
comparison, using the HOGs, we need only 16,911,449 triples to implicitly define
the pairwise orthologs from HOGs in OMA.
R1: OrthologsCluster(cluster) ∧ hasHomologousMember(cluster, node1) ∧
  hasHomologousMember(cluster, node2) ∧ ‘has part’(node2, seq2) ∧ ‘has part’(node1, seq1) ∧
  SequenceUnit(seq1) ∧ SequenceUnit(seq2) ∧ (node1 ≠ node2) → hasOrtholog(seq1, seq2)

R2: ParalogsCluster(cluster) ∧ hasHomologousMember(cluster, node1) ∧
  hasHomologousMember(cluster, node2) ∧ ‘has part’(node2, seq2) ∧ ‘has part’(node1, seq1) ∧
  SequenceUnit(seq1) ∧ SequenceUnit(seq2) ∧ (node1 ≠ node2) → hasParalog(seq1, seq2)

Listing 4.1. The Horn-like rules that infer the hasOrtholog (R1) and hasParalog (R2)
properties for a given SequenceUnit instance (e.g. Gene instance).
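
As an illustration only, the rules in Listing 4.1 can be mimicked in a few lines of Python over a toy dataset. This is a sketch under stated assumptions, not the inference engine used in this paper: all identifiers are hypothetical, and ‘has part’ is approximated by the reflexive-transitive closure of cluster membership, mirroring the :hasHomologousMember* property path of the SPARQL formulation.

```python
from itertools import permutations

# Toy data (hypothetical identifiers). Each cluster maps to its direct members.
members = {
    "hog1": ["HUMAN_g1", "para1"],      # hog1 is an OrthologsCluster
    "para1": ["MOUSE_g1", "MOUSE_g2"],  # para1 is a ParalogsCluster
}
cluster_type = {"hog1": "OrthologsCluster", "para1": "ParalogsCluster"}
sequence_units = {"HUMAN_g1", "MOUSE_g1", "MOUSE_g2"}


def parts(node):
    """Reflexive-transitive closure of membership, standing in for 'has part'."""
    found = {node}
    for child in members.get(node, []):
        found |= parts(child)
    return found


def infer(cluster_class):
    """Apply R1 (OrthologsCluster) or R2 (ParalogsCluster) to the toy data."""
    pairs = set()
    for cluster, kind in cluster_type.items():
        if kind != cluster_class:
            continue
        # node1 != node2: ordered pairs of distinct direct members.
        for node1, node2 in permutations(members[cluster], 2):
            for seq1 in parts(node1) & sequence_units:
                for seq2 in parts(node2) & sequence_units:
                    pairs.add((seq1, seq2))
    return pairs


has_ortholog = infer("OrthologsCluster")
has_paralog = infer("ParalogsCluster")
print(sorted(has_ortholog))
print(sorted(has_paralog))
```

In this toy case, two membership maps and two type assertions yield four hasOrtholog pairs and two hasParalog pairs, none of which has to be stored.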

    Listing 4.2 contains the subquery equivalent to the R1 rule in Listing 4.1
for retrieving the implicit hasOrtholog assertions. This subquery can be used with
a SPARQL query rewrite approach [8] to infer the hasOrtholog relations between
genes (or proteins). Therefore, it is an alternative to a general-purpose
inference engine. For example, triple stores that do not fully support
reasoning can use the Listing 4.2 subquery to replace the occurrences of
hasOrtholog in the original SPARQL query. Let us suppose the
following SPARQL query: SELECT * { ?g1 :hasOrtholog ?g2. ?g1 :geneName
‘APOC1’. }. By parsing this query, a SPARQL query rewrite approach identifies
the basic graph pattern (BGP) ?g1 :hasOrtholog ?g2, which is replaced with
the graph pattern between braces in Listing 4.2, with variables renamed accordingly
(e.g. ?seq_1 is replaced with ?g1). The expanded query is then executed in a SPARQL
endpoint (i.e. triple store). Moreover, in Section 5, we present the performance
in terms of query execution time and retrieved results along with a discussion
about the benefits and drawbacks of both approaches.
            SELECT ?seq_1 ?seq_2 {
              ?cluster a :OrthologsCluster.
              ?cluster :hasHomologousMember ?node_1.
              ?cluster :hasHomologousMember ?node_2.
              ?node_1 :hasHomologousMember* ?seq_1.
              ?node_2 :hasHomologousMember* ?seq_2.
              {?seq_1 a :Gene. ?seq_2 a :Gene.} UNION
              {?seq_1 a :Protein. ?seq_2 a :Protein.}
              FILTER (?node_1 != ?node_2)}

Listing 4.2. The subquery to assert the hasOrtholog property for a given SequenceUnit
instance (e.g. Gene or Protein instance).
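
The rewriting step can be sketched in Python as follows. This is a deliberately naive, string-based illustration, not the rewriting method of [8]: the names ORTHOLOG_PATTERN, TRIPLE and rewrite are hypothetical, the regular expression only handles simple triple patterns terminated by a dot, and a production rewriter would operate on the parsed query and rename ?cluster, ?node_1 and ?node_2 to avoid clashes with variables of the original query.

```python
import re

# Graph pattern of Listing 4.2, with placeholders for the two sequence variables.
ORTHOLOG_PATTERN = (
    "?cluster a :OrthologsCluster. "
    "?cluster :hasHomologousMember ?node_1. "
    "?cluster :hasHomologousMember ?node_2. "
    "?node_1 :hasHomologousMember* {s}. "
    "?node_2 :hasHomologousMember* {o}. "
    "{{ {s} a :Gene. {o} a :Gene. }} UNION "
    "{{ {s} a :Protein. {o} a :Protein. }} "
    "FILTER (?node_1 != ?node_2)"
)

# Matches a triple pattern "<subject> :hasOrtholog <object>." in the query text.
TRIPLE = re.compile(r"(\S+)\s+:hasOrtholog\s+(\S+?)\s*\.")


def rewrite(query: str) -> str:
    """Replace every :hasOrtholog triple pattern with the expanded group pattern."""
    return TRIPLE.sub(
        lambda m: "{ " + ORTHOLOG_PATTERN.format(s=m.group(1), o=m.group(2)) + " }",
        query,
    )


query = "SELECT * { ?g1 :hasOrtholog ?g2. ?g1 :geneName 'APOC1'. }"
expanded = rewrite(query)
print(expanded)
```

The expanded query no longer mentions :hasOrtholog, so it can be sent to a triple store that supports SPARQL 1.1 property paths but no rule reasoning.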

    The R2 rule in Listing 4.1 is a Horn-like rule to infer the hasParalog property.
The equivalent SPARQL subquery for hasParalog is similar to the subquery
in Listing 4.2, except that the first triple pattern in Listing 4.2, ?cluster a
:OrthologsCluster, is replaced with ?cluster a :ParalogsCluster.
    Some resources actually use orthologous clusters as homologous clusters. To
solve this issue at the query level, we can add a condition to the R1 rule in
Listing 4.1 and to the query in Listing 4.2 to only consider genes/proteins in
different species (i.e. orthologs). Nevertheless, the concepts of homolog and ortholog
should not be confused.
    As a consequence of our proposed Horn-like rule-based approach, it also
becomes easier to write queries for retrieving orthology information, since the
second version of the ORTH ontology is more fine-grained. Some property values
are assigned by applying Horn-like rules (e.g. Semantic Web Rule
Language rules) at query execution time.


5   Results and Discussion

To further justify the gain in terms of storage by inferring pairwise relations
instead of materializing them, we inferred about 8,034,238,900 hasParalog as-
sertions between proteins in the OMA DB by considering the R2 rule in Listing
4.1. These inferred assertions also consider the symmetric inferences (i.e. if A
hasParalog B then B hasParalog A). Therefore, with the ORTH ontology based
on HOGs, we can efficaciously represent RDF-based homology relations such as
hasParalog and hasOrtholog.
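
The storage figures reported in Section 4 can be put side by side with a short calculation. The sketch below only reuses the two triple counts stated there; the function ordered_pairs is a hypothetical helper that illustrates the quadratic growth of pairwise assertions for a group in which every sequence is related to every other one.

```python
materialized = 6_464_814_646  # hasOrtholog triples whose materialization is avoided (Section 4)
hog_triples = 16_911_449      # triples needed by the HOG-based representation (Section 4)

ratio = materialized / hog_triples
print(f"materialized / HOG-based = {ratio:.0f}x")


def ordered_pairs(n: int) -> int:
    """Number of ordered pairs among n mutually related sequences: n * (n - 1)."""
    return n * (n - 1)


# Pairwise assertions grow quadratically, while a tree over n leaves grows linearly.
for n in (10, 100, 1000):
    print(n, ordered_pairs(n))
```

The ratio is only indicative, since the two counts in Section 4 do not cover exactly the same set of clusters.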
    The experiment consisted of comparing the time performance of the SPARQL
query rewrite and DL-safe [10] Horn-like rule based approaches. For this purpose,
we used the subqueries presented in Section 4. Each query was
executed thirty times for each approach. We considered only one OMA
HOG at the LUCA taxonomic level, which contains 2,727 proteins. In this experiment,
we used the Stardog 5 triple store [1] with 6GB of dedicated RAM.
All the tests were run on a computer with a 3.5GHz dual-core Intel Core
i7 processor, Turbo Boost up to 4.0GHz, 16GB of 2133MHz LPDDR3 memory
and 1TB SSD. We chose Stardog because it supports DL-safe Horn-like rules
combined with OWL 2 constructs and reasoning at query
execution time [5, 13].
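
The measurement protocol described above (thirty executions per query and approach, reported as mean and standard deviation) can be sketched as follows. This is an assumed harness, not the script used for the experiments: the function benchmark and its run_query argument are hypothetical, the sample standard deviation is an assumption since the text does not specify which estimator was used, and the workload below is a stand-in for a call to the SPARQL endpoint.

```python
import statistics
import time
from typing import Callable, Tuple


def benchmark(run_query: Callable[[], object], repetitions: int = 30) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of the wall-clock time in ms.

    Requires repetitions >= 2 so that the standard deviation is defined.
    """
    timings_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(timings_ms), statistics.stdev(timings_ms)


# Stand-in workload; a real run would submit Q1 or Q2 to the triple store instead.
mean_ms, std_ms = benchmark(lambda: sum(range(10_000)))
print(f"mean = {mean_ms:.3f} ms, std = {std_ms:.3f} ms")
```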
    We executed the Q1 and Q2 queries in Listing 5.1 by using a SPARQL query
rewrite approach and the Stardog’s DL-safe rule inference engine. The Q1 query
retrieves all hasOrtholog relations of the protein with the HUMAN29522 OMA
identifier. This protein is the cytochrome c oxidase subunit 1 encoded by the
MT-CO1 gene. Table 1 presents the results obtained in terms of query execution
time in milliseconds (mean and standard deviation) and the number of retrieved
results for the 30 executions of Q1 and Q2 queries. The Q2 query (see Listing
5.1) retrieves all hasParalog relations for the same protein (i.e. HUMAN29522 ).
               Q1: SELECT ?seq_1 { ?seq_1 orth:hasOrtholog oma:PROTEIN_HUMAN29522 }
               Q2: SELECT ?seq_1 { ?seq_1 orth:hasParalog oma:PROTEIN_HUMAN29522 }

Listing 5.1. Querying the orthologous (Q1) and paralogous (Q2) genes of MT-CO1
human gene in OMA database.

    From Table 1, we can conclude that the SPARQL query rewrite approach is
≈106ms and ≈40ms faster on average than the DL-safe rule based approach
at retrieving the same number of hasOrtholog and hasParalog assertions,
respectively. As a reminder, for the results in Table 1, we only considered the HOG
that contains the HUMAN29522 protein, whereas there are 589,223 HOGs in the
OMA DB. Table 2 shows the results of executing the queries in Listing 5.1 taking
into account all OMA HOGs and using a timeout of 5 minutes.
 Query  Approach              Mean time (ms)  Std deviation (σ)  #Results
 Q1     SPARQL query rewrite  193.7           33.8               2,722
 Q1     DL-safe rule based    300.3           78.1               2,722
 Q2     SPARQL query rewrite  65.1            13.0               4
 Q2     DL-safe rule based    104.6           17.8               4
Table 1. Performance comparison between SPARQL query rewrite and DL-safe Horn-
like rule based approaches for Q1 and Q2 queries in Listing 5.1.
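
The differences quoted in the text can be recomputed directly from the mean times in Table 1. The dictionary layout and the helper mean_gap below are hypothetical; only the four mean values are taken from the table.

```python
# Mean query times in milliseconds, copied from Table 1.
table1 = {
    ("Q1", "SPARQL query rewrite"): 193.7,
    ("Q1", "DL-safe rule based"): 300.3,
    ("Q2", "SPARQL query rewrite"): 65.1,
    ("Q2", "DL-safe rule based"): 104.6,
}


def mean_gap(query: str) -> float:
    """Mean time of the rule based approach minus that of query rewriting (ms)."""
    return table1[(query, "DL-safe rule based")] - table1[(query, "SPARQL query rewrite")]


for q in ("Q1", "Q2"):
    print(q, round(mean_gap(q), 1), "ms")
```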

    Table 2 shows that the DL-safe Horn-like rule based approach is not
able to retrieve any results within 5 minutes of query execution using the
Stardog triple store. This is mainly because the Horn-like rules to infer hasParalog
and hasOrtholog relations contain a transitive property labeled “has part”
instead of the :hasHomologousMember* SPARQL property path11 (see the query
in Listing 4.2). The performance issues arise because Stardog first processes
the ‘has part’ transitive property, which has no bound subject
or object. Therefore, Stardog attempts to infer all possible ‘has part’
assertions over all HOGs before applying the join operations. As a reminder,
for the tests in Table 2, we are considering the whole OMA DB, which contains
9,443,947 proteins, not counting alternative splicing. This explains why the
DL-safe rule based approach in Stardog cannot retrieve any result
within milliseconds. However, by using the :hasHomologousMember* SPARQL
11 https://www.w3.org/TR/sparql11-property-paths/

property path, Stardog computes a better query execution plan, as shown in
Table 2. Because of this, Stardog’s SPARQL processor retrieves all results in
milliseconds. This also explains why the SPARQL query rewrite approach had
better results than the DL-safe rule based one in Table 1 when considering only
one HOG.
 Query  Approach              Mean time (ms)  Std deviation (σ)  #Results
 Q1     SPARQL query rewrite  216.5           109.5              2,722
 Q1     DL-safe rule based    300,000         -                  -
 Q2     SPARQL query rewrite  66.8            16.2               4
 Q2     DL-safe rule based    300,000         -                  -
Table 2. Performance comparison between SPARQL query rewrite and DL-safe Horn-
like rule based approaches for Q1 and Q2 queries in Listing 5.1 by considering the
entire OMA database.

    Despite Stardog’s performance in processing transitive properties reported in
this section, the main benefit of using the Horn-like rule based approach described
in Section 4 is the possibility of reusing inferred concepts and properties to
define other Horn-like rules. This can be done in a modular way similar to
a function in traditional programming languages (e.g. C language). Therefore,
implicit information in an orthology database becomes explicit by defining these
logical rules. Another benefit is the fact that we can take advantage of general
purpose inference engines to process the Horn-like rules.


6   Conclusion

To build the RC of a second version of the ORTH ontology, we made 27 modifi-
cations in the previous ORTH version that include adding and removing prop-
erties, property domain, property range, classes and class restrictions. We also
discussed how the ORTH ontology should be instantiated to avoid for example
non-compliance with DL due to imported ontologies. Moreover, we described the
benefits of using a rule based approach to infer new information from the orthol-
ogy data. In doing so, we can drastically reduce the number of stored triples,
facilitate the work of writing SPARQL queries and reuse inferred properties to
define new rules. We also discussed the performance issues of a Horn-like rule
based approach compared to a query rewrite approach. Although our experiments
with Stardog show that a SPARQL query rewrite approach is more
efficient, we cannot conclude that it is significantly better than a DL-safe Horn-like
rule-based one. This is because Stardog does not compute the query execution
plan in the same way for transitive properties and for SPARQL property paths.
     One final remark concerns the need to repeat the tests in Section 5 using
alternative triple stores that support Horn-like rules combined with OWL
2 constructs and perform reasoning at query execution time. In future work
we will consider annotating the ORTH entities by harnessing natural language
processing and keyword searching techniques.

Acknowledgements
This work has been financed by the Swiss National Research Programme (NFP)
75 (see http://www.nfp75.ch) - SNSF Project 167149. Part of the work was
supported by the ROIS International Networking project and conducted through
NBDC/DBCLS BioHackathon 2017 (see http://www.biohackathon.org).


References
 1. Complexible Inc.: Stardog 5: The manual (2017). Available online:
    http://docs.stardog.com/. Last accessed on October 10th, 2017.
 2. Altenhoff, A.M., Gil, M., Gonnet, G.H., Dessimoz, C.: Inferring hierarchical or-
    thologous groups from orthologous gene pairs. PLoS One 8(1) (2013) e53786
 3. Ashburner, M., Ball, C.A., Blake, J.A., Botstein, D., Butler, H., Cherry, J.M.,
    Davis, A.P., Dolinski, K., Dwight, S.S., Eppig, J.T., et al.: Gene ontology: tool for
    the unification of biology. Nature genetics 25(1) (2000) 25
 4. Chiba, H., Uchiyama, I.: SPANG: a SPARQL client supporting generation and reuse
    of queries for distributed RDF databases. BMC bioinformatics 18(1) (2017) 93
 5. de Farias, T.M., Roxin, A., Nicolle, C.: SWRL rule-selection methodology for
    ontology interoperability. Data & Knowledge Engineering 105 (2016) 53–72
 6. Fernández-Breis, J.T., Chiba, H., del Carmen Legaz-Garcı́a, M., Uchiyama, I.: The
    orthology ontology: development and applications. Journal of biomedical semantics
    7(1) (2016) 34
 7. Koonin, E.V.: Orthologs, paralogs, and evolutionary genomics. Annu. Rev. Genet.
    39 (2005) 309–338
 8. Makris, K., Gioldasis, N., Bikakis, N., Christodoulakis, S.: Ontology mapping
    and SPARQL rewriting for querying federated RDF data sources. On the Move to
    Meaningful Internet Systems, OTM 2010 (2010) 1108–1117
 9. Makris, K., Gioldasis, N., Bikakis, N., Christodoulakis, S.: SPARQL rewriting for
    query mediation over mapped ontologies. Technical University of Crete (2010)
10. Motik, B.: Reasoning in description logics using resolution and deductive
    databases. PhD thesis
11. Mungall, C.J., Torniai, C., Gkoutos, G.V., Lewis, S.E., Haendel, M.A.: Uberon,
    an integrative multi-species anatomy ontology. Genome biology 13(1) (2012) R5
12. Noy, N.F., Shah, N.H., Whetzel, P.L., Dai, B., Dorf, M., Griffith, N., Jonquet, C.,
    Rubin, D.L., Storey, M.A., Chute, C.G., et al.: Bioportal: ontologies and integrated
    data resources at the click of a mouse. Nucleic acids research 37(suppl 2) (2009)
    W170–W173
13. Pauwels, P., de Farias, T.M., Zhang, C., Roxin, A., Beetz, J., De Roo, J., Nicolle, C.:
    A performance benchmark over semantic rule checking approaches in construction
    industry. Advanced Engineering Informatics 33 (2017) 68–88
14. Sonnhammer, E., Gabaldón, T., Sousa da Silva, A., Martin, M., Robinson-Rechavi,
    M., Boeckmann, B., Thomas, P., Dessimoz, C.: Big data and other challenges in
    the quest for orthologs. Bioinformatics 30(21) (2014) 2993–2998
15. Soucy, S.M., Huang, J., Gogarten, J.P.: Horizontal gene transfer: building the web
    of life. Nature Reviews Genetics 16(8) (2015) 472–482