=Paper= {{Paper |id=Vol-1515/regular15 |storemode=property |title=Using Aber-OWL for fast and scalable reasoning over BioPortal ontologies |pdfUrl=https://ceur-ws.org/Vol-1515/regular15.pdf |volume=Vol-1515 |dblpUrl=https://dblp.org/rec/conf/icbo/SlaterGSH15 }} ==Using Aber-OWL for fast and scalable reasoning over BioPortal ontologies== https://ceur-ws.org/Vol-1515/regular15.pdf
    Using Aber-OWL for fast and scalable reasoning over BioPortal
                             ontologies
           Luke Slater 1,∗, Georgios V Gkoutos 2, Paul N Schofield 3, Robert Hoehndorf 1
1 Computational Bioscience Research Center, King Abdullah University of Science and Technology, 4700 KAUST, 23955-6900, Thuwal, Saudi Arabia
2 Department of Computer Science, Aberystwyth University, Aberystwyth, SY23 3DB, Wales, United Kingdom
3 Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, CB2 3EG, England, United Kingdom




ABSTRACT                                                                      However, enabling automated reasoning over multiple ontologies
   Reasoning over biomedical ontologies using their OWL semantics          is a challenging task since automated reasoning can be highly
has traditionally been a challenging task due to the high theoretical      complex and costly in terms of time and memory consumption
complexity of OWL-based automated reasoning. As a consequence,             (Tobies, 2000). In particular, ontologies formulated in the Web
ontology repositories, as well as most other tools utilizing ontologies,   Ontology Language (OWL) (Grau et al., 2008) can utilize
either provide access to ontologies without use of automated               statements based on highly expressive description logics (Horrocks
reasoning, or limit the number of ontologies for which automated           et al., 2000), and therefore queries that utilize automated reasoning
reasoning-based access is provided. We apply the Aber-OWL                  cannot, in general, be guaranteed to finish in a reasonable amount
infrastructure to provide automated reasoning-based access to all          of time.
accessible and consistent ontologies in BioPortal (368 ontologies).           Prior work on large-scale automated reasoning over biomedical
We perform an extensive performance evaluation to determine query          ontologies has often focused on the set of ontologies in Bioportal,
times, both for queries of different complexity as well as for queries     as it is one of the largest collections of ontologies freely available.
that are performed in parallel over the ontologies. We demonstrate         To enable inferences over this set of ontologies, modularization
that, with the exception of a few ontologies, even complex and parallel    techniques have been applied (Del Vescovo et al., 2011) using
queries can now be answered in milliseconds, therefore allowing            the notion of locality-based modules, and demonstrated that, for
automated reasoning to be used on a large scale, to run in parallel,       most ontologies and applications, relatively small modules can be
and with rapid response times.                                             extracted over which queries can be answered more efficiently.
                                                                           Other work has focused on predicting the performance of reasoners
                                                                           when applied to the set of BioPortal ontologies (Sazonau et al.,
1   INTRODUCTION                                                           2013), and could demonstrate that performance of particular
Major ontology repositories such as the BioPortal (Noy et al.,             reasoners can reliably be predicted; at the same time, the authors
2009), OntoBee (Xiang et al., 2011), or the Ontology Lookup                have conducted an extensive evaluation of average classification
Service (Cote et al., 2006), have existed for a number of years,           times of each ontology.
and currently contain several hundred ontologies, enabling ontology           Other approaches apply RDFS reasoning (Patel-Schneider et al.,
creators and maintainers to publish their ontology releases and make       2004) for providing limited, yet fast, inference capabilities in
them available to the wider community.                                     answering queries over Bioportal’s set of ontologies through a
   Besides the hosting functionality that such repositories offer,         SPARQL interface (Salvadores et al., 2012, 2013). Alternatively,
they usually also provide certain web-based features for browsing,         systems such as OntoQuery (Tudose et al., 2013) provide access
comparing, visualising and processing ontologies. One particularly         to ontologies through automated reasoning but limit the number of
useful feature, currently missing from the major ontology                  ontologies.
repositories, is the ability to provide online access to reasoning            The Aber-OWL (Hoehndorf et al., 2015) system is a novel
services simultaneously over many ontologies. Such a feature               ontology repository that aims to allow access to multiple ontologies
would enable the use of semantics and deductive inference when             through automated reasoning utilizing the OWL semantics of the
processing data characterized with the ontologies these repositories       ontologies. Aber-OWL mitigates the complexity challenge by using
contain (Hoehndorf et al., 2015). Moreover, the ability to query           a reasoner which supports only a subset of OWL (i.e., the OWL
multiple ontologies simultaneously further enables data integration        EL profile (Motik et al., 2009)), ignoring ontology axioms and
across domains and data sources. For example, there is an increasing       queries that do not fall within this subset. This enables the provision
amount of RDF (Manola and Miller, 2004) data becoming available            of polynomial-time reasoning, which is sufficiently fast for many
through public SPARQL (Seaborne and Prud’hommeaux, 2008)                   practical uses even when applied to large ontologies. However, thus
endpoints (Jupp et al., 2014; The Uniprot Consortium, 2007;                far, the Aber-OWL software is only applied to a few, manually
Belleau et al., 2008; Williams et al., 2012), which utilise multiple       selected, ontologies, and therefore does not have a similar coverage
ontologies to annotate entities.                                           as other ontology repositories, nor does it cater for reasoning over
                                                                           large sets of ontologies such as the ones provided by the BioPortal
                                                                           ontology dataset (Bioportal contains, as of 9 March 2015, 428
∗ To whom correspondence should be addressed: luke.slater@kaust.edu.sa     ontologies consisting of 6,668,991 classes).



 Copyright c 2015 for this paper by its authors. Copying permitted for private and academic purposes                                            1
Slater et al



   Here, we apply the Aber-OWL framework to reason over the                  comparison, BioPortal currently (9 March 2015) includes a total of
majority of the available ontologies in BioPortal. We evaluate               6,668,991 classes.
the performance of querying ontologies with Aber-OWL, utilizing
337 ontologies from BioPortal; specifically, we measure Aber-OWL’s ability   2.2   Use of the Aber-OWL reasoning infrastructure
to perform different types of queries as well as its scalability in          Aber-OWL (Hoehndorf et al., 2015) is an ontology repository and
performing queries that are executed in parallel. We demonstrate             query service built on the OWLAPI (Horridge et al., 2007) library,
that the Aber-OWL framework makes it possible to provide, at least,          which allows access to a number of ontologies through automated
light-weight description logic reasoning over most of the freely             reasoning. In particular, Aber-OWL allows users or software
accessible ontologies contained in BioPortal, with a relatively low          applications to query the loaded ontologies using Manchester OWL
memory footprint and high scalability with respect to the number             Syntax (Horridge et al., 2006), using the class and property
of queries executed in parallel, using only a single medium-sized            labels as short-form identifiers for classes. Aber-OWL exposes this
server as hardware to provide these services. Furthermore, we                functionality on the Internet through a JSON API as well as a
identify several ontologies for which querying using automated               web interface available on http://aber-owl.net. To answer
reasoning performs significantly worse than the majority of the other        queries, Aber-OWL utilizes the ELK reasoner (Kazakov et al.,
ontologies tested, and discuss potential explanations and solutions.         2014, 2011), a highly optimized reasoner that supports the OWL-
                                                                             EL profile. Ontologies which are not OWL-EL are automatically
                                                                             transmuted by the reasoner by means of ignoring all non-EL axioms,
2     METHODS                                                                though, as of 2013, 50.7% of ontologies in BioPortal were natively
2.1   Selection of ontologies                                                expressed in OWL-EL (Matentzoglu et al., 2013).
We selected all ontologies contained in BioPortal as candidate                  We extended the Aber-OWL framework to obtain a list of
ontologies, and attempted to download the current versions of all the        ontologies from the Bioportal repository, periodically checking for
ontologies for which a download link was provided by BioPortal. A            new ontologies as well as for new versions of existing ontologies. As
summary of the results is presented in Table 1.                              a result, our testing version of Aber-OWL maintains a mirror of the
                                                                             accessible ontologies available in BioPortal. Furthermore, similarly
                                                                             to the functionality provided by BioPortal, a record of older versions
                         Total              427                              of ontologies is kept within Aber-OWL, so that, in the future, the
                         Loadable           368                              semantic difference between ontology versions could be computed.
                         Used               337                                 In addition, we expanded the Aber-OWL software to count and
                         Unobtainable       39                               provide statistics about:
                         Non-parseable      17                                  • The ontologies which failed to load, with associated error
                         Inconsistent       3                                      messages;
                         No Labels          31                                 • Axioms, axiom types, and number of classes per ontology; and
Table 1. Summary of ontologies used in our test. The loadable ontologies        • Axioms, axiom types, and number of classes over all
are the ones obtained from BioPortal which could be parsed using the OWL          ontologies contained within Aber-OWL.
API and which were found to be consistent when classified with the ELK          For each query to Aber-OWL, we also provide the query
reasoner. We exclude 31 ontologies that do not contain any labels from our   execution time within Aber-OWL and pass this information back
analysis.                                                                    to the client along with the result-set of the query.
                                                                                All information is available through Aber-OWL’s JSON API,
                                                                             and the source code is freely available at https://github.com/
   Out of 427 total ontologies listed by Bioportal, only 368 could           bio-ontology-research-group/AberOWL.
be directly downloaded and processed by Aber-OWL. Reasons for                2.3   Experimental setup
failure to load ontologies include the absence of a download link
for listed ontologies, proprietary access to ontologies or ontologies        In order to evaluate the performance of querying single and multiple
that are only available in proprietary data formats (e.g., some of the       ontologies in Aber-OWL, random queries of different complexity
ontologies and vocabularies provided as part of the Unified Medical          were generated and executed. Since the ELK reasoner utilises a
Language Systems (Bodenreider, 2004)). 39 ontologies were not                cache for answering queries that have already been computed, each
obtainable. Furthermore, 17 ontologies that could be downloaded              of the generated queries consisted of a new class expression. The
were not parseable with the OWL API, indicating a problem in the             following types of class expressions were used in the generated
file format used to distribute the ontology. Three ontologies were           queries (for randomly generated A, B, and R):
inconsistent at the reasoning stage. Several ontologies also referred          • Primitive class: A
to unobtainable ontologies as imports; however, we included these
ontologies in our analysis, utilizing only the classes and axioms that         • Conjunctive query: A and B
were accessible. As Aber-OWL currently relies on the use of labels             • Existential query: R some A
to construct queries, we further removed 31 ontologies that did not            • Conjunctive existential query: A and R some B
include any labels from our test set.
   Overall, we use a set of 337 ontologies in our experiments                  300 random queries for each of these types were generated for each
consisting of 3,466,912 classes and 6,997,872 logical axioms (of             ontology that was tested (1,200 queries in total per ontology). Each
which 12,721 are axioms involving relations, i.e., RBox axioms). In          set of 300 random queries was subsequently


                                                                                                                         Scalable Reasoning



split into three sets each of which contained 100 class expressions.
The random class expressions contained in the resulting sets were
then utilised to perform superclass (100 queries), equivalent (100
queries) and subclass (100 queries) queries and the response time of
the Aber-OWL framework was recorded for each query.
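
The query-generation procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the evaluation: the function names (`quote`, `make_expression`, `generate_queries`), the single-quote convention for labels containing spaces, and the fixed seed are all hypothetical. The sketch also does not enforce that every generated expression is unique, which would be needed to bypass ELK's query cache.

```python
import random

# The four expression types and three query relations described in the text.
QUERY_KINDS = ("primitive", "conjunctive", "existential", "conjunctive_existential")
RELATIONS = ("superclass", "equivalent", "subclass")


def quote(label):
    # Assumption: labels containing whitespace are wrapped in single quotes.
    return f"'{label}'" if " " in label else label


def make_expression(kind, classes, properties, rng):
    """Build one Manchester OWL Syntax class expression from random labels."""
    a = quote(rng.choice(classes))
    b = quote(rng.choice(classes))
    r = quote(rng.choice(properties))
    if kind == "primitive":
        return a
    if kind == "conjunctive":
        return f"{a} and {b}"
    if kind == "existential":
        return f"{r} some {a}"
    if kind == "conjunctive_existential":
        return f"{a} and {r} some {b}"
    raise ValueError(f"unknown query kind: {kind}")


def generate_queries(classes, properties, per_kind=300, seed=0):
    """Return (kind, relation, expression) triples for one ontology.

    Each block of `per_kind` expressions is split evenly across the
    superclass, equivalent, and subclass relations (100 each for 300).
    """
    rng = random.Random(seed)
    chunk = per_kind // len(RELATIONS)
    queries = []
    for kind in QUERY_KINDS:
        expressions = [
            make_expression(kind, classes, properties, rng)
            for _ in range(per_kind)
        ]
        for i, relation in enumerate(RELATIONS):
            for expression in expressions[i * chunk:(i + 1) * chunk]:
                queries.append((kind, relation, expression))
    return queries
```

With the default `per_kind=300`, this yields 1,200 triples per ontology, matching the counts given in the text.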
   We further test the scalability of answering the queries by
performing these queries in parallel. For this purpose, we remotely
query Aber-OWL with one query at a time, 100 queries in parallel,
and 1,000 queries in parallel.
   In our test, we record the response time of each query, based
on the statistics provided by the Aber-OWL server; in particular,
response time does not include network latency. All tests are
performed on a server with 128GB memory and two Intel Xeon
E5-2680v2 10-core 2.8GHz CPUs with hyper-threading activated
(resulting in 40 virtual cores). The ELK reasoner underlying Aber-
OWL is permitted to use all available (i.e., all 40) cores to perform
classification and respond to queries.
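
A minimal sketch of the parallel measurement described above, assuming a caller-supplied function `run_query` that sends one query to the server and returns the server-reported execution time in milliseconds. That function, the `stub_query` stand-in, and the choice of the population standard deviation are assumptions made for illustration; the actual client code and the exact statistic used are not specified in the text.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, pstdev


def benchmark(run_query, queries, parallelism):
    """Issue `queries` with up to `parallelism` requests in flight at once.

    `run_query` is a hypothetical callable returning the server-side
    execution time in milliseconds, so network latency is excluded.
    """
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        times_ms = list(pool.map(run_query, queries))
    return {
        "parallelism": parallelism,
        "n": len(times_ms),
        "mean_ms": mean(times_ms),
        "std_ms": pstdev(times_ms),  # assumption: population standard deviation
    }


def stub_query(query):
    # Stand-in for a real client call; returns a made-up time in milliseconds.
    return 1.0 + (len(query) % 3)


if __name__ == "__main__":
    sample = ["A", "A and B", "R some A"] * 100
    # The three concurrency levels used in the experimental setup.
    for level in (1, 100, 1000):
        print(benchmark(stub_query, sample, level))
```

Passing the query function in as a parameter keeps the sketch independent of any particular API client, so the same harness could wrap whichever endpoint is being measured.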


3   RESULTS AND DISCUSSION
On average, when performing a single query over Aber-OWL, query
results are returned in 10.8 milliseconds (standard deviation: 48.0
milliseconds). The time required to answer a query using Aber-
OWL correlates linearly with the number of logical axioms in the
ontologies (Pearson correlation, ρ = 0.33), and also strongly
correlates with the number of queries performed in parallel (Pearson
correlation, ρ = 0.82). Figure 1 shows the query times for the
ontologies based on the type of query, and Figure 2 shows the
query times based on different number of queries run in parallel.
The maximum observed memory consumption for the Aber-OWL
server while performing these tests was 66.1 GB.
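
For readers who want to reproduce correlation figures of this kind from their own measurements, the Pearson coefficient can be computed as in the sketch below. The function name `pearson` is hypothetical and the sample numbers in the usage note are invented purely for illustration; they are not data from this evaluation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / sqrt(var_x * var_y)
```

For example, `pearson(axiom_counts, mean_query_times_ms)` would take one axiom count and one mean query time per ontology, where both lists are hypothetical inputs assembled from the recorded statistics.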
   We observe several ontologies for which query times are
significantly higher than for the other ontologies. The most prevalent
outlier is the NCI Thesaurus (Sioutos et al., 2007) for which
average query time is 600 ms when performing a single query
over Aber-OWL. Previous analysis of NCI Thesaurus has identified
axioms which heavily impact the performance of classification for
the ontology using multiple description logic reasoners (Gonçalves
et al., 2011). The same analysis has also shown that it can
significantly improve reasoning time to add inferred axioms to the
ontology. To test whether this would also allow us to improve
reasoning time over the NCI Thesaurus in Aber-OWL and using
the ELK reasoner, we apply the Elvira modularization software
(Hoehndorf et al., 2011), using the HermiT reasoner to classify
the NCI Thesaurus and adding all inferred axioms that fall into the
OWL-EL profile to the ontology, as opposed to ELK’s approach
of ignoring non-EL axioms during classification. We then repeat
our experiments. Figure 3 shows the different reasoning times for
NCI Thesaurus before and after processing with Elvira. Query
time reduces from 703 ms (standard deviation: 689 ms) before
processing with Elvira to 51 ms (standard deviation: 42 ms) after
processing with Elvira, demonstrating that adding inferred axioms
and removing axioms that do not fall in the OWL-EL profile can be
used to improve query time.
Fig. 1: Query times as a function of the number of logical axioms in the ontologies, separated by the type of query: (a) primitive classes, (b) conjunctive queries, (c) existential queries, (d) conjunctive existential queries.

   Another outlier with regard to average query time is the
Natural Products Ontology (NATPRO, http://bioportal.bioontology.org/ontologies/NATPRO). However, as
NATPRO is expressed in OWL-Full, it cannot reliably be classified
with a Description Logic reasoner, and therefore we cannot apply






 Fig. 2: Query times as a function of the number of logical axioms in the ontologies, separated by the number of queries executed in parallel: (a) sequential querying, (b) 100 parallel queries, (c) 1,000 parallel queries.

 Fig. 3: Query times over the NCI Thesaurus.



the same approach to improve the performance of responding to             and, to a lesser degree, the Drug Ontology (DRON) (Hanna
queries.                                                                  et al., 2013), similar ‘culprit-finding’ analysis methods may be
                                                                          applied as have previously been applied for the NCI Thesaurus
3.1   Future Work                                                         (Gonçalves et al., 2011). These methods may also allow the
The performance of using automated reasoning for querying                 ontology maintainers to identify possible modifications to their
ontologies relies heavily on the type of reasoner used. We have           ontologies that would result in better reasoner performance.
used the ELK (Kazakov et al., 2014, 2011) reasoner in our
evaluation; however, it is possible to substitute ELK with any other
OWLAPI-compatible reasoner. In particular, novel reasoners such           4    CONCLUSION
as Konclude (Steigmiller et al., 2014), which outperform ELK in           We have demonstrated that it is feasible to reason over most of the
many tasks (Bail et al., 2014), may provide further improvements in       ontologies available in BioPortal, and that queries over
performance and scalability.                                              these ontologies can be answered quickly, in real time, and using
  We identified several ontologies as leading to performance              only standard server hardware. We further tested the performance
problems, i.e., they are outliers during query time testing. For these    of answering queries in parallel, and show that, for the majority of
ontologies, including the Natural Products Ontology (NATPRO),             cases, even highly parallel access allows quick response times.





   We have also identified a number of ontologies for which                                     Workshop on OWL Experiences and Directions.
performance of automated reasoning, at least when using Aber-                                Horrocks, I., Sattler, U., and Tobies, S. (2000). Practical reasoning for very expressive
                                                                                                description logics. Logic Journal of the IGPL, 8(3), 239–264.
OWL and the ELK reasoner, is significantly worse, which renders
                                                                                             Jupp, S., Malone, J., Bolleman, J., Brandizi, M., Davies, M., Garcia, L., Gaulton, A.,
them particularly problematic for applications that carry heavy                                 Gehant, S., Laibe, C., Redaschi, N., Wimalaratne, S. M., Martin, M., Le Novère, N.,
parallel loads. At least for some of these ontologies, pre-processing                           Parkinson, H., Birney, E., and Jenkinson, A. M. (2014). The EBI RDF platform:
ontologies using tools such as Elvira (Hoehndorf et al., 2011) can                              linked open data for the life sciences. Bioinformatics, 30(9), 1338–1339.
mitigate these problems.                                                                     Kazakov, Y., Krötzsch, M., and Simančík, F. (2011). Unchain my EL reasoner. In
                                                                                                Proceedings of the 23rd International Workshop on Description Logics (DL’10),
   The ability to reason over a very large number of ontologies,                                CEUR Workshop Proceedings. CEUR-WS.org.
such as all the ontologies in BioPortal, opens up the possibility to                         Kazakov, Y., Krötzsch, M., and Simančík, F. (2014). The incredible ELK. Journal of
frequently use reasoning not only locally when making changes to a                              Automated Reasoning, 53(1), 1–61.
single ontology, but also monitor – in real time – the consequences                          Manola, F. and Miller, E., editors (2004). RDF Primer. W3C Recommendation. World
                                                                                                Wide Web Consortium.
that a change may have on other ontologies, in particular on
                                                                                             Matentzoglu, N., Bail, S., and Parsia, B. (2013). A corpus of owl dl ontologies.
ontologies that may import the ontology that is being changed.                                  Description Logics, 1014, 829–841.
Using automated reasoning over all ontologies within a domain                                Motik, B., Grau, B. C., Horrocks, I., Wu, Z., Fokoue, A., and Lutz, C. (2009). Owl 2
therefore has the potential to increase interoperability between                                web ontology language: Profiles. Recommendation, World Wide Web Consortium
ontologies and associated data by verifying mutual consistency and                              (W3C).
                                                                                             Noy, N. F., Shah, N. H., Whetzel, P. L., Dai, B., Dorf, M., Griffith, N., Jonquet, C.,
enabling queries across multiple ontologies, and our results show                               Rubin, D. L., Storey, M.-A. A., Chute, C. G., and Musen, M. A. (2009). Bioportal:
that such a system can now be implemented with the available                                    ontologies and integrated data resources at the click of a mouse. Nucleic acids
software tools and commonly used server hardware.                                               research, 37(Web Server issue), W170–173.
                                                                                             Patel-Schneider, P. F., Hayes, P., and Horrocks, I. (2004). Owl web ontology language
                                                                                                semantics and abstract syntax section 5. rdf-compatible model-theoretic semantics.
ACKNOWLEDGEMENTS                                                                                Technical report, W3C.
REFERENCES                                                                                   Salvadores, M., Horridge, M., Alexander, P. R., Fergerson, R. W., Musen, M. A., and
                                                                                                Noy, N. F. (2012). Using sparql to query bioportal ontologies and metadata. In The
Bail, S., Glimm, B., Jiménez-Ruiz, E., Matentzoglu, N., Parsia, B., and Steigmiller, A.,       Semantic Web–ISWC 2012, pages 180–195. Springer.
   editors (2014). ORE 2014: OWL Reasoner Evaluation Workshop. Number 1207 in                Salvadores, M., Alexander, P. R., Musen, M. A., and Noy, N. F. (2013). Bioportal as
   CEUR Workshop Proceedings. CEUR-WS.org, Aachen, Germany.                                     a dataset of linked biomedical ontologies and terminologies in rdf. Semantic web,
Belleau, F., Nolin, M., Tourigny, N., Rigault, P., and Morissette, J. (2008).                   4(3), 277–284.
   Bio2RDF: Towards a mashup to build bioinformatics knowledge systems. Journal              Sazonau, V., Sattler, U., and Brown, G. (2013). Predicting performance of owl
   of Biomedical Informatics, 41(5), 706–716.                                                   reasoners: Locally or globally? Technical report, Technical report, School of
Bodenreider, O. (2004). The Unified Medical Language System (UMLS): integrating                 Computer Science, University of Manchester.
   biomedical terminology. Nucleic Acids Res, 32(Database issue), D267–D270.                 Seaborne, A. and Prud’hommeaux, E. (2008). SPARQL query language for RDF.
Cote, R., Jones, P., Apweiler, R., and Hermjakob, H. (2006). The ontology lookup                W3C recommendation, W3C. http://www.w3.org/TR/2008/REC-rdf-sparql-query-
   service, a lightweight cross-platform tool for controlled vocabulary queries. BMC            20080115/.
   Bioinformatics, 7(1), 97+.                                                                Sioutos, N., de Coronado, S., Haber, M. W., Hartel, F. W., Shaiu, W.-L., and Wright,
Del Vescovo, C., Gessler, D. D., Klinov, P., Parsia, B., Sattler, U., Schneider, T., and        L. W. (2007). Nci thesaurus: a semantic model integrating cancer-related clinical
   Winget, A. (2011). Decomposition and modular structure of bioportal ontologies.              and molecular information. Journal of biomedical informatics, 40(1), 30–43.
   In The Semantic Web–ISWC 2011, pages 130–145. Springer.                                   Steigmiller, A., Liebig, T., and Glimm, B. (2014). Konclude: System description. Web
Gonçalves, R. S., Parsia, B., and Sattler, U. (2011). Analysing multiple versions of           Semantics: Science, Services and Agents on the World Wide Web, 27(1).
   an ontology: A study of the nci thesaurus. In 24th International Workshop on              The Uniprot Consortium (2007). The universal protein resource (uniprot). Nucleic
   Description Logics, page 147. Citeseer.                                                      Acids Res, 35(Database issue).
Grau, B., Horrocks, I., Motik, B., Parsia, B., Patelschneider, P., and Sattler, U. (2008).   Tobies, S. (2000). The complexity of reasoning with cardinality restrictions and
   OWL 2: The next step for OWL. Web Semantics: Science, Services and Agents on                 nominals in expressive description logics. J. Artif. Int. Res., 12(1), 199–217.
   the World Wide Web, 6(4), 309–322.                                                        Tudose, I., Hastings, J., Muthukrishnan, V., Owen, G., Turner, S., Dekker, A., Kale,
Hanna, J., Joseph, E., Brochhausen, M., and Hogan, W. (2013). Building a drug                   N., Ennis, M., and Steinbeck, C. (2013). Ontoquery: easy-to-use web-based owl
   ontology based on rxnorm and other sources. Journal of Biomedical Semantics,                 querying. Bioinformatics, 29(22), 2955–2957.
   4(1), 44.                                                                                 Williams, A. J., Harland, L., Groth, P., Pettifer, S., Chichester, C., Willighagen, E. L.,
Hoehndorf, R., Dumontier, M., Oellrich, A., Wimalaratne, S., Rebholz-Schuhmann,                 Evelo, C. T., Blomberg, N., Ecker, G., Goble, C., and Mons, B. (2012). Open phacts:
   D., Schofield, P., and Gkoutos, G. V. (2011). A common layer of interoperability             semantic interoperability for drug discovery. Drug Discovery Today, 17(21–22),
   for biomedical ontologies based on OWL EL. Bioinformatics, 27(7), 1001–1008.                 1188–1198.
Hoehndorf, R., Slater, L., Schofield, P. N., and Gkoutos, G. V. (2015). Aber-owl: a          Xiang, Z., Mungall, C. J., Ruttenberg, A., and He, Y. (2011). Ontobee: A linked data
   framework for ontology-based data access in biology. BMC Bioinformatics.                     server and browser for ontology terms. In Proceedings of International Conference
Horridge, M., Drummond, N., Goodwin, J., Rector, A., Stevens, R., and Wang, H.                  on Biomedical Ontology, pages 279–281.
   (2006). The Manchester OWL Syntax. Proc. of the 2006 OWL Experiences and
   Directions Workshop (OWL-ED2006).
Horridge, M., Bechhofer, S., and Noppens, O. (2007). Igniting the OWL 1.1 touch
   paper: The OWL API. In Proceedings of OWLED 2007: Third International



