Snorocket 2.0: Concrete Domains and Concurrent Classification

Alejandro Metke-Jimenez

alejandro.metke@csiro.au

Michael Lawley

michael.lawley@csiro.au

The Australian e-Health Research Centre, ICT Centre, CSIRO, Brisbane, Queensland, Australia

Snorocket is a high-performance ontology reasoner that supports a subset of the OWL EL profile. In the newest version, additional expressive power has been added to support concrete domains, enabling the classification of ontologies that use these constructs. The reasoning algorithm has also been modified to support concurrent classification. This feature is important because it enables the use of the full processing power available in modern multi-processor hardware.

Keywords: ontology, classification, concrete domains, concurrent

Introduction

Background

The initial version of Snorocket was developed to support the fast classification of SNOMED CT and therefore only included support for a limited number of constructs. A table comparing the OWL EL constructs supported by Snorocket and other EL reasoners is available on the Snorocket website‡.

Concrete domains are supported by several general tableaux-based reasoners such as FaCT++ [8] and HermiT [9]. The only specialised EL reasoner that currently supports concrete domains is ELK [4].

Concrete domains are used in the Australian Medicines Terminology (AMT) mainly to model quantities in the definition of medicines. An OWL version of AMT v3 can be obtained using an updated version of the Perl script originally included in the SNOMED CT distribution. An example of a typical axiom found in AMT is available on the Snorocket website§.

Most of the commonly used reasoners, including FaCT++ [8], HermiT [9], CEL [6] and jCel [7], are only capable of using a single processor. To our knowledge, the only reasoner that has successfully implemented a concurrent classification algorithm is ELK [4]. Because most modern hardware achieves better performance by providing more than one processor or core, it is important to be able to make use of this extra processing power.

Architecture

‡ http://aehrc.com/software/snorocket/index.html#constructs
§ http://aehrc.com/software/snorocket/index.html#amtv3
¶ http://aehrc.com/software/snorocket/index.html#model

[Figure: Snorocket 2.0 architecture, showing the modules snorocket-core (Internal Model), Snorocket API, ontology-model and ontology-import, connected by "uses" dependencies.]

In description logics, a concrete domain is a construct that can be used to define new classes by specifying restrictions on attributes that have literal values (as opposed to relationships to other concepts). For example, children of age six can be defined using the concrete domain expression $\exists \mathit{hasAge}.(=, 6)$. The class of individuals, in this case children of age six, is expressed as a restriction on the age attribute, which has a numeric value. The binary operators $<$, $\leq$, $>$ and $\geq$ can also be used in a concrete domain expression, and attributes can have other types of literal values such as floating point numbers, string literals, and dates.

Support for equality. An ontology can contain many complex axioms that include nested sub-expressions. The CEL algorithm works with normalised axioms and therefore creates a conservative extension of the original ontology containing only axioms in normal form [2]. The normal forms and the corresponding completion rules R1 to R5 from the original CEL algorithm are shown in Table 1. The last two normal forms and completion rule R6 have been added to support concrete domains.

The normalised forms of concrete domain expressions are $A \sqsubseteq \exists f.(o, v)$ and $\exists f.(o, v) \sqsubseteq A$, where $f$ represents a feature, $o$ an operator, and $v$ a value. The original normalisation algorithm requires only minor changes to deal with these new constructs.
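As a rough illustration of how small the change to normalisation is, the hypothetical helper below normalises one definition shape, $C \equiv D \sqcap \exists f.(o, v)$, by introducing a fresh concept name for the concrete domain restriction. The tuple encoding and the name normalise_definition are assumptions made for this sketch, not Snorocket's API; a full normaliser must also handle arbitrarily nested sub-expressions.

```python
def normalise_definition(name, parent, feature, op, value, fresh):
    """Hypothetical sketch: normalise  name ≡ parent ⊓ ∃feature.(op, value)
    into normal-form axioms, using `fresh` as a new atomic concept that
    stands for the concrete domain restriction ∃feature.(op, value)."""
    restriction = ("exists", feature, op, value)
    return {
        ("sub", name, parent),                  # name ⊑ parent
        ("sub", name, restriction),             # name ⊑ ∃f.(o, v)
        ("sub", restriction, fresh),            # ∃f.(o, v) ⊑ fresh
        ("sub", ("and", parent, fresh), name),  # parent ⊓ fresh ⊑ name
    }


axioms = normalise_definition("toddler", "person", "hasAge", "<=", 3, "B")
```

The first two axioms encode the left-to-right half of the definition and the last two the right-to-left half; calling the helper with ("child", "person", "hasAge", "<=", 17, "A") yields the four child axioms of the worked example later in this section.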

The classification algorithm does require significant changes to deal with the new concrete domain axioms. A new type of queue is introduced to deal with queue entries of the form $A \sqsubseteq \exists f.(o, v)$, and it is initialised with these axioms. The entries are then processed in the following way:

1. The axioms of the form $\exists f.(o, v) \sqsubseteq B$ that match the feature $f$ of the data type in the queue entry are retrieved.
2. The data types are then compared using the eval() function.
3. If only the equality operator needs to be supported, then the two data types are considered to be matching if their literal values are equal.
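The processing steps above can be sketched as follows for the equality-only case. This is a minimal, hypothetical sketch: the tuple encoding and the names process_feature_queue and eval_eq are assumptions, and it assumes that a match records the subsumption $A \sqsubseteq B$, which the text does not spell out.

```python
from collections import defaultdict, deque


def process_feature_queue(entries, lhs_axioms):
    """Hypothetical sketch of a feature queue supporting equality only.
    entries:    (A, f, o, v) tuples, one per axiom A ⊑ ∃f.(o, v)
    lhs_axioms: (f, o, v, B) tuples, one per axiom ∃f.(o, v) ⊑ B
    Returns the (A, B) pairs for which A ⊑ B is derived (assumed effect of a match)."""
    by_feature = defaultdict(list)  # index the ∃f.(o, v) ⊑ B axioms by feature f
    for f, o, v, b in lhs_axioms:
        by_feature[f].append((o, v, b))

    def eval_eq(o1, v1, o2, v2):
        # Equality-only comparison: both operators are "=" and the literals are equal.
        return o1 == "=" and o2 == "=" and v1 == v2

    queue = deque(entries)  # the queue is initialised with the A ⊑ ∃f.(o, v) axioms
    derived = set()
    while queue:
        a, f, o, v = queue.popleft()
        for o2, v2, b in by_feature[f]:  # step 1: axioms with the same feature
            if eval_eq(o, v, o2, v2):    # step 2: compare the data types
                derived.add((a, b))
    return derived


result = process_feature_queue(
    [("sixYearOld", "hasAge", "=", 6), ("toddler", "hasAge", "=", 3)],
    [("hasAge", "=", 6, "A"), ("hasWeight", "=", 6, "W")],
)
```

Indexing the left-hand-side axioms by feature means each queue entry is only compared against axioms on the same attribute, so the axiom on hasWeight is never considered for the hasAge entries.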

Support for other operators. It is known that supporting arbitrary combinations of different operators leads to intractability [3]. In this implementation, no checks are made to ensure that the ontology being classified complies with the restrictions that guarantee tractability. If the ontology contains non-compliant axioms, the reasoning procedure remains sound but may be incomplete.

Adding support for other operators requires a modification to the eval() function that compares the data types when evaluating feature queue entries. The different combinations of operators and values have to be evaluated to determine whether there is a match.

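For example, consider the following axioms:

$$
\begin{array}{l}
\mathit{toddler} \equiv \mathit{person} \sqcap \exists \mathit{hasAge}.(\leq, 3) \\
\mathit{child} \equiv \mathit{person} \sqcap \exists \mathit{hasAge}.(\leq, 17)
\end{array}
$$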

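After the normalisation process, these axioms are transformed into the following:

$$
\begin{array}{ll}
\exists \mathit{hasAge}.(\leq, 17) \sqsubseteq A & \exists \mathit{hasAge}.(\leq, 3) \sqsubseteq B \\
\mathit{person} \sqcap A \sqsubseteq \mathit{child} & \mathit{person} \sqcap B \sqsubseteq \mathit{toddler} \\
\mathit{child} \sqsubseteq \mathit{person} & \mathit{toddler} \sqsubseteq \mathit{person} \\
\mathit{child} \sqsubseteq \exists \mathit{hasAge}.(\leq, 17) & \mathit{toddler} \sqsubseteq \exists \mathit{hasAge}.(\leq, 3)
\end{array}
$$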

These axioms allow us to infer that a toddler is also a child (but a child is not necessarily a toddler). This conclusion is derived when evaluating the expressions $\mathit{toddler} \sqsubseteq \exists \mathit{hasAge}.(\leq, 3)$ and $\exists \mathit{hasAge}.(\leq, 17) \sqsubseteq A$: a match yields $\mathit{toddler} \sqsubseteq A$, which together with $\mathit{toddler} \sqsubseteq \mathit{person}$ and $\mathit{person} \sqcap A \sqsubseteq \mathit{child}$ gives $\mathit{toddler} \sqsubseteq \mathit{child}$. The eval() function in this case takes the arguments $(\leq, 3, \leq, 17)$ and returns a positive match because all the possible values of the first operator-value pair are covered by the possible values of the second operator-value pair. Whenever this is not the case, the function returns false. Note that in some cases this happens regardless of the literal values. For example, assuming we are dealing with integer values, $\mathrm{eval}(<, x, >, y)$ and $\mathrm{eval}(>, x, <, y)$ will always return false because, no matter what values are assigned to $x$ and $y$, the second operator-value pair can never cover all the possible values expressed by the first pair.
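The coverage test described above can be sketched for integer literals by treating each operator-value pair as an interval and checking containment. This is a minimal sketch, not Snorocket's implementation: the names to_interval and eval_match are hypothetical, the strings "<=" and ">=" stand for $\leq$ and $\geq$, and the argument order $(o_1, v_1, o_2, v_2)$ follows the example above.

```python
def to_interval(op, v):
    """Integers satisfying (op, v) as a closed interval (lo, hi); None = unbounded."""
    if op == "=":
        return (v, v)
    if op == "<":
        return (None, v - 1)
    if op == "<=":
        return (None, v)
    if op == ">":
        return (v + 1, None)
    if op == ">=":
        return (v, None)
    raise ValueError(f"unsupported operator: {op}")


def eval_match(op1, v1, op2, v2):
    """True if every integer satisfying (op1, v1) also satisfies (op2, v2)."""
    lo1, hi1 = to_interval(op1, v1)
    lo2, hi2 = to_interval(op2, v2)
    # The first interval must not extend below the second one ...
    lower_ok = lo2 is None or (lo1 is not None and lo1 >= lo2)
    # ... and must not extend above it.
    upper_ok = hi2 is None or (hi1 is not None and hi1 <= hi2)
    return lower_ok and upper_ok
```

With this encoding eval_match("<=", 3, "<=", 17) holds while eval_match("<=", 17, "<=", 3) does not, and any (<, x) against (>, y) fails because the first interval is unbounded below while the second is bounded below. Working over integers also makes (<, 4) and (<=, 3) describe exactly the same values.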

Concurrent classification

This new version of Snorocket implements a multi-threaded saturation algorithm inspired by the algorithm used by ELK. The main idea of the algorithm is to split the computation into contexts that can be processed by workers independently while generating minimal locking overhead. Details of the original algorithm can be found in [5]. The main techniques in the algorithm can be applied in a straightforward manner to the CEL algorithm implemented by Snorocket.
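The context-based idea can be illustrated with a simplified sketch. It is neither Snorocket's nor ELK's code: it assumes EL axioms already in normal form ($X \sqsubseteq Y$, $X \sqcap Y \sqsubseteq Z$, $X \sqsubseteq \exists r.Y$, $\exists r.Y \sqsubseteq Z$), pre-creates one context per concept name, and uses hypothetical names (Context, saturate, push). Each context owns its subsumer set and is processed by at most one worker at a time; threads only lock briefly to append to another context's to-do queue and activate it. Because of CPython's global interpreter lock, this shows the structure rather than giving a real speedup.

```python
import threading
from collections import deque
from queue import Queue


class Context:
    """Saturation state of one concept; only the worker holding it changes subs/preds."""

    def __init__(self, name):
        self.name = name
        self.subs = set()     # derived subsumers S(name)
        self.preds = set()    # (role, Context) pairs: predecessor --role--> this
        self.todo = deque()   # pending conclusions for this context
        self.active = False   # True while queued or being processed
        self.lock = threading.Lock()


def saturate(names, told, conj, exist_rhs, exist_lhs, workers=4):
    """told[X]={Y}: X ⊑ Y;  conj[(X, Y)]={Z}: X ⊓ Y ⊑ Z (store both orders);
    exist_rhs[X]={(r, Y)}: X ⊑ ∃r.Y;  exist_lhs[(r, Y)]={Z}: ∃r.Y ⊑ Z."""
    contexts = {n: Context(n) for n in names}
    active = Queue()  # contexts with pending work

    def push(ctx, item):
        with ctx.lock:  # short critical section: enqueue item, activate if idle
            ctx.todo.append(item)
            activate = not ctx.active
            ctx.active = True
        if activate:
            active.put(ctx)

    def process(ctx, item):
        if item[0] == "sub":  # ("sub", X): X ∈ S(ctx)
            x = item[1]
            if x in ctx.subs:
                return
            ctx.subs.add(x)
            for y in told.get(x, ()):
                push(ctx, ("sub", y))
            for y in ctx.subs:
                for z in conj.get((x, y), ()):
                    push(ctx, ("sub", z))
            for r, y in exist_rhs.get(x, ()):  # link ctx to the context of Y
                push(contexts[y], ("link", r, ctx))
            for r, pred in ctx.preds:  # propagate ∃r.X ⊑ Z back to predecessors
                for z in exist_lhs.get((r, x), ()):
                    push(pred, ("sub", z))
        else:  # ("link", r, pred): pred has an r-successor in this context
            _, r, pred = item
            if (r, pred) in ctx.preds:
                return
            ctx.preds.add((r, pred))
            for x in ctx.subs:
                for z in exist_lhs.get((r, x), ()):
                    push(pred, ("sub", z))

    def worker():
        while True:
            ctx = active.get()
            if ctx is None:  # shutdown sentinel
                active.task_done()
                return
            while True:  # drain this context's to-do queue
                with ctx.lock:
                    if not ctx.todo:
                        ctx.active = False
                        break
                    item = ctx.todo.popleft()
                process(ctx, item)
            active.task_done()

    for ctx in contexts.values():
        push(ctx, ("sub", ctx.name))
    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    active.join()  # returns once no context is queued or being processed
    for _ in threads:
        active.put(None)
    for t in threads:
        t.join()
    return {n: c.subs for n, c in contexts.items()}
```

For example, with $A \sqsubseteq B$, $B \sqsubseteq C$, $A \sqsubseteq \exists r.D$, $D \sqsubseteq E$, $\exists r.E \sqsubseteq F$ and $C \sqcap F \sqsubseteq G$, the sketch derives $F$ and $G$ as subsumers of $A$ through a conclusion passed from the context of $D$ back to the context of $A$.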

Experimental results

Protege was used to compare the performance of Snorocket against four other ontology reasoners: FaCT++, HermiT, jCel, and ELK. The previous version of Snorocket was also included. Two OWL ontologies were used in the tests: SNOMED CT and AMT v3. Both were derived from the RF2 distribution files using the corresponding Perl scripts.

The experiments were run on a computer equipped with a 3.3 GHz Intel i5 processor with 4 cores and 8 GB of physical memory, running Windows 7. Protege was run with Java 7 and a heap size of 4 GB. All experiments measure elapsed time, using the external timing reported by Protege. The multi-threaded reasoners (ELK 0.32 and Snorocket 2.0.1) were run using 4 threads.

Table 2 shows the profiles of the selected ontologies, and Table 3 shows the classification times, in seconds, achieved by the reasoners, averaged over 5 runs.

[Table 3: Classification times in seconds, averaged over 5 runs. Rows: SNOMED CT, AMT. Columns: FaCT++ 1.6.2, HermiT 1.3.7, jCel 0.15‡, ELK 0.32, Snorocket 1.3.4, Snorocket 2.0.1.]

The results show that the performance of the tableaux-based reasoners was very poor when classifying AMT. On the other hand, the specialised EL reasoners were able to classify it in a fraction of the time. ELK currently provides the best performance, which is expected since Snorocket's multi-threaded implementation is based on the same techniques but has not been optimised. Also, Snorocket only runs the saturation phase concurrently, while the rest of the steps are still run sequentially.

Conclusions and future work

This paper presented Snorocket 2.0 and compared it against its previous version and four other reasoners using two large medical ontologies. Even though ELK obtained the fastest results, Snorocket 2.0 achieved competitive performance. Snorocket's built-in support for SNOMED CT distribution formats makes it an interesting alternative to ELK for SNOMED CT-centric applications.

Future work will include adding multi-threading to the whole classification process and incorporating the restrictions necessary to ensure tractability when dealing with concrete domains, either as a hard restriction or as a warning to the user.

‡ The current version of the jCel plugin is 0.18.2, but version 0.15 was the most recent one that was compatible with our testing environment.

1. Lawley, M.J., Bousquet, C.: Fast classification in Protege: Snorocket as an OWL 2 EL reasoner. In: Proc. 6th Australasian Ontology Workshop (IAOA10). Conferences in Research and Practice in Information Technology, pp. 45–49 (2010)

2. Baader, F., Brandt, S., Lutz, C.: Pushing the EL envelope. In: International Joint Conference on Artificial Intelligence, p. 364 (2005)

3. Magka, D., Kazakov, Y., Horrocks, I.: Tractable Extensions of the Description Logic EL with Numerical Datatypes. In: Proc. of the Int. Joint Conf. on Automated Reasoning (IJCAR 2010). LNAI, vol. 6173, pp. 61–75. Springer (2010)

4. Kazakov, Y., Krötzsch, M., Simančík, F.: ELK Reasoner: Architecture and Evaluation. In: Proceedings of the 1st International Workshop on OWL Reasoner Evaluation. CEUR Workshop Proceedings (2012)

5. Kazakov, Y., Krötzsch, M., Simančík, F.: Concurrent Classification of EL+ Ontologies. In: The Semantic Web – ISWC 2011, pp. 305–320 (2011)

6. Baader, F., Lutz, C., Suntisrivaraporn, B.: Efficient reasoning in EL+. In: Proceedings of DL 2006, p. 189 (2006)

7. Mendez, J.: jcel: A Modular Rule-based Reasoner. In: Proc. of the 1st Int. Workshop on OWL Reasoner Evaluation (ORE12), pp. 130–135 (2012)

8. Tsarkov, D., Horrocks, I.: FaCT++ description logic reasoner: System description. In: Proc. 3rd Int. Joint Conf. on Automated Reasoning (IJCAR 2006). LNCS, vol. 4130, pp. 292–297. Springer (2006)

9. Motik, B., Shearer, R., Horrocks, I.: HermiT: Hypertableau Reasoning for Description Logics. Journal of Artificial Intelligence Research 36, pp. 165–228 (2009)

10. Sirin, E., Parsia, B., Grau, B.C., Kalyanpur, A., Katz, Y.: Pellet: A practical OWL-DL reasoner. J. of Web Semantics 5(2), pp. 51–53 (2007)