Concept

CEL-A Short System Demonstration∗

Boontawee Suntisrivaraporn

Franz Baader

Carsten Lutz Theoretical Computer Science

TU Dresden

Germany

baader

clu}@tcs.inf.tu-dresden.de

Introduction to the CEL system.

168 174

Description logics (DLs) are an important family of formalisms for reasoning about ontologies. CEL (Classifier for E L) is a free (for non-commercial use) LISP-based reasoner for the description logic E L+ [2], supporting as its main reasoning task the computation of the subsumption hierarchy induced by E L+ ontologies. The most distinguishing feature of CEL is that, unlike other modern DL reasoners, it implements a polynomial-time algorithm, which allows it to process very large ontologies in reasonable time. The underlying description logic E L+ is a practically useful sub-language of E L++ introduced in [1]. Despite being less expressive than other description logics such as SHIQ and OWL, it offers a selected set of expressive means that are tailored towards the formulation of biological and medical ontologies, some of the most prominent of which are the Gene Ontology (Go) [7], the Galen Medical Knowledge Base (Galen) [5], and the Systematized Nomenclature of Medicine (Snomed) [3, 6].

Concept

the top concept the bottom concept conjunction existential restriction

DL Syntax CEL Syntax

⊤ top ⊥ bottom C1 ⊓ · · · ⊓ Cn (and C1 · · · Cn)

∃r.C (some r C) • At least 8MB of available hard-disk space.

CEL as a stand-alone reasoner. In order to use CEL to classify an ontology, the user must already have the ontology formulated in EL+ in a small extension of the KRSS syntax [ 4 ], henceforth called CEL syntax. With this LISP-like syntax, it is easy to port existing ontologies that have been used with well-known DL reasoners like FaCT and RACER. Table 1 shows the concept constructors available in EL+ that can be used to form concept descriptions. For building up ontologies, the expressive means shown in Table 2 can be used, where conventionally A, B denotes a named concept, C, D concept descriptions, and r, s named roles. Though only implies and role-inclusion axioms can sufficiently model any EL+ ontology, it is often very useful and also makes the ontology more comprehensible to provide auxiliary axioms. An EL+ ontology is effectively a text file containing axioms of the forms shown in the right column of Table 1.

For example, Figure 1 shows a medical ontology about the concept Endocarditis (an inflammation of a heart valve) formulated in CEL syntax. The user can either load this ontology into the system by calling (load-ontology "med.tbox") or enter interactively at the prompt each axiom of the ontology. The preprocess is carried out while the ontology is being loaded, and once this is finished, (classify-ontology) can be invoked 2Considerably more memory may be needed for larger ontologies. (define-primitive-concept Endocardium (and Tissue (some contained-in HeartWall) (some contained-in HeartValve))) (define-primitive-concept HeartWall (and BodyWall

(some part-of Heart))) (define-primitive-concept HeartValve (and BodyValve

(some part-of Heart))) (define-primitive-concept Endocarditis (and Inflammation

(some has-location Endocardium))) (define-primitive-concept Inflammation (and Disease

(some acts-on Tissue))) (define-concept Heartdisease (and Disease

(some has-location Heart))) (implies (and Heartdisease

(some has-location HeartValve))

CriticalHeartDisease) to classify all concept names occurring in the ontology (eager subsumption approach). Subsumption query between two concept names can be queried using (subsumes? B A). If this is called after classification, it simply looks up in the computed subsumption hierarchy. Otherwise, it runs a single subsumption test and answers without needing to classify the whole ontology first (lazy subsumption approach). After having classified the whole ontology, CEL allows the user to output the classification results in different formats: (output-supers) to output the subsumer sets for all concept names occurring in the ontology; (output-taxonomy) to output the Hasse diagram of the subsumption hierarchy, i.e., just the direct parent-child relationships; and (output-hierarchy) to output the hierarchy as a graphical indented tree. As an example, Figure 2 depicts screen shots of the results of (output-hierarchy) and (output-taxonomy) after classifying the ontology med.tbox.

Through its command-line options, CEL can also work as a stand-alone reasoner without interaction from users. For instance, the command line:

$cel -l filename -c -outputHierarchy -q can be entered to load and classify an ontology from filename , and then output the hierarchy. For a more detailed description of the CEL interface, we refer to the CEL user manual (available on the CEL homepage).

CEL as a backend reasoner. Alternatively, the user can also exploit CEL reasoning capabilities through the DIG interface3 and a graphical ontology editor. To do this, CEL has to be started as a DIG reasoning server by the following command line: $cel -digServer [port ] where port is defaulted to 8080 but can be overridden.

Once started in this mode, ontology editors can be connected to CEL and exploit its reasoning power either locally or remotely via the Internet. The upper floating dialog in Figure 3, “Reasoner Inspector,” displays the expressive means that can be handled by CEL in terms of DIG language. The DIG interface for CEL has been tested successfully with Prot´eg´e OWL editor.4 The main window in Figure 3 illustrates the asserted subsumption hierarchy (input) and the inferred subsumption hierarchy (output) within the editor, whereas the small floating dialog, “Connected to CEL 0.9,” displays the interactions between the DIG client and the DIG server.

3The DIG (DL Implementation Group) interface is an XML-based standard that defines an interfacing language for seamless communication between a DL service provider (DIG server) and a DL application (DIG client). See http://dl.kr.org/dig/

4See http://protege.stanford.edu/plugins/owl/ Empirical Success of CEL.

We have compared the performance of CEL with three of the most advanced DL systems: FaCT++ (v1.1.0), RacerMaster (v1.9.0), and Pellet (v1.3b). These systems implement tableau-based decision procedures for expressive DLs in which subsumption is ExpTimecomplete. All experiments have been performed on a PC with 2.8GHz Intel Pentium 4 processor and 512MB memory running Linux v2.6.14. For Pellet, we used JVM v1.5 and set the Java heap space to 256MB (as recommended by the implementers).

Our experiments are based on three important bio-medical ontologies: Go, Galen, and Snomed. Since Galen uses some expressivity that CEL cannot handle, we have simplified it by removing inverse role axioms and treating functional roles as ordinary ones, and obtained an EL+ ontology OGalen. (Of course, also the other reasoners, which could have handled inverse and functional roles, were applied to OGalen rather than full Galen.) We have obtained two other benchmarks, OGo and OSnomed, from the other two ontologies. However, Snomed has two right-identity rules similar to the third role axiom in our example (see Fig. 1). These axioms are passed to CEL, but not to the other reasoners, as the latter do not support right identities. Additionally, to get a smaller version of Snomed that can be dealt with by standard DL reasoners, concept axioms role axioms concept names role names CEL FaCT++ RacerMaster Pellet 5.8 6.9 19 1,357

OGalen

OcSonroemed

OSnomed 20,465/0/0 2,041/699/1,214

1 438 20,465 2,740 1 413 we also consider a fragment obtained by keeping only concept definitions, and call it

Snomed. Some information on the size and structure of these benchmarks is given in Ocore the upper part of Table 3, where the first row shows the numbers of primitive concept definitions, concept definitions, and general concept inclusions, respectively. The results of our experiments are summarized in the lower part of Table 3, where all classification times are shown in seconds and unattainable means that the reasoner failed due to memory exhaustion. Notable, CEL outperforms all the reasoners in all benchmarks except OGalen, where RacerMaster is as fast. CEL and FaCT++ are the only reasoners that can classify OSnomed, whereas RacerMaster and Pellet fail. Pellet and the original Snomed. version of FaCT (not shown in the table) even fail to classify Ocore

The empirical results for the performance of CEL described above show that it can compete with, and often outperforms, the fastest tableau-based DL systems. The robustness and scalability of tractable reasoning is visible, especially in the case of Snomed with almost four hundred thousand concepts. We view these results as evidence of empirical success of the CEL system, and also as a strong argument for the use of tractable DLs based on extensions of EL provided that the expresivity is sufficient in the domain of interest.

[1]

Baader ,

Brandt , and

Lutz . Pushing the EL envelope . In Proc. of the 19th Int. Joint Conf. on Artificial Intelligence (IJCAI 2005 ), Edinburgh, UK, 2005 . Morgan-Kaufmann Publishers.

[2]

Baader ,

Lutz , and

Suntisrivaraporn . Efficient reasoning in EL+ . In Proceedings of the 2006 International Workshop on Description Logics (DL2006) , CEURWS, 2006 .

[3]

Cote ,

Rothwell ,

Palotay ,

Beckett , and L. Brochu. The systematized nomenclature of human and veterinary medicine . Technical report , SNOMED International, Northfield, IL: College of American Pathologists, 1993 .

[4]

Patel-Schneider and

Swartout . Description-logic knowledge representation system specification from the krss group of the arpa knowledge sharing effort . Technical report, DARPA Knowledge Representation System Specification (KRSS) Group of the Knowledge Sharing Initiative , 1993 .

[5]

Rector and I. Horrocks. Experience building a large, re-usable medical ontology using a description logic with transitivity and concept inclusions . In Proceedings of the Workshop on Ontological Engineering, AAAI Spring Symposium (AAAI'97) , Stanford, CA, 1997 . AAAI Press.

[6]

K.A.

Spackman . Managing clinical terminology hierarchies using algorithmic calculation of subsumption: Experience with SNOMED-RT . Journal of the American Medical Informatics Association (JAMIA) , 2000 . Fall Symposium Special Issue.

[7]

The

Gene Ontology Consortium . Gene Ontology: Tool for the unification of biology . Nature Genetics , 25 : 25 - 29 , 2000 .