Evaluating OWL 2 Reasoners in the context of Clinical
 Decision Support in Lung Cancer Treatment Selection

        M. Berkan Sesen1, Ernesto Jiménez-Ruiz2, René Bañares-Alcántara1, Sir Michael Brady3

               1 Department of Engineering Science, University of Oxford, UK
               2 Department of Computer Science, University of Oxford, UK
               3 Department of Oncology, University of Oxford, UK


       Abstract. This paper evaluates the performance of the OWL 2 reasoners HermiT,
       FaCT++ and Pellet in the context of an ontological clinical decision support
       system in lung cancer care. In the first set of experiments, we compare how the
       classification and realisation times of the LUCADA and LUCADA-SNOMED CT
       ontologies vary as we expand their TBoxes with additional guideline rule
       knowledge. In the second set of experiments, we investigate the effect of
       increasing the ABox of the LUCADA ontology on the realisation times.


1 Introduction
Lung cancer is the most common and deadliest type of cancer, and is responsible for
21% of all cancer-related deaths globally. In England, care decisions for lung cancer
patients are made by multidisciplinary teams (MDTs) composed of clinical staff
from diverse backgrounds. These teams meet weekly in cancer centres across the
country in order to come to treatment decisions for each patient in their care. Usually,
MDTs make use of their combined experience and knowledge of published clinical
guidelines to decide upon the next stage of treatment for a patient [1]. The National
Lung Cancer Audit (NLCA) data reveal that one of the major problems in the
management of lung cancer care in England is the substantial level of unjustified
variation in treatment decisions between different cancer centres [14, 13].
     In order to reduce variability in clinical practice, clinical guidelines provide
well-defined sets of directions and evidence-based standards to assist clinicians in
decisions about appropriate clinical procedures [6]. However, as unstructured, free-text
documents, clinical guidelines are usually not readily accessible at the point of decision
making in the MDT meetings. Fortunately, clinical decision support (CDS) systems
that computerise and automate the daily management of guidelines can facilitate access
to guideline information in these meetings.
     The computerisation of guideline rules can be achieved with structured logical
languages that can express guideline rule eligibility and decision criteria. To date, many
proprietary expression languages [4, 9, 11, 19, 20] have been proposed in order to
encode and interpret guideline rules in a machine-readable format. The interpretation
of computerised guideline rules is carried out by execution engines that match
the encoded guideline rule criteria against existing patient records in order to infer rule
applicability for different patient records.
    In [16], we proposed OWL 2 [2] as a suitable candidate for encoding guideline rule
criteria in the context of a CDS system for lung cancer care and we outlined a purely
ontological guideline rule inference framework. In this paper, we focus on performance
evaluations of off-the-shelf OWL 2 reasoners for inferring patient rule applicability
based on the guideline rule inference framework presented in [16].


2 LUCADA ontology
Since 2004, the NLCA has collected all lung cancer patient data in England within the
English Lung Cancer Dataset (LUCADA) [13] in order to gain a better understanding
of the care delivered during referral, diagnosis and treatment of lung cancer patients.
We have manually built a domain-specific OWL 2 lung cancer ontology based on the
LUCADA data model.4 The LUCADA ontology provides the semantic layer of the Lung
Cancer Assistant [16], an ontology-based system that is capable of providing guideline
rule-based decision support during lung cancer MDT meetings.
    SNOMED CT [15] is the reference ontology of choice across the information
systems within the National Health Service (NHS). Thus, to facilitate interoperability with
other NHS applications, we integrated LUCADA with a lung cancer-specific module
of SNOMED CT. To this end, we have (i) identified the classes in SNOMED CT related
to those in LUCADA and established correspondences (i.e. mappings) between them;
and (ii) extracted a small fragment of SNOMED CT that captures the meaning of such
relevant classes (i.e. a domain-specific module). SNOMED CT, however, is a complex
ontology describing more than 300,000 classes; as a result, computing mappings with
LUCADA is infeasible without suitable tool support. Thus, to perform task (i) we used
the interactive mode of the ontology matching system LogMap [7, 8]. Additionally, in
order to perform task (ii), we used the ontology modularization technique described
in [3]. Table 1 provides a side-by-side comparison of LUCADA and the integrated
ontology LUCADA-SNOMED CT in terms of number of entities, axioms and expressivity.
    In order to incorporate lung cancer guideline knowledge, we introduced the patient
scenario class into both ontologies [16]. A guideline rule consists of an antecedent, i.e.
rule body, which specifies the eligibility criteria for the rule, and a consequent, i.e. rule
head, which encapsulates the action(s) to take when the conditions in the antecedent
are satisfied [5]. According to our guideline rule inference framework, we represent
the guideline rule antecedents as defined patient scenario classes, whose equivalent
class expressions capture the semantics of the rule eligibility criteria. As an example,
the eligibility for the guideline rule5 “Consider radiotherapy for Stage I, II, III patients
with good performance status” is encoded as the following OWL 2 class equivalence axiom:
     GR1 ≡ GoodPerformancePatient ⊓ ∃hasClinicalFinding.
               (NeoplasticDisease ⊓ ∃hasPreHistology.NonsmallCellCarcinoma ⊓
                ∃hasPreTNMStaging.string ⊓ ∀hasPreTNMStaging.{I, II, III})
 4 Through a data sharing agreement between the University of Oxford and NLCA, we have been
   granted access to an anonymised version of the LUCADA dataset.
 5 The guideline rules have been extracted from the National Institute for Clinical Excellence
   (NICE) document [12].
   Table 1: Summary of the LUCADA and LUCADA-SNOMED CT ontology metrics

            Metric                     LUCADA-SNOMED CT      LUCADA
            DL expressivity            ALCHIF(D)             ALCHI(D)
            # Classes                  1553                  376
            # Object properties        63                    37
            # Data properties          63                    63
            # Equiv. class axioms      1010                  0
            # Subclass of axioms       999                   386
            # Prop. domain axioms      97                    97
            # Prop. range axioms       30                    30


    Furthermore, we represent a patient record as a set of OWL 2 individual axioms
with respect to the terminological knowledge captured within the LUCADA and the
integrated LUCADA-SNOMED CT ontologies, as exemplified in [16]. According to this,
a patient record is characterised (on average) by 25 class and property assertion axioms.
An OWL 2 reasoner can be used to determine whether a specific patient is a member
of a particular patient scenario class, and therefore, subject to the recommendations or
actions of the respective guideline rule.
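
To make the membership test concrete, the following is a minimal Python sketch of the GR1 eligibility criteria evaluated directly against an in-memory patient record. It is only an illustration under stated assumptions: the `Patient` and `ClinicalFinding` data structures are hypothetical, the check is closed-world, and it ignores the class hierarchy, so it is not a substitute for an OWL 2 reasoner and may disagree with one.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Stage values allowed by the universal restriction in GR1.
ELIGIBLE_STAGES = {"I", "II", "III"}


@dataclass
class ClinicalFinding:
    """Hypothetical, simplified stand-in for a clinical finding individual."""
    kind: str                                   # e.g. "NeoplasticDisease"
    pre_histology: Optional[str] = None         # e.g. "NonsmallCellCarcinoma"
    pre_tnm_stagings: List[str] = field(default_factory=list)


@dataclass
class Patient:
    """Hypothetical, simplified stand-in for a patient individual."""
    good_performance: bool
    findings: List[ClinicalFinding] = field(default_factory=list)


def matches_gr1(patient: Patient) -> bool:
    """Closed-world approximation of membership in the GR1 scenario class."""
    if not patient.good_performance:
        return False
    for finding in patient.findings:
        has_staging = len(finding.pre_tnm_stagings) > 0       # existential part
        only_eligible = all(                                   # universal part
            stage in ELIGIBLE_STAGES for stage in finding.pre_tnm_stagings
        )
        if (
            finding.kind == "NeoplasticDisease"
            and finding.pre_histology == "NonsmallCellCarcinoma"
            and has_staging
            and only_eligible
        ):
            return True
    return False


eligible = Patient(
    good_performance=True,
    findings=[ClinicalFinding("NeoplasticDisease", "NonsmallCellCarcinoma", ["II"])],
)
print(matches_gr1(eligible))  # True
```

Unlike this sketch, a reasoner works under the open-world assumption and also takes subclass relationships into account (for example, a more specific histology class), which is part of what makes realisation more expensive than a plain record lookup.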


3 Evaluation
We evaluated the scalability of our guideline rule inference framework with off-the-
shelf OWL 2 reasoners: HermiT 1.3.7 [10], Pellet 2.3.0 [17] and FaCT++ 1.6.2 [18].
The tests have been performed on a Windows 7 64-bit desktop computer with 15 GiB
of RAM and an Intel Xeon 2.27 GHz CPU. Overall, we report two sets of experimental
results as given below. Note that all results reported here have been acquired as averages
of at least 10 repetitions of the described experimental setup.
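
The averaging protocol can be sketched as follows. This is a hedged Python illustration rather than the harness used for the reported measurements: the function name `average_runtime_ms` and the `task` callable (which would wrap a reasoner call such as classification or realisation) are assumptions introduced here.

```python
import time
from statistics import mean
from typing import Callable


def average_runtime_ms(task: Callable[[], object], repetitions: int = 10) -> float:
    """Run `task` `repetitions` times and return the mean wall-clock time in ms."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    timings_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        task()  # hypothetical reasoning task under test
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(timings_ms)


# Usage with a placeholder workload standing in for a reasoner call.
print(average_runtime_ms(lambda: sum(range(10_000)), repetitions=10))
```

In a JVM-based setup, one would typically also discard a few warm-up runs before timing; the sketch omits this for brevity.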

3.1 Increasing the TBox with patient scenarios
In the first set of experiments we compared how the classification and realisation times
of the LUCADA and LUCADA-SNOMED CT ontologies varied as we increased the
guideline rule coverage (i.e. the number of patient scenario classes). To this end, we
incrementally added 40 patient scenarios to each ontology, represented as equivalent
class axioms (see Section 2), and recorded the times taken by each reasoner to perform
classification (i.e. execution of the precomputeInferences(CLASS_HIERARCHY) method)
and realisation of only one patient individual (i.e. execution of the getTypes() method).
    Figures 1 and 2 summarise the reasoning times obtained for the LUCADA and
LUCADA-SNOMED CT ontologies respectively. In both figures, we only report the total
inference times (classification + realisation) for FaCT++ and Pellet since the
individual realisation times for these two reasoners were negligible. However, for HermiT
we present classification and realisation times separately, since realisation takes up a
significant portion of the total inference time (up to 0.2ms for LUCADA and 1s for
LUCADA-SNOMED CT). We note that the classification times for all three reasoners are
below one second for the LUCADA ontology, whereas they rise to 9 and 50 seconds
respectively with FaCT++ and Pellet for the integrated LUCADA-SNOMED CT ontology.
Note that HermiT classifies the integrated ontology the fastest, with classification times
ranging from 1.6s to 2.2s.

       Fig. 1: Reasoning times for LUCADA containing 1 to 40 patient scenarios
       (plot of time in ms against number of patient scenarios; series: FaCT++ (total),
       Pellet (total), HermiT (classification), HermiT (realisation))

       Fig. 2: Reasoning times for LUCADA-SNOMED CT containing 1 to 40 patient scenarios
       (plot of time in ms, with a broken axis, against number of patient scenarios; series:
       FaCT++ (total), Pellet (total), HermiT (classification), HermiT (realisation))

       Fig. 3: Realisation times in LUCADA with 1 to 100 patient records
       (plot of time in ms against number of patients; series: FaCT++, Pellet, HermiT)


3.2 Increasing the ABox with patient records

In the second set of experiments, we incrementally added 100 patient records,
represented as OWL 2 individual axioms (see Section 2), to the LUCADA ontology, which
contained 40 patient scenarios. Figure 3 compares the realisation times (i.e. execution
of the method getTypes() for each patient individual) obtained by all three reasoners.
As expected, realisation times increase as more patients are added to the ontology.
FaCT++ and HermiT behave very differently. While the increase in realisation times
with respect to the number of patient individuals in the ontology is fairly gradual and
linear for FaCT++, the realisation times for HermiT increase very quickly and clearly
in a non-linear fashion. Although not as severe as those of HermiT, Pellet realisation
times are also considerably longer than those of FaCT++ and seem to increase
non-linearly.
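
One simple way to put a number on "linear" versus "non-linear" growth is to fit a straight line to the timings on a log-log scale: a slope close to 1 suggests linear scaling, and a noticeably larger slope suggests super-linear scaling. The Python sketch below illustrates the idea on synthetic values. These values are not the measurements behind Figure 3, and the function name `loglog_slope` is an assumption introduced here.

```python
import math
from typing import Sequence


def loglog_slope(sizes: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope of log(time) against log(size).

    For timings that follow time = c * size**k, the slope estimates k.
    """
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (size, time) pairs of equal length")
    if any(n <= 0 for n in sizes) or any(t <= 0 for t in times):
        raise ValueError("sizes and times must be strictly positive")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic illustration only: one linear and one quadratic timing series.
patients = [10, 20, 40, 80]
linear_times = [3.0 * n for n in patients]
quadratic_times = [2.0 * n * n for n in patients]
print(loglog_slope(patients, linear_times))
print(loglog_slope(patients, quadratic_times))
```

Applied to real measurements, the estimate is only indicative, since constant overheads and measurement noise distort the slope for small ABoxes.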


4 Conclusions

In this paper we empirically evaluated the classification and realisation performance
of the three most commonly used OWL 2 reasoners within our guideline rule inference
framework. We found that FaCT++ is the best choice for our application since it
provides very fast inference times for both classification and realisation. We also found
that HermiT provides the fastest TBox reasoning times for the integrated
LUCADA-SNOMED CT ontology, but it performs poorly in ABox reasoning with both ontologies.
Finally, we found that Pellet performs well in classifying the LUCADA ontology but
struggles with the LUCADA-SNOMED CT ontology, which contains many axioms
inherited from SNOMED CT.


Acknowledgements

The LCA project was funded by the CDT in Healthcare Innovation programme within
the Institute of Biomedical Engineering, Oxford University. We would also like to ac-
knowledge the clinical inputs from our collaborators Dr Michael Peake, Prof Fergus
Gleeson and Dr Donald Tse during the elicitation of guideline rules from the literature.
Jiménez-Ruiz was partially supported by the Seventh Framework Program (FP7) of
the European Commission under Grant Agreement 318338, "Optique", and the EPSRC
projects Score!, ExODA and MaSI3.


References

 1. Austin, M.: Information Integration and Decision Support for Multidisciplinary Team Meet-
    ings on Colorectal Cancer. Ph.D. thesis, University of Oxford (2008)
 2. Cuenca Grau, B., Horrocks, I., Motik, B., Parsia, B., Patel-Schneider, P.F., Sattler, U.: OWL
    2: The next step for OWL. J. Web Sem. 6(4), 309–322 (2008)
 3. Cuenca Grau, B., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies: Theory
    and practice. J. Artif. Intell. Res. 31, 273–318 (2008)
 4. Fox, J., Johns, N., Lyons, C., Rahmanzadeh, A., Thomson, R., Wilson, P.: PROforma: a
    general technology for clinical decision support systems. Computer Methods and Programs
    in Biomedicine 54(1–2), 59–67 (1997)
 5. Horrocks, I., Patel-Schneider, P.F., Boley, H., Tabet, S., Grosof, B., Dean, M.: SWRL: A
    Semantic Web Rule Language Combining OWL and RuleML. Tech. rep., World Wide Web
    Consortium (2004)
 6. Isern, D., Sánchez, D., Moreno, A.: HeCaSe2: A Multi-agent Ontology-Driven Guideline
    Enactment Engine. In: 5th International Central and Eastern European Conference on Multi-
    Agent Systems. pp. 322–324 (2007)
 7. Jiménez-Ruiz, E., Cuenca Grau, B.: LogMap: Logic-based and Scalable Ontology Matching.
    In: Int’l Sem. Web Conf. (ISWC). pp. 273–288 (2011)
 8. Jiménez-Ruiz, E., Cuenca Grau, B., Zhou, Y., Horrocks, I.: Large-scale interactive ontology
    matching: Algorithms and implementation. In: European Conf. on Artif. Intell. (ECAI). pp.
    444–449 (2012)
 9. Miksch, S., Shahar, Y., Johnson, P.D.: Asbru: A Task-Specific, Intention-Based, and Time-
    Oriented Language for Representing Skeletal Plans. In: 7th Workshop on Knowledge Engi-
    neering: Methods & Languages (KEML-97) (1997)
10. Motik, B., Shearer, R., Horrocks, I.: Hypertableau reasoning for description logics. J. Artif.
    Intell. Res. 36, 165–228 (2009)
11. Musen, M.A., Tu, S.W., Das, A.K., Shahar, Y.: EON: A Component-Based Approach to
    Automation of Protocol-Directed Therapy. Journal of the American Medical Informatics
    Association 3(6) (1996)
12. NICE: The Diagnosis and Treatment of Lung Cancer (Update). National Collaborating
    Centre for Cancer (UK). NICE Clinical Guidelines, No. 121. (2011), available from:
    http://www.ncbi.nlm.nih.gov/books/NBK99021/
13. NLCA: The National Clinical Lung Cancer Audit (LUCADA) Data Manual (2010), available
    from: http://www.hscic.gov.uk/lung
14. Riaz, S.P., Lüchtenborg, M., Jack, R.H., Coupland, V.H., Linklater, K.M., Peake, M.D., Møller,
    H.: Variation in surgical resection for lung cancer in relation to survival: Population-based
    study in England 2004–2006. European Journal of Cancer 48(1), 54–60 (2012)
15. Schulz, S., Cornet, R., Spackman, K.A.: Consolidating SNOMED CT’s ontological commit-
    ment. Applied Ontology 6(1), 1–11 (2011)
16. Sesen, M.B., Bañares-Alcántara, R., Fox, J., Kadir, T., Brady, J.M.: Lung Cancer Assistant:
    An ontology-driven, online decision support prototype for lung cancer treatment selection.
    In: OWL: Experiences and Directions Workshop (OWLED) (2012)
17. Sirin, E., Parsia, B., Grau, B.C., Kalyanpur, A., Katz, Y.: Pellet: A practical OWL-DL rea-
    soner. J. Web Sem. 5(2), 51–53 (2007)
18. Tsarkov, D., Horrocks, I.: FaCT++ Description Logic Reasoner: System Description. In:
    Third International Joint Conference on Automated Reasoning, IJCAR. pp. 292–297 (2006)
19. Tu, S.W., Campbell, J.R., Glasgow, J., Nyman, M.A., McClure, R., McClay, J., Parker, C.,
    Hrabak, K.M., Berg, D., Weida, T., Mansfield, J.G., Musen, M.A., Abarbanel, R.M.: The
    SAGE Guideline Model: Achievements and Overview. Journal of the American Medical
    Informatics Association 14(5), 589–598 (2007)
20. Wang, D., Peleg, M., Tu, S.W., Boxwala, A.A., Ogunyemi, O., Zeng, Q.T., Greenes, R.A.,
    Patel, V.L., Shortliffe, E.H.: Design and implementation of the GLIF3 guideline execution
    engine. Journal of Biomedical Informatics 37(5), 305–318 (2004)