Lung Cancer Assistant: An Ontology-Driven, Online Decision
       Support Prototype for Lung Cancer Treatment Selection

     M. Berkan Sesen, MSc1, Rene Banares-Alcantara, PhD1, John Fox,
            PhD1, Timor Kadir, PhD1, J. Michael Brady, PhD1
                  1 University of Oxford, UK

Abstract
This paper describes the modelling of the LUCADA lung cancer ontology in OWL 2
and how this ontology is utilised by the online clinical decision support application
Lung Cancer Assistant (LCA) for categorising patients and producing
guideline-based treatment recommendations with the help of ontological inference.
LCA aims to assist clinicians by interpreting existing patient data and using the
results of this analysis to make meaningful predictions and to facilitate the
implementation of clinical guideline rules in daily practice.
1.    Introduction
Lung cancer is the most common type of cancer and accounts for 21% of all
cancer-related deaths. The two key elements in improving the survival rates for lung cancer
have been reported as earlier diagnosis and referral to specialist multidisciplinary
teams (MDTs) [1]. The main challenge for MDTs is that no single member of the
team can sketch a comprehensive picture of the whole care pathway for a cancer
patient on their own due to the complex and interdisciplinary nature of the treatment
selection process.
In order to achieve the best outcomes, MDTs must keep abreast of an ever-growing
flood of data from various disciplines and sources. This poses a significant
informatics challenge, which can be addressed by using a clinical decision support
(CDS) system that can assist clinicians by interpreting existing patient data for
making meaningful predictions and facilitating the implementation of clinical
guideline rules into daily practice.
With this motivation, we are building Lung Cancer Assistant (LCA), an online,
ontology-driven CDS prototype that aims to assist lung cancer experts in developing
patient-specific and evidence-based treatment decisions that would maximise the
benefits for the patient. The project is a joint effort of the Institute of Biomedical
Engineering of Oxford University and the National Lung Cancer Audit (NLCA) of
the NHS. LCA is being built and tested on the English lung cancer database
(LUCADA), which consists of approximately 115,000 patient records.
In this paper, we describe how we have designed the LUCADA ontology, which is a
domain-specific OWL 2 EL module of SNOMED-CT, and how we make use of this
ontology to formalise clinical guideline rules and to categorise patient entries.
2.    Background
The adoption of clinical guidelines improves the overall quality of patient care and
helps reduce practice variability and care expenses [2]. Computer-interpretable guideline (CIG) formalisms have emerged
over the last two decades with the aim of improving the acceptance, maintenance and
daily application of clinical guidelines by providing guideline-based
recommendations for clinicians and recording the decisions and actions taken.
To date, the Guideline Interchange Format (GLIF3) [3], [4], Asbru [5], EON [6],
SAGE [7] and PROforma [8] have been the most prominent CIG models in terms
of the number of clinical applications and scientific publications [2], [9], [10], [11].
All of these CIG formalisms come with proprietary expression languages to specify
the decision criteria and to regulate the propagation of rules. In this work, we propose
OWL 2 [12] as an alternative expression language for these purposes, for several
reasons. First, none of the CIG formalisms above is suited to combining guideline
decision support with probabilistic techniques, a part of LCA that is currently under
development but is not discussed in this paper. In addition, some of these formalisms
have been discontinued or are no longer maintained, some lack an execution engine,
and some do not support the integration of an external standardised knowledge base,
such as an ontology.
Researchers increasingly focus their efforts on standardising and consolidating the
widening net of terminology and relations used in the multidisciplinary domain of
cancer. From the perspective of a CDS application, interoperability and the adoption
of a standardised vocabulary for representing domain knowledge are of utmost
importance. SNOMED-CT is the most comprehensive clinical ontology and has been
approved by the NHS Information Standards Board as the full fundamental standard
for all medical information applications within the National Health Service (NHS) of
the UK. Therefore, we have chosen to model the domain knowledge for LCA in
SNOMED-CT. Because of the size of SNOMED-CT, adopting it for a domain-specific
application entails the extraction of a module. A further advantage of using OWL 2
for rule representation is that the ontology can be extended seamlessly with
additional class expressions.
3.   Modelling of LUCADA Ontology
The LUCADA database is maintained by NLCA with the aim of improving the
outcome for people diagnosed with lung cancer and mesothelioma. It currently
comprises 114 data items and records for around 115,000 patients. Detailed descriptions of all
data items can be found in the LUCADA Data Manual v3.1.3 on the web [13].
In order to create the ontological representation of the tabular LUCADA database
structure, we have manually extracted a domain-specific OWL 2 EL module of
SNOMED-CT, which only includes concepts that are immediately relevant to
represent the 114 LUCADA data items. This module constitutes the clinical domain
of the LUCADA ontology and it covers all data items in the database, enabling a
semantically accurate mapping of patients between the database and the ontology. In
addition to the clinical domain, the LUCADA ontology contains a small argumentation
domain, which holds the classes that are used for guideline rule inference and
argumentation. Figure 1 depicts the higher-level classes and properties in the
LUCADA ontology.
For the clinical domain, we have identified some key SNOMED-CT concepts that
were essential for representing a patient record in LUCADA. Based on this, we
modelled a data item either as an object property or a data property in the ontology.
The object properties and data properties, added to the ontology for this purpose, are
shown in blue and green respectively in Figure 1. In its current version, the LUCADA
ontology T-Box consists of 396 classes, 35 object properties and 60 datatype
properties.


Figure 1. The LUCADA Ontology: The orange circles represent SNOMED-CT classes
(concepts) and the black circles represent non-SNOMED-CT argumentation classes. The
blue arrows represent object properties between the classes and the green list items
represent the datatype properties belonging to the respective classes.
In order to keep the size of the module small and data abstraction concise, we have
prioritised modelling data items as data properties of relevant classes. Representation
in the form of an object property often entailed the inclusion of a corresponding new
class in addition to the object property representing the data item and therefore was
only chosen in the following situations:
1) When the creation of a distinct class for a data item could help to
   model other data items through the newly created class's properties. An example
   of this is the ‘hasTreatmentPlan’ object property which has domain Patient and range
   Treatment Plan. Here, the inclusion of Treatment Plan as a separate concept enables
   connecting it to constituent treatment options under Procedure with the
   ‘includesTreatment’ object property. This structure enables modelling a patient with a
   suggested treatment plan which includes various distinct treatment types.
2) When more than one data item could be grouped under a common parent
   concept. An example of this is the introduction of the SNOMED-CT concept
   Clinical Finding, subsuming all diseases such as Primary (Cancer) Diagnosis,
   Dementia, Cardiovascular Disease and other comorbidities. This allows use of a
   single object property, ‘hasClinicalFinding’, to connect a Patient individual to all
   (taxonomically) disease-related concepts taken from the database.
3) When suitable object properties already existed within SNOMED-CT. While
   building the LUCADA ontology, precedence was given to using the properties
   inherently defined in SNOMED-CT. These are originally called defining
   attributes but translate into object properties in the OWL-2 representation of
   SNOMED-CT.
Some data items did not have one-to-one mappings with a SNOMED-CT concept. In
such cases, the data items were modelled as compound concept definitions. For
instance, the ‘Severe Weight Loss’ data item, which is one of the comorbidity types in
the database, was modelled as “Weight Loss Finding ‘Severity’ Severe”, where Weight Loss Finding
and Severe are classes and ‘Severity’ is an object property in SNOMED-CT. Furthermore,
some data items were too complex to be represented within the vocabulary of
SNOMED-CT. As a result, 13 new classes had to be manually added to the LUCADA
ontology. An example of such a class is Induction Chemotherapy to downstage before surgery,
which appears as a treatment plan type in LUCADA. Since modelling this as a
compound structure was not feasible, a concept with the same name was added to the
LUCADA ontology. Notably, a concept addition request was made to the SNOMED-CT
UK Terminology Centre for this concept. After their review, the request was accepted
on 8 July 2011 and the concept has been added to the October 2011 SNOMED-CT
release.
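As a rough illustration of the compound definition described above for ‘Severe Weight Loss’, the Python sketch below assembles a class expression string from a base class, an attribute and a value class. Reading the definition as an existential restriction in Manchester-style syntax is an assumption made for this sketch, as are the helper name and the CamelCase identifiers; the LUCADA ontology itself may encode the definition differently.

```python
def compound_definition(base_class: str, attribute: str, value_class: str) -> str:
    """Build a Manchester-style class expression: base and (attribute some value).

    Hypothetical helper: the identifiers passed in are illustrative only
    and are not taken from the LUCADA ontology.
    """
    return f"{base_class} and ({attribute} some {value_class})"


# 'Severe Weight Loss' read as a weight loss finding whose severity is severe.
severe_weight_loss = compound_definition("WeightLossFinding", "Severity", "Severe")
print(severe_weight_loss)
```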
4.   Patient and Guideline Rule Representation in LUCADA Ontology
Once the ontology design was complete, we wrote a program using Java v.6
and Java OWL API v.3.2.3 [14] for the automatic transfer of patients from the
database to the LUCADA ontology. Figure 2 depicts how the fictitious patient
Jenny_Sesen is represented in the ontology. As can be seen, we create corresponding
Patient, Cancer and Histology individuals (depicted as purple diamonds) to represent
Jenny_Sesen’s unique database record given in Figure 2a. In cases where the data
item to be represented is not patient-specific, such as Tumour Laterality: Right, we
use a reference individual to represent that standard value rather than creating a new
individual for every patient entry. This implementation of reference individuals is
mostly used for standard valued concepts such as Severity, Side, Decision, and Treatment
Plan.
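The reuse of reference individuals described above can be sketched in a few lines of Python. This is a minimal illustration that does not use the OWL API: assertions are stored as plain (subject, property, object) tuples, and the class name, the ‘hasLaterality’ property and the ‘Ref_’ and ‘Cancer_’ naming schemes are hypothetical. Only ‘hasClinicalFinding’ and ‘hasPreTNMStaging’ are taken from the text.

```python
from itertools import count

# Hypothetical: data items whose values are standard rather than patient-specific.
REFERENCE_VALUED = {"hasLaterality"}


class OntologyBuilder:
    """Collects (subject, property, object) assertions for patient records."""

    def __init__(self):
        self.triples = []
        self.reference_individuals = {}  # standard value -> shared individual
        self._ids = count(1)

    def reference_individual(self, value):
        # Create the shared individual on first use, then reuse it.
        return self.reference_individuals.setdefault(value, f"Ref_{value}")

    def add_patient(self, name, cancer_record):
        # Each patient gets its own Cancer individual.
        cancer = f"Cancer_{next(self._ids)}"
        self.triples.append((name, "hasClinicalFinding", cancer))
        for prop, value in cancer_record.items():
            if prop in REFERENCE_VALUED:
                target = self.reference_individual(value)
            else:
                target = value  # kept as a plain literal (data property)
            self.triples.append((cancer, prop, target))
        return cancer


builder = OntologyBuilder()
builder.add_patient("Jenny_Sesen", {"hasLaterality": "Right", "hasPreTNMStaging": "III"})
builder.add_patient("Patient_2", {"hasLaterality": "Right", "hasPreTNMStaging": "IV"})
print(builder.reference_individuals)
```

Both patients point at the same individual for the value Right, so the number of individuals grows with the number of patients rather than with the number of recorded values.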
Figure 2. (a) The tabular database representation of the patient record ‘Jenny_Sesen’ (b)
The ontological representation of ‘Jenny_Sesen’ in the LUCADA ontology, where the
purple diamonds are individuals and orange circles are classes, blue arrows are object
properties and green list items are data properties in the ontology.
Following the transfer of patients to the ontology, we expanded the Java
program to include a guideline rule inference framework, which is used to represent
guideline rule criteria as Patient Scenario class expressions in the ontology. The hybrid
Patient Scenario class is a subclass of both SNOMED-CT’s Patient class and the
proprietary Argumentation class. It can conceptually be regarded as a hypothetical patient
cohort that fulfils a guideline’s rule criteria. We can demonstrate this by analysing a
guideline rule taken from the NICE Lung Cancer 2011 Guidelines [1]:
“Offer chemotherapy to patients with stage III or IV NSCLC and good performance
status (WHO 0, 1 or a Karnofsky score of 80–100).” (R1)
We can break this rule down into two functional components: 1) the head, which
specifies the patients for whom the rule is applicable; 2) the body, which specifies the
action(s) to take when the specific conditions in the head are satisfied [15]. (Note that
SWRL itself uses these terms the other way round: there the body is the antecedent and
the head is the consequent.) The head of R1 can be written in OWL 2 Manchester syntax
as the compound logical expression given in (E1):
“(hasClinicalFinding some (NeoplasticDisease and ((hasPreTNMStaging value "III") or
(hasPreTNMStaging value "IV")) and (hasPreHistology some NonsmallCellCarcinoma)))
and (hasPerformanceStatus some (WHOPerfStatusGrade0 or WHOPerfStatusGrade1))” (E1)

(E1) can be translated into plain English as: (Individuals) whose performance status is
either 0 or 1 and who have Neoplastic Disease with TNM staging of either III or IV and
a histology finding of non-small cell carcinoma. In order to formalise the guideline rules
in the ontology, we create a Patient Scenario subclass for each such rule and then add a
corresponding equivalent class expression as given in (E1). Adoption of the OWL 2
EL profile for the LUCADA ontology allowed us to make use of the existential
quantification restrictions employed in Patient Scenario class expressions. When an
ontology reasoner is run, all patients that fit the equivalent class expression, i.e. the
guideline rule criteria, of a specific rule are automatically inferred to be members of
the corresponding Patient Scenario.
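To make the membership test concrete, the Python sketch below checks the conditions of (E1) against a patient held as a plain dictionary. It is only a stand-in for what the reasoner infers: the dictionary layout and the function name are hypothetical, and the check is closed-world (a missing value counts as a non-match), whereas an OWL reasoner works under the open-world assumption.

```python
def matches_e1(patient: dict) -> bool:
    """Return True if the patient record satisfies the conditions of (E1)."""
    # hasClinicalFinding some (NeoplasticDisease and stage III/IV and NSCLC histology)
    has_qualifying_finding = any(
        finding.get("type") == "NeoplasticDisease"
        and finding.get("hasPreTNMStaging") in {"III", "IV"}
        and finding.get("hasPreHistology") == "NonsmallCellCarcinoma"
        for finding in patient.get("hasClinicalFinding", [])
    )
    # hasPerformanceStatus some (WHOPerfStatusGrade0 or WHOPerfStatusGrade1)
    has_good_performance = patient.get("hasPerformanceStatus") in {
        "WHOPerfStatusGrade0",
        "WHOPerfStatusGrade1",
    }
    return has_qualifying_finding and has_good_performance


jenny = {
    "hasClinicalFinding": [
        {
            "type": "NeoplasticDisease",
            "hasPreTNMStaging": "III",
            "hasPreHistology": "NonsmallCellCarcinoma",
        }
    ],
    "hasPerformanceStatus": "WHOPerfStatusGrade1",
}
# Same finding, but a performance status outside the two grades named in (E1).
poor_status = dict(jenny, hasPerformanceStatus="WHOPerfStatusGrade3")

print(matches_e1(jenny), matches_e1(poor_status))
```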


Figure 3. A reference-individual-level connection between a Patient Scenario, a Decision
and a Treatment Plan, which represent the rule body statement “Offer Chemotherapy”.
The action(s) to take for the members of a particular Patient Scenario, i.e. the rule body,
are then specified through object property relations between the reference
individuals of the Patient Scenario, Decision and Treatment Plan classes. This is depicted in
Figure 3 for the body of rule R1.
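One way to picture how a recommendation could be read off these links is the Python sketch below. It assumes the links form a chain from Patient Scenario to Decision to Treatment Plan; the property names ‘resultsInDecision’ and ‘entailsTreatmentPlan’ and the individual names are hypothetical, since the text does not name them.

```python
# Hypothetical links between reference individuals for the body of rule R1.
LINKS = [
    ("Ref_PatientScenario_R1", "resultsInDecision", "Ref_Decision_Offer"),
    ("Ref_Decision_Offer", "entailsTreatmentPlan", "Ref_TreatmentPlan_Chemotherapy"),
]


def follow(subject, prop, links):
    """Return every object linked to `subject` through `prop`."""
    return [o for s, p, o in links if s == subject and p == prop]


def recommendations(scenario, links):
    """List (decision, treatment plan) pairs attached to a Patient Scenario."""
    pairs = []
    for decision in follow(scenario, "resultsInDecision", links):
        for plan in follow(decision, "entailsTreatmentPlan", links):
            pairs.append((decision, plan))
    return pairs


r1_body = recommendations("Ref_PatientScenario_R1", LINKS)
print(r1_body)
```

Because the links sit on reference individuals rather than on patients, a patient inferred to belong to a Patient Scenario needs no extra assertions to obtain its recommendation.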

5.   Lung Cancer Assistant
The LUCADA ontology and the guideline inference framework have been brought
together within the Lung Cancer Assistant (LCA) web-based decision support
application, which is developed using the Google Web Toolkit (GWT) Software
Development Kit (SDK) for Java. As is usual for web-based software, the LCA
architecture is separated into two components: 1) client-side code that runs on an end
user’s local computer and connects to a server as necessary; 2) server-side code that
runs on a remote server and is reachable from an end user’s local computer on demand.
The LUCADA ontology and the database are kept on the server-side and are utilised
through event calls that are triggered from the client-side when a user interacts with
the application. The patient interactions are handled by various ontology editing
classes, database to ontology converting classes and the rule inference framework
classes on the server side. All ontology-related classes were written using the OWL
API v3.2.3.
The current version of LCA allows creating, updating and saving patient records.
Following any of these user events, the ontology reasoner is triggered to ‘reclassify’
the ontology on the server-side, which ensures that newly entered data are reflected in
the patient-specific guideline recommendations immediately. Due to the number of
patients in the database, storing all patient-record individuals in the ontology was not
feasible in terms of classification. When all 115,000 patient records were added, the
ontology grew to approximately 1 GB in size with over 700,000 individuals. None of the
commonly used reasoners, namely HermiT, FaCT++ and Pellet, managed to finish
classifying this ontology. Therefore, we designed an architecture in which each patient
is, upon creation, temporarily added to the ontology for classification and then
removed. The inferred Patient Scenario memberships are then stored in a separate table
in the database to record which rules apply to each patient.
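The add, classify, remove cycle described above can be sketched as follows. This is a simplified Python illustration under stated assumptions: the ontology is a dictionary, the reasoner is replaced by plain predicate functions, and the membership table is an in-memory dictionary rather than a database table. All names, including the simplified rule, are hypothetical.

```python
def rule_r1(patient):
    # Simplified, hypothetical stand-in for the head of rule R1.
    return patient["stage"] in {"III", "IV"} and patient["who_status"] in {0, 1}


# Hypothetical registry: Patient Scenario name -> predicate standing in for the reasoner.
RULES = {"PatientScenario_R1": rule_r1}


def classify_and_store(ontology, membership_table, patient_id, patient):
    """Temporarily add one patient, infer its scenarios, then remove it again."""
    ontology[patient_id] = patient
    try:
        scenarios = [
            name for name, head in RULES.items() if head(ontology[patient_id])
        ]
    finally:
        # The patient never stays in the ontology, so its size remains bounded.
        del ontology[patient_id]
    membership_table[patient_id] = scenarios
    return scenarios


ontology = {}
membership_table = {}
classify_and_store(ontology, membership_table, "Jenny_Sesen", {"stage": "III", "who_status": 1})
classify_and_store(ontology, membership_table, "Patient_2", {"stage": "I", "who_status": 0})
print(membership_table)
```

The design trades a single large classification for many small ones: each run sees only the T-Box, the reference individuals and one patient.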


Figure 4. A screenshot of the Treatment tab of LCA: The patient-specific treatment
arguments are displayed within the Treatment Options box, which is automatically
generated by the rule inference framework described in Section 4.
The data item fields in the LCA UI are distributed into different tabs in the
order of the corresponding LUCADA sections given in [13]. A screenshot of the
Treatment Tab of the application is given in Figure 4. When a new patient is saved or
an existing patient is loaded, the treatment recommendations are displayed in the
scrollable Treatment Options box.

6. Discussion and Future Work
In the current version of LCA, ontological inference is used to create patient-specific
treatment arguments by automatically grouping patients based on guideline rules in
the ontology. So far, we have input all rules that concern treatment selection in the
British Thoracic Society guidelines into LCA. Our next task will be to extract rules from
the National Institute for Health and Clinical Excellence (NICE) guideline documents
and formalise them in the system.
In addition, we are currently benchmarking various Bayesian techniques to integrate
with the current guideline-based decision support functionality. The probabilistic
inference provided by a suitable Bayesian classifier is intended to reinforce the
qualitative patient-specific arguments by quantifying their reliability and introducing
degrees of support based on the significance of each argument’s claim for survival.

7. References
[1]      NICE, “Quick Reference Guide Lung cancer,” 2011.
[2]      P. de Clercq, K. Kaiser, and A. Hasman, “Computer-interpretable
Guideline Formalisms,” Studies In Health Technology And Informatics, vol. 139, pp.
22-43, 2008.
[3]      A. Boxwala et al., “GLIF3: a representation format for sharable
computer-interpretable clinical practice guidelines,” Journal of biomedical informatics, vol. 37,
no. 3, pp. 147-61, Jun. 2004.
[4]      D. Wang et al., “Design and implementation of the GLIF3 guideline
execution engine,” Journal of biomedical informatics, vol. 37, no. 5, pp. 305-18, Oct.
2004.
[5]      S. Miksch, Y. Shahar, and P. Johnson, “ASBRU: A task Specific, Intention-
based, and Time Oriented Language for Representing Skeletal Plans,” Aids, pp. 1-25,
1997.
[6]      M. Musen, S. Tu, and K. Das, “EON: a component-based approach to
automation of protocol-directed therapy,” Journal of the American Medical
Informatics Association, vol. 3, no. 6, 1996.
[7]      P. Ram et al., “Executing clinical practice guidelines using the SAGE
execution engine,” Studies in health technology and informatics, vol. 107, no. Pt 1,
pp. 251-5, Jan. 2004.
[8]      D. Sutton and J. Fox, “The Syntax and Semantics of the PROforma
Guideline Modeling Language,” Journal of the American Medical Informatics
Association, vol. 10, no. 5, pp. 433-443, 2003.
[9]      A. Boxwala et al., “Toward a representation format for sharable clinical
guidelines,” Journal of biomedical informatics, vol. 34, no. 3, pp. 157-69, Jun. 2001.
[10]     D. Isern and A. Moreno, “Computer-based execution of clinical guidelines: a
review,” International journal of medical informatics, vol. 77, no. 12, pp. 787-808,
Dec. 2008.
[11]     M. Peleg, S. Tu, and J. Bury, “Comparing Guideline Models: A Case-study
Approach,” Journal of the American Medical Informatics Association, pp. 52-68,
2003.
[12]     W3C, “OWL 2,” 2009. [Online]. Available:
http://www.w3.org/TR/2009/PR-owl2-overview-20090922/.
[13]     The National Lung Cancer Audit, “The National Clinical Lung Cancer Audit
(LUCADA) Data Manual,” 2010.
[14]     M. Horridge and S. Bechhofer, “OWL API Version 3.2.3,” 2011. [Online].
Available: http://owlapi.sourceforge.net/.
[15]     W3C, “SWRL: A Semantic Web Rule Language Combining OWL and
RuleML,” 2011. [Online]. Available: http://www.w3.org/Submission/SWRL/.