Genomic CDS: an example of a complex ontology for
      pharmacogenetics and clinical decision support

                                     Matthias Samwald¹

                       ¹ Medical University of Vienna, Vienna, Austria
                       matthias.samwald@meduniwien.ac.at


       Abstract. Individual genetic data can be used to better predict the efficacy and
       safety of medications for individual patients. The Genomic Clinical Decision
       Support (Genomic CDS) ontology aims to utilize advanced Web Ontology
       Language 2 (OWL 2) reasoning for this task. The important, clear-cut medical
       use case, the complex axioms in the ontology and the heavy use of qualified
       cardinality restrictions make the ontology an interesting test object for new
       OWL 2 reasoners with improved performance.

       Keywords: OWL, pharmacogenetics, clinical decision support


1      Motivation

Different patients can react drastically differently to the same type of medication
(Fig. 1). The goal of personalized medicine and pharmacogenetics is to predict an
individual patient’s response by analyzing genetic markers that influence how
medications are metabolized or are able to bind to their targets.


 Fig. 1. The efficacy and safety of medications can drastically vary between patients. The goal
 of pharmacogenetics is to classify patients into subgroups based on genetic markers, to better
                 predict which treatments could help and which could do harm.

To produce clinically valid and trustworthy predictions, no errors or ambiguities
should arise in the process of inferring a patient’s likely response from raw genetic
data. Current formalisms, data infrastructures and software applications leave many
opportunities for introducing such errors and ambiguities. Ontologies formalized with
the Web Ontology Language 2 (OWL 2) could be an excellent choice for tackling this
problem, but the complexity and potentially large scale of ontologies in this domain
also pose formidable challenges to currently available OWL 2 reasoners.


2      The Genomic CDS ontology

The Genomic Clinical Decision Support (Genomic CDS) ontology is an OWL 2 ontology
aimed at representing pharmacogenetic knowledge and providing clinical decision
support based on pharmacogenetic data. It is being developed by members of the
Clinical Pharmacogenomics Task Force, which is part of the Health Care and Life
Science Interest Group of the World Wide Web Consortium (W3C). The OWL files
of the ontology, as well as ‘demo’ files containing example patient data, can be
downloaded (http://www.genomic-cds.org/ont/snapshot-june-2013).

We also created a simplified version of the Genomic CDS ontology, called ‘Genomic
CDS light’, which omits some of the axioms of the full ontology. Both versions of
the ontology have ALCQ expressivity. They are characterized by extensive use of
qualified cardinality restrictions.

The goals of developing the ontology are:
• Providing a simple and concise formalism for representing pharmacogenetic
  knowledge
• Finding errors and missing definitions in pharmacogenetic knowledge bases
• Automatically assigning alleles and phenotypes to patients
• Matching patients to clinically appropriate pharmacogenetic guidelines and clinical
  decision support messages

In the most common scenario, genetic patient data in OWL format is combined with
the axioms of the Genomic CDS ontology, and an OWL reasoner is used to infer
matching pharmacogenetic treatment recommendations. Several inference steps are
needed to derive matching treatment recommendations from raw data about genetic
markers (Fig. 2). The raw data consists of small variants in the genetic code, which in
most cases are so-called single nucleotide polymorphisms (SNPs), such as an ‘A’
instead of a ‘G’. Alleles are variants of a gene that are defined by the sets of
small variants they contain. Phenotypes refer to the specific effects that certain small
variants and alleles can have on the organism, e.g., how quickly a patient metabolizes
a specific drug. Clinical guidelines can use small variants, alleles and/or phenotypes
to match patients with treatment recommendations.

The human genome usually contains two copies of each gene (one from the father,
one from the mother), with each copy potentially bearing multiple genetic variants.
Because of this, the ontologies rely heavily on qualified cardinality restrictions with
cardinalities of two, which seems to cause performance issues with most current
OWL reasoners.
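
To illustrate where the cardinalities of two come from, the following minimal Python sketch models a patient's genotype as a count of variants over the two gene copies. It is not part of the ontology: the function names and the list-based encoding are assumptions made for illustration, and the variant `rs1057910_A` on the second copy is an illustrative placeholder. Unlike an OWL reasoner, which works under the open-world assumption, the sketch simply counts what is listed.

```python
from collections import Counter

# Hypothetical encoding: each gene copy (one per parent) is a list of the
# variants observed on it, named like the ontology classes, e.g. "rs9923231_C".
def count_variants(copy_a, copy_b):
    """Count how many of the two gene copies carry each variant."""
    return Counter(copy_a) + Counter(copy_b)

def has_exactly(genotype, variant, n):
    """Rough closed-world analogue of 'has exactly n <variant>'."""
    return genotype[variant] == n

def has_some(genotype, variant):
    """Rough closed-world analogue of 'has some <variant>'."""
    return genotype[variant] >= 1

# Illustrative patient: both copies carry rs9923231_C,
# but only one copy carries rs1057910_C.
patient = count_variants(
    ["rs9923231_C", "rs1057910_C"],
    ["rs9923231_C", "rs1057910_A"],
)

print(has_exactly(patient, "rs9923231_C", 2))  # True
print(has_exactly(patient, "rs1057910_C", 2))  # False
print(has_some(patient, "rs1057910_C"))        # True
```

Because every marker can occur zero, one or two times, most restrictions in such a model distinguish between "some" (at least one copy) and "exactly 2" (both copies).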


 Fig. 2. Through a series of inference steps, matching pharmacogenetic treatment guidelines
                           are inferred from raw genetic patient data.

A simplified example of a rule for inferring an allele (CYP2C9*3) and its single
nucleotide polymorphisms (SNPs) from a so-called ‘tagging SNP’ (a SNP that is
necessary and sufficient for inferring the presence of the allele) looks like this in
Manchester syntax:
Class: 'human with CYP2C9*3'
    EquivalentTo:
        has some rs1057910_C
    SubClassOf:
        has some 'CYP2C9 *3',
        (has some rs1057910_C)
        and (has some rs1057911_A)
        and (has some rs1799853_C)
        and (has some rs2256871_A)
        and (has some rs72558188_AGAAATGGAA)
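
The effect of the rule above can be approximated procedurally. The following Python sketch is only an illustration under closed-world assumptions, not how an OWL reasoner works internally; the constant and function names are hypothetical, while the variant names are copied from the rule.

```python
# Variant names copied from the Manchester syntax rule; everything else
# (constant names, function name, set encoding) is illustrative.
TAGGING_SNP = "rs1057910_C"
ALLELE = "CYP2C9*3"
ALLELE_SNPS = {
    "rs1057910_C",
    "rs1057911_A",
    "rs1799853_C",
    "rs2256871_A",
    "rs72558188_AGAAATGGAA",
}

def infer_cyp2c9_star3(observed):
    """Return the observed variants plus everything the rule implies.

    If the tagging SNP is present, the allele and all SNPs listed in the
    rule are added; otherwise the observations are returned unchanged.
    """
    facts = set(observed)
    if TAGGING_SNP in facts:
        facts.add(ALLELE)
        facts |= ALLELE_SNPS
    return facts

inferred = infer_cyp2c9_star3({"rs1057910_C"})
print(ALLELE in inferred)        # True
print(ALLELE_SNPS <= inferred)   # True
```

A patient for whom only the tagging SNP was measured is thus assigned the allele together with the remaining SNPs that define it.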
An example of an axiom for inferring an adequate clinical decision support message
for the anticoagulant drug warfarin (based on a combination of alleles and SNPs
according to an official recommendation in the drug label):

Class: 'human triggering CDS rule 7'
    EquivalentTo:
        (has some 'CYP2C9*1')
        and (has some 'CYP2C9*3')
        and (has exactly 2 rs9923231_C)
    Annotations:
        label "human triggering CDS rule 7",
        CDS_message "3-4 mg warfarin per day should
            be considered as a starting dose range for
            a patient with this genotype according to
            the Warfarin drug label (Bristol-Myers
            Squibb)."
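
As a rough procedural reading of CDS rule 7, the Python sketch below checks the three conditions against per-patient copy counts and returns the attached message. The dictionary encoding and the function names are assumptions made for illustration; the conditions and the message text are taken from the axiom above.

```python
from collections import Counter

# Message text taken from the CDS_message annotation of the rule.
CDS_RULE_7_MESSAGE = (
    "3-4 mg warfarin per day should be considered as a starting dose "
    "range for a patient with this genotype according to the Warfarin "
    "drug label (Bristol-Myers Squibb)."
)

def triggers_cds_rule_7(genotype):
    """Check the rule's conditions against a mapping of name -> copy count."""
    counts = Counter(genotype)  # names that are not listed count as 0
    return (
        counts["CYP2C9*1"] >= 1          # has some 'CYP2C9*1'
        and counts["CYP2C9*3"] >= 1      # has some 'CYP2C9*3'
        and counts["rs9923231_C"] == 2   # has exactly 2 rs9923231_C
    )

def cds_messages(genotype):
    """Return the decision support messages triggered by the genotype."""
    return [CDS_RULE_7_MESSAGE] if triggers_cds_rule_7(genotype) else []

# Illustrative patient: one copy each of *1 and *3, two copies of rs9923231_C.
patient = {"CYP2C9*1": 1, "CYP2C9*3": 1, "rs9923231_C": 2}
print(cds_messages(patient))
```

In the ontology itself, the same match is obtained by classification: a patient individual that satisfies the class expression is inferred to be a member of 'human triggering CDS rule 7'.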

We used two OWL 2 reasoners with our ontology: TrOWL¹ [1] and HermiT² [2]. We
also evaluated other OWL 2 reasoners (FaCT++³, Pellet⁴) in early stages of the project,
but excluded them from further tests because they did not terminate or crashed even
with small, preliminary versions of the ontology. We compared the performance of the
two reasoners on a virtual machine running on the Amazon Elastic Compute Cloud
(EC2)⁵. The machine was of the “High-Memory Extra Large Instance” type, running
Microsoft Windows Server 2008, with 17.1 GB of memory, a 64-bit platform, and two
virtual cores with 3.25 EC2 compute units each.

The reasoners were run as plugins in the 64-bit version of the Protégé 4.2 ontology
editor. The initial heap size for Protégé was 10^10 bytes (10 GB), and the maximum
allowed heap size was 1.5 × 10^10 bytes (15 GB). The TrOWL reasoner plugins,
versions 0.6 and 1.1, were each run three times for each ontology, and the mean
classification time was calculated. The HermiT 1.3.8 plugin was run once for each
version of the ontology.

¹ http://trowl.eu
² http://www.hermit-reasoner.com/
³ http://code.google.com/p/factplusplus/
⁴ http://clarkparsia.com/pellet/
⁵ http://aws.amazon.com/en/ec2/

These preliminary tests showed TrOWL to be considerably faster than HermiT at
classifying the ontologies (Table 1). However, HermiT was able to identify biologically
meaningful inconsistencies present in genomic-cds-demo.owl (but not present in the
light version of the ontology). TrOWL did not recognize these inconsistencies, most
likely because it only partially covers the OWL 2 DL ruleset. These results suggest that
only TrOWL is fast enough to be used in realistic settings (e.g., for clinical decision
support), but that HermiT could serve to test and validate the results from TrOWL
during development (possibly by comparing the results of the two reasoners for
smaller ontology fragments).
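
One way such a comparison could be organized is sketched below in Python. It assumes that the inferred subclass relationships of each reasoner have already been exported as (subclass, superclass) pairs; the export step, the function name and the example pairs are hypothetical and do not use any reasoner API.

```python
def compare_inferences(reference, approximate):
    """Compare two collections of inferred (subclass, superclass) pairs.

    'reference' holds the results of the complete reasoner,
    'approximate' those of the faster, possibly incomplete one.
    """
    reference, approximate = set(reference), set(approximate)
    return {
        "missed": reference - approximate,  # found only by the reference reasoner
        "extra": approximate - reference,   # found only by the approximate reasoner
        "agreed": reference & approximate,  # found by both
    }

# Hypothetical exports for a small ontology fragment.
reference_results = {
    ("patient A", "human with CYP2C9*3"),
    ("patient A", "human triggering CDS rule 7"),
}
approximate_results = {
    ("patient A", "human with CYP2C9*3"),
}

report = compare_inferences(reference_results, approximate_results)
print(report["missed"])  # {('patient A', 'human triggering CDS rule 7')}
```

Any entry under "missed" or "extra" would point to a fragment that deserves manual inspection.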

    Table 1. Reasoning performance: TrOWL is significantly more performant than HermiT in
              classifying our demo ontology (OWL 2 DL with ALCQ expressivity)

Ontology                     Size                         HermiT 1.3.8               TrOWL 1.1     TrOWL 0.6

genomic-cds-light-demo.owl   2150 classes, 9500 axioms    3 hours 48 minutes         1.5 seconds   18 seconds
genomic-cds-demo.owl         2300 classes, 11000 axioms   detected inconsistencies   5.8 seconds   54 seconds
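
The ratios implied by Table 1 can be worked out directly. The short Python sketch below converts the reported times to seconds and computes the speedup factors; the dictionary layout and the function name are illustrative, and only the numbers come from the table.

```python
# Classification times from Table 1, converted to seconds.
TIMES = {
    "genomic-cds-light-demo.owl": {
        "HermiT 1.3.8": 3 * 3600 + 48 * 60,  # 3 hours 48 minutes
        "TrOWL 1.1": 1.5,
        "TrOWL 0.6": 18.0,
    },
    "genomic-cds-demo.owl": {
        # HermiT detected inconsistencies, so no classification time is listed.
        "TrOWL 1.1": 5.8,
        "TrOWL 0.6": 54.0,
    },
}

def speedup(ontology, slower, faster):
    """Return how many times faster 'faster' was than 'slower'."""
    times = TIMES[ontology]
    return times[slower] / times[faster]

light = "genomic-cds-light-demo.owl"
full = "genomic-cds-demo.owl"
print(speedup(light, "HermiT 1.3.8", "TrOWL 1.1"))       # 9120.0
print(speedup(light, "TrOWL 0.6", "TrOWL 1.1"))          # 12.0
print(round(speedup(full, "TrOWL 0.6", "TrOWL 1.1"), 1))  # 9.3
```

On the light demo ontology, TrOWL 1.1 was thus roughly four orders of magnitude faster than HermiT 1.3.8, and about ten times faster than TrOWL 0.6 on both ontologies. Since HermiT was run only once, these factors should be read as rough indications.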


3        Conclusions and outlook

The Genomic CDS ontology is an example of an OWL 2 ontology for clinical genetics
and decision support. Even though it is focused on a relatively small set of the
most important pharmacogenetic markers, the ontology poses a significant challenge
to currently available OWL 2 reasoners. There is great need for reasoners that are
optimized for the kinds of OWL axioms encountered in ontologies dealing with
clinical genomics.


4        Acknowledgements

The research leading to these results has received funding from the Austrian Science
Fund (FWF): [PP 25608-N15].


References

1. Thomas, E., Pan, J.Z., Ren, Y.: TrOWL: Tractable OWL 2 Reasoning Infrastructure. In:
   Proc. of the Extended Semantic Web Conference (ESWC2010) (2010).
2. Motik, B., Shearer, R., Horrocks, I.: Hypertableau Reasoning for Description Logics. J.
   Artif. Intell. Res. 36, 165–228 (2009).