=Paper= {{Paper |id=None |storemode=property |title=Developing DCO: The DebugIT core ontology for antibiotics resistence modelling |pdfUrl=https://ceur-ws.org/Vol-714/ShortPaper08_Schober.pdf |volume=Vol-714 |dblpUrl=https://dblp.org/rec/conf/smbm/SchoberBST10 }} ==Developing DCO: The DebugIT core ontology for antibiotics resistence modelling== https://ceur-ws.org/Vol-714/ShortPaper08_Schober.pdf
Developing DCO: The DebugIT Core Ontology for Antibiotics Resistence Modelling

Daniel Schober (schober@imbi.uni-freiburg.de)
Martin Boeker (beakmachine@gmail.com)
Ilinca Tudose (ilinca.tudose@gmail.com)
Stefan Schulz (stschulz@imbi.uni-freiburg.de)

All authors: Institute for Medical Biometry and Medical Informatics, Freiburg University Medical Center, Germany

Abstract

Antibiotics resistance development in European hospitals has increased alarmingly in recent years. To counteract this danger, a semantic web based IT solution is proposed which intends to integrate the access to relevant clinical data repositories from different European hospitals. This endeavor relies on formalized and shared models of the clinical domain. We describe the development process of the DebugIT Core Ontology, which is a key mediator for semantic as well as syntactic clinical data integration in this endeavor. We show how UML diagrams can be used to illustrate the ontology engineering phase and the ontology's use case. Some domain statements are given, which are then converted into more human-friendly representations to be verified by medical experts.

1    Introduction

Antibiotics resistance development poses a significant problem in today's hospital care. Massive amounts of clinical data relevant for this domain are being collected and stored in proprietary but unconnected systems in heterogeneous formats, preventing re-use and exploitation of potentially valuable data. The DebugIT project (Detecting and Eliminating Bacteria UsinG Information Technology, http://www.debugit.eu/), a large scale EU funded data integration project, intends to analyze antibiotics prescription practices and their outcomes across Europe and to exploit this knowledge to detect patient safety related patterns in distributed hospital data, i.e. to discover indicators for better treatments and ultimately antibiotics resistance prevention [1].

The challenge here is to establish a coherent and systematic exchange of rich data, harmonised across the different DebugIT sites and their Clinical Data Repositories (CDR), including information about patients, their illness situations, pathogens and antibiotics therapies. The semantic glue towards integrating such data is the DebugIT Core Ontology (DCO), an application ontology that enables data miners to query distributed CDRs in a semantically rich and content driven manner.

Here we outline basic DCO engineering methods, illustrate some example statements expressed in DCO and show how these are exploited by logics reasoners and visualization tools providing views readily understandable by biologists not acquainted with logics formalisms.

2    Methods

DCO is developed in the description logics RL flavor (http://www.w3.org/TR/owl2-profiles/#OWL_2_RL), using the Protégé 4.1 ontology editor [2]. The dco.owl file builds on the domain upper level ontology BioTop [3] by direct owl:import. Detailed design principle documentation is available on the supplementary material website: http://www.imbi.uni-freiburg.de/~schober/DCO/

Figure 1: UML activity diagram illustrating DCO engineering upon receiving a new clinical question

2.1   Input sources for DCO enrichment

The main input sources for ontology population in the kickoff phase have been the hospitals' CDR schemata, ensuring a data-driven bottom-up enrichment approach. Further sources were competency questions (CQ) and, later, specific term requests stated by collaborators in a web forum.

2.2    Competency Questions

To be able to verify whether DCO is sufficiently complete to represent our use case, we have gathered competency questions [4] from clinicians (see Supplementary material). The ontology needs to contain a necessary and sufficient set of axioms to represent these questions, which will serve as a benchmark for DCO content coverage evaluation.

2.3    DCO maintenance and evolution

DCO is maintained using a Subversion (SVN) repository (http://www.greeninghealthcare.org/repository/debugit/trunk), which makes work progress easy to detect via log files and allows file revision history to be tracked. Progress monitoring between the ontology work package (WP1a) and the other involved work packages is realized via weekly teleconferences following the SCRUM (http://www.scrum.org/scrumguides/) project management methodology. To access the ontology conveniently in a web browser, we have set up an HTML serialisation (http://www.imbi.uni-freiburg.de/~schober/dco_owlDoc/).

To illustrate the DCO ontology engineering process in detail, we present a UML activity diagram showing the ontology engineering activities upon acceptance of a new competency question (Fig. 1). Additional graphics illustrating the DCO engineering method in the kick-off phase can be found in the supplementary material.

Figure 2: UML Use Case diagram illustrating ontology usage in different formalisation steps of an example competency question. For SPARQL code examples we refer to the supplementary material.

2.4    Information integration via SPARQL

The gap between the different hospitals' CDRs is bridged by linking RDF models of the various local CDRs to DCO concepts in a mapping SPARQL query. In the query process two kinds of ontologies are applied: DCO is used for formulating a hospital-independent clinical SPARQL query. In another query formalization step DCO is then mapped to the local CDR via an RDF-converted database schema called data definition ontology (DDO; e.g. a DDO with a PREFIX inserm: http://debugit1.spim.jussieu.fr/resource/vocab/ as in the example query), acting as a query mediator to the proprietary hospital data. This approach is outlined in more detail in [5]. Within the DebugIT interoperability
platform a clinical query is successively formalized from natural language over semi-formal intermediate query steps towards a formal, site-dependent local data set query. During this process it is passed from the clinician over to a data miner and further on to the different local data managers. To illustrate how DCO concepts are used within these different query formalisation steps we include a UML use case diagram (Fig. 2).

3    Results

3.1    DCO current metrics

The current description logic expressivity is SRIF(D). We are using the HermiT DL reasoner (http://hermit-reasoner.com/), which takes ~2 minutes to classify DCO including BioTop on an average PC. Table 1 illustrates the statistics of DCO and BioTop.

Ontology elements and axioms     Count (all)   DCO     BioTop
Classes                          1281          965     375
Object Properties (relations)    78            3       74
Datatype Properties              11            10      0
Subclass Axioms                  1494          1050    444
Equivalent Class Axioms          197           98      99
Disjoint Axioms                  76            1       75

Table 1: Content and size of DCO and its BioTop upper level ontology

3.2    An ontological model of infectious diseases

A knowledge domain of great importance for the DebugIT project, but also for the wider healthcare domain, is a granular and expressive disease model, i.e. one distinguishing pathological processes and agents from pathological structures and dispositions (Fig. 3). We started modeling a prototypical infectious disease, Pneumonia, together with some of its key aspects in a simple pre-coordinated way:

    Pneumonia ≡ Inflammation ⊓ ∃has-participant.LungTissue
    AcutePneumonia ≡ Pneumonia ⊓ ∃bearer-of.AcuteQuality
    BacterialPneumonia ≡ Pneumonia ⊓ ∃has-agent.BacteriaPopulation
    ViralPneumonia ≡ Pneumonia ⊓ ∃has-agent.VirusPopulation

The above however misses some aspects, e.g. it allows for a Pneumonia in the kidney, because LungTissue and KidneyTissue have not been made disjoint. We also cannot specify whether an AcutePneumonia can have, besides Acute, further qualities, e.g. Chronic, or whether the has-agent in BacterialPneumonia can also be filled by, e.g. VirusPopulation. We therefore needed to provide SubclassOf definitions for inferring e.g.

    BacterialPneumonia ⊑ BacterialInflammation

Mereotopological axioms were needed in order to infer from

    Pneumonia ≡ Inflammation ⊓ ∃has-participant.LungTissue

and

    LungTissue ⊑ ∃part-of.Lung

that

    Pneumonia ⊑ ∃has-locus.Lung

Disjoints like Process ⊑ ¬Structure were added to be able to infer that

    PathologicalProcess ⊑ ¬PathologicalStructure

We amended DCO successively, providing restrictions for post-coordinations, e.g. constraining a user to enter new Pneumonias only in Loci where lung tissue exists:

    Pneumonia ⊑ ∀has-locus.(∃locus-of.LungTissue)

By this and exploiting the following restrictions

    LungTissue ⊑ ∃has-locus.Lung
    Lung ⊑ ∃has-locus.Thorax
    ∃has-locus.Thorax ⊑ ¬∃has-locus.(Abdomen ⊔ Extremity)

an ontology-based annotation interface can now guide a user towards the correct localisations possible for a certain infectious disease [6].

Figure 3: An OntoGraf view on the tripartite ontological DCO disease model. A disposition is realized by a disease (Process), which is manifested as a disorder (PathologicalStructure).

3.3    Maintaining multiple-parenthood by a logics reasoner

From an engineering standpoint, we apply the normalization approach of [7] and use single-asserted parenthood throughout the taxonomy. This facilitates the orientation in the taxonomy and its maintenance. The description logics reasoner HermiT infers multiple parenthood from the formal restrictions. E.g. it enriches the Sample class hierarchy by auto-classifying BodyLiquidSamples, e.g. from the given facts

    Blood ⊑ BodyLiquid
    BodyLiquidSample ≡ Sample ⊓ ∃derivesFrom.BodyLiquid
    BloodSample ≡ Sample ⊓ ∃derivesFrom.Blood

the reasoner infers that

    BloodSample ⊑ BodyLiquidSample

enriching the taxonomy. This is also done for all other liquid samples.

3.4    Graphical visualisations

To allow biomedical experts not acquainted with ontology editors to view, understand and check parts of the ontology, we apply ontology visualizations as generated by the OwlPropViz (http://protegewiki.stanford.edu/index.php/OWLPropViz) and OntoGraf (http://protegewiki.stanford.edu/wiki/OntoGraf) Protégé plugins (Fig. 3), which enable visual, parallel and hence faster perception of the term networks.

3.5    Constrained natural languages

To allow biomedical experts not acquainted with description logics to view, understand and check parts of the ontology, we investigate highly end-user compliant ways to present ontological statements via Constrained Natural Languages (CNL) like Attempto Controlled English (ACE) (http://attempto.ifi.uzh.ch/aceview/). This creates natural language text that can be used to present ontology fragments to domain experts and makes end-user verification of complex DL statements possible for the non-ontology expert. E.g. it renders the Manchester OWL Syntax

    InfectiousDisease
      EquivalentTo
        biotop:AcquiredPathologicalState
        and (biotop:hasAgent some
          (biotop:Organism
           and (biotop:bearerOf some InfectorRole)))

into the following ACE sentence:

"Every InfectiousDisease is an AcquiredPathologicalState that hasAgent an Organism that bearerOf an InfectorRole. Every AcquiredPathologicalState that hasAgent an Organism that bearerOf an InfectorRole is an InfectiousDisease."

4    Discussion

We have reported on the development of a clinical ontology for data integration and annotation. Although a certain level of semantic integration has been reached, many steps must be performed manually and hence are error-prone as well as time and resource intensive. Regarding the issue of to what extent the ontology should contain pre-coordinated expressions, creating restrictions for guiding users in post-coordinative class generation enforces a transition from OWL 2 EL towards RL expressivity, because disjoints and universal restriction constructors need to be applied. Reasoners used to prevent redundant post-coordinations need to be fast, which is still rarely the case. Traditional large scale RL ontology reasoning is slow and might not be feasible for post-coordination at annotation time, when a large set of constraints needs to be verified in a timely manner. Here, fast local, incremental reasoning methods need to be investigated.

Some ontologically difficult areas were the modeling of time, e.g. introducing TimeQuality classes versus using simple xsd:dateTime datatype properties; how to model intervals such as episode of care or patient stay; process modifications like adapted or merely planned therapies also depend on a rigid time model. We tried to find a pragmatic compromise between needed complexity and ease of use of time related expressions. Time constructs exploitable by a reasoner were only included
when not making expressions overly difficult to read and create for a human user.

Regarding ontology evaluation, CNLs are not yet in a stage where they can contribute to a better understanding of more complex and especially nested DL expressions. Some expressions, annoyingly the more interesting 'hub-node' ones, could not be transcribed and, e.g. the above example should have generated the text

"Every InfectiousDisease is an AcquiredPathologicalState that has as an agent an Organism that is the bearerOf an InfectorRole"

to be intuitive. Further it needs to be investigated how large ontologies can be sub-structured into small digestible parts or modules that can be managed by domain specialists in a timely manner.

5    Conclusion

We believe we have created a robust and scalable disease model that can serve the wider biomedical domain. Hence, a next step will be the submission of the above disease definitions as a content ontology design pattern, e.g. towards the OntologyDesignPattern.org repository [8]. Further such micro-models will follow, e.g. for modeling drugs and their prescriptions.

Whereas earlier attempts at integrating CDRs via purely syntactical means fail to exploit computer interpretable formal semantics [9], projects begin to appear that show the usefulness and even feasibility of applying owl-DL semantics in healthcare data integration settings. LinkedLifeData (http://linkedlifedata.com/), a platform for semantic data integration through RDF warehousing, demonstrates how efficient reasoning can help to resolve conflicts within the data. However, such a goal, and this is also an important lesson learned in the DebugIT endeavor, can only be achieved if particular care is taken on reasoning performance. Logics-based reasoning will only be feasible in realistically large ontologies when computationally expensive owl-RL constructs are applied consciously. Ultimately the fast-paced progress in semantic web technologies leads to frequent changes in even the most basic tools, such as APIs, reasoners and SPARQL endpoint software. Due to these inherent dynamics one should constantly check where one can restrict oneself to a more robust subset of cutting-edge techniques.

References

[1] Lovis C et al. DebugIT for patient safety - improving the treatment with antibiotics through multimedia data mining of heterogeneous clinical data. Stud Health Technol Inform. 136 (2008), 641-6
[2] Noy NF, Crubezy M, Fergerson RW, Knublauch H, Tu SW, Vendetti J, Musen MA: Protege-2000: An Open-source Ontology-development and Knowledge-acquisition Environment. Proc AMIA Symp 2003:953.
[3] Elena Beisswanger, Stefan Schulz, Holger Stenzhorn, and Udo Hahn. BioTop: An upper domain ontology for the life sciences – a description of its current structure, contents, and interfaces to OBO ontologies. Applied Ontology, 3(4):202–212, 2008.
[4] Grueninger, M and Fox, M (1994). The role of competency questions in enterprise engineering. In IFIP WG 5.7, Workshop Benchmarking. Theory and Practice, Trondheim/Norway.
[5] Daniel Schober, Martin Boeker, Jessica Bullenkamp, Csaba Huszka, Kristof Depraetere, Douglas Teodoro, Nadia Nadah, Remy Choquet, Christel Daniel, Stefan Schulz, The DebugIT Core Ontology: semantic integration of antibiotics resistance patterns, Proceedings MEDINFO 2010
[6] Stefan Schulz, Daniel Schober, Djamila Raufie, Martin Boeker, Pre- and Postcoordination in Biomedical Ontologies, OBML 2010 Workshop Proceedings, IMISE-Report Nr 2/2010, ISSN 1610-7233, Universität Leipzig, 2010
[7] Rector AL, Rogers JE, Zanstra PE, Van Der Haring E: OpenGALEN: open source medical terminology and tools. AMIA Annu Symp Proc 2003:982.
[8] V. Presutti and A. Gangemi. Content ontology design patterns as practical building blocks for web ontologies. In Proceedings of ER2008. Barcelona, Spain, 2008.
[9] Brailer, D. (2005). Interoperability: The Key to the Future Health Care System, Health Affairs (The Policy J. of the Health Sphere), Vol. 10 (January), 19-21.
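
Supplementary illustration for Section 3.3. The auto-classification of sample classes described there can be sketched with a toy structural subsumption check. This is not how a tableau-based DL reasoner works internally; it is a minimal Python sketch, assuming that every defined class has the simple form "genus ⊓ ∃role.filler" and that primitive classes have at most one asserted parent. The names `Definition`, `is_subclass` and `inferred_parents` are hypothetical helpers introduced only for this illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Asserted primitive axiom from Section 3.3: Blood is a subclass of BodyLiquid.
# Assumption for this sketch: each primitive class has at most one parent.
SUBCLASS_OF = {"Blood": "BodyLiquid"}


@dataclass(frozen=True)
class Definition:
    """A defined class of the form: genus AND (role SOME filler)."""
    genus: str
    role: str
    filler: str


# The two defined classes from Section 3.3.
DEFINITIONS = {
    "BodyLiquidSample": Definition("Sample", "derivesFrom", "BodyLiquid"),
    "BloodSample": Definition("Sample", "derivesFrom", "Blood"),
}


def is_subclass(sub: str, sup: str) -> bool:
    """Reflexive, transitive walk up the asserted SubClassOf links."""
    current: str | None = sub
    while current is not None:
        if current == sup:
            return True
        current = SUBCLASS_OF.get(current)
    return False


def inferred_parents(name: str) -> list[str]:
    """Defined classes that subsume `name` under the structural rule:
    same role, and both genus and filler are subsumed pairwise."""
    d = DEFINITIONS[name]
    return sorted(
        other
        for other, o in DEFINITIONS.items()
        if other != name
        and d.role == o.role
        and is_subclass(d.genus, o.genus)
        and is_subclass(d.filler, o.filler)
    )


if __name__ == "__main__":
    print(inferred_parents("BloodSample"))
    print(inferred_parents("BodyLiquidSample"))
```

Under these assumptions the sketch places BloodSample below BodyLiquidSample and finds no inferred parent for BodyLiquidSample, which mirrors the inference reported in Section 3.3. Disjointness, universal restrictions and nested expressions are deliberately out of scope here.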

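
Supplementary illustration for Section 3.5. The shape of the ACE rendering quoted there can be imitated with a small template-based verbalizer. This is not the ACE View algorithm; it is a minimal Python sketch, assuming a class expression is either a class name or a triple (class name, property, filler expression), with namespace prefixes already stripped. The function names `article`, `noun_phrase`, `with_article` and `verbalize_equivalence` are hypothetical.

```python
from __future__ import annotations

# A class expression is a class name, or (class_name, property, filler_expression),
# read as: class_name AND (property SOME filler_expression).
Expr = "str | tuple"


def article(noun: str) -> str:
    """Pick 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"


def noun_phrase(expr) -> str:
    """Render an expression without a leading article."""
    if isinstance(expr, str):
        return expr
    cls, prop, filler = expr
    return f"{cls} that {prop} {with_article(filler)}"


def with_article(expr) -> str:
    """Render an expression with a leading indefinite article."""
    head = expr if isinstance(expr, str) else expr[0]
    return f"{article(head)} {noun_phrase(expr)}"


def verbalize_equivalence(name: str, expr) -> str:
    """An EquivalentTo axiom becomes two 'Every ...' sentences, one per direction."""
    return (
        f"Every {name} is {with_article(expr)}. "
        f"Every {noun_phrase(expr)} is {article(name)} {name}."
    )


# The InfectiousDisease definition from Section 3.5.
INFECTIOUS_DISEASE = (
    "AcquiredPathologicalState",
    "hasAgent",
    ("Organism", "bearerOf", "InfectorRole"),
)

if __name__ == "__main__":
    print(verbalize_equivalence("InfectiousDisease", INFECTIOUS_DISEASE))
```

For the InfectiousDisease definition this reproduces the two sentences quoted in Section 3.5, including the awkward "that hasAgent" and "that bearerOf" phrasing criticised in the Discussion. Producing the more intuitive wording would require a lexicon that maps property names to verb phrases, which this sketch does not attempt.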
