=Paper=
{{Paper
|id=None
|storemode=property
|title=Developing DCO: The DebugIT core ontology for antibiotics resistence modelling
|pdfUrl=https://ceur-ws.org/Vol-714/ShortPaper08_Schober.pdf
|volume=Vol-714
|dblpUrl=https://dblp.org/rec/conf/smbm/SchoberBST10
}}
==Developing DCO: The DebugIT core ontology for antibiotics resistence modelling==
Daniel Schober (schober@imbi.uni-freiburg.de), Martin Boeker (beakmachine@gmail.com), Ilinca Tudose (ilinca.tudose@gmail.com), Stefan Schulz (stschulz@imbi.uni-freiburg.de)
Institute for Medical Biometry and Medical Informatics, Freiburg University Medical Center, Germany
Abstract

Antibiotics resistance development in European hospitals has increased alarmingly in recent years. To counteract this danger, a Semantic Web based IT solution is proposed which intends to integrate the access to relevant clinical data repositories from different European hospitals. This endeavor relies on formalized and shared models of the clinical domain. We describe the development process of the DebugIT Core Ontology, which is a key mediator for semantic as well as syntactic clinical data integration in this endeavor. We show how UML diagrams can be used to illustrate the ontology engineering phase and the ontology's use case. Some domain statements are given, which are then converted into more human-friendly representations to be verified by medical experts.

1 Introduction

Antibiotics resistance development poses a significant problem in today's hospital care. Massive amounts of clinical data relevant for this domain are being collected and stored in proprietary, unconnected systems in heterogeneous formats, preventing re-use and exploitation of potentially valuable data. The DebugIT project (Detecting and Eliminating Bacteria UsinG Information Technology, http://www.debugit.eu/), a large scale EU funded data integration project, intends to analyze antibiotics prescription practices and their outcomes across Europe. It intends to exploit this knowledge to detect patient safety related patterns in distributed hospital data, i.e. to discover indicators for better treatments and ultimately antibiotics resistance prevention [1].

The challenge here is to establish a coherent and systematic exchange of rich data, harmonised across the different DebugIT sites and their Clinical Data Repositories (CDR), including information about patients, their illness situations, pathogens and antibiotics therapies. The semantic glue for integrating such data is the DebugIT Core Ontology (DCO), an application ontology that enables data miners to query distributed CDRs in a semantically rich and content driven manner.

Here we outline basic DCO engineering methods, illustrate some example statements expressed in DCO and show how these are exploited by logics reasoners and visualization tools providing views readily understandable by biologists not acquainted with logics formalisms.

2 Methods

DCO is developed in the OWL 2 RL flavor of description logics¹, using the Protégé 4.1 ontology editor [2]. The dco.owl file builds on the domain upper level ontology BioTop [3] by direct import (owl:imports). Detailed design principle documentation is available on the supplementary material website: http://www.imbi.uni-freiburg.de/~schober/DCO/

¹ http://www.w3.org/TR/owl2-profiles/#OWL_2_RL

2.1 Input sources for DCO enrichment

The main input sources for ontology population in the kick-off phase have been the hospitals' CDR schemata, ensuring a data-driven bottom-up enrichment approach. Further sources were competency questions (CQ) and, later, specific term requests stated by collaborators in a web forum.

2.2 Competency Questions

To be able to verify whether DCO is sufficiently complete to represent our use case, we have gathered competency questions [4] from clinicians (see Supplementary material). The ontology needs to contain a necessary and sufficient set of axioms to represent these questions, which will serve as a benchmark for DCO content coverage evaluation.

2.3 DCO maintenance and evolution

DCO is maintained in a Subversion (SVN) repository², which allows work progress to be followed through the log files and provides file revision history tracking. Progress monitoring between the ontology work package (WP1a) and the other involved work packages is realized via weekly teleconferences along the SCRUM³ project management methodology. To access the ontology conveniently in a web browser, we have set up an HTML serialisation⁴.

To illustrate the DCO ontology engineering process in detail, we here present a UML activity diagram illustrating ontology engineering activities upon acceptance of a new competency question (Fig. 1). Additional graphics illustrating the DCO engineering method in the kick-off phase can be found in the supplementary material.

² http://www.greeninghealthcare.org/repository/debugit/trunk
³ http://www.scrum.org/scrumguides/
⁴ http://www.imbi.uni-freiburg.de/~schober/dco_owlDoc/

Figure 1: UML activity diagram illustrating DCO engineering upon receiving a new clinical question

Figure 2: UML Use Case diagram illustrating Ontology usage in different formalisation steps of an example competency question. For SPARQL code examples we refer to the supplementary material.

2.4 Information integration via SPARQL

The gap between the different hospitals' CDRs is bridged by linking RDF models of the various local CDRs to DCO concepts in a mapping SPARQL query. In the query process two kinds of ontologies are applied: DCO is used for formulating a hospital independent clinical SPARQL query. In another query formalization step DCO is then mapped to the local CDR via an RDF converted database schema⁵ called data definition ontology (DDO), acting as a query mediator to the proprietary hospital data. This approach is outlined in more detail in [5].

⁵ E.g. a DDO with a PREFIX inserm: http://debugit1.spim.jussieu.fr/resource/vocab/ as in example query
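The two-step query formalization of Section 2.4 can be made concrete with a minimal Python sketch. It is not the DebugIT implementation: plain term rewriting stands in here for the mapping SPARQL queries used in the project. The inserm: prefix IRI is the one quoted in the paper's footnote on the DDO; the dco: prefix, all class and property names, and the localize_query helper are invented for illustration.

```python
import re

# Hypothetical DCO -> DDO term mapping for one site. All term names are
# invented; only the inserm: prefix IRI is taken from the paper.
DCO_TO_DDO = {
    "dco:Patient": "inserm:patient",
    "dco:BacterialCulture": "inserm:culture",
    "dco:hasPathogen": "inserm:germ",
}

PREFIXES = {
    "inserm": "http://debugit1.spim.jussieu.fr/resource/vocab/",
}


def localize_query(dco_query: str, mapping: dict) -> str:
    """Rewrite each dco: term of a hospital-independent query into the
    site's DDO term. Unmapped terms raise KeyError instead of passing
    through silently."""
    def swap(match):
        term = match.group(0)
        if term not in mapping:
            raise KeyError(f"no DDO mapping for {term}")
        return mapping[term]

    body = re.sub(r"\bdco:[A-Za-z_]\w*", swap, dco_query)
    header = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in PREFIXES.items())
    return header + "\n" + body


# Hospital-independent clinical query, written against DCO terms only.
CLINICAL_QUERY = """SELECT ?culture ?pathogen WHERE {
  ?culture a dco:BacterialCulture ;
           dco:hasPathogen ?pathogen .
}"""

local_query = localize_query(CLINICAL_QUERY, DCO_TO_DDO)
print(local_query)
```

The clinical query never mentions a site-specific term, so the same query text can be localized for every hospital by supplying a different mapping table.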
Within the DebugIT interoperability platform a clinical query is successively formalized from natural language over semi-formal intermediate query steps towards a formal, site dependent local data set query. During this process it is passed from the clinician to a data miner and further on to the different local data managers. To illustrate how DCO concepts are used within these different query formalisation steps we here include a UML use case diagram (Fig. 2).

3 Results

3.1 DCO current metrics

The current description logic expressivity is SRIF(D). We are using the HermiT DL reasoner⁶, which takes ~2 minutes to classify DCO including BioTop on an average PC. Table 1 gives the statistics of DCO and BioTop.

Ontology elements and axioms     Count (all)   DCO    BioTop
Classes                          1281          965    375
Object Properties (relations)    78            3      74
Datatype Properties              11            10     0
Subclass Axioms                  1494          1050   444
Equivalent Class Axioms          197           98     99
Disjoint Axioms                  76            1      75

Table 1: Content and size of DCO and its BioTop upper level ontology

⁶ http://hermit-reasoner.com/

3.2 An ontological model of infectious diseases

A knowledge domain of great importance for the DebugIT project, but also for the wider healthcare domain, is a granular and expressive disease model, i.e. one distinguishing pathological processes and agents from pathological structures and dispositions (Fig. 3).

Figure 3: An OntoGraf view on the tripartite ontological DCO disease model. A disposition is realized by a disease (Process), which is manifested as a disorder (PathologicalStructure).

We started modeling a prototypical infectious disease, Pneumonia, together with some of its key aspects in a simple pre-coordinated way:

Pneumonia ≡ Inflammation ⊓ ∃has-participant.LungTissue
AcutePneumonia ≡ Pneumonia ⊓ ∃bearer-of.AcuteQuality
BacterialPneumonia ≡ Pneumonia ⊓ ∃has-agent.BacteriaPopulation
ViralPneumonia ≡ Pneumonia ⊓ ∃has-agent.VirusPopulation

The above however misses some aspects, e.g. it allows for a Pneumonia in the kidney, because LungTissue and KidneyTissue have not been made disjoint. We also cannot specify whether an AcutePneumonia can have, besides Acute, further qualities, e.g. Chronic, or whether the has-agent in BacterialPneumonia can also be filled by, e.g., VirusPopulation. We therefore needed to provide SubclassOf definitions for inferring e.g.

BacterialPneumonia ⊑ BacterialInflammation

Mereotopological axioms were needed in order to infer from

Pneumonia ≡ Inflammation ⊓ ∃has-participant.LungTissue

and

LungTissue ⊑ ∃part-of.Lung

that

Pneumonia ⊑ ∃has-locus.Lung

Disjoints like Process ⊑ ¬Structure were added to be able to infer that

PathologicalProcess ⊑ ¬PathologicalStructure
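The locus inference of Section 3.2 can be traced with a small Python sketch. It is a toy forward-chaining procedure, not the HermiT reasoner, and it covers only existential restrictions. The paper does not spell out its mereotopological axioms, so the role chain has-participant ∘ part-of ⊑ has-locus used below is an assumption made for illustration, as is the saturate helper.

```python
from itertools import product

# An axiom "C ⊑ ∃p.D" is stored as the triple (C, p, D).
EXISTENTIALS = {
    # Follows from Pneumonia ≡ Inflammation ⊓ ∃has-participant.LungTissue
    ("Pneumonia", "has-participant", "LungTissue"),
    ("LungTissue", "part-of", "Lung"),
}

# Assumed role chain standing in for the mereotopological axioms:
# has-participant ∘ part-of ⊑ has-locus
ROLE_CHAINS = {("has-participant", "part-of"): "has-locus"}


def saturate(existentials, role_chains):
    """Apply the rule
         C ⊑ ∃p.D,  D ⊑ ∃q.E,  p ∘ q ⊑ r   =>   C ⊑ ∃r.E
    until no new triple can be derived."""
    derived = set(existentials)
    while True:
        new = set()
        for (c, p, d), (d2, q, e) in product(derived, repeat=2):
            if d == d2 and (p, q) in role_chains:
                new.add((c, role_chains[(p, q)], e))
        if new <= derived:
            return derived
        derived |= new


closure = saturate(EXISTENTIALS, ROLE_CHAINS)
print(("Pneumonia", "has-locus", "Lung") in closure)
```

Without the role chain the two asserted axioms stay unconnected, which is why the definition of Pneumonia alone does not yield its locus.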
We amended DCO successively, providing restrictions for post-coordinations, e.g. constraining a user to enter new Pneumonias only in loci where lung tissue exists:

Pneumonia ⊑ ∀has-locus.(∃locus-of.LungTissue)

By this and exploiting the following restrictions

LungTissue ⊑ ∃has-locus.Lung
Lung ⊑ ∃has-locus.Thorax
∃has-locus.Thorax ⊑ ¬∃has-locus.(Abdomen ⊔ Extremity)

an ontology-based annotation interface can now guide a user towards the localisations that are possible for a certain infectious disease [6].

3.3 Maintaining multiple-parenthood by a logics reasoner

From an engineering standpoint, we apply the normalization approach of [7] and use single-asserted parenthood throughout the taxonomy. This facilitates orientation in the taxonomy and its maintenance. The description logics reasoner HermiT infers multiple parenthood from the formal restrictions. E.g. it enriches the Sample class hierarchy by auto-classifying BodyLiquidSamples: from the given facts

Blood ⊑ BodyLiquid
BodyLiquidSample ≡ Sample ⊓ ∃derivesFrom.BodyLiquid
BloodSample ≡ Sample ⊓ ∃derivesFrom.Blood

the reasoner infers that

BloodSample ⊑ BodyLiquidSample

enriching the taxonomy. This is also done for all other liquid samples.

3.4 Graphical visualisations

To allow biomedical experts not acquainted with ontology editors to view, understand and check parts of the ontology, we apply ontology visualizations as generated by the OwlPropViz⁷ and OntoGraf⁸ Protégé plugins (Fig. 3), which enable visual, parallel and hence faster perception of the term networks.

⁷ http://protegewiki.stanford.edu/index.php/OWLPropViz
⁸ http://protegewiki.stanford.edu/wiki/OntoGraf

3.5 Constrained natural languages

To allow biomedical experts not acquainted with description logics to view, understand and check parts of the ontology, we investigate highly end-user compliant ways to present ontological statements via Constrained Natural Languages (CNL) like Attempto Controlled English (ACE)⁹. This creates natural language text that can be used to present ontology fragments to domain experts and makes end-user verification of complex DL statements possible for the non-ontology expert. E.g. it renders the Manchester OWL Syntax

InfectiousDisease
  EquivalentTo
    biotop:AcquiredPathologicalState
    and (biotop:hasAgent some
      (biotop:Organism
       and (biotop:bearerOf some InfectorRole)))

into the following ACE sentence:

"Every InfectiousDisease is an AcquiredPathologicalState that hasAgent an Organism that bearerOf an InfectorRole. Every AcquiredPathologicalState that hasAgent an Organism that bearerOf an InfectorRole is an InfectiousDisease."

⁹ http://attempto.ifi.uzh.ch/aceview/

4 Discussion

We have reported on the development of a clinical ontology for data integration and annotation. Although a certain level of semantic integration has been reached, many steps must be performed manually and hence are error-prone as well as time and resource intensive.

Regarding the issue of to what extent the ontology should contain pre-coordinated expressions, creating restrictions for guiding users in post-coordinative class generation enforces a transition from OWL 2 EL towards RL expressivity, because disjoints and universal restriction constructors need to be applied. Reasoners used to prevent redundant post-coordinations need to be fast, which is still rarely the case. Traditional large scale RL ontology reasoning is slow and might not be feasible for post-coordination at annotation time, when a large set of constraints needs to be verified in a timely manner. Here, fast local, incremental reasoning methods need to be investigated.

Some ontologically difficult areas were the modeling of time, e.g. introducing TimeQuality classes versus using simple xsd:dateTime datatype properties, and how to model intervals such as episode of care or patient stay. Process modifications like adapted or merely planned therapies also depend on a rigid time model. We tried to find a pragmatic compromise between needed complexity and ease of use of time related expressions. Time constructs exploitable by a reasoner were only included when not making expressions overly difficult to read and create for a human user.
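As a rough illustration of the simpler of the two options named above (plain xsd:dateTime values rather than TimeQuality classes), the following Python sketch models a patient stay as an interval between two time stamps. It is an assumption-laden sketch and not part of DCO: the Interval class, the parse_xsd_datetime helper and all dates are invented, and only the plain lexical form without time zone is handled.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Interval:
    """Closed time interval, e.g. a patient stay or an episode of care."""
    start: datetime
    end: datetime

    def __post_init__(self):
        if self.end < self.start:
            raise ValueError("interval ends before it starts")

    def contains(self, instant: datetime) -> bool:
        return self.start <= instant <= self.end

    def overlaps(self, other: "Interval") -> bool:
        return self.start <= other.end and other.start <= self.end


def parse_xsd_datetime(literal: str) -> datetime:
    # Covers only the 'YYYY-MM-DDThh:mm:ss' form, without a time zone.
    return datetime.fromisoformat(literal)


# Invented example values.
stay = Interval(
    parse_xsd_datetime("2010-03-01T08:00:00"),
    parse_xsd_datetime("2010-03-09T12:00:00"),
)
therapy_start = parse_xsd_datetime("2010-03-02T10:30:00")
print(stay.contains(therapy_start))
```

Such comparisons are easy to read and write, but they happen outside the ontology, so a DL reasoner cannot use them for classification. That is the trade-off between ease of use and reasoner-exploitable time constructs described above.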
Regarding ontology evaluation, CNLs are not yet at a stage where they can contribute to a better understanding of more complex and especially nested DL expressions. Some expressions, annoyingly the more interesting 'hub-node' ones, could not be transcribed. Moreover, to be intuitive, the above example should have generated the text

"Every InfectiousDisease is an AcquiredPathologicalState that has as an agent an Organism that is the bearerOf an InfectorRole"

Further, it needs to be investigated how large ontologies can be sub-structured into small digestible parts or modules that can be managed by domain specialists in a timely manner.

5 Conclusion

We believe we have created a robust and scalable disease model that can serve the wider biomedical domain. Hence, a next step will be the submission of the above disease definitions as a content ontology design pattern, e.g. to the OntologyDesignPattern.org repository [8]. Further such micro-models will follow, e.g. for modeling drugs and their prescriptions.

Whereas earlier attempts at integrating CDRs via purely syntactical means fail to exploit computer interpretable formal semantics [9], projects begin to appear that show the usefulness and even feasibility of applying OWL DL semantics in healthcare data integration settings. LinkedLifeData¹⁰, a platform for semantic data integration through RDF warehousing, demonstrates how efficient reasoning can help to resolve conflicts within the data. However, such a goal, and this is also an important lesson learned in the DebugIT endeavor, can only be achieved if particular care is taken over reasoning performance. Logics-based reasoning will only be feasible in realistically large ontologies when computationally expensive OWL RL constructs are applied consciously. Ultimately, the fast-paced progress in Semantic Web technologies leads to frequent changes in even the most basic tools, such as APIs, reasoners and SPARQL endpoint software. Due to these inherent dynamics one should constantly check where one can restrict oneself to a more robust subset of cutting-edge techniques.

¹⁰ http://linkedlifedata.com/

References

[1] Lovis C et al. DebugIT for patient safety - improving the treatment with antibiotics through multimedia data mining of heterogeneous clinical data. Stud Health Technol Inform. 136 (2008), 641-6.
[2] Noy NF, Crubezy M, Fergerson RW, Knublauch H, Tu SW, Vendetti J, Musen MA: Protege-2000: An Open-source Ontology-development and Knowledge-acquisition Environment. Proc AMIA Symp 2003:953.
[3] Elena Beisswanger, Stefan Schulz, Holger Stenzhorn, and Udo Hahn. BioTop: An upper domain ontology for the life sciences – a description of its current structure, contents, and interfaces to OBO ontologies. Applied Ontology, 3(4):202–212, 2008.
[4] Grueninger, M and Fox, M (1994). The role of competency questions in enterprise engineering. In IFIP WG 5.7, Workshop Benchmarking. Theory and Practice, Trondheim/Norway.
[5] Daniel Schober, Martin Boeker, Jessica Bullenkamp, Csaba Huszka, Kristof Depraetere, Douglas Teodoro, Nadia Nadah, Remy Choquet, Christel Daniel, Stefan Schulz, The DebugIT Core Ontology: semantic integration of antibiotics resistance patterns, Proceedings MEDINFO 2010.
[6] Stefan Schulz, Daniel Schober, Djamila Raufie, Martin Boeker, Pre- and Postcoordination in Biomedical Ontologies, OBML 2010 Workshop Proceedings, IMISE-Report Nr 2/2010, ISSN 1610-7233, Universität Leipzig, 2010.
[7] Rector AL, Rogers JE, Zanstra PE, Van Der Haring E: OpenGALEN: open source medical terminology and tools. AMIA Annu Symp Proc 2003:982.
[8] V. Presutti and A. Gangemi. Content ontology design patterns as practical building blocks for web ontologies. In Proceedings of ER2008. Barcelona, Spain, 2008.
[9] Brailer, D. (2005). Interoperability: The Key to the Future Health Care System, Health Affairs (The Policy J. of the Health Sphere), Vol. 10 (January), 19-21.