<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluating Modelling Approaches for Medical Image Annotations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jasmin Opitz</string-name>
          <email>opitzj@cs.manchester.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bijan Parsia</string-name>
          <email>bparsia@cs.manchester.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ulrike Sattler</string-name>
          <email>sattler@cs.manchester.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The University of Manchester</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Information system designers face many challenges w.r.t. selecting appropriate semantic technologies and deciding on a modelling approach for their system. However, there is no clear methodology yet to evaluate “semantically enriched” information systems. In this paper we present a case study on different modelling approaches for annotating medical images and introduce a conceptual framework that can be used to analyse the fitness of information systems and to help designers spot the strengths and weaknesses of various modelling approaches and manage trade-offs between modelling effort and potential benefits.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <label>1</label>
      <title>Introduction</title>
      <p>Information systems come in many shapes and variants, and designing
such systems involves making important modelling decisions to optimise
information retrieval. The dimensions to be taken into account are performance,
maintainability and usability of the system. Some research has been carried out
to evaluate information systems w.r.t. these dimensions [3, 4, 6]; however, none
of these approaches analyses the quality of such a system w.r.t. its queriability,
i.e. how easy or comfortable it is for a user to formulate queries. Since
information retrieval is the main purpose of an information system, we believe that
queriability, and therefore the difficulty of accessing the information, is a crucial
point.</p>
      <p>Recently, there has been a lot of discussion about “semantically enriched”
information systems, especially about using ontologies for modelling data.
Ontologies have the potential to model information in a way that
captures the “meaning” of the content by using expressive knowledge
representation formalisms (such as Description Logics [1]) and can therefore achieve good
information retrieval results. However, ontology-based information systems can
come in many different forms. It has to be decided how expressive the
conceptual schema is, whether it is possible to use an off-the-shelf schema or one
that is tailored to the application. Furthermore, it is crucial that the schema
and the data are well-suited in order to enable good queriability and retrieval
performance. Depending on the design decisions, more or less modelling effort
is involved. Information system designers need a methodology that helps them
to understand the benefits and trade-offs of various approaches and to make an
informed decision about the “optimal” modelling approach for their needs.</p>
      <p>In this paper we present a case study on medical image annotations. Medical
images are usually stored in large databases or file systems along with various
information about the images, such as radiologists' reports (usually formulated
in natural language). Although it is possible to retrieve images based on these
descriptions with full-text search and keyword queries, the results tend to
have low recall and precision [8]. There is a wide range of alternative modelling
approaches. The relevant terms in the natural language descriptions could be
mapped to an underlying schema, such as a thesaurus or to an ontology. The
more expressive the schema, the more “meaning” can be potentially modelled
in the image annotations. This might result in better retrieval performance but
also in more modelling effort while creating the annotations, e.g. depending on
whether or not the information extraction can be done automatically or has to
be done manually.</p>
      <p>There is a trade-off between how much effort is involved in the design of the
system and the creation of meaningful image annotations and how this leads
to better queriability and better quality of the retrieval results. In order to
understand this trade-off and to analyse and compare different approaches for
modelling medical image annotations we will use a conceptual framework [9] that
has been designed for the evaluation of information systems with a particular
focus on queriability. We will outline five modelling approaches for medical image
annotations that range from a simple text-based information system to
full-fledged ontology-based information systems based on an established medical
ontology, namely SNOMED CT.<sup>1</sup> Applying the framework to these modelling
approaches will highlight their strengths and weaknesses and the framework’s
measurements will allow us to compare the modelling approaches with each
other. This case study is one of several studies that we are carrying out to
evaluate and refine the framework.</p>
      <p>The remainder of the paper is organised as follows. In Section 2 we outline
several possible modelling approaches for medical image annotations. Section
3 introduces the Evaluation Framework and how it can be applied to measure
the fitness of a modelling approach. In Section 4 we apply the framework to the
particular modelling approaches that we outlined for medical image annotations.
We measure and compare their fitness and queriability and discuss the weak and
strong points of each modelling approach. Section 5 concludes the paper.</p>
      <p><sup>1</sup> SNOMED CT: http://www.ihtsdo.org/snomed-ct/</p>
      <p><bold>2 Modelling Approaches for a Medical Image Information System</bold></p>
      <p>
An information system for medical image annotations can be designed in many
different ways. In the case study described in this paper we distinguish between
three general categories of modelling approaches: a simple text-based
information system, a slightly more advanced thesaurus-based information system and an
ontology-based information system. For the latter we distinguish between three
different ways of designing an ontology-based information system, differing in
the expressivity of the underlying schema and the annotations.</p>
      <p>All five modelling approaches are based on the same data corpus. We
used 42 publicly available chest radiology images and their natural language
(English) radiology reports from the web-based radiology database EURORAD.<sup>2</sup>
The reports contain information about image type and modality, findings, body
parts, diagnoses, etc. An early experiment on image retrieval with this data set
has been published in [8]. The original data, i.e. the natural language reports,
are processed differently for each of the five modelling approaches.</p>
      <p>The various approaches described below are representative. Obviously, there
are other alternatives or mixtures between those mentioned. Some of them could
be partly improved with some customising. The selection in this paper represents
distinct groups of modelling approaches that we chose deliberately to highlight
their strengths and weaknesses.
</p>
      <p><bold>2.1 Text-based Modelling Approach</bold></p>
      <p>
In a text-based modelling approach (MA1) for our medical image information
system the data collection consists merely of the unprocessed natural language
descriptions. The queries are conjunctions of natural language keywords and
the query results are obtained by carrying out a full-text search over the image
descriptions.</p>
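      <p>As a rough illustration of such conjunctive keyword queries, the following Python sketch runs a full-text search over three invented report snippets. The snippets, the tokenisation rule (hyphens removed, case ignored) and the names REPORTS, tokens and keyword_search are assumptions made for this example only; they are not the EURORAD data or the search used in the study.</p>
      <preformat>
```python
# Hypothetical report snippets, invented for illustration only.
REPORTS = {
    "img1": "Chest X-ray, lateral projection. Neoplasm in the left lung.",
    "img2": "CT scan showing a tumour in the right lung.",
    "img3": "X-ray, postero-anterior projection. "
            "Thickening of the lateral wall of the trachea.",
}


def tokens(text):
    """Lower-case the text, drop hyphens and split it into word tokens."""
    text = text.lower().replace("-", "")
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def keyword_search(reports, keywords):
    """Return the ids of all reports that contain every keyword (a conjunction)."""
    wanted = {k.lower() for k in keywords}
    return sorted(image_id for image_id, text in reports.items()
                  if wanted.issubset(tokens(text)))


# The synonym "tumour" in img2 is not matched by the keyword "neoplasm".
print(keyword_search(REPORTS, ["neoplasm"]))
# img3 matches although "lateral" refers to the trachea, not the projection.
print(keyword_search(REPORTS, ["xray", "lateral"]))
```
</preformat>
      <p>In this toy corpus the first query misses a report that uses a synonym, and the second returns a report in which a keyword occurs in an unrelated context.</p>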
      <p>This method involves very little modelling effort and simple querying (like
in an internet search engine). However, the approach is not very powerful:
neither the queries nor the underlying data can identify synonyms, homonyms,
acronyms, spelling mistakes or capture taxonomical or relational information.
Therefore, a text-based approach is not suitable for capturing the full semantics
of the image descriptions and most likely leads to low recall and precision of the
retrieval results.
</p>
      <p><sup>2</sup> EURORAD: http://eurorad.org/</p>
      <p><bold>2.2 Thesaurus and Text Mining based Modelling Approach</bold></p>
      <p>
For this modelling approach (MA2) we processed the natural language
descriptions with publicly available, off-the-shelf text mining tools. We used GENIA
tagger<sup>3</sup> to extract noun phrases, verb phrases and adjective phrases from the
textual descriptions and processed these with MetaMap<sup>4</sup> in order to map them
to SNOMED CT classes via UMLS.<sup>5</sup> For each image an annotation file was
created containing a list of SNOMED CT classes that reflect the relevant terms
extracted from the text. Although SNOMED CT is strictly speaking an ontology
that can be expressed in OWL and conforms to the OWL 2 profile OWL EL
[7], we merely use it in the sense of a thesaurus for this modelling approach, i.e.
each image is “tagged” with a list of SNOMED CT classes and we only make
use of the synonyms and the taxonomical information that are contained in the
ontology and do not capture any relational information.</p>
      <p>Again, the queries are expressed in natural language and are processed in
the same way and with the same tools as the image descriptions. Each query is
transformed to a list of SNOMED CT classes and matched against the image
annotations.</p>
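      <p>The matching step can be pictured with the following Python sketch. It assumes a hand-written miniature thesaurus in place of SNOMED CT and a simple lookup table in place of the GENIA tagger and MetaMap pipeline; all class names, synonyms, annotations and function names here are hypothetical and serve only to show how synonyms and taxonomical information enter the matching.</p>
      <preformat>
```python
# Hypothetical miniature thesaurus; not taken from SNOMED CT.
SYNONYMS = {"neoplasm": "Neoplasm", "tumour": "Neoplasm",
            "carcinoma": "Carcinoma", "xray": "Xray"}
PARENT = {"Carcinoma": "Neoplasm"}  # class name mapped to its direct superclass

# Hypothetical annotation lists, one set of classes per image.
ANNOTATIONS = {
    "img1": {"Xray", "Neoplasm"},
    "img2": {"Carcinoma"},
    "img3": {"Xray"},
}


def ancestors_and_self(cls):
    """Return cls together with all of its superclasses."""
    result = {cls}
    while cls in PARENT:
        cls = PARENT[cls]
        result.add(cls)
    return result


def map_keywords(keywords):
    """Map keywords to classes; keywords without a mapping are dropped."""
    return {SYNONYMS[k.lower()] for k in keywords if k.lower() in SYNONYMS}


def thesaurus_search(annotations, keywords):
    """Return images tagged with every query class or one of its subclasses."""
    query_classes = map_keywords(keywords)
    if not query_classes:
        return []
    hits = []
    for image_id, classes in annotations.items():
        covered = set()
        for cls in classes:
            covered |= ancestors_and_self(cls)
        if query_classes.issubset(covered):
            hits.append(image_id)
    return sorted(hits)


print(thesaurus_search(ANNOTATIONS, ["tumour"]))
print(thesaurus_search(ANNOTATIONS, ["xray", "neoplasm"]))
```
</preformat>
      <p>Here the keyword “tumour” also retrieves the image tagged with the subclass Carcinoma, while a keyword that the lookup table does not know is silently lost, mirroring the mapping errors discussed for this approach.</p>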
      <p>The advantages of a thesaurus-based modelling approach compared to a
purely text-based approach are that recall and precision can be increased due to
the fact that a thesaurus can recognise synonyms to a certain extent and contains
taxonomical information so that image annotations that contain subclasses of
the query terms are retrieved as well [5]. The additional modelling effort required
to achieve these benefits is limited to incorporating an off-the-shelf thesaurus and
processing the textual descriptions automatically with off-the-shelf text mining
tools. On the other hand, this approach does not capture relational information
and the tools do not map all the available information perfectly, i.e. the
translation process is error-prone (e.g. mapping to the wrong class or no mapping at all,
inability to recognise acronyms).
</p>
    </sec>
    <sec id="sec-2">
      <label>2.3</label>
      <title>Ontology-based Modelling Approaches</title>
      <p>All three ontology-based modelling approaches we discuss in this paper
(MA3, MA4 and MA5) involve manually translating the textual image descriptions
to Abox assertions for different schemas (SNOMED CT Tboxes). The queries
are formulated as OWL class expressions.</p>
      <p>We distinguish between three variants of ontology-based annotations:
– MA3: image annotation Abox (A1) with class assertions to a SNOMED CT Tbox (T1)
– MA4: image annotation Abox (A2) with class and role assertions to a SNOMED CT Tbox (T1)
– MA5: image annotation Abox (A3) with class and role assertions to a SNOMED CT Tbox (T2) that contains some additional “image-annotation-specific” roles</p>
      <p><sup>3</sup> GENIA tagger: http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/</p>
      <p><sup>4</sup> MetaMap: http://metamap.nlm.nih.gov/</p>
      <p><sup>5</sup> UMLS: http://www.nlm.nih.gov/research/umls/</p>
      <p>MA5 uses a slightly different Tbox than MA3 and MA4 in the sense that we
created an additional set of roles and a role hierarchy in order to bypass the
SNOMED CT specific role groups. An example of a disease in SNOMED CT
that is defined using role groups is NeoplasmOfLung. The concept is defined as
follows:<sup>6</sup></p>
      <p>NeoplasmOfLung ≡ DisorderOfLung ⊓ ∃roleGroup.(∃AssociatedMorphology.Neoplasm ⊓ ∃FindingSite.LungStructure)</p>
      <p>For MA5, we introduced three additional roles: shows, hasFinding and
hasLocation and defined the following role hierarchy:
</p>
      <p>roleGroup ◦ AssociatedMorphology ⊑ hasFinding</p>
      <p>roleGroup ◦ FindingSite ⊑ hasLocation</p>
      <p>shows ◦ hasFinding ⊑ shows</p>
      <p>shows ◦ hasLocation ⊑ shows</p>
      <p>If we want to find all images that show neoplasms in MA5, we can formulate
a simple OWL class expression query like Image ⊓ ∃shows.Neoplasm and would
retrieve images labelled with Image ⊓ ∃roleGroup.∃AssociatedMorphology.NeoplasmOfLung
without having to use the complicated role group construct in the query.</p>
      <p>Furthermore, we introduced the role derivingFrom in order to indicate a
causal relationship between two findings, e.g. a metastasis and a primary
tumour.</p>
      <p>Similar to the thesaurus-based modelling approach (see Section 2.2) we use
an off-the-shelf schema. By using an underlying ontology we can take advantage
of all the benefits that a thesaurus-based modelling approach involves (e.g.
synonyms and taxonomical relations between classes). MA4 and MA5 additionally
capture relational information and are therefore better suited to capture the
actual meaning of the image descriptions and increase recall and precision of the
results compared to the other modelling approaches.</p>
      <p><sup>6</sup> To improve readability, we use slightly abbreviated class names and DL syntax.</p>
      <p>On the other hand, all of these modelling approaches involve a high modelling
effort due to the manual translation of natural language to ontology-based
annotations. This requires both domain knowledge and knowledge of OWL.
Furthermore, the potential benefits of these approaches can only
be realised if the schema, data and queries are well suited to each other.
</p>
      <p><bold>3 Information System Evaluation Framework</bold></p>
      <p>
We start by formalising the relevant components of a (semantically enriched)
information system for which we are then going to evaluate and compare the
different modelling approaches. We will use the term “modelling approach” to
describe the whole system consisting of data, schema, (an abstraction of) queries,
and a query language. A more detailed description of this framework can be
found in [9].
</p>
      <p><bold>3.1 A Modelling Approach</bold></p>
      <p>
A modelling approach MA = (S, D, R, QL) consists of
– a schema S: a finite description of the semantics of the data, e.g. a database
schema, a logic program, or the Tbox of an ontology, which can be empty.
– the data D: e.g. tables and rows in a relational database, ground facts, or
ontology Abox assertions.
– a set of information requests R: each r ∈ R represents the answer to a
query of D, and is given as a set (of tuples) over D. Ideally, R should be
representative for the queries to be answered by the information system to
be built.
– a query language QL: e.g. SQL, (union of) conjunctive queries, OWL class
expressions.</p>
      <p>An information request asks for tuples of the given data that are relevant for
the user. The request needs to be distinguished from the actual query, which is a
specific manifestation of the information request formulated in QL. An
information request r can correspond to 0, 1 or more queries in a given query language.
The first case arises if there are no queries in QL whose answers would be
exactly the tuples in r when asked over S and D, i.e., if QL is unable to express
the information request over the given schema and data. In the case that there
are one or more queries, some of them might be more easily expressible than
others.</p>
      <p>The only assumption we make is that the query language QL comes with a
semantics that identifies, for a given query q of arity n in QL, data D, and schema
S, the set of certain answers [2]. More precisely, we assume the existence of an
entailment relation |=, and use Ind(D) for the set of individuals or constants in
D to define cert(·) as follows:</p>
      <p>cert(q, S, D) = {w ∈ Ind(D)<sup>n</sup> | S ∪ D |= q(w)}.</p>
    </sec>
    <sec id="sec-3">
      <label>3.2</label>
      <title>Measuring the Fitness of a Modelling Approach</title>
      <p>The basic characteristic we want to evaluate is the fitness of a modelling
approach, i.e. how well the schema and the data are suited to enable the
formulation of “fit” queries for answering the given information requests. The fitness
of a modelling approach can be determined by analysing the syntactic, semantic
and/or cognitive complexity of the queries that correspond to the information
requests and depends on the fitness function.</p>
      <p><bold>The Fitness Function.</bold> Different queries that correspond to an information
request can vary in length and be more or less complex, e.g. in terms of using
relations and constructors such as conjunctions, disjunctions, etc. They can also
be more or less difficult to understand from a cognitive perspective. For example,
a human user might find a query that uses terms that are actual words (in the
sense that they exist in a domain expert’s dictionary) easier to understand than
one that uses anonymous identifiers. The purpose of the fitness function is to
capture this complexity.</p>
      <p>The framework is parameterised with a fitness function f that associates each
query q in QL with some value f(q), which we call the query’s fitness value. We
only require that f maps QL into a totally ordered set (M, &lt;), e.g. ℝ or ℕ<sup>4</sup>.
Obvious examples of fitness functions are (i) a
query’s length, (ii) a query’s length combined with the number of constructors
involved, either via some (weighted) summation or into a vector, or (iii) a query’s
length combined with the number of terms not to be found in a domain expert’s
dictionary, or any combinations or extensions of these.</p>
      <p>The smaller the fitness value, the “better” the query. We read f(q) &lt; f(q′) as
q being “better” or “fitter” than q′. The framework evaluates the “best queries”
for an information request, e.g., the shortest and least complex queries. The
fitness function induces a partial order on the queries.</p>
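      <p>To make the idea of a vector-valued fitness function concrete, the following Python sketch combines a query’s length, its number of constructors and its number of terms outside a domain dictionary into a tuple. The query syntax, the word lists DICTIONARY and CONSTRUCTORS and the choice of lexicographic tuple comparison are assumptions for illustration; this is one possible instantiation, not necessarily the fitness function used in the case study.</p>
      <preformat>
```python
import re

# Assumed for illustration: a tiny domain dictionary and a plain-text query
# syntax in which the words "and", "or" and "some" act as constructors.
DICTIONARY = {"image", "shows", "neoplasm", "lung"}
CONSTRUCTORS = {"and", "or", "some"}


def fitness(query):
    """Map a query string to (number of terms, constructors, unknown terms)."""
    words = re.findall(r"[A-Za-z0-9_]+", query.lower())
    terms = [w for w in words if w not in CONSTRUCTORS]
    constructors = len(words) - len(terms)
    unknown = sum(1 for t in terms if t not in DICTIONARY)
    return (len(terms), constructors, unknown)


# Python compares tuples lexicographically, which gives a total order.
readable = "Image and Neoplasm"
anonymous = "Image and x123"   # x123 stands for a made-up anonymous identifier
print(fitness(readable), fitness(anonymous))
print(min([anonymous, readable], key=fitness))
```
</preformat>
      <p>Both queries have the same length and use one constructor, so the comparison is decided by the third component: the query built from dictionary words receives the smaller fitness value and is therefore the fitter one.</p>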
      <p><bold>The Query Space.</bold> Each information request r ∈ R has an associated query
space: first, we define correct queries cQ(r, S, D) as those that answer exactly
an information request r over S and D:</p>
      <p>cQ(r, S, D) = {q | q is a QL query and cert(q, S, D) = r(D)}.</p>
      <p>Next, we define best queries bQ(r, S, D, f) as those correct queries that are
fittest, i.e. whose fitness value is minimal. Clearly, best queries depend on how we measure fitness, and
thus on the fitness function f:
bQ(r, S, D, f) = {q ∈ cQ(r, S, D) | there is no q′ ∈ cQ(r, S, D) : f(q′) &lt; f(q)}.</p>
      <p>Since the bQ(·) are the “fittest” queries among the correct queries, any two
queries in bQ(·) are equally fit, and we can abbreviate their fitness as follows:
for f(q<sub>i</sub>) = f(q<sub>j</sub>), we set f({q<sub>1</sub>, ..., q<sub>k</sub>}) to be f(q<sub>1</sub>).</p>
      <p><bold>3.3 Using the Evaluation Framework to Compare Modelling Approaches</bold></p>
      <p>
If we want to compare several modelling approaches, we can compare, for each
information request r<sub>i</sub> ∈ R and each of the modelling approaches, the fitness
of the best queries. This can unveil the strengths and weaknesses of the
information system to the system designer. For example, if there are information
requests for which the set of correct queries is empty, then f (bQ(r, S, D, f )) is
prohibitively bad. To overcome this, we can then decide whether to select a
different, more powerful query language or to change the schema or the way the
data is modelled—or whether perhaps that particular information request is of
too little importance for such a change. The measurements can also help to point
out where the trade-offs between modelling effort and benefits in terms of
easier query answering are. For example, considering an ontology-based modelling
approach, whether more modelling effort for a more expressive Tbox would be
justified for the sake of simpler queries.</p>
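      <p>The definitions of correct and best queries and the per-request comparison can be sketched in Python as follows. The sketch assumes that a finite list of candidate queries is given for each modelling approach and that their certain answers have already been computed; a real query language contains infinitely many queries, so this enumeration is a simplification. The approaches “A”, “B” and “C”, their queries and answer sets are invented, and word_count is just one simple fitness function.</p>
      <preformat>
```python
def correct_queries(candidates, answers, request):
    """cQ: candidate queries whose answer set equals the information request."""
    return [q for q in candidates if answers[q] == request]


def best_queries(candidates, answers, request, f):
    """bQ: correct queries with the smallest fitness value under f."""
    correct = correct_queries(candidates, answers, request)
    if not correct:
        return []
    best_value = min(f(q) for q in correct)
    return [q for q in correct if f(q) == best_value]


def compare(approaches, request, f):
    """Fitness value of the best queries per approach; None if cQ is empty."""
    result = {}
    for name, (candidates, answers) in approaches.items():
        best = best_queries(candidates, answers, request, f)
        result[name] = f(best[0]) if best else None
    return result


def word_count(query):
    """A simple fitness function: the number of words in the query."""
    return len(query.split())


# Hypothetical information request and precomputed answers per approach.
REQUEST = frozenset({"img1", "img2"})
APPROACHES = {
    "A": (["neoplasm", "neoplasm or tumour"],
          {"neoplasm": frozenset({"img1"}),
           "neoplasm or tumour": frozenset({"img1", "img2"})}),
    "B": (["Neoplasm", "Neoplasm or Carcinoma"],
          {"Neoplasm": frozenset({"img1", "img2"}),
           "Neoplasm or Carcinoma": frozenset({"img1", "img2"})}),
    "C": (["lung"], {"lung": frozenset({"img1", "img2", "img3"})}),
}

print(compare(APPROACHES, REQUEST, word_count))
```
</preformat>
      <p>In this invented setting approach “B” answers the request with a one-word query, approach “A” needs a three-word query, and approach “C” has no correct query at all, which the sketch reports as None.</p>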
      <p>Applying the framework to one modelling approach MA = (S, D, R, QL)
reveals, for each information request r<sub>j</sub>, the fitness value of its best queries: f(bQ<sub>j</sub>). In particular, it
will identify information requests for which it is hard to specify a query in QL
and those for which this is impossible. When comparing different modelling
approaches we can compare the point-to-point fitness for each information request.
</p>
      <p><bold>4 Applying the Framework to a Medical Image Annotation System</bold></p>
      <p>
We will now demonstrate how the evaluation framework can be applied to the
various modelling approaches we sketched in Section 2. The goal of this
evaluation is to find out how to model the information in order to get optimal retrieval
results, how much effort is involved in this “optimal” modelling and how fit the
modelling approaches are w.r.t. queriability.
A set of representative information requests R is derived from the content of
the original, natural language image descriptions: image types and modalities,
clinical findings, complex findings (e.g. involving locations) and combinations of
the former.</p>
      <p>– r1: involves one clinical finding: “All images that show neoplasms.”
– r2: involves two concepts, an image type and an image projection: “All X-ray
images with lateral projection.”
– r3: involves a clinical finding combined with a qualifier value: “All images
that show left-sided pleural effusions.”
– r4: involves a clinical finding combined with a body structure: “All images
that show soft tissue masses in the pleural membrane.”
– r5: involves a causal relationship between two findings: “All images with
metastases deriving from a carcinoma.”</p>
      <p>The information requests are extensional, i.e. they are mapped to a set of
answers from the data space. Above, we listed English descriptions of r1 to r5
for better understanding.
</p>
      <sec id="sec-3-1">
        <label>4.2</label>
        <title>Modelling Approaches</title>
        <p>We formalise the modelling approaches that we introduced in Section 2:
– MA1 = (∅, text, R, keywords)
– MA2 = (T1, concept list, R, keywords)
– MA3 = (T1, A1, R, CL)
– MA4 = (T1, A2, R, CL)
– MA5 = (T2, A3, R, CL)</p>
        <p>Note that all representations of the data are derived from the same data set,
i.e. the original natural language image descriptions. In the various modelling
approaches we tried to extract information from this corpus in different ways,
some of which capture more “meaning” than others.
</p>
      </sec>
      <sec id="sec-3-2">
        <label>4.3</label>
        <title>Results</title>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Request</th><th>MA1 query</th><th>MA1 fitness</th><th>MA2 query</th><th>MA2 fitness</th></tr>
            </thead>
            <tbody>
              <tr><td>r1</td><td>neoplasm</td><td>m11 = (1, 0, 0)</td><td>Neoplasm</td><td>m12 = (1, 0, 0)</td></tr>
              <tr><td>r2</td><td>xray, lateral</td><td>m21 = (2, 1, 0)</td><td>Xray, LateralProjection</td><td>m22 = (2, 1, 0)</td></tr>
              <tr><td>r3</td><td>pleural effusion, left</td><td>m31 = (2, 1, 0)</td><td>PleuralEffusion, LeftSided</td><td>m32 = (2, 1, 0)</td></tr>
              <tr><td>r4</td><td>soft tissue mass, pleural membrane</td><td>m41 = (2, 1, 0)</td><td>SoftTissueMass, PleuralMembrane</td><td>m42 = (2, 1, 0)</td></tr>
              <tr><td>r5</td><td>metastasis, carcinoma</td><td></td><td>Metastasis, Carcinoma</td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>
The first observation we make is that not all of the queries noted in the tables
return the correct answers to their corresponding information request. A query
on a gray background indicates that it answers the information request only
partially. In these cases the modelling approach is not expressive enough to
capture the full meaning of the image description. For example, the queries for
MA1 are merely keywords, i.e. they cannot capture synonyms, homonyms etc. In
r1 we ask for image descriptions that contain the word “neoplasm”. If we have a
description that contains the synonym “tumour” instead, it will not be retrieved
and the results therefore have a low recall. In r2 we ask for “Xray images with
lateral projection.” With the keywords “xray” and “lateral” we might also find
images that show the “lateral wall of the trachea”, but have a postero-anterior
projection. In this case the query results have a low precision. We measured
recall and precision in a preliminary study with the same data set and published
the results in [8].</p>
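        <p>For readers less familiar with the two measures, the following Python sketch computes precision and recall for a retrieved set of images against the set of relevant images. The image identifiers are invented, and returning 0.0 when a set is empty is a convention assumed here; the sketch does not reproduce the measurements reported in [8].</p>
        <preformat>
```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved.intersection(relevant))
    # Assumed convention: an empty set yields 0.0 instead of a division error.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# A relevant image is missed (e.g. its report uses a synonym): recall drops.
print(precision_recall(["img1"], ["img1", "img2"]))
# An irrelevant image is returned (keyword in another context): precision drops.
print(precision_recall(["img1", "img3"], ["img1"]))
```
</preformat>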
        <p>The requests r3 – r5 describe relational information, e.g. a finding in a
particular location or a finding with a qualifier. However, none of MA1, MA2 and MA3
can capture relational information. For example, if there is an image
annotation that contains a “soft tissue mass” located in “chest wall” and a
“thickening” of the “pleural membrane”, it would be retrieved with any of the
queries for MA1 – MA3, although it is not an answer to r4. Again, these queries
lead to low precision.</p>
        <p>The request r5 expresses a causal relationship between two findings, i.e.
“metastases that derive from a carcinoma.” Only MA5 is expressive enough
to capture this information by using the derivingFrom property that was added
to SNOMED CT for this modelling approach.</p>
        <p>The framework also helps us to compare the fitness of the modelling
approaches. MA1 – MA5 differ in expressivity, complexity and required modelling
effort. MA1 is the least expressive, but also the least complex, and involves
comparatively little modelling effort. The textual image descriptions do not have to
be processed in any way and the queries are just natural language keywords, i.e.
the user does not have to know a particular query language. However, as stated
above, recall and precision of the query results are lower than in the more
expressive modelling approaches because MA1 does not take into account synonyms,
homonyms, relational information etc.</p>
        <p>The queries in MA2 are as fit as the ones for MA1. This approach does
not involve a formal query language (we just used the keywords from the
English representation of the respective information request, which are then mapped
to SNOMED CT terms automatically). Essentially, the user can use the same
keywords as in MA1. Since MA2 can recognise synonyms and taxonomical
relationships (which are defined in the thesaurus), its retrieval results are much
better than those of MA1. It has to be noted, however, that the tools we used
to automatically map the text to concepts of the SNOMED CT ontology only
know a finite number of synonyms for each concept. Therefore, natural language
queries might lead to losses in recall. Losses in precision are also possible if
homonyms are mapped to the wrong concept.</p>
        <p>MA3 is very similar to MA2 in the sense that it merely lists SNOMED CT
classes that were identified in the textual image descriptions. However, in MA3
the mapping has been done manually and not with the support of tools as
in MA2. The manual mapping involved a considerably higher data modelling
effort. Additionally, for MA3 the queries are formulated as OWL class
expressions, which are more complex than natural language keywords. In summary,
the queries in MA2 are slightly fitter than the ones in MA3 and the quality of
the results for both modelling approaches is comparable.</p>
        <p>MA4 and MA5 on the other hand are very expressive and capture the
meaning of the original image descriptions very well and therefore lead to better recall
and precision of the query results compared to the other modelling approaches.
However, this comes at the cost of a much higher modelling effort (due to the manual
translation and the modelling of relational information) and higher complexity
of the queries. An interesting observation is that a little more modelling effort
on the schema of MA5 (i.e. adding some roles and role hierarchy axioms) greatly
improves the queriability of this approach compared to the relatively similar
approach MA4 without these axioms. Both in the image annotations and in the
queries, cumbersome SNOMED CT specific constructs like roleGroups can be
avoided, which leads to better readability of the annotations, fitter queries and
therefore increased queriability.</p>
        <p>In general the evaluation framework helps us to understand:
– the weak points of each modelling approach (e.g. low quality of
query results in MA1, high complexity of the queries in MA4)
– the strong points of each modelling approach (e.g. MA5 captures
the meaning particularly well and has high recall and precision)</p>
        <p>With this knowledge the system designer can now assess the trade-offs
between the advantages and disadvantages that these measurements indicate and
the modelling effort that is required for each of the approaches. Based on this,
informed decisions can be made. For example, MA2 and MA3 are more or
less equally fit, but MA3 requires significantly more effort due to the manual
translation of the data. Therefore, MA2 might be preferred over MA3.</p>
        <p>Another observation could be that if the precision of the results is not crucial,
one might prefer a simpler modelling approach (e.g. MA2) that involves less
modelling effort than the more expressive approaches but still leads to acceptable
results. However, if recall and precision of the results are of high priority, one
might want to invest some more effort and select MA5.</p>
        <p>Furthermore, the evaluation makes salient how we could improve or customise
a modelling approach in order to make it better or make use of its full potential.
For example, modifying MA4 by adding some role axioms to the schema leads
to a much fitter modelling approach MA5.
</p>
        <p><bold>5 Conclusion and Future Work</bold></p>
        <p>
We compared heterogeneous modelling approaches for an information system for
medical image annotations. In order to evaluate the strengths and weaknesses
of these approaches we used an information system evaluation framework that
was designed to measure the fitness of modelling approaches with a particular
focus on their queriability. The evaluation framework can make interesting
characteristics of a modelling approach salient and therefore help information system
designers to assess the trade-off between modelling effort and queriability and
to make an informed decision about which modelling approach is most suitable
for their needs.</p>
        <p>Currently we are refining the evaluation framework so that it can be used
for a wider range of scenarios and applications. We are extending the framework
to measure not only the fitness of exact queries, but also that of queries that answer
information requests only partially. Furthermore, we are incorporating measurements
of false positives and false negatives in the query answers in order to combine
the fitness of a query with retrieval performance measurements. Other possible
extensions include measuring the flexibility of a system by evaluating in how
many ways a user can formulate a fit query for an information request. Another
extension would be to combine the fitness values with measurements for query
answering performance so that more information about the system can be taken
into account. We are currently evaluating these extensions in other case studies.</p>
        <p>The particular case study presented in this paper led to the conclusion
that ontology-based image annotations can lead to good retrieval performance
in terms of recall and precision and good queriability if an appropriate modelling
technique is used. However, it turned out that this involves high modelling effort
since manual intervention in the translation process is required. It would be
worth investigating whether it is possible to combine MA2 and MA5 by using
text mining tools not only to automatically map concepts but also relational
information. Another important observation we made is that SNOMED CT is
suitable for modelling image descriptions and, most importantly, that a rather
small customisation, i.e. adding some role axioms in order to bypass cumbersome
SNOMED CT constructs such as roleGroups, can lead to great benefits in terms
of queriability and comprehension.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          , editors.
          <source>The Description Logic Handbook: Theory, Implementation, and Applications</source>
          . Cambridge University Press,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>De Giacomo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          .
          <article-title>On the Decidability of Query Containment under Constraints</article-title>
          .
          In
          <source>PODS</source>
          , pages
          <fpage>149</fpage>
          -
          <lpage>158</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Chandler</surname>
          </string-name>
          and
          <string-name>
            <given-names>T. G.</given-names>
            <surname>DeLutis</surname>
          </string-name>
          .
          <article-title>A Methodology for the Performance Evaluation of Information Systems under Multiple Criteria</article-title>
          . In
          <source>International Computer Measurement Group Conference</source>
          , pages
          <fpage>221</fpage>
          -
          <lpage>229</lpage>
          ,
          <year>1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Harmon</surname>
          </string-name>
          .
          <article-title>Application of a Technique for Evaluating Information System Architectural Designs</article-title>
          .
          In
          <source>Symposium on Engineering of Computer-Based Systems</source>
          . IEEE Computer Society
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Krauthammer</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Nenadic</surname>
          </string-name>
          .
          <article-title>Term Identification in the Biomedical Literature</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          ,
          <volume>37</volume>
          (
          <issue>6</issue>
          ):
          <fpage>512</fpage>
          -
          <lpage>526</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>E. L.</given-names>
            <surname>Morse</surname>
          </string-name>
          .
          <article-title>Evaluation Methodologies for Information Management Systems</article-title>
          .
          <source>D-Lib Magazine</source>
          ,
          <volume>8</volume>
          (
          <issue>9</issue>
          ),
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          .
          <article-title>OWL 2 Web Ontology Language: Structural Specification and Functional-Style Syntax</article-title>
          .
          <source>Technical report, W3C Recommendation</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>J.</given-names>
            <surname>Opitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Using Ontologies for Medical Image Retrieval - An Experiment</article-title>
          .
          In
          <source>OWL Experiences and Directions (OWLED 2009)</source>
          , volume
          <volume>529</volume>
          of
          <series>CEUR Workshop Proceedings</series>
          . CEUR-WS.org
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>J.</given-names>
            <surname>Opitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Parsia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Information System Analysis</article-title>
          .
          In
          <source>International Workshop on Evaluation of Semantic Technologies (IWEST 2010)</source>
          ,
          <series>CEUR Workshop Proceedings</series>
          . CEUR-WS.org
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>