<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Incorporating Probabilistic Knowledge in HealthAgents: a Conceptual Graph Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Madalina Croitoru</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Srinandan Dasmahapatra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Lewis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Electronics and Computer Science, University of Southampton</institution>
          ,
          <addr-line>SO17 1BJ</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>HealthAgents is a multi-agent, distributed decision support system for brain tumor diagnosis. Knowledge needs to be shared amongst different agents in order to assist clinicians when making a diagnosis or prognosis. Existing terminological standards led to the development of a vocabulary to facilitate interoperability. Requirements on query expressivity, as well as the need for visual capabilities, further led to the development of a Conceptual Graph based description of the data sources, the knowledge oriented specification. However, an important part of the medical knowledge is not encoded in this formalism: background knowledge regarding statistical correlations. As a decision support system, HealthAgents should provide the clinician with all possibly relevant information about a case. This paper presents a way of encoding and using such statistical information. The Simple Conceptual Graphs that describe a given hospital case are used to retrieve related information: logical subsumption is used for retrieval, while the statistical correlations are presented to the clinician as part of the decision support system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In this paper we address the problem of integrating a set of statistical rules
with a first order logic based formalism: Conceptual Graphs. This integration
is conceived from the perspective of a medical decision support system (DSS),
in which the clinical user is presented with potentially
useful information related to a patient case. This information then helps in
the selection of appropriate machine learning mechanisms to be used for case
classification.</p>
      <p>
        The work described in this paper is a first step towards the
integration of statistical data with Conceptual Graphs. We chose Conceptual
Graphs for two reasons. First, they integrate easily with the KOS framework
described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Second, clinician feedback will be given in natural language,
and Conceptual Graphs facilitate its translation. While the motivation for
the work is clear (existing statistical rules need to be integrated with the
Conceptual Graph formalism), the justification for our approach requires a few
remarks. First, the decision support system has to provide the clinician with a
number of machine learning algorithms for case classification. These algorithms
have been trained on sets of data with certain features (age, sex, etc.), so it is
important to select the appropriate classifiers. Moreover, the choice of
classifiers is based not only on the patient case as such, but also on a set of
statistical correlations that the clinician has observed. This calls for
the integration of reasoning capabilities for case retrieval (logical subsumption)
with existing statistical correlations provided by textbooks or concrete hospital
cases. Second, the nature of the system under discussion has to be considered: it is a
decision <italic>support</italic> system. Our aim is to make the best use of the knowledge
available by presenting related information to the doctor. We do not want to
develop a statistics-based reasoning system, but simply to provide the clinician
with all potentially useful information about a case. For this reason, our work
is evaluated empirically, by looking at the usefulness of the information we provide
to clinicians.
      </p>
      <p>In conclusion, the proposed approach has two advantages:
modular representation and easy evolution. The logical and the
statistical aspects are kept separate but exploited jointly. Owing to the
nature of our representation, new domain knowledge, terminologies or ontologies
can easily be integrated as a mapping between the tree representations of the
terminologies and the support. This last point makes our approach
particularly useful for the medical domain, where several different
names for the same entity are commonly accepted.</p>
    </sec>
    <sec id="sec-2">
      <title>Motivation and related work</title>
      <p>
        HealthAgents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] is an agent-based, distributed decision-support system (DSS)
that employs clinical information, Magnetic Resonance Imaging (MRI) data,
Magnetic Resonance Spectroscopy (MRS) data and genomic DNA profile
information. It is important to note at this stage that, due to the medical nature
of our system, we are not interested in combining the logical and statistical
inference aspects. While this is an interesting direction of work ([
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), we believe
that these approaches are unsuitable for our project for the following reasons:
(1) clinical users are reluctant to use a system that performs statistical
reasoning for them, because potentially undiscovered classes of tumors
could be discarded as part of the reasoning process; (2) the nature of
the domain makes the identification of independent variables difficult; (3)
exhaustive scenarios cannot be provided for representational completeness.
      </p>
      <p>
        We propose a Conceptual Graph based methodology for retrieving relevant
information that might help the clinician in the process of classifier selection.
The textbook rules and correlations from the literature have been translated into
a set of rules with a degree of belief attached. These rules follow the spirit of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
extended with a statistical component. When a new patient case needs to be
sent to the appropriate classifiers, the clinical data of the patient is translated
into a Conceptual Graph. Projection onto this Conceptual Graph is then used to
retrieve relevant information. We detail our methodology
in the following sections.
      </p>
    </sec>
    <sec id="sec-3">
      <title>The HealthAgents System</title>
      <p>The envisaged functionality of HealthAgents (see Figure 1) is to provide
better classification accuracy for brain tumors using non-invasive procedures:
MRI scans, MRS scans, HRMAS and microarray information. The distributed
nature of the system (with data located in different geographic areas:
Birmingham, Barcelona, Valencia) will ensure that a large number of cases is available. These
cases will be used for training classifiers on particular sets of data (e.g. male vs
female patients, certain age groups, certain types of tumors, brain locations, etc.). The
classifiers will be invoked when a new patient case is presented to the system.
Depending on the clinical data of the patient and the location of the tumor (as
available from the MRI scan), the clinician chooses which classifiers
to invoke. The classifiers provide a differential diagnosis (discriminating
between two or more possible tumor types). Depending on the classifier results
and the MRS scan, the clinician then makes a decision or invokes another classifier.</p>
      <p>
        Knowledge contained in the data sources is described by means of
Conceptual Graphs. This allows us to build upon the existing HADOM ontology
without overcomplicating it with rules describing data extraction
techniques, which employ different parameters that greatly influence the resulting
data. An immediate advantage of choosing Conceptual Graphs is their graph-based
reasoning mechanisms, which allow versatile querying algorithms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Conceptual Graph querying
allows the clinician to search for similar cases
across the HealthAgents network.
      </p>
      <p>In this paper we provide functionality for presenting
additional information to the clinician, enabling a more informed choice
of the classifiers to be invoked. Indeed, the clinical knowledge relating brain
tumor types to age, sex or brain tumor location is not exploited at all in the
current version of our prototype. We propose translating such correlation rules
(available from textbooks and scientific articles) into Conceptual Graph rules
with an associated degree of belief. We then use projection to select the
relevant rules for a given patient case and show them to the doctor in descending
order of their degree of belief.</p>
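      <p>The selection-and-ranking step described above can be illustrated with a short Python sketch. It is not the HealthAgents implementation: the names <monospace>ProbabilisticRule</monospace> and <monospace>rank_relevant_rules</monospace> are hypothetical, the patient case is reduced to a flat dictionary, and the projection test that decides whether a rule is relevant is replaced by a plain boolean predicate. The two example rules mirror the medulloblastoma correlations quoted in Section 4.1.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Case = Dict[str, str]  # hypothetical, simplified patient case: attribute -> value


@dataclass(frozen=True)
class ProbabilisticRule:
    name: str
    is_relevant: Callable[[Case], bool]  # stand-in for a successful projection
    belief: float                        # degree of belief attached to the rule


def rank_relevant_rules(rules: List[ProbabilisticRule], case: Case) -> List[ProbabilisticRule]:
    """Keep the rules relevant to the case, in descending order of belief."""
    relevant = [rule for rule in rules if rule.is_relevant(case)]
    return sorted(relevant, key=lambda rule: rule.belief, reverse=True)


rules = [
    ProbabilisticRule("tumor type is medulloblastoma", lambda c: "tumor" in c, 0.2),
    ProbabilisticRule("patient is under 15",
                      lambda c: c.get("tumor_type") == "medulloblastoma", 0.85),
]

case = {"tumor": "present", "tumor_type": "medulloblastoma"}
for rule in rank_relevant_rules(rules, case):
    print(f"{rule.belief:.2f}  {rule.name}")
```
      </preformat>
      <p>Only the ordering is computed here; in line with the approach described above, the belief degrees are displayed to the clinician rather than combined.</p>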
    </sec>
    <sec id="sec-4">
      <title>Using Conceptual Graphs and probabilistic information</title>
      <p>In this section we will detail our methodology and provide a concrete example
of its functionality.</p>
      <p>First, we describe how textbook rules and statistical correlations have
been translated into a Conceptual Graph representation (Section 4.1). This
statistical information was obtained from textbooks and relevant scientific articles.</p>
      <p>Section 4.2 explains how these rules and correlations can be applied to an
instance of a patient case (also represented as a Conceptual Graph). As the
outcome, the doctor is presented with a labelled tree whose labels reflect
the probability of each rule. It is important to highlight that these
labels serve solely as guidance for the doctor in classifier selection, and
are not used for probabilistic inference.</p>
      <p>In each section we first present an intuitive overview of the proposed
methodology, followed by its formal description, and conclude with a concrete
example. First, however, a few definitions are needed to ensure the consistency
of the formalism used throughout the paper. These definitions are given below.</p>
      <p>Let $G = (V_C, V_R, E_G)$ be a bipartite graph. If, for each $v_R \in V_R$, there is a
linear order $e_1 = \{v_R, v_1\}, \ldots, e_k = \{v_R, v_k\}$ on the set of edges incident to $v_R$
(where $k = d_G(v_R)$ is the degree of $v_R$), then $G$ is called an ordered bipartite graph.
Given a node $v \in V_C \cup V_R$, $N_G(v)$ denotes its set of neighbours, i.e.
$N_G(v) = \{w \in V_C \cup V_R \mid \{v, w\} \in E_G\}$. Similarly, if $A \subseteq V_R \cup V_C$, its set of
neighbours is $N_G(A) = \left(\bigcup_{v \in A} N_G(v)\right) \setminus A$. We also denote the $i$-th neighbour of
$v_R \in V_R$ by $N_G^i(v_R)$, meaning that $e_i = \{v_R, N_G^i(v_R)\} \in E_G$. If $G = (V_C^G, V_R^G, E)$
is an ordered bipartite graph and $A \subseteq V_R^G$, then the subgraph spanned by $A$ in
$G$ is the graph $[A]_G = (N_G(A), A, E')$, where $N_G(A)$ is the neighbour set of $A$ in
$G$ and $E' \subseteq E$ is the set of edges incident to $A$.</p>
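      <p>As an illustration of these definitions, the Python sketch below stores an ordered bipartite graph by listing, for each relation node, its concept neighbours in order, so that position $i-1$ of the list holds $N_G^i(r)$. The class and method names are hypothetical, and node names are assumed to be distinct across concept and relation nodes.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class OrderedBipartiteGraph:
    """G = (V_C, V_R, E_G); node names are assumed distinct across V_C and V_R."""
    concept_nodes: Set[str]
    # For each relation node r, its concept neighbours in order: position i - 1
    # holds N_G^i(r), which fixes the linear order e_1, ..., e_k on its edges.
    relation_args: Dict[str, List[str]]

    def degree(self, r: str) -> int:
        """d_G(r) for a relation node r."""
        return len(self.relation_args[r])

    def neighbours(self, v: str) -> Set[str]:
        """N_G(v) for a concept or relation node v."""
        if v in self.relation_args:
            return set(self.relation_args[v])
        return {r for r, args in self.relation_args.items() if v in args}

    def neighbours_of_set(self, a: Set[str]) -> Set[str]:
        """N_G(A): the union of N_G(v) over v in A, minus A itself."""
        result: Set[str] = set()
        for v in a:
            result |= self.neighbours(v)
        return result - a

    def ith_neighbour(self, r: str, i: int) -> str:
        """N_G^i(r), with i counted from 1 as in the text."""
        return self.relation_args[r][i - 1]

    def spanned_subgraph(self, a: Set[str]) -> "OrderedBipartiteGraph":
        """[A]_G for a set A of relation nodes: N_G(A), A and the edges incident to A."""
        return OrderedBipartiteGraph(
            concept_nodes=self.neighbours_of_set(a),
            relation_args={r: list(self.relation_args[r]) for r in a},
        )


g = OrderedBipartiteGraph(
    concept_nodes={"patient", "tumor", "location"},
    relation_args={"has": ["patient", "tumor"], "hasLocation": ["tumor", "location"]},
)
```
      </preformat>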
      <p>A conceptual graph support consists of a concept type hierarchy, a relation
type hierarchy, a set of individual markers that refer to specific concepts and
a generic marker, denoted by *, which refers to an unspecified concept. More
precisely, a support is a 4-tuple $S = (T_C, T_R, I, *)$ where:
- $T_C$ is a finite, partially ordered set (poset) of concept types $(T_C, \leq)$ that defines
a type hierarchy where, $\forall x, y \in T_C$, $x \leq y$ means that $x$ is a subtype of $y$; the
top element of this hierarchy is the universal type $\top_C$;
- $T_R$ is a finite set of relation types partitioned into $k$ posets $(T_R^i, \leq)_{i=1,\ldots,k}$ of
relation types of arity $i$ ($1 \leq i \leq k$), where $k$ is the maximum arity of a relation
type in $T_R$; each relation type of arity $i$, namely $r \in T_R^i$, has an associated
signature $\sigma(r) \in \underbrace{T_C \times \cdots \times T_C}_{i \text{ times}}$, which specifies the maximum concept type of
each of its arguments; this means that if we use $r(x_1, \ldots, x_i)$, then $x_j$ is a concept
with $\mathrm{type}(x_j) \leq \sigma(r)_j$ ($1 \leq j \leq i$); the partial orders on relation types of the same
arity must be signature-compatible, i.e. $\forall r_1, r_2 \in T_R^i$, $r_1 \leq r_2 \Rightarrow \sigma(r_1) \leq \sigma(r_2)$;
- $I$ is a countable set of individual markers that refer to specific concepts;
- $*$ is the generic marker that refers to an unspecified concept (however, this
concept has a specified type);
- The sets $T_C$, $T_R$, $I$ and $\{*\}$ are mutually disjoint;
- $I \cup \{*\}$ is partially ordered by $x \leq y$ if and only if $x = y$ or $y = *$.</p>
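      <p>A support can be encoded in several ways; the Python sketch below is one minimal possibility. It assumes, more restrictively than the definition (which allows any partial order), that each type has at most one direct supertype, so that $\leq$ can be tested by walking up supertype links. The names <monospace>Support</monospace>, <monospace>is_leq</monospace> and <monospace>is_signature_compatible</monospace>, and the example types, are hypothetical.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Tuple

GENERIC = "*"  # the generic marker


def is_leq(parent: Dict[str, str], x: str, y: str) -> bool:
    """x ≤ y in a tree-shaped hierarchy of direct-supertype links (assumed acyclic)."""
    current: Optional[str] = x
    while current is not None:
        if current == y:
            return True
        current = parent.get(current)
    return False


@dataclass
class Support:
    concept_parent: Dict[str, str]         # T_C, as direct-supertype links
    signature: Dict[str, Tuple[str, ...]]  # relation type r -> sigma(r); arity = len
    relation_parent: Dict[str, str] = field(default_factory=dict)
    individuals: FrozenSet[str] = frozenset()  # the set I of individual markers

    def concept_leq(self, x: str, y: str) -> bool:
        return is_leq(self.concept_parent, x, y)

    def relation_leq(self, r1: str, r2: str) -> bool:
        # Relation types are only compared within one arity class T_R^i.
        same_arity = len(self.signature[r1]) == len(self.signature[r2])
        return same_arity and is_leq(self.relation_parent, r1, r2)

    @staticmethod
    def marker_leq(x: str, y: str) -> bool:
        # Order on I ∪ {*}: x ≤ y iff x == y or y is the generic marker.
        return x == y or y == GENERIC

    def is_signature_compatible(self) -> bool:
        """Check that r1 ≤ r2 implies sigma(r1) ≤ sigma(r2), argument by argument."""
        return all(
            all(self.concept_leq(a, b) for a, b in zip(self.signature[r1], self.signature[r2]))
            for r1 in self.signature
            for r2 in self.signature
            if self.relation_leq(r1, r2)
        )


support = Support(
    concept_parent={"Patient": "Top", "Tumor": "Top", "TumorType": "Top",
                    "Medulloblastoma": "TumorType"},
    signature={"has": ("Patient", "Tumor"), "hasType": ("Tumor", "TumorType"),
               "link": ("Top", "Top")},
    relation_parent={"has": "link", "hasType": "link"},
    individuals=frozenset({"A"}),
)
```
      </preformat>
      <p>With this support, <monospace>support.concept_leq("Medulloblastoma", "TumorType")</monospace> holds, and <monospace>Support.marker_leq("A", "*")</monospace> holds while <monospace>Support.marker_leq("*", "A")</monospace> does not.</p>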
      <p>A (Simple) Conceptual Graph (SCG) is a 3-tuple $SG = [S, G, \lambda]$, where:
- $S = (T_C, T_R, I, *)$ is a support;
- $G = (V_C, V_R, E_G)$ is an ordered bipartite graph;
- $\lambda$ is a labelling of the nodes of $G$ with elements from the support $S$:
$\forall r \in V_R$, $\lambda(r) \in T_R^{d_G(r)}$; $\forall c \in V_C$, $\lambda(c) \in T_C \times (I \cup \{*\})$, such that if $c = N_G^i(r)$,
$\lambda(r) = t_r$ and $\lambda(c) = (t_c, \mathrm{ref}_c)$, then $t_c \leq \sigma(t_r)_i$.</p>
      <p>When the support is fixed, we use the notation $SG = (G, \lambda)$, or we refer to
the CG $G$ and its labelling function $\lambda_G$.</p>
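      <p>The labelling conditions of an SCG can be checked mechanically. The following Python sketch, again assuming tree-shaped type hierarchies, verifies that every concept label lies in $T_C \times (I \cup \{*\})$, that every relation node is labelled with a relation type of the right arity, and that the type of the $i$-th neighbour of a relation node is at most the $i$-th component of that relation type's signature. The function name and the dictionary encoding are hypothetical.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, FrozenSet, Optional, Tuple

GENERIC = "*"  # the generic marker


def is_leq(parent: Dict[str, str], x: str, y: str) -> bool:
    """x ≤ y in a tree-shaped hierarchy of direct-supertype links (assumed acyclic)."""
    current: Optional[str] = x
    while current is not None:
        if current == y:
            return True
        current = parent.get(current)
    return False


def is_valid_scg(concept_parent: Dict[str, str],
                 signature: Dict[str, Tuple[str, ...]],
                 individuals: FrozenSet[str],
                 concept_labels: Dict[str, Tuple[str, str]],
                 relation_nodes: Dict[str, Tuple[str, Tuple[str, ...]]]) -> bool:
    """Check the labelling conditions of an SCG.

    concept_labels maps each concept node c to lambda(c) = (t_c, ref_c);
    relation_nodes maps each relation node r to (lambda(r), (N^1(r), ..., N^k(r))).
    """
    known_types = set(concept_parent) | set(concept_parent.values())
    for t_c, ref_c in concept_labels.values():
        # lambda(c) must lie in T_C x (I ∪ {*}).
        if t_c not in known_types or (ref_c != GENERIC and ref_c not in individuals):
            return False
    for t_r, args in relation_nodes.values():
        sig = signature.get(t_r)
        # lambda(r) must be a relation type whose arity equals d_G(r).
        if sig is None or len(sig) != len(args):
            return False
        # If c = N^i(r), then t_c ≤ sigma(t_r)_i.
        for i, c in enumerate(args):
            if not is_leq(concept_parent, concept_labels[c][0], sig[i]):
                return False
    return True
```
      </preformat>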
      <p>If $(G, \lambda_G)$ and $(H, \lambda_H)$ are two CGs (defined on the same support $S$), then $G \geq H$
($G$ subsumes $H$) if there is a projection from $G$ to $H$. A projection is a mapping
$\pi$ from the vertex set of $G$ to the vertex set of $H$ which maps concept vertices
of $G$ into concept vertices of $H$ and relation vertices of $G$ into relation vertices of
$H$, preserves adjacency (if the concept vertex $v \in V_C^G$ is the $i$-th neighbour of the
relation vertex $r \in V_R^G$, then $\pi(v)$ is the $i$-th neighbour of $\pi(r)$), and furthermore
satisfies $\lambda_G(x) \geq \lambda_H(\pi(x))$ for each vertex $x$ of $G$.</p>
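      <p>Deciding whether $G$ subsumes $H$ amounts to searching for a projection. The Python sketch below does this by naive backtracking over the relation nodes of $G$; it is a direct illustration of the definition, not the tree decomposition-based algorithm of [<xref ref-type="bibr" rid="ref3">3</xref>]. It assumes dictionary-encoded CGs and tree-shaped type hierarchies, and all names are hypothetical.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, Optional, Tuple

Concepts = Dict[str, Tuple[str, str]]               # node -> (type, marker)
Relations = Dict[str, Tuple[str, Tuple[str, ...]]]  # node -> (type, ordered neighbours)
Mapping = Dict[str, str]


def is_leq(parent: Dict[str, str], x: str, y: str) -> bool:
    """x ≤ y in a tree-shaped, acyclic hierarchy given as direct-supertype links."""
    while x != y:
        if x not in parent:
            return False
        x = parent[x]
    return True


def label_geq(parent: Dict[str, str], g_label: Tuple[str, str], h_label: Tuple[str, str]) -> bool:
    """lambda_G(c) ≥ lambda_H(pi(c)) for concept labels (type, marker)."""
    (t_g, m_g), (t_h, m_h) = g_label, h_label
    return is_leq(parent, t_h, t_g) and (m_h == m_g or m_g == "*")


def find_projection(g_c: Concepts, g_r: Relations, h_c: Concepts, h_r: Relations,
                    c_parent: Dict[str, str], r_parent: Dict[str, str]) -> Optional[Mapping]:
    """Return a projection pi from G to H, or None if G does not subsume H."""
    g_rel_nodes = list(g_r)

    def extend(k: int, pi: Mapping) -> Optional[Mapping]:
        if k == len(g_rel_nodes):
            # Concept nodes of G with no incident relation just need a compatible image.
            for c in g_c:
                if c not in pi:
                    image = next((d for d in h_c if label_geq(c_parent, g_c[c], h_c[d])), None)
                    if image is None:
                        return None
                    pi = {**pi, c: image}
            return pi
        r = g_rel_nodes[k]
        t_r, args_r = g_r[r]
        for s, (t_s, args_s) in h_r.items():
            # Relation labels must also decrease: lambda_H(s) ≤ lambda_G(r).
            if len(args_s) != len(args_r) or not is_leq(r_parent, t_s, t_r):
                continue
            candidate = {**pi, r: s}
            consistent = True
            for c, d in zip(args_r, args_s):  # i-th neighbour of r -> i-th neighbour of s
                if candidate.get(c, d) != d or not label_geq(c_parent, g_c[c], h_c[d]):
                    consistent = False
                    break
                candidate[c] = d
            if consistent:
                result = extend(k + 1, candidate)
                if result is not None:
                    return result
        return None  # backtrack

    return extend(0, {})


# Example: the hypothesis "Patient:* has Tumor:*" against a small case graph.
c_parent = {"Patient": "Top", "Tumor": "Top", "Medulloblastoma": "Tumor", "TemporalLobe": "Top"}
hyp_c = {"x": ("Patient", "*"), "y": ("Tumor", "*")}
hyp_r = {"s": ("has", ("x", "y"))}
case_c = {"p": ("Patient", "A"), "t": ("Medulloblastoma", "*"), "l": ("TemporalLobe", "*")}
case_r = {"r1": ("has", ("p", "t")), "r2": ("hasLocation", ("t", "l"))}
print(find_projection(hyp_c, hyp_r, case_c, case_r, c_parent, {}))
```
      </preformat>
      <p>Deciding CG subsumption is NP-complete in general, so this naive search is only meant for small graphs such as the rule hypotheses and patient cases considered in this paper.</p>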
      <sec id="sec-4-1">
        <title>Statistical Conceptual Graph Rules</title>
        <p>
          This section describes how to exploit the statistical correlations contained in
textbooks to select appropriate classifiers for HealthAgents. Statements such as
“Medulloblastoma account for 20% of all pediatric tumors” or “85% of
medulloblastoma occur by the age of 15” are translated into Conceptual Graph (CG)
based rules (as described in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]) with the corresponding degree of
belief. We define such rules below.
        </p>
        <p>
          If $S$ is a fixed support, then a rule defined on $S$ (see [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]) is any CG $H$ over
the support $S$ with a specified bipartition $(\mathit{Hyp}, \mathit{Conc})$ of its set of relation
nodes $V_R^H$. The subgraph of $H$ spanned by $\mathit{Hyp}$, $[\mathit{Hyp}]_H$, is called the hypothesis
of the rule $H$, and the subgraph spanned by $\mathit{Conc}$, $[\mathit{Conc}]_H$, is the conclusion of
the rule $H$.
        </p>
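        <p>Concretely, a rule is a CG whose relation nodes carry a hypothesis/conclusion flag. In the Python sketch below (class and method names hypothetical), the hypothesis and conclusion are computed as spanned subgraphs, and <monospace>frontier</monospace> returns the concept nodes they share, i.e. the nodes that are identified with their images when the rule is applied. The example encodes the first medulloblastoma rule of Figure 2.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Concepts = Dict[str, Tuple[str, str]]               # node -> (type, marker)
Relations = Dict[str, Tuple[str, Tuple[str, ...]]]  # node -> (type, ordered neighbours)


@dataclass
class CGRule:
    concepts: Concepts
    relations: Relations
    hyp: Set[str]  # relation nodes in Hyp; Conc is the rest of V_R^H

    def _spanned(self, rel_nodes: Set[str]) -> Tuple[Concepts, Relations]:
        """[A]_H: the relation nodes in A together with their neighbouring concepts."""
        rels = {r: self.relations[r] for r in rel_nodes}
        neighbours = {c for _, args in rels.values() for c in args}
        return {c: self.concepts[c] for c in neighbours}, rels

    def hypothesis(self) -> Tuple[Concepts, Relations]:
        return self._spanned(self.hyp)

    def conclusion(self) -> Tuple[Concepts, Relations]:
        return self._spanned(set(self.relations) - self.hyp)

    def frontier(self) -> Set[str]:
        """Concept nodes shared by [Hyp]_H and [Conc]_H."""
        return set(self.hypothesis()[0]).intersection(self.conclusion()[0])


# First rule of Figure 2: Patient:* has Tumor:*  =>  Tumor:* hasType Medulloblastoma:*
rule1 = CGRule(
    concepts={"p": ("Patient", "*"), "t": ("Tumor", "*"), "m": ("Medulloblastoma", "*")},
    relations={"r_has": ("has", ("p", "t")), "r_type": ("hasType", ("t", "m"))},
    hyp={"r_has"},
)
```
        </preformat>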
        <p>Applying a rule $H$ to a CG $G$ means finding a projection $\pi$ from $[\mathit{Hyp}]_H$ to
$G$, adding a disjoint copy of $[\mathit{Conc}]_H$ to $G$, and finally identifying in this graph
each concept node $v \in V_C^{[\mathit{Conc}]_H} \cap V_C^{[\mathit{Hyp}]_H}$ with $\pi(v)$, its image by $\pi$. The new CG
obtained, $G'$, is called an immediate derivation of $G$ by the application of rule
$H$ following $\pi$. A probabilistic rule is a pair $(R, p(R))$, where $R$ is a rule and
$p(R)$ is its probability.</p>
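        <p>The steps of rule application (add a disjoint copy of the conclusion, then identify shared concept nodes with their images) can be sketched in Python as follows. To keep the example short, the projection $\pi$ of the hypothesis is taken as an input rather than computed, and fresh node names are generated with a counter; the function name and the encoding are hypothetical.</p>
        <preformat preformat-type="code">
```python
import itertools
from typing import Dict, Set, Tuple

Concepts = Dict[str, Tuple[str, str]]               # node -> (type, marker)
Relations = Dict[str, Tuple[str, Tuple[str, ...]]]  # node -> (type, ordered neighbours)

_fresh_ids = itertools.count()  # source of fresh node names for disjoint copies


def apply_rule(g_c: Concepts, g_r: Relations, conc_c: Concepts, conc_r: Relations,
               hyp_concepts: Set[str], pi: Dict[str, str]) -> Tuple[Concepts, Relations]:
    """Immediate derivation G' of G, given a projection pi from [Hyp]_H to G."""
    new_c, new_r = dict(g_c), dict(g_r)  # G itself is left unchanged
    rename: Dict[str, str] = {}
    for v, label in conc_c.items():
        if v in hyp_concepts:
            rename[v] = pi[v]  # v is also in [Hyp]_H: identify it with pi(v)
        else:
            fresh = f"{v}#{next(_fresh_ids)}"  # new node of the disjoint copy
            rename[v] = fresh
            new_c[fresh] = label
    for r, (t_r, args) in conc_r.items():
        new_r[f"{r}#{next(_fresh_ids)}"] = (t_r, tuple(rename[a] for a in args))
    return new_c, new_r


# First rule of Figure 2 applied to a patient who has a tumor.
g_c = {"p": ("Patient", "A"), "t": ("Tumor", "*")}
g_r = {"r1": ("has", ("p", "t"))}
conc_c = {"y": ("Tumor", "*"), "m": ("Medulloblastoma", "*")}
conc_r = {"r_type": ("hasType", ("y", "m"))}
pi = {"x": "p", "y": "t", "r_has": "r1"}  # projection of the hypothesis x -has- y
derived_c, derived_r = apply_rule(g_c, g_r, conc_c, conc_r, {"x", "y"}, pi)
```
        </preformat>
        <p>A probabilistic rule can then be carried as a pair such as <monospace>(rule, 0.2)</monospace>, the probability being attached to the resulting derivation rather than used to compute it.</p>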
        <p>Figure 2 presents two such probabilistic rules for the tumor type medulloblastoma.
The first rule states that if the patient has a tumor (as encoded
by the white labelled relation “has”), then the tumor type is medulloblastoma (as
encoded by the grey labelled relation “hasType”) with a probability of 0.2.
Similarly, the second rule states that if a patient has a tumor and that tumor is of
type medulloblastoma, then the patient is under 15 with a probability
of 0.85. The support for these rules has been omitted for simplicity.
These two rules have been extracted from a pediatric study on tumor types and
are the only two rules available for the tumor type medulloblastoma. This is
important, as it suggests that the number of such correlation rules is small,
so the computational cost of our approach remains low. We show
how these rules are applied in HealthAgents in the next section.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Conceptual Graph Derivation Tree</title>
        <p>This section details how the rules introduced in the previous section can be
used on a specific instance of a patient case. All the rules relevant to the
patient instance are applied and a derivation tree is built. The derivation tree
gives the clinician an overview of potentially useful information
prior to classifier selection. The weights on the tree edges are only used as an
indication of correlations in the field. Note that, due to the way we defined
the derivation tree, the same rule can be applied more than once, so
independence is not ensured. This is the main reason why we do not use the derivation tree for
probabilistic inference, but rather for an organized exploration of the available
information relevant to a particular case. It is also worth noting that, since only a small
number of rules is available for each tumor type, the derivation tree does not grow very large in practice.</p>
        <p>[Figure 3: derivation tree. Concept labels: Patient:A, Tumor:*, TemporalNode:*, Age:&gt;15, Medulloblastoma:*, Age:&lt;15; relations: has, hasLocation, hasAge, hasType; edge weights: 0.2 and 0.85.]</p>
        <p>Let $R$ be a set of rules defined on $S$ and $G$ a CG over $S$. A CG $G'$ is derived from $G$
by $R$ if there exists a sequence of immediate derivations leading from $G$ to $G'$ by
applications of rules in $R$. The set of all CGs $G'$ which can be derived from
$G$ using $R$ by sequences of immediate derivations of length at
most $k$ is denoted by $R^k(G)$. It can be described as a derivation tree rooted in $G$, whose
nodes are CGs and whose directed edges are pairs of CGs representing
immediate derivations. If the rules in $R$ are probabilistic, then each directed
edge is weighted with the probability of the rule used.</p>
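        <p>A bounded derivation tree $R^k(G)$ can be built recursively. The Python sketch below makes strong simplifying assumptions, all of them ours rather than the paper's: a CG is flattened into a set of (concept, relation, concept) triples, projection is replaced by variable binding over these triples (no type hierarchy), every conclusion variable must also occur in the hypothesis, and applications that add no new triple are skipped. The example reproduces the scenario of Figure 3 with hypothetical node names.</p>
        <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterator, List, Tuple

Triple = Tuple[str, str, str]  # (concept, relation type, concept): binary relations only
Graph = FrozenSet[Triple]


@dataclass
class ProbRule:
    name: str
    hyp: List[Triple]   # terms starting with "?" are variables
    conc: List[Triple]  # assumed to use only variables that occur in hyp
    p: float            # degree of belief p(R)


@dataclass
class TreeNode:
    graph: Graph
    # Outgoing edges: (rule name, weight p(R), derived node).
    children: List[Tuple[str, float, "TreeNode"]] = field(default_factory=list)


def matches(pattern: List[Triple], graph: Graph, binding: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Simplified stand-in for projection: bind the pattern's variables to graph terms."""
    if not pattern:
        yield binding
        return
    (s, r, o), rest = pattern[0], pattern[1:]
    for gs, gr, go in graph:
        if gr != r:
            continue
        b = dict(binding)
        ok = True
        for term, value in ((s, gs), (o, go)):
            if term.startswith("?"):
                ok = ok and b.setdefault(term, value) == value
            else:
                ok = ok and term == value
        if ok:
            yield from matches(rest, graph, b)


def derivation_tree(graph: Graph, rules: List[ProbRule], k: int) -> TreeNode:
    """Derivations of length at most k, each edge weighted by the rule's probability."""
    node = TreeNode(graph)
    if k == 0:
        return node
    for rule in rules:
        for b in matches(rule.hyp, graph, {}):
            added = {tuple(b.get(term, term) for term in t) for t in rule.conc}
            if added.issubset(graph):
                continue  # this application adds nothing new
            child = derivation_tree(graph.union(added), rules, k - 1)
            node.children.append((rule.name, rule.p, child))
    return node


# Scenario of Figure 3, with hypothetical node names.
R1 = ProbRule("R1", [("?p", "has", "?t")], [("?t", "hasType", "Medulloblastoma")], 0.2)
R2 = ProbRule("R2", [("?p", "has", "?t"), ("?t", "hasType", "Medulloblastoma")],
              [("?p", "hasAge", "AgeUnder15")], 0.85)
case = frozenset({("PatientA", "has", "Tumor1"),
                  ("Tumor1", "hasLocation", "TemporalLobe"),
                  ("PatientA", "hasAge", "AgeOver15")})
tree = derivation_tree(case, [R1, R2], k=2)
```
        </preformat>
        <p>In this example the single leaf contains both <monospace>AgeOver15</monospace> and <monospace>AgeUnder15</monospace>, which corresponds to the contradiction discussed for Figure 3; the weights 0.2 and 0.85 are only displayed along the path, not multiplied.</p>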
        <p>Figure 3 presents such a derivation tree, obtained from the case of a patient over
15 with a tumor in the temporal lobe. The clinician's intuition (based on the
MRS scan) is that medulloblastoma is a potential diagnosis, so the two rules
previously shown for medulloblastoma have been applied. As a consequence, a
contradiction is obtained: medulloblastomas account for
20% of cases, 85% of these occur in patients under 15, and yet the patient is
over 15.</p>
        <p>Note that if the clinician had no intuition about the tumor
type, then all the rules relevant to tumor types, and their further consequences, would
have been applied. Even if a rule states that a tumor type is not possible for the
tumor location of the particular instance, the outcome is still presented to
the clinician. The motivation is that a potentially new type of tumor could be
under discussion, and by performing “reasoning” this possibility would be ignored. It
is therefore very important, in the context of this domain, to present the clinician
with all possible information related to the patient case.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and future work</title>
      <p>In this paper we presented a methodology for integrating probabilistic information
to enhance the HealthAgents decision support system. We have shown how
probabilistic rules extracted from textbooks can be translated into the Conceptual
Graph formalism and then applied to build a derivation
tree.</p>
      <p>As we advance our work, we have to keep the knowledge representation and
reasoning research tightly coupled with clinician feedback from the domain. So
far, clinicians have proved reluctant to discard information as part of the
reasoning process. However, future work will look at pruning the derivation tree
based on contradictions and reorganizing information based on such pruning. We
would also like to facilitate intuitive navigation of such trees, and current work is
addressing these design problems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Arús</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Celda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dasmahapatra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dupplaw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>González-Vélez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>van Huffel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lluch i Ariet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Peet</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Robles</surname>
          </string-name>
          .
          <article-title>On the design of a web-based decision support system for brain tumour diagnosis using distributed agents</article-title>
          .
          <source>In WI-IATW'06: 2006 IEEE/WIC/ACM Int Conf on Web Intelligence &amp; Intelligent Agent Technology</source>
          , pages
          <fpage>208</fpage>
          -
          <lpage>211</lpage>
          ,
          Hong Kong
          ,
          <year>December 2006</year>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Baget</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.-L.</given-names>
            <surname>Mugnier</surname>
          </string-name>
          .
          <article-title>Extensions of Simple Conceptual Graphs: the Complexity of Rules and Constraints</article-title>
          .
          <source>J. Artif. Intell. Res.</source>
          ,
          <volume>16</volume>
          :
          <fpage>425</fpage>
          -
          <lpage>465</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>M.</given-names>
            <surname>Croitoru</surname>
          </string-name>
          and
          <string-name>
            <given-names>E.</given-names>
            <surname>Compatangelo</surname>
          </string-name>
          .
          <article-title>Conceptual graph projection: a tree decomposition-based approach</article-title>
          . In P. Doherty, J. Mylopoulos, and C. Welty, editors,
          <source>Proc. of the 10th Int'l Conf. on the Principles of Knowledge Representation and Reasoning (KR'2006)</source>
          , pages
          <fpage>271</fpage>
          -
          <lpage>276</lpage>
          . AAAI,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Croitoru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dasmahapatra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dupplaw</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Xiao</surname>
          </string-name>
          .
          <article-title>A conceptual graph description of medical data for brain tumour classification</article-title>
          . In
          <source>Conceptual Structures: Knowledge Architectures for Smart Applications, 15th International Conference on Conceptual Structures, ICCS 2007</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Halpern</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Koller</surname>
          </string-name>
          .
          <article-title>Representation dependence in probabilistic inference</article-title>
          .
          <source>J. Artif. Intell. Res.</source>
          ,
          <volume>21</volume>
          :
          <fpage>319</fpage>
          -
          <lpage>356</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>T.</given-names>
            <surname>Lukasiewicz</surname>
          </string-name>
          .
          <article-title>Expressive probabilistic description logics</article-title>
          .
          <source>Artif. Intell.</source>
          ,
          <volume>172</volume>
          :
          <fpage>852</fpage>
          -
          <lpage>883</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>