<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Artificial Intelligence in Medicine</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1371/journal</article-id>
      <title-group>
        <article-title>Bayesian Networks in Medicine: Presenting Query Response Uncertainty for Decision Support</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Janneke H. Bolt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Berghuis</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arjen Hommersom</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marike Lombaers</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johanna M.A. Pijnenborg</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Silja Renooij</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Open University of the Netherlands, Faculty of Science</institution>
          ,
          <addr-line>Heerlen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Radboud University Medical Center, Department of Obstetrics and Gynaecology</institution>
          ,
          <addr-line>Nijmegen</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Utrecht University, Department of Information and Computing Sciences</institution>
          ,
          <addr-line>Utrecht</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2004</year>
      </pub-date>
      <volume>30</volume>
      <issue>2004</issue>
      <fpage>201</fpage>
      <lpage>214</lpage>
      <abstract>
        <p>Despite their good characteristics, tools based on Bayesian networks are not yet widely used in medical decision support. Meeting the needs of the intended users is crucial for the acceptance of these tools, and clinical involvement in their development is thus required to promote their use. During the development of a Bayesian network-based tool for the prediction of lymph node metastases in patients with endometrial cancer, a measure of query response uncertainty was put forward as one of the user needs. In this paper, we support meeting this need by exploring options for the presentation of query response uncertainty. We consider the level of detail, one-sided versus two-sided intervals, and two different 'look-ahead' options. The different options are illustrated through a small example network.</p>
      </abstract>
      <kwd-group>
        <kwd>Bayesian networks</kwd>
        <kwd>Decision support</kwd>
        <kwd>Query response uncertainty</kwd>
        <kwd>Explainable AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In healthcare, many decisions with often far-reaching consequences have to be made. It is therefore
not surprising that, already from the early development of computer-based tools for decision making,
such systems were built for the medical domain (e.g. [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]). Among the computer-based tools for
decision making are the well-known Bayesian networks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Bayesian networks have great advantages
for clinical decision making such as a quite intuitive visualization of the problem domain by means of
their graph structure, and their ability to provide predictions for incomplete and/or conflicting patient
findings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Despite their suitable characteristics, Bayesian networks are not yet widely used in clinical
practice. Kyrimi et al. [5] performed a comprehensive literature review to analyze this low adoption and
provided an overview of “benefits, barriers and facilitating factors” for the use of Bayesian networks
in healthcare. As one of the barriers, clinicians’ resistance is mentioned, with clinical involvement
as one of the facilitating factors. This paper presents research related to employing this facilitating
factor. More specifically, in a joint team of AI researchers and clinicians, we investigate the need
for explanation types and forms in the context of a Bayesian network tasked with the challenging
preoperative prediction of lymph node metastasis (LNM) for patients with endometrial cancer.
      </p>
      <p>Given the diagnosis of endometrial cancer, the preoperative prediction of the presence of LNM has
a great impact on the proposed treatment strategy. However, although there are several biomarkers
related to LNM, none of them are currently used in daily practice; the tumor grade is used as the most
important predictor. This clinical problem resulted in an initiative within the European Network for
Individualized Treatment of Endometrial Cancer (ENITEC) for the development of a Bayesian network
for the prediction of LNM [6]. This network, ENDORISK, includes nine preoperative variables and has
as main outcome measures LNM and 5-year disease-specific survival.</p>
      <p>One of the needs put forward by the intended clinical users of ENDORISK is that the system provides
information on the uncertainty in query responses. In particular, the network can be used to compute the
probability of LNM for a specific patient, but there is no indication of the reliability of this probability.</p>
      <p>This wish for information on the uncertainty of query responses relates to the concept of causability,
a notion used within Explainable AI (XAI), which encompasses measurements for the quality of
explanations [7]. In this context it is particularly important that a system provides an indication of the
uncertainty in its predictions [8]. Causability relates to a user-oriented perspective on explainability of
AI models, which is in contrast with the majority of XAI methods that focus on providing technical
insights into the inner workings of these models [9].</p>
      <p>In this paper we propose and discuss different options for the presentation and explanation of query
response uncertainty, where we base the latter on an existing method for approximating the variance of
a Bayesian network query [10].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <sec id="sec-2-1">
        <title>2.1. Bayesian networks and naive Bayesian networks</title>
        <p>
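          A Bayesian network is a model of a joint probability distribution Pr over a set of stochastic variables
<inline-formula><tex-math>\mathbf{V}</tex-math></inline-formula>, consisting of a directed acyclic graph and a set of conditional probability distributions. In this
paper we presume that all <inline-formula><tex-math>V \in \mathbf{V}</tex-math></inline-formula> are discrete. Variables will be denoted by upper-case letters (<inline-formula><tex-math>A</tex-math></inline-formula>),
and their values by lower-case letters (<inline-formula><tex-math>a</tex-math></inline-formula>); sets of variables by bold-face upper-case letters (<inline-formula><tex-math>\mathbf{A}</tex-math></inline-formula>) and
their instantiations by bold-face lower-case letters (<inline-formula><tex-math>\mathbf{a}</tex-math></inline-formula>). Given a binary variable <inline-formula><tex-math>A</tex-math></inline-formula>, <inline-formula><tex-math>A = a_1</tex-math></inline-formula> is often
written as <inline-formula><tex-math>a</tex-math></inline-formula> and <inline-formula><tex-math>A = a_2</tex-math></inline-formula> is often written as <inline-formula><tex-math>\bar{a}</tex-math></inline-formula>. Each variable <inline-formula><tex-math>V \in \mathbf{V}</tex-math></inline-formula> is represented by a node <inline-formula><tex-math>V</tex-math></inline-formula> in the
network’s graph<sup>1</sup>. (Conditional) independence between the variables is captured by the graph’s set of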
arcs according to the d-separation criterion [
          <xref ref-type="bibr" rid="ref3">3</xref>
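          ]. The strength of the probabilistic relationships between
the variables is captured by the set of conditional probability distributions <inline-formula><tex-math>\Pr(V \mid p(V))</tex-math></inline-formula>, where <inline-formula><tex-math>p(V)</tex-math></inline-formula>
denotes a combination of values, or instantiation, of the parents of <inline-formula><tex-math>V</tex-math></inline-formula>. The probabilities that define these
distributions are called the network parameters. The joint probability distribution is then defined by:
<disp-formula><tex-math>\Pr(\mathbf{V}) = \prod_{V \in \mathbf{V}} \Pr(V \mid p(V))</tex-math></disp-formula>
        </p>
        <p>Using the formula above, a Bayesian network can yield any probability of the modeled problem domain.</p>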
        <p>A naive Bayesian network is a Bayesian network with a specific graph structure. The graph consists
of just a single target node with a set of observable feature child nodes and captures the assumption that
the observable features are all mutually independent given the target node. For a naive network with a
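binary valued target node <inline-formula><tex-math>H</tex-math></inline-formula>, and feature nodes <inline-formula><tex-math>\mathbf{F}</tex-math></inline-formula>, the probability <inline-formula><tex-math>\Pr(h \mid \mathbf{e})</tex-math></inline-formula>, with <inline-formula><tex-math>\mathbf{e} = \{e_1, \ldots, e_n\}</tex-math></inline-formula>
an instantiation of <inline-formula><tex-math>\mathbf{E} \subseteq \mathbf{F}</tex-math></inline-formula>, is computed from the network parameters as follows:
<disp-formula><label>(1)</label><tex-math>\Pr(h \mid \mathbf{e}) = \frac{\prod_{i \in \{1,\ldots,n\}} \Pr(e_i \mid h) \cdot \Pr(h)}{\sum_{h' \in \{h, \bar{h}\}} \prod_{i \in \{1,\ldots,n\}} \Pr(e_i \mid h') \cdot \Pr(h')}</tex-math></disp-formula></p>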
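        <p>As an illustration of Equation (1), the following minimal Python sketch computes the posterior of a binary target from a prior and two lists of feature likelihoods. The function name and the numbers are hypothetical and are not taken from the ENDORISK data; the sketch assumes that Pr(not h) equals 1 minus Pr(h).</p>
        <preformat>
```python
from math import prod


def naive_bayes_posterior(prior_h, likelihoods_h, likelihoods_not_h):
    """Return Pr(h | e) for a binary target, following Equation (1).

    prior_h           -- Pr(h)
    likelihoods_h     -- [Pr(e_1 | h), ..., Pr(e_n | h)] for the observed values
    likelihoods_not_h -- [Pr(e_1 | not h), ..., Pr(e_n | not h)]
    """
    numerator = prod(likelihoods_h) * prior_h
    # Second term of the denominator: the same product for the complement of h.
    other = prod(likelihoods_not_h) * (1.0 - prior_h)
    return numerator / (numerator + other)


# Hypothetical parameters for two observed features.
posterior = naive_bayes_posterior(0.1, [0.8, 0.6], [0.3, 0.2])
print(round(posterior, 4))
```
</preformat>
        <p>Because unobserved features simply drop out of both products, the same function handles incomplete patient findings by passing only the likelihoods of the features that were observed.</p>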
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Query response uncertainty</title>
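        <p>The standard query response from a Bayesian network is a single outcome probability, computed
from the network’s parameters. Since in practice the parameters of a network are estimates, based
on population samples and/or uncertain expert opinions, the query response, even when computed with an exact
inference algorithm, will be an estimate as well. Van Allen et al. [10, Theorem 2] provide a formula to
approximate the variance of the query responses of a Bayesian network, resulting from the uncertainty
in the parameter probabilities. Below we give the formula we used to approximate the variance of
the outcome <inline-formula><tex-math>\Pr(h \mid \mathbf{e})</tex-math></inline-formula> of a naive Bayesian network with target variable <inline-formula><tex-math>H</tex-math></inline-formula> and <inline-formula><tex-math>n</tex-math></inline-formula> observed features
<inline-formula><tex-math>\mathbf{e} = \{e_1, \ldots, e_n\}</tex-math></inline-formula>, in which the parameters are estimated from data, based on [10, Theorem 2]:
<disp-formula><tex-math>\tilde{\sigma}^2(\Pr(h \mid \mathbf{e})) = \sum_{par \in PAR} \frac{(d_{par}^2 \cdot \hat{\theta}_{par}) - (d_{par} \cdot \hat{\theta}_{par})^2}{n_{par}}</tex-math></disp-formula>
where</p>
        <list list-type="bullet">
          <list-item><p><inline-formula><tex-math>PAR = \{\Pr(h), \Pr(e_1 \mid h), \ldots, \Pr(e_n \mid h), \Pr(e_1 \mid \bar{h}), \ldots, \Pr(e_n \mid \bar{h})\}</tex-math></inline-formula>;</p></list-item>
          <list-item><p><inline-formula><tex-math>d_{par}</tex-math></inline-formula> is the partial derivative of Equation (1) with respect to <inline-formula><tex-math>par</tex-math></inline-formula>;</p></list-item>
          <list-item><p><inline-formula><tex-math>\hat{\theta}_{par}</tex-math></inline-formula> is the estimate of <inline-formula><tex-math>par</tex-math></inline-formula> based on relative frequency in the data, and</p></list-item>
          <list-item><p><inline-formula><tex-math>n_{par}</tex-math></inline-formula> is the number of cases involved in computing <inline-formula><tex-math>\hat{\theta}_{par}</tex-math></inline-formula> plus 1.</p></list-item>
        </list>
        <p><sup>1</sup> The terms node and variable will be used interchangeably.</p>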
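        <p>The variance approximation can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used for the results in this paper: each term of the sum is read as (d² · θ − (d · θ)²) / n, with d the partial derivative of Equation (1), θ the parameter estimate and n the number of cases plus 1; the derivative is approximated by a central finite difference, which is a choice made only for this sketch; and Pr(not h) is taken as 1 minus Pr(h). All function names and numbers are hypothetical.</p>
        <preformat>
```python
from math import prod


def posterior(prior, lik_h, lik_not_h):
    """Equation (1) for a binary target with Pr(not h) = 1 - Pr(h)."""
    num = prod(lik_h) * prior
    return num / (num + prod(lik_not_h) * (1.0 - prior))


def approx_variance(prior, lik_h, lik_not_h, counts, eps=1e-6):
    """Approximate the variance of Pr(h | e) as a sum over all parameters.

    counts holds n_par (cases used for the estimate, plus 1) in the order
    Pr(h), Pr(e_1|h), ..., Pr(e_n|h), Pr(e_1|not h), ..., Pr(e_n|not h).
    """
    flat = [prior] + list(lik_h) + list(lik_not_h)
    n = len(lik_h)

    def evaluate(values):
        # Split the flat parameter list back into prior and likelihoods.
        return posterior(values[0], values[1:1 + n], values[1 + n:])

    variance = 0.0
    for k, (theta, n_par) in enumerate(zip(flat, counts)):
        up = list(flat)
        up[k] = theta + eps
        down = list(flat)
        down[k] = theta - eps
        # Central finite difference standing in for the partial derivative.
        d = (evaluate(up) - evaluate(down)) / (2.0 * eps)
        variance += (d * d * theta - (d * theta) ** 2) / n_par
    return variance


# Hypothetical estimates and case counts for two observed features.
print(approx_variance(0.1, [0.8, 0.6], [0.3, 0.2], [101, 11, 11, 91, 91]))
```
</preformat>
        <p>Under this reading, every term equals θ · (1 − θ) · d² / n, so parameters estimated from few cases and parameters to which the outcome is sensitive contribute most to the variance.</p>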
        <p>
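          To represent the higher-order uncertainty in the outcome probability, we define a distribution over
<inline-formula><tex-math>\Pr(h \mid \mathbf{e})</tex-math></inline-formula> with variance <inline-formula><tex-math>\sigma^2 = \tilde{\sigma}^2(\Pr(h \mid \mathbf{e}))</tex-math></inline-formula> and mean <inline-formula><tex-math>\mu = \Pr(h \mid \mathbf{e})</tex-math></inline-formula>. Following Van Allen et al. [10] we
use a beta distribution, which has the advantages of being confined to [0, 1] and being able to model
skewed distributions. The parameters <inline-formula><tex-math>\alpha</tex-math></inline-formula> and <inline-formula><tex-math>\beta</tex-math></inline-formula> that control the shape of the beta distribution are related
to its variance <inline-formula><tex-math>\sigma^2</tex-math></inline-formula> and mean <inline-formula><tex-math>\mu</tex-math></inline-formula> as follows [10, Equation 4]:
<disp-formula><tex-math>\alpha = \mu \cdot \frac{\mu \cdot (1 - \mu) - \sigma^2}{\sigma^2}, \qquad \beta = (1 - \mu) \cdot \frac{\mu \cdot (1 - \mu) - \sigma^2}{\sigma^2}</tex-math></disp-formula>
        </p>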
        <p>This beta distribution in turn can be used to compute Bayesian credible intervals for <inline-formula><tex-math>\Pr(h \mid \mathbf{e})</tex-math></inline-formula>, that is,
intervals that contain <inline-formula><tex-math>\Pr(h \mid \mathbf{e})</tex-math></inline-formula> with a given confidence. In this paper we consider the 95% credible
interval. In computing such an interval, the lower and upper end of the interval have to be chosen such
that the area under the curve of the probability density function (PDF) of the distribution within this
interval is 0.95. This can be done in more ways than one. In this paper we consider two-sided intervals
with 2.5% of the area below its lower end and 2.5% above its upper end, and one-sided intervals with
5% of the area above its upper end. We take a one-sided interval with an upper end, since it is important
to know whether the risk of LNM is with a given confidence below some threshold value. Note that for
the one-sided interval a lower end of 0 can be used since the area under the curve below 0 will always
be 0. Note moreover that, for the same distribution, the upper end of the one-sided 95% interval will be
lower than the upper end of the two-sided 95% interval.</p>
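        <p>The steps of this section, from mean and variance to shape parameters and from there to two-sided and one-sided 95% credible intervals, can be sketched in Python. The sketch uses only the standard library and finds quantiles by accumulating the area under the beta PDF with a midpoint rule; this numerical scheme and all function names are assumptions of the sketch, and a statistics library would normally be used instead. The shape parameters in the usage lines are the values reported in Section 4.1.</p>
        <preformat>
```python
import math


def beta_shape(mean, variance):
    """Shape parameters (alpha, beta) of a beta distribution from mean and variance."""
    common = (mean * (1.0 - mean) - variance) / variance
    return mean * common, (1.0 - mean) * common


def beta_pdf(x, a, b):
    """Beta density at x, for x strictly between 0 and 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def beta_quantile(p, a, b, steps=100_000):
    """Smallest grid point at which the accumulated area (midpoint rule) reaches p."""
    width = 1.0 / steps
    area = 0.0
    for i in range(steps):
        area += beta_pdf((i + 0.5) * width, a, b) * width
        if area >= p:
            return (i + 1) * width
    return 1.0


def credible_intervals(a, b, level=0.95):
    """Return the two-sided and the one-sided (upper end only) credible interval."""
    tail = (1.0 - level) / 2.0
    two_sided = (beta_quantile(tail, a, b), beta_quantile(1.0 - tail, a, b))
    one_sided = (0.0, beta_quantile(level, a, b))
    return two_sided, one_sided


alpha, beta = 5.0, 240.2  # shape parameters reported in Section 4.1
two_sided, one_sided = credible_intervals(alpha, beta)
print(two_sided, one_sided)
```
</preformat>
        <p>For the same distribution, the upper end of the one-sided interval is the 95th percentile and that of the two-sided interval is the 97.5th percentile, which is why the former is always the lower of the two.</p>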
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Example network</title>
      <p>The ENDORISK network is a detailed Bayesian network in which nine clinical and molecular biomarkers
are integrated to predict lymph node metastasis (LNM) and 5-year disease-specific survival [6]. To
ease both computations and illustrations, we reduced this network to the naive Bayesian network
depicted in Figure 1, with one target variable,  , with the values  for LNM is present (confirmed by
lymphadenectomy) and ¯ for LNM is absent (confirmed by lymphadenectomy or by the absence of
recurrence without therapy), and five diagnostic variables. Further details of the network are given in
Figure 1. For the estimation of the parameters of the naive network we used data provided by Radboud
University Medical Center, collected within the ENITEC. From the 952 provided cases we only used
the 328 cases with complete observations for the variables in the example network. For clarity, in
Figure 1 just the mean values of the parameters are given, and not the underlying numbers of cases. For
the discretization of the continuous variables we used the same threshold values as were used in the
ENDORISK network except for the variable  . Our adapted threshold for  made this variable more
informative than the threshold used in the ENDORISK network. The values of the binary variables
are chosen such that the observations , ,  and  are indications that LNM will be present and the
observations ¯, ¯, ¯ and ¯ are indications that LNM will be absent. In our illustrations, observations that
are favorable for the patient are shown in green and unfavorable observations are shown in
red.</p>
      <p>We note that the simplified network is merely used to compute and illustrate the different options for
the presentation of query response uncertainty. The actual output probabilities may be different from
those computed from the original network; however, here we are not interested in the performance
of the network, but in options for presenting uncertainty. For the purpose of illustrating our general
approach, the simplified version of the network suffices just as well.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Options for presenting query response uncertainty</title>
      <p>The regular outcome of a standard query for our example network from Figure 1 is the probability
of LNM given the observations for a patient entered into the network. Recall that our physicians
expressed the wish for a measure of the confidence in this outcome. To provide such a measure, we
implemented the method described in Section 2: computing the variance of the outcome, finding a suitable
beta distribution over the outcome, and computing a Bayesian credible interval for this outcome.
However, given the extra information on query response uncertainty, there are several options regarding
which information is presented to the users and how this information is presented. The choices
made may have a considerable impact on the usability of the system and can moreover influence the
interpretation of the results. In the following sections we explore several options: varied levels of detail,
one-sided intervals versus two-sided intervals, and two different ’look-ahead’ options.</p>
      <sec id="sec-4-1">
        <title>4.1. Level of detail</title>
        <p>Suppose that the observations ¯ and 1 are entered. We find that the probability Pr(|¯, 1) = 0.02
with an approximated variance of 8.11 · 10<sup>−5</sup> and a beta distribution over Pr(|¯, 1) with <inline-formula><tex-math>\alpha = 5.0</tex-math></inline-formula>
and <inline-formula><tex-math>\beta = 240.2</tex-math></inline-formula>. From this beta distribution in turn the two-sided 95% credible interval (0.007; 0.041)
is computed. Figure 2 shows, with a decreasing level of detail, three different ways to report these
data. The first option just states all outcomes numerically. The other two options are more visual. The
second provides the mean outcome value, gives the graph of the beta distribution, and indicates the
mean and the 95% credible interval with vertical bars on the x-axis. The third option provides the mean
outcome value and visualizes both the mean and the 95% credible interval in blue in the gray bar that
represents the entire 0-1 interval. In all options, we also provide an illustrative threshold value of 0.15
that indicates the posterior outcome probability above which the patient will be considered for a certain
treatment trajectory. In the view of our team, the clarity of the output increases as the level of detail decreases.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. One-sided or two-sided intervals</title>
        <p>In providing credible intervals for query responses, as described in Section 2.2, one has a choice in the
upper and lower end of the credible interval. In the previous section a two-sided 95% credible interval
(0.007; 0.041) for Pr(|¯, 1) = 0.02 was given; however, the one-sided 95% interval (0; 0.037)
could also have been reported. In the choice of a treatment strategy, the upper end of the credible interval
is the one that will influence the clinical decision most, since the treatment will be different in case
the risk of LNM is below some threshold value; in that case, the risk of LNM might be regarded as
’low enough’. This argument supports the choice of a one-sided interval. However, with a one-sided
interval the level of uncertainty of the outcome is somewhat obscured, while these intervals were added
to provide information on this level of uncertainty in the first place. Such obscuration is illustrated
in Figure 3, which displays the one-sided and the two-sided 95% credible intervals for LNM present
given the observations ¯, ¯, ,  and 3. We find that Pr(|¯, ¯, , , 3) = 0.66, with one-sided interval
(0; 0.852) and two-sided interval (0.388; 0.878). The much larger one-sided interval suggests more
uncertainty, which is not correct, since the two 95% credible intervals are based on the same data.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Look-ahead: one step look-ahead or full scenarios</title>
        <p>From the discussions in our team we noted that the intervals reported in the query response might be
confusing. The upper end of the credible interval can be interpreted as the maximal shift in outcome that
can still occur when more diagnostic information becomes available, which is incorrect. As described
in Section 2.2, this upper end marks the end of the interval that contains the probability of LNM with a
given confidence given the current diagnostic information.</p>
        <p>To avoid confusion, we consider two options that provide insight into the shift in outcome that may
occur given more diagnostics: the ’one step look-ahead’ option and the ’full scenarios’ option. In the
one step look-ahead option, for each unobserved diagnostic variable separately, the system consecutively
fills in all possible outcomes and subsequently computes the probability of LNM given the observations
already available plus this additional information. In the full scenario option, the system does the same,
but then for all unobserved variables combined. All this is illustrated in Figure 4 for an example case
with observations , ¯, and 1 and variables  and  not yet observed. We find that Pr(|¯1) = 0.08,
with a 95%-credible interval of which the upper end is just above the threshold of 0.15. The left side
illustrates the one step look-ahead option, by showing the effect on the probability of LNM of observing
a value for either  or : , ¯,  and ¯ are shown from top to bottom. The right side illustrates the full
scenarios option by showing the effect on LNM of observing all combinations for  and : , ¯¯, ¯
and ¯ are shown from top to bottom.</p>
        <p>We observe that the one step look-ahead option clearly shows that the observation of  gives a
greater shift in the probability of LNM than the observation of  . This information might be used for
subsequent test selection, which is an advantage of this option. Moreover, the number of possibilities
grows linearly with the number of variables not yet observed, while in the full scenarios option this
number grows exponentially. This also is in favor of the one step look-ahead option. The full scenarios
option, on the other hand, shows the worst and the best case, information that may also be valuable.</p>
        <p>Both options give the insight that a physician might need more information in this case since the
upper end can still reach values distinctly above and distinctly below the threshold.</p>
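        <p>The two look-ahead options can be sketched in Python for a naive Bayesian network. The network below, with its variable names, values and probabilities, is hypothetical and unrelated to the example network of Figure 1; the sketch only shows how the two enumerations differ and does not compute credible intervals.</p>
        <preformat>
```python
from itertools import product
from math import prod

# Hypothetical naive network: Pr(h) and, per feature value,
# the pair (Pr(value | h), Pr(value | not h)).
PRIOR = 0.1
CPT = {
    "A": {"a": (0.7, 0.2), "not_a": (0.3, 0.8)},
    "B": {"b": (0.6, 0.3), "not_b": (0.4, 0.7)},
    "C": {"c1": (0.5, 0.8), "c2": (0.3, 0.15), "c3": (0.2, 0.05)},
}


def posterior(evidence):
    """Pr(h | evidence) via Equation (1); evidence maps feature name to value."""
    num = PRIOR * prod(CPT[f][v][0] for f, v in evidence.items())
    alt = (1.0 - PRIOR) * prod(CPT[f][v][1] for f, v in evidence.items())
    return num / (num + alt)


def one_step_look_ahead(evidence):
    """Posterior after adding one value of one unobserved feature at a time."""
    return {
        (f, v): posterior({**evidence, f: v})
        for f in CPT if f not in evidence
        for v in CPT[f]
    }


def full_scenarios(evidence):
    """Posterior for every joint assignment to all unobserved features."""
    hidden = [f for f in CPT if f not in evidence]
    return {
        combo: posterior({**evidence, **dict(zip(hidden, combo))})
        for combo in product(*(CPT[f] for f in hidden))
    }


observed = {"A": "not_a"}
steps = one_step_look_ahead(observed)
scenarios = full_scenarios(observed)
print(len(steps), len(scenarios))
print(min(scenarios.values()), max(scenarios.values()))
```
</preformat>
        <p>With two unobserved features of two and three values, the one step look-ahead yields 2 + 3 results and the full scenarios option yields 2 · 3 results, which mirrors the linear versus exponential growth noted above.</p>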
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion, conclusions and future research</title>
      <p>In this paper we explored different options for presenting query response uncertainty in Bayesian
networks to end users. We considered a varied level of detail, one-sided versus two-sided credible
intervals and two different ’look-ahead’ options. In future research we want to elaborate on these
options by involving more intended users. We want to investigate which level of detail
is preferred in what circumstances, or by which type of user. In addition, we will study how users
actually interpret the credible intervals and whether they indeed prefer two-sided intervals over
one-sided intervals. Also, we will further investigate whether the look-ahead options are indeed useful for
test selection. Note that the full scenarios option might yield too many scenarios to present. In that case
it might be an alternative to present just a subset of these scenarios. Which subset is most useful for
the user of the system is a subject for further research. Moreover we note that the full scenario option
is related to the concept of same-decision probability (SDP), which is, as stated by the authors: "the
probability that we would have made the same threshold-based decision, had we known the state of
some hidden variables pertaining to our decision" [11]. The SDP can be viewed as a kind of summary
of the full scenario option, adding up the probabilities of outcomes leading to the same decision. We
will investigate how to incorporate such additional information. Providing such information may also
be an alternative in case the full scenarios option results in too many scenarios. Another interesting
question, regarding all options, is what the actual impact is of the use of intervals in the query response
on the decisions made. From a more technical perspective we would like to investigate whether there
are situations in which the approximation method yields results that deviate too much from the true
values. Moreover we are interested in investigating the best ways to include expert opinions.</p>
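      <p>To make the relation between the SDP and the full scenarios option concrete, the following Python sketch adds up the probabilities of all scenarios that lead to the same threshold-based decision as the current outcome. It assumes a naive Bayesian network, so that the unobserved features are independent given the target; the features and their probabilities are hypothetical.</p>
      <preformat>
```python
from itertools import product
from math import prod

# Hypothetical unobserved binary features with (Pr(value | h), Pr(value | not h)).
HIDDEN = {
    "B": {"b": (0.6, 0.3), "not_b": (0.4, 0.7)},
    "C": {"c": (0.5, 0.1), "not_c": (0.5, 0.9)},
}


def same_decision_probability(current, threshold):
    """SDP for a threshold decision; current is Pr(h | e) for the evidence so far."""
    decision_now = current >= threshold
    names = list(HIDDEN)
    sdp = 0.0
    for combo in product(*(HIDDEN[n] for n in names)):
        lik_h = prod(HIDDEN[n][v][0] for n, v in zip(names, combo))
        lik_not_h = prod(HIDDEN[n][v][1] for n, v in zip(names, combo))
        # Pr(scenario | e), using independence of the features given the target.
        weight = current * lik_h + (1.0 - current) * lik_not_h
        # Pr(h | e, scenario), the outcome shown in the full scenarios option.
        updated = current * lik_h / weight
        if (updated >= threshold) == decision_now:
            sdp += weight
    return sdp


print(same_decision_probability(0.08, 0.15))
```
</preformat>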
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgements</title>
      <p>This publication is part of the project PersOn with file number P21-03 of the research programme
Perspective which is (partly) financed by the Dutch Research Council (NWO).</p>
      <p>The data used were collected within the European Network for Individualized Treatment of Endometrial
Cancer (ENITEC) and were provided by Radboud University Medical Center.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Shortliffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Buchanan</surname>
          </string-name>
          ,
          <article-title>A model of inexact reasoning in medicine</article-title>
          ,
          <source>Mathematical Biosciences</source>
          <volume>23</volume>
          (
          <year>1975</year>
          )
          <fpage>351</fpage>
          -
          <lpage>379</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E. H.</given-names>
            <surname>Shortliffe</surname>
          </string-name>
          ,
          <article-title>Medical informatics and clinical decision making: the science and the pragmatics</article-title>
          ,
          <source>Medical Decision Making</source>
          <volume>11</volume>
          (
          <year>1991</year>
          )
          <fpage>S2</fpage>
          -
          <lpage>S14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          ,
          <article-title>Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference</article-title>
          , Morgan Kaufmann,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Lucas</surname>
          </string-name>
          , L. C.
          <string-name>
            <surname>van der Gaag</surname>
          </string-name>
          , A.
          <string-name>
            <surname>Abu-Hanna</surname>
          </string-name>
          ,
          <article-title>Bayesian networks in biomedicine and health-</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>