

Bayesian Networks in Medicine: Presenting Query Response Uncertainty for Decision Support

Janneke H. Bolt


Anna Berghuis

Arjen Hommersom

Marike Lombaers

Johanna M.A. Pijnenborg

Silja Renooij

Open University of the Netherlands, Faculty of Science, Heerlen, The Netherlands

Radboud University Medical Center, Department of Obstetrics and Gynaecology, Nijmegen, The Netherlands

Utrecht University, Department of Information and Computing Sciences, Utrecht, The Netherlands


Despite their good characteristics, tools based on Bayesian networks are not yet widely used in medical decision support. Meeting the needs of the intended users is crucial for the acceptance of these tools, and clinical involvement in their development is thus required to promote their use. During the development of a Bayesian network-based tool for the prediction of lymph node metastases in patients with endometrial cancer, a measure of query response uncertainty was put forward as one of the user needs. In this paper, we work towards meeting this need by exploring options for the presentation of query response uncertainty. We consider the level of detail, one-sided versus two-sided intervals, and two different 'look-ahead' options. The different options are illustrated through a small example network.

Keywords: Bayesian networks, Decision support, Query response uncertainty, Explainable AI

1. Introduction

In healthcare, many decisions with often far-reaching consequences have to be made. It is therefore not surprising that, from the early development of computer-based tools for decision making onwards, such systems were built for the medical domain (e.g. [1, 2]). Among the computer-based tools for decision making are the well-known Bayesian networks [3]. Bayesian networks have great advantages for clinical decision making, such as a quite intuitive visualization of the problem domain by means of their graph structure, and their ability to provide predictions for incomplete and/or conflicting patient findings [4]. Despite their suitable characteristics, Bayesian networks are not yet widely used in clinical practice. Kyrimi et al. [5] performed a comprehensive literature review to analyze this low adoption and provided an overview of "benefits, barriers and facilitating factors" for the use of Bayesian networks in healthcare. Clinicians' resistance is mentioned as one of the barriers, and clinical involvement as one of the facilitating factors. This paper presents research related to employing this facilitating factor. More specifically, in a joint team of AI researchers and clinicians, we investigate the need for explanation types and forms in the context of a Bayesian network tasked with the challenging preoperative prediction of lymph node metastasis (LNM) for patients with endometrial cancer.

Given the diagnosis of endometrial cancer, the preoperative prediction of the presence of LNM has a great impact on the proposed treatment strategy. However, although there are several biomarkers related to LNM, none of them are currently used in daily practice; the tumor grade is used as the most important predictor. This clinical problem resulted in an initiative within the European Network for Individualized Treatment of Endometrial Cancer (ENITEC) for the development of a Bayesian network for the prediction of LNM [6]. This network, ENDORISK, includes nine preoperative variables and has LNM and 5-year disease-specific survival as its main outcome measures.

One of the needs put forward by the intended clinical users of ENDORISK is that the system provides information on the uncertainty in query responses. In particular, the network can be used to compute the probability of LNM for a specific patient, but there is no indication of the reliability of this probability.

This wish for information on the uncertainty of query responses relates to the concept of causability, a notion used within Explainable AI (XAI), which encompasses measurements for the quality of explanations [7]. In this context it is particularly important that a system provides an indication of the uncertainty in its predictions [8]. Causability relates to a user-oriented perspective on explainability of AI models, which is in contrast with the majority of XAI methods that focus on providing technical insights into the inner workings of these models [9].

In this paper we propose and discuss different options for the presentation and explanation of query response uncertainty, where we base the latter on an existing method for approximating the variance of a Bayesian network query [10].

2. Preliminaries

2.1. Bayesian networks and naive Bayesian networks

A Bayesian network is a model of a joint probability distribution $\Pr$ over a set of stochastic variables $\mathbf{V}$, consisting of a directed acyclic graph and a set of conditional probability distributions. In this paper we presume that all $V \in \mathbf{V}$ are discrete. Variables will be denoted by upper-case letters ($A$), and their values by lower-case letters ($a$); sets of variables by bold-face upper-case letters ($\mathbf{A}$) and their instantiations by bold-face lower-case letters ($\mathbf{a}$). Given a binary variable $A$, $A = 1$ is often written as $a$ and $A = 2$ is often written as $\bar{a}$. Each variable $V \in \mathbf{V}$ is represented by a node in the network's graph¹. (Conditional) independence between the variables is captured by the graph's set of arcs according to the d-separation criterion [3]. The strength of the probabilistic relationships between the variables is captured by the set of conditional probability distributions $\Pr(V \mid p(V))$, where $p(V)$ denotes a combination of values, or instantiation, of the parents of $V$. The probabilities that define these distributions are called the network parameters. The joint probability distribution is then defined by:

$$\Pr(\mathbf{V}) = \prod_{V \in \mathbf{V}} \Pr(V \mid p(V))$$

Using the formula above, a Bayesian network can yield any probability of the modeled problem domain.
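As a small worked instance of this factorization, assume a hypothetical network (not one taken from this paper) with three variables, in which $H$ has no parents and is the only parent of both $E_1$ and $E_2$. The product then has one factor per node, and other probabilities follow by summing out the variables that are not of interest:

```latex
\Pr(H, E_1, E_2) = \Pr(H) \cdot \Pr(E_1 \mid H) \cdot \Pr(E_2 \mid H)

\Pr(e_1) = \sum_{H} \sum_{E_2} \Pr(H) \cdot \Pr(e_1 \mid H) \cdot \Pr(E_2 \mid H)
         = \sum_{H} \Pr(H) \cdot \Pr(e_1 \mid H)
```

The last step uses the fact that each conditional distribution sums to 1 over the values of its own variable.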

A naive Bayesian network is a Bayesian network with a specific graph structure. The graph consists of just a single target node with a set of observable feature child nodes and captures the assumption that the observable features are all mutually independent given the target node. For a naive network with a binary-valued target node $H$ and feature nodes $\mathbf{F}$, the probability $\Pr(h \mid \mathbf{e})$, with $\mathbf{e} = \{e_1, \ldots, e_n\}$ an instantiation of $\mathbf{E} \subseteq \mathbf{F}$, is computed from the network parameters as follows:

$$\Pr(h \mid \mathbf{e}) = \frac{\prod_{i \in \{1,\ldots,n\}} \Pr(e_i \mid h) \cdot \Pr(h)}{\sum_{h' \in \{h, \bar{h}\}} \prod_{i \in \{1,\ldots,n\}} \Pr(e_i \mid h') \cdot \Pr(h')} \qquad (1)$$
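Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used for this paper: the function name `naive_bayes_posterior` is hypothetical, and the numbers in the usage line are made up rather than taken from the example network.

```python
from math import prod

def naive_bayes_posterior(prior_h, likelihoods):
    """Return Pr(h | e) for a naive network with a binary target, as in Equation (1).

    prior_h     -- Pr(h)
    likelihoods -- list of pairs (Pr(e_i | h), Pr(e_i | h_bar)), one per observed feature
    """
    numerator = prior_h * prod(p_h for p_h, _ in likelihoods)
    alternative = (1.0 - prior_h) * prod(p_h_bar for _, p_h_bar in likelihoods)
    return numerator / (numerator + alternative)

# Illustrative numbers only, not the parameters of the example network.
print(round(naive_bayes_posterior(0.1, [(0.8, 0.3), (0.6, 0.5)]), 4))  # 0.2623
```

With no observed features the product is empty and the function returns the prior, which matches Equation (1) for an empty instantiation.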

2.2. Query response uncertainty

The standard query response from a Bayesian network is a single outcome probability, computed from the network's parameters. Since in practice the parameters of a network are estimates, based on population samples and/or uncertain expert opinions, the query response, even using an exact inference algorithm, will be an estimate as well. Van Allen et al. [10, Theorem 2] provide a formula to approximate the variance of the query responses of a Bayesian network, resulting from the uncertainty in the parameter probabilities. Below we give the formula we used to approximate the variance of the outcome $\Pr(h \mid \mathbf{e})$ of a naive Bayesian network with target variable $H$ and observed features $\mathbf{e} = \{e_1, \ldots, e_n\}$, in which the parameters are estimated from data, based on [10, Theorem 2]:

$$\tilde{\sigma}^2(\Pr(h \mid \mathbf{e})) = \sum_{par \in PAR} \frac{(d_{par}^2 \cdot \mu_{par}) - (d_{par} \cdot \mu_{par})^2}{n_{par}}$$

where
• $PAR = \{\Pr(h), \Pr(e_1 \mid h), \ldots, \Pr(e_n \mid h), \Pr(e_1 \mid \bar{h}), \ldots, \Pr(e_n \mid \bar{h})\}$;
• $d_{par}$ is the partial derivative of Equation (1) with respect to $par$;
• $\mu_{par}$ is the estimate of $par$ based on relative frequency in the data, and
• $n_{par}$ is the number of cases involved in computing $\mu_{par}$ plus 1.

¹ The terms node and variable will be used interchangeably.
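The variance approximation can be sketched as follows. For each parameter, the code combines the derivative of Equation (1) (called `d`), the parameter estimate (`mu`) and the number of cases plus 1 into the term (d² · mu − (d · mu)²) / n, and sums these terms. The sketch rests on several assumptions that are not from the paper: the derivative is taken numerically by a central finite difference instead of analytically, Pr(h̄) is treated as 1 − Pr(h) when differentiating with respect to Pr(h), all estimates lie strictly between 0 and 1, and the function names, estimates and case counts are hypothetical.

```python
from math import prod

def posterior(params, n_features):
    """Equation (1) with params ordered as
    [Pr(h), Pr(e_1|h), ..., Pr(e_n|h), Pr(e_1|h_bar), ..., Pr(e_n|h_bar)]."""
    prior = params[0]
    given_h = params[1:1 + n_features]
    given_h_bar = params[1 + n_features:]
    numerator = prior * prod(given_h)
    alternative = (1.0 - prior) * prod(given_h_bar)  # assumption: Pr(h_bar) = 1 - Pr(h)
    return numerator / (numerator + alternative)

def approx_variance(params, counts, n_features, step=1e-6):
    """Sum over all parameters of (d^2 * mu - (d * mu)^2) / n, with n = count + 1."""
    total = 0.0
    for k, (mu, count) in enumerate(zip(params, counts)):
        up, down = list(params), list(params)
        up[k], down[k] = mu + step, mu - step
        # Central finite difference as a stand-in for the analytic partial derivative.
        d = (posterior(up, n_features) - posterior(down, n_features)) / (2.0 * step)
        total += (d * d * mu - (d * mu) ** 2) / (count + 1)
    return total

# Hypothetical estimates and case counts, for illustration only.
params = [0.1, 0.8, 0.6, 0.3, 0.5]
counts = [300, 30, 30, 270, 270]
print(approx_variance(params, counts, n_features=2))
```

Each term equals d² · mu · (1 − mu) / n, so parameters estimated from few cases, or parameters to which the outcome is very sensitive, contribute most to the approximated variance.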

To represent the higher-order uncertainty in the outcome probability, we define a distribution over $\Pr(h \mid \mathbf{e})$ with variance $\sigma^2 = \tilde{\sigma}^2(\Pr(h \mid \mathbf{e}))$ and mean $\mu = \Pr(h \mid \mathbf{e})$. Following Van Allen et al. [10] we use a beta distribution, which has the advantages of being confined to [0, 1] and being able to model skewed distributions. The parameters $\alpha$ and $\beta$ that control the shape of the beta distribution are related to its variance $\sigma^2$ and mean $\mu$ as follows [10, Equation 4]:

$$\alpha = \mu \cdot \frac{\mu \cdot (1 - \mu) - \sigma^2}{\sigma^2}, \qquad \beta = (1 - \mu) \cdot \frac{\mu \cdot (1 - \mu) - \sigma^2}{\sigma^2}$$
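The mapping from a mean and a variance to the two shape parameters of the beta distribution can be sketched as below. The function name `beta_parameters` is hypothetical. The variance in the usage line is the value reported in Section 4.1; the mean of 0.0204 is an assumed unrounded value chosen for illustration, since the paper reports the rounded value 0.02.

```python
def beta_parameters(mean, variance):
    """Shape parameters (alpha, beta) of a beta distribution with the given mean and variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < variance < mean * (1.0 - mean):
        raise ValueError("variance must be positive and smaller than mean * (1 - mean)")
    common = (mean * (1.0 - mean) - variance) / variance
    return mean * common, (1.0 - mean) * common

# Assumed mean of 0.0204 (illustrative) with the variance reported in Section 4.1.
alpha, beta = beta_parameters(0.0204, 8.11e-5)
print(round(alpha, 1), round(beta, 1))
```

The second check matters in practice: a beta distribution with the requested mean only exists when the variance is smaller than mean · (1 − mean), so an approximated variance above that bound cannot be represented this way.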

This beta distribution in turn can be used to compute Bayesian credible intervals for $\Pr(h \mid \mathbf{e})$, that is, intervals that contain $\Pr(h \mid \mathbf{e})$ with a given confidence. In this paper we consider the 95% credible interval. In computing such an interval, the lower and upper end of the interval have to be chosen such that the area under the curve of the probability density function (PDF) of the distribution within this interval is 0.95. This can be done in more ways than one. In this paper we consider two-sided intervals with 2.5% of the area below the lower end and 2.5% above the upper end, and one-sided intervals with 5% of the area above the upper end. We take a one-sided interval with an upper end, since it is important to know whether the risk of LNM is, with a given confidence, below some threshold value. Note that for the one-sided interval a lower end of 0 can be used, since the area under the curve below 0 will always be 0. Note moreover that, for the same distribution, the upper end of the one-sided 95% interval will be lower than the upper end of the two-sided 95% interval.
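Both interval types can be sketched with the Python standard library only. The code below is an illustration under stated assumptions, not the implementation used for this paper: the cumulative distribution is obtained by Simpson integration of the density and the interval ends by bisection, which is only appropriate when both shape parameters exceed 1, and all function names are hypothetical. A statistics library with a beta quantile function would normally be the more robust choice. The usage line takes the shape parameters 5.0 and 240.2 reported in Section 4.1.

```python
from math import exp, lgamma, log, log1p

def beta_pdf(x, a, b):
    """Density of Beta(a, b); the boundary value 0 is only right for a > 1 and b > 1."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1.0) * log(x) + (b - 1.0) * log1p(-x))

def beta_cdf(x, a, b, n=2000):
    """Area under the density on [0, x], by composite Simpson integration (n even)."""
    h = x / n
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * beta_pdf(i * h, a, b)
    return total * h / 3.0

def beta_quantile(q, a, b, tol=1e-6):
    """Point below which a fraction q of the area lies, located by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def credible_intervals(a, b, level=0.95):
    """Return the two-sided and the one-sided (upper-end) credible interval."""
    tail = 1.0 - level
    two_sided = (beta_quantile(tail / 2.0, a, b), beta_quantile(1.0 - tail / 2.0, a, b))
    one_sided = (0.0, beta_quantile(level, a, b))
    return two_sided, one_sided

two_sided, one_sided = credible_intervals(5.0, 240.2)
print(two_sided, one_sided)
```

Because the one-sided interval places the full 5% of excluded area above its upper end, that upper end always lies below the upper end of the two-sided interval for the same distribution.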

3. Example network

The ENDORISK network is a detailed Bayesian network in which nine clinical and molecular biomarkers are integrated to predict lymph node metastasis (LNM) and 5-year disease-specific survival [6]. To simplify both computations and illustrations, we reduced this network to the naive Bayesian network depicted in Figure 1, with one binary target variable, whose unbarred value denotes that LNM is present (confirmed by lymphadenectomy) and whose barred value denotes that LNM is absent (confirmed by lymphadenectomy or by the absence of recurrence without therapy), and five diagnostic variables. Further details of the network are given in Figure 1. For the estimation of the parameters of the naive network we used data provided by Radboud University Medical Center, collected within the ENITEC. From the 952 provided cases we only used the 328 cases with complete observations for the variables in the example network. For clarity, in Figure 1 just the mean values of the parameters are given, and not the underlying numbers of cases. For the discretization of the continuous variables we used the same threshold values as were used in the ENDORISK network, except for one variable. Our adapted threshold made this variable more informative than the threshold used in the ENDORISK network. The values of the four binary diagnostic variables are chosen such that an unbarred observation is an indication that LNM will be present and a barred observation is an indication that LNM will be absent. In our illustrations we indicate observations that are favorable for the patient in green and unfavorable observations in red.

We note that the simplified network is merely used to compute and illustrate the different options for the presentation of query response uncertainty. The actual output probabilities may be different from those computed from the original network; however, here we are not interested in the performance of the network, but in options for presenting uncertainty. For the purpose of illustrating our general approach, the simplified version of the network suffices just as well.

4. Options for presenting query response uncertainty

The regular outcome of a standard query for our example network from Figure 1 is the probability of LNM given the observations for a patient entered into the network. Recall that our physicians expressed the wish for a measure of the confidence in this outcome. To provide such a measure, we implemented the method described in Section 2: computing the variance of the outcome, finding a suitable beta distribution over the outcome, and computing a Bayesian credible interval for this outcome. However, given the extra information on query response uncertainty, there are several options regarding which information is presented to the users and also how this information is presented. The choices made may have a considerable impact on the usability of the system and moreover can influence the interpretation of the results. In the following sections we explore several options: varied levels of detail, one-sided intervals versus two-sided intervals, and two different 'look-ahead' options.

4.1. Level of detail

Suppose that two observations are entered: the barred value of one of the binary variables and value 1 of the non-binary variable. We find that the probability of LNM given these observations is 0.02, with an approximated variance of $8.11 \cdot 10^{-5}$ and a beta distribution over this probability with $\alpha = 5.0$ and $\beta = 240.2$. From this beta distribution in turn the two-sided 95% credible interval (0.007; 0.041) is computed. Figure 2 shows, with a decreasing level of detail, three different ways to report these data. The first option just states all outcomes numerically. The other two options are more visual. The second provides the mean outcome value, gives the graph of the beta distribution, and indicates the mean and the 95% credible interval with vertical bars on the x-axis. The third option provides the mean outcome value and visualizes both the mean and the 95% credible interval in blue in the gray bar that represents the entire 0-1 interval. In all options, we also provide an illustrative threshold value of 0.15 that indicates the posterior outcome probability above which the patient will be considered for a certain treatment trajectory. In the view of our team, the clarity of the output increases as the level of detail decreases.

4.2. One-sided or two-sided intervals

In providing credible intervals for query responses, as described in Section 2.2, one has a choice in the upper and lower end of the credible interval. In the previous section a two-sided 95% credible interval (0.007; 0.041) is given for the probability of LNM of 0.02; however, the one-sided 95% interval (0; 0.037) could have been reported instead. In the choice for a treatment strategy, the upper end of the credible interval is the one that will influence the clinical decision most, since the treatment will be different in case the risk of LNM is below some threshold value; in that case, the risk of LNM might be regarded as 'low enough'. This argument supports the choice of a one-sided interval. However, with a one-sided interval the level of uncertainty of the outcome is somewhat obscured, while these intervals were added to provide information on this level of uncertainty in the first place. Such obscuration is illustrated in Figure 3, which displays the one-sided and the two-sided 95% credible intervals for LNM present given five observations: two barred and two unbarred values of the binary variables and value 3 of the non-binary variable. We find that the probability of LNM given these observations is 0.66, with one-sided interval (0; 0.852) and two-sided interval (0.388; 0.878). The much larger one-sided interval suggests more uncertainty, which is not correct since the two 95% credible intervals are based on the same data.

4.3. Look-ahead: one step look-ahead or full scenarios

From the discussions in our team we noted that the intervals reported in the query response might be confusing. The upper end of the credible interval can be interpreted as the maximal shift in outcome that can still occur when more diagnostic information becomes available, which is incorrect. As described in Section 2.2, this upper end marks the end of the interval that contains the probability of LNM with a given confidence given the current diagnostic information.

To avoid confusion, we consider two options that provide insight into the shift in outcome that may occur given more diagnostics: the 'one step look-ahead' option and the 'full scenarios' option. In the one step look-ahead option, for each unobserved diagnostic variable separately, the system consecutively fills in all possible outcomes and subsequently computes the probability of LNM given the observations already available plus this additional information. In the full scenarios option, the system does the same, but then for all unobserved variables combined. All this is illustrated in Figure 4 for an example case with three observations (an unbarred and a barred value of two of the binary variables, and value 1 of the non-binary variable) and the two remaining binary variables not yet observed. We find that the probability of LNM given these observations is 0.08, with a 95% credible interval of which the upper end is just above the threshold of 0.15. The left side illustrates the one step look-ahead option by showing, from top to bottom, the effect on the probability of LNM of observing each of the two values of either unobserved variable. The right side illustrates the full scenarios option by showing, from top to bottom, the effect on the probability of LNM of observing each of the four combinations of values for the two unobserved variables.

We observe that the one step look-ahead option clearly shows that observing one of the two unobserved variables gives a greater shift in the probability of LNM than observing the other. This information might be used for subsequent test selection, which is an advantage of this option. Moreover, the number of possibilities grows linearly with the number of variables not yet observed, while in the full scenarios option this number grows exponentially. This is also in favor of the one step look-ahead option. The full scenarios option, on the other hand, shows the worst and the best case, information that may also be valuable.

Both options give the insight that a physician might need more information in this case since the upper end can still reach values distinctly above and distinctly below the threshold.
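The two look-ahead options can be sketched for a naive network with binary features as follows. The parameters, the feature names A, B and C, and the function names are hypothetical and serve only to show the enumeration; they are not those of the example network, and the sketch reports point probabilities only, without credible intervals.

```python
from itertools import product
from math import prod

# Hypothetical parameters for illustration; NOT those of the example network.
PRIOR = 0.1  # Pr(h)
# Per binary feature: (Pr(x | h), Pr(x | h_bar)) for its unbarred value x.
FEATURES = {"A": (0.7, 0.2), "B": (0.6, 0.4), "C": (0.9, 0.3)}

def likelihood(name, value):
    """Return (Pr(value | h), Pr(value | h_bar)); True stands for the unbarred value."""
    p_h, p_h_bar = FEATURES[name]
    return (p_h, p_h_bar) if value else (1.0 - p_h, 1.0 - p_h_bar)

def posterior(evidence):
    """Equation (1) for evidence given as a dict {feature name: bool}."""
    pairs = [likelihood(name, value) for name, value in evidence.items()]
    numerator = PRIOR * prod(p for p, _ in pairs)
    alternative = (1.0 - PRIOR) * prod(q for _, q in pairs)
    return numerator / (numerator + alternative)

def one_step_look_ahead(evidence):
    """Posterior after adding one value of one unobserved feature at a time."""
    unobserved = [name for name in FEATURES if name not in evidence]
    return {(name, value): posterior({**evidence, name: value})
            for name in unobserved for value in (True, False)}

def full_scenarios(evidence):
    """Posterior for every joint assignment to all unobserved features."""
    unobserved = [name for name in FEATURES if name not in evidence]
    return {values: posterior({**evidence, **dict(zip(unobserved, values))})
            for values in product((True, False), repeat=len(unobserved))}

evidence = {"A": False}
print(one_step_look_ahead(evidence))
print(full_scenarios(evidence))
```

With k unobserved binary features the first function returns 2k entries and the second 2^k entries, which is the linear versus exponential growth noted above.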

5. Discussion, conclusions and future research

In this paper we explored different options for presenting query response uncertainty in Bayesian networks to end users. We considered a varied level of detail, one-sided versus two-sided credible intervals and two different 'look-ahead' options. In future research we want to elaborate on these options by involving more intended users. We want to investigate which level of detail is preferred in what circumstances, or by which type of user. In addition, we will study how users actually interpret the credible intervals and whether they indeed prefer two-sided intervals over one-sided intervals. Also, we will further investigate whether the look-ahead options are indeed useful for test selection. Note that the full scenarios option might yield too many scenarios to present. In that case an alternative might be to present just a subset of these scenarios. Which subset is most useful for the user of the system is a subject for further research. Moreover, we note that the full scenarios option is related to the concept of same-decision probability (SDP), which is, as stated by the authors: "the probability that we would have made the same threshold-based decision, had we known the state of some hidden variables pertaining to our decision" [11]. The SDP can be viewed as a kind of summary of the full scenarios option, adding up the probabilities of outcomes leading to the same decision. We will investigate how to incorporate such additional information. Providing such information may also be an alternative in case the full scenarios option results in too many scenarios. Another interesting question, regarding all options, is what the actual impact of the use of intervals in the query response is on the decisions made. From a more technical perspective, we would like to investigate whether there are situations in which the approximation method yields results that deviate too much from the true values. Moreover, we are interested in investigating the best ways to include expert opinions.
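The view of the SDP as a summary of the full scenarios option can be sketched as follows for a naive network with binary features: each full scenario is weighted by its probability given the current observations, and the weights of the scenarios that lead to the same threshold-based decision are added up. The parameters, names and the convention that a probability at or above the threshold triggers the decision are assumptions made for illustration only.

```python
from itertools import product

# Hypothetical parameters for illustration; NOT those of the example network.
PRIOR = 0.1  # Pr(h)
# Per binary feature: (Pr(x | h), Pr(x | h_bar)) for its unbarred value x.
FEATURES = {"A": (0.7, 0.2), "B": (0.6, 0.4), "C": (0.9, 0.3)}

def joint_with_target(evidence):
    """Return (Pr(h, evidence), Pr(h_bar, evidence)) for a dict {feature name: bool}."""
    with_h, with_h_bar = PRIOR, 1.0 - PRIOR
    for name, value in evidence.items():
        p_h, p_h_bar = FEATURES[name]
        with_h *= p_h if value else 1.0 - p_h
        with_h_bar *= p_h_bar if value else 1.0 - p_h_bar
    return with_h, with_h_bar

def same_decision_probability(evidence, threshold):
    """Probability mass of the full scenarios that keep the threshold decision unchanged."""
    with_h, with_h_bar = joint_with_target(evidence)
    pr_evidence = with_h + with_h_bar
    current_decision = with_h / pr_evidence >= threshold  # assumed decision rule
    unobserved = [name for name in FEATURES if name not in evidence]
    sdp = 0.0
    for values in product((True, False), repeat=len(unobserved)):
        scenario = {**evidence, **dict(zip(unobserved, values))}
        s_h, s_h_bar = joint_with_target(scenario)
        pr_scenario = s_h + s_h_bar           # Pr(evidence, scenario values)
        if (s_h / pr_scenario >= threshold) == current_decision:
            sdp += pr_scenario / pr_evidence  # Pr(scenario values | evidence)
    return sdp

print(round(same_decision_probability({"A": False}, threshold=0.15), 4))  # 0.8632
```

In this made-up case only one of the four scenarios crosses the threshold, so the single number 0.8632 summarizes that most of the probability mass over the unobserved features leaves the decision as it is.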

6. Acknowledgements

This publication is part of the project PersOn with file number P21-03 of the research programme Perspective which is (partly) financed by the Dutch Research Council (NWO).

The data used were collected within the European Network for Individualized Treatment of Endometrial Cancer (ENITEC) and were provided by Radboud University Medical Center.

[1] E. H. Shortliffe, B. G. Buchanan, A model of inexact reasoning in medicine, Mathematical Biosciences 23 (1975) 351-379.

[2] E. H. Shortliffe, Medical informatics and clinical decision making: the science and the pragmatics, Medical Decision Making 11 (1991) S2-S14.

[3] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.

[4] P. J. Lucas, L. C. van der Gaag, A. Abu-Hanna, Bayesian networks in biomedicine and health-