 Interpreting Intelligibility under Uncertain Data Imputation
Brian Y. Lim, Danding Wang
National University of Singapore, Singapore
brianlim@comp.nus.edu.sg, wangdanding@u.nus.edu

Tze Ping Loh, Kee Yuan Ngiam
National University Hospital, Singapore
{tze_ping_loh, kee_yuan_ngiam}@nuhs.edu.sg
ABSTRACT
Many methods have been proposed to make machine learning more interpretable, but these have mainly been evaluated with simple use cases and well-curated datasets. In contrast, real-world data presents issues that can compromise the proper interpretation of explanations by end users. In this work, we investigate the impact of missing data and imputation on how users would understand and use explanation features, and we propose two approaches to provide explanation interfaces for explaining feature attribution with uncertainty due to missing data imputation. This work aims to improve the understanding and trust of intelligible healthcare analytics among clinical end users to help drive the adoption of AI.

Author Keywords
Intelligibility, Explanations, Imputation, Interfaces, Visualization, User Study.

ACM Classification Keywords
H.5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous

© 2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes. ExSS '18, March 11, Tokyo, Japan.

INTRODUCTION
Intelligibility has been proposed as a capability that enables systems to explain their inner state, reasoning mechanisms and priorities to help users understand and trust them [1, 12]. A recent review identified that research on explainable systems typically focuses on explanation generation algorithms for systems with well-curated data or based on theory, and on explanation interfaces with simple models and small datasets [1]. While some empirical user studies have shown explanations to be effective in well-behaved, albeit simple and synthetic, use cases (e.g., [3, 12]), real data and systems face issues and challenges that make data processing and data mining messy. In particular, datasets often have missing data, and imputation is typically used to estimate the true values of the missing data.

Several methods can be used to impute data, such as substituting with zeros, substituting with the mean value of the missing variable, carrying forward (or backward) a nearby observed value, or model-based imputation (e.g., with hidden Markov models [17]). For example, if a patient has never been tested for blood calcium (CA), we may assume that his level would be in the healthy normal range and impute the patient's reading as the mean of other patients with normal CA levels. On the other hand, if a patient had a recent high CA level, we may apply carry-forward imputation to estimate his current level to be the same.
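To make these strategies concrete, the following is a minimal Python sketch of zero, normal-range mean, and carry-forward imputation on a toy EMR table; pandas, the toy values, and the column names patient_id, date, and CA are assumptions for illustration, not part of our system.

```python
import pandas as pd

# Hypothetical EMR extract: one row per patient visit (column names are assumed).
emr = pd.DataFrame({
    "patient_id": [1, 1, 1, 2, 2, 3],
    "date": pd.to_datetime(["2017-01-05", "2017-02-10", "2017-03-20",
                            "2017-01-12", "2017-02-28", "2017-03-01"]),
    "CA": [2.30, None, 2.65, None, 2.25, None],  # blood calcium, mmol/L; None = not tested
})
emr = emr.sort_values(["patient_id", "date"])

# 1. Substitute with zeros (rarely appropriate for lab values, shown for completeness).
ca_zero = emr["CA"].fillna(0.0)

# 2. Substitute with the mean of patients whose observed CA falls in an assumed
#    normal range (2.1-2.6 mmol/L), mimicking the "healthy normal" assumption above.
normal = emr["CA"].between(2.1, 2.6)
ca_mean = emr["CA"].fillna(emr.loc[normal, "CA"].mean())

# 3. Carry forward the most recent observed value within each patient's history.
ca_locf = emr.groupby("patient_id")["CA"].ffill()

print(pd.DataFrame({"observed": emr["CA"], "zero": ca_zero,
                    "mean": ca_mean, "locf": ca_locf}))
```

Model-based imputation (e.g., a hidden Markov model as in [17]) would replace the carry-forward step with a learned temporal model.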
Data imputation raises a potential problem of how to interpret explanations that depend on the data features input into the model. How would a user trust the importance of a feature's value in influencing an inference outcome if the value was not measured, but was estimated through imputation? We hypothesize that users will have lower trust in cases with a high degree of data imputation and that some visualization methods may help to alleviate this problem.

In this position paper, we discuss the importance of considering how data pre-processing to handle practical issues of real-world data affects the usefulness and interpretation of explanations about machine learning models. We focus on the use case of disease risk prediction using the structured data of electronic medical records (EMR). EMRs typically contain a lot of missing data, not necessarily due to errors in data collection, but because of the wide variety of tests that patients can take and because patients take only the few tests they need, and only occasionally [17]. For example, a non-diabetic person may not need to measure his HbA1c as frequently as a diabetic, and HbA1c only needs to be measured once every three months.

Specifically, we seek to answer the following research questions:

RQ1. What information will clinicians need to interpret how a clinical decision support system with disease risk prediction makes its decision, and how will this change given their awareness that some data was imputed?

RQ2. How can a suitable explanation be generated and presented to clinicians to alleviate the loss of trust in explanations due to imputation?

RQ3. How will the imputation-aware explanation model and interface be interpreted by clinicians, and how will it affect their decision making?

APPROACHES: TWO EXPLANATION INTERFACES FOR IMPUTED DATA
While there are several techniques to generate explanations, such as explanations by identifying similar instances [9] or by rule associations [11], we will focus on explanations by additive feature attribution or influence scores (e.g., LIME [15], QII [5], GA2M [4]). This explanation style has been popular for generating explanations for healthcare analytics (e.g., Bussone et al. [3], GA2M [4], Prospector [10]).

We propose two approaches to improve user trust in explanations given the increased uncertainty of imputations: one based on expressing the uncertainty, and one based on hiding uncertain and, hence, confusing information.

Visualizing Uncertainty Distribution of Feature Attribution Scores due to Imputation
Visualizing uncertainty is a well-studied approach in HCI and information visualization to communicate errors and uncertainty to end users [6, 7, 8]. It has been shown to improve user trust and decision making, but may also lead to information overload or compromise trust [8, 13]. We will extend the typical presentation of explanations where each feature, $x_i$, has an influence score, $f(x_i) = f_i$. With uncertainty due to imputation, the influence score becomes $f(x_i + \Delta x_i) = f_i + \Delta f_i$, where $\Delta x_i$ is the error distribution of feature $x_i$ and $\Delta f_i$ is the propagated (calculated) distribution of the influence score due to that error. The distribution can be calculated by assuming a Gaussian distribution or by performing a Monte Carlo simulation of propagated scores based on the estimated error. Drawing from various taxonomies evaluated for usability [14], we will present the uncertainty in explanations as a distribution of influence scores in the form of violin plots (see Figure 1). We choose violin plots for their ability to express more detail of a probability distribution than box plots, while also being compact.

Figure 1. Mockup of feature attribution explanations with uncertainty visualizations. Each horizontal bar chart represents the influence score due to the feature of the row. The vertical line in the bar indicates the influence score calculated by current methods (e.g., LIME [15]), and the shaded region indicates the uncertainty calculated by error propagation due to missing values.
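A minimal sketch of this Monte Carlo propagation and the violin-plot rendering is shown below, assuming NumPy and Matplotlib; the influence_scores callable (e.g., a wrapper around a LIME explainer that returns one additive influence score per feature) and the per-feature imputation standard errors sigma are hypothetical inputs used only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def propagate_influence_uncertainty(x, imputed_idx, sigma, influence_scores,
                                    n_samples=500, seed=None):
    """Monte Carlo propagation of imputation error into influence scores.

    x                -- 1-D float feature vector of the instance being explained
    imputed_idx      -- indices of the features whose values were imputed
    sigma            -- standard error of each imputed value (same length as imputed_idx)
    influence_scores -- callable returning per-feature influence scores for an instance
                        (a hypothetical wrapper around, e.g., a LIME explainer)
    Returns an array of shape (n_samples, n_features) of influence-score samples.
    """
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_samples):
        x_mc = x.copy()
        # Perturb only the imputed features, assuming Gaussian imputation error.
        x_mc[imputed_idx] += rng.normal(0.0, sigma)
        samples.append(influence_scores(x_mc))
    return np.vstack(samples)

def plot_influence_violins(influence_samples, feature_names):
    """Horizontal violin plot: one distribution of influence scores per feature."""
    fig, ax = plt.subplots()
    ax.violinplot([influence_samples[:, j] for j in range(influence_samples.shape[1])],
                  vert=False, showmedians=True)
    ax.set_yticks(range(1, len(feature_names) + 1))
    ax.set_yticklabels(feature_names)
    ax.axvline(0.0, linewidth=0.8)  # zero-influence reference line
    ax.set_xlabel("influence score")
    return fig
```

The median of each violin corresponds to the point influence score from current methods, while its spread conveys the imputation uncertainty, in the spirit of the mockup in Figure 1.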
De-emphasizing Imputed Features via Uncertainty Regularization
We exploit the tendency of clinicians to suppress or ignore uncertain data [16]. This approach therefore seeks to hide features that have high uncertainty due to imputation.

Feature regularization is commonly used to simplify and generalize models in machine learning and to reduce overfitting, but here we leverage regularization to penalize features with higher uncertainty. Features with high uncertainty will have reduced influence scores or be hidden. The explanation will therefore show adjusted influence scores where some influences are reduced (e.g., horizontal bars shifted towards zero) or some features are not shown (influence bars hidden).

For simplicity, we leverage LIME [15] to generate explanations and use simple linear regression with regularization as the locally approximate explainer model. Training this explainer model involves minimizing the following loss function (simplified for brevity):

$\xi(x) = \arg\min_{g} \mathcal{L}(f, g) + \Omega(g)$   (1)

where, as defined in [15], $\mathcal{L}(f, g)$ is the local fidelity of the explainer model $g$ with respect to the model to be explained, $f$, and $x$ is the data instance being explained. $\Omega(g)$ is the measure of complexity (the converse of interpretability). We use Lasso regression, as is common for simple linear regression with sparsity regularization, so $\Omega(g) = \lambda_1 \|\beta\|_1$. We extend this term to include a penalty for the uncertainty due to imputation, such that

$\Omega(g) = \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_E^2$   (2)

where $\beta$ is the vector of explainer model parameters (the coefficients of the sparse linear model in our case), $\lambda_1$ and $\lambda_2$ are hyperparameters that tune the complexity of the explanation, and $E$ is a diagonal matrix whose $j$th element equals the uncertainty $\varepsilon_{0j}^2$, with $\|\beta\|_E = (\beta^T E \beta)^{1/2}$. Here both sparsity and uncertainty are penalized to increase interpretability. By tuning the two hyperparameters $\lambda_1$ and $\lambda_2$, we can change the complexity of the explanation with respect to the number of features shown and how much to hide or de-emphasize uncertain features.
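As a sketch of how Equations (1) and (2) could be optimized, the following NumPy snippet fits the uncertainty-penalized sparse explainer with proximal gradient descent; the perturbed neighbourhood samples Z, their black-box predictions y, the LIME proximity weights w, and the per-feature uncertainties e (the diagonal of E) are assumed to be precomputed, and the solver choice is illustrative rather than prescribed here.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (element-wise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fit_uncertainty_penalized_explainer(Z, y, w, e, lam1=0.1, lam2=1.0, n_iter=2000):
    """Fit a sparse local linear explainer with an extra penalty on uncertain features.

    Minimizes  sum_i w_i (y_i - z_i . beta)^2 + lam1 * ||beta||_1 + lam2 * beta' E beta,
    where E = diag(e) holds the per-feature imputation uncertainties (epsilon_0j^2).
    Z, y, w are LIME-style perturbed samples, the black-box predictions on them,
    and their proximity weights (all assumed to be precomputed).
    """
    # Fold the sample weights into the least-squares term.
    sw = np.sqrt(w)[:, None]
    Zw, yw = Z * sw, y * np.sqrt(w)

    beta = np.zeros(Z.shape[1])
    # Lipschitz constant of the smooth part: 2 * (sigma_max(Zw)^2 + lam2 * max(e)).
    L = 2.0 * (np.linalg.norm(Zw, 2) ** 2 + lam2 * e.max())
    step = 1.0 / L
    for _ in range(n_iter):
        grad = 2.0 * Zw.T @ (Zw @ beta - yw) + 2.0 * lam2 * e * beta
        beta = soft_threshold(beta - step * grad, step * lam1)
    return beta  # features with large e (high imputation uncertainty) are shrunk or zeroed
```

Increasing lam2 shrinks the coefficients of heavily imputed features towards zero, producing the de-emphasized or hidden influence bars described above, while lam1 controls how many features are shown at all.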
FUTURE USER EXPERIMENTS: DISEASE RISK PREDICTION USE CASE
We will investigate the impact of missing data on user trust in explanations with an application use case in predictive healthcare analytics on electronic medical records (EMR). We will specifically focus on diagnosing hyperparathyroidism and recruit clinicians as the target users. We aim to improve their understanding, trust and decision making when using intelligible disease risk prediction. We will conduct two user studies:

Formative user study: to understand the usability breakdowns in interpreting explanations given the awareness that some data features are based on data imputations, and to elicit user requirements for intelligibility. We will present users with several inference instances (i) without explanations, (ii) with explanations, and (iii) with missing data indicated. To understand how users interpret the explanation information and make their decisions, we will have them think aloud as they examine several use cases, and we will conduct structured interviews. While we have already hypothesized two approaches to generating uncertainty-aware explanations, with this initial study we aim to learn what other explanation approaches users may want to better characterize the uncertainty due to imputation, and what could be shown to regain their trust.

Evaluative user study: we will implement our two explanation interfaces in diagnostic dashboard prototypes and perform a comparative evaluation against baselines of no explanation and of basic feature attribution explanations (e.g., LIME [15]). We note that the amount of uncertainty can confound the user's level of trust in the system [13]. Therefore, we will control both the system confidence level and the amount of imputation in the patient cases used in the experiment scenarios; these will be varied as secondary independent variables. We will measure the accuracy of user diagnosis (correct/wrong with respect to labels from hospital discharge reports), speed of decision (from first viewing patient data to final decision), confidence in diagnosis (7-point Likert scale), trust in the system prediction (7-point Likert scale), and understanding of the patient case (coded from transcribed interviews and think-aloud sessions; e.g., see [12, 13]).

CONCLUSION
In this position paper, we have discussed the importance of considering how data pre-processing, specifically data imputation, may compromise the interpretation and trust of explainable AI. We briefly presented two approaches to address the resultant uncertainty by either visualizing the uncertainty or hiding it. We propose two experiments to understand the impact of missing data on the requirements for explainable AI and to evaluate the efficacy of the proposed solutions.
REFERENCES
1.  Abdul, A., Vermeulen, J., Wang, D., Lim, B. Y., & Kankanhalli, M. (2018). Trends and trajectories for explainable, accountable and intelligible systems: An HCI research agenda. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '18). ACM.
2.  Bellotti, V., & Edwards, K. (2001). Intelligibility and accountability: Human considerations in context-aware systems. Human–Computer Interaction, 16(2-4), 193-212.
3.  Bussone, A., Stumpf, S., & O'Sullivan, D. (2015, October). The role of explanations on trust and reliance in clinical decision support systems. In Healthcare Informatics (ICHI), 2015 International Conference on (pp. 160-169). IEEE.
4.  Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2015, August). Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1721-1730). ACM.
5.  Datta, A., Sen, S., & Zick, Y. (2016, May). Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. In Security and Privacy (SP), 2016 IEEE Symposium on (pp. 598-617). IEEE.
6.  Jung, M. F., Sirkin, D., Gür, T. M., & Steinert, M. (2015, April). Displayed uncertainty improves driving experience and behavior: The case of range anxiety in an electric car. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (pp. 2201-2210). ACM.
7.  Kay, M., Morris, D., & Kientz, J. A. (2013, September). There's no such thing as gaining a pound: Reconsidering the bathroom scale user interface. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 401-410). ACM.
8.  Kay, M., Kola, T., Hullman, J. R., & Munson, S. A. (2016, May). When (ish) is my bus?: User-centered visualizations of uncertainty in everyday, mobile predictive systems. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 5092-5103). ACM.
9.  Koh, P. W., & Liang, P. (2017). Understanding black-box predictions via influence functions. arXiv preprint arXiv:1703.04730.
10. Krause, J., Perer, A., & Ng, K. (2016, May). Interacting with predictions: Visual inspection of black-box machine learning models. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 5686-5697). ACM.
11. Letham, B., Rudin, C., McCormick, T. H., & Madigan, D. (2015). Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics, 9(3), 1350-1371.
12. Lim, B. Y., Dey, A. K., & Avrahami, D. (2009, April). Why and why not explanations improve the intelligibility of context-aware intelligent systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2119-2128). ACM.
13. Lim, B. Y., & Dey, A. K. (2011, September). Investigating intelligibility for uncertain context-aware applications. In Proceedings of the 13th International Conference on Ubiquitous Computing (pp. 415-424). ACM.
14. Pang, A. T., Wittenbrink, C. M., & Lodha, S. K. (1997). Approaches to uncertainty visualization. The Visual Computer, 13(8), 370-390.
15. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). ACM.
16. Simpkin, A. L., & Schwartzstein, R. M. (2016). Tolerating uncertainty – the next medical revolution? New England Journal of Medicine, 375(18), 1713-1715.
17. Zheng, K., Gao, J., Ngiam, K. Y., Ooi, B. C., & Yip, W. L. J. (2017, August). Resolving the bias in electronic medical records. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2171-2180). ACM.