=Paper=
{{Paper
|id=Vol-2068/exss10
|storemode=property
|title=Interpreting Intelligibility under Uncertain Data Imputation
|pdfUrl=https://ceur-ws.org/Vol-2068/exss10.pdf
|volume=Vol-2068
|authors=Brian Lim,Danding Wang,Tze Ping Loh,Kee Yuan Ngiam
|dblpUrl=https://dblp.org/rec/conf/iui/LimWLN18
}}
==Interpreting Intelligibility under Uncertain Data Imputation==
Brian Y. Lim, Danding Wang (National University of Singapore; brianlim@comp.nus.edu.sg, wangdanding@u.nus.edu)
Tze Ping Loh, Kee Yuan Ngiam (National University Hospital, Singapore; {tze_ping_loh, kee_yuan_ngiam}@nuhs.edu.sg)

ABSTRACT
Many methods have been proposed to make machine learning more interpretable, but these have mainly been evaluated with simple use cases and well-curated datasets. In contrast, real-world data presents issues that can compromise the proper interpretation of explanations by end users. In this work, we investigate the impact of missing data and imputation on how users would understand and use explanation features, and we propose two approaches to provide explanation interfaces for explaining feature attribution with uncertainty due to missing data imputation. This work aims to improve the understanding and trust of intelligible healthcare analytics in clinical end users to help drive the adoption of AI.

Author Keywords
Intelligibility, Explanations, Imputation, Interfaces, Visualization, User Study.

ACM Classification Keywords
H.5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous

INTRODUCTION
Intelligibility has been proposed as a capability to enable systems to explain their inner state, reasoning mechanisms and priorities, to help users understand and trust them [1, 12]. A recent review has identified that research on explainable systems typically focuses on explanation generation algorithms for systems with well-curated data or based on theory, and on explanation interfaces with simple models and small datasets [1]. While some empirical user studies have shown explanations to be effective in well-behaved, albeit simple and synthetic, use cases (e.g., [3, 12]), real data and systems face issues and challenges that make data processing and data mining messy. In particular, datasets often have missing data, and imputation is typically used to estimate the true value of the missing data.

Several methods can be used to impute data, such as substituting with zeros, substituting with mean values of the missing variable, carrying forward (or backward) a nearby observed value, or model-based imputation (e.g., with hidden Markov models [17]). For example, if a patient has never been tested for blood calcium (CA), we may assume that his level would be in the healthy normal range and impute the patient's reading as the mean of other patients with normal CA levels. On the other hand, if a patient had a recent high CA level, we may apply carry-forward imputation to estimate his current level to be the same.
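To make these generic imputation strategies concrete, here is a minimal, illustrative Python sketch using pandas and scikit-learn. It is not the pipeline used in this work; the toy EMR table, the column names "ca" and "hba1c", and the values are hypothetical.

```python
# Illustrative sketch (not this paper's pipeline): common imputation strategies
# on a toy EMR-style table with hypothetical columns "ca" (blood calcium) and
# "hba1c". Assumes pandas and scikit-learn are installed.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

emr = pd.DataFrame({
    "patient": ["p1", "p1", "p2", "p3"],
    "ca":      [2.3, np.nan, np.nan, 2.9],    # mmol/L, partly unmeasured
    "hba1c":   [np.nan, 5.4, 6.1, np.nan],    # %, partly unmeasured
})

# Zero substitution: crude, biases values toward 0.
zero_imputed = emr[["ca", "hba1c"]].fillna(0.0)

# Mean substitution: replace a missing reading with the column mean.
mean_imputed = pd.DataFrame(
    SimpleImputer(strategy="mean").fit_transform(emr[["ca", "hba1c"]]),
    columns=["ca", "hba1c"],
)

# Carry forward (then backward) within each patient: reuse the nearest observed
# value, as in the recent-high-CA example above. Patients with no observed
# value at all (e.g., p2's "ca") remain missing and need a fallback strategy.
carried = emr.groupby("patient")[["ca", "hba1c"]].transform(lambda s: s.ffill().bfill())
```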
Data imputation raises a potential problem of how to interpret explanations that depend on data features input into the model. How would a user trust the importance of a feature's value in influencing an inference outcome if the value was not measured, but estimated from imputation? We hypothesize that users will have lower trust in cases of high data imputation, and that some visualization methods may help to alleviate this problem.

In this position paper, we discuss the importance of considering how data pre-processing to handle practical issues of real-world data affects the usefulness and interpretation of explanations about machine learning models. We focus on the use case of disease risk prediction using the structured data of electronic medical records (EMR). EMRs typically contain a lot of missing data, not necessarily due to errors in data collection, but because of the wide variety of tests that patients can take and the fact that patients take only the few necessary tests occasionally [17]. For example, a non-diabetic person may not need to measure his HbA1c as frequently as a diabetic, and HbA1c only needs to be measured once every three months.

Specifically, we seek to answer the following research questions:

RQ1. What information will clinicians need to interpret how a clinical decision support system with disease risk prediction makes its decision, and how will this change given their awareness that some data was imputed?

RQ2. How can a suitable explanation be generated and presented to clinicians to alleviate the loss of trust in explanations due to imputation?

RQ3. How will the imputation-aware explanation model and interface be interpreted by clinicians, and how will it affect their decision making?

APPROACHES: TWO EXPLANATION INTERFACES FOR IMPUTED DATA
While there are several techniques to generate explanations, such as explanations by identifying similar instances [9] or by rule associations [11], we will focus on explanations by additive feature attribution or influence scores (e.g., LIME [15], QII [5], GA2M [4]). This explanation style has been popular for generating explanations for healthcare analytics (e.g., Bussone et al. [3], GA2M [4], Prospector [10]).

We propose two approaches to improve user trust in explanations given the increased uncertainty of imputations, based on either expressing the uncertainty or hiding uncertain and, hence, confusing information.

[Figure 1. Mockup of feature attribution explanations with uncertainty visualizations. Each horizontal bar chart represents the influence score due to the feature of the row. The vertical line in the bar indicates the influence score calculated by current methods (e.g., LIME [15]), and the shaded region indicates the uncertainty calculated by error propagation due to missing values.]

Visualizing Uncertainty Distribution of Feature Attribution Scores due to Imputation
Visualizing uncertainty is a well-studied approach in HCI and information visualization to communicate errors and uncertainty to end users [6, 7, 8]. This has been shown to improve user trust and decision making, but may also lead to information overload or compromise trust [8, 13]. We will extend the typical presentation of explanations where each feature, x_i, has an influence score, v(x_i) = v_i. With uncertainty due to imputation, the influence score becomes v(x_i + Δx_i) = v_i + Δv_i, where Δx_i is the error distribution of feature x_i and Δv_i is the propagated (calculated) distribution of the influence score due to that error. The distribution can be calculated by assuming a Gaussian distribution or by performing a Monte Carlo simulation on propagated scores based on the estimated error. Drawing from various taxonomies evaluated for usability [14], we will present the uncertainty in explanations as a distribution of influence scores in the form of violin plots (see Figure 1). We choose violin plots for their ability to express more detail of a probability distribution than box plots, while also being compact.
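As a rough illustration of the Monte Carlo propagation described above, the following sketch perturbs an imputed feature by an assumed Gaussian imputation error, propagates the samples through a hypothetical additive (linear) attribution, and renders the resulting influence-score distributions as violin plots. The feature names, coefficients, and error magnitudes are invented for illustration, and the output is a plain matplotlib violin plot rather than the interface mockup in Figure 1.

```python
# Minimal sketch of Monte Carlo error propagation for influence scores
# (not the paper's implementation). Perturb each imputed feature by its assumed
# Gaussian imputation error, recompute a linear attribution v_i = beta_i * x_i
# for a hypothetical linear risk model, and show the score distributions as
# violin plots. All names and numbers are illustrative.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

features = ["ca", "pth", "hba1c"]
x = np.array([2.7, 9.5, 5.8])        # patient values; suppose "hba1c" was imputed
beta = np.array([1.8, 0.6, -0.3])    # coefficients of a hypothetical linear model
sigma = np.array([0.0, 0.0, 0.7])    # assumed imputation error s.d. per feature

n_samples = 2000
# Draw perturbed inputs x_i + dx_i with dx_i ~ N(0, sigma_i^2), then propagate.
x_samples = x + rng.normal(0.0, sigma, size=(n_samples, len(x)))
v_samples = beta * x_samples         # samples of influence scores v_i + dv_i

fig, ax = plt.subplots()
ax.violinplot([v_samples[:, i] for i in range(len(features))], showmedians=True)
ax.set_xticks(range(1, len(features) + 1))
ax.set_xticklabels(features)
ax.set_ylabel("influence score")
plt.show()
```

Features that were measured (zero assumed error) collapse to a point, while imputed features spread out, which is the visual cue the violin-plot interface is meant to convey.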
De-emphasizing Imputed Features via Uncertainty Regularization
We exploit the tendency of clinicians to suppress or ignore uncertain data [16]. Therefore, this approach seeks to hide features that have high uncertainty due to imputation. Feature regularization is commonly used in machine learning to simplify and generalize models and to reduce overfitting, but we will leverage regularization to penalize features with higher uncertainty. Features with high uncertainty will have reduced influence scores or be hidden. Therefore, the explanation will show adjusted influence scores where some influences are reduced (e.g., horizontal bars shifted towards zero) or some features are not shown (influence bars hidden).

For simplicity, we leverage LIME [15] to generate explanations and use simple linear regression with regularization as the locally approximate explainer model. Training this explainer model involves minimizing the following loss function (simplified for brevity):

ξ(x) = argmin_g L(f, g) + Ω(g)      (1)

where, as defined in [15], L(f, g) is the local fidelity of the explainer model, g, with respect to the model to be explained, f, and x is the data instance being explained. Ω(g) is the measure of complexity (the converse of interpretability). We use Lasso regression, as is common for simple linear regression with sparsity regularization, so Ω(g) = λ1 ‖β‖_1. We extend this term to include a penalty for the uncertainty due to imputation, such that

Ω(g) = λ1 ‖β‖_1 + λ2 ‖β‖_Σ^2      (2)

where β is the vector of explainer model parameters (the coefficients of the sparse linear model in our case), λ1 and λ2 are hyperparameters to tune the complexity of the explanation, Σ is a diagonal matrix whose i-th diagonal element equals the uncertainty σ_{0i}^2, and ‖β‖_Σ = (β^T Σ β)^(1/2). Here both sparsity and uncertainty are penalized to increase interpretability. By tuning the two hyperparameters λ1 and λ2, we can change the complexity of the explanation with respect to the number of features shown and how much to hide or de-emphasize uncertain features.
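Below is a minimal sketch of how a local surrogate with the penalty in Eq. (2) could be fit, assuming the perturbed samples and black-box predictions that LIME would normally supply are already available. The weighted ridge term λ2 β^T Σ β is folded into the least-squares loss by row augmentation so that an off-the-shelf Lasso handles the remaining λ1 ‖β‖_1 term; this is a standard reformulation trick, not the authors' implementation, and all names and values are illustrative.

```python
# Minimal sketch (not the authors' implementation) of fitting a local linear
# surrogate with the combined penalty of Eq. (2): lam1*||beta||_1 +
# lam2*beta^T Sigma beta. The weighted ridge term is folded into the data by
# row augmentation, so an off-the-shelf Lasso minimizes the whole objective.
# LIME's locality kernel and the black-box model f are omitted for brevity.
import numpy as np
from sklearn.linear_model import Lasso

def fit_uncertainty_regularized_explainer(X, y, sigma2, lam1=0.01, lam2=1.0):
    """X: perturbed samples around the instance, y: black-box predictions f(X),
    sigma2: per-feature imputation variances (the diagonal of Sigma)."""
    n, d = X.shape
    # Append sqrt(lam2 * sigma2_i) rows with zero targets so the squared loss
    # also contains lam2 * beta^T Sigma beta.
    X_aug = np.vstack([X, np.diag(np.sqrt(lam2 * sigma2))])
    y_aug = np.concatenate([y, np.zeros(d)])
    # sklearn's Lasso minimizes (1/(2*n_samples))*||y - Xb||^2 + alpha*||b||_1,
    # so alpha plays the role of lambda1 only up to that scaling; the intercept
    # is dropped to keep the augmentation exact.
    surrogate = Lasso(alpha=lam1, fit_intercept=False)
    surrogate.fit(X_aug, y_aug)
    return surrogate.coef_  # influence scores; uncertain features shrink to 0

# Toy usage with hypothetical features ["ca", "pth", "hba1c"], where "hba1c"
# was imputed and therefore carries a large assumed variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.5 * X[:, 0] + 0.8 * X[:, 2] + rng.normal(scale=0.1, size=200)
print(fit_uncertainty_regularized_explainer(X, y, sigma2=np.array([0.0, 0.0, 4.0])))
```

In this toy run the third (imputed) feature carries a large variance in Σ, so its coefficient, and hence its influence bar, is shrunk toward zero, which is the de-emphasis effect described above.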
FUTURE USER EXPERIMENTS: DISEASE RISK PREDICTION USE CASE
We will investigate the impact of missing data on user trust in explanations with an application use case in predictive healthcare analytics on electronic medical records (EMR). We will specifically focus on diagnosing hyperparathyroidism and recruit clinicians as the target users. We aim to improve their understanding, trust and decision making when using intelligible disease risk prediction. We will conduct two user studies:

Formative user study: to understand the usability breakdowns in interpreting explanations given the awareness that some data features are based on data imputations, and to elicit user requirements for intelligibility. We will present users with several inference instances (i) without explanations, (ii) with explanations, and (iii) with missing data indicated. To understand how users interpret the explanation information and make their decisions, we will have them think aloud as they examine several use cases, and we will conduct structured interviews. While we have already hypothesized two approaches to generating uncertainty-aware explanations, with this initial study we aim to learn about additional explanation approaches that users may want in order to better characterize the uncertainty due to imputation, and what could be shown to regain their trust.

Evaluative user study: we will implement our two explanation interfaces in diagnostic dashboard prototypes and perform a comparative evaluation against baselines with no explanation and with basic feature attribution explanations (e.g., LIME [15]). We note that the amount of uncertainty can confound the user's level of trust in the system [13]. Therefore, we will control both the system confidence level and the amount of imputation in the patient cases used in the experiment scenarios; these will be varied as secondary independent variables. We will measure the accuracy of user diagnosis (correct/wrong with respect to labels from hospital discharge reports), speed of decision (from first viewing patient data to final decision), confidence in diagnosis (7-point Likert scale), trust in the system prediction (7-point Likert scale), and understanding of the patient case (coded from transcribed interviews and think-aloud sessions; e.g., see [12, 13]).

CONCLUSION
In this position paper, we have discussed the importance of considering how data pre-processing, specifically data imputation, may compromise the interpretation and trust of explainable AI. We briefly presented two approaches to address the resultant uncertainty by either visualizing the uncertainty or hiding it. We propose two experiments to understand the impact of missing data on the requirements for explainable AI and to evaluate the efficacy of the proposed solutions.

REFERENCES
1. Abdul, A., Vermeulen, J., Wang, D., Lim, B. Y., & Kankanhalli, M. (2018). Trends and Trajectories for Explainable, Accountable and Intelligible Systems: An HCI Research Agenda. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18). ACM.
2. Bellotti, V., & Edwards, K. (2001). Intelligibility and accountability: human considerations in context-aware systems. Human–Computer Interaction, 16(2-4), 193-212.
3. Bussone, A., Stumpf, S., & O'Sullivan, D. (2015, October). The role of explanations on trust and reliance in clinical decision support systems. In Healthcare Informatics (ICHI), 2015 International Conference on (pp. 160-169). IEEE.
4. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M., & Elhadad, N. (2015, August). Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1721-1730). ACM.
5. Datta, A., Sen, S., & Zick, Y. (2016, May). Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. In Security and Privacy (SP), 2016 IEEE Symposium on (pp. 598-617). IEEE.
6. Jung, M. F., Sirkin, D., Gür, T. M., & Steinert, M. (2015, April). Displayed uncertainty improves driving experience and behavior: The case of range anxiety in an electric car. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (pp. 2201-2210). ACM.
7. Kay, M., Morris, D., & Kientz, J. A. (2013, September). There's no such thing as gaining a pound: Reconsidering the bathroom scale user interface. In Proceedings of the 2013 ACM International Joint Conference on Pervasive and Ubiquitous Computing (pp. 401-410). ACM.
8. Kay, M., Kola, T., Hullman, J. R., & Munson, S. A. (2016, May). When (ish) is my bus?: User-centered visualizations of uncertainty in everyday, mobile predictive systems. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 5092-5103). ACM.
9. Koh, P. W., & Liang, P. (2017). Understanding black-box predictions via influence functions. arXiv preprint arXiv:1703.04730.
10. Krause, J., Perer, A., & Ng, K. (2016, May). Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (pp. 5686-5697). ACM.
11. Letham, B., Rudin, C., McCormick, T. H., & Madigan, D. (2015). Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. The Annals of Applied Statistics, 9(3), 1350-1371.
12. Lim, B. Y., Dey, A. K., & Avrahami, D. (2009, April). Why and why not explanations improve the intelligibility of context-aware intelligent systems. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 2119-2128). ACM.
13. Lim, B. Y., & Dey, A. K. (2011, September). Investigating intelligibility for uncertain context-aware applications. In Proceedings of the 13th International Conference on Ubiquitous Computing (pp. 415-424). ACM.
14. Pang, A. T., Wittenbrink, C. M., & Lodha, S. K. (1997). Approaches to uncertainty visualization. The Visual Computer, 13(8), 370-390.
15. Ribeiro, M. T., Singh, S., & Guestrin, C. (2016, August). Why should I trust you?: Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1135-1144). ACM.
16. Simpkin, A. L., & Schwartzstein, R. M. (2016). Tolerating uncertainty – the next medical revolution? New England Journal of Medicine, 375(18), 1713-1715.
17. Zheng, K., Gao, J., Ngiam, K. Y., Ooi, B. C., & Yip, W. L. J. (2017, August). Resolving the bias in electronic medical records. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2171-2180). ACM.