Multimodal Explanations for User-centric Medical Decision Support Systems

Bettina Finzel, David Elias Tafler, Anna Magdalena Thaler, Ute Schmid
Cognitive Systems, University of Bamberg, An der Weberei 5, 96047 Bamberg, Germany

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Based on empirical evidence indicating that different types of explanations should be used to satisfy different users' information needs and to increase trust in the system, we motivate the use of multimodal explanations for decisions made by a machine learning model that supports medical diagnosis. We present a system through which medical professionals or students can obtain verbal explanations for a classification by means of a dialogue and to which they can pose queries to get prototypical examples in the form of images showing typical health conditions. Our approach can be used for validating algorithmic decisions with a human in the loop or for medical education.

Introduction

In medical diagnostics, Deep Learning is increasingly used to classify patient data. Because such systems must be transparent, great progress has been made in the field of explainable artificial intelligence (XAI) in recent years. Work on visual explanations, for example for the classification of human tissue (Hägele et al. 2020), malaria probes (Schallner et al. 2019) and facial expressions of patients suffering from pain (Rieger et al. 2020), shows that the methods developed can reveal which features a deep neural network has found relevant for a classification. However, relational information, in terms of complex relationships between features of the data, was not used to make the classification decision. As recent works have emphasized, medical diagnosis is often based on examining relational data; a model that is able to incorporate such relations and to explain its decisions in a relational manner is therefore key (Bruckert, Finzel, and Schmid 2020; Schmid and Finzel 2020; Holzinger et al. 2021). Moreover, the focus of these works has not been on presenting different explanations, that is, explanations in varying modalities, in order to take different angles on a classification and to satisfy different users' information needs. However, being able to make decisions based on complex relationships and being able to explain them in as many ways as possible are two important aspects of building systems that are not only transparent but also understandable to human decision makers. Such systems can empower the user to validate the system and to remain in control of decisions, which is a crucial requirement in medicine (Tizhoosh and Pantanowitz 2018).

Recent research uses inductive logic programming (ILP) to train models that can be explained to the user in a comprehensible way and that can deal with complex relational data. In contrast to visual explanations, which can convey only the presence or absence of features, a relational approach such as ILP can express arbitrarily complex relationships, for example spatial and temporal relations as well as recursion (Schmid 2018). In addition to the training data, ILP can be enriched by existing background knowledge, that is, by expert knowledge, be it for training or for correction of learned models (Schmid and Finzel 2020). In the past, there have been prominent transparent systems that made decisions based on relational data, such as MYCIN (Shortliffe 2012). However, the focus there was more on building expert systems and less on satisfying different users through multimodal explanations.

To close this gap, we build on findings from empirical research on multimodal explanations as well as on recent work that combines two different explanation approaches, namely verbal, dialogue-based and visual, image-based explanations. We apply this approach for the first time to classify and explain medical data.
Related Work

Explainable Artificial Intelligence (XAI) aims to make AI systems, their decisions and their actions comprehensible. Given the high stakes involved in medical diagnosis and human health in general, it is obvious that AI systems applied in this domain need to be understood by the user. To this end, an AI system may produce explanations of various types and formats. Building on previous work on combining verbal and visual explanations in artificial intelligence (Finzel et al. 2021), we here apply these concepts to the medical domain. We explore the potential use of different kinds of explanations given by a diagnostic decision support system for assessing primary tumors in tissue samples. Specifically, we consider dialogue- and image-based explanations, allowing for step-wise exploration of the reasons behind a classification outcome as well as displaying, on demand, images that show prototypical examples of health conditions.

The fact that there exists more than one type of explanation suggests that not every explanation fits every situation. For instance, to explain the classification of an object as belonging to one of two categories that differ only by color, a visual explanation might be preferred over a verbal explanation, given the nature of the problem. In contrast, visual classification tasks that rely on relational information rather than the simple presence or absence of features might require additional verbal explanations to improve joint performance and trust in the decision aid system and to correctly counteract faulty system predictions (Thaler and Schmid 2021).

We further want to point to the importance of the person requesting the explanation. The idea of tailoring explanations to, and evaluating them based on, the goals of the explainee is not new in cognitive science (Leake 1991) and has been supported by more recent empirical evidence. For example, Vasilyeva, Wilkenfeld, and Lombrozo (2015) showed that people prefer explanations (formal, mechanistic, teleological) that are consistent with their goals. Also in the field of XAI, users' inter- and intraindividual differences have been recognized as important for the development and improvement of XAI (e.g. Gunning and Aha 2019; Miller 2019; Kulesza et al. 2015).
In medicine, the potential applications of AI are manifold, spanning diagnostics, therapeutics, population health management, administration, and regulation (He et al. 2019). In order to make such systems transparent, their explanations need to take certain user characteristics into account, such as expertise and goals. Especially in applications with potentially severe consequences, such as decision support for diagnosing illnesses, the diagnostician needs to understand the system's recommendations in order to make well-informed decisions. Along these lines, Holzinger et al. (2019) have differentiated between explainability, a more technical attribute of an algorithm, and causability, a feature of explanations that describes how well an explanation can transfer causal understanding to a human user. In order to increase the causability of medical decision support systems, we combine different kinds of explanations. The user can request various explanations via a conversational interaction with the system and thus control the transmission of understanding, that is, the explanation.

Multimodal Explanations for Medical Decision Making

In this section we show how our multimodal explanation approach can be applied to the medical use case of primary tumor staging. We first introduce the medical terminology and concepts for tumor staging and then present examples for verbal, dialogue-based explanations as well as visual, prototype-based explanations.

Primary Tumor Classification in Colon Tissue Samples

The task of classifying tumors requires different competencies and diagnostic steps. The main tasks involved are tumor staging and grading (Wittekind, Bootz, and Meyer 2004). While staging refers to determining the extent of a tumor (location, size and spreading across different layers of tissue), grading examines how abnormal the tumor cells and the tumor tissue appear. In this paper, we focus on the task of staging primary tumors, that is, determining the invasion depth of an original tumor in the layers of human colon tissue. We therefore look at spatial relationships between a tumor and its surrounding tissue layers. The most widely used system for tumor staging is the TNM staging system (Wittekind, Bootz, and Meyer 2004). This system is used to denote the stage of a tumor in pathology reports. The letters T, N and M are combined with further letters or numbers to indicate the exact stage. We focus on the T category, which is concerned with the size and the extent of the main tumor, also called the primary tumor. If a primary tumor is found in the colon tissue, it is assigned one of five possible stages: Tis, T1, T2, T3 or T4. The higher the number after the T, the larger the extent of the tumor. Tis stands for carcinoma in situ and denotes a tumor that has not yet extended to the next tissue layer. The stages can be further differentiated depending on the kind of tissue affected by the tumor (e.g. T4a, T4b). Note that there are further assignments, e.g. TX for tumors that cannot be assessed or T0 if there is no evidence for a tumor. We disregard these cases.
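As a preview of the relational representation used later, the T stages can be summarized, in a simplified and purely illustrative way, by the deepest tissue layer the primary tumor has reached. The predicate name deepest_invasion and the entry for T4 below are our own shorthand drawn from the description above; they are neither the TNM definition nor part of our implementation.

    % Illustrative Prolog facts summarizing the T stages by the deepest tissue
    % layer reached by the primary tumor (simplified sketch, not TNM text).
    deepest_invasion(tis, mucosa).                          % carcinoma in situ
    deepest_invasion(t1, submucosa).
    deepest_invasion(t2, muscularis_propria).
    deepest_invasion(t3, pericolic_adipose_tissue).
    deepest_invasion(t4, beyond_pericolic_adipose_tissue).  % further split into T4a/T4b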
Figure 1: An example of a colon tissue sample under the microscope containing different stages (T1-T3) of tumors in accordance with the widely used TNM staging system (Wittekind, Bootz, and Meyer 2004), with different colon tissues involved: mucosa (M), submucosa (SM), muscularis propria (MP) and pericolic adipose tissue (P). The image was taken from (Pierangelo et al. 2013) for illustration of our use case.

Figure 1 shows a colon tissue sample in which healthy tissue and three of the four stages are present (corresponding to the four zones separated by dotted lines). The leftmost zone contains healthy tissue that can be divided into mucosa (M), submucosa (SM), muscularis propria (MP) and pericolic adipose tissue (P). Zone 2 includes a T1 tumor (invading the mucosa and the submucosa), zone 3 a T2 tumor (extending to the muscularis propria) and zone 4 a T3 tumor (growing past the boundaries of the muscularis propria into the pericolic adipose tissue). The letters C and S in Figure 1 denote tumor cells and tumor stroma; H, B and U denote further diagnostic areas, which, however, are not important for the work presented here and are therefore not explained further. To increase the readability of the following paragraphs for readers who may not be familiar with medical terminology, we will refer to the different tissue types in the following subsections as mucosa, submucosa, muscle and fat tissue.

Dialogue- and Prototype-based Explanations

As in the example presented in previous work (Finzel et al. 2021), we can translate the expert knowledge introduced with Figure 1 into background knowledge and train an ILP model on examples and this background knowledge to obtain rules for the classification of stages, which can then be used to produce verbal explanations in a conversational dialogue with the user.

For the example presented in Figure 1, annotations of the different tissues can be obtained manually or automatically (Schmid and Finzel 2020), and a spatial calculus determines whether they intersect (Bruckert, Finzel, and Schmid 2020). By providing examples for each stage (T1-T4), whose background knowledge contains the information which tissues intersect (the tumor intersects the mucosa in a T1 example, the tumor intersects the muscle in a T2 example, and so on), as well as negative, contrastive examples, we can derive a set of rules for each stage. This set of rules can be seen as a global explanation, meaning that it describes the characteristics of a class. The rules contain variables and relationships between them that are satisfied by all positive examples and by no negative example. The background knowledge can be arbitrarily complex, consisting either of singular properties only or of more sophisticated relationships, such as definitions of spatial relations and reasoning rules. Given the learned rules and the background knowledge, we create so-called explanatory trees that explain the classification of individual examples and can therefore be considered local explanations (Finzel et al. 2021).

An exemplary rule from an ILP model that was trained to recognize tissue samples of stage T2 states that a scan A is classified as stage T2 if A contains B, B is a tumor, B invades C, and C is muscle tissue. Represented in the logic programming language Prolog, this rule reads: "stage_t2(A) :- contains(A,B), is_a(B,tumor), invades(B,C), is_a(C,muscle).". The upper-case letters A, B and C are variables that are substituted by lower-case constants when the rule is applied to the given positive examples, meaning that the background knowledge, consisting of the spatial relationships, satisfies the learned rule.
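To make the variable substitution concrete, the following is a minimal Prolog sketch in which the learned rule is applied to hypothetical background knowledge for one scan. The constants tumor_1 and region_2 and the shown definition of invades in terms of the spatial relation intersects are illustrative assumptions and do not reproduce our actual knowledge base.

    % Hypothetical background knowledge for one positive T2 example
    % (constants are illustrative):
    contains(scan_0708, tumor_1).
    is_a(tumor_1, tumor).
    is_a(region_2, muscle).
    intersects(tumor_1, region_2).

    % Assumed definition: a tumor invades a tissue region if the two intersect.
    invades(B, C) :- is_a(B, tumor), intersects(B, C).

    % Learned rule from the ILP model (as stated above):
    stage_t2(A) :- contains(A, B), is_a(B, tumor), invades(B, C), is_a(C, muscle).

Querying ?- stage_t2(X). then succeeds with X = scan_0708; internally, Prolog substitutes B = tumor_1 and C = region_2, and it is exactly this substitution that later determines the child nodes of the explanatory tree.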
Figure 2: An explanatory tree for stage_t2(scan_0708) that can be queried by the user to get a local explanation of why scan_0708 is labeled as T2 (steps A and B). A dialogue is realized by further requests, either to get more visual explanations in terms of prototypes (step C) or to get more verbal explanations in a drill-down manner (step D).

The explanatory tree that we create to explain the classification of an individual example has the structure presented in Figure 2 and is based on a logical proof procedure introduced in (Finzel et al. 2021). The class label becomes the root node of the explanatory tree, and the reasons for the class decision, given by the substitution of variables in the learned rule, determine the child nodes of the root. For our colon tissue example, individual parts of the rule (e.g. the invades relationship) can be explained by further background knowledge, in this case the definition of a spatial relationship intersects that was computed from geometric properties of the input data. The explanatory tree can be traversed in a conversational manner (see Figure 2) to obtain verbal explanations for the reasons behind the stage classification of a particular microscopy scan.
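As an illustration of this construction, the following Prolog meta-interpreter is a minimal sketch, not the proof procedure from (Finzel et al. 2021) or from our repository. It records, for each goal proven via clause/2, a node whose children are the proofs of the subgoals of the applied clause, assuming the learned rules and the background knowledge are loaded as ordinary Prolog clauses accessible to clause/2 (declared dynamic if the Prolog system restricts clause/2 to dynamic predicates).

    % prove_tree(+Goal, -Tree): Tree is a term node(Goal, Children), where
    % Children are the proof trees of the subgoals of the clause used for Goal.
    prove_tree(Goal, node(Goal, Children)) :-
        clause(Goal, Body),
        prove_body(Body, Children).

    prove_body(true, []) :- !.                 % facts contribute leaf nodes
    prove_body((A, B), Children) :- !,
        prove_body(A, ChildrenA),
        prove_body(B, ChildrenB),
        append(ChildrenA, ChildrenB, Children).
    prove_body(Goal, [Tree]) :-
        prove_tree(Goal, Tree).

For the background knowledge sketched above, ?- prove_tree(stage_t2(scan_0708), T). binds T to a tree whose root is stage_t2(scan_0708) and whose children correspond to the substituted body literals; the invades node in turn has the intersects fact as a child, which is what enables the drill-down step D in Figure 2.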
A special property of our approach is that we complement the verbal explanations with visual explanations in terms of prototypes in cases where a verbal explanation cannot be given due to limits of expression (e.g. if the user wants to see what a certain tissue type looks like). Explanations by means of prototypes are based on the idea that categories, especially those without unambiguous necessary and sufficient criteria for including or excluding examples, can be represented by a central tendency of the category members, called a prototype (Rosch 1987). We chose prototypes as a complementary explanation method besides verbal explanation because research has shown that prototypes are relevant, among others, in category learning (Minda and Smith 2001), in scheme-inductive reasoning as a successful diagnostic reasoning strategy (Coderre et al. 2003) and in expert teaching (Sternberg and Horvath 1995). In our model, prototypes are representative category members and are displayed as images.

Having the explanatory tree as well as the images of the prototypes, the user can traverse the explanatory tree and ask for prototypes through a dialogue with the system. In comparison to the work first presented in (Finzel et al. 2021), we slightly adapted the requests a user can make. As in (Finzel et al. 2021), the user can ask for a global explanation, e.g., what stage T2 means. In order to request local explanations, the user can pose the following requests (see Figure 2; a schematic sketch of how these requests can be dispatched is given at the end of this subsection):

• Which class label has ⟨example⟩? (reference A)
• Explain why ⟨example⟩ has class ⟨label⟩! (reference B)
• Show me ⟨concept⟩! (reference C, displays a prototype)
• Explain further why ⟨statement⟩! (reference D, allows for a drill-down of explanations)

Users can furthermore request to return to the last explanation in order to proceed with their search for answers on different branches of the explanatory tree.

The whole implementation, including the files to train an ILP model, the code to create an explanatory tree, the images showing the prototypes as well as the dialogue-based interface, is available via a git repository¹.

¹ Gitlab repository of our implementation of multimodal explanations (including two example data sets: a proof-of-concept data set from the animal world and a data set for colon tissue classification of T1-T4 stages): https://gitlab.rz.uni-bamberg.de/cogsys/public/multi-level-multi-modal-explanation
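The following sketch shows how these requests could be dispatched against an explanatory tree represented as node(Statement, Children), as in the proof-tree sketch above. The predicate names and the prototype/2 lookup are illustrative assumptions and not the interface of our repository.

    % (A) Which class label has <example>?  The root statement of the example's
    %     explanatory tree carries the class label.
    class_of(node(ClassStatement, _), ClassStatement).

    % (B) Explain why <example> has class <label>!  Return the direct reasons,
    %     i.e. the statements at the children of the root node.
    why(node(_, Children), Reasons) :-
        findall(S, member(node(S, _), Children), Reasons).

    % (D) Explain further why <statement>!  Drill down one level below the node
    %     that carries the statement, anywhere in the tree.
    why_further(node(Statement, Children), Statement, Reasons) :-
        findall(S, member(node(S, _), Children), Reasons).
    why_further(node(_, Children), Statement, Reasons) :-
        member(Child, Children),
        why_further(Child, Statement, Reasons).

    % (C) Show me <concept>!  Display a prototype image, assuming prototype/2
    %     facts map concepts (e.g. muscle) to image files.
    show(Concept, ImageFile) :-
        prototype(Concept, ImageFile).

For a tree computed by prove_tree/2 above, ?- why(Tree, Reasons). would, for instance, return the four substituted body literals of the T2 rule as the direct reasons for the classification.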
Discussion

Interpretable approaches are often seen as an alternative to explanation generation for black boxes: it has been argued that for high-stakes decision making, for instance in health care, interpretable models should be preferred over ex-post explanation generation for neural network models (Rudin 2019), and it has been pointed out that explanations might be misleading and inspire unjustified trust (Babic et al. 2021). However, although interpretable models such as decision rules or ILP models are white boxes and therefore inspectable, providing explanations might still be necessary. Similar to computer programs, white-box models may be inspectable in principle but are often too complex for easy comprehension. Explanation mechanisms such as the ones proposed in this paper are helpful or even necessary to communicate the right information in the most suitable modality and in adequate detail.

With respect to decision support systems that are based on learned models, requirements have recently been stated (Bohanec 2021) which we want to discuss by means of our implementation. In his work, Bohanec points out five requirements that should be fulfilled. The first one is correctness, meaning that the model should provide correct (valid, right) information given the decision problem. Second, the model should fulfill completeness, a property that refers to considering all relevant aspects of the decision problem and providing answers for all possible inputs. Next, he mentions consistency, in terms of logical and preferential consistency. Another important requirement is comprehensibility of the provided information for the user. Finally, Bohanec mentions convenience, referring to easily accessible, timely information that is appropriate for the task and the user.

Our implementation fulfills part of these requirements by design. The underlying ILP algorithm that produced the model is complete and consistent with respect to the problem domain (Finzel et al. 2021). Furthermore, ILP output can be considered comprehensible for humans, especially since it is easy to translate it into verbal statements in the form of natural language (Muggleton et al. 2018). Our approach also heads towards convenience by presenting explanations in different modalities to suit different users, levels of understanding and tasks. Correctness is ensured at least by the deductive step, when explanatory trees are created from previously induced rules.

Conclusion

Motivated by empirical evidence indicating that multimodal explanations are beneficial for understanding, we presented an approach and its implementation that combines verbal, dialogue-based explanations with visual, prototype-based explanations in order to give insights into the reasons for the decisions of a model trained to classify the stage of cancerous colon tissue samples. We applied inductive logic programming to generate this model, an approach that fulfills the requirements of completeness, consistency and comprehensibility by design. In such visually complex domains, near misses, as a further type of example-based explanation (Rabold, Siebers, and Schmid 2021), can be helpful to communicate information about the decision borders between diagnostic categories. Aspects like convenience could be evaluated empirically in the future. Our approach could be further extended for applications in medical education, which is an interesting field for new explanatory techniques (Chan and Zary 2019), for example in histopathological diagnosis (Crowley and Medvedeva 2003). Further empirical investigations shall evaluate the helpfulness of our implementation.

Acknowledgments

The work presented in this paper is funded by grant FKZ 01IS18056 B, BMBF ML-3 Transparent Medical Expert Companion (TraMeExCo), 2018-2021. We thank our project partners from the Fraunhofer IIS (Volker Bruns, Dr. Michaela Benz) and the University Hospital Erlangen (Dr. med. Carol Geppert, Dr. med. Markus Eckstein, and Prof. Dr. Arndt Hartmann, head of the institute of pathology), who provided us with the data our simulated case study is based upon and with the knowledge about assessing primary tumors in colon cancer microscopy scans.

References

Babic, B.; Gerke, S.; Evgeniou, T.; and Cohen, I. G. 2021. Beware explanations from AI in health care. Science 373(6552): 284–286.

Bohanec, M. 2021. From Data and Models to Decision Support Systems: Lessons and Advice for the Future, 191–211. Springer.

Bruckert, S.; Finzel, B.; and Schmid, U. 2020. The Next Generation of Medical Decision Support: A Roadmap Toward Transparent Expert Companions. Frontiers in Artificial Intelligence 3: 75.

Chan, K. S.; and Zary, N. 2019. Applications and challenges of implementing artificial intelligence in medical education: integrative review. JMIR Medical Education 5(1): e13930.

Coderre, S.; Mandin, H.; Harasym, P. H.; and Fick, G. H. 2003. Diagnostic reasoning strategies and diagnostic success. Medical Education 37(8): 695–703.

Crowley, R. S.; and Medvedeva, O. 2003. A general architecture for intelligent tutoring of diagnostic classification problem solving. In AMIA Annual Symposium Proceedings, volume 2003, 185. American Medical Informatics Association.

Finzel, B.; Tafler, D. E.; Scheele, S.; and Schmid, U. 2021. Explanation as a process: user-centric construction of multi-level and multi-modal explanations. In German Conference on Artificial Intelligence (Künstliche Intelligenz) (to be published). Springer.

Gunning, D.; and Aha, D. 2019. DARPA's explainable artificial intelligence (XAI) program. AI Magazine 40(2): 44–58.

Hägele, M.; Seegerer, P.; Lapuschkin, S.; Bockmayr, M.; Samek, W.; Klauschen, F.; Müller, K.-R.; and Binder, A. 2020. Resolving challenges in deep learning-based analyses of histopathological images using explanation methods. Scientific Reports 10(1): 1–12.

He, J.; Baxter, S. L.; Xu, J.; Xu, J.; Zhou, X.; and Zhang, K. 2019. The practical implementation of artificial intelligence technologies in medicine. Nature Medicine 25(1): 30–36.

Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; and Müller, H. 2019. Causability and explainability of artificial intelligence in medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9(4): e1312.

Holzinger, A.; Malle, B.; Saranti, A.; and Pfeifer, B. 2021. Towards multi-modal causability with Graph Neural Networks enabling information fusion for explainable AI. Information Fusion 71: 28–37.

Kulesza, T.; Burnett, M.; Wong, W.-K.; and Stumpf, S. 2015. Principles of explanatory debugging to personalize interactive machine learning. In Proceedings of the 20th International Conference on Intelligent User Interfaces, 126–137.

Leake, D. B. 1991. Goal-based explanation evaluation. Cognitive Science 15(4): 509–545.

Miller, T. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267: 1–38.

Minda, J. P.; and Smith, J. D. 2001. Prototypes in category learning: the effects of category size, category structure, and stimulus complexity. Journal of Experimental Psychology: Learning, Memory, and Cognition 27(3): 775.

Muggleton, S. H.; Schmid, U.; Zeller, C.; Tamaddoni-Nezhad, A.; and Besold, T. 2018. Ultra-strong machine learning: comprehensibility of programs learned with ILP. Machine Learning 107(7): 1119–1140.

Pierangelo, A.; Manhas, S.; Benali, A.; Fallet, C.; Totobenazara, J.-L.; Antonelli, M. R.; Novikova, T.; Gayet, B.; Martino, A. D.; and Validire, P. 2013. Multispectral Mueller polarimetric imaging detecting residual cancer and cancer regression after neoadjuvant treatment for colorectal carcinomas. Journal of Biomedical Optics 18(4): 1–10. doi:10.1117/1.JBO.18.4.046014.

Rabold, J.; Siebers, M.; and Schmid, U. 2021. Generating contrastive explanations for inductive logic programming based on a near miss approach. Machine Learning. doi:10.1007/s10994-021-06048-w.

Rieger, I.; Kollmann, R.; Finzel, B.; Seuss, D.; and Schmid, U. 2020. Verifying Deep Learning-based Decisions for Facial Expression Recognition. In 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2020, Bruges, Belgium, October 2-4, 2020, 139–144.

Rosch, E. 1987. Wittgenstein and categorization research in cognitive psychology. In Meaning and the Growth of Understanding, 151–166. Springer.

Rudin, C. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1(5): 206–215.

Schallner, L.; Rabold, J.; Scholz, O.; and Schmid, U. 2019. Effect of Superpixel Aggregation on Explanations in LIME - A Case Study with Biological Data. CoRR abs/1910.07856. URL http://arxiv.org/abs/1910.07856.

Schmid, U. 2018. Inductive Programming as Approach to Comprehensible Machine Learning. In DKB/KIK@KI, 4–12.

Schmid, U.; and Finzel, B. 2020. Mutual Explanations for Cooperative Decision Making in Medicine. KI - Künstliche Intelligenz 34(2): 227–233.

Shortliffe, E. 2012. Computer-based Medical Consultations: MYCIN, volume 2. Elsevier.

Sternberg, R. J.; and Horvath, J. A. 1995. A prototype view of expert teaching. Educational Researcher 24(6): 9–17.

Thaler, A. M.; and Schmid, U. 2021. Explaining Machine Learned Relational Concepts in Visual Domains - Effects of Perceived Accuracy on Joint Performance and Trust. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 43.

Tizhoosh, H. R.; and Pantanowitz, L. 2018. Artificial intelligence and digital pathology: challenges and opportunities. Journal of Pathology Informatics 9.

Vasilyeva, N.; Wilkenfeld, D.; and Lombrozo, T. 2015. Goals Affect the Perceived Quality of Explanations. Cognitive Science.

Wittekind, C.; Bootz, F.; and Meyer, H.-J. 2004. Tumoren des Verdauungstraktes. In Wittekind, C.; Bootz, F.; and Meyer, H.-J., eds., TNM Klassifikation maligner Tumoren, International Union Against Cancer, 53–88. Springer.