Multimodal Explanations for User-centric Medical Decision Support Systems

Bettina Finzel, David Elias Tafler, Anna Magdalena Thaler, Ute Schmid
Cognitive Systems, University of Bamberg, An der Weberei 5, 96047 Bamberg, Germany

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

Based on empirical evidence indicating that different types of explanations should be used to satisfy different users' information needs and to increase trust in the system, we motivate the use of multimodal explanations for decisions made by a machine learning model that supports medical diagnosis. We present a system through which medical professionals or students can obtain verbal explanations for a classification by means of a dialogue and to which they can pose queries to get prototypical examples in the form of images showing typical health conditions. Our approach can be used for validating algorithmic decisions with a human in the loop or for medical education.

Introduction

In medical diagnostics, Deep Learning is increasingly used to classify patient data. Because such systems must be transparent, great progress has been made in the field of explainable artificial intelligence (XAI) in recent years. Work on visual explanations, for example for the classification of human tissue (Hägele et al. 2020), malaria probes (Schallner et al. 2019) and facial expressions of patients suffering from pain (Rieger et al. 2020), shows that the methods developed can reveal which features a deep neural network has found relevant for a classification. However, relational information, in terms of complex relationships between features of the data, was not used to make the classification decision. As recent works have emphasized, medical diagnosis is often based on examining relational data; a model that is able to incorporate such relations and to explain its decisions in a relational manner is therefore key (Bruckert, Finzel, and Schmid 2020; Schmid and Finzel 2020; Holzinger et al. 2021). Moreover, the focus of these works has not been on presenting different explanations, that is, explanations in varying modalities, in order to take different angles on a classification and to satisfy different users' information needs. However, being able to make decisions based on complex relationships and being able to explain them in as many ways as possible are two important aspects of building systems that are not only transparent but also understandable to human decision makers. Such systems can empower the user to validate the system and to remain in control of decisions, which is a crucial requirement in medicine (Tizhoosh and Pantanowitz 2018).

Recent research uses inductive logic programming (ILP) to train models that can be explained to the user in a comprehensible way and that can deal with complex relational data. In contrast to visual explanations, which can convey only the presence or absence of features, a relational approach such as ILP can express arbitrarily complex relationships, for example spatial and temporal relations as well as recursion (Schmid 2018). In addition to the training data, ILP can be enriched by existing background knowledge, that is, by expert knowledge, be it for training or for correction of learned models (Schmid and Finzel 2020). In the past, there have been prominent transparent systems that made decisions based on relational data, such as MYCIN (Shortliffe 2012). However, the focus there was more on building expert systems and less on satisfying different users through multimodal explanations.

To close this gap, we build on findings from empirical research on multimodal explanations as well as on recent work that combines two different explanation approaches, namely verbal, dialogue-based and visual, image-based explanations. We apply this approach for the first time to classify and explain medical data.
Related Work

Explainable Artificial Intelligence (XAI) aims to make AI systems, their decisions and their actions comprehensible. Given the high stakes involved in medical diagnosis and human health in general, it is obvious that AI systems applied in this domain need to be understood by the user. To this end, an AI system may produce explanations of various types and formats. Building on previous work on combining verbal and visual explanations in artificial intelligence (Finzel et al. 2021), we here apply these concepts to the medical domain. We explore the potential use of different kinds of explanations given by a diagnostic decision support system for assessing primary tumors in tissue samples. Specifically, we consider dialogue- and image-based explanations, allowing for step-wise exploration of the reasons behind a classification outcome as well as displaying, on demand, images that show prototypical examples of health conditions.

The fact that there exists more than one type of explanation suggests that not every explanation fits every situation. For instance, to explain the classification of an object as belonging to one of two categories that differ only by color, a visual explanation might be preferred over a verbal explanation, given the nature of the problem. In contrast, visual classification tasks that rely on relational information rather than the simple presence or absence of features might require additional verbal explanations to improve joint performance and trust in the decision aid system and to correctly counteract faulty system predictions (Thaler and Schmid 2021).

We further want to point to the importance of the person requesting the explanation. The idea of tailoring explanations to, and evaluating them based on, the goals of the explainee is not new in cognitive science (Leake 1991) and has been supported by more recent empirical evidence. For example, Vasilyeva, Wilkenfeld, and Lombrozo (2015) showed that people prefer explanations (formal, mechanistic, teleological) that are consistent with their goals. Also in the field of XAI, users' inter- and intraindividual differences have been recognized as important for the development and improvement of XAI (e.g. Gunning and Aha 2019; Miller 2019; Kulesza et al. 2015).
In medicine, the potential applications of AI are manifold, spanning diagnostics, therapeutics, population health management, administration, and regulation (He et al. 2019). In order to make such systems transparent, their explanations need to take certain user characteristics into account, such as expertise and goals. Especially in applications with potentially severe consequences, such as decision support for diagnosing illnesses, the diagnostician needs to understand the system's recommendations in order to make well-informed decisions. Along these lines, Holzinger et al. (2019) have differentiated between explainability, a more technical attribute of an algorithm, and causability, a feature of explanations that describes how well an explanation can transfer causal understanding to a human user. In order to increase the causability of medical decision support systems, we combine different kinds of explanations. The user can request various explanations via a conversational interaction with the system and thus control the transmission of understanding, that is, the explanation.

Multimodal Explanations for Medical Decision Making

In this section we show how our multimodal explanation approach can be applied to the medical use case of primary tumor staging. We first introduce the medical terminology and concepts for tumor staging and then present examples for verbal, dialogue-based explanations as well as visual, prototype-based explanations.

Primary Tumor Classification in Colon Tissue Samples

The task of classifying tumors requires different competencies and diagnostic steps. The main tasks involved are tumor staging and grading (Wittekind, Bootz, and Meyer 2004). While staging refers to determining the extent of a tumor (location, size and spreading across different layers of tissue), grading examines how abnormal the tumor cells and the tumor tissue appear. In this paper, we focus on the task of staging primary tumors, that is, determining the invasion depth of an original tumor in the layers of human colon tissue. We therefore look at spatial relationships between a tumor and its surrounding tissue layers. The most widely used system for tumor staging is the TNM staging system (Wittekind, Bootz, and Meyer 2004). This system is used to denote the stage of a tumor in pathology reports. The letters T, N and M are combined with further letters or numbers to indicate the exact stage. We focus on the T category, which is concerned with the size and the extent of the main tumor, also called the primary tumor. If a primary tumor is found in the colon tissue, it is assigned one of five possible stages: Tis, T1, T2, T3 or T4. The higher the number after the T, the larger the extent of the tumor. Tis stands for carcinoma in situ and denotes a tumor that has not yet extended to the next tissue layer. The stages can be further differentiated depending on the kind of tissue affected by the tumor (e.g. T4a, T4b). Note that there are further assignments, e.g. TX for tumors that cannot be assessed or T0 if there is no evidence for a tumor. We disregard these cases.
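As a preview of the relational representation used later, the T stages can be summarized, in a simplified and purely illustrative way, by the deepest tissue layer the primary tumor has reached. The predicate name deepest_invasion and the entry for T4 below are our own shorthand drawn from the description above; they are neither the TNM definition nor part of our implementation.

    % Illustrative Prolog facts summarizing the T stages by the deepest tissue
    % layer reached by the primary tumor (simplified sketch, not TNM text).
    deepest_invasion(tis, mucosa).                          % carcinoma in situ
    deepest_invasion(t1, submucosa).
    deepest_invasion(t2, muscularis_propria).
    deepest_invasion(t3, pericolic_adipose_tissue).
    deepest_invasion(t4, beyond_pericolic_adipose_tissue).  % further split into T4a/T4b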
Figure 1: An example of a colon tissue sample under the microscope containing different stages (T1-T3) of tumors in accordance with the widely used TNM staging system (Wittekind, Bootz, and Meyer 2004), with different colon tissues involved: mucosa (M), submucosa (SM), muscularis propria (MP) and pericolic adipose tissue (P). The image was taken from (Pierangelo et al. 2013) for illustration of our use case.

Figure 1 shows a colon tissue sample in which healthy tissue and three of the four stages are present (corresponding to the four zones separated by dotted lines). The leftmost zone contains healthy tissue that can be divided into mucosa (M), submucosa (SM), muscularis propria (MP) and pericolic adipose tissue (P). Zone 2 includes a T1 tumor (invading the mucosa and the submucosa), zone 3 a T2 tumor (extending to the muscularis propria) and zone 4 a T3 tumor (growing past the boundaries of the muscularis propria into the pericolic adipose tissue). The letters C and S in Figure 1 denote tumor cells and tumor stroma; H, B and U denote further diagnostic areas, which, however, are not important for the work presented here and are therefore not explained further. To increase the readability of the following paragraphs for readers who may not be familiar with medical terminology, we will refer to the different tissue types in the following subsections as mucosa, submucosa, muscle and fat tissue.

Dialogue- and Prototype-based Explanations

As in the example presented in previous work (Finzel et al. 2021), we can translate the expert knowledge introduced with Figure 1 into background knowledge and train an ILP model on examples and this background knowledge to obtain rules for the classification of stages, which can then be used to produce verbal explanations in a conversational dialogue with the user.

For the example presented in Figure 1, annotations of the different tissues can be obtained manually or automatically (Schmid and Finzel 2020), and a spatial calculus determines whether they intersect (Bruckert, Finzel, and Schmid 2020). By providing examples for each stage (T1-T4), whose background knowledge contains the information which tissues intersect (the tumor intersects the mucosa in a T1 example, the tumor intersects the muscle in a T2 example, and so on), as well as negative, contrastive examples, we can derive a set of rules for each stage. This set of rules can be seen as a global explanation, meaning that it describes the characteristics of a class. The rules contain variables and relationships between them that are satisfied by all positive examples and by no negative example. The background knowledge can be arbitrarily complex, consisting either of singular properties only or of more sophisticated relationships, such as definitions of spatial relations and reasoning rules. Given the learned rules and the background knowledge, we create so-called explanatory trees that explain the classification of individual examples and can therefore be considered local explanations (Finzel et al. 2021).

An exemplary rule from an ILP model that was trained to recognize tissue samples of stage T2 states that a scan A is classified as stage T2 if A contains B, B is a tumor, B invades C, and C is muscle tissue. Represented in the logic programming language Prolog, this rule reads: "stage_t2(A) :- contains(A,B), is_a(B,tumor), invades(B,C), is_a(C,muscle).". The upper-case letters A, B and C are variables that are substituted by lower-case constants when the rule is applied to the given positive examples, meaning that the background knowledge, consisting of the spatial relationships, satisfies the learned rule.
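To make the variable substitution concrete, the following is a minimal Prolog sketch in which the learned rule is applied to hypothetical background knowledge for one scan. The constants tumor_1 and region_2 and the shown definition of invades in terms of the spatial relation intersects are illustrative assumptions and do not reproduce our actual knowledge base.

    % Hypothetical background knowledge for one positive T2 example
    % (constants are illustrative):
    contains(scan_0708, tumor_1).
    is_a(tumor_1, tumor).
    is_a(region_2, muscle).
    intersects(tumor_1, region_2).

    % Assumed definition: a tumor invades a tissue region if the two intersect.
    invades(B, C) :- is_a(B, tumor), intersects(B, C).

    % Learned rule from the ILP model (as stated above):
    stage_t2(A) :- contains(A, B), is_a(B, tumor), invades(B, C), is_a(C, muscle).

Querying ?- stage_t2(X). then succeeds with X = scan_0708; internally, Prolog substitutes B = tumor_1 and C = region_2, and it is exactly this substitution that later determines the child nodes of the explanatory tree.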
Figure 2: An explanatory tree for stage_t2(scan_0708) that can be queried by the user to get a local explanation of why scan_0708 is labeled as T2 (steps A and B). A dialogue is realized by further requests, either to get more visual explanations in terms of prototypes (step C) or to get more verbal explanations in a drill-down manner (step D).

The explanatory tree that we create to explain the classification of an individual example has the structure presented in Figure 2 and is based on a logical proof procedure introduced in (Finzel et al. 2021). The class label becomes the root node of the explanatory tree, and the reasons for the class decision, given by the substitution of variables in the learned rule, determine the child nodes of the root. For our colon tissue example, individual parts of the rule (e.g. the invades relationship) can be explained by further background knowledge, in this case the definition of a spatial relationship intersects that was computed from geometric properties of the input data. The explanatory tree can be traversed in a conversational manner (see Figure 2) to obtain verbal explanations for the reasons behind the stage classification of a particular microscopy scan.
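As an illustration of this construction, the following Prolog meta-interpreter is a minimal sketch, not the proof procedure from (Finzel et al. 2021) or from our repository. It records, for each goal proven via clause/2, a node whose children are the proofs of the subgoals of the applied clause, assuming the learned rules and the background knowledge are loaded as ordinary Prolog clauses accessible to clause/2 (declared dynamic if the Prolog system restricts clause/2 to dynamic predicates).

    % prove_tree(+Goal, -Tree): Tree is a term node(Goal, Children), where
    % Children are the proof trees of the subgoals of the clause used for Goal.
    prove_tree(Goal, node(Goal, Children)) :-
        clause(Goal, Body),
        prove_body(Body, Children).

    prove_body(true, []) :- !.                 % facts contribute leaf nodes
    prove_body((A, B), Children) :- !,
        prove_body(A, ChildrenA),
        prove_body(B, ChildrenB),
        append(ChildrenA, ChildrenB, Children).
    prove_body(Goal, [Tree]) :-
        prove_tree(Goal, Tree).

For the background knowledge sketched above, ?- prove_tree(stage_t2(scan_0708), T). binds T to a tree whose root is stage_t2(scan_0708) and whose children correspond to the substituted body literals; the invades node in turn has the intersects fact as a child, which is what enables the drill-down step D in Figure 2.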
A special property of our approach is that we complement the verbal explanations with visual explanations in terms of prototypes in cases where a verbal explanation cannot be given due to limits of expression (e.g. if the user wants to see what a certain tissue type looks like). Explanations by means of prototypes are based on the idea that categories, especially those without unambiguous necessary and sufficient criteria for including or excluding examples, can be represented by a central tendency of the category members, called a prototype (Rosch 1987). We chose prototypes as a complementary explanation method besides verbal explanation because research has shown that prototypes are relevant, among others, in category learning (Minda and Smith 2001), in scheme-inductive reasoning as a successful diagnostic reasoning strategy (Coderre et al. 2003) and in expert teaching (Sternberg and Horvath 1995). In our model, prototypes are representative category members and are displayed as images.

Having the explanatory tree as well as the images of the prototypes, the user can traverse the explanatory tree and ask for prototypes through a dialogue with the system. In comparison to the work first presented in (Finzel et al. 2021), we slightly adapted the requests a user can make. As in (Finzel et al. 2021), the user can ask for a global explanation, e.g., what stage T2 means. In order to request local explanations, the user can pose the following requests (see Figure 2; a schematic sketch of how these requests can be dispatched is given at the end of this subsection):

• Which class label has ⟨example⟩? (reference A)
• Explain why ⟨example⟩ has class ⟨label⟩! (reference B)
• Show me ⟨concept⟩! (reference C, displays a prototype)
• Explain further why ⟨statement⟩! (reference D, allows for a drill-down of explanations)

Users can furthermore request to return to the last explanation in order to proceed with their search for answers on different branches of the explanatory tree.

The whole implementation, including the files to train an ILP model, the code to create an explanatory tree, the images showing the prototypes as well as the dialogue-based interface, is available via a git repository¹.

¹ Gitlab repository of our implementation of multimodal explanations (including two example data sets: a proof-of-concept data set from the animal world and a data set for colon tissue classification of T1-T4 stages): https://gitlab.rz.uni-bamberg.de/cogsys/public/multi-level-multi-modal-explanation
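The following sketch shows how these requests could be dispatched against an explanatory tree represented as node(Statement, Children), as in the proof-tree sketch above. The predicate names and the prototype/2 lookup are illustrative assumptions and not the interface of our repository.

    % (A) Which class label has <example>?  The root statement of the example's
    %     explanatory tree carries the class label.
    class_of(node(ClassStatement, _), ClassStatement).

    % (B) Explain why <example> has class <label>!  Return the direct reasons,
    %     i.e. the statements at the children of the root node.
    why(node(_, Children), Reasons) :-
        findall(S, member(node(S, _), Children), Reasons).

    % (D) Explain further why <statement>!  Drill down one level below the node
    %     that carries the statement, anywhere in the tree.
    why_further(node(Statement, Children), Statement, Reasons) :-
        findall(S, member(node(S, _), Children), Reasons).
    why_further(node(_, Children), Statement, Reasons) :-
        member(Child, Children),
        why_further(Child, Statement, Reasons).

    % (C) Show me <concept>!  Display a prototype image, assuming prototype/2
    %     facts map concepts (e.g. muscle) to image files.
    show(Concept, ImageFile) :-
        prototype(Concept, ImageFile).

For a tree computed by prove_tree/2 above, ?- why(Tree, Reasons). would, for instance, return the four substituted body literals of the T2 rule as the direct reasons for the classification.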
Discussion

Interpretable approaches are often seen as an alternative to explanation generation for black boxes: it has been argued that for high-stakes decision making, for instance in health care, interpretable models should be preferred over ex-post explanation generation for neural network models (Rudin 2019), and it has been pointed out that explanations might be misleading and inspire unjustified trust (Babic et al. 2021). However, although interpretable models such as decision rules or ILP models are white boxes and therefore inspectable, providing explanations might still be necessary. Similar to computer programs, white-box models may be inspectable in principle but are often too complex for easy comprehension. Explanation mechanisms such as the ones proposed in this paper are helpful or even necessary to communicate the right information in the most suitable modality and in adequate detail.

With respect to decision support systems that are based on learned models, requirements have recently been stated (Bohanec 2021) which we want to discuss by means of our implementation. In his work, Bohanec points out five requirements that should be fulfilled. The first one is correctness, meaning that the model should provide correct (valid, right) information given the decision problem. Second, the model should fulfill completeness, a property that refers to considering all relevant aspects of the decision problem and providing answers for all possible inputs. Next, he mentions consistency, in terms of logical and preferential consistency. Another important requirement is comprehensibility of the provided information for the user. Finally, Bohanec mentions convenience, referring to easily accessible, timely information that is appropriate for the task and the user.

Our implementation fulfills part of these requirements by design. The underlying ILP algorithm that produced the model is complete and consistent with respect to the problem domain (Finzel et al. 2021). Furthermore, ILP output can be considered comprehensible for humans, especially since it is easy to translate it into verbal statements in the form of natural language (Muggleton et al. 2018). Our approach also heads towards convenience by presenting explanations in different modalities to suit different users, levels of understanding and tasks. Correctness is ensured at least by the deductive step, when explanatory trees are created from previously induced rules.

Conclusion

Motivated by empirical evidence indicating that multimodal explanations are beneficial for understanding, we presented an approach and its implementation that combines verbal, dialogue-based explanations with visual, prototype-based explanations in order to give insights into the reasons for the decisions of a model trained to classify the stage of cancerous colon tissue samples. We applied inductive logic programming to generate this model, an approach that fulfills the requirements of completeness, consistency and comprehensibility by design. In such visually complex domains, near misses, as a further type of example-based explanation (Rabold, Siebers, and Schmid 2021), can be helpful to communicate information about the decision borders between diagnostic categories. Aspects like convenience could be evaluated empirically in the future. Our approach could be further extended for applications in medical education, which is an interesting field for new explanatory techniques (Chan and Zary 2019), for example in histopathological diagnosis (Crowley and Medvedeva 2003). Further empirical investigations shall evaluate the helpfulness of our implementation.

Acknowledgments

The work presented in this paper is funded by grant FKZ 01IS18056 B, BMBF ML-3 Transparent Medical Expert Companion (TraMeExCo), 2018-2021. We thank our project partners from the Fraunhofer IIS (Volker Bruns, Dr. Michaela Benz) and the University Hospital Erlangen (Dr. med. Carol Geppert, Dr. med. Markus Eckstein, and Prof. Dr. Arndt Hartmann, head of the institute of pathology), who provided us with the data our simulated case study is based upon and with the knowledge about assessing primary tumors in colon cancer microscopy scans.

References

Babic, B.; Gerke, S.; Evgeniou, T.; and Cohen, I. G. 2021. Beware explanations from AI in health care. Science 373(6552): 284–286.

Bohanec, M. 2021. From Data and Models to Decision Support Systems: Lessons and Advice for the Future, 191–211. Springer.

Bruckert, S.; Finzel, B.; and Schmid, U. 2020. The Next Generation of Medical Decision Support: A Roadmap Toward Transparent Expert Companions. Frontiers in Artificial Intelligence 3: 75.

Chan, K. S.; and Zary, N. 2019. Applications and challenges of implementing artificial intelligence in medical education: integrative review. JMIR Medical Education 5(1): e13930.

Coderre, S.; Mandin, H.; Harasym, P. H.; and Fick, G. H. 2003. Diagnostic reasoning strategies and diagnostic success. Medical Education 37(8): 695–703.

Crowley, R. S.; and Medvedeva, O. 2003. A general architecture for intelligent tutoring of diagnostic classification problem solving. In AMIA Annual Symposium Proceedings, volume 2003, 185. American Medical Informatics Association.

Finzel, B.; Tafler, D. E.; Scheele, S.; and Schmid, U. 2021. Explanation as a process: user-centric construction of multi-level and multi-modal explanations. In German Conference on Artificial Intelligence (Künstliche Intelligenz) (to be published). Springer.

Gunning, D.; and Aha, D. 2019. DARPA's explainable artificial intelligence (XAI) program. AI Magazine 40(2): 44–58.

Hägele, M.; Seegerer, P.; Lapuschkin, S.; Bockmayr, M.; Samek, W.; Klauschen, F.; Müller, K.-R.; and Binder, A. 2020. Resolving challenges in deep learning-based analyses of histopathological images using explanation methods. Scientific Reports 10(1): 1–12.

He, J.; Baxter, S. L.; Xu, J.; Xu, J.; Zhou, X.; and Zhang, K. 2019. The practical implementation of artificial intelligence technologies in medicine. Nature Medicine 25(1): 30–36.

Holzinger, A.; Langs, G.; Denk, H.; Zatloukal, K.; and Müller, H. 2019. Causability and explainability of artificial intelligence in medicine. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 9(4): e1312.

Holzinger, A.; Malle, B.; Saranti, A.; and Pfeifer, B. 2021. Towards multi-modal causability with Graph Neural Networks enabling information fusion for explainable AI. Information Fusion 71: 28–37.

Kulesza, T.; Burnett, M.; Wong, W.-K.; and Stumpf, S. 2015. Principles of explanatory debugging to personalize interactive machine learning. In Proceedings of the 20th International Conference on Intelligent User Interfaces, 126–137.

Leake, D. B. 1991. Goal-based explanation evaluation. Cognitive Science 15(4): 509–545.

Miller, T. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267: 1–38.

Minda, J. P.; and Smith, J. D. 2001. Prototypes in category learning: the effects of category size, category structure, and stimulus complexity. Journal of Experimental Psychology: Learning, Memory, and Cognition 27(3): 775.

Muggleton, S. H.; Schmid, U.; Zeller, C.; Tamaddoni-Nezhad, A.; and Besold, T. 2018. Ultra-strong machine learning: comprehensibility of programs learned with ILP. Machine Learning 107(7): 1119–1140.

Pierangelo, A.; Manhas, S.; Benali, A.; Fallet, C.; Totobenazara, J.-L.; Antonelli, M. R.; Novikova, T.; Gayet, B.; Martino, A. D.; and Validire, P. 2013. Multispectral Mueller polarimetric imaging detecting residual cancer and cancer regression after neoadjuvant treatment for colorectal carcinomas. Journal of Biomedical Optics 18(4): 1–10. doi:10.1117/1.JBO.18.4.046014.

Rabold, J.; Siebers, M.; and Schmid, U. 2021. Generating contrastive explanations for inductive logic programming based on a near miss approach. Machine Learning. doi:10.1007/s10994-021-06048-w.

Rieger, I.; Kollmann, R.; Finzel, B.; Seuss, D.; and Schmid, U. 2020. Verifying Deep Learning-based Decisions for Facial Expression Recognition. In 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2020, Bruges, Belgium, October 2-4, 2020, 139–144.

Rosch, E. 1987. Wittgenstein and categorization research in cognitive psychology. In Meaning and the Growth of Understanding, 151–166. Springer.

Rudin, C. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1(5): 206–215.

Schallner, L.; Rabold, J.; Scholz, O.; and Schmid, U. 2019. Effect of Superpixel Aggregation on Explanations in LIME - A Case Study with Biological Data. CoRR abs/1910.07856. URL http://arxiv.org/abs/1910.07856.

Schmid, U. 2018. Inductive Programming as Approach to Comprehensible Machine Learning. In DKB/KIK@KI, 4–12.

Schmid, U.; and Finzel, B. 2020. Mutual Explanations for Cooperative Decision Making in Medicine. KI - Künstliche Intelligenz 34(2): 227–233.

Shortliffe, E. 2012. Computer-based Medical Consultations: MYCIN, volume 2. Elsevier.

Sternberg, R. J.; and Horvath, J. A. 1995. A prototype view of expert teaching. Educational Researcher 24(6): 9–17.

Thaler, A. M.; and Schmid, U. 2021. Explaining Machine Learned Relational Concepts in Visual Domains - Effects of Perceived Accuracy on Joint Performance and Trust. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 43.

Tizhoosh, H. R.; and Pantanowitz, L. 2018. Artificial intelligence and digital pathology: challenges and opportunities. Journal of Pathology Informatics 9.

Vasilyeva, N.; Wilkenfeld, D.; and Lombrozo, T. 2015. Goals Affect the Perceived Quality of Explanations. Cognitive Science.

Wittekind, C.; Bootz, F.; and Meyer, H.-J. 2004. Tumoren des Verdauungstraktes. In Wittekind, C.; Bootz, F.; and Meyer, H.-J., eds., TNM Klassifikation maligner Tumoren, International Union Against Cancer, 53–88. Springer.