Identifying XAI User Needs: Gaps between Literature and Use Cases in the Financial Sector

Jenia Kim1,*,†, Henry Maathuis1,2,*,†, Kees van Montfort3 and Danielle Sent1,2

1 HU University of Applied Sciences Utrecht, Research Group Artificial Intelligence, Heidelberglaan 15, 3584 CS Utrecht, The Netherlands
2 Jheronimus Academy of Data Science, Tilburg University, Eindhoven University of Technology, St. Janssingel 92, 5211 DA ’s-Hertogenbosch, The Netherlands
3 Amsterdam University of Applied Sciences, Wibautstraat 2-4, 1091 GM Amsterdam, The Netherlands

Abstract
One aspect of a responsible application of Artificial Intelligence (AI) is ensuring that the operation and outputs of an AI system are understandable for non-technical users, who need to consider its recommendations in their decision making. The importance of explainable AI (XAI) is widely acknowledged; however, its practical implementation is not straightforward. In particular, it is still unclear what non-technical users require from explanations, i.e. what makes an explanation meaningful. In this paper, we synthesize insights on meaningful explanations from a literature study and two use cases in the financial sector. We identified 30 components of meaningfulness in the XAI literature. In addition, we report three themes associated with explanation needs that were central to the users in our use cases but are not prominently described in the literature: actionability, coherent narratives, and context. Our results highlight the importance of narrowing the gap between theoretical and applied responsible AI.

Keywords
Explainable AI, Finance, Human-Centered Evaluation

HHAI-WS 2024: Workshops at the Third International Conference on Hybrid Human-Artificial Intelligence (HHAI), June 10–14, 2024, Malmö, Sweden
*Corresponding author. †These authors contributed equally.
jenia.kim@hu.nl (J. Kim); henry.maathuis@hu.nl (H. Maathuis)
ORCID: 0009-0008-5067-4640 (J. Kim); 0009-0002-5542-0478 (H. Maathuis); 0009-0007-2803-5095 (K. van Montfort); 0000-0002-4703-5345 (D. Sent)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

Artificial Intelligence (AI) is increasingly being integrated into the business processes of financial services providers in the Netherlands. This goes hand in hand with awareness within these organizations that AI needs to be implemented responsibly (e.g., [1]). One aspect of the responsible application of AI is ensuring that the outcomes and the internal workings of an AI-based system are understandable for the non-technical employees who interact with it, such as risk underwriters and claim handlers (e.g., [2]). This is important because these employees need to be able to communicate the reasoning behind decisions to customers, for example, to explain why an insurance claim or a loan request was rejected.

While financial services companies acknowledge the importance of explainable AI (XAI), they indicate that its practical implementation is not straightforward. In particular, it is unclear what constitutes a “good” explanation from the point of view of the non-technical user. While the objective quality of an explanation (e.g., correctness, completeness) can be measured by the developers of the AI system, this does not ensure that the explanation is meaningful to the end-user and achieves goals such as understandability, trust, and good decision making.
In the FIN-X project¹, researchers and organizations from the financial sector work together to address this gap. We aim to develop practical guidelines that detail (a) what constitutes a meaningful explanation for a user of an AI system, (b) how to communicate explanations in a meaningful way, and (c) how to evaluate the meaningfulness of explanations. To achieve this goal, we synthesize insights from the literature, legal requirements from the GDPR and the EU AI Act, and requirements from the use cases provided by our industry partners. Using this information, we create prototypes and evaluate them with the intended users.

¹ https://www.internationalhu.com/research/projects/fin-x

In this paper, we report the findings from the first phases of the project. We focus on findings from the literature and the use cases; the legal requirements are out of scope for this paper. We present gaps and insights that emerged from comparing the explanation requirements mentioned in the use cases with those identified in the literature. This highlights the complementary nature of academic research and practical real-world implementations, and the importance of narrowing the gap between theoretical and applied Responsible AI.

2. Literature Study

As part of the FIN-X project, a systematic literature review was performed, focusing on how explanations are evaluated in empirical studies with users. The underlying idea is that the properties researchers choose to evaluate with users are components of what the XAI research community considers a meaningful explanation. The systematic review on aspects of meaningful explanations is currently under review [3]; here, we focus on a subset of its findings.

2.1. Method

In November 2023, we performed a search in five databases (ACM Digital Library, Scopus, Web of Science, IEEE Xplore and PubMed) to find abstracts that contain two elements: (1) a mention of explainable AI or XAI, and (2) a mention of words related to evaluation from a user perspective: meaningful, trustworthy, understandable or interpretable. The query returned 3,103 papers; after deduplication, 1,655 unique papers remained. These papers went through several rounds of filtering, after which 73 papers remained that fulfilled our inclusion criteria: papers that (a) involve an AI-based system with explanations, and (b) report an empirical evaluation of the explanations in a user study.

2.2. Selected insights from the literature study

We systematically collected the properties evaluated in the user studies in our set of papers; we consider these properties to be components of a meaningful explanation. According to the reviewed literature, a meaningful explanation has 30 such properties. We categorized these 30 properties along three dimensions, as also shown in Figure 1 and in the code sketch that follows the list:

• The in-context quality of the explanation (11 components). Is the explanation satisfying, understandable, useful, actionable, sufficient, compact, trustworthy, correct, typical, easy to understand, and easy to use?
• The contribution of the explanation to human-AI interaction (17 components). Does the explanation help the user to better understand the AI system? Does it improve the user’s perception of the AI system as trustworthy, useful, satisfying, competent, honest, benevolent, controllable, predictable, transparent, easy to understand, easy to use, and engaging? Does the explanation help the user to better understand the interaction with the AI system? Does it make the interaction less cognitively demanding? Does it increase the user’s confidence in the decision? Does it increase the readiness to adopt the AI system and use it?
• The contribution of the explanation to human-AI performance (2 components). Does the explanation improve the user’s performance on the task? Does it help the user to discover new insights?

Figure 1: Components of a meaningful explanation (expl=explanation, sys=system)
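As an editorial aid (not part of the reviewed studies), the sketch below encodes the taxonomy above as a plain Python data structure, which could be used, for example, as a checklist when designing a user-study questionnaire. The variable name and the shortened component labels are our own paraphrases of the list; the grouping and counts follow the review.

```python
# Machine-readable sketch of the taxonomy of meaningful-explanation components.
# Editorial addition: labels are lightly paraphrased from the list above;
# the encoding itself is not part of the reviewed studies.
MEANINGFUL_EXPLANATION_COMPONENTS = {
    "in-context quality of the explanation": [
        "satisfying", "understandable", "useful", "actionable", "sufficient",
        "compact", "trustworthy", "correct", "typical",
        "easy to understand", "easy to use",
    ],
    "contribution to human-AI interaction": [
        "understanding of the AI system",
        "perceived trustworthiness", "perceived usefulness", "satisfaction with the system",
        "perceived competence", "perceived honesty", "perceived benevolence",
        "perceived controllability", "perceived predictability", "perceived transparency",
        "system easy to understand", "system easy to use", "engagement",
        "understanding of the interaction", "lower cognitive load",
        "confidence in the decision", "readiness to adopt and use the system",
    ],
    "contribution to human-AI performance": [
        "task performance", "discovery of new insights",
    ],
}

# Sanity check: the review identified 30 components in total.
assert sum(len(v) for v in MEANINGFUL_EXPLANATION_COMPONENTS.values()) == 30
```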
3. Use Cases

As part of the FIN-X project, each collaborating company was asked to submit a use case involving an AI system that is currently used in the organization. Here, we focus on two use cases. Company A offers credit (loans) to businesses. Its AI application estimates the chance of approving a credit request; it outputs a score (approval chance) and a local feature importance explanation, i.e. the three top factors contributing to the score. The AI application of Company B detects the risk of fraud in insurance claims; it outputs a risk score and a local feature importance explanation, i.e. the five top factors contributing to the score. We chose to focus on these two use cases because they are similar in the type of output (a score) and the type of explanation (local feature importance); moreover, in both cases, non-technical employees (risk underwriters at Company A, claim handlers at Company B) make the final decision with the support of the AI system.
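To make this type of output concrete, the sketch below shows how a score and a top-three local feature importance explanation could be produced for a toy linear credit-scoring model. It is a minimal illustration only: the feature names, weights, and values are invented, and the actual models and explanation methods used by the two companies are not described in this paper.

```python
# Illustrative sketch: a toy linear scoring model with a "top-k contributing
# factors" explanation, loosely mirroring the kind of output described for the
# two use cases. All names and numbers below are hypothetical.
from math import exp

# Hypothetical, standardized input features for one credit request.
features = {
    "years_in_business": 1.2,
    "revenue_trend": -0.8,
    "existing_debt_ratio": 1.5,
    "sector_risk": 0.4,
    "payment_history": -0.3,
}

# Hypothetical model weights (e.g., from a logistic regression) and intercept.
weights = {
    "years_in_business": 0.9,
    "revenue_trend": 1.1,
    "existing_debt_ratio": -1.4,
    "sector_risk": -0.5,
    "payment_history": 0.7,
}
bias = 0.2

# Score: a probability-like approval chance from a logistic function.
logit = bias + sum(weights[f] * v for f, v in features.items())
score = 1 / (1 + exp(-logit))

# Local feature importance: per-feature contribution to the logit,
# ranked by absolute magnitude; keep the three top factors.
contributions = {f: weights[f] * v for f, v in features.items()}
top_factors = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:3]

print(f"Approval chance: {score:.2f}")
for name, value in top_factors:
    direction = "raises" if value > 0 else "lowers"
    print(f"- {name} {direction} the score (contribution {value:+.2f})")
```

In practice, such attributions are typically obtained with model-agnostic techniques such as SHAP or LIME rather than read directly from model coefficients; the interviews below concern how users experience this kind of output, not how it is computed.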
3.1. Method

In each company, five stakeholders of the AI system were interviewed; they were indicated by the company as knowledgeable about the use case. In Company A, the interviewees included non-technical users, developers, and management. In Company B, the interviewees included developers and consultants. The interviews were conducted in a semi-structured manner: several predefined questions were asked, but the interviewees were also encouraged to talk freely. The interviews were recorded and transcribed, and the transcriptions were labeled according to themes of interest.

Since we did not want to influence the interviewees’ responses, we did not present them with the results of our literature study, nor did we ask them about the aspects of meaningfulness that we identified in the literature. Aspects mentioned in the interviews were only afterwards compared to those found in the literature. We focused on elements that were mentioned in the interviews but were not prominent in the literature, i.e. needs of users that are overlooked by the research community.²

² The opposite comparison (i.e. which aspects from the literature are not mentioned in the interviews) is not possible in this case, since the fact that a person does not mention something in an open-ended conversation does not mean that they do not find it important.

3.2. Selected insights from the interviews

We focus on three insights that we qualitatively estimated as important and recurring in the interviews. Below, we briefly describe each insight and provide two illustrative quotes; the emphasis in the quotes is ours. For each quote, we mention the role of the interviewee in the company; sometimes the speakers are not users themselves, but they talk from the perspective of non-technical users.

Actionable explanations

Interviewees from both use cases emphasized the importance of the actionability of explanations. It is not enough for the users to understand the explanation and the AI’s recommendation; an explanation is perceived as meaningful if it directs the user towards an action or a next step.

“For me, the explanation of the system is good if it is presented in a clear manner. So, digestible information that I can do something with, so to speak.” (Company A; salesperson, former risk underwriter)

“It’s clear why the model made [a] decision. But the claim investigators are like, okay, but what do you want me to do with this? So the actionability part is also highly requested.” (Company B; ML engineer)

Coherent narratives and scenarios

In the fraud detection use case (Company B), interviewees expressed the need for a coherent narrative that ties the separate indicators together. It is not enough for the users to see the various factors that contribute to the AI’s recommendation; an explanation is perceived as meaningful if, similarly to how humans reason, it constructs a fraud scenario (a story) from the combination of the factors.³

³ The customers mentioned in the quotes are the companies that use the AI application provided by Company B, and particularly the claim handlers, who are the users of the system.

“What I know from the customers and everything [is] that they’re more missing the full scenario of why this hit. Here you’re just presenting different things, but it’s like also trying to figure out how these correlate and what is the full picture of this.” (Company B; consultant)

“I can imagine customers saying, like, we got this sort of scenario, but now it’s reduced to a set of factors in which the scenario is a bit lost.” (Company B; ML engineer)

Additional context

Interviewees from both use cases indicated that the output of the AI system and the explanations are a good starting point for the analysis, but that they are always supplemented by additional (qualitative) information. AI recommendations and explanations are meaningful only in combination with additional context that is provided by the human expert.

“It is a very good starting point for our analysis, the data and the bank statements, but above all, I think, powerfully supplemented with some qualitative data.” (Company A; salesperson, former risk underwriter)

“But we are looking into the overall context. [...] This additional insight, this additional information is, I think, super helpful.” (Company B; consultant)

4. Discussion and Conclusion

We observed that some aspects of a meaningful explanation that are important to users of real-world AI systems are not prominently featured in the current XAI literature. First, the actionability of explanations was one of the most salient points in the interviews, but in the literature study it was found in only one of the 73 included papers.

Second, the need for coherent narratives to be constructed out of the individual factors contributing to the AI recommendation was central in the fraud detection use case, but it was not mentioned in the literature that we reviewed. Notably, this requirement seems to be specific to the fraud detection use case; in this decision-making process, seeing the explanation as separate factors is not meaningful, because the overall narrative (a plausible fraud scenario) is lost. In the credit approval use case, on the other hand, the need for a coherent narrative was not mentioned, probably because the process of approving a credit request is aligned with checking whether individual factors are satisfied.

Third, the role of context, i.e. additional (external, qualitative) knowledge that supplements the AI output, was not directly evaluated in the literature we reviewed. However, this aspect might be related to a variable that is explored in some studies: the level of expertise. Some studies found that explanations are more beneficial for people with higher expertise in the task (e.g., [4]); this could be due to the ability of these users to bring external knowledge into the task, which helps them make sense of the explanations.
To conclude, the literature study showed that a meaningful explanation is a complex, multi-dimensional construct. Comparing the properties discussed in the literature with those mentioned in the use cases, we observed that some aspects that are important to users are not prominent in the current XAI literature. In addition, we observed that some aspects, such as the need for coherent narratives, are use-case specific. Even though we focused on only two use cases, this already provided us with valuable insights that have not yet been described in the literature.

Our findings highlight the importance of conducting studies that evaluate XAI in an application-grounded setup, i.e. evaluation with a real task and the intended expert users of the application [5]. As we report here, user needs can be discovered in a real-world setup that might be overlooked otherwise. Future work will explore how these insights can be translated into actionable evaluation methods to be implemented in user studies.

References

[1] M. Van den Berg, J. Gerlings, J. Kim, Empirical research on ensuring ethical AI in fraud detection of insurance claims: A field study of Dutch insurers, in: European Conference on Artificial Intelligence, Springer, 2023, pp. 106–114.
[2] A. Bertrand, J. R. Eagan, W. Maxwell, Questioning the ability of feature-based explanations to empower non-experts in robo-advised financial decision-making, in: Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, 2023, pp. 943–958.
[3] J. Kim, H. Maathuis, D. Sent, Human-centered evaluation of explainable AI applications: A systematic review (under review).
[4] B. Ghai, Q. V. Liao, Y. Zhang, R. Bellamy, K. Mueller, Explainable active learning (XAL): Toward AI explanations as interfaces for machine teachers, Proceedings of the ACM on Human-Computer Interaction 4 (2021) 1–28.
[5] F. Doshi-Velez, B. Kim, Considerations for evaluation and generalization in interpretable machine learning, Explainable and Interpretable Models in Computer Vision and Machine Learning (2018) 3–17.