Exploring Commonalities in Explanation Frameworks: A Multi-Domain Survey Analysis

Eduard Barbu1,*, Marharytha Domnich1, Raul Vicente1, Nikos Sakkas2 and André Morim3

1 Institute of Computer Science, Tartu, Estonia
2 Apintech Ltd, POLIS-21 Group, Limassol, Cyprus
3 LTPlabs, Avenida da Senhora da Hora 459, Porto, Portugal

Abstract
This study presents insights gathered from surveys and discussions with specialists in three domains, aiming to identify essential elements for an explanation framework that could be applied to these and possibly other use cases. The applications analyzed include a medical scenario (involving predictive ML), a retail use case (involving prescriptive ML), and an energy use case (also involving predictive ML). We interviewed professionals from each sector, transcribing their conversations for further analysis. Additionally, experts and non-experts in these fields filled out questionnaires designed to probe various dimensions of explanatory methods. The findings indicate a universal preference for sacrificing a degree of accuracy in favor of greater explainability. We also highlight the significance of feature importance and counterfactual explanations as critical components of such a framework. Our questionnaires are publicly available to facilitate the dissemination of knowledge in the field of XAI.

Keywords
machine learning, expert surveys, explainability framework

1. Introduction and Related Work
This paper explores the role of AI in data-driven decision-making across sectors such as healthcare, retail, and energy, highlighting the challenges posed by the complexity and opacity of ML models. It focuses on improving explanation understandability and trust through a study involving expert and layman feedback on different explanation types. Although the study centers on developing a genetic programming (GP) tool to aid decision-making in these fields, the findings are relevant for any machine learning algorithm.
This strategy enhances user trust and transparency across various ML models, providing applicable insights for AI applications. Research in explainable AI (XAI) aligns AI system explanations with user expectations and needs. Key studies, such as [1], identify the crucial stakeholders in AI explainability and develop a framework to meet their needs. Tools like the System Causability Scale [2] and the System Usability Scale [3] have been introduced to assess ML explanation interfaces and their effectiveness. Furthermore, a novel questionnaire leveraging psychometrics [4] aims to reliably evaluate XAI method explanations, addressing explainability’s complex nature. This body of work underpins our effort to craft AI tools that meet the diverse requirements of professionals in fields such as medicine, retail, and energy, proposing a cross-disciplinary approach to enhance user satisfaction and trust in AI applications. In their literature review, the authors of [5] define five primary goals for AI system interactions with end users: understandability, trustworthiness, transparency, controllability, and fairness. They recommend designing XAI systems to achieve these objectives and suggest guidelines for creating explanations focusing on crucial system components.

Late-breaking work, Demos and Doctoral Consortium, colocated with The 2nd World Conference on eXplainable Artificial Intelligence: July 17–19, 2024, Valletta, Malta
* Corresponding author.
eduard.barbu@ut.ee (E. Barbu); marharyta.domnich@ut.ee (M. Domnich); raulvicente@gmail.com (R. Vicente); sakkas@apintech.com (N. Sakkas); andre.morim@ltplabs.com (A. Morim)
0000-0002-3664-5367 (E. Barbu); 0000-0001-5414-6089 (M. Domnich); 0000-0002-2497-0007 (R. Vicente); 0000-0003-4724-1322 (A. Morim)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Additionally, they highlight the necessity for compromises in AI explanations, underlining the absence of a one-size-fits-all solution.

The paper is organized as follows: we begin with an overview of related work. This is followed by an introduction to the three distinct use cases and their unique characteristics. In Section 3, we elaborate on the methodology employed in conducting the surveys. The paper concludes with a discussion of our findings and presents conclusions, including recommendations for developing a GP tool to support practitioners across the three use cases. The developed questionnaires are publicly available to facilitate the dissemination of knowledge in the field of XAI.

2. The use cases

Medical Scenario
The medical scenario explores GP models for paraganglioma and diabetes, aiming to predict the tumor’s progression and the presence of diabetes. The paraganglioma model seeks to guide physicians on treatment timing, enhancing shared decision-making, optimizing treatments, and reducing unnecessary interventions without substituting clinical judgment. For diabetes, the model uses a well-known dataset [6] to predict whether a patient has diabetes.

Retail use case
Grocery stores use Dynamic Timeslot Pricing to balance customer satisfaction with efficiency in home delivery. They offer flexible delivery times while keeping costs low. This AI-based approach sets fair and clear prices by looking at customer data and delivery logistics to estimate how much customers are willing to pay and the cost to serve them. An algorithm then matches customer preferences with delivery efficiency to find the best times and prices. The method, which sets slot prices using a specific formula (Prescriptive Model), depends on two support models: the Willingness to Pay (WTP) and Cost to Serve (CTS) models.
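The pricing formula itself is not reproduced in this paper, but the mechanism described above, a prescriptive rule driven by a WTP model and a CTS model, can be illustrated with a minimal sketch. Everything below is a hypothetical stand-in: the logistic WTP curve, the per-slot cost lookup, and the expected-margin objective are illustrative assumptions, not the actual models used in the retail use case.

```python
from math import exp

# Illustrative stand-ins for the two support models; the real WTP and CTS
# models in the use case are not specified in the paper.
def wtp_accept_prob(price, wtp_mean=3.0, wtp_scale=1.0):
    """P(customer books the slot at this price), from a toy logistic WTP curve."""
    return 1.0 / (1.0 + exp((price - wtp_mean) / wtp_scale))

def cost_to_serve(slot):
    """Predicted delivery cost per slot (toy lookup table)."""
    return {"mon_am": 2.0, "mon_pm": 1.2, "tue_am": 2.5}[slot]

def price_slot(slot, candidate_prices):
    """Prescriptive rule: choose the candidate price that maximizes the
    expected margin, P(accept at price) * (price - cost to serve)."""
    cts = cost_to_serve(slot)
    return max(candidate_prices, key=lambda p: wtp_accept_prob(p) * (p - cts))

candidates = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
for slot in ("mon_am", "mon_pm", "tue_am"):
    print(slot, "->", price_slot(slot, candidates))
```

Under these toy numbers, the cheapest-to-serve slot receives the lowest price, which is exactly the "matching customer preferences with delivery efficiency" behavior the text describes.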
Energy use case
To recommend savings, the energy use case predicts household energy consumption by analyzing weather, historical usage, building dynamics, pricing, and indoor temperatures. It aims to offer users clear explanations to support informed decisions and to integrate these insights into business strategies for improved energy efficiency. Key considerations include weather conditions, past consumption patterns, building characteristics, pricing strategies for managing demand, and indoor temperature monitoring for energy conservation. The challenge is making these forecasts understandable and actionable, facilitating efficient energy use and decision-making in practical settings.

3. Survey methods

This section outlines the survey methodologies applied to the three investigated use cases. Our approach incorporated two methods: conducting interviews with domain experts and distributing questionnaires to practitioners who may not have expert knowledge. Details of the surveyed experts are available at this link: Interviewed Experts Document. Links to the questionnaires for each use case can be found in the following subsections. Three medical doctors completed the medical use case questionnaires, while the retail questionnaires were filled out by the interviewed expert and six additional respondents. For the energy case, six respondents completed the questionnaires, four of whom were the experts interviewed.

3.1. Survey methods for the Medical Scenario
The questionnaire developed for the medical scenario, which focused on diabetes risk estimation, aimed to explore the type of AI model explanations doctors need. Key areas explored included the trade-off between accuracy and explainability, various presentation formats (such as symbolic regression graphs, genetic programming protocols, SHAP feature importance graphs, coefficient tables, and textual explanations), and their impact on understandability and decision-making effectiveness.
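Among the presentation formats just listed, SHAP feature importance graphs attribute a prediction to individual features via Shapley values. As a self-contained illustration of what such attributions mean (the SHAP library approximates this computation for real models), the sketch below computes exact Shapley values for a toy, hand-weighted diabetes-style risk score; the weights, patient values, and baseline are hypothetical, not taken from the study's models.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance: features absent from a
    coalition take their baseline values; each feature's value is its
    weighted average marginal contribution over all coalitions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = [x[j] if j in S or j == i else baseline[j] for j in range(n)]
                without_i = [x[j] if j in S else baseline[j] for j in range(n)]
                phi[i] += w * (predict(with_i) - predict(without_i))
    return phi

# Toy diabetes-style risk score (hypothetical weights, not a trained model).
# Feature order: [glucose, bmi, age]
def risk(v):
    return 0.02 * v[0] + 0.03 * v[1] + 0.01 * v[2]

patient = [150.0, 32.0, 55.0]
baseline = [100.0, 25.0, 40.0]
print(shapley_values(risk, patient, baseline))
```

By construction the attributions sum to the gap between the patient's risk and the baseline risk, which is the property that makes SHAP-style graphs readable as "how much each feature pushed this prediction".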
Doctors were asked to rate each format’s interpretability and effectiveness on a 1 to 5 scale. Additionally, an interview focusing on the paraganglioma case collected insights on tumor identification, statistical prediction models, genetic factors, training protocols for new doctors, expectations from AI tools in managing paraganglioma, and the specific explanations needed for comprehending this condition. The questionnaire and interview outcomes are intended to guide the development of AI tools that effectively meet doctors’ informational needs and preferences. The questionnaire for the medical scenario can be explored here: Diabetes Questionnaire

3.2. Survey methods for the Retail Use Case
The retail use case questionnaires were designed to delve into several key areas. First, they explored price breakthroughs to gauge the significance of location and demand and how clear the explanations were to customers. Next, the questionnaires sought to identify which types of explanations customers preferred and how well they understood them. Lastly, there was a focus on summarization assessment, evaluating the need for summaries alongside detailed pricing information and how these summaries affected clarity and influenced decision-making. Participants rated explanations on interpretability and effectiveness from 1 (least) to 5 (highest), with the aim of understanding the extent to which explanations helped in decision-making and how clear they were to customers. For this use case, two questionnaires were devised for two categories of users.

1. Decision-makers: Seek a comprehensive understanding of feature contributions to model predictions for system optimization. With their expert background, they prefer detailed, technical explanations to build trust and validate the model’s use based on its accuracy. Decision-Makers Questionnaire
2.
Customers: Favor straightforward, accessible explanations that still convey essential information, aiding in understanding the rationale behind received offers without overwhelming technical detail. Customers Questionnaire

The interview, which was recorded as a video file, explored issues such as finding a balance between accuracy and explainability in e-commerce models, the incorporation of graphs and mathematical formulas into explanations, understanding customer behavior through the dynamic relationship between slot availability and pricing, and designing a dynamic dashboard to manage the interaction between operational efficiency and customer behavior effectively.

3.3. Survey methods for the Energy Use Case
The questionnaire targets operational managers and customers, aiming to identify their preferred formats (tables, charts, interactive graphics, text) and types of explanations (causal, contrastive, counterfactual) for model predictions. Operational managers, the primary audience, must provide detailed feedback based on their expertise. They will focus on how model features affect predictions and on optimization opportunities, so that accurate and complex explanations can enhance their trust and endorsement of the model. In contrast, customers likely prefer simpler, straightforward explanations that clarify the rationale behind offers. The energy questionnaire delves into key areas such as the accuracy-explainability trade-off, the value of explanations in forecasting, the role of what-if scenarios in understanding model outcomes, and the specific needs of facility managers for detailed explanations and visualization tools such as SHAP graphs, highlighting preferences for explanation frequency and detail level. All interviewed experts and five additional energy experts completed the questionnaire. Energy Questionnaire

The interviews explored the energy problem from various angles, each tailored to the interviewee’s expertise. Discussions ranged from addressing market challenges in energy solutions and the importance of clear explanations for end-users to exploring energy consumption disaggregation and the role of genetic programming in enhancing analysis. Insights were also shared on leveraging machine learning for water consumption monitoring to optimize resource management and identify inefficiencies. Additionally, the design and usability of user interfaces for energy management systems were examined, emphasizing the need for intuitive and engaging interfaces to manage energy consumption better.

4. Results

4.1. Medical scenario
Figure 1 summarizes key findings from the diabetes questionnaire. Doctors prefer AI explanations that trade a slight decrease in accuracy for better clarity, find complex graphs challenging, and favor clear, intuitive details such as protocols and SHAP graphs. Simplification and clarity were highlighted as essential for effectively conveying model logic, with counterfactual explanations being particularly valued for their potential to improve patient understanding and therapy compliance.

Figure 1: Insights into doctors’ preferences for the medical scenario derived from the questionnaire.

Feature importance graphs were most favored, followed by textual explanations and rule-based protocols. Graphs and coefficient tables were least preferred due to concerns about understandability. Interview insights highlight the novelty of our paraganglioma models, owing to a lack of benchmarks against which to measure their accuracy, the critical role of genetic data in personalized medicine, and the need for tools to monitor tumor growth. The value doctors place on model predictions for patient communication emphasizes the importance of accurate, explainable models to foster trust and informed decisions. Initial tests on GP models for paraganglioma are documented in [7], providing detailed outcomes.

4.2. Retail use case
The decision-makers seek explanations across various dimensions: customer behavior, transportation costs, and strategies for maximizing profits. The questionnaire findings are summarized in Figure 2.

Figure 2: Insights into online retail decision-makers’ preferences derived from the questionnaire.

In feedback from decision-makers on AI system explanations, there is an openness to sacrificing a portion of model performance for enhanced explainability, with preferences for detailed yet intuitive insights into model workings. This encompasses a broad interest in customer behavior, cost analysis, and profit strategies, highlighting a desire for interactive tools and visualizations that facilitate deeper understanding and strategic adjustments. There is a notable emphasis on practical application, with decision-makers valuing features such as counterfactual explanations and the ability to interpret and act upon complex information, all aimed at optimizing operational efficiency and customer engagement.

The interview highlighted a preference for explainability over accuracy, with caution advised due to limited machine learning expertise. Simple visual explanations and mathematical formulas are preferred to avoid complexity. Graphical dashboards are recommended for assessing operational efficiency and customer behavior, enhancing interpretability and interaction. Counterfactual explanations are valued for demonstrating the impact of decisions such as new scheduling slots. Developing models that identify customer characteristics and behaviors by region is essential for deeper business insights.

4.3. Energy use case
The insights from operational and facility managers are summarized in Figure 3.

Figure 3: The insights from the energy questionnaire completed by operational and facility managers.

Operational managers favor a balance between accuracy and transparency, adjusting the trade-off based on the audience.
They prefer visual and simple mathematical explanations to suit the technical levels of various stakeholders. Graphical dashboards are effective for insights into efficiency and customer behavior, with counterfactual explanations providing useful scenario analysis. Strategic analyses, such as regional behavior modeling and what-if scenarios, highlight the value of feature importance graphs and counterfactuals in delivering clear, actionable insights for decision-making and management. Insights from the interviews demonstrate a preference for explanatory forecasting models over basic ones, with methods applicable across sectors such as gas and energy. Ease of use and interactive elements are advised for the graphical interface, alongside a smartphone component for energy applications to enable notifications. For detailed analyses of GP models in energy, see [8] and [9].

4.4. General guidelines
Table 1 summarizes the overarching guidelines derived from the survey findings.

Table 1: Guidelines and Insights from User Studies on the Explanatory Tool’s Architecture

Domain: All
Insight: Preference for explainability over perfect accuracy, feature importance graphs as effective communication tools, and value of counterfactual explanations.
Recommendation: Balance explainability and accuracy, utilize feature importance graphs, and supplement them with counterfactuals for comprehensive understanding.

Drawing from these insights, the design of the explanatory tool should incorporate two essential modules: a Counterfactual Module, which calculates the minimal changes required to shift the model’s decision towards a desired outcome, thereby enabling "What-if" scenarios based on user queries, and a Global Importance Module, which provides visualization of the significant feature contributions to the model’s predictions, in line with findings from the user studies. Both modules should be integrated within the tool, ensuring that the inputs, outputs, and connections between modules are well-defined.
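As a sketch of what the Counterfactual Module’s core computation might look like, the code below brute-forces the candidate input that flips a classifier’s decision to the desired outcome while changing the fewest features by the smallest relative amount. The classifier, feature grids, and patient values are hypothetical placeholders; a production module would use a dedicated counterfactual search rather than exhaustive enumeration.

```python
from itertools import product

def nearest_counterfactual(predict, x, desired, grids):
    """Search the cross product of per-feature candidate grids for an input
    that the model classifies as `desired`, preferring candidates that edit
    the fewest features and, among ties, have the smallest total relative
    change from the original instance x."""
    best, best_key = None, None
    for cand in product(*grids):
        if predict(list(cand)) != desired:
            continue
        changed = sum(1 for a, b in zip(cand, x) if a != b)
        dist = sum(abs(a - b) / (abs(b) + 1e-9) for a, b in zip(cand, x))
        key = (changed, dist)
        if best_key is None or key < best_key:
            best, best_key = list(cand), key
    return best

# Hypothetical diabetes-style classifier: positive if a weighted score > 1.
def classify(v):  # v = [glucose, bmi]
    return int(0.004 * v[0] + 0.01 * v[1] > 1.0)

patient = [160.0, 38.0]          # currently classified as high risk (1)
grids = [[120.0, 140.0, 160.0],  # candidate glucose values
         [28.0, 33.0, 38.0]]     # candidate BMI values
print(nearest_counterfactual(classify, patient, desired=0, grids=grids))
```

The returned candidate answers the "What-if" query directly: it names the single smallest edit that would move the model’s decision to the desired outcome.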
5. Conclusions
Through comprehensive questionnaires and interviews with domain experts in three distinct use cases, this study identifies foundational components for an XAI framework intended for various applications. The envisioned XAI tool incorporates a Counterfactual Module to facilitate "What-if" scenarios, allowing users to see how minimal changes could lead to desired outcomes. Additionally, a Global Importance Module is designed to visually represent the most influential features in model predictions, resonating with the XAI literature emphasizing the critical role of feature importance and counterfactual explanations. While aiming for shared applicability, the framework also acknowledges the unique requirements of each specific case, although a detailed exploration of these case-specific aspects was beyond this paper’s scope. This approach informs the ongoing development of the AI tool, leveraging insights gathered from user studies to ensure the tool’s effectiveness across different domains. Our tool is now prepared for evaluation by experts across the three fields, and we will integrate their feedback into an updated version. For future research, the interest shown in the online retail and energy sectors in customizable, user-specific explanations points towards a growing trend of integrating NLP interactivity into explanations, an area we are beginning to explore.

Acknowledgments
This research was conducted under the Transparent, Reliable, and Unbiased Smart Tool for AI (Trust-AI) project, Grant Agreement ID 952060, funded by the EU Commission.

References
[1] M. Langer, D. Oster, T. Speith, H. Hermanns, L. Kästner, E. Schmidt, A. Sesing, K. Baum, What do we want from explainable artificial intelligence (XAI)? – A stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research, Artificial Intelligence 296 (2021) 103473. URL: https://www.sciencedirect.com/science/article/pii/S0004370221000242. doi:10.1016/j.artint.2021.103473.
[2] A. Holzinger, A. M. Carrington, H. Müller, Measuring the quality of explanations: The System Causability Scale (SCS). Comparing human and machine explanations, CoRR abs/1912.09024 (2019). URL: http://arxiv.org/abs/1912.09024. arXiv:1912.09024.
[3] M. Dragoni, I. Donadello, C. Eccher, Explainable AI meets persuasiveness: Translating reasoning results into behavioral change advice, Artificial Intelligence in Medicine 105 (2020) 101840. URL: https://www.sciencedirect.com/science/article/pii/S0933365719310140. doi:10.1016/j.artmed.2020.101840.
[4] G. Vilone, L. Longo, Development of a human-centred psychometric test for the evaluation of explanations produced by XAI methods, in: L. Longo (Ed.), Explainable Artificial Intelligence, Springer Nature Switzerland, Cham, 2023, pp. 205–232.
[5] S. Laato, M. Tiainen, A. K. M. Najmul Islam, M. Mäntymäki, How to explain AI systems to end users: A systematic literature review and research agenda, Internet Research 32 (2022) 1–31. doi:10.1108/INTR-08-2021-0600.
[6] J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler, R. S. Johannes, Using the ADAP learning algorithm to forecast the onset of diabetes mellitus, in: Proceedings of the Annual Symposium on Computer Application in Medical Care, 1988, pp. 261–265.
[7] E. M. C. Sijben, J. C. Jansen, P. A. N. Bosman, T. Alderliesten, Function class learning with genetic programming: Towards explainable meta learning for tumor growth functionals, 2024. arXiv:2402.12510.
[8] N. Sakkas, S. Yfanti, P. Shah, N. Sakkas, C. Chaniotakis, C. Daskalakis, E. Barbu, M. Domnich, Explainable approaches for forecasting building electricity consumption, Energies 16 (2023). URL: https://www.mdpi.com/1996-1073/16/20/7210. doi:10.3390/en16207210.
[9] N. Sakkas, S. Yfanti, C. Daskalakis, E. Barbu, M. Domnich, Interpretable forecasting of energy demand in the residential sector, Energies 14 (2021). URL: https://www.mdpi.com/1996-1073/14/20/6568. doi:10.3390/en14206568.