<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Twin XCBR System Using Supportive and Contrastive Explanations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Betül Bayrak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kerstin Bach</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Norwegian University of Science and Technology (NTNU)</institution>
          ,
          <addr-line>Høgskoleringen 1, 7034 Trondheim</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Machine learning models are increasingly being applied in safety-critical domains; therefore, ensuring their trustworthiness and reliability has become a priority. Uncertainty is a measure of the lack of trust in these models, and explanation systems designed as twin systems can provide users with insights into model decisions. Case-based reasoning (CBR) is an experience-based problem-solving methodology with applications across various domains. In this work, we propose a novel approach to generate a twin system, specifically a multi-agent CBR system (MA-CBR system), which utilizes feature attribution-based Explainable Artificial Intelligence (XAI) techniques to explain black-box models in multi-class classification tasks. The proposed approach provides contrastive or supportive instance-based explanations, enabling users to interpret model outputs. Furthermore, we introduce an evaluation metric to assess the system's quality based on its supportiveness for the performance of the underlying black-box model, which we measure through a confidence score. To evaluate the performance of our approach, we apply it to three distinct datasets with differing characteristics. Our results demonstrate the effectiveness of the proposed approach in generating explanations for black-box models in multi-class classification tasks.</p>
      </abstract>
      <kwd-group>
        <kwd>Explainable Artificial Intelligence (XAI)</kwd>
        <kwd>Explanation Case-Based Reasoning (XCBR)</kwd>
        <kwd>Model-Agnostic Explanation Generation</kwd>
        <kwd>Twin XAI Systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The relationship between the performance and interpretability of machine learning models,
which are often considered to involve a trade-off between complexity and interpretability, has
long been a topic of discussion. A machine learning model with low interpretability and high opacity
is called a black-box model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Because black-box models offer individuals little insight into how they work, and their
real-world use keeps increasing, interest has grown in XAI systems that help explain the reasons
behind the decisions these models make. As black-box models are increasingly applied in safety-critical domains,
ensuring their trustworthiness and reliability has become a priority, and uncertainty is defined
as a measure of the lack of trust in these models [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Considering the demand for enhancing
understandability and ensuring reliability, XAI systems designed as twin systems are suitable to
meet the requirements for providing insights into model decisions to users. Further information
about twin-XAI systems can be found in Section 2.2.
      </p>
      <p>
        In XAI systems, the dimensions of explanations can be simplified into global and local
explanations. Global explanations offer insights into the overall logic of a model and
encompass the entire reasoning process that leads to all the different possible outcomes. On the
other hand, local explanations focus on understanding the specific reasons behind a particular
decision, such as a single prediction [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Case-based reasoning (CBR) is a problem-solving approach that utilizes past experiences and
has broad applicability in various domains. By leveraging the flexibility and interpretability of
the CBR methodology, eXplanation CBR (XCBR) systems, CBR systems designed to explain a
model, can provide global and/or local explanations. Additionally, these systems are adaptable
to changes in data distribution and can generate trustworthy explanations using a small amount
of data [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>This work proposes a novel approach to generate a twin XAI system. The twin system is
developed as a multi-agent CBR (MA-CBR) system that utilizes feature attributions to explain
multi-class classification black-box models. In the XCBR system, an agent is developed for
each class, and every agent is modeled separately. In the modeling phase, feature attributions
and the data distribution are used. Thereby, this approach projects the different characteristics of
the classes through feature attribution. The proposed approach provides contrastive or supportive
instance-based explanations that enable users to interpret model outputs. Furthermore, an
evaluation metric, rigidity, is introduced to assess the adaptability of the system based on its
supportiveness for the performance of the underlying black-box model.</p>
      <p>To evaluate the performance of our approach, we apply it to three distinct datasets with
difering characteristics. Our results demonstrate the efectiveness of the proposed approach in
generating interpretable explanations for black-box models in multi-class classification tasks.</p>
      <p>The main contributions of this work are as follows:
• Facilitates the incorporation of expert knowledge into the XCBR system, thereby improving
the reliability and trustworthiness of the explanations provided.
• Introduces a multi-agent structure that enables the generation of instance-based explanations
incorporating both locality and globality.
• Proposes an evaluation metric, rigidity, to measure the adaptability of the explanation
system to the black-box model’s performance.
• Provides reproducible benchmarking experiments and an open-source implementation of
the proposed approach and evaluation metric (https://github.com/b-bayrak/Twin_XAI).</p>
      <p>The rest of the paper is structured as follows. Section 2 provides an overview of background
information and related work. In Section 3, details of the proposed approach and evaluation
metric are described. Conducted experiments, use cases, results, discussions, and future work
directions are given in Section 4. Finally, Section 5 presents a brief conclusion of the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>This paper combines the following lines of related research: uncertainty-focused XAI, twin
XAI systems, and CBR for XAI.</p>
      <sec id="sec-2-1">
        <title>2.1. Uncertainty</title>
        <p>
          Uncertainty in black-box models has been an active discussion topic of research in machine
learning and artificial intelligence. Several techniques have been proposed to measure and
represent uncertainty in black-box models, such as Monte Carlo dropout [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and Bayesian
network’s weight patterns [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. However, these techniques often suffer from high computational
costs and limited applicability to different data types. To address this issue, decision-support
XAI systems have been proposed to provide users with a better understanding of the uncertainty
inherent in black-box models. These systems typically generate explanations that highlight the
key features that contribute to the model’s output and provide a measure of the confidence or
uncertainty associated with each prediction [
          <xref ref-type="bibr" rid="ref5 ref6 ref7">6, 5, 7</xref>
          ]. Overall, using XAI systems to represent
uncertainty in black-box models has shown promise in improving user trust and understanding
of the model’s output.
        </p>
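As a loose illustration of how such stochastic uncertainty estimates work (this toy is not the implementation of the cited techniques; the linear model, its weights, and the dropout simulation are all invented for the sketch), repeated stochastic forward passes can be aggregated so that their spread serves as an uncertainty measure:

```python
import random
import statistics

random.seed(0)

# Hypothetical trained weights of a tiny linear model.
WEIGHTS = [0.4, -0.2, 0.7]

def stochastic_predict(x, p_drop=0.5):
    """One forward pass in which each weight is dropped with probability p_drop."""
    kept = [w if random.random() > p_drop else 0.0 for w in WEIGHTS]
    return sum(w * xi for w, xi in zip(kept, x))

def mc_uncertainty(x, n_passes=500):
    """Mean prediction and standard deviation over repeated stochastic passes."""
    outputs = [stochastic_predict(x) for _ in range(n_passes)]
    return statistics.mean(outputs), statistics.stdev(outputs)

mean, std = mc_uncertainty([1.0, 2.0, 3.0])
```

A large standard deviation relative to the mean flags a prediction the user should treat with caution.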
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Twin Explanation Systems</title>
        <p>Considering the demand for enhancing understandability and ensuring reliability, twin XAI
systems are suitable to meet the requirements for providing insights into model decisions to
users.</p>
        <p>
          Twin XAI systems consist of two separate models trained on the same dataset. One of the
models acts as the primary model, the black-box model, and the explainer model provides
explanations for black-box model decisions. The explainer model provides insights into how
the black-box model makes its predictions, thereby providing greater transparency and
accountability, and it can be built in different forms, such as a machine learning algorithm [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], rule-based
model [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], or CBR system [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. CBR Methodology for XAI Systems</title>
        <p>
          CBR has been applied to various domains, including medical diagnosis [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], and financial fraud
detection [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Recently, there has been an emerging interest in developing twin XCBR systems,
which generate instance-based explanations that allow incorporating the effects of both local
and global features. For instance, Bayrak et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] proposed a twin XCBR system that utilizes
feature attributions to explain multi-class classification black-box models, and Ahmed et al.
[
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] showed that a CBR model used as an explainer can perform well enough with the help of
additive models.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Twin XCBR System</title>
      <p>In this section, we comprehensively describe the methodology utilized to construct the proposed
twin XCBR system.</p>
      <sec id="sec-3-1">
        <title>3.1. Multi-agent Structure</title>
        <p>To build the MA-CBR system, which functions as an explainer system as depicted in Figure 1,
both a dataset and a black-box model are required. The input data set may be the same as the
one utilized for training the black-box model, a different data set with a similar concept and
distribution, or a combination of both. In all circumstances, the input samples must be labeled
since all of the samples need to be grouped by their labels (i.e., classes), and each distinct class
necessitates the creation of an agent for the MA-CBR system design. Each agent has a case base
for which the global and local similarity measures are developed independently. Consequently,
this approach allows for the projection of diverse characteristics of distinct classes through
feature attributions and expert knowledge.</p>
        <p>
          SHAP values calculated separately for each class over the black-box model are employed as
weights for the CBR agent’s global similarity measure. To develop the local similarity measures,
a data-driven similarity measure development method is employed [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. In this approach, Verma
et al. proposed an Inter Quartile Range-based polynomial modeling. For both global and local
similarity measure developments, expert knowledge (if available) may be incorporated (refer to
Section 3.2).
        </p>
        <p>Following the development of similarity measures for each case base, data samples (cases) are
classified according to their respective labels and subsequently incorporated into the appropriate
case base. After completing these procedures, the MA-CBR system is deemed ready for querying.</p>
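The construction described in this section can be sketched minimally as follows. The data, the per-class weights (standing in for SHAP values computed over the black-box model), and the simple 1 − |difference| local similarity are all invented for illustration:

```python
# Hypothetical labeled samples (features, label).
DATA = [([0.1, 0.9], "A"), ([0.2, 0.8], "A"), ([0.9, 0.1], "B"), ([0.8, 0.3], "B")]

# Stand-ins for SHAP-derived feature weights, computed per class over the black-box.
CLASS_WEIGHTS = {"A": [0.3, 0.7], "B": [0.8, 0.2]}

def build_agents(data, class_weights):
    """Group the cases by label: one agent (case base plus weights) per class."""
    agents = {}
    for features, label in data:
        agent = agents.setdefault(label, {"cases": [], "weights": class_weights[label]})
        agent["cases"].append(features)
    return agents

def global_similarity(query, case, weights):
    """Weighted average of per-feature local similarities (here simply 1 - |difference|)."""
    local = [1.0 - abs(q - c) for q, c in zip(query, case)]
    return sum(w * s for w, s in zip(weights, local)) / sum(weights)

agents = build_agents(DATA, CLASS_WEIGHTS)
score = global_similarity([0.1, 0.9], agents["A"]["cases"][0], agents["A"]["weights"])
```

Because each agent carries its own weights, the same query can be scored differently by different classes, which is how the class-specific characteristics are projected.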
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Injecting Expert Knowledge</title>
        <p>Under circumstances where expert knowledge is available, different types of domain
knowledge provided by experts can be incorporated into the system. The incorporation of domain
knowledge can be done on two different levels:
• At the data level, it can be integrated into the MA-CBR system in various ways. This
includes the exclusion of existing features, the inclusion of new features, or the
combination and compression of existing features. Additionally, it can be utilized to eliminate
invalid cases using rule-based domain-specific conditions.
• At the similarity measurement level, it can be incorporated into both global and local
similarity measurements. In the global similarity, domain knowledge can be used to
improve the feature importance weights that are calculated from SHAP values and their effects
on different categories. In the local similarity, domain knowledge can help to define
relationships between different categories and the ranges of features.</p>
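The two injection levels above can be illustrated with a small sketch; the validity rule and the expert multipliers below are hypothetical:

```python
def filter_invalid(cases, rule):
    """Data level: drop cases that violate a domain-specific validity rule."""
    return [case for case in cases if rule(case)]

def adjust_weights(shap_weights, expert_factors):
    """Similarity level: rescale SHAP-derived weights by expert factors, renormalize."""
    raw = [w * f for w, f in zip(shap_weights, expert_factors)]
    total = sum(raw)
    return [w / total for w in raw]

cases = [{"age": 25, "score": 10}, {"age": -3, "score": 7}]
valid = filter_invalid(cases, lambda case: case["age"] > 0)  # age must be positive
weights = adjust_weights([0.5, 0.5], [2.0, 1.0])             # expert doubles feature 1
```

Both operations can run at design time or be applied retroactively to an already-deployed system, matching the flexible structure described above.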
        <p>Domain knowledge can be used during the system design process or can be added retroactively,
even after the system has already been put into use. The system’s flexible structure allows for
such additions or modifications.</p>
        <p>Also, expert knowledge can provide additional insights and enable more transparent and
interpretable decision-making processes, thereby incorporating expert knowledge into the
explanation system plays a crucial role in ensuring compliance with the General Data
Protection Regulation (GDPR) in terms of addressing concerns around bias and meeting regulatory
requirements related to data protection and privacy.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Explanation Generation and Representation</title>
        <p>After building the MA-CBR system, this section describes how the explanations are
generated and represented. As shown in Figure 2, the system's input is an explanation case, e.
An explanation case consists of a data sample x and its prediction result y generated by the
black-box model, where y = f(x) and e = (x, y).</p>
        <p>When a new explanation case, e, arrives at the explainer, a new query is conducted by all
agents. Each agent returns its most similar cases with their corresponding similarity scores.
The query result with the highest similarity score, r, is selected as the explanation source and
used for comparison. The similarity score of the selected query result is used as the confidence
score of the explanation. A class comparison is made between the selected explanation source
and e to see whether they are identical. If they are identical, this indicates agreement between the
CBR system's result and the black-box model, and the winning query result, r, is provided as a
supportive explanation with a confidence score. If they are not the same, r is provided as a
contrastive explanation with a confidence score.</p>
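The query flow described above can be sketched as follows (a toy case base with one case per agent; the equal weights and the 1 − |difference| similarity are assumptions):

```python
AGENTS = {
    "A": {"cases": [[0.1, 0.9]], "weights": [0.5, 0.5]},
    "B": {"cases": [[0.9, 0.1]], "weights": [0.5, 0.5]},
}

def weighted_similarity(query, case, weights):
    """Weighted average of per-feature local similarities (1 - |difference|)."""
    local = [1.0 - abs(q - c) for q, c in zip(query, case)]
    return sum(w * s for w, s in zip(weights, local)) / sum(weights)

def explain(explanation_case, agents):
    """Query every agent, keep the overall best case, and label the explanation."""
    sample, predicted_class = explanation_case
    best_class, best_case, best_score = None, None, -1.0
    for label, agent in agents.items():
        for case in agent["cases"]:
            score = weighted_similarity(sample, case, agent["weights"])
            if score > best_score:
                best_class, best_case, best_score = label, case, score
    kind = "supportive" if best_class == predicted_class else "contrastive"
    return {"explanation": best_case, "confidence": best_score, "type": kind}

# The black-box predicted "B", but the most similar case belongs to class "A".
result = explain(([0.15, 0.85], "B"), AGENTS)
```

The winning case's similarity score doubles as the confidence score, and the class comparison decides whether the explanation is supportive or contrastive.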
        <p>In both circumstances, the proposed system provides a clear and informative output
consisting of the explanation case, e, the winning query result, r, and a confidence score that
reflects the level of agreement or disagreement with the black-box model. Moreover, the system
presents the attributes sorted by feature importance and difference. The inclusion
of both e and r, together with the attribute order, allows users to reason semantically between them
and gain a comprehensive understanding of the prediction process.</p>
        <p>Considering the demand for enhancing understandability and ensuring reliability, the
proposed supportive/contrastive explanations are suitable to meet the requirements for providing
insights into model decisions to users.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Evaluation Technique</title>
        <p>Assume that D contains n data samples d_1, ..., d_n, and that P represents the set of
prediction results made by the black-box model, where p_i = f(d_i) for i = 1, ..., n. Let D_supp
be the subset of P consisting of the predictions supported by the explainer. The support score,
support, measures the ratio of the black-box model's decisions that are supported by the
explanation system.</p>
        <p>support = |D_supp| / |P| (1)</p>
        <p>The expected behavior of the explainer is that the support score is approximately
proportional to the black-box model's accuracy. Here, accuracy denotes the accuracy score of the
black-box model, calculated by comparing P against the ground truth. For example, if the black-box
model is a high-performance model, a high support score is expected, because the idea of the
explanation is to support the black-box model's prediction with a supportive instance when it is
correct.</p>
        <p>The proposed evaluation metric, rigidity, measures the adaptability of the explainer to
the black-box model's performance.</p>
        <p>rigidity = |1 − support / accuracy| (2)</p>
        <p>A low rigidity score indicates better alignment of the explainer with the black-box model:
the lower the value, the better.</p>
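The two scores can be written directly in code. Equation (2) is read here as the absolute distance of the support/accuracy ratio from 1, a reading that reproduces the rigidity values reported in Section 4 for use-cases 1 and 2:

```python
def support_score(n_supported, n_predictions):
    """Ratio of black-box decisions supported by the explainer (Equation 1)."""
    return n_supported / n_predictions

def rigidity(support, accuracy):
    """Distance of the support/accuracy ratio from 1 (Equation 2)."""
    return abs(1.0 - support / accuracy)

# Values from use-case 1 (Section 4.1.1): 10 of 21 decisions supported, accuracy 0.24.
uc1_support = support_score(10, 21)
uc1_rigidity = rigidity(uc1_support, 0.24)
```

For use-case 1 this yields a support score of about 0.476 and a rigidity of about 0.984, matching the reported results.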
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>In order to ensure that our approach is applicable to different domains and to evaluate the
effectiveness of our proposed approach, we conducted experiments on three datasets that were
carefully selected to represent a variety of characteristics.</p>
      <sec id="sec-4-1">
        <title>4.1. Use Cases</title>
        <p>In the use cases, a standardized approach was employed. The datasets were partitioned into
training and testing sets to ensure a balanced representation across classes. The cases were then
constructed using the training data and imported into the relevant case bases. Local similarity
measures were set using the calculated IQRs of attributes as explained in Section 3.1 and the
global similarity measures of the case bases were set using calculated SHAP values for each
class separately. With the explainer system in place, explanation cases were provided as inputs,
and supportive or contrastive explanations were generated as shown in Figure 1. This systematic
process ensured consistency and facilitated the evaluation of the proposed approach.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Use-case 1: Depression Screening Dataset</title>
          <p>
            The dataset[
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] used in this use-case was collected to measure the level of depression among
undergraduate students. The participants consisted of undergraduate students from Tecnológico
Nacional de México (TecNM)/Instituto Tecnológico de Mérida (ITM) between May 2020 and
December 2020, ranging in age from 17 to 23 years. All relevant guidelines and regulations
were adhered to, and the students provided their consent to participate in the study. As part of
the study, the students were required to complete a 102-item questionnaire, where a response
of "true" was assigned a value of 1 and "false" a value of 0.
          </p>
          <p>
            This use case was presented in a preliminary study of this paper. The dataset and
domain knowledge were provided as part of the XCBR challenge track at the 2022 International
Conference on Case-Based Reasoning (ICCBR-2022), and details of the preliminary experiments
can be found in [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ].
          </p>
          <p>The dataset consists of 105 samples and is designed for a 3-class classification model of
depression levels, with the three possible classes indicating severity levels with varying numbers
of samples per class. Domain knowledge is provided as "expected answers" that represent the
expected responses from an individual with depression. Domain knowledge is incorporated
into the explanation system as a new attribute called "Matched", which represents the number
of overlapping items between actual and expected answers. For the experiments, 20% of the
dataset is used as test data, while the remaining 80% (train data) and its oversampled version
are used to train the black-box models and build the explanation systems.</p>
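One plausible reading of the "Matched" attribute (the exact matching rule is not spelled out above, and the answer vectors below are invented) is a per-item equality count against the expert-provided expected answers:

```python
def matched(actual_answers, expected_answers):
    """Count the questionnaire items where a student's answer equals the expected one."""
    return sum(1 for a, e in zip(actual_answers, expected_answers) if a == e)

expected = [1, 1, 0, 1, 0]   # hypothetical expert profile of a depressed individual
student = [1, 0, 0, 1, 1]
m = matched(student, expected)
```

The resulting count is appended to each case as a new attribute, which is how the domain knowledge enters the explanation system at the data level.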
          <p>We conducted three different experiments for this use case. The first experiment compared
the performance of CBR systems with that of black-box models using raw and oversampled
data, and the second experiment compared the performance of CBR systems with and without
domain knowledge. With the first and second experiments, the performance of the CBR system
improved from an accuracy score of 0.29 to 0.52 using raw data with domain knowledge. In the
third experiment, the cases were constructed using the train data and the calculated "Matched"
attribute (domain knowledge), and the global similarity measures of the case bases were set
using SHAP values calculated for each class separately for the MA-CBR system. In this
experiment, the test set comprised 21 instances, and the built black-box model, an MLP,
achieved an accuracy score of 0.24. Explanations were generated for each instance in the test
set, and the system supported 10 of the 21 decisions made by the black-box model. The rigidity
of the explanation system was calculated to be 0.984.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Use-case 2: SelfBack App Usage Prediction Dataset</title>
          <p>
            The SelfBack project developed a decision support system to improve the self-management of
nonspecific low back pain [
            <xref ref-type="bibr" rid="ref17">17</xref>
            ]. As part of the project, a mobile application was developed to convey
personalized self-management content through exercises and educational content. The App
Usage dataset is derived from the collected data, and in this use case, we predict the usage
of the mobile application given the answers to a baseline questionnaire that characterizes a
user’s situation with regard to their current episode of back pain. The underlying question
in this use case is whether the suggested intervention would suit a user. Identifying patients
with lower or moderate app engagement enables personalized reminders, interactive exercises,
and educational content to engage them actively. These interventions aim to enhance patient
involvement, treatment adherence, and overall outcomes.
          </p>
          <p>The App Usage dataset contains data from 230 users and 26 continuous and nominal features
derived from users’ input (baseline questionnaire). Data instances were labeled using 3 classes
indicating levels of app usage with varying numbers of samples per class. This use-case applies
real-world data, and domain knowledge was incorporated to model the local similarity measures.
For the experiments, 30% of the dataset was used as test data, while the remaining 70% (train
data) was used to train the black-box models and build the explanation systems.</p>
          <p>The test set comprised 69 instances, and the built black-box model (an MLP) achieved an
accuracy score of 0.58. Explanations were generated for each instance in the test set, and the
system supported 48 decisions out of the 69 decisions made by the black-box model. The rigidity
of the explanation system was calculated to be 0.199.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. Use-case 3: Wine Quality Dataset</title>
          <p>
            The dataset used in this use-case is related to red variants of the Portuguese "Vinho Verde" wine
[
            <xref ref-type="bibr" rid="ref18">18</xref>
            ]. It contains 4898 rows and 12 features, with ordered and unbalanced classes. There is no
domain knowledge incorporated in this use-case. For the experiments, 30% of the dataset is
used as test data, while the remaining 70% (train data) is used to train the black-box models and
build the explanation systems.
          </p>
          <p>The test set consisted of 1950 instances, and for this use-case, we trained five models: K
Nearest Neighbors Classifier, MLP, Decision Tree Classifier, Gradient Boosting Classifier, and
Random Forest Classifier. They achieved accuracy scores of 0.749, 0.767, 0.776, 0.839, and 0.848,
respectively. The same procedure was applied to all models using the same train-test set split
(refer to Figure 3).</p>
          <p>To compare with other use-cases, we selected the MLP model, which achieved an accuracy
score of approximately 0.77. The system supported 1930 decisions out of the 1950 decisions
made by the black-box model. The rigidity of the explanation system was calculated to be
0.2197.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Discussion</title>
        <p>This paper aims to propose a novel approach to generate a twin XAI system and to apply the
approach to diverse datasets to assess its robustness and generalizability across different
domains. In the presented use cases, MLP models were trained as black-box models and used to
build the twin systems without refining them. Some of the models performed with very low
accuracy, while others had comparably better accuracy scores (see Table 1). In addition, as
shown in Figure 3, use-case 3 applies the same setup to five models with different performances.
These experiments allowed us to demonstrate the applicability of the system. The approach is
applied to two data sets with a very limited number of samples (use-cases 1 and 2) and one
dataset with a large number of samples (use-case 3), and all data sets are unbalanced. In
use-case 1, the first experiment showed that the built CBR models perform comparably better
than the built black-box models. As a result, we showed that the proposed approach is applicable
to unbalanced data sets and performs considerably well with data sets of different sizes. This is
an essential feature of real-world explanation systems.</p>
        <p>As mentioned in Section 3.2, expert knowledge can be incorporated into the system in
different ways. For example, in use-case 1, the provided expert knowledge is incorporated as a
new attribute. In use-case 2, it was used to model the similarities, while no expert knowledge was
used in use-case 3. In all cases, the explanation system was built successfully. The contribution
of domain knowledge is clearly shown in the second experiment of use-case 1 (see Section 4.1.1).</p>
        <p>As stated above, the wine quality (use-case 3), app usage (use-case 2), and depression
screening (use-case 1) applications are built on black-box models with accuracy scores of 0.77,
0.58, and 0.24, respectively. Their support scores are calculated as 0.9897, 0.696, and 0.476,
and their rigidity scores as 0.2197, 0.199, and 0.984, respectively. As the results show, when
the accuracy score decreases, the support score also decreases, as expected in an adaptable
explanation system. A low rigidity score indicates better performance, so in a perfect explanation
system, the rigidity score would be 0. In our experiments, in line with the expected behavior of the
explainer, the rigidity scores are approximately proportional to the black-box models' accuracies.
However, use-case 3, with a 0.77 accuracy score and a 0.9897 support score, performs worse than
the other two in terms of rigidity: 1930 of the 1950 decisions made by the black-box model are
supported, yet the performance of the black-box model is not perfect, so a lower number of
supported decisions would be expected.</p>
        <p>As mentioned before, we expect a flexible system’s rigidity score to approach 0. In use-case
3, we anticipated an approximately linear relationship between accuracy and support scores.
Figure 3 demonstrates that, in most cases, as the accuracy score increases, the support score also
increases accordingly, with the exception of the MLP model. In the case of the MLP model, the
support score approaches 1 while the accuracy score is around 0.77. This observation suggests
that the MLP model and the constructed MA-CBR system explainer exhibit similar behaviors,
indicating a certain level of dependence on the explainer’s performance. This dependency
presents a challenge that should be addressed in future work.</p>
        <p>
          In the use cases, data sets with different data types and characteristics were used. However,
due to the nature of the tools used, only tabular data was supported. For future work, this approach
can be extended to other kinds of data, such as images, using an approach similar to Barnett
et al.’s work [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Another area for improvement is the representation of the explanations and
the measurement of the effect of the explanations on users through a user study.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In conclusion, we have proposed a novel approach to generate a twin XAI system that utilizes
feature attributions to explain multi-class classification black-box models. Our approach employs
an MA-CBR system, where an agent is developed for each class, modeling their similarity measures
separately. By projecting the different characteristics of the classes through feature attribution,
our approach provides contrastive or supportive instance-based explanations that enable users
to interpret model outputs. Moreover, we introduced an evaluation metric, rigidity, to assess
the system's quality based on its supportiveness for the performance of the underlying
black-box model. Through experiments on three distinct datasets with differing characteristics, we
demonstrated the effectiveness and applicability of our approach in generating explanations for
black-box models in multi-class classification tasks. Our work also facilitates the incorporation
of expert knowledge into the XCBR system, improving the reliability and trustworthiness of the
explanations provided, and provides reproducible benchmarking experiments and an open-source
implementation of the proposed approach and evaluation metric. While our approach currently
supports only tabular data, it can be extended to other data types, such as images, using approaches
similar to previous works. Our explanation system can be useful in various domains, including
healthcare, finance, and law, where XAI is essential. Our proposed approach contributes to
the growing research on explainable AI and can provide valuable insights for stakeholders in
various domains.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Monreale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruggieri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Turini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <article-title>A survey of methods for explaining black box models</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>51</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Talbert</surname>
          </string-name>
          ,
          <article-title>Using explainable AI to measure feature contribution to uncertainty</article-title>
          , in: The
          <source>International FLAIRS Conference Proceedings</source>
          , volume
          <volume>35</volume>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bayrak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <article-title>When to Explain? Model Agnostic Explanation Using a Case-based Approach and Counterfactuals</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Rutle</surname>
          </string-name>
          (Ed.),
          <source>Norsk IKT-konferanse for forskning og utdanning, 1</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Monte-carlo dropout based uncertainty analysis in input attributions of multivariate temporal neural networks</article-title>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bykov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. M.-C.</given-names>
            <surname>Höhne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-R.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nakajima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kloft</surname>
          </string-name>
          ,
          <article-title>How much can I trust you? Quantifying uncertainties in explaining neural networks</article-title>
          ,
          <source>arXiv preprint arXiv:2006.09000</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Dubey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lyons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Santra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <article-title>XAI-BayesHAR: A novel framework for human activity recognition with integrated uncertainty and Shapely values</article-title>
          ,
          <source>arXiv preprint arXiv:2211.03451</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kahai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Who needs explanation and when? Juggling explainable AI and user epistemic uncertainty</article-title>
          ,
          <source>International Journal of Human-Computer Studies</source>
          <volume>165</volume>
          (
          <year>2022</year>
          )
          <fpage>102839</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Silva-Aravena</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Núñez Delafuente</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Gutiérrez-Bahamondes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Morales</surname>
          </string-name>
          ,
          <article-title>A hybrid algorithm of ML and XAI to prevent breast cancer: A strategy to support decision making</article-title>
          ,
          <source>Cancers</source>
          <volume>15</volume>
          (
          <year>2023</year>
          )
          <fpage>2443</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Macha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kozielski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ł.</given-names>
            <surname>Wróbel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sikora</surname>
          </string-name>
          ,
          <article-title>RuleXAI - a package for rule-based explanations of machine learning model</article-title>
          ,
          <source>SoftwareX</source>
          <volume>20</volume>
          (
          <year>2022</year>
          )
          <fpage>101209</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Kenny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Keane</surname>
          </string-name>
          ,
          <article-title>Explaining deep learning using examples: Optimal feature weighting methods for twin systems using post-hoc, explanation-by-example in XAI</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>233</volume>
          (
          <year>2021</year>
          )
          <fpage>107530</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mehrotra</surname>
          </string-name>
          ,
          <article-title>Two-stage CBR based healthcare model to diagnose liver disease</article-title>
          ,
          <source>International Journal of Computing and Digital Systems</source>
          <volume>10</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Eweoya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Adebiyi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Azeta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Azeta</surname>
          </string-name>
          ,
          <article-title>Fraud prediction in bank loan administration using decision tree</article-title>
          ,
          in:
          <source>Journal of Physics: Conference Series</source>
          , volume
          <volume>1299</volume>
          , IOP Publishing,
          <year>2019</year>
          , p.
          <fpage>012037</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Bayrak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Marin Veites</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <article-title>Explaining your neighbourhood: A CBR approach for explaining black-box models</article-title>
          (
          <year>2022</year>
          )
          <fpage>251</fpage>
          -
          <lpage>255</lpage>
          . URL: http://ceur-ws.org/Vol-994.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M. U.</given-names>
            <surname>Ahmed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Barua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Begum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. O.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <article-title>When a CBR in hand is better than twins in the bush</article-title>
          ,
          <source>arXiv preprint arXiv:2305.05111</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Mork</surname>
          </string-name>
          ,
          <article-title>Similarity measure development for case-based reasoning - a data-driven approach</article-title>
          , in:
          <source>Symposium of the Norwegian AI Society</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Orozco-del Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. C.</given-names>
            <surname>Orozco-del Castillo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Brito-Borges</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bermejo-Sabbagh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cuevas-Cuevas</surname>
          </string-name>
          ,
          <article-title>An artificial neural network for depression screening and questionnaire refinement in undergraduate students</article-title>
          , in:
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Mata-Rivera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zagal-Flores</surname>
          </string-name>
          (Eds.),
          <source>Telematics and Computing</source>
          , Springer International Publishing, Cham,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>L. F.</given-names>
            <surname>Sandal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. K.</given-names>
            <surname>Øverås</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Svendsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Dalager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S. D.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kongsvold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Nordstoga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Bardal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Ashikhmin</surname>
          </string-name>
          , et al.,
          <article-title>Effectiveness of app-delivered, tailored self-management support for adults with lower back pain-related disability: a selfBACK randomized clinical trial</article-title>
          ,
          <source>JAMA Internal Medicine</source>
          <volume>181</volume>
          (
          <year>2021</year>
          )
          <fpage>1288</fpage>
          -
          <lpage>1296</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cortez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cerdeira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Matos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Reis</surname>
          </string-name>
          ,
          <article-title>Modeling wine preferences by data mining from physicochemical properties</article-title>
          ,
          <source>Decision Support Systems</source>
          <volume>47</volume>
          (
          <year>2009</year>
          )
          <fpage>547</fpage>
          -
          <lpage>553</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A. J.</given-names>
            <surname>Barnett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. R.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <article-title>A case-based interpretable deep learning model for classification of mass lesions in digital mammography</article-title>
          ,
          <source>Nature Machine Intelligence</source>
          <volume>3</volume>
          (
          <year>2021</year>
          )
          <fpage>1061</fpage>
          -
          <lpage>1070</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>