<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Beyond Single-model XAI: Aggregating Multi-model Explanations for Enhanced Trustworthiness</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ilaria Vascotto</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alex Rodriguez</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Bonaita</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Bortolussi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Assicurazioni Generali Spa</institution>
          ,
          <addr-line>Milan</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Department of Mathematics, Informatics and Geosciences</institution>
          ,
          <institution>University of Trieste</institution>
          ,
          <addr-line>Trieste</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>The Abdus Salam International Center for Theoretical Physics</institution>
          ,
          <addr-line>Trieste</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>The use of Artificial Intelligence (AI) models in real-world and high-risk applications has intensified the discussion about their trustworthiness and ethical usage, from both a technical and a legislative perspective. The field of eXplainable Artificial Intelligence (XAI) addresses this challenge by proposing explanations that bring to light the decision-making processes of complex black-box models. Despite being an essential property, the robustness of explanations is often an overlooked aspect during development: only robust explanation methods can increase the trust in the system as a whole. This paper investigates the role of robustness through the usage of a feature importance aggregation derived from multiple models (k-nearest neighbours, random forests and neural networks). Preliminary results showcase the potential of increasing the trustworthiness of the application while leveraging multiple models' predictive power.</p>
      </abstract>
      <kwd-group>
<kwd>Feature importances</kwd>
        <kwd>Aggregation</kwd>
        <kwd>Explanations</kwd>
        <kwd>Tabular data</kwd>
        <kwd>Classification</kwd>
        <kwd>XAI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
The role of Machine Learning (ML) and Artificial Intelligence (AI) has become increasingly prominent in
recent years. The widespread use of these tools, especially in high-risk applications [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
], has contributed
to a growing discussion in the field over their ethical and fair usage. Despite being highly accurate and
capable of dealing with complex problems, these models lack transparency, an essential property for
truly trusting the systems in use. Recent legislation, such as the GDPR [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and the AI Act [
        <xref ref-type="bibr" rid="ref4">4</xref>
] in
the European Union, has stressed the need for model transparency and the relevance of explanations
to aid both end-users and practitioners in understanding the mechanics of black-box models. In light
of these considerations, the field of eXplainable Artificial Intelligence (XAI) proposes a wide range of
explanation methods that aim at opening the black box and clarifying the decision-making process of the
models. While multiple approaches have been proposed in recent years, with the aim of explaining
the predictions either locally (for a single datapoint) or globally (for the whole model), the fundamental
role of explanation robustness is often left unconsidered. Robustness can be broadly defined as the
ability of an explanation method (or explainer) to propose similar explanations for similar inputs. Its role
is of utmost importance to ensure that the explanations can be trusted and, consequently, to increase
the trustworthiness of the system. Only by trusting the given explanations is it possible to build trust
in the model itself. Crucially, popular techniques such as LIME [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and SHAP [
        <xref ref-type="bibr" rid="ref6">6</xref>
] have been proven to
be non-robust in multiple works [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ] but are still being used extensively even in practical high-risk
applications. An additional issue that may arise in the field is the disagreement problem [
        <xref ref-type="bibr" rid="ref9">9</xref>
], that is,
the cases in which multiple explanation methods are applied to the same instance and contrasting
explanations are retrieved. In such cases, the lack of a ground truth for explanations makes it virtually
impossible to choose one method over the others, effectively damaging the positive applications of XAI.
      </p>
      <p>
In light of these considerations, we aim to tackle two research questions on explanation
trustworthiness. The first research direction posits that the disagreement problem may be mitigated by considering
an aggregation of multiple explanations. The second research direction instead aims at increasing
the system’s trustworthiness by computing a local robustness score to determine whether an explanation
can be trusted. Previous attempts at answering these questions were made in [
        <xref ref-type="bibr" rid="ref10 ref14">10</xref>
        ], where a robustness
analysis of feature importance methods was proposed alongside an ensemble approach. The focus was
on tabular datasets, classification problems and neural network models (NNs). Trustworthiness was
investigated with respect to a robustness estimator computed on a neighbourhood of the datapoint,
constructed by leveraging the manifold hypothesis. An aggregation of approaches tailored to NNs
[
        <xref ref-type="bibr" rid="ref11 ref12 ref13">11, 12, 13</xref>
        ] was introduced and it was shown that multiple neural networks - with varying architectures
but similar accuracy - were able to detect when a datapoint would lie in a robust (explanation-wise)
area of the dataspace and, in that case, would also agree on the prediction.
      </p>
      <p>
Taking inspiration from these results, we propose a natural generalization of the approach to multiple
classes of models, aiming at answering both research questions. In particular, we investigated whether an
ensemble approach could be used even in cases in which multiple decision-making processes are
considered at once, and tested the trustworthiness of the derived explanations. We have focused on
k-nearest neighbours (k-NNs), random forests (RFs) and NNs. The choice of these three models was
governed by the differences in how the models make a prediction: k-NN is a distance-based
method and RF an ensemble of rule-based learners, both differing significantly from NNs in
terms of complexity, accuracy and inherent interpretability. As in [
        <xref ref-type="bibr" rid="ref10 ref14">10</xref>
], the class of explanations which
was selected is that of feature attributions, as they offer significant and interpretable results even to
less experienced users. A contribution in this area is the introduction of two new feature attribution
approaches for the k-NN and the RF, as both models tend to prefer explanations of other forms.
      </p>
<p>The remainder of this paper is structured as follows: Section 2 presents the proposed methodology,
Section 3 the preliminary results on tabular datasets and binary classification tasks, and Section 4 future
research directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology</title>
      <sec id="sec-2-1">
        <title>2.1. Explaining k-Nearest Neighbours</title>
<p>Despite being considered an interpretable method, k-NNs still lack a proper feature importance approach
to explain their decisional process. In particular, while it is easy to understand the distance-based
reasoning when predicting a new datapoint’s class, it is not as simple to identify which features were
the most influential for the given prediction.</p>
<p>Leveraging its inner mechanisms, we have developed a new feature importance approach to locally
explain the decisional process of k-NNs. Having selected a value for the hyperparameter k, a new
datapoint is classified based on the most frequent class encountered among its k nearest
neighbours within the training set. The difference between the two possible classes can mainly be
attributed to the features that exhibit larger distances between the two sets of corresponding points.
This reasoning can be further extended when considering a point for which all the k nearest neighbours are associated
with the same class. Locally, there are no distinctions to be emphasised, but the classification can be
imputed to the fact that the datapoint is closer to those of the predicted class rather than the opposing
one. This concept aligns with the reasoning of k-NN on a greater scale: datapoints are classified based
on the most similar class, and the similarity is defined with distance-based metrics.</p>
<p>Our proposal makes use of these characteristics and produces a feature-importance explanation as
described in Algorithm 1. Consider two sets of datapoints, namely K_c and K_¬c, which are comprised
of the k nearest neighbours selected, respectively, among the training set’s datapoints belonging to
class c and those of the other class ¬c. The average feature-wise distance between each set of points
and the datapoint of interest x is computed and stored as d_c and d_¬c. Assuming x is predicted to
belong to class c, the explanation is proposed as the feature-wise difference e = d_¬c − d_c. To ensure
comparability, the resulting vector is normalized to have norm one.</p>
<p>Algorithm 1: k-NN explanation, where x is the input point and f is the k-nearest neighbours model.</p>
        <preformat>
function average_distance(x, K)
    d ← 1/|K| · Σ x′∈K dist(x, x′)        ◁ Average feature-wise distance
    return d
end function

function explain_knn(x, f)
    c ← f(x)                              ◁ Class predicted by the k-NN model
    K_c ← k-nn(f, x | c)                  ◁ k nearest neighbours belonging to class c
    K_¬c ← k-nn(f, x | ¬c)                ◁ k nearest neighbours belonging to class ¬c
    d_c ← average_distance(x, K_c)
    d_¬c ← average_distance(x, K_¬c)
    e ← d_¬c − d_c
    e ← e / ||e||_2                       ◁ Normalize to have norm 1
    return e
end function
        </preformat>
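<p>As an illustration, the following is a minimal NumPy sketch of Algorithm 1. It assumes a binary task, Euclidean distances for neighbour selection and absolute feature-wise differences; the function names and signatures are ours, not those of an existing library.</p>
        <preformat>
import numpy as np

def explain_knn(x, X_train, y_train, k, predicted_class):
    """Sketch of Algorithm 1: feature importances for a k-NN classifier."""
    same = X_train[y_train == predicted_class]      # training points of class c
    other = X_train[y_train != predicted_class]     # training points of class not-c

    def avg_feature_distance(points):
        # k nearest neighbours of x within `points`, then the average
        # feature-wise absolute distance between x and those neighbours.
        idx = np.argsort(np.linalg.norm(points - x, axis=1))[:k]
        return np.abs(points[idx] - x).mean(axis=0)

    d_same = avg_feature_distance(same)             # d_c
    d_other = avg_feature_distance(other)           # d_not-c
    e = d_other - d_same                            # e = d_not-c - d_c
    return e / np.linalg.norm(e)                    # normalize to unit norm
        </preformat>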
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Explaining Random Forests</title>
        <p>Random Forest explanations are usually proposed in the form of single decision trees or sets of rules,
to align with the binary decisions that occur at each node split. While explanations of this form can
be useful, it is hard to compare them to feature attributions derived from other classes of models. For
this reason, we have developed a new feature attribution method for random forests (presented in
Algorithm 2), based on the computed node impurity at each node of the traversed paths in the forest.
</p>
        <p>Algorithm 2: RF explanation, where x is the input point and F is the random forest model.</p>
        <preformat>
function sum_node_impurity(path, e)
    for node ∈ path do
        j ← node.feature
        e[j] ← e[j] + node.impurity
    end for
    return e
end function

function explain_rf(x, F)
    [p_c, p_¬c] ← F(x)                     ◁ Predicted class probabilities
    c ← arg max(F(x))                      ◁ Predicted class
    e_c, e_¬c ← array(0, dim = d)
    for tree ∈ F do
        path ← decision_path(tree, x)      ◁ Follow the traversed path in the tree
        if tree.predict(x) = c then
            e_c ← sum_node_impurity(path, e_c)
        else
            e_¬c ← sum_node_impurity(path, e_¬c)
        end if
    end for
    e ← (p_¬c + ε) · e_c − p_c · e_¬c
    e ← e / ||e||_2                        ◁ Normalize to have norm 1
    return e
end function
        </preformat>
<p>Consider a datapoint x which is passed through a random forest F composed of T trees. Assume
that the random forest predicts point x as belonging to class c, with predicted probabilities [p_c, p_¬c]. For
each of the T trees in the forest, the datapoint’s prediction is made by traversing the tree along a
single decision path. Depending on whether the individual tree’s prediction aligns with the random
forest majority vote, the decision path includes features that contribute to the RF predicted class or
to the opposite one. By construction, each split node in a decision tree is selected as the one with minimal
node impurity, practically measured by the Gini index or the cross-entropy. We propose to use the node
impurities as feature importances, with due adjustments, considering their relevance in the training
phase of the model itself. Consider two feature attribution vectors, namely e_c and e_¬c, of length equal
to the number of features in the input vector x. For each tree in the forest, we retrieve the decision
path followed by point x and check whether the prediction of the individual tree is the same as that of
the random forest or not. For a tree which agrees with the RF prediction, we retrieve the feature
importance vector e_c and, for all the features used at the split nodes of the decision path, add the
node impurity to the corresponding element of e_c. The vector e_¬c is otherwise used to store the node
impurities in a similar manner. A final calibration is performed to properly take into account the effect
of positively and negatively influencing features. In particular, the feature importance coefficients
defined above are weighted by the opposite class probability scores, returning a vector of the form
e = (p_¬c + ε) · e_c − p_c · e_¬c. The parameter ε = 0.01 yields non-null coefficients in the case in which
p_¬c = 0.00. As in the k-NN case, a final normalization step is performed to ensure comparability.</p>
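<p>A minimal scikit-learn sketch of Algorithm 2 is given below. It assumes a binary RandomForestClassifier with class labels 0 and 1, and relies on the fitted trees’ decision_path and stored node impurities; the function name and signature are ours.</p>
        <preformat>
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def explain_rf(x, forest, eps=0.01):
    """Sketch of Algorithm 2: impurity-based feature importances for a RF."""
    x = np.asarray(x).reshape(1, -1)
    proba = forest.predict_proba(x)[0]
    c = int(np.argmax(proba))                  # predicted class
    p_c, p_not = proba[c], proba[1 - c]

    e_c = np.zeros(x.shape[1])
    e_not = np.zeros(x.shape[1])
    for tree in forest.estimators_:
        nodes = tree.decision_path(x).indices  # nodes traversed by x in this tree
        agrees = int(tree.predict(x)[0]) == c
        target = e_c if agrees else e_not
        for node in nodes:
            feat = tree.tree_.feature[node]
            if feat >= 0:                      # split nodes only (leaves store -2)
                target[feat] += tree.tree_.impurity[node]

    e = (p_not + eps) * e_c - p_c * e_not      # calibration by class probabilities
    return e / np.linalg.norm(e)
        </preformat>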
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Aggregating Multi-Model Explanations</title>
        <p>
As mentioned in the Introduction, we propose an aggregation of explanations derived from different
models. While new approaches have been proposed in the previous subsections, for NNs we will
be using DeepLIFT [
          <xref ref-type="bibr" rid="ref12">12</xref>
] as the explanation method. DeepLIFT is a local feature importance approach
tailored to neural networks, based on a backpropagation-like procedure to distribute the
difference-from-reference in the output layer to the input one. We have selected it as, during preliminary
experiments, it exhibited a more consistent behaviour with respect to other approaches applicable
to NNs. It is necessary to normalize its attributions to have norm 1 for comparability with the other
feature importance vectors before computing the aggregation.
        </p>
        <p>
Assume that the feature importance vectors for k-NN (Algorithm 1), RF (Algorithm 2) and NN
(DeepLIFT [
          <xref ref-type="bibr" rid="ref12">12</xref>
]) are denoted, with no loss of generality, by a_m with m ∈ {1, . . . , M}, M = 3. In this study,
we propose the aggregation to be computed as the feature-wise arithmetic average of the composing
feature attribution vectors, namely:
        </p>
        <p>ā = (1/M) · Σ_{m=1}^{M} a_m   (1)</p>
<p>Among the advantages of using the average as an aggregation is its ability to penalize strong sign
discordances between the individual methods, while still producing attributions with both positive and
negative signs. It also accounts for instances in which one of the methods is uncertain about the
importance of a feature j, that is, when the feature is associated with a coefficient |a_m(j)| &lt; τ (τ &gt; 0).</p>
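<p>In code, Eq. 1 amounts to a feature-wise mean of unit-norm attribution vectors. A short sketch, assuming the three attribution vectors have already been computed (function names are ours):</p>
        <preformat>
import numpy as np

def unit_norm(a):
    """Normalize an attribution vector to unit L2 norm for comparability."""
    return a / np.linalg.norm(a)

def aggregate(attributions):
    """Eq. 1: feature-wise arithmetic average of M unit-norm attributions."""
    return np.mean([unit_norm(a) for a in attributions], axis=0)

# a_bar = aggregate([a_knn, a_rf, a_deeplift])
        </preformat>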
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Robustness Estimation and Neighbourhood Generation</title>
        <p>
          Robustness estimation is a critical aspect of this research and, while multiple works have introduced
robustness metrics for explanations, we selected the estimator proposed in [
          <xref ref-type="bibr" rid="ref10 ref14">10</xref>
]. The robustness estimate
ℛ̂ for a datapoint x, a neighbourhood 𝒩, an explanation method g and a model f is computed as:
        </p>
        <p>ℛ̂(x, 𝒩, g, f) = (1/|𝒩|) · Σ_{x̃∈𝒩} ρ(g(x), g(x̃))   (2)</p>
        <p>where 𝒩 = {x̃ | x̃ = x + ε with ε ∈ R^d, d(x, x̃) &lt; δ (δ &gt; 0) and f(x) = f(x̃)}, d is the number
of features and ρ is the Spearman’s rho rank correlation coefficient.</p>
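<p>A direct implementation of Eq. 2 is sketched below using SciPy’s Spearman correlation; here predict and explain stand for the model f and the explainer g, and the neighbourhood is assumed to be pre-generated (names are ours).</p>
        <preformat>
import numpy as np
from scipy.stats import spearmanr

def robustness(x, neighbourhood, explain, predict):
    """Eq. 2: average Spearman correlation between the explanation of x
    and the explanations of its same-prediction neighbours."""
    e_x = explain(x)
    label = predict(x)
    scores = []
    for xt in neighbourhood:
        if predict(xt) == label:          # keep only class-preserving perturbations
            rho, _ = spearmanr(e_x, explain(xt))
            scores.append(rho)
    return float(np.mean(scores)) if scores else float("nan")
        </preformat>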
        <p>
In the literature, there is a clear distinction between perturbation-based robustness and adversarial
robustness. The former refers to the ability of an explainer to produce similar explanations for similar
inputs under the expected data distribution. The latter refers to malicious attacks on either
the model or the explanation, and encompasses the ability of an explainer to produce similar outputs
under such attacks. Our research focuses on non-adversarial perturbations and, as shown in [
          <xref ref-type="bibr" rid="ref10 ref14">10</xref>
], we
note that the neighbourhood generation mechanism can be highly influential in the computation of
the robustness, even when considering perturbation-based evaluations. We suggest the use of the
medoid-based neighbourhood, which can be summarized by the following steps. First, perform k-medoid
clustering on a validation set: for each cluster, the medoid x_M acts as a representative. For each medoid,
the set NN_M stores the n nearest neighbours computed among the other cluster centres. For each
datapoint to perturb, the corresponding cluster is identified and both x_M and NN_M are retrieved.
A medoid x_m is then selected at random from the set NN_M. Having β and p_cat as the parameters
governing the perturbation of numerical and categorical variables respectively, a perturbation of
feature j is then performed as:
        </p>
        <p>x̃_j = (1 − β̄) · x_j + β̄ · x_m,j with β̄ ∼ Beta(β · 100, (1 − β) · 100), for numerical features;
x̃_j = x_j with probability 1 − p_cat and x̃_j = x_m,j with probability p_cat, for categorical features.   (3)</p>
<p>This scheme yields perturbations which are on-manifold and therefore more faithful to the observed
data distribution on which the model was trained. A final step is performed to remove the perturbations
that change the predicted class label with respect to the original point, ensuring comparability
explanation-wise.</p>
<p>As we are considering three different models being used at the same time, the requirement that the
predictions should be the same for both the original datapoint and its perturbation has to be satisfied
for all three models concurrently. More specifically, f_m(x) = f_m(x̃) ∀ m ∈ {1, . . . , M}.</p>
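<p>The perturbation of Eq. 3 can be sketched as follows, assuming index arrays for numerical and categorical features and a medoid already drawn from NN_M; the generator seed and function names are ours.</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(0)

def perturb(x, medoid, num_idx, cat_idx, beta=0.05, p_cat=0.05):
    """Eq. 3: one on-manifold perturbation of x towards a nearby medoid."""
    xt = x.copy()
    # Numerical features: convex combination with a Beta-distributed weight.
    b = rng.beta(beta * 100, (1 - beta) * 100, size=len(num_idx))
    xt[num_idx] = (1 - b) * x[num_idx] + b * medoid[num_idx]
    # Categorical features: swap in the medoid's category with probability p_cat.
    swap = p_cat > rng.random(len(cat_idx))
    xt[cat_idx] = np.where(swap, medoid[cat_idx], x[cat_idx])
    return xt
        </preformat>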
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Preliminary Results</title>
<p>Dataset and Model Details We have tested our proposal on five publicly-available tabular datasets
addressing binary classification problems. Some preprocessing steps were performed on the data, such
as the standardization of numerical variables, the one-hot encoding of categorical ones, as well as the
removal of highly correlated features. The datasets were split into training, validation and test sets.
For all the use-cases, we have trained a k-NN classifier, a random forest and a neural network. The
k-NN models were trained with the number of neighbours set at k = 15 for the adult and bank datasets
and k = 5 for the other cases. The random forests were built with T = 25 base learners in all cases. All
models exhibit adequate performance, with accuracy scores above the 80% mark, as summarized
in Table 1, with the exception of the k-NN model on the heloc dataset. As can be expected, the accuracy
of the k-NN models is often the lowest, while the neural networks showcase larger values on the most
complex datasets (adult, bank, heloc) and the random forests on the two simpler ones (cancer, wine).</p>
      <sec id="sec-3-1">
<title>Neighbourhood Generation and Aggregation Computation</title>
        <p>For each point in the test set, a neighbourhood was generated with hyperparameters selected such that, for the three models, at least
95% of the generated datapoints were kept on average. Default choices for the hyperparameters are
{n = 5, β = 0.05, p_cat = 0.05}. For the individual models the filtering is based on the constraint
f(x) = f(x̃), while for the aggregation it must hold that f_m(x) = f_m(x̃) ∀ m ∈ {1, . . . , M}.</p>
<p>Note that, for the aggregation, the condition includes both the case in which all models agree on the
prediction and that in which one of the three models predicts the opposing class. In the latter case, to correctly
identify the feature importance vectors expressing the decision behind the models’ predictions, it is
possible to leverage the binary nature of the problems and flip the sign of the explanations associated
with the disagreeing model before computing the aggregation. It is essential that the three explanations
refer to the same output class, to ensure comparability.</p>
<p>Let f_m be a model associated with a feature importance vector a_m, m ∈ {1, . . . , M}. Assume that, for
a given datapoint x, it holds that f_1(x) ≠ f_2(x) and f_2(x) = f_3(x). Then, we will consider as feature
attributions the vectors (−a_1, a_2, a_3) when computing the aggregation ā.</p>
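<p>Operationally, the sign flip and the aggregation can be combined as in the following sketch, assuming binary labels 0/1 and three models (names are ours):</p>
        <preformat>
import numpy as np

def aggregate_with_signflip(attributions, predictions):
    """Average unit-norm attributions, flipping the sign of the attribution of
    any model whose prediction disagrees with the majority vote."""
    preds = np.asarray(predictions, dtype=float)
    majority = np.round(preds.mean())     # majority class among the M models
    flipped = [a if p == majority else -a for a, p in zip(attributions, preds)]
    return np.mean(flipped, axis=0)
        </preformat>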
<p>For each model f_m and corresponding explanation method g_m, the feature importances for the
test set and the neighbourhoods are then retrieved. The aggregation is performed following Section 2.3,
taking into account the possible sign change for some attributions, as discussed. The robustness scores
are then derived for the three individual model-explanation pairs and for the aggregation, following Eq. 2.
The complete framework is summarized in Figure 1.</p>
        <p>
Qualitative Evaluation of the Aggregation Let us briefly discuss an example of the derived
aggregation, analysing the explanations of one datapoint x from the test set of the adult dataset. The
results are depicted in Figure 2 where, for each of the twelve features, the three individual explanations
(coloured circles) and the aggregation (red star) are presented. This example illustrates two key aspects
of the aggregation: first, cases in which the three explanations have coefficients very close to zero (and
possibly differing in sign) are shrunk towards zero, that is, the feature is considered non-important
(Features 3, 9, 10, 11). Secondly, when features differ in sign and the magnitudes are large, the
aggregation penalizes the disagreement but produces a coefficient which is coherent with the stronger
signals (Features 1, 2, 4, 6, 8, 12). This is an expected (and desirable) behaviour, as consistent information
is kept and disagreements are penalized proportionally to their magnitude.
        </p>
        <p>
          Robustness Estimation Table 2 presents the average robustness scores computed on the test set
for the individual models and the aggregation. As can also be seen from Figure 3, k-NN appears to
be the least robust method on average, while NN is the most robust. The aggregated explanation acts,
in terms of robustness, as an average of the three individual components. This is to be expected, as
the robustness score was shown in [
          <xref ref-type="bibr" rid="ref10 ref14">10</xref>
] to be upper bounded by the individual methods’ robustness. In
fact, the advantage of using the aggregation is twofold: on the one hand, the aggregation’s robustness is
still acceptable; on the other, it encompasses the explanations derived from multiple
models. In practice, it acts as a conservative and trustworthy explainer.
        </p>
<p>Additionally, it is possible to note that the robustness scores derived from k-NN and RF align with
both the models’ inner reasoning and the explanation methods described in Subsections 2.1 and
2.2. For the k-NN explanations, we expect the robustness to be lower on average as, by construction,
the explanation method deviates the most from a local neighbourhood of the datapoint of interest. In
particular, as we are always considering the distance to the k nearest neighbours of class ¬c, it may be
that, especially when the datapoint is far from the observed class boundary, the variability between
the distances is higher. This translates into lower robustness scores, even if the method locally mimics the decisional
reasoning of the underlying model. On the other hand, RF exhibits greater robustness scores, as it is
harder to generate small perturbations that noticeably change the decision path traversed by each tree in
the forest, maintaining lower explanation variability on average.</p>
<p>Model Concordance and Neighbourhood Size A further piece of information that is easily derived
in this setting is the neighbourhood size, which can serve as an additional metric of robustness. As the
aggregation considers multiple predictions at once, the local agreement between the models can be used
as further confirmation of the robustness. The assumption is that, as suggested by our previous results,
a robust explanation usually lies in an area of the data manifold which is robust explanation-wise. This
can be verified by assessing whether multiple models flag the same datapoint as robust and
agree on the point’s prediction, effectively verifying that the explanations refer to the same
class. In this context, the local prediction-wise agreement can indicate whether the manifold area is robust
also from an explanation point of view.</p>
<p>Results from Table 3 show the average size retained in the neighbourhoods associated with the
aggregation, starting from N = 100 datapoints and after filtering out the perturbations that change class
label with respect to the original one. It is possible to note that the average neighbourhood size varies
greatly depending on the dataset being analysed (column Test - %). The remaining columns of the table
show the number of observations in the test set and the corresponding average neighbourhood size for the
datapoints on which the three models agree on the prediction (Agree) and those for which one model
(which gives the column its name) predicts the opposite class with respect to the other two. Values in bold in
the % columns are associated with the largest observed neighbourhood size after the filtering step. It
can be noted that, in most cases, the largest size is achieved under full concordance of the three
models, effectively supporting our assumption. Note that, for the cancer dataset, there are 0 datapoints
for which k-NN is the disagreeing model; the corresponding value N/A therefore indicates that no data
was available for that specific scenario during our experimental evaluation.</p>
        <p>
Validation Assessment The results from Table 3 suggest that, even when considering multiple
models of different types, it is possible to discuss the relationship between explanation robustness and
model agreement, as we have previously observed in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. To verify our assumption, we have employed
a validation assessment akin to the one presented in [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. More specifically, we have considered a True
Positive Rate (TPR) and a False Positive Rate (FPR) defined as:
        </p>
<p>TPR = #{Robust &amp; Agree} / #{Agree},   FPR = #{Robust &amp; Disagree} / #{Disagree}   (4)</p>
        <p>where the agreement and disagreement are defined according to the models’ concordance in the
prediction and the robustness is defined, according to a threshold h, as I[ℛ̂(x) ≥ h].</p>
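<p>For completeness, a small sketch of Eq. 4; sweeping the threshold over [0, 1] traces the ROC curve of Figure 4 (array names are ours):</p>
        <preformat>
import numpy as np

def tpr_fpr(robustness_scores, agree, th):
    """Eq. 4: TPR and FPR of 'robust' (score at or above th) vs model agreement."""
    robust = np.asarray(robustness_scores) >= th
    agree = np.asarray(agree, dtype=bool)
    tpr = np.logical_and(robust, agree).sum() / agree.sum()
    fpr = np.logical_and(robust, ~agree).sum() / (~agree).sum()
    return tpr, fpr
        </preformat>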
<p>Varying the threshold value, it is possible to obtain a ROC curve as in Figure 4, where the dotted
gray line represents the bisecting line. The Area Under the Curve (AUC) can be used as a goodness
measure: its values are reported in Table 4. The greater the value of the AUC, the more preferable the
given model-explanation pair is under the agreement assumption and the considered dataset. The results
from both Figure 4 and Table 4 suggest that the aggregation can act as a good explanation, but not all
models perform in the expected manner. In particular, it seems that the k-NN attributions are
the least effective in the selected scenarios.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions and Future Work</title>
<p>What we have presented in this contribution are the first steps towards a multi-model and
multi-explanation aggregation to increase trustworthiness. The aim of our work is to propose an aggregation
of explanations and models that can increase the trustworthiness of the system, by leveraging the
predictive power of multiple models and their differences in the reasoning process. The robustness score
and associated neighbourhood size indicate whether the aggregated explanations lie in a robust area of the
feature space and can therefore be trusted. Being a work in progress, our approach could still benefit
from improvements on the following points:
• An aggregation approach more complex than the arithmetic mean, able to deal with cases in which,
for example, the attributions are of the form (t, t, −2t) with t &gt; 0. The current aggregation
would result in an attribution equal to 0, compared to a magnitude (in absolute value) of at least
t. Taking this aspect into account could benefit the reliability of the explanations.
• A new approach for k-NN explanations, as the proposed one is dependent on the hyperparameter
k and, even if from a theoretical point of view it is consistent with the k-NN reasoning, it exhibits
too large a variability.
• A more complete validation assessment of the proposal, to test its potential in real use-cases
from both an explanation quality and a trustability point of view. This would also allow us to jointly
consider the predictive power of the composing models more appropriately.
• The extension of the approach to other classes of models and different XAI approaches. This
analysis would allow for an extensive evaluation of the efficacy of ensemble approaches under
varying circumstances.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>We wish to thank Assicurazioni Generali Spa for their support and interest in our work.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Comparison with LIME and SHAP</title>
<p>This appendix presents a comparative example of LIME and SHAP’s robustness. Figure A1 depicts the
robustness scores of LIME, SHAP and DeepLIFT for the bank and heloc datasets (as in Figure 3), with the
yellow dotted-dashed line representing DeepLIFT’s robustness in both figures. For comparability, LIME
and SHAP (which are model-agnostic methods by nature) have been applied to the same model in this
example, namely the neural network. It is clear from Figure A1 that both LIME and SHAP are unstable methods,
as their robustness scores are, on average, much lower than their DeepLIFT counterparts. Critically, they
lie below the 0.50 threshold (gray dashed line), denoting high instability.</p>
<p>Table A1 presents the average robustness scores derived on all the considered datasets for LIME,
SHAP and DeepLIFT over the neural network. Note that the column "DeepLIFT" corresponds to the
column "NN" of Table 2. Again, it is clear that the robustness scores of LIME and SHAP underperform
with respect to the other techniques.</p>
      <p>In light of these considerations, as well as further support from the literature, LIME and SHAP were
excluded from our analysis as their inner instability may damage both the individual explanations and
any ensemble approach that may include them.</p>
      <sec id="sec-7-1">
        <title>LIME</title>
        <p>17.77
33.11
22.80
56.11
53.99</p>
      </sec>
      <sec id="sec-7-2">
        <title>SHAP</title>
        <p>13.50
10.29
9.34
17.33
36.08</p>
      </sec>
      <sec id="sec-7-3">
        <title>DeepLIFT</title>
        <p>85.03
78.74
84.23
98.40
92.96</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Caruana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sturm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Elhadad</surname>
          </string-name>
          ,
          <article-title>Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission</article-title>
          ,
          <source>in: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</source>
          , KDD '15,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2015</year>
          , p.
          <fpage>1721</fpage>
          -
          <lpage>1730</lpage>
          . URL: https://doi.org/10.1145/ 2783258.2788613. doi:
          <volume>10</volume>
          .1145/2783258.2788613.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cao</surname>
          </string-name>
          ,
          <article-title>Ai in finance: Challenges, techniques, and opportunities</article-title>
          ,
          <source>ACM Comput. Surv</source>
          .
          <volume>55</volume>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.1145/3502289. doi:
          <volume>10</volume>
          .1145/3502289.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>European</given-names>
            <surname>Commission</surname>
          </string-name>
          ,
          <source>Regulation (EU)</source>
          <year>2016</year>
          /
          <article-title>679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data</article-title>
          ,
          <source>and repealing Directive</source>
          <volume>95</volume>
          /46/EC (
          <article-title>General Data Protection Regulation) (Text with EEA relevance</article-title>
          ),
          <year>2016</year>
          . https://eur-lex.europa.eu/eli/reg/2016/679/oj.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>European</given-names>
            <surname>Commission</surname>
          </string-name>
          ,
          <article-title>Proposal for a Regulation of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act) and amending certain union legislative acts (COM(</article-title>
          <year>2021</year>
          )
          <article-title>206 final</article-title>
          ),
          <year>2021</year>
          . https://eur-lex.europa.eu/legal-content/ EN/TXT/?uri=celex%3A52021PC0206.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>"Why should i trust you?" Explaining the predictions of any classifier, NAACL-HLT 2016 - 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</article-title>
          ,
          <source>Proceedings of the Demonstrations Session</source>
          (
          <year>2016</year>
          )
          <fpage>97</fpage>
          --
          <lpage>101</lpage>
          . doi:
          <volume>10</volume>
          .18653/v1/n16-
          <fpage>3020</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>in: Proceedings of the 31st International Conference on Neural Information Processing Systems</source>
          , volume
          <volume>2017</volume>
          <source>- December of NIPS'17</source>
          , Curran Associates Inc.,
          <year>2017</year>
          , pp.
          <fpage>4766</fpage>
          --
          <lpage>4775</lpage>
          . https://dl.acm.org/doi/10.5555/ 3295222.3295230.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Slack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hilgard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lakkaraju</surname>
          </string-name>
          ,
          <article-title>Fooling lime and shap: Adversarial attacks on post hoc explanation methods</article-title>
          ,
          <source>in: Proceedings of the AAAI/ACM Conference on AI</source>
          ,
          <string-name>
            <surname>Ethics</surname>
          </string-name>
          , and Society, AIES '20,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2020</year>
          , p.
          <fpage>180</fpage>
          -
          <lpage>186</lpage>
          . URL: https://doi.org/10.1145/3375627.3375830. doi:
          <volume>10</volume>
          .1145/3375627.3375830.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Alvarez-Melis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Jaakkola</surname>
          </string-name>
          ,
          <source>On the robustness of interpretability methods</source>
          ,
          <year>2018</year>
          . doi:
          <volume>10</volume>
          .48550/ arXiv.
          <year>1806</year>
          .
          <volume>08049</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Krishna</surname>
          </string-name>
          , T. Han,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Gu,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jabbari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lakkaraju</surname>
          </string-name>
          ,
          <article-title>The disagreement problem in explainable machine learning: A practitioner's perspective</article-title>
          ,
          <source>Transactions on Machine Learning Research</source>
          (
          <year>2024</year>
          ). doi:
          <volume>10</volume>
          .21203/rs.3.rs-
          <volume>2963888</volume>
          /v1.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>I.</given-names>
            <surname>Vascotto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodriguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonaita</surname>
          </string-name>
          , L. Bortolussi,
          <article-title>When can you trust your explanations? a robustness analysis on feature importances</article-title>
          ,
          <year>2025</year>
          . arXiv:
          <volume>2406</volume>
          .
          <fpage>14349</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Axiomatic attribution for deep networks</article-title>
          .,
          <source>34th International Conference on Machine Learning, ICML 2017 7</source>
          (
          <issue>2017</issue>
          )
          <fpage>5109</fpage>
          -
          <lpage>5118</lpage>
          . https://dl.acm.org/doi/10.5555/ 3305890.3306024.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Shrikumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Greenside</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kundaje</surname>
          </string-name>
          ,
          <article-title>Learning important features through propagating activation diferences</article-title>
          ,
          <source>34th International Conference on Machine Learning, ICML 2017 7</source>
          (
          <issue>2017</issue>
          )
          <fpage>4844</fpage>
          -
          <lpage>4866</lpage>
          . https://dl.acm.org/doi/10.5555/3305890.3306006.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Binder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Montavon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Klauschen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Müller</surname>
          </string-name>
          , W. Samek,
          <article-title>On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation</article-title>
          ,
          <source>PLoS ONE 10(7)</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <source>doi:10</source>
          .1371/journal.pone.
          <volume>0130140</volume>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>