<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.24963/ijcai.2019/876</article-id>
      <title-group>
        <article-title>Directionality of Counterfactual Explanations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Antonio Maratea</string-name>
          <email>antonio.maratea@uniparthenope.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Domenico Lugubre</string-name>
          <email>domenico.lugubre001@studenti.uniparthenope.it</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Science and Technologies, University of Naples “Parthenope”, Centro Direzionale Isola C4</institution>
          ,
          <addr-line>80143, Napoli, IT</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>3074</volume>
      <fpage>9</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>Counterfactual explanations and feature selection can be seen as two different ways to recognize the most important variables influencing the outcome of a classifier, the former being valid on an instance scale, the latter on a global scale. They have recently been integrated in the Boundary Crossing Solo Ratio (BoC-SoR) method, which gives a global relevance score to a feature by considering how frequently it generates a counterfactual, that is, how frequently it causes a class swap. In this paper, a method based on BoC-SoR is proposed to generate a local feature selection and to evaluate the stability of counterfactuals at a regional scale, with the aim of mitigating the Rashomon effect and highlighting the regions where counterfactuals are unreliable. The method exploits clustering in both the original feature space and the explanation space, providing directional explanations. Tests on three real-data benchmarks, namely Diabetes, Adult Income and Credit Risk, confirm its viability and effectiveness.</p>
      </abstract>
      <kwd-group>
        <kwd>Counterfactuals</kwd>
        <kwd>Rashomon effect</kwd>
        <kwd>Feature selection</kwd>
        <kwd>XAI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        With the relentless growth of Machine Learning model complexity and the continuously expanding
domain of its applications, eXplainable Artificial Intelligence has emerged as a way to render
AI predictions comprehensible and justifiable to humans [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. As soon as AI began to be deployed
in high-stakes domains, such as healthcare, finance, law, and autonomous systems, the demand for
transparency, accountability, and human trust quickly became the elephant in the room. Fairness,
bias, and the legal right to seek recourse against an algorithmic decision call for explanations in a
human-understandable form [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        Meanwhile, the EU has made a notable regulatory effort, first through the General Data Protection
Regulation (GDPR), which focuses on privacy and pushes trustworthy AI, emphasizing the need for
transparency and interpretability of Machine Learning models; second through the Artificial Intelligence
Act in 2024, which introduces transparency requirements in Art. 13: systems should be “sufficiently
transparent to enable users to interpret the system’s output and use it appropriately” [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Needless to say, research on eXplainable Artificial Intelligence (XAI) is in its infancy and enforcing
explainability by law without an established and sound scientific background implies the risk of
producing plausible, convincing or convenient explanations, instead of reliable, faithful and trustable
ones [
        <xref ref-type="bibr" rid="ref1 ref4">1, 4</xref>
        ].
      </p>
      <p>
        Guidotti et al. [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ] proposed a taxonomy of eXplainable AI (XAI) methods, categorising them as either
transparent by design (e.g. decision trees), that is, intrinsically interpretable or ante hoc, or
post-hoc, where explanations are generated after the training step. The latter can be further subdivided into
model explanation (describing the overall logic of the model), outcome explanation (clarifying individual
predictions), and black-box inspection techniques; these, in turn, are distinguished as
either model-specific (designed for a specific model) or
model-agnostic, that is, universally applicable
(see Maratea and Ferone [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]).
https://www.uniparthenope.it/Portale-Ateneo/organigramma/1154 (A. Maratea)
      </p>
      <p>CEUR Workshop Proceedings, ISSN 1613-0073</p>
      <p>Given an instance <inline-formula><tex-math>x</tex-math></inline-formula> and its label <inline-formula><tex-math>y</tex-math></inline-formula>, counterfactuals are synthetic instances answering the question
“what changes to the values of <inline-formula><tex-math>x</tex-math></inline-formula> would have led to a different label?”; they shed some light on the
behaviour of the classifier and represent changes that could be made to alter the prediction, to the
point of being used as a legitimate explanation for the prediction itself [8, 9, 10]. Counterfactual explanations
can be classified as post-hoc, local explanations targeted to outcomes, and they are intertwined with
feature importance measures. Whereas feature importance highlights the features that are the most important
on a global scale for a specific problem, a set of coherent counterfactuals can highlight the features that
are the most important for reverting the prediction of the model in a specific region: counterfactual
generation can be seen as a micro-scale feature importance determination. Indeed, as demonstrated
by Mothilal et al. [11], features classified as highly important by attribution methods, whether local as
LIME or global as SHAP (when SHAP values are aggregated across all instances in a dataset), are often
neither necessary nor sufficient to alter the model prediction. This misalignment raises concerns about
the reliability of attribution scores, with true causality remaining the unreachable silver bullet.</p>
      <p>Notwithstanding these criticalities, counterfactual generation has attracted steadily growing interest in recent years, thanks to its simplicity, intuitive nature and resemblance to human causal
reasoning. Several major challenges remain open:
• Granularity, that is, the local nature of the explanation obtained from the counterfactual. While
intuitively close instances should have similar counterfactual explanations, each counterfactual
is valid for a single instance and is independent of its neighbours;
• Actionability, that is, the actual feasibility of the suggested changes on the target variables.
Certain counterfactuals may involve modifications that are unrealistic or ethically problematic,
such as altering immutable attributes like age or gender (see Lucic et al. [9]);
• Directionality, that is, the difference involved in reversing the change from one class to the
other: the variables and values needed to get a previously denied mortgage application approved
are different from those involved in denying a mortgage application
that was previously approved;
• The Rashomon effect, that is, the presence of several different counterfactuals for the same
instance, often contradicting each other. This issue strongly limits human trust in
counterfactuals, due to conflicting explanations, ambiguous causality and the consequent possibility of
cherry picking.</p>
      <p>In light of these considerations, this paper proposes BoXoR-C, a novel methodology that combines counterfactual
generation with feature importance evaluation and clustering on a regional scale, with the following
aims:
1. to reduce the Rashomon effect through an aggregation in both the original feature space and
the explanation space, so that the explanation validity is expanded from a single instance to the
region surrounding the instance;
2. to characterize regions of the original space where the explanation is stable and hence more
likely to be reliable, safe and trustworthy;
3. to recognize regions of the original space where the explanation is unstable and hence unlikely
to be reliable, safe and trustworthy;
4. to cluster the explanations and to check whether the corresponding regions are consistently
distributed in the original feature space.</p>
      <p>The paper is organized as follows: first the counterfactual explanations are defined and their
desirable properties briefly listed; then the baseline Boundary Crossing Solo Ratio method, that combines
counterfactuals with feature importance, is described in detail; then the proposed method BoXoR-C is
presented and its advantages are highlighted; finally the experiments on real data are reported and the
conclusions are drawn.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Counterfactual Explanations</title>
      <p>
        Building upon the previous discussions, counterfactual explanations can be characterised as post-hoc,
local, model-agnostic explanation methods. They are meant to elucidate the probable causes behind
the classification decision made by a pre-trained model, focusing on a specific instance of input and a
minimal perturbation. As Guidotti et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] note, counterfactual explanations fall under the category of
active explanation methods, since they are not inherently provided by the classification model.
      </p>
      <p>Thinking in counterfactual terms requires imagining a reality that contradicts the observed
facts, hence the name “counterfactuals” [12]. From a cognitive psychology point of view,
counterfactual reasoning can be regarded as a natural mechanism by which humans interpret cause-effect
relationships [10, 13].</p>
      <p>
        In order to serve their purpose and provide actionable, understandable and reliable insights into
model predictions, counterfactual explanations should satisfy several desirable properties [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
• Validity: The original instance must have a different label than the counterfactual instance;
• Minimality and proximity: Among all valid counterfactuals, the selected one should be the
instance that changes the fewest possible number of input features with minimal distance from
the original one, while still achieving the desired prediction;
• Actionability: Counterfactual explanations must propose changes that are practically feasible and
actionable. Indeed, some explanations may suggest unfeasible, unrealistic or ethically problematic
changes, such as changing age or gender;
• Plausibility: A plausible counterfactual must be realistic and consistent with the observed data
distribution. This property ensures that suggested explanations remain meaningful and aligned
with real-world scenarios;
• Diversity: where applicable, a set of counterfactual explanations should present diverse and
distinct alternatives. This can be the normal consequence of a complex problem with multiple
causes, or the side effect of an inadequate model or an overly intricate decision boundary;
• Causality: Counterfactual explanations should respect known causal relationships among
features. Given that certain features can influence others (e.g., increasing the loan duration typically
increases interest rates), a plausible and actionable counterfactual must preserve these established
causal dependencies;
• Discriminative Power: An effective counterfactual should clearly highlight the features
responsible for altering the classification outcome. By comparing the original instance to its
counterfactual, users should intuitively understand why the prediction changed. However,
discriminative power is inherently subjective and challenging to measure quantitatively without
empirical validation involving human judgment or human-like approximation models.
Counterfactuals are one property away from adversarial examples: imperceptibility. An adversarial
example is nothing more than a counterfactual engineered to be similar to the original instance to the
point of being undetectable.
      </p>
      <p>The satisfaction of all these properties, hard as it seems, is instrumental in ensuring theoretical
soundness and practical value to counterfactual explanations, thereby enabling users to gain significant
insights into the decision-making processes of AI systems.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Boundary Crossing Solo Ratio (BoC-SoR)</title>
      <p>Feature importance methods, such as Shapley Additive Explanations (SHAP) when aggregated on all
instances, are computationally intensive and sensitive to feature correlation, whereas counterfactual
explanations are limited to single predictions, lacking global interpretability. To overcome these
limitations, the Boundary Crossing Solo Ratio (BoC-SoR) approach was introduced by Alfeo et al. [14] as a
novel explainability method that effectively integrates global feature importance and local counterfactual
explanations, under the hypothesis that minimal modifications to the most relevant features significantly
increase the likelihood of crossing the decision boundary.</p>
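      <p>More formally, consider a binary problem with original class <inline-formula><tex-math>c</tex-math></inline-formula> and opposite class <inline-formula><tex-math>\bar{c}</tex-math></inline-formula>: for each instance
<inline-formula><tex-math>x \in c</tex-math></inline-formula>, its closest instance (nearest neighbour) of the other class is <inline-formula><tex-math>x_{nn} \in \bar{c}</tex-math></inline-formula>.</p>
      <p>The set <inline-formula><tex-math>B</tex-math></inline-formula> of boundary instances is defined as follows:
        <disp-formula><label>(1)</label><tex-math>D = \{\, \mathrm{dist}(x, x_{nn}) \mid x \in c \,\}, \qquad B = \{\, x \in c \mid \mathrm{dist}(x, x_{nn}) &lt; \mathrm{percentile}(th, D) \,\}</tex-math></disp-formula>
      </p>
      <p>Here <inline-formula><tex-math>D</tex-math></inline-formula> is the set of all distances between pairs <inline-formula><tex-math>(x, x_{nn})</tex-math></inline-formula>, while <inline-formula><tex-math>B</tex-math></inline-formula> is the set of instances considered
boundary points, that is, the instances whose distance from the closest instance of the opposite class
<inline-formula><tex-math>x_{nn}</tex-math></inline-formula> is less than a given percentile <inline-formula><tex-math>th</tex-math></inline-formula> of <inline-formula><tex-math>D</tex-math></inline-formula>. Here, dist refers to the Euclidean distance.</p>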
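      <p>The boundary-set definition in equation 1 can be illustrated with a minimal, standard-library sketch. The function names (euclidean, percentile, boundary_instances), the linear-interpolation percentile and the toy data are assumptions made for illustration only; they are not taken from Alfeo et al. [14].</p>
      <preformat><![CDATA[
```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between sorted values.

    Assumption: the paper does not state which percentile convention is used.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def boundary_instances(own_class, other_class, th):
    """Instances of own_class whose distance to the nearest instance of
    other_class is below the th-th percentile of all such distances."""
    nn_dist = [min(euclidean(x, z) for z in other_class) for x in own_class]
    cutoff = percentile(nn_dist, th)
    return [x for x, d in zip(own_class, nn_dist) if d < cutoff]


# Toy data: class c lies on the left, the opposite class on the right.
class_c = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
class_not_c = [(4.0, 0.0), (5.0, 0.0)]
boundary = boundary_instances(class_c, class_not_c, th=50)
print(boundary)
```
]]></preformat>
      <p>In this toy case the nearest-neighbour distances are 4, 3, 2 and 1, so with the 50th percentile only the two instances closest to the opposite class are kept as boundary points.</p>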
      <p>According to Alfeo et al. [14], an effective strategy to identify the closest counterfactual is through
a <inline-formula><tex-math>k</tex-math></inline-formula>-Nearest Neighbor (<inline-formula><tex-math>k</tex-math></inline-formula>-NN) search: once a boundary instance is chosen, the <inline-formula><tex-math>k</tex-math></inline-formula> nearest instances from the
opposite class are initially considered as potential counterfactual candidates for it; then the intermediate
instances along the path between the original instance and its nearest neighbours are evaluated to
finally find the minimally-different counterfactual instance, similarly to SMOTE [15].</p>
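      <p>A possible reading of this search is sketched below: the k nearest opposite-class instances are taken as candidates, and points interpolated on the segment towards each of them are probed until the prediction flips. This is only a sketch under stated assumptions; the name find_closest_cf, the number of interpolation steps, the value of k and the toy classifier are hypothetical, and the actual algorithm in [14] may differ.</p>
      <preformat><![CDATA[
```python
import math


def find_closest_cf(predict, x, other_class, k=3, steps=10):
    """Probe interpolated points between x and its k nearest opposite-class
    neighbours; return the class-flipping point closest to x (or None)."""
    original = predict(x)
    neighbours = sorted(other_class, key=lambda z: math.dist(x, z))[:k]
    best, best_d = None, math.inf
    for z in neighbours:
        for s in range(1, steps + 1):
            t = s / steps
            cand = [xi + t * (zi - xi) for xi, zi in zip(x, z)]
            if predict(cand) != original:
                d = math.dist(x, cand)
                if d < best_d:
                    best, best_d = cand, d
                break  # the first flip on this segment is its closest one
    return best


# Hypothetical classifier: class 1 when the first feature exceeds 0.5.
def toy_predict(v):
    return int(v[0] > 0.5)


x = [0.0, 0.0]
opposite = [[1.0, 0.0], [2.0, 0.0], [1.0, 1.0]]
closest = find_closest_cf(toy_predict, x, opposite, k=2, steps=10)
print(closest)
```
]]></preformat>
      <p>A finer step count gives counterfactuals closer to the decision boundary at the price of more model evaluations.</p>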
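      <p>For each boundary instance <inline-formula><tex-math>x \in B</tex-math></inline-formula>, its minimally-different counterfactual instance <inline-formula><tex-math>\mathrm{closestCF}_x</tex-math></inline-formula> in class
<inline-formula><tex-math>\bar{c}</tex-math></inline-formula> is determined as the instance with minimal Euclidean distance from <inline-formula><tex-math>x</tex-math></inline-formula> according to the procedure
described above:
        <disp-formula><label>(2)</label><tex-math>\mathrm{closestCF}_x \in \bar{c}, \qquad \mathrm{dist}(x, \mathrm{closestCF}_x) \ \text{is minimal}</tex-math></disp-formula>
      </p>
      <p>A feature <inline-formula><tex-math>f_i</tex-math></inline-formula> at index <inline-formula><tex-math>i</tex-math></inline-formula> is considered relevant if swapping its value in the counterfactual instance
<inline-formula><tex-math>\mathrm{closestCF}_x</tex-math></inline-formula> with the original value from instance <inline-formula><tex-math>x</tex-math></inline-formula> changes the classification outcome back to class <inline-formula><tex-math>c</tex-math></inline-formula>.
Consequently, the global feature importance score (BoC-SoR) for feature <inline-formula><tex-math>f_i</tex-math></inline-formula> is quantified as:
        <disp-formula><label>(3)</label><tex-math>\mathrm{BoCSoR}_i = |\{\, x \in B \mid f_i \ \text{is relevant} \,\}|</tex-math></disp-formula>
For further details on the algorithms please see [14].</p>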
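      <p>The relevance test and the resulting per-feature score can be sketched as follows. The helper names (relevant_features, bocsor_counts), the toy classifier and the instance/counterfactual pairs are made up for illustration; the final normalisation by the number of boundary instances is an assumption added here and is not part of the formula as reported above.</p>
      <preformat><![CDATA[
```python
def relevant_features(predict, x, cf):
    """Indices i for which restoring x[i] inside the counterfactual cf
    brings the prediction back to the original class of x."""
    original = predict(x)
    relevant = []
    for i in range(len(x)):
        probe = list(cf)
        probe[i] = x[i]  # undo the change on feature i only
        if predict(probe) == original:
            relevant.append(i)
    return relevant


def bocsor_counts(predict, pairs, n_features):
    """Per-feature count of boundary instances for which it is relevant."""
    counts = [0] * n_features
    for x, cf in pairs:
        for i in relevant_features(predict, x, cf):
            counts[i] += 1
    return counts


# Hypothetical classifier: class 1 when the first feature exceeds 0.5.
def toy_predict(v):
    return int(v[0] > 0.5)


# (boundary instance, closest counterfactual) pairs, invented for the example.
pairs = [([0.4, 1.0], [0.6, 1.2]), ([0.3, 2.0], [0.7, 2.0])]
counts = bocsor_counts(toy_predict, pairs, n_features=2)
# Assumption: dividing by |B| turns the counts into shares in [0, 1].
shares = [c / len(pairs) for c in counts]
print(counts, shares)
```
]]></preformat>
      <p>Only the first feature drives the toy classifier, so it is the only one flagged as relevant for both pairs.</p>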
    </sec>
    <sec id="sec-4">
      <title>4. The proposed method: BoXoR-C</title>
      <p>To counteract the Rashomon effect and obtain insights into the regional validity of explanations, the
proposed idea is based on clustering and a suitable modification of BoC-SoR. Clustering is exploited here
to transform instance-level explanations into regions of validity, and BoC-SoR is chosen to transform
simple counterfactuals into feature importance measures at the cluster level.</p>
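      <p>Let <inline-formula><tex-math>\mathcal{X}</tex-math></inline-formula> denote the original space and <inline-formula><tex-math>\mathcal{E}</tex-math></inline-formula> the space of explanations. Two different clusterings are performed:
one in the original space <inline-formula><tex-math>\mathcal{X}</tex-math></inline-formula>, to obtain clusters of similar instances, and another in the counterfactual
explanation space <inline-formula><tex-math>\mathcal{E}</tex-math></inline-formula>, to obtain clusters of counterfactuals with similar explanations. The intuition is
that similar instances should ideally present similar explanations, so the two clusterings should mostly
agree.</p>
      <p>In the first direction (<inline-formula><tex-math>\mathcal{X} \to \mathcal{E}</tex-math></inline-formula>), from original instances to explanations, the purpose is to group close
instances in the original space and to characterize these regions according to the internal variability
of explanations; in the second direction (<inline-formula><tex-math>\mathcal{E} \to \mathcal{X}</tex-math></inline-formula>), from explanations to original instances, the
purpose is to map similar explanations onto the original space, so as to have an overview of their distribution
and highlight critical regions, that is, regions where the two clusterings in <inline-formula><tex-math>\mathcal{X}</tex-math></inline-formula> and <inline-formula><tex-math>\mathcal{E}</tex-math></inline-formula> disagree the most.</p>
      <p>Figure 1 shows a schematic view of BoXoR-C.</p>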
      <sec id="sec-4-1">
        <title>4.1. From original features to explanations</title>
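        <p>Given the set of boundary points <inline-formula><tex-math>B</tex-math></inline-formula> defined in equation 1, the points are first clustered according to a specific
clustering algorithm and a suitable similarity measure in the original feature space. Once the boundary
instances have been clustered into <inline-formula><tex-math>k</tex-math></inline-formula> clusters named <inline-formula><tex-math>\{C_1, C_2, \dots, C_k\}</tex-math></inline-formula>, a local BoC-SoR analysis is performed within
each cluster [14]:
• For each boundary instance <inline-formula><tex-math>x_i \in C_j</tex-math></inline-formula>, a class-changing counterfactual <inline-formula><tex-math>cf_i</tex-math></inline-formula> is generated using
Algorithm 1 (findCF);
• Each pair <inline-formula><tex-math>(x_i, cf_i)</tex-math></inline-formula> is then processed using Algorithm 2 (relevantFeatures) to extract the set of
features <inline-formula><tex-math>\mathcal{F}_{x_i \to cf_i}</tex-math></inline-formula> responsible for the class change.</p>
        <p>At this stage, for each cluster <inline-formula><tex-math>C_j</tex-math></inline-formula>, a binary matrix <inline-formula><tex-math>E_j \in \{0, 1\}^{n_j \times m}</tex-math></inline-formula> is built, where <inline-formula><tex-math>n_j</tex-math></inline-formula> is the number of
instances in cluster <inline-formula><tex-math>C_j</tex-math></inline-formula>, <inline-formula><tex-math>m = |\mathcal{F}|</tex-math></inline-formula> is the total number of features and <inline-formula><tex-math>e_{il} = 1</tex-math></inline-formula> if feature <inline-formula><tex-math>f_l</tex-math></inline-formula> was relevant for
the <inline-formula><tex-math>i</tex-math></inline-formula>-th instance.</p>
        <p>On each <inline-formula><tex-math>E_j</tex-math></inline-formula>, the Shannon entropy is computed as a global measure of diversity in explanations and,
for each feature <inline-formula><tex-math>f_l</tex-math></inline-formula>, the following quantities are computed:
• The prevalence <inline-formula><tex-math>p_l = \frac{1}{n_j} \sum_{i} e_{il}</tex-math></inline-formula> of each feature <inline-formula><tex-math>f_l</tex-math></inline-formula>;
• The binary variance Varn <inline-formula><tex-math>= p_l (1 - p_l)</tex-math></inline-formula> of each feature <inline-formula><tex-math>f_l</tex-math></inline-formula> within the cluster.</p>
        <p>[Figure 1: schematic view of the BoXoR-C pipeline. (1) Cluster the boundary points into clusters <inline-formula><tex-math>\{C_1, \dots, C_k\}</tex-math></inline-formula> using the Euclidean distance <inline-formula><tex-math>d_{\mathrm{Euclidean}}(x, y) = \sqrt{\sum_i (x_i - y_i)^2}</tex-math></inline-formula>; (2) for each <inline-formula><tex-math>x_i \in C_j</tex-math></inline-formula>, generate the counterfactual with findCF; (3) extract the subset of relevant features with relevantFeatures (BoCSoR); (4) construct the binary matrix of relevant features per instance; (5) cluster on E using the Jaccard distance.]</p>
        <p>By combining these three descriptors (prevalence, variability, and entropy), a rich understanding of
the internal explanatory structure of each cluster can be obtained: features with high <inline-formula><tex-math>p_l</tex-math></inline-formula> and low Varn
are dominant and stable in the cluster, while high entropy signals random differences in the relevant
features and an unstable cluster.</p>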
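        <p>The three descriptors can be computed directly from a cluster's binary relevance matrix, as in the sketch below. The text does not spell out the distribution over which the Shannon entropy is taken; the sketch assumes that relevance counts are normalised across features, and the function name cluster_descriptors and the toy matrices are illustrative only.</p>
        <preformat><![CDATA[
```python
import math


def cluster_descriptors(E):
    """E: list of binary rows, E[i][l] = 1 if feature l was relevant for
    instance i. Returns (prevalence, binary variance, entropy)."""
    n, m = len(E), len(E[0])
    col_sums = [sum(row[l] for row in E) for l in range(m)]
    prevalence = [s / n for s in col_sums]
    variance = [p * (1.0 - p) for p in prevalence]
    # Assumption: entropy (in bits) of the relevance mass spread over features.
    total = sum(col_sums)
    probs = [s / total for s in col_sums] if total else []
    entropy = sum(q * math.log2(1.0 / q) for q in probs if q > 0)
    return prevalence, variance, entropy


# A stable cluster: the same single feature is relevant for every instance.
stable = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 0]]
# An unstable cluster: two features alternate as the relevant one.
mixed = [[1, 0], [0, 1], [1, 0], [0, 1]]

print(cluster_descriptors(stable))
print(cluster_descriptors(mixed))
```
]]></preformat>
        <p>In the stable toy cluster one feature has prevalence 1 with zero variance and zero entropy, whereas in the mixed one both features have prevalence 0.5, maximal binary variance and one bit of entropy.</p>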
      </sec>
      <sec id="sec-4-2">
        <title>4.2. From explanations to original features</title>
        <p>While clustering in the original feature space provides insight into geometric similarity, it does not
guarantee consistency in the structure of the explanations. To complement the previous analysis, a
second clustering is applied directly on the counterfactual explanations matrix <inline-formula><tex-math>\mathbf{E} \in \{0, 1\}^{n \times m}</tex-math></inline-formula>, where <inline-formula><tex-math>n</tex-math></inline-formula> is
the total number of instances and each row encodes which features were modified in the counterfactual
for a given instance.</p>
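        <p>To capture structural similarity between explanations, the pairwise Jaccard distance between binary
vectors is used:
          <disp-formula><tex-math>d_{\mathrm{Jaccard}}(e_i, e_j) = 1 - \frac{|e_i \cap e_j|}{|e_i \cup e_j|},</tex-math></disp-formula>
where <inline-formula><tex-math>e_i</tex-math></inline-formula> and <inline-formula><tex-math>e_j</tex-math></inline-formula> denote the binary attribution vectors of two instances. This yields an alternative
partitioning based purely on the similarity of counterfactual patterns.</p>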
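        <p>As a small illustration of the Jaccard distance on binary attribution vectors, the following sketch builds the pairwise distance matrix that a clustering algorithm could consume. The function names and the example rows are hypothetical, and treating two all-zero vectors as identical is a convention assumed here, not one stated in the text.</p>
        <preformat><![CDATA[
```python
def jaccard_distance(u, v):
    """Jaccard distance between two binary vectors of equal length."""
    inter = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    # Assumption: two all-zero vectors are considered identical.
    return 0.0 if union == 0 else 1.0 - inter / union


def pairwise_jaccard(E):
    """Full symmetric matrix of Jaccard distances between the rows of E."""
    return [[jaccard_distance(u, v) for v in E] for u in E]


# Each row marks which features were changed in one counterfactual.
E = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]
D = pairwise_jaccard(E)
for row in D:
    print(row)
```
]]></preformat>
        <p>The first two explanations share one of two involved features (distance 0.5), while the third shares none with the others (distance 1).</p>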
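        <p>Once the explanations have been clustered into <inline-formula><tex-math>k'</tex-math></inline-formula> clusters named <inline-formula><tex-math>\{C'_1, C'_2, \dots, C'_{k'}\}</tex-math></inline-formula>, again the three
descriptors (prevalence, variability, and entropy) can be computed for each cluster to characterize
it.</p>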
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Alignment of clusterings</title>
        <p>To assess quantitatively the agreement between the explanation-based clusters computed with the
Jaccard similarity and those derived from Euclidean distances in the original space, standard agreement
metrics can be used:
• Adjusted Rand Index (ARI),
• Normalized Mutual Information (NMI),
• Homogeneity, Completeness, and V-measure.</p>
        <p>A high agreement would indicate that structurally similar explanations tend to emerge from
geometrically close regions in feature space. On the other hand, a low agreement value may mask some regions
where the agreement is qualitatively very good, or may be the consequence of only a few critical regions within
an overall good qualitative performance.</p>
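        <p>To make the agreement measure concrete, the sketch below computes the Adjusted Rand Index from the pair-counting definition using only the standard library. It is meant as an illustration of the metric, not as the implementation used in the experiments; in practice a library routine such as scikit-learn's adjusted_rand_score would normally be preferred. The label vectors are invented.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    n = len(labels_a)
    joint = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = (pairs_a + pairs_b) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (joint - expected) / (maximum - expected)


# Clusters in the original space vs. clusters in the explanation space.
space_labels = [0, 0, 1, 1]
same_partition = [1, 1, 0, 0]   # same grouping, different cluster names
crossed = [0, 1, 0, 1]          # every original cluster is split in two

print(adjusted_rand_index(space_labels, same_partition))
print(adjusted_rand_index(space_labels, crossed))
```
]]></preformat>
        <p>The index is invariant to the naming of clusters, so the first comparison yields perfect agreement, while the crossed partition scores below zero, that is, worse than chance.</p>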
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Directionality</title>
        <p>Explanations are directional: transitions from a negative to a positive class leverage different features
than transitions from a positive to a negative class, and real-world scenarios often require
reasoning in both directions.</p>
        <p>The BoXoR-C pipeline should include bidirectional counterfactual analysis, generating
explanations for both types of transitions: the schema in Figure 1 should be applied independently in both
class transition directions (0 → 1 and 1 → 0), allowing the identification of potential asymmetries in
the behaviour of the model.</p>
        <p>This bidirectional approach uncovers potential asymmetries in the decision boundary, where the
importance or frequency of certain features may differ depending on the direction of class change.
Understanding these differences provides a more complete and robust view of the decision rationale.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <p>In the experiments, hierarchical agglomerative clustering has been tested with Ward’s linkage and the Euclidean
distance in <inline-formula><tex-math>\mathcal{X}</tex-math></inline-formula>, and with complete linkage and the Jaccard similarity measure
in <inline-formula><tex-math>\mathcal{E}</tex-math></inline-formula>. Of course, other clustering techniques (e.g., <inline-formula><tex-math>k</tex-math></inline-formula>-means, DBSCAN, spectral clustering), not
necessarily the same in both directions, can be chosen depending on the nature and geometry of the
dataset. The advantage of hierarchical clustering is that it allows the granularity to be controlled.</p>
      <p><bold>5.1. Data</bold></p>
      <p>The experiments in this study are based on three publicly available datasets: Adult Census Income (https://archive.ics.uci.edu/dataset/2/adult),
Diabetes (https://www.kaggle.com/datasets/mathchi/diabetes-data-set) and Credit Risk Dataset (https://www.kaggle.com/datasets/laotse/credit-risk-dataset/data). The 20th percentile was considered for boundary points in the
first and third datasets, while the 80th percentile was considered for Diabetes, due to its small size.</p>
      <p>The Adult Census Income dataset, originally from the UCI Machine Learning Repository, has 48,842
instances and 14 attributes, categorical and numerical, related to the demographic and employment
characteristics of individuals. The binary target class is 1 for an annual income that exceeds $50,000, 0
otherwise.</p>
      <p>First, individuals older than 16 years with a positive value for hours worked, FNLWGT &gt; 1, and
adjusted gross income &gt; 100 were selected; then missing values in workclass, occupation, and
native.country were imputed using the mode. Age was binned into six categorical brackets, rare
education levels were grouped into a single ‘School’ category, and infrequent races were merged
into an ‘Other’ class. All categorical features were encoded using ordinal encoding, when
appropriate, or one-hot encoding otherwise; numerical and ordinal features were standardised to zero mean and
unit variance.</p>
      <p>The dataset was split randomly into 70% training and 30% testing and the LogisticRegression
classifier was chosen from Scikit-learn (version 1.6.1) in Python 3.12 (parameters penalty='l2',
C=1.0, solver='lbfgs' and max_iter=10000, all others as default).</p>
      <p>The Diabetes dataset is from the National Institute of Diabetes and Digestive and Kidney Diseases. The dataset
contains 768 instances and 8 numeric attributes related to the health of patients. The binary target class
is 1 if the patient tested positive for diabetes, 0 otherwise. The data were preprocessed by imputing
missing values in Glucose, BloodPressure, SkinThickness, Insulin, and BMI using the median, and
then standardizing all features to zero mean and unit variance. The dataset was split into training and
test sets with a 60/40 ratio. The training set was further balanced using SMOTE [15] (balanced classes
and k_neighbors=5) to obtain 299 samples per class, resulting in a total of 598 samples. Approximately
246 boundary points were identified in the training set as a baseline for generating counterfactuals.</p>
      <p>The XGBoost classifier was trained using the xgboost library (version 2.1.1) with Python 3.12 and
scikit-learn 1.6.1. Hyperparameters were set using grid search and stratified k-fold cross-validation
with k=5. The training set contained 460 samples (276 negative, 184 positive) before SMOTE and 598
samples (299 per class) after SMOTE.</p>
      <p>Credit Risk, available on Kaggle, contains 32,581 instances with 12 attributes, including
socioeconomic information of loan applicants and variables simulating credit bureau data. The binary target
class takes 1 if the loan was classified as risky, and 0 otherwise.</p>
      <p>The data were preprocessed by first removing duplicate records. Missing values in the numerical
variables were imputed using the median. Age was discretized into six ordinal brackets, and loan_grade
was mapped to an ordinal scale from A (best) to G (worst). Categorical attributes were one-hot encoded.
Finally, all numerical and ordinal features were standardized to zero mean and unit variance.</p>
      <p>The dataset was split into training and test sets with an 80/20 ratio and the XGBoost classifier was
trained using the xgboost library (version 2.1.1) in Python 3.12 with scikit-learn 1.6.1. Hyperparameters
were set using grid search and stratified k-fold cross-validation with k=5. The best configuration was
learning_rate=0.3, max_depth=7, and n_estimators=200.</p>
      <sec id="sec-5-1">
        <title>5.2. Results and discussion</title>
        <p>The analysis is directional. First BoXoR-C has been applied generating counterfactuals from Class 1 to
Class 0, then the reverse.</p>
        <sec id="sec-5-1-1">
          <title>5.2.1. Adult Income, from class 1 to 0</title>
          <p>In this direction, the counterfactuals highlight the key factors associated with transitions to lower
income levels, that is, the most frequent variables found in counterfactuals generated for high-income
people. In short, the most important feature across clusters turns out to be education, suggesting
that education is a good investment and that a higher education protects from transitioning to lower
income.</p>
          <p>Figure 2 characterizes each cluster in the original feature space. Cluster 4 is the biggest;
it shows low entropy and a dominant feature, which implies high stability and interpretability, with
education consistently appearing in over 80% of cases. Cluster 3 is much smaller but also exhibits a
low entropy value, being clearly dominated by capital.gain, with age_group appearing rarely but
consistently among the relevant features. Clusters 0 and 1 exhibit dispersed patterns, with variable roles of
education, capital.gain, and hours.per.week, resulting in limited interpretability. These clusters
give inconsistent explanations in terms of counterfactuals, and the Rashomon effect should be expected.
Despite its small size, Cluster 2 is consistently dominated by education.</p>
          <p>Figure 3 characterizes each cluster in the explanation space. The average entropy is much
lower than before, and the clusters of explanations show more coherence, as there is a clear grouping
of possible explanations. Notably, Clusters 3 and 4 primarily involve education, with Cluster 4
highlighting a correlation with age_group. Cluster 0 shows the dominance of capital.gain, with
moderate contributions from education and age_group, indicating a partially stable explanation.
Despite its smaller size, Cluster 1 is characterized by a sharp focus on occupation-related features,
suggesting specific behavioural profiles, albeit with moderate internal variability. Cluster 2, the largest
in terms of size and entropy, nonetheless exhibits a dominance of hours.per.week.</p>
          <p>There is no clear match between the clustering in the original space and the one in the explanation space.</p>
        </sec>
        <sec id="sec-5-1-2">
          <title>5.2.2. Adult Income, from class 0 to 1</title>
          <p>In this direction, the counterfactuals highlight the key factors associated with transitions to higher
income levels, that is, the most frequent variables found in counterfactuals generated for low-income
people. In short, the most important feature across clusters turns out to be capital.gain, suggesting
that investment capacity is a key factor for transitioning to higher income.</p>
          <p>Figure 4 characterizes each cluster in the original feature space. Clusters 0, 2, and 4
demonstrate consistent patterns, particularly in the capital.gain field. This observation underscores
a correlation between financial attributes and class transitions. Clusters 1, 3, and 5 in particular
demonstrate high entropy and feature variability, resulting in more diffuse and less interpretable
counterfactuals. This may, in turn, compromise the reliability of the explanations provided in these
clusters. capital.gain is dominant in several clusters. Features such as age_group are non-actionable,
while hours.per.week offers more practical options, though its explanatory impact is more variable
and context-dependent.</p>
          <p>Figure 5 characterizes each cluster in the explanation space. Cluster 0, the largest, is
dominated by capital.gain, with moderate contributions from education and hours.per.week,
indicating a relatively stable but partially heterogeneous explanatory pattern. Clusters 1 and 5 are
the most variable, with a broader spread across multiple features, including hours.per.week and
education, suggesting unstable and less interpretable explanations in this region. Clusters 2 and 3
show a similar reliance on capital.gain, with the additional involvement of age_group, reflecting
the correlation between the two.</p>
        </sec>
        <sec id="sec-5-1-3">
          <title>5.2.3. Diabetes, from class 1 to 0</title>
          <p>In this direction, the counterfactuals highlight the key factors associated with transitions from having
diabetes to being healthy, that is, the most frequent variables found in counterfactuals from class “diabetes”
to “healthy”. While diabetes cannot be cured and most features are non-actionable, the analysis nonetheless
highlights the risk factors for diabetes, which turned out to be coherent with the medical literature.
It must be stressed that the dataset is very small, yet this result shows remarkable generality.</p>
          <p>Figure 6 characterizes each cluster in the original feature space. Cluster 2 is clearly an outlier,
while the only cluster with moderate entropy and a strong characterization is Cluster 4, dominated
by pedigree/familiarity. Triceps skin thickness, even if with low prevalence, appears consistently in 3
clusters and shows a non-marginal influence.</p>
          <p>Figure 7 shows very small and strongly characterized clusters: Clusters 2 and 3 are dominated by
triceps skin thickness and age and pedigree/familiarity respectively. Cluster 0, the largest, is
dominated by glucose and pregnancies, while Clusters 1 and 4 are outliers. Allowing for the noise due to the small
sample size, the key risk factors for diabetes (familiarity, triceps skin thickness, age and pregnancies)
can be recognized.</p>
          <p>What seems to emerge is that diabetes is more common in older people with familiarity, high glucose
levels, previous pregnancies and a thick triceps skin fold. BMI and high glucose levels appear
randomly among the explanations in this direction.</p>
        </sec>
        <sec id="sec-5-1-4">
          <title>5.2.4. Diabetes, from class 0 to 1</title>
          <p>In this direction, the counterfactuals highlight the key factors associated with transitions from being
healthy to having diabetes, that is, the most frequent variables found in counterfactuals from class “healthy”
to “diabetes”. Although most features are non-actionable, for the actionable variables the analysis
highlights the ones to keep under control, which again turned out to be consistent with the medical literature
despite the small sample size. In short, the most important feature across clusters
is glucose, suggesting that controlling its level is a key protective factor against diabetes.</p>
          <p>Figure 8 characterizes each cluster in the original feature space. As can be read from the entropy
values, Cluster 1 has the minimum variability in terms of relevant features in the explanations, but all
the features vary so much that they appear randomly distributed among instances. In Cluster 2
the only variable that appears to be non-randomly distributed has a borderline value of    = 0.16
and a low prevalence   = 0.2. Cluster 3 appears to have the only truly dominant feature in terms of
prevalence and variability, namely glucose level.</p>
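          <p>How per-cluster entropy and prevalence values of this kind can be obtained may be sketched as follows. The sketch assumes that prevalence is the share of a cluster's instances whose counterfactual changes a given feature, and that entropy is the Shannon entropy (in bits) of the relative frequencies of the changed features. Both are assumptions made for illustration, so the exact definitions used in the paper may differ; the function name and the toy data are hypothetical.</p>
          <preformat>
```python
import math
from collections import Counter, defaultdict


def cluster_profiles(changed_sets, labels):
    """Summarise, per cluster, the prevalence of each feature and the entropy of the mix.

    changed_sets: one set of changed feature names per instance.
    labels: the cluster label of each instance, in the same order.
    """
    members = defaultdict(list)
    for changed, label in zip(changed_sets, labels):
        members[label].append(changed)

    profiles = {}
    for label, sets in members.items():
        counts = Counter(name for changed in sets for name in changed)
        total = sum(counts.values())
        # Assumed prevalence: share of instances whose counterfactual changes the feature.
        prevalence = {name: n / len(sets) for name, n in counts.items()}
        # Assumed entropy: Shannon entropy of the changed-feature frequencies.
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        profiles[label] = {"size": len(sets), "prevalence": prevalence, "entropy": entropy}
    return profiles


# Toy, made-up data: cluster 0 always changes glucose, cluster 1 changes a different
# feature for every instance.
changed_sets = [
    {"glucose"}, {"glucose"}, {"glucose"}, {"glucose"},
    {"bmi"}, {"age"}, {"glucose"}, {"pregnancies"},
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
profiles = cluster_profiles(changed_sets, labels)
for label in sorted(profiles):
    print(label, profiles[label])
```
          </preformat>
          <p>In this toy example the first cluster has zero entropy and a single feature with prevalence 1, while the second spreads evenly over four features, each with prevalence 0.25, which is the kind of contrast between strongly characterized and random-looking clusters discussed in the text.</p>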
          <p>Figure 9 shows three very small clusters (1, 3 and 5) and two big clusters of equal size, with a
different characterization: in Cluster 2 the dominant variables are BMI and skin thickness, while Cluster
4 is dominated by glucose. Cluster 0 has an intermediate size and groups instances where BMI and
pregnancies are dominant.</p>
          <p>What seems to emerge is that preventing diabetes requires low glucose levels and that the risk is
higher in two subgroups: females with high BMI and previous pregnancies, and people with high BMI and a thicker
triceps skin fold.</p>
        </sec>
        <sec id="sec-5-1-5">
          <title>5.2.5. Credit Risk, from class 1 to 0</title>
          <p>In this direction, the counterfactuals highlight the factors that reduce the probability of loan default,
that is, the most frequent variables found in counterfactuals generated for the people most at risk of defaulting on
their loan. In short, the default risk is related to the income, the interest rate and the grade
of the loan. Some variables, such as the interest rate, can be acted upon by the lender to reduce the risk of default.</p>
          <p>Figure 10 characterizes each cluster in the original feature space. Cluster 2, the largest,
exhibits heterogeneous changes primarily in person_income and loan_percent_income. This pattern
is also observed in Cluster 3. In contrast, Clusters 0 and 1 are more stable, being dominated by
person_income, loan_int_rate, and loan_grade_num. This observation indicates that these financial
metrics consistently govern the transition to reduced default risk and potentially function as actionable
mechanisms for borrowers aiming to enhance their credit performance.</p>
          <p>Figure 11 characterizes each cluster in the explanation space. Cluster 0, the largest, is
dominated by person_income, indicating a relatively stable and interpretable explanatory pattern. Cluster 3
also shows a focused and consistent explanation, being primarily driven by loan_percent_income.
Clusters 1 and 2, smaller with 16 and 17 instances respectively, have higher entropy and heterogeneous
feature contributions, particularly loan_int_rate in Cluster 1 and loan_grade_num in Cluster 2,
indicating less stable and more diffuse explanatory patterns. Clusters 4 and 5, the smallest clusters with 12
and 5 instances, show low entropy, each dominated by a single feature (loan_amnt in Cluster 4 and
person_emp_length in Cluster 5), reflecting highly focused but very specific explanations that may
have limited generalization capacity.</p>
        </sec>
        <sec id="sec-5-1-6">
          <title>5.2.6. Credit Risk, from class 0 to 1</title>
          <p>In this direction, the counterfactuals highlight the factors that increase the probability of loan default,
that is, the most frequent variables found in counterfactuals generated for the people least at risk of defaulting on
their loan. In short, the most important feature across clusters is income, suggesting
that a high income is the best guarantee against the risk of default. It must be noted that the people
with the lowest income are also the most likely to apply for a loan, so there is an intrinsic bias in the data.</p>
          <p>Figure 12 characterizes each cluster in the original feature space. Clusters 1 and 2, the
largest ones, are stably dominated by person_income, while other features appear mostly random,
limiting interpretability. Clusters 0 and 3 show moderate entropy, with person_income again
emerging as the main driver, supported by loan-related variables (loan_int_rate, loan_amnt). Cluster 4
highlights a stronger role of loan_percent_income, still in combination with person_income.
Finally, Cluster 5, although very small, exhibits a highly stable structure around person_income and
loan_percent_income.</p>
          <p>Figure 13 characterizes each cluster in the explanation space. Cluster 1, the largest, shows
the lowest entropy and is clearly dominated by person_income, yielding a stable and interpretable
profile. Cluster 2 is also strongly characterized by person_income and loan_int_rate, with limited
variability. Clusters 0 and 3, by contrast, exhibit high entropy values and a more dispersed pattern, with
multiple features contributing randomly, which reduces interpretability. Cluster 4, despite its small
size, presents a coherent structure dominated by person_income and loan_amnt. Finally, Cluster 5
highlights the joint role of person_income and cb_person_cred_hist_length, ensuring moderate
stability.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>Feature selection and counterfactual explanations act on different scales. Feature selection on a global
scale is sometimes too general, while counterfactual explanations are often too specific, being valid only
for one instance. An intermediate-scale feature importance measure based on counterfactuals has been
proposed in this paper: it characterizes “safe” regions, where the counterfactual explanations
and the features involved are stable and the Rashomon effect is limited; at the same time, it
identifies “unsafe” regions, where the counterfactual explanations and the features involved change
randomly, so that very close points may end up with very different counterfactuals. Clustering has been
used in both the original feature space and the explanation space, accounting for label directionality and
providing more insight into the possible explanations and causal relationships. Results on real data are promising,
even on small datasets, and confirm the viability and effectiveness of the proposed method.</p>
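      <p>The remark that very close points may end up with very different counterfactuals can be probed with a simple diagnostic, sketched here under stated assumptions. For each instance, the set of features changed by its counterfactual is compared, through the Jaccard similarity, with the set of its nearest neighbour in Euclidean distance. This diagnostic is only an illustration and is not part of the proposed method; the function names and the toy data are hypothetical.</p>
      <preformat>
```python
import math


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 1.0


def neighbour_agreement(points, changed_sets):
    """For each instance, compare its changed features with its nearest neighbour's."""
    scores = []
    for i, p in enumerate(points):
        # Index of the closest other point (Euclidean distance, first one wins on ties).
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        scores.append(jaccard(changed_sets[i], changed_sets[nearest]))
    return scores


# Toy, made-up data: two tight pairs of points, one with matching explanations
# and one with completely different explanations.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
changed_sets = [
    {"person_income"},
    {"person_income"},
    {"loan_amnt"},
    {"loan_int_rate", "loan_grade_num"},
]
scores = neighbour_agreement(points, changed_sets)
print(scores)
```
      </preformat>
      <p>Scores close to 1 indicate neighbouring instances whose counterfactuals involve the same features, whereas scores close to 0 flag neighbours with unrelated explanations.</p>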
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This work was supported by: the Digital Twin and Fintech services for sustainable supply chain
(SmarTwin) project (Fondo per la Crescita Sostenibile – Accordi per l’innovazione di cui al D.M. 31
dicembre 2021 e D.D. 18 marzo 2022 - CUP B69J23000500005), Ministero dello Sviluppo Economico (MISE);
the context-AwaRe deCision-making for Autonomous unmanneD vehicles in mArine environmental
monitoring (ARCAD-IA) project (PE00000013_1 - CUP E63C22002150007) cascade call of the Future
Artificial Intelligence Research (FAIR) project Spoke 3 - Resilient AI, within the National Recovery and
Resilience Plan (PNRR) of the Italian Ministry of University and Research (MUR).</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used DeepL in order to: Grammar and spelling
check. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Maratea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferone</surname>
          </string-name>
          ,
          <source>Deep Neural Networks and Explainable Machine Learning</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>253</fpage>
          -
          <lpage>256</lpage>
          . doi:10.1007/978-3-030-12544-8_23.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Burrell</surname>
          </string-name>
          ,
          <article-title>How the machine 'thinks': Understanding opacity in machine learning algorithms</article-title>
          ,
          <source>Big Data &amp; Society</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ). URL: https://doi.org/10.1177/2053951715622512. doi:10.1177/2053951715622512, original work published
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <collab>European Parliament</collab>
          ,
          <collab>Council of the European Union</collab>
          ,
          <source>Regulation (EU) 2024/1689 of the European Parliament and of the Council laying down harmonised rules on artificial intelligence (Artificial Intelligence Act)</source>
          ,
          <year>2024</year>
          . URL: https://eur-lex.europa.eu/eli/reg/2024/1689/oj, accessed: 2025-04-01.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Frankle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. C.</given-names>
            <surname>Lipton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Doshi-Velez</surname>
          </string-name>
          ,
          <article-title>False sense of security in explainable artificial intelligence (XAI)</article-title>
          , arXiv preprint (
          <year>2024</year>
          ). URL: https://arxiv.org/abs/2405.03820. arXiv:2405.03820.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>F.</given-names>
            <surname>Bodria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Giannotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Naretto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rinzivillo</surname>
          </string-name>
          ,
          <article-title>Benchmarking and survey of explanation methods for black box models</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2102.13076. arXiv:2102.13076.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <article-title>Counterfactual explanations and how to find them: literature review and benchmarking</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>38</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>55</lpage>
          . doi:10.1007/s10618-022-00831-6.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Maratea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ferone</surname>
          </string-name>
          ,
          <article-title>Pitfalls of local explainability in complex black-box models</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Ciaramella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mencar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Montes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rovetta</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of WILF</source>
          <year>2021</year>
          , the 13th International Workshop
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>