<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Robustness Analysis of Counterfactual Explanations from Generative Models: An Empirical Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kseniya Sahatova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes De Smedt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuefei Lu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>KU Leuven</institution>
          ,
          <addr-line>B-3000 Leuven</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SKEMA Business School</institution>
          ,
          <addr-line>92156 Suresnes</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Counterfactual explanations are a key tool in explainable AI, offering insights into complex machine learning models by addressing "What if?" scenarios. While conventional methods for generating counterfactual explanations (CFEs) rely on computationally expensive optimization techniques, generative models such as GANs and VAEs have enabled faster CFE generation. However, their opaque nature raises concerns about trustworthiness, especially in high-stakes domains like healthcare and finance, where transparency and accountability are crucial. In this study, we benchmark existing methods for generating CFEs that apply generative models and connect them with a range of established metrics to assess robustness in both binary and multiclass image classification settings. Our analysis yields insights into the reliability of these approaches, while the proposed taxonomy organizes this rapidly evolving field through categorization based on CFE search methodologies.</p>
      </abstract>
      <kwd-group>
        <kwd>counterfactual explanations</kwd>
        <kwd>explainability</kwd>
        <kwd>robustness</kwd>
        <kwd>image data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The rapid development of generative models (GMs) aims to tackle complex tasks, such as identifying
abnormalities in medical images, detecting fraud, and optimizing supply chains. As the demand for
explainability and transparency in black-box AI models grows, counterfactual explanations (CFEs) have
emerged as a tool in explainable AI that sheds light on decision-making processes by answering "What
if?" questions. CFEs generate plausible scenarios to provide alternative outcomes, enhancing model
interpretability and trust. However, optimization-based methods for generating CFEs are
computationally expensive [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The integration of GMs, such as Generative Adversarial Networks (GANs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and
Variational AutoEncoders (VAEs) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], into the counterfactual explanation generation pipeline offers
faster alternatives, although their opaque nature raises concerns about trustworthiness.
      </p>
      <p>
        In high-stakes domains like healthcare, finance, and human behavior, black-box explainers require
full transparency regarding both predictions and potential uncertainties. Attributes like proximity,
sparsity, robustness, feasibility, and actionability are crucial. While extensive surveys ([
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) have
categorized methods and benchmarks, they reveal that no single method satisfies all attributes
simultaneously. Instead, methods balance these properties based on specific domain needs, enhancing the
flexibility and adaptability of CFEs. Among these attributes, robustness, ensuring CFE stability despite
input or model changes, remains underexplored for high-dimensional data like images. Robustness
refers to the insensitivity of CFEs to small perturbations within guaranteed bounds, given that the
model’s prediction remains unchanged for the generated explanation. While significant progress has
been made in evaluating the robustness of CFE generation methods [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], most work has focused on
binary classification and tabular data.
      </p>
      <p>In this research, we assess the robustness of generative model-based methods for counterfactual
explanations in a benchmark of three techniques. We focus on multiclass classification because most
robustness metrics in the literature have been applied to binary classification tasks. Multiclass
scenarios remain underexplored, even though target class selection significantly impacts counterfactual quality
due to varying distances in the data manifold and the proximity of decision boundaries. Additionally,
we compare the stability of these methods with regard to task complexity, specifically binary and
multiclass classification, and report the results based on the corresponding metrics. The considered
research questions are as follows:
        <list list-type="bullet">
          <list-item><p>RQ 1: Are the counterfactual explanations produced by generative models resistant to various forms of perturbations?</p></list-item>
          <list-item><p>RQ 2: Does the robustness of generated counterfactual explanations differ between the multiclass and the binary classification setting?</p></list-item>
        </list>
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Works</title>
      <p>In this section, the definition of counterfactual explanations for classification tasks is formalized and
key components of the loss functions used by counterfactual explainers are outlined, which vary with
model architecture and domain. For instance, GANs optimize an adversarial loss, while VAEs use an
evidence lower bound. Further in the Related works subsection, we review recent work on GMs for
CFEs and propose a taxonomy of these commonly employed pipelines.</p>
      <p>
        Definition. A counterfactual explanation can be defined as follows. Given a predictive model $f$
that maps the distribution of input data to a discrete class distribution, denoted as $f : \mathcal{X} \rightarrow \mathcal{Y}$, where
$\mathcal{X} \subseteq \mathbb{R}^{d}$ and $\mathcal{Y} = \{0, \dots, K\}$, we define a counterfactual explanation for a factual data point $x$ of the
class $y$ as $x_{cf} = G(x, y, y'; \theta)$. $G$ is a generative model that can either output an explanation
directly or a so-called difference mask, which must be applied to the factual data point. A valid
counterfactual explanation satisfies the condition $f(x_{cf}) = y'$, where $y' \neq y$. Eq. 1 adapts the
framework proposed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] (Eqs. 1 and 2) to the notation introduced here. The
CFE algorithm is trained to minimize $\mathcal{L}$ as follows:
        <disp-formula id="eq-1"><label>(1)</label><tex-math>\mathcal{L}(x, x_{cf}, y') = \lambda_{cls} \mathcal{L}_{cls} + \lambda_{d} \mathcal{L}_{d} + \lambda_{G} \mathcal{L}_{G}; \quad \arg\min \mathcal{L}(x, x_{cf}, y'),</tex-math></disp-formula>
where $\mathcal{L}_{cls} = \ell(f(x_{cf}), y')$ is a classification loss term, weighted by $\lambda_{cls}$, encouraging the classifier’s
output on the counterfactual $x_{cf}$ to be close to the desired class $y'$. $\mathcal{L}_{d} = d(\cdot, \cdot)$ is a measure of the
distance between the data point $x$ and the counterfactual $x_{cf}$, weighted by $\lambda_{d}$. $\mathcal{L}_{G}$ is a generative
model-dependent regularization term, penalizing out-of-distribution instances, typically formulated as
an adversarial loss and/or a cycle-consistency loss. The number of components in the loss function
may vary depending on the required properties for the generated CFEs.
      </p>
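      <p>To make the structure of Eq. 1 concrete, the following minimal sketch combines a classification term, a distance term, and a generative-model penalty as a weighted sum. The function name `counterfactual_loss`, the weight values, and the specific choices of cross-entropy and mean absolute difference are illustrative assumptions, not the implementation of [7] or of any benchmarked method.</p>
      <preformat>
```python
import math


def counterfactual_loss(probs_cf, target, x, x_cf, gen_penalty,
                        lam_cls=1.0, lam_d=0.1, lam_g=0.01):
    """Weighted sum of the three loss terms of Eq. 1 (weights are illustrative).

    probs_cf    -- classifier probabilities for the counterfactual (list of floats)
    target      -- index of the desired class
    x, x_cf     -- factual and counterfactual instances as flat lists of floats
    gen_penalty -- value of the generative-model term (e.g. an adversarial loss),
                   computed elsewhere and passed in as a plain number
    """
    # Classification term: cross-entropy between the prediction on x_cf and the target class.
    loss_cls = -math.log(max(probs_cf[target], 1e-12))
    # Distance term: mean absolute difference between factual and counterfactual.
    loss_d = sum(abs(a - b) for a, b in zip(x, x_cf)) / len(x)
    # Total loss: each term scaled by its own regularization weight.
    return lam_cls * loss_cls + lam_d * loss_d + lam_g * gen_penalty
```
      </preformat>
      <p>Dropping a term from the sum corresponds to the remark above that the number of loss components depends on the properties required of the generated CFEs.</p>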
      <p>Related Works. The development of GMs demonstrates not only their ability to produce high-quality
synthetic data, but also their potential to create realistic and meaningful visual CFEs. Categorizing CFE
methods based on GMs is challenging due to the complexity of models and their compound nature.
Kirilenko et al. [8] outlined the literature on GMs for counterfactual explanations. In contrast, we focus
on a higher-level classification based on CFE search mechanics rather than specific GMs. Our taxonomy
aims to highlight emerging directions in integrating GMs for CFEs and should be seen as an extension,
not a replacement, of existing benchmarks.</p>
      <p>Latent space optimization/perturbation. The latent space of GMs offers a compact, adjustable
representation for generating new instances. Applying simple linear interpolations in the latent space
allows transformations of the latent vector [9], but these may not fully capture the complexities of
decision boundaries learned by sophisticated classifiers. Singla et al. [10] apply walks with a fixed step
size on the data manifold, which are then embedded in a low-dimensional space. C3LT [11] optimizes
an external model to learn meaningful perturbations for steering predictions, and REVISE [12] uses
constrained optimization with a pretrained VAE to modify the latent representation.</p>
      <p>Disentanglement of latent space. Disentangling the latent space in GMs helps identify orthogonal
factors that can be mapped to distinct, semantically meaningful concepts [13]. This approach can help
to reveal biases in black-box models or data and enhance counterfactual explainability by detecting
spurious correlations and enabling feature editing through factor manipulation. StyleGAN is used
in [14] to extract human-understandable latent style vectors, concept disentanglers are employed in
[15] to learn a predefined set of K concepts via cross-entropy loss, and Rotem et al. [16] enforce
disentanglement by whitening the latent covariance matrix in an adversarial autoencoder.</p>
      <p>Concept-based. Unlike in tabular data, small perturbations of images can lead to unrealistic adversarial
examples instead of plausible explanations. GMs can facilitate operation at a more abstract conceptual
level. STEEX [17] decomposes a latent vector into codes for semantic categories, Gat et al. [18]
encode label-related concepts as binary latent variables, and Dominici et al. [19] combine a Concept
Bottleneck Model with a VAE to model concept dependencies within a continuous latent space.</p>
      <p>
        Residual Learning. Another widely adopted approach to generate CFEs involves learning differences
or residuals that modify the initial input to achieve the desired result of the classifier. CounteRGAN
[20] formalized residual GANs for identifying plausible CFEs. CX-GAN [21] generates discrepancy
maps that transform abnormal instances into normal ones in a medical context without relying on
a predictive model. COIN [22] applies a GAN conditioned on a flag for inpainting or removal of the
abnormal region and a latent code. Van Looveren et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] propose a framework for generating sparse,
in-distribution CFEs for various data types, using a loss function tailored for desired properties of CFEs.
      </p>
      <p>Diffusion-based. The limitations of GANs and VAEs have led to diffusion-based models for
high-quality CFEs. Diffusion models reduce VAE blur while preserving quality and variability, which GANs
struggle with. DiME [23] was among the first to use a diffusion model for explainability, combining
an unconditional DDPM sampler with a guidance mechanism. Jeanneret et al. [24] explored adversarial attacks to
generate interpretable perturbations. LDCE [25] introduced a class-conditioned diffusion model with
consensus guidance to filter misleading gradients.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Robustness of Counterfactual Explanations</title>
      <p>
        The robustness property is particularly important due to its inherent trade-off with the proximity objective:
proximity seeks minimal changes near the decision boundary, while robustness ensures explanations
remain valid despite minor perturbations. We adopt the classification established by [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for images,
considering two groups of robustness, to input changes and to model changes, that might affect the quality of
explanations. Unlike other data modalities, images are highly sensitive to minor, often imperceptible
perturbations that can generate adversarial examples, invalidating predictive outcomes. Therefore, the
robustness of generative model-based methods for CFEs is highly relevant.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Input Changes</title>
        <p>Local Instability (LI). Theoretical proofs and formalizations of robustness to input changes are
presented in [26], where it is defined as a measure of local instability. The authors opt for the $L_1$ norm
without distance function restrictions in their experiments with tabular data and a handwritten digits
dataset. Eq. 2 gives the definition of local instability from [26]:</p>
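        <p>
          <disp-formula id="eq-2"><label>(2)</label><tex-math>\mathbb{E}_{x' \sim p(x)} \left[ d\left(x_{cf}, x'_{cf}\right) \right],</tex-math></disp-formula>
where $x$ denotes the original instance, $x_{cf}$ represents its CFE, $x'_{cf}$ is the CFE for the perturbed instance
$x'$, $p(x)$ represents the distribution of plausible perturbations around $x$, and $d(\cdot, \cdot)$ measures the similarity or
distance between the CFEs of the original instance $x$ and a perturbed instance $x'$.</p>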
        <p>Similarity estimation for images typically relies on $L_p$ norms, which operate at the pixel level
but ignore semantic differences like shapes and spatial correlations. Alternative metrics such as the
Structural Similarity Index Measure (SSIM) and Peak Signal-to-Noise Ratio are commonly used in the
related task of adversarial example generation [27]. In our experiments, SSIM [28] and the $L_1$ norm are
used.</p>
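        <p>The expectation in Eq. 2 can be approximated by Monte Carlo sampling. The sketch below is a minimal illustration under stated assumptions: images are flat lists of floats, perturbations are additive Gaussian noise, and `explain` is a hypothetical callable that stands in for any counterfactual explainer. Only the $L_1$ distance is shown; an SSIM-based variant would pass a different `dist` function.</p>
        <preformat>
```python
import random


def l1_distance(a, b):
    """Sum of absolute pixel-wise differences between two flat images."""
    return sum(abs(p - q) for p, q in zip(a, b))


def local_instability(x, explain, sigma, n_samples=20, dist=l1_distance, seed=0):
    """Monte Carlo estimate of the expected distance between the CFE of x
    and the CFEs of noisy copies of x.

    explain -- hypothetical explainer: maps an input to its counterfactual
    sigma   -- standard deviation of the additive Gaussian noise
    """
    rng = random.Random(seed)
    cf = explain(x)  # counterfactual of the unperturbed instance
    total = 0.0
    for _ in range(n_samples):
        # Draw a perturbed instance around x.
        x_pert = [v + rng.gauss(0.0, sigma) for v in x]
        # Accumulate the distance between the two explanations.
        total += dist(cf, explain(x_pert))
    return total / n_samples
```
        </preformat>
        <p>Lower values mean that nearby inputs receive similar explanations; an explainer that ignores its input would score exactly zero, which is why instability is best read together with validity.</p>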
        <p>Local Lipschitz Continuity. Lipschitz continuity is a commonly used metric for evaluating the
stability of post-hoc explanations [29]. It estimates the relative change in the output with respect to the
variations in the input. Although it has been applied to assess the robustness of CFEs against model
changes, it has not yet been utilized to evaluate robustness with respect to input dissimilarities. Eq. 3
below formalizes the estimation of local Lipschitz continuity of the explanations.</p>
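        <p>
          <disp-formula id="eq-3"><label>(3)</label><tex-math>\hat{L}(x) = \max_{x_j \in B_{\epsilon}(x)} \frac{\left\| x_{cf} - x_{cf}^{j} \right\|_2}{\left\| x - x_j \right\|_2},</tex-math></disp-formula>
where $B_{\epsilon}(x)$ represents an $\epsilon$-ball centered at $x$, $x_{cf}$ is its CFE, and $x_j$ and $x_{cf}^{j}$ are an input instance
sampled from the $\epsilon$-ball and the corresponding generated CFE, respectively. Lower values of
$\hat{L}(x)$ indicate more stable explanations.</p>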
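        <p>A sampling-based estimate of the local Lipschitz quantity in Eq. 3 could look as follows. This is a sketch under assumptions: `explain` is a hypothetical explainer, inputs are flat lists of floats, and points are drawn uniformly from the ball by scaling a random direction. Because only finitely many points are sampled, the result is a lower bound on the true maximum.</p>
        <preformat>
```python
import math
import random


def l2(a, b):
    """Euclidean distance between two flat vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def local_lipschitz(x, explain, eps, n_samples=50, seed=0):
    """Largest observed ratio of explanation change to input change
    over points sampled from the eps-ball around x."""
    rng = random.Random(seed)
    cf = explain(x)
    best = 0.0
    for _ in range(n_samples):
        # Uniform sample in the ball: Gaussian direction, radius eps * u^(1/d).
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(d * d for d in direction))
        if norm == 0.0:
            continue
        radius = eps * rng.random() ** (1.0 / len(x))
        x_j = [v + radius * d / norm for v, d in zip(x, direction)]
        denom = l2(x, x_j)
        if denom == 0.0:
            continue  # skip degenerate samples that coincide with x
        best = max(best, l2(cf, explain(x_j)) / denom)
    return best
```
        </preformat>
        <p>For a linear toy explainer that scales its input by a constant, the estimate recovers that constant, which gives a quick sanity check before applying the function to a real explainer.</p>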
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Changes</title>
        <p>This category of perturbations relates to variations in the predictive model $f$. Such variations may be caused
by model retraining, a common practice in real-world applications, which alters decision boundaries and,
consequently, the generated explanations. Prior studies have examined small perturbations caused by
weight reinitialization or the removal of training data subsets [30].</p>
        <p>Invalidation Rate (IR). For counterfactual consistency in deep neural networks, Black et al. [31] suggest
accounting for both the cost of generating stable explanations and the Lipschitz continuity of the predictive
model in the vicinity of the counterfactual. The consistency of the explanations is measured by the
Invalidation Rate (IR), given by:</p>
        <p>
          <disp-formula id="eq-4"><label>(4)</label><tex-math>\mathrm{IR}(x_{cf}, \theta) = \mathbb{E}_{\theta' \sim \tilde{\Theta}} \left[ \mathbb{I}\left[ f(x_{cf}; \theta') \neq f(x_{cf}; \theta) \right] \right],</tex-math></disp-formula>
where $f(x_{cf}; \theta)$ denotes the predictive model with parameters $\theta$, and $\theta'$ represents the model
parameters after variations in the training conditions.</p>
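        <p>In practice the expectation over retrained models is replaced by an average over a finite set of them. The sketch below assumes that each model is available as a callable returning a class label; the names `invalidation_rate`, `original_predict`, and `retrained_predicts` are hypothetical.</p>
        <preformat>
```python
def invalidation_rate(counterfactuals, original_predict, retrained_predicts):
    """Fraction of (retrained model, counterfactual) pairs for which the
    retrained model disagrees with the original model's prediction."""
    flips = 0
    total = 0
    for predict in retrained_predicts:
        for x_cf in counterfactuals:
            # A counterfactual is invalidated when the predicted label changes.
            flips += int(predict(x_cf) != original_predict(x_cf))
            total += 1
    return flips / total
```
        </preformat>
        <p>Note that the comparison is against the original model's prediction rather than the target class, so a counterfactual that never reached the target class can still count as not invalidated.</p>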
        <p>Validity After Retraining (VaR). VaR is a commonly used evaluation metric that measures the
percentage of CFEs that remain valid [30], i.e., belong to the same predicted class, under the retrained model.
It can be considered the opposite of the IR. However, the IR provides only a relative estimate:
it counts the explanations invalidated by the model change, regardless of whether they were valid in the first place. In some cases, the
counterfactual generation method itself might have low validity. Comparing the validity under the
perturbed predictive model with that under its initial counterpart therefore offers a broader perspective on the robustness
and performance of the method. The validity can be defined as follows:</p>
        <p>
          <disp-formula id="eq-5"><label>(5)</label><tex-math>\mathrm{Validity} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ f'\left(x_{cf}^{(i)}\right) = y' \right],</tex-math></disp-formula>
where $N$ is the total number of counterfactual explanations, $f'(x_{cf}^{(i)})$ is the prediction of the retrained
model for the $i$-th counterfactual instance, and $y'$ is the target class.</p>
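        <p>The validity formula translates directly into a few lines of code. In this minimal sketch, `predict` is a hypothetical callable returning a class label; passing the unperturbed classifier gives the initial validity, and passing a retrained classifier gives the validity after retraining.</p>
        <preformat>
```python
def validity(counterfactuals, predict, target):
    """Share of counterfactuals that the given model assigns to the target class."""
    hits = sum(1 for x_cf in counterfactuals if predict(x_cf) == target)
    return hits / len(counterfactuals)
```
        </preformat>
        <p>Evaluating the same set of counterfactuals with both classifiers shows how much of a low post-retraining validity was already present before the model changed.</p>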
        <p>Relaxed Stability (RS). The counterfactual stability metric was adapted for differentiable models
based on the concept of local Lipschitz continuity in [30]. As exact Lipschitz estimation is often
impractical, the authors derive a relaxed stability metric (Eq. 6), where the Lipschitz constant is approximated.
The following properties are considered essential for generating robust explanations under naturally
occurring model changes, enabling the proposed relaxation: (i) high model confidence for an input $x_{cf}$,
denoted as $f(x_{cf})$; (ii) elevated values of $f(x')$ for several points $x'$ in close proximity to $x_{cf}$; (iii)
low variability in model outputs around $x_{cf}$.</p>
        <p>
          <disp-formula id="eq-6"><label>(6)</label><tex-math>\hat{R}_{k, \sigma^2}(x_{cf}, f') = \frac{1}{k} \sum_{x_i \in N_{x_{cf}, k}} \left( f'(x_i) - \left| f'(x_{cf}) - f'(x_i) \right| \right),</tex-math></disp-formula>
where $N_{x_{cf}, k}$ represents a set of $k$ points sampled from a Gaussian distribution $\mathcal{N}(x_{cf}, \sigma^2 \mathrm{I})$, with
$\mathrm{I}$ being the identity matrix.</p>
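        <p>The relaxed stability score averages, over Gaussian neighbours of a counterfactual, the model confidence minus its deviation from the confidence at the counterfactual itself. The following sketch illustrates that computation under assumptions: `target_prob` is a hypothetical callable returning the model's confidence for the target class, inputs are flat lists of floats, and the default values of `k` and `sigma` are placeholders rather than the settings used in the experiments.</p>
        <preformat>
```python
import random


def relaxed_stability(x_cf, target_prob, k=100, sigma=0.1, seed=0):
    """Average of (confidence at neighbour) minus (absolute confidence gap to x_cf)
    over k points sampled from an isotropic Gaussian centred at x_cf."""
    rng = random.Random(seed)
    center = target_prob(x_cf)  # model confidence at the counterfactual
    total = 0.0
    for _ in range(k):
        # Neighbour drawn with independent noise of variance sigma^2 per dimension.
        x_i = [v + rng.gauss(0.0, sigma) for v in x_cf]
        p_i = target_prob(x_i)
        # High confidence raises the score; variability around x_cf lowers it.
        total += p_i - abs(center - p_i)
    return total / k
```
        </preformat>
        <p>A model whose confidence is constant in the neighbourhood yields a score equal to that confidence, which matches properties (i) to (iii): the score is high only when confidence is high and stable around the counterfactual.</p>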
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Design</title>
        <p>Datasets and classifiers. Most methods for generating CFEs focus on binary classification, where the
target class is the opposite of the initial prediction. Multiclass scenarios, however, remain underexplored.
In these cases, target class selection affects counterfactual quality, especially when the target class is
farther in the data manifold, increasing explanation costs. For instance, class 7 is farther from class
8 than from class 0, based on cosine similarity and class centroid distances in the MNIST manifold.
Additionally, multiple decision boundaries may challenge the stability of these explanations.</p>
        <p>We use the MNIST dataset for experiments in binary and multiclass classification tasks. In the binary
setting, class 1 is used for factual instances, with class 8 as the target for explanations. In the multiclass
setting, class 8 remains the target while considering multiple initial classes. Factual instance classes are
determined based on cosine similarity of features from the penultimate model layer, selecting the five
closest classes (0, 3, 4, 5, and 9) to class 8. A simple CNN with 3 convolutional layers and max pooling
achieved 99.95% accuracy in binary and 98.16% in multiclass classification.</p>
        <p>Counterfactual explainers. The explainers were chosen to cover diverse approaches
and model types, as detailed in Section 2, while also taking code availability and data requirements into account. For example, STEEX
[17] requires segmentation masks for training instances, CF-CBM [19] relies on annotated concepts in
the data, and COIN is designed only for binary classification. Thus, REVISE [12], CounteRGAN [20],
and C3LT [11] were selected for the initial experiments.</p>
        <p>Robustness evaluation. For optimization-based methods like REVISE, robustness evaluation is
computationally expensive, so we limited the number of factual instances to k=100. In contrast,
CounteRGAN generates counterfactuals efficiently, allowing for more perturbed inputs. To estimate LI, we
applied Gaussian noise of increasing magnitude, $\sigma \in \{0.001, 0.0025, 0.005, 0.0075, 0.01\}$, ensuring visual similarity.
Estimating local Lipschitz continuity involves sampling from the $\epsilon$-ball around an input and
generating multiple explanations per sample. The number of sampled points around a given counterfactual
explanation is set to 30 for REVISE, to reduce computational costs, and 50 for CounteRGAN and C3LT.</p>
        <p>The metrics in Section 3.2 are evaluated on 10 perturbed CNN models, and the results are averaged. Naturally
occurring model changes, theoretically justified in [30], are introduced by reinitializing model weights
with different random seeds. We estimate IR and RS using explanations from the original test images.</p>
        <p>Results. The primary results of our study are summarized below. To gain a better understanding
of the quality of the generated explanations, we evaluate not only local instability w.r.t. the $L_1$ distance
and SSIM but also the validity (as defined in Eq. 5) of the methods. The LI results under small input
perturbations for both binary and multiclass settings are depicted in Fig. 1. The validity results indicate
that REVISE is capable of generating CFEs that consistently lead the classifier to the same target output
and remain stable across the tested noise magnitudes. In contrast, CounteRGAN achieves only 43% validity
at a perturbation level of 0.001, which further declines to 36% at a noise level of 0.01 in the multiclass
classification task. For binary classification, CounteRGAN attains a maximum validity of 15% at a noise
level of 0.001, which then drops to 9%. C3LT maintains consistent validity across all multiclass experiments,
with all generated explanations correctly classified as digit 8. However, in the binary setting, the validity
of the algorithm is lower, reaching only 67%.</p>
        <p>Regarding the LI($L_1$) metric, REVISE exhibits stable average values even at higher noise levels, despite
having a higher standard deviation. Furthermore, the distances between CFEs generated for original
and perturbed inputs are greater in multiclass classification, presumably due to the richer semantics of
the selected classes and, consequently, the larger number of transformed pixels during CFE generation.
It is worth noting that the method optimizes the latent code of the perturbed input, which may converge
along a different gradient path, potentially resulting in a different yet valid explanation. This can be seen
in Fig. 1 (a), where LI(SSIM) is rather low for both binary and multiclass tasks.</p>
        <p>In contrast, explanations generated by CounteRGAN deviate more from the original explanations
as the noise magnitude increases (Fig. 1 (b)). However, the average LI remains lower compared to
REVISE in scenarios involving multiple classes. The method’s poor performance in the binary setting
is reflected in the results of LI(SSIM), which indicate a deterioration in CounteRGAN’s ability to
generate perceptually similar explanations; this deterioration is less pronounced in the results of LI($L_1$). A similar
pattern is observed for C3LT (Fig. 1 (c)), although the average distances across all perturbations remain
relatively low. Unlike REVISE, which optimizes the latent code directly, C3LT uses an additional model
g that learns the mapping of the given latent code of a factual class to the target class. The local
Lipschitz continuity estimates are shown in Figure 2. This metric effectively reflects the conclusions on
local instability, demonstrating a greater dissimilarity between counterfactuals generated by REVISE
compared to those produced by CounteRGAN and C3LT.</p>
        <p>Comparing the robustness of explanations against model changes, we additionally present the initial
validity (IV) results of the unperturbed classifier in Table 1. REVISE shows a higher IR in the binary
setting (0.195) than in the multiclass setting (0.046), consistent with VaR being the inverse of IR. The
multiclass classifiers achieve an average RS of approximately 0.92, whereas binary classifiers attain
only 0.79. CounteRGAN exhibits the opposite trend in terms of IR, with higher initial validity but greater
invalidation in the multiclass setting. Nevertheless, the RS results reveal a different pattern: multiclass
classifiers perform worse on the generated explanations validated by the unperturbed classifier. For this
method, validity is initially low in both settings: 15% in the binary setting and 47% in the multiclass
setting. C3LT provides only 65% valid explanations in the binary problem, while reaching 100% in
the multiclass task. The IR is 0 for the latter setting and only 0.001 for the former. However,
the RS of slightly perturbed binary classifiers drops significantly compared to multiclass scenarios.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this work, we present preliminary results on the robustness of counterfactual explanation methods
based on generative models. Our proposed taxonomy provides structure to this rapidly evolving
field by categorizing solutions according to their CFE search mechanics. The evaluation reveals that
robustness requires joint assessment across multiple metrics, with binary classification unexpectedly
exhibiting greater fragility than multiclass scenarios despite its simpler decision boundaries. For RQ1,
CounteRGAN and C3LT produced counterfactuals with greater deviation under perturbations, while
REVISE better preserved explanation quality. However, perceptual analysis revealed that REVISE
may still generate visually distinct explanations from minimally perturbed inputs, posing concerns
in sensitive applications. For RQ2, multiclass tasks increased explanation costs due to more semantic
features, reflected in REVISE’s local instability. CounteRGAN and C3LT showed low validity in binary
settings, declining with perturbations. All methods were less stable under model changes, with binary
classification surprisingly showing higher invalidation rates. Overall, these findings reveal critical
trade-offs between explanation quality, stability, and task complexity that must be addressed for reliable
deployment in sensitive domains. The code is publicly available on GitHub<sup>1</sup>.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Chat-GPT in order to: Grammar and spelling
check. After using these tool(s)/service(s), the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
      <p><sup>1</sup> https://github.com/kSahatova/CF-Robustness-Benchmark.git</p>
      <p>[8] D. Kirilenko, P. Barbiero, M. Gjoreski, M. Luštrek, M. Langheinrich, Generative models for counterfactual explanations, 2024.</p>
      <p>[9] M. Y. Michelis, Q. Becker, On linear interpolation in the latent space of deep generative models, in: ICLR 2021 Workshop on Geometrical and Topological Representation Learning, 2021.</p>
      <p>[10] S. Singla, B. Pollack, J. Chen, K. Batmanghelich, Explanation by Progressive Exaggeration, 2020.</p>
      <p>[11] S. Khorram, L. Fuxin, Cycle-consistent counterfactuals by latent transformations, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10203–10212.</p>
      <p>[12] S. Joshi, O. Koyejo, W. Vijitbenjaronk, B. Kim, J. Ghosh, Towards realistic individual recourse and actionable explanations in black-box decision making systems, CoRR abs/1907.09615 (2019).</p>
      <p>[13] A. Pandey, M. Fanuel, J. Schreurs, J. A. Suykens, Disentangled representation learning and generation with manifold optimization, Neural Computation 34 (2022) 2009–2036.</p>
      <p>[14] S. Sankaranarayanan, T. Hartvigsen, L. Oakden-Rayner, M. Ghassemi, P. Isola, Real world relevance of generative counterfactual explanations, in: Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022, 2022.</p>
      <p>[15] A. Ghandeharioun, B. Kim, C.-L. Li, B. Jou, B. Eoff, R. W. Picard, Dissect: Disentangled simultaneous explanations via concept traversals, arXiv preprint arXiv:2105.15164 (2021).</p>
      <p>[16] O. Rotem, T. Schwartz, R. Maor, Y. Tauber, M. T. Shapiro, M. Meseguer, D. Gilboa, D. S. Seidman, A. Zaritsky, Visual interpretability of image-based classification models by generative latent space disentanglement applied to in vitro fertilization, Nature Communications 15 (2024) 7390.</p>
      <p>[17] P. Jacob, É. Zablocki, H. Ben-Younes, M. Chen, P. Pérez, M. Cord, STEEX: Steering Counterfactual Explanations with Semantics, 2022.</p>
      <p>[18] I. Gat, G. Lorberbom, I. Schwartz, T. Hazan, Latent space explanation by intervention, in: Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, 2022, pp. 679–687.</p>
      <p>[19] G. Dominici, P. Barbiero, F. Giannini, M. Gjoreski, G. Marra, M. Langheinrich, Climbing the Ladder of Interpretability with Counterfactual Concept Bottleneck Models, 2024.</p>
      <p>[20] D. Nemirovsky, N. Thiebaut, Y. Xu, A. Gupta, CounteRGAN: Generating counterfactuals for real-time recourse and interpretability using residual GANs, in: Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence, PMLR, 2022, pp. 1488–1497.</p>
      <p>[21] T. Zia, Z. Nisar, S. Murtaza, Counterfactual Explanation and Instance-Generation using Cycle-Consistent Generative Adversarial Networks, 2023.</p>
      <p>[22] D. Shvetsov, J. Ariva, M. Domnich, R. Vicente, D. Fishman, COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images, 2024.</p>
      <p>[23] G. Jeanneret, L. Simon, F. Jurie, Diffusion models for counterfactual explanations, in: Proceedings of the Asian Conference on Computer Vision, 2022, pp. 858–876.</p>
      <p>[24] G. Jeanneret, L. Simon, F. Jurie, Adversarial counterfactual visual explanations, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 16425–16435.</p>
      <p>[25] K. Farid, S. Schrodi, M. Argus, T. Brox, Latent diffusion counterfactual explanations, arXiv preprint arXiv:2310.06668 (2023).</p>
      <p>[26] A. Artelt, V. Vaquet, R. Velioglu, F. Hinder, J. Brinkrolf, M. Schilling, B. Hammer, Evaluating Robustness of Counterfactual Explanations, 2021.</p>
      <p>[27] M. Sharif, L. Bauer, M. K. Reiter, On the suitability of lp-norms for creating and preventing adversarial examples, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018, pp. 1605–1613.</p>
      <p>[28] Z. Wang, A. C. Bovik, H. R. Sheikh, E. P. Simoncelli, Image quality assessment: from error visibility to structural similarity, IEEE Transactions on Image Processing 13 (2004) 600–612.</p>
      <p>[29] D. Alvarez-Melis, T. S. Jaakkola, On the robustness of interpretability methods, arXiv preprint arXiv:1806.08049 (2018).</p>
      <p>[30] F. Hamman, E. Noorani, S. Mishra, D. Magazzeni, S. Dutta, Robust Counterfactual Explanations for Neural Networks With Probabilistic Guarantees, in: Proceedings of the 40th International Conference on Machine Learning, PMLR, 2023, pp. 12351–12367. ISSN: 2640-3498.</p>
      <p>[31] E. Black, Z. Wang, M. Fredrikson, Consistent counterfactuals for deep models, in: International Conference on Learning Representations, 2022.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <article-title>Counterfactual explanations and how to find them: literature review and benchmarking</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>38</volume>
          (
          <year>2024</year>
          )
          <fpage>2770</fpage>
          -
          <lpage>2824</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pouget-Abadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warde-Farley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ozair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Generative adversarial nets</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>27</volume>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <source>Auto-encoding variational bayes</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Boonsanong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Hines</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Dickerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <source>Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.-H.</given-names>
            <surname>Karimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Barthe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Schölkopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Valera</surname>
          </string-name>
          ,
          <article-title>A survey of algorithmic recourse: Contrastive explanations and consequential recommendations</article-title>
          ,
          <volume>55</volume>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Leofante</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rago</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Toni</surname>
          </string-name>
          ,
          <article-title>Robust counterfactual explanations in machine learning: A survey</article-title>
          , in: K. Larson (Ed.),
          <source>Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, International Joint Conferences on Artificial Intelligence Organization</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>8086</fpage>
          -
          <lpage>8094</lpage>
          . Survey Track.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Van Looveren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Klaise</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vacanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Cobb</surname>
          </string-name>
          ,
          <source>Conditional Generative Models for Counterfactual Explanations</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>