<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Generative Models for Counterfactual Explanations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniil Kirilenko</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pietro Barbiero</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Gjoreski</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mitja Luštrek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marc Langheinrich</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Jožef Stefan Institute</institution>
          ,
          <addr-line>Ljubljana</addr-line>
          ,
          <country country="SI">Slovenia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università della Svizzera italiana</institution>
          ,
          <addr-line>Lugano</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Counterfactual explanations have emerged as an effective method of explaining machine learning models. These explanations elucidate how to tweak the model input in order to flip its output. Generative approaches serve as a tool for creating meaningful counterfactuals for complex problems, where other methods fail or require too much computation. This work presents an overview of generative approaches and their applications in the generation of counterfactual explanations. We highlight the prevailing challenges, such as diversity and distinction from adversarial examples, and identify open questions with future research directions, such as ensuring the stability of counterfactuals and automatic reasoning with counterfactual explanations.</p>
      </abstract>
      <kwd-group>
        <kwd>Counterfactual Explanations</kwd>
        <kwd>Generative Models</kwd>
        <kwd>Explainable AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Counterfactual explanations clarify complex system decisions by answering "what if" scenarios,
showing how minimal input changes can lead to different outcomes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This is crucial in
Machine Learning (ML), where understanding the rationale of a model is as important as the
decision itself [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. By examining hypothetical alternatives, counterfactual explanations make
ML models’ decision-making more transparent and comprehensible.
      </p>
      <p>
        Despite growing interest in counterfactual explanations, there is a gap in the literature on
the generative methods used to create them. Variational Autoencoders (VAEs) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Generative
Adversarial Networks (GANs) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and Denoising Diffusion Probabilistic Models (DDPMs)
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] are notable for generating counterfactuals, especially for complex data modalities such as
images, where tweaking uninterpretable features falls short. However, existing surveys often
overlook the generative aspects or high-dimensional data scenarios [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6, 7, 8</xref>
        ]. Our work addresses
this gap by focusing on generative models for counterfactual explanations in complex data,
offering a comprehensive understanding of their capabilities and limitations.
      </p>
      <p>
        In this paper, we explore the common use cases of generative models for counterfactual
explanations and highlight primary challenges. We categorize methods by their generative
techniques and examine modifications to standard processes to meet counterfactual
requirements. Our discussion aims to stimulate further research by identifying key challenges and
potential directions for advancing generative methods in counterfactual explanations. While
counterfactual generation is often seen through the lens of causal generative modeling [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], we
focus on noncausal approaches.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Counterfactual Explanations</title>
      <p>
        Counterfactual explanations are essential for explainability in machine learning [
        <xref ref-type="bibr" rid="ref1 ref6">6, 1</xref>
        ]. Formally,
given a dataset 𝒳 with corresponding labels 𝒴, a counterfactual for a query sample x_q ∈ 𝒳 is an
alternative sample x_cf ∈ 𝒳 that results in a different outcome y ∈ 𝒴 under a predictive model
f : 𝒳 → 𝒴, providing insight into model behavior. The quality of a counterfactual explanation
depends on several conditions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Proximity and validity require a counterfactual to be as
similar as possible to the query sample x_q but lead to a different desired outcome. Meanwhile,
plausibility and actionability demand that the suggested modifications be meaningful and
practical to users. For example, suggesting a reduction in age is impractical, as age cannot be
reversed.
      </p>
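      <p>As a concrete illustration of these conditions, the sketch below checks validity, proximity, and actionability for a single query/counterfactual pair. The loan model, feature meanings, and thresholds are hypothetical stand-ins for illustration only, not taken from any cited method:</p>
      <preformat><![CDATA[
```python
import numpy as np

def is_valid(model, x_cf, desired_y):
    # Validity: the model must assign the desired outcome to the counterfactual.
    return model(x_cf) == desired_y

def proximity(x_q, x_cf):
    # Proximity: L1 distance between query and counterfactual (lower is better).
    return float(np.abs(np.asarray(x_q) - np.asarray(x_cf)).sum())

def is_actionable(x_q, x_cf, immutable_idx):
    # Actionability: immutable features (e.g. age) must stay unchanged.
    x_q, x_cf = np.asarray(x_q), np.asarray(x_cf)
    return all(x_q[i] == x_cf[i] for i in immutable_idx)

# Toy loan model: approve (1) when income exceeds debt, else reject (0).
model = lambda x: int(x[0] - x[1] > 0)

x_q = np.array([30.0, 50.0])   # rejected query: income 30, debt 50
x_cf = np.array([30.0, 25.0])  # counterfactual: pay debt down to 25

print(is_valid(model, x_cf, 1))                     # True: outcome flips to approval
print(proximity(x_q, x_cf))                         # 25.0
print(is_actionable(x_q, x_cf, immutable_idx=[0]))  # True: income untouched
```
]]></preformat>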
      <p>
        Counterfactuals and adversarial examples. To further understand counterfactual
explanations, it is useful to compare them with adversarial examples: both modify the original
sample to change the output of a model, but they differ in their objectives. Counterfactuals
introduce semantically reasonable changes to provide meaningful insights, while adversarial
examples use subtle, imperceptible perturbations to mislead the model. Distinguishing between
the two is challenging, so it is crucial for
counterfactual approaches to ensure modifications that are perceptible and semantically significant
[
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ].
      </p>
      <p>[Figure 1: common uses of generative models for counterfactual generation. VAEs: optimization methods, latent perturbations, modality-specific VAEs. GANs: semantic decomposition, cycle-consistency, StyleGAN modifications. DDPMs: explicit guidance, textual inversion.]</p>
      <p>Generating counterfactual explanations in complex
domains like images or time-series is challenging. In tabular data, ensuring properties like
validity and feasibility is straightforward. However, high-dimensional data with non-interpretable
features poses significant difficulties. Generative models capable of approximating data
distributions offer a promising solution. Figure 1 illustrates common uses of VAEs, GANs, and DDPMs
for counterfactual generation. These models can satisfy counterfactual conditions through
specific modifications and regularizations, such as incorporating distance metrics for validity or
controllable generation techniques for actionability.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Variational Autoencoders</title>
      <p>VAEs are useful for counterfactual generation by approximating the data density p(x|z), ensuring
modifications remain within the data distribution (see Appendix A for details). Since VAEs
can produce interpretable latent factors, they are useful for counterfactual explanations. Our
taxonomy for VAE-based counterfactual generation methods includes optimization methods,
which employ optimization of an expression defining a good counterfactual while using VAE to
stay in the data manifold; latent perturbations, which encode a sample into a latent space,
modify it, and decode to obtain a counterfactual; we also mention some modality-specific
VAEs to highlight the versatility of this approach.</p>
      <sec id="sec-3-1">
        <title>Optimization methods.</title>
        <p>
          This is the most common approach to using density approximation for
counterfactual explanations [
          <xref ref-type="bibr" rid="ref1 ref6">1, 6</xref>
          ]. They optimize an expression involving a classifier f, desired
outcome y, classifier loss ℓ, and a cost function d that enforces desired properties, balanced by λ:
ℒ_cf = ℓ(f(D(z)), y) + λ · d(x_q, D(z)),   z_cf = arg min_z ℒ_cf,
(1)
where D denotes the VAE decoder. The counterfactual explanation is derived by x_cf ∼ p(x|z_cf).
Despite using the learnable posterior q(z|x), the stochastic optimization process can result in z
being outside the prior p(z), making the generation of actionable counterfactuals a challenge.
        </p>
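        <p>A minimal sketch of this optimization scheme (cf. Eq. 1), with a hypothetical linear decoder and logistic classifier standing in for trained networks, and finite-difference gradients in place of autodiff for simplicity:</p>
        <preformat><![CDATA[
```python
import numpy as np

# Hypothetical stand-ins: a linear "decoder" D and a logistic "classifier" f.
W = np.eye(2)                          # decoder weights: D(z) = W @ z
decode = lambda z: W @ z
classify = lambda x: 1.0 / (1.0 + np.exp(-(x[0] - x[1])))  # P(y=1 | x)

def loss_cf(z, x_q, y_target, lam=0.1):
    # L_cf = l(f(D(z)), y) + lam * d(x_q, D(z)), as in Eq. (1).
    x = decode(z)
    p = classify(x)
    ce = -(y_target * np.log(p) + (1 - y_target) * np.log(1 - p))  # classifier loss
    return ce + lam * np.abs(x_q - x).sum()                        # + proximity cost

def find_cf(z0, x_q, y_target, steps=500, lr=0.1, eps=1e-4):
    # Minimize L_cf over z by finite-difference gradient descent.
    z = z0.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(z)
        for i in range(z.size):
            dz = np.zeros_like(z)
            dz[i] = eps
            grad[i] = (loss_cf(z + dz, x_q, y_target)
                       - loss_cf(z - dz, x_q, y_target)) / (2 * eps)
        z -= lr * grad
    return z

x_q = np.array([-1.0, 1.0])        # query classified as y=0 (p < 0.5)
z_cf = find_cf(x_q, x_q, y_target=1)
x_cf = decode(z_cf)
print(classify(x_q) < 0.5, classify(x_cf) > 0.5)
```
]]></preformat>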
        <p>
          Latent perturbations. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] introduced a conditional VAE with factorized encoder and
decoder. It uses mixture priors for clustering in the latent space. Counterfactuals are generated
by small perturbations to the latent representation and reconstruction through the decoder,
ensuring proximity and high-density data alignment. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] proposed an approach to generate
non-trivial, diverse explanations by varying less influential latent factors.
        </p>
        <p>
          Modality-specific VAEs. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] presented a framework for counterfactual explanations for
graph ML models using a conditional graph VAE. It handles graph data challenges and generalizes
to out-of-distribution graphs. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] focused on anomaly detection in multivariate time-series,
segmenting the latent space into general and salient components with supervised contrastive
loss. Counterfactuals replace the salient component with a healthy latent prototype, estimated
using kernel density estimation.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Generative Adversarial Networks</title>
      <p>
        The capability of GANs to generate high-quality samples makes them useful for realistic
counterfactuals. However, they face challenges with unstable training and mode collapse, limiting
diversity [
        <xref ref-type="bibr" rid="ref4 ref16 ref17 ref18">4, 16, 17, 18</xref>
        ]. We categorize GAN-based approaches into four groups: Conditional
GANs, where the generator G combines latent noise z, encoded features from the original
sample x_q, and the class label y to generate counterfactuals: x_cf = G(z, x_q, y); Semantic
decomposition, which involves segmenting original images into meaningful regions and
treating each region individually during generation, exploiting the individual editing of
semantically distinct regions to enhance actionability; Cycle-consistency GANs,
which produce a counterfactual and its reversal (a counterfactual with respect to the
counterfactual), aligning the reversal with the original sample; and StyleGAN modifications, which
adapt StyleGAN's style-based generator for detailed and high-quality counterfactuals.
      </p>
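      <p>The conditional-GAN formulation x_cf = G(z, x_q, y) can be sketched as follows; the identity encoder, dimensions, and single-layer "network" are illustrative placeholders, not any cited architecture:</p>
      <preformat><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration.
NOISE_DIM, FEAT_DIM, N_CLASSES = 4, 8, 3
W = rng.normal(size=(FEAT_DIM, NOISE_DIM + FEAT_DIM + N_CLASSES))

def encode(x_q):
    # Stand-in encoder: identity features of the query sample.
    return x_q

def generator(z, x_q, y):
    # x_cf = G(z, e(x_q), y): concatenate noise, query features, one-hot target class.
    one_hot = np.eye(N_CLASSES)[y]
    h = np.concatenate([z, encode(x_q), one_hot])
    return np.tanh(W @ h)  # single linear layer as a placeholder network

z = rng.normal(size=NOISE_DIM)
x_q = rng.normal(size=FEAT_DIM)
x_cf = generator(z, x_q, y=2)
print(x_cf.shape)  # (8,): a counterfactual in the same data space as the query
```
]]></preformat>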
      <p>Conditional GANs. [19] introduced a GAN-based approach for counterfactual generation
by finding the latent encoding of a query image and using a class-conditional GAN to produce
three instances: a reconstructed original, a modified image, and a change mask. The final
counterfactual blends the original and modified images based on the predicted mask. [20] used
typical conditional GAN training with additional constraints to ensure validity and
counterfactual properties. [21] modified the GAN architecture, training the generator to output residuals
instead of complete data points. [22] used an external classifier as a discriminator to improve
robustness against adversarial attacks.</p>
      <p>Semantic decomposition. [23] decomposed counterfactual generation into three
components: background, foreground, and object mask, using a conditional GAN for each. [24] used
semantic maps and embeddings to generate counterfactuals. [25] combined conditional GANs
with saliency maps to target specific regions for modification.</p>
      <p>Cycle-consistency GANs. [26] used cycle-consistency loss to ensure coherence and
reversibility of counterfactual changes. [27] added latent concept vectors for disentangled
concept learning. [28] enforced cycle-consistency between original and counterfactual latent
embeddings.</p>
      <p>StyleGAN modifications. StyleGAN [29] uses latent style vectors for image generation. [30]
combined StyleGAN vectors with classifier outputs for counterfactuals. [31] integrated a style
vector with a CLIP [32] embedding to allow user-defined modifications in natural language.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Denoising Diffusion Probabilistic Models</title>
      <p>DDPMs have emerged as the state-of-the-art generative approach, particularly useful for
counterfactual generation (see Appendix C for details). We classify these approaches into two
main groups: explicit guidance and textual inversion methods. Explicit guidance
methods exploit an external differentiable classifier p(y|x) to replace the original score function
∇_x log p(x) with a conditional one ∇_x log p(x|y) to generate specific counterfactuals. Textual
inversion is a technique used in text-conditioned generative models, where unique trainable
embeddings are assigned to images that share common concepts. These embeddings are
optimized so that, when used as conditioning inputs, the generative model produces images that
are similar to the original ones, effectively capturing and reproducing the shared concepts.
Counterfactual generation here involves modifications of the original concept embedding.
Explicit guidance. Advances in DDPMs have led to innovative applications in counterfactual
generation, such as DiME [33] and DVCE [34]. These methods use classifier guidance during
the diffusion process. Since diffusion models operate with noised samples, DVCE employs a
one-step denoising approximation, while DiME uses multiple iterations. [35] emphasized the
significance of techniques for handling noised samples, such as gradient cone projections and
intermediate denoising steps. [36] introduced a two-stage approach using classifier feedback
for initial image modifications, followed by iterative denoising with DDPM. [37] utilized the
latent diffusion method from [38], known for computational efficiency by operating in a lower-
dimensional latent space. The introduced consensus guidance mechanism filters gradients to
ensure plausible counterfactual changes.</p>
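      <p>The score replacement behind explicit guidance can be sketched with toy densities. The Gaussian prior score and logistic classifier below are illustrative stand-ins, and plain gradient ascent on the guided score replaces the actual reverse diffusion loop:</p>
      <preformat><![CDATA[
```python
import numpy as np

def grad_log_p(x):
    # Hypothetical unconditional score grad_x log p(x) for a standard Gaussian density.
    return -x

def grad_log_p_y_given_x(x, w):
    # Gradient of log p(y=1|x) for a logistic classifier p(y=1|x) = sigmoid(w.x).
    p = 1.0 / (1.0 + np.exp(-w @ x))
    return (1.0 - p) * w

def guided_score(x, w, scale=5.0):
    # Classifier guidance: grad log p(x|y) = grad log p(x) + scale * grad log p(y|x).
    return grad_log_p(x) + scale * grad_log_p_y_given_x(x, w)

def sample(x, w, steps=200, step_size=0.05):
    # Crude stand-in for the reverse diffusion loop: ascend the guided score.
    for _ in range(steps):
        x = x + step_size * guided_score(x, w)
    return x

w = np.array([1.0, -1.0])   # classifier favours x[0] high, x[1] low
x0 = np.array([0.0, 0.0])
x_cf = sample(x0, w)
p = 1.0 / (1.0 + np.exp(-w @ x_cf))
print(p > 0.5)  # guided sample lands in the desired class region
```
]]></preformat>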
      <p>DDPMs have gained wide application in medical image processing. One significant task addressed
with counterfactual generation is medical anomaly detection, which involves identifying
disease-specific regions within samples. [39] and [40] used reverse diffusion with classifier
guidance to generate healthy counterparts of pathological images. [41] combined DDIM [42],
DDPM, and saliency maps for precisely targeted modifications. [43] explored counterfactual
generation for fMRI data, using a transformer-only model for long-range dependencies and a
modified diffusion sampling process for enhanced efficiency.</p>
      <p>Textual inversion. Textual inversion [44] learns distinctive tokens for specific classes or
concepts, enabling refined control over the generation process. [45] used this technique for
counterfactual generation by combining learned concept tokens with additional counterfactual
shift. [46] applied textual inversion to visual counterfactuals by learning concept embeddings
and prompts for predefined objects or classes. In contrast to the previous approaches, this
method implemented counterfactual modifications as transitions between concepts.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Challenges, open questions, and future directions</title>
      <sec id="sec-6-1">
        <title>6.1. Challenges</title>
        <p>
          Dealing with adversarial perturbations. All methods for counterfactual explanations
vary significantly in their approach to ensuring semantically reasonable modifications. Some
strategies use discriminative models that are adversarially resilient [34]. Other approaches
apply changes within a structured latent space [
          <xref ref-type="bibr" rid="ref37 ref28 ref13">37, 28, 13</xref>
          ], resulting in more feasible and
meaningful modifications. Another promising method involves the use of a structured latent
space with representations of concepts as individual entities present in a sample [47, 48], learned
with or without supervision [46, 45, 27].
        </p>
        <p>
          Diversity of generated explanations. In high-dimensional data, generating diverse
counterfactual explanations is crucial but rarely addressed [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Multiple plausible changes can lead to
the same desired outcome, making it a complex task to consider all possible directions. However,
too many diverse explanations can overwhelm users and hinder decision-making, requiring a
balance between variety and interpretability. The challenge is to combine potential alterations,
identify the most reasonable and relevant ones, and allow users to choose specific directions
of change without being inundated. This diversity of counterfactuals, vital for comprehensive
explainability, remains an open area for research and development. However, providing users
with multiple diverse explanations can invoke problems related to the Rashomon Effect [49],
where different, potentially contradictory explanations may lead to contradictory
interpretations of the same phenomena.
        </p>
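        <p>One simple way to quantify the variety of a set of counterfactuals is mean pairwise distance; this metric is an illustrative sketch, not one prescribed by the surveyed methods:</p>
        <preformat><![CDATA[
```python
import numpy as np

def diversity(cfs):
    # Mean pairwise L2 distance among a set of counterfactuals (higher = more diverse).
    cfs = np.asarray(cfs, dtype=float)
    n = len(cfs)
    dists = [np.linalg.norm(cfs[i] - cfs[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(dists))

same = [[1.0, 0.0]] * 3
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(diversity(same))                      # 0.0: three identical explanations
print(diversity(spread) > diversity(same))  # True: distinct directions of change
```
]]></preformat>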
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Open questions</title>
        <p>Are counterfactuals stable? The stability of counterfactuals, ensuring that minimal changes
in the input sample do not lead to disproportionate changes in the output, remains a critical
challenge. Research in tabular data has revealed vulnerabilities, where slight manipulations in
the input can drastically alter the counterfactual [50]. This raises the question of whether
generative approaches, increasingly used for counterfactual explanations, exhibit similar instability
or ofer more robust solutions. Understanding and improving the stability of these models is
crucial for their reliability and trustworthiness in practical applications.</p>
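        <p>Stability can be probed empirically by perturbing the query slightly and measuring how far the counterfactual moves. The linear-boundary "generator" below is a hypothetical toy used only to make the probe concrete:</p>
        <preformat><![CDATA[
```python
import numpy as np

def counterfactual(x_q, w, b, margin=0.1):
    # Toy generator: project the query just across the linear boundary w.x + b = 0.
    w_hat = w / np.linalg.norm(w)
    d = (w @ x_q + b) / np.linalg.norm(w)   # signed distance to the boundary
    return x_q - (d + np.sign(d) * margin) * w_hat

def stability(x_q, w, b, eps=0.01, trials=20, seed=0):
    # Largest shift in the counterfactual under random input perturbations of size eps.
    rng = np.random.default_rng(seed)
    base = counterfactual(x_q, w, b)
    shifts = []
    for _ in range(trials):
        noise = rng.normal(size=x_q.shape)
        noise = eps * noise / np.linalg.norm(noise)
        shifts.append(np.linalg.norm(counterfactual(x_q + noise, w, b) - base))
    return max(shifts)

w, b = np.array([1.0, 1.0]), -1.0
x_q = np.array([0.0, 0.0])
print(stability(x_q, w, b) <= 0.0101)  # True: this toy generator is locally 1-Lipschitz
```
]]></preformat>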
        <p>How to evaluate generated counterfactuals? Evaluating counterfactual explanations in
AI is challenging due to a disconnect between theoretical metrics and real-world applicability.
Traditional metrics, focusing on conditions such as proximity to factual instances or diversity,
do not necessarily translate to practical utility for end-users [51]. Research [52] has shown
that while counterfactual explanations may satisfy theoretical criteria, their impact on user
trust and understanding is inconsistent. To address this, recent approaches integrate users
and domain experts into the evaluation process [34, 53]. This expert-in-the-loop methodology
aligns theoretical constructs with practical realities, especially in critical areas like healthcare,
providing a more comprehensive evaluation of counterfactual explanations’ utility. However, a
universal benchmark for comparing different methods is still missing.</p>
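        <p>Even without a universal benchmark, theoretical criteria can at least be reported consistently across methods. A minimal batch evaluation harness might look like the sketch below (illustrative only; it does not address the user-study side of evaluation):</p>
        <preformat><![CDATA[
```python
import numpy as np

def evaluate(model, queries, cfs, targets):
    # Aggregate two standard criteria over a batch: validity rate and mean L1 proximity.
    queries, cfs = np.asarray(queries, float), np.asarray(cfs, float)
    valid = [model(cf) == t for cf, t in zip(cfs, targets)]
    prox = np.abs(queries - cfs).sum(axis=1)
    return {"validity": float(np.mean(valid)), "proximity": float(np.mean(prox))}

model = lambda x: int(x.sum() > 0)
queries = [[-1.0, -1.0], [-2.0, 0.0]]
cfs = [[1.5, -1.0], [-2.0, 3.0]]
report = evaluate(model, queries, cfs, targets=[1, 1])
print(report)
# {'validity': 1.0, 'proximity': 2.75}
```
]]></preformat>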
      </sec>
      <sec id="sec-6-3">
        <title>6.3. Future directions</title>
      </sec>
      <sec id="sec-6-4">
        <title>Modalities and multimodal counterfactual explanations</title>
        <p>Much of the current research on counterfactual explanations focuses on image data. Extending these methodologies to other
modalities, such as graphs, time-series, audio, and video is an important challenge. In addition,
there is a gap in methods capable of handling multimodal data, which is increasingly prevalent in
practical applications. Developing techniques that can generate counterfactuals across various
modalities, or even within multimodal contexts, is a critical area for future exploration.</p>
      </sec>
      <sec id="sec-6-5">
        <title>Grounding and reasoning in counterfactual explanations</title>
        <p>The typical process of generating counterfactuals often lacks detailed explanations for why certain changes lead to specific
outcomes, leaving users to interpret these changes on their own, which can lead to
misunderstandings. Recent advances in Large Language Models (LLMs) with multimodal capabilities
offer a promising solution to this issue [54]. Integrating advanced LLMs with counterfactual
generation can enhance user comprehension by providing reasoning for suggested changes.
This aligns with the concept of Evaluative AI [55]. Efforts like [56] highlight the versatility of
LLMs in improving model explainability, especially with tabular data. Combining various forms
of explanations, including semi-factuals [57], with multimodal LLMs conditioned on specific
scenarios, could lead to more comprehensive and transparent AI systems.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This work highlights the crucial role of generative models in producing counterfactual
explanations for high-dimensional data. While describing existing approaches, we emphasize the
need for semantically rich, intuitive explanations and robust user-centered evaluation. We discussed
future research directions, which include the application of counterfactuals across diverse data
modalities and their integration with LLMs and other explanatory methods.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgment</title>
      <p>This study was funded by the projects TRUST-ME (205121L_214991), BASE (200021_182109),
and XAI-PAC (PZ00P2_216405).</p>
      <p>gans, in: Proceedings of the IEEE/CVF international conference on computer vision, 2019,
pp. 6382–6390.
[18] H. Thanh-Tung, T. Tran, Catastrophic forgetting and mode collapse in gans, in: 2020
international joint conference on neural networks (ijcnn), IEEE, 2020, pp. 1–10.
[19] P. Samangouei, A. Saeedi, L. Nakagawa, N. Silberman, Explaingan: Model explanation via
decision boundary crossing transformations, in: Proceedings of the European Conference
on Computer Vision (ECCV), 2018, pp. 666–681.
[20] A. Van Looveren, J. Klaise, G. Vacanti, O. Cobb, Conditional generative models for
counterfactual explanations, arXiv preprint arXiv:2101.10123 (2021).
[21] D. Nemirovsky, N. Thiebaut, Y. Xu, A. Gupta, Countergan: Generating realistic
counterfactuals with residual generative adversarial nets, arXiv preprint arXiv:2009.05199
(2020).
[22] R. Bischof, F. Scheidegger, M. A. Kraus, A. C. I. Malossi, Counterfactual image generation
for adversarially robust and interpretable classifiers, arXiv preprint arXiv:2310.00761
(2023).
[23] A. Sauer, A. Geiger, Counterfactual generative networks, arXiv preprint arXiv:2101.06046
(2021).
[24] P. Jacob, É. Zablocki, H. Ben-Younes, M. Chen, P. Pérez, M. Cord, Steex: steering
counterfactual explanations with semantics, in: European Conference on Computer Vision,
Springer, 2022, pp. 387–403.
[25] A. Samadi, A. Shirian, K. Koufos, K. Debattista, M. Dianati, Safe: Saliency-aware
counterfactual explanations for dnn-based automated driving systems, arXiv preprint
arXiv:2307.15786 (2023).
[26] S. Singla, B. Pollack, J. Chen, K. Batmanghelich, Explanation by progressive exaggeration,
arXiv preprint arXiv:1911.00483 (2019).
[27] A. Ghandeharioun, B. Kim, C.-L. Li, B. Jou, B. Eoff, R. W. Picard, Dissect: Disentangled
simultaneous explanations via concept traversals, arXiv preprint arXiv:2105.15164 (2021).
[28] S. Khorram, L. Fuxin, Cycle-consistent counterfactuals by latent transformations, in:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
2022, pp. 10203–10212.
[29] T. Karras, S. Laine, T. Aila, A style-based generator architecture for generative adversarial
networks, in: Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, 2019, pp. 4401–4410.
[30] O. Lang, Y. Gandelsman, M. Yarom, Y. Wald, G. Elidan, A. Hassidim, W. T. Freeman, P. Isola,
A. Globerson, M. Irani, et al., Explaining in style: Training a gan to explain a classifier
in stylespace, in: Proceedings of the IEEE/CVF International Conference on Computer
Vision, 2021, pp. 693–702.
[31] J. Luo, Z. Wang, C. H. Wu, D. Huang, F. De la Torre, Zero-shot model diagnosis, in:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
2023, pp. 11631–11640.
[32] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell,
P. Mishkin, J. Clark, et al., Learning transferable visual models from natural language
supervision, in: International conference on machine learning, PMLR, 2021, pp. 8748–8763.
[33] G. Jeanneret, L. Simon, F. Jurie, Diffusion models for counterfactual explanations, in:
Proceedings of the Asian Conference on Computer Vision, 2022, pp. 858–876.
[34] M. Augustin, V. Boreiko, F. Croce, M. Hein, Diffusion visual counterfactual explanations,
Advances in Neural Information Processing Systems 35 (2022) 364–377.
[35] P. Vaeth, A. M. Fruehwald, B. Paassen, M. Gregorova, Diffusion-based visual counterfactual
explanations - towards systematic quantitative evaluation, arXiv preprint arXiv:2308.06100
(2023).
[36] G. Jeanneret, L. Simon, F. Jurie, Adversarial counterfactual visual explanations, in:
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
2023, pp. 16425–16435.
[37] K. Farid, S. Schrodi, M. Argus, T. Brox, Latent diffusion counterfactual explanations, arXiv
preprint arXiv:2310.06668 (2023).
[38] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, B. Ommer, High-resolution image synthesis
with latent diffusion models, in: Proceedings of the IEEE/CVF conference on computer
vision and pattern recognition, 2022, pp. 10684–10695.
[39] J. Wolleb, F. Bieder, R. Sandkühler, P. C. Cattin, Diffusion models for medical anomaly
detection, in: International Conference on Medical image computing and computer-assisted
intervention, Springer, 2022, pp. 35–45.
[40] P. Sanchez, A. Kascenas, X. Liu, A. Q. O’Neil, S. A. Tsaftaris, What is healthy? generative
counterfactual diffusion for lesion localization, in: MICCAI Workshop on Deep Generative
Models, Springer, 2022, pp. 34–44.
[41] A. Fontanella, G. Mair, J. Wardlaw, E. Trucco, A. Storkey, Diffusion models for counterfactual
generation and anomaly detection in brain images, arXiv preprint arXiv:2308.02062
(2023).
[42] J. Song, C. Meng, S. Ermon, Denoising diffusion implicit models, arXiv preprint
arXiv:2010.02502 (2020).
[43] H. A. Bedel, T. Çukur, Dreamr: Diffusion-driven counterfactual explanation for functional
mri, arXiv preprint arXiv:2307.09547 (2023).
[44] R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, D. Cohen-Or, An
image is worth one word: Personalizing text-to-image generation using textual inversion,
arXiv preprint arXiv:2208.01618 (2022).
[45] J. Vendrow, S. Jain, L. Engstrom, A. Madry, Dataset interfaces: Diagnosing model failures
using controllable counterfactual generation, arXiv preprint arXiv:2302.07865 (2023).
[46] G. Jeanneret, L. Simon, F. Jurie, Text-to-image models for counterfactual explanations: a
black-box approach, arXiv preprint arXiv:2309.07944 (2023).
[47] D. Kirilenko, V. Vorobyov, A. Kovalev, A. Panov, Object-centric learning with slot mixture
module, in: The Twelfth International Conference on Learning Representations, 2023.
[48] M. R. Arefin, Y. Zhang, A. Baratin, F. Locatello, I. Rish, D. Liu, K. Kawaguchi, Unsupervised
concept discovery mitigates spurious correlations, arXiv preprint arXiv:2402.13368 (2024).
[49] R. Anderson, The rashomon effect and communication, Canadian Journal of
Communication 41 (2016) 249–270.
[50] D. Slack, A. Hilgard, H. Lakkaraju, S. Singh, Counterfactual explanations can be
manipulated, Advances in neural information processing systems 34 (2021) 62–75.
[51] E. Delaney, A. Pakrashi, D. Greene, M. T. Keane, Counterfactual explanations for
misclassified images: How human and machine explanations differ, Artificial Intelligence 324
(2023) 103995.
[52] Y. Rong, T. Leemann, T.-T. Nguyen, L. Fiedler, P. Qian, V. Unhelkar, T. Seidel, G. Kasneci,
E. Kasneci, Towards human-centered explainable ai: A survey of user studies for model
explanations, IEEE Transactions on Pattern Analysis &amp; Machine Intelligence (2023) 1–20.
[53] S. Sankaranarayanan, T. Hartvigsen, L. Oakden-Rayner, M. Ghassemi, P. Isola, Real world
relevance of generative counterfactual explanations, in: Workshop on Trustworthy and
Socially Responsible Machine Learning, NeurIPS 2022, 2022.
[54] C. Wu, S. Yin, W. Qi, X. Wang, Z. Tang, N. Duan, Visual chatgpt: Talking, drawing and
editing with visual foundation models, arXiv preprint arXiv:2303.04671 (2023).
[55] T. Miller, Explainable ai is dead, long live explainable ai! hypothesis-driven decision
support using evaluative ai, in: Proceedings of the 2023 ACM Conference on Fairness,
Accountability, and Transparency, 2023, pp. 333–342.
[56] D. Slack, S. Krishna, H. Lakkaraju, S. Singh, Explaining machine learning models with
interactive natural language conversations using talktomodel, Nature Machine Intelligence
5 (2023) 873–883.
[57] S. Aryal, M. T. Keane, Even if explanations: Prior work, desiderata &amp; benchmarks for
semi-factual xai, arXiv preprint arXiv:2301.11970 (2023).</p>
    </sec>
    <sec id="sec-9">
      <title>A. VAE Background</title>
      <p>
        Introduced by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], VAE has become a significant tool in deep learning for training latent variable
models through variational inference. A VAE comprises an encoder and a decoder, with its
primary objective typically being the minimization of the reconstruction error of data samples.
The encoder, parameterized by trainable parameters φ, approximates a variational distribution
q_φ(z|x), where z is a latent variable with a prior distribution p(z), often chosen as the standard
Gaussian distribution 𝒩(0, I). The decoder models p_θ(x|z). VAEs are trained by maximizing
the Evidence Lower Bound (ELBO), a variational lower bound of the exact log-likelihood:
log p(x) ≥ ELBO = E_{q_φ(z|x)}[log p_θ(x|z)] − KL(q_φ(z|x) ‖ p(z)),
(2)
where KL is the Kullback–Leibler divergence.</p>
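      <p>For a Gaussian encoder and decoder, both ELBO terms have closed forms; a sketch of Eq. (2) in Python, with a hypothetical fixed-variance decoder:</p>
      <preformat><![CDATA[
```python
import numpy as np

def kl_gauss(mu, logvar):
    # Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ).
    return 0.5 * float(np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar))

def gauss_log_lik(x, x_hat, sigma=1.0):
    # log p(x|z) for a Gaussian decoder mean x_hat with fixed variance sigma^2.
    d = x.size
    return float(-0.5 * np.sum((x - x_hat) ** 2) / sigma**2
                 - 0.5 * d * np.log(2 * np.pi * sigma**2))

def elbo(x, x_hat, mu, logvar):
    # Single-sample ELBO estimate: E_q[log p(x|z)] - KL(q(z|x) || p(z)).
    return gauss_log_lik(x, x_hat) - kl_gauss(mu, logvar)

x = np.array([0.5, -0.5])
x_hat = np.array([0.4, -0.4])  # decoder reconstruction of x
mu, logvar = np.zeros(2), np.zeros(2)
print(kl_gauss(mu, logvar))    # 0.0: the posterior equals the prior
print(elbo(x, x_hat, mu, logvar) < 0.0)
```
]]></preformat>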
    </sec>
    <sec id="sec-10">
      <title>B. GAN Background</title>
      <p>
        In contrast to VAEs, GANs [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] operate on a different principle, as they do not explicitly learn
the likelihood of data samples. Instead, GANs employ two neural networks: a generator G and
a discriminator D. The generator G maps input noise z, sampled from a distribution p(z) =
𝒩(0, I), to the data space with the objective of learning the generator distribution p_g over
the data samples. The discriminator D, on the other hand, aims to estimate the probability that a
given data sample x originated from the actual data distribution p_data. The training of D involves
distinguishing real samples drawn from p_data and generated samples from G. Concurrently, G is
trained to maximize the probability of its generated samples being misclassified by D, effectively
minimizing log(1 − D(G(z))). This training process sets up a minimax zero-sum game between
G and D, where each network continuously improves its performance in response to the other,
leading to the generation of increasingly realistic samples:
min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p(z)}[log(1 − D(G(z)))].
(3)
      </p>
      <p>Compared to VAEs, GANs sample the latent code z from the same prior distribution during
both training and inference, and are capable of generating more complex and high-fidelity data
samples [29].</p>
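The minimax value function can be evaluated directly on toy data. In the sketch below (illustrative only; `gan_losses`, the 1-D samples, and the fixed sigmoid "discriminator" `D` are our own stand-ins, with no trained generator), the two players' terms are computed exactly as in the objective above:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def gan_losses(real, fake, discriminator):
    """Split the minimax objective into the two players' terms.

    `discriminator` returns a probability D(x) in (0, 1).
    D maximizes  E[log D(x_real)] + E[log(1 - D(x_fake))];
    G minimizes  E[log(1 - D(x_fake))].
    """
    d_real = discriminator(real)
    d_fake = discriminator(fake)
    d_objective = np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
    g_loss = np.mean(np.log(1.0 - d_fake))
    return d_objective, g_loss

# Toy 1-D example: a fixed "discriminator" that scores larger x as more real.
real = rng.normal(loc=2.0, size=500)
fake = rng.normal(loc=-2.0, size=500)  # stand-in for generated samples G(z)
D = lambda x: sigmoid(2.0 * x)
d_obj, g_loss = gan_losses(real, fake, D)
```

Here the two distributions are well separated, so `d_obj` is close to its maximum of 0 and `g_loss` is near 0 as well, signalling that the generator is losing the game; training G would push `g_loss` toward more negative values.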
    </sec>
    <sec id="sec-11">
      <title>C. DDPM Background</title>
      <p>
        In recent years, Denoising Diffusion Probabilistic Models (DDPMs) have solidified their position
as a leading framework in generative modeling [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The central component of DDPMs is an
iterative process in which noise is incrementally added to a data sample x_0, transforming it into a
pure noise sample x_T ∼ p(x_T) = 𝒩(0, I):
q(x_t | x_{t−1}) = 𝒩( √(1 − β_t) x_{t−1}, β_t I ) ,
(4)
where {β_1, . . . , β_T} is a predefined variance schedule. This is coupled with a learned reversal
of this process:
p_θ(x_{t−1} | x_t) = 𝒩( μ_θ(x_t, t), Σ_θ(x_t, t) ) ,
(5)
where μ_θ(x_t, t) and Σ_θ(x_t, t) are predicted by models with learnable parameters θ. Since
DDPMs, similar to VAEs, represent latent variable models, they are trained by optimizing a
variational lower bound; this procedure results in training a so-called score function
∇_x log p(x). The resultant sampling procedure is executed by initially sampling x_T ∼ 𝒩(0, I),
followed by a sequence of x_{t−1} ∼ p_θ(x_{t−1} | x_t) until the final x_0 ∼ p_θ(x_0 | x_1).
This technique, involving multiple trainable denoising iterations, empowers DDPMs to produce
outputs that are both highly detailed and diverse, setting them apart in the generative
modeling arena.
      </p>
      <p>
        A notable feature of DDPMs, with significant potential for counterfactual generation, is the
development of a classifier guidance mechanism [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. This approach diverges from traditional
generative models, which typically necessitate explicit conditioning during the training phase.
Guided diffusion introduces a paradigm where an unconditional generative model is first trained.
Subsequently, this model is adapted for conditional sampling via the integration of an auxiliary
classifier model, replacing p_θ(x_{t−1} | x_t) with
      </p>
      <p>p_{θ,φ}(x_{t−1} | x_t, y) ∝ p_θ(x_{t−1} | x_t) p_φ(y | x_t) ,
(6)
where p_φ(y | x_t) is an external model to be explained. This strategy provides remarkable
flexibility in conditional generation but introduces a pivotal challenge: the classifier must be
either trained on noise-augmented samples to align with the DDPM’s intermediate stages, or a
denoising mechanism should be applied prior to classifier usage.</p>
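The forward noising process admits a very small implementation. Composing the Gaussian steps gives the standard closed form q(x_t | x_0) = 𝒩( √ᾱ_t x_0, (1 − ᾱ_t) I ) with ᾱ_t = ∏_{s≤t} (1 − β_s), so x_t can be sampled in one jump. The NumPy sketch below is our own illustration; the linear β schedule and the function names `forward_step` / `forward_jump` are assumptions, not taken from a specific implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

T = 1000
betas = np.linspace(1e-4, 0.02, T)   # predefined variance schedule {beta_1, ..., beta_T}
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = prod_{s<=t} (1 - beta_s)

def forward_step(x_prev, t):
    """One step of q(x_t | x_{t-1}) = N( sqrt(1 - beta_t) x_{t-1}, beta_t I ), t 0-indexed."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - betas[t]) * x_prev + np.sqrt(betas[t]) * noise

def forward_jump(x0, t):
    """Closed form q(x_t | x_0) = N( sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I )."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

x0 = np.ones(4)
x_T = forward_jump(x0, T - 1)  # after T steps the sample is approximately N(0, I)
```

Because ᾱ_T is vanishingly small under this schedule, the signal √ᾱ_T x_0 is essentially destroyed by step T, which is exactly the property the learned reverse process exploits when it denoises from pure Gaussian noise.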
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wachter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mittelstadt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Russell</surname>
          </string-name>
          ,
          <article-title>Counterfactual explanations without opening the black box: Automated decisions and the gdpr</article-title>
          ,
          <source>Harv. JL &amp; Tech. 31</source>
          (
          <year>2017</year>
          )
          <fpage>841</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoffman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. J.</given-names>
            <surname>Clancey</surname>
          </string-name>
          ,
          <article-title>Explaining explanation, part 4: a deep dive on deep nets</article-title>
          ,
          <source>IEEE Intelligent Systems</source>
          <volume>33</volume>
          (
          <year>2018</year>
          )
          <fpage>87</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <article-title>Auto-encoding variational bayes</article-title>
          ,
          <source>arXiv preprint arXiv:1312.6114</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I.</given-names>
            <surname>Goodfellow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pouget-Abadie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mirza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Warde-Farley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ozair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Courville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , Generative adversarial nets,
          <source>Advances in neural information processing systems</source>
          <volume>27</volume>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nichol</surname>
          </string-name>
          ,
          <article-title>Diffusion models beat gans on image synthesis</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>34</volume>
          (
          <year>2021</year>
          )
          <fpage>8780</fpage>
          -
          <lpage>8794</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Verma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Boonsanong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hoang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Hines</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Dickerson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <article-title>Counterfactual explanations and algorithmic recourses for machine learning: A review</article-title>
          ,
          <source>arXiv preprint arXiv:2010.10596</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Stepin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Alonso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Catala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pereira-Fariña</surname>
          </string-name>
          ,
          <article-title>A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence</article-title>
          ,
          <source>IEEE Access 9</source>
          (
          <year>2021</year>
          )
          <fpage>11974</fpage>
          -
          <lpage>12001</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Guidotti</surname>
          </string-name>
          ,
          <article-title>Counterfactual explanations and how to find them: literature review and benchmarking</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Komanduri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>From identifiable causal representations to controllable counterfactual generation: A survey on causal generative modeling</article-title>
          ,
          <source>arXiv preprint arXiv:2310.11011</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pawelczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Upadhyay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Lakkaraju</surname>
          </string-name>
          ,
          <article-title>Exploring counterfactual explanations through the lens of adversarial examples: A theoretical and empirical analysis</article-title>
          ,
          <source>in: International Conference on Artificial Intelligence and Statistics</source>
          , PMLR,
          <year>2022</year>
          , pp.
          <fpage>4574</fpage>
          -
          <lpage>4594</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Freiesleben</surname>
          </string-name>
          ,
          <article-title>The intriguing relation between counterfactual explanations and adversarial examples</article-title>
          ,
          <source>Minds and Machines</source>
          <volume>32</volume>
          (
          <year>2022</year>
          )
          <fpage>77</fpage>
          -
          <lpage>109</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pawelczyk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Broelemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kasneci</surname>
          </string-name>
          ,
          <article-title>Learning model-agnostic counterfactual explanations for tabular data</article-title>
          ,
          <source>in: Proceedings of the web conference</source>
          <year>2020</year>
          ,
          <year>2020</year>
          , pp.
          <fpage>3126</fpage>
          -
          <lpage>3132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Caccia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lacoste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zamparo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Laradji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Charlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Vazquez</surname>
          </string-name>
          ,
          <article-title>Beyond trivial counterfactual explanations with diverse valuable explanations</article-title>
          ,
          <source>in: Proceedings of the IEEE/CVF International Conference on Computer Vision</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1056</fpage>
          -
          <lpage>1065</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Clear: Generative counterfactual explanations on graphs</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>35</volume>
          (
          <year>2022</year>
          )
          <fpage>25895</fpage>
          -
          <lpage>25907</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>W.</given-names>
            <surname>Todo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Selmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Laurent</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-M.</given-names>
            <surname>Loubes</surname>
          </string-name>
          ,
          <article-title>Counterfactual explanation for multivariate times series using a contrastive variational autoencoder</article-title>
          ,
          <source>in: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          , IEEE,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L.</given-names>
            <surname>Mescheder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Geiger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nowozin</surname>
          </string-name>
          ,
          <article-title>Which training methods for gans do actually converge?</article-title>
          ,
          <source>in: International conference on machine learning, PMLR</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>3481</fpage>
          -
          <lpage>3490</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>K.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <article-title>Spectral regularization for combating mode collapse in gans</article-title>
        </mixed-citation>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>