<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Counterfactual generating Variational Autoencoder for Anomaly Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Renate Ernst</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Industrial Mathematics (ITWM)</institution>
          ,
          <addr-line>Fraunhofer-Platz 1, 67663, Kaiserslautern</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Kaiserslautern-Landau (RPTU)</institution>
          ,
          <addr-line>Gottlieb-Daimler-Straße, 67663, Kaiserslautern</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Machine learning applications in fields such as financial accounting or the healthcare industry have to meet high transparency requirements for user acceptance and to comply with a growing number of regulatory standards. Counterfactual explanations, a comparatively easy to interpret concept of local explanation, combined with the generative power of Variational Autoencoders (VAE) and their ability to learn distributions of latent representations, can offer information that fulfills the needs of machine learning experts and non-expert users at the same time. Most current studies leveraging the power of deep generative models for counterfactual generation focus on vision data. We focus on anomaly detection applications on real-world tabular data in the two high-risk fields of financial accounting and healthcare. We give an overview of constructions of counterfactual explanations and a categorization of current approaches to producing counterfactual explanations. We investigate supervised extensions of the VAE for simultaneous classification and counterfactual generation. Therefore, we explore the connection between different approaches to probabilistic modelling and separability properties in latent space. We discuss their applicability to anomaly detection and evaluation criteria.</p>
      </abstract>
      <kwd-group>
        <kwd>explainable AI</kwd>
        <kwd>variational autoencoder</kwd>
        <kwd>counterfactual explanation</kwd>
        <kwd>anomaly detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Generative neural networks, as a special form of deep learning, have changed data-driven applications in many fields, such as computer vision, robotics, and natural language processing. Especially in the medical and financial industries, ML-based approaches for decision support have to meet high transparency requirements. Explainability methods are constructed to help users of ML methods interpret the results of black-box models. In this PhD project we focus on counterfactual explanations as a local explainability approach and investigate how this concept can be used to answer the user's question: “What should I have done differently to change the outcome of the model prediction?” We investigate how counterfactual explanations can be integrated into the VAE method to generate exogenous counterfactuals in the context of anomaly detection.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        The VAE was introduced by [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. It is based on a directed causal model that uses Bayesian statistics to learn the parameters of the distribution of a latent variable. It can be used to detect anomalies in an unsupervised setting by training the model on normal data and using the reconstruction error as anomaly score [2]. The concept of counterfactual explanations was formulated as a mathematical optimization problem for ML models by [3]. There is no consensus in the scientific literature on the taxonomy of explainability. Based on [4], we will interpret counterfactuals as local instance explanations in a post hoc approach and as a way of improving the interpretability of an ML model in an ante hoc approach. The VAE as a post hoc approach is visualized in figure 1 by the green lines, and as an ante hoc approach by the pink line.
      </p>
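      <p>To illustrate the reconstruction-error anomaly score mentioned above, the following is a minimal sketch that uses a linear autoencoder (PCA) as a stand-in for a trained VAE; the function names and the rank-k linear model are illustrative assumptions, not part of [2]:</p>

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a rank-k linear autoencoder (PCA) on 'normal' data X,
    as a simple stand-in for a VAE trained on normal samples."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k].T          # tied encoder/decoder weights (top-k principal directions)
    return mu, W

def anomaly_score(x, mu, W):
    """Reconstruction error of a single instance x; large values flag anomalies."""
    z = (x - mu) @ W      # encode into the k-dimensional latent space
    x_hat = mu + z @ W.T  # decode back into feature space
    return float(np.sum((x - x_hat) ** 2))
```

Instances lying close to the learned normal manifold reconstruct well and score low; instances off the manifold reconstruct poorly and score high, which is the thresholding logic used for unsupervised anomaly detection.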
      <p>[Figure 1: Explainability pipeline relating decision maker cognition, knowledge and biases, the learning algorithm, model h, prediction p, explainer r, explanation e, interpretation i, and the training and test data.]</p>
      <p>An explanation is called interpretable when it describes the internals of a system in a way that is understandable to humans. The success of this goal is tied to the cognition, knowledge, and biases of a decision maker. For an explanation to be interpretable, it must give descriptions that are simple enough for a decision maker to understand, using a vocabulary that is meaningful to them. Note that we will not measure the interpretability of a model, but rather how well the construction goals of the counterfactual are met by the approach used.</p>
      <p>An overview of current approaches for calculating counterfactual explanations is given in [5]. The approaches can be categorized into model-agnostic and model-specific approaches. We will look at concepts specific to the VAE.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Research goals</title>
      <p>This project aims to achieve the following goals:
1. get an overview of counterfactual construction objectives (CE-objectives) and categorize how they can be integrated into a VAE training procedure,
2. develop metrics to evaluate how well the CE-objectives can be achieved,
3. develop the best way to integrate the CE-objectives into the VAE-objective specifically for anomaly detection, and
4. apply our method(s) to real world anomaly detection use cases from the healthcare and financial sector.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Methods</title>
      <sec id="sec-4-1">
        <title>4.1. Counterfactual Explanations</title>
        <p>Counterfactual explanations (CE) have a long history in philosophy, but [3] first mathematically formulated the counterfactual explanation as the solution of an optimization problem for machine learning.</p>
        <p>Definition 4.1 (Counterfactual explanation). Given a classifier b that outputs the decision y = b(x) for an instance x, a counterfactual explanation consists of an instance x′ such that the decision for b on x′ is different from y, i.e., b(x′) ≠ y, and such that the difference between x and x′ is minimal.</p>
        <p>Since the first formulation, the requirements on this concept have evolved to meet practical considerations. In [5] these CE-objectives are formulated as:
1. Validity: A counterfactual x′ should actually change the classification outcome.
2. Proximity: Given a distance function d in the domain of X, the distance between x and x′ should be as small as possible.
3. Minimality: There should not be any other valid counterfactual example x″ such that the number of different attribute value pairs between x and x′ is higher than the number of different attribute value pairs between x and x″.
4. Plausibility: The counterfactual x′ should not be labeled as an outlier with respect to the instances in X.
5. Diversity: Let C = {x′₁, . . . , x′ₖ} be a set of k (valid) CE for the instance x. The set C should be formed by diverse CE, i.e., while every CE x′ ∈ C should be minimal and close to x, the difference among all the CE in C should be maximized.
6. Actionability: A CE x′ is actionable if all the differences between x and x′ refer only to actionable (mutable) features. This requirement links the concept of counterfactual explanation to the concept of algorithmic recourse.
7. Causality: Let G be a directed acyclic graph (DAG) where every node models a feature and there is a directed edge from A to B if A contributes in causing B. The DAG G describes the known causalities among the features. Thus, given a DAG G, a counterfactual x′ respects the causalities in G iff for every attribute value pair (a, v) changed in x′ such that the node a in G has at least one incoming or outgoing edge, the value v maintains any known causal relation between a and the values v₁, . . . , vₘ of the features a₁, . . . , aₘ identified by the nodes connected with a in G.</p>
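        <p>The validity and proximity objectives combine directly into the optimization problem of [3]. The following minimal sketch assumes a simple logistic classifier; the loss weight lam and the plain gradient descent loop are illustrative choices, not the original method's solver:</p>

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def counterfactual(x, w, b, target=1.0, lam=10.0, lr=0.1, steps=2000):
    """Gradient-descent counterfactual for a logistic classifier
    f(x) = sigmoid(w.x + b), minimizing
        lam * (f(x') - target)^2 + ||x' - x||^2
    i.e. a validity term weighted against a proximity term."""
    x_cf = x.copy()
    for _ in range(steps):
        p = sigmoid(w @ x_cf + b)
        # chain rule through the squared validity loss, plus proximity gradient
        grad = lam * 2 * (p - target) * p * (1 - p) * w + 2 * (x_cf - x)
        x_cf -= lr * grad
    return x_cf
```

Increasing lam trades proximity for validity: the candidate is pushed further across the decision boundary at the cost of a larger distance to the original instance.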
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Variational autoencoder</title>
        <p>
          The VAE was introduced in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] as follows. Consider a dataset X = {x⁽ⁱ⁾}ᴺᵢ₌₁ consisting of N i.i.d. samples of some continuous or discrete variable x. We assume the data is generated by some random process involving an unobserved continuous random variable z. The process consists of two steps:
• a value z⁽ⁱ⁾ is generated from some prior distribution p_θ*(z),
• a value x⁽ⁱ⁾ is generated from some conditional distribution p_θ*(x|z).
        </p>
        <p>We assume the prior p_θ*(z) and likelihood p_θ*(x|z) come from parametric families of distributions p_θ(z) and p_θ(x|z), and that their probability density functions (PDFs) are differentiable almost everywhere w.r.t. both θ and z. The true parameters θ* as well as the values of the latent variables z⁽ⁱ⁾ are unknown. Since the integral of the marginal likelihood p_θ(x) = ∫ p_θ(z) p_θ(x|z) dz is intractable, the true posterior density p_θ(z|x) = p_θ(x|z) p_θ(z)/p_θ(x) is intractable. Let us introduce a surrogate q_φ(z|x) ≈ p_θ(z|x) to approximate the intractable true posterior. We refer to q_φ(z|x) as the probabilistic encoder and to p_θ(x|z) as the probabilistic decoder.</p>
        <p>The VAE-objective aims to maximize the so-called evidence lower bound (ELBO)
ℒ(θ, φ; x) = −D_KL(q_φ(z|x) || p_θ(z)) + E_{z∼q_φ(z|x)}[log p_θ(x|z)],   (1)
where D_KL is the Kullback-Leibler divergence (KLD).</p>
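        <p>For a Gaussian encoder q_φ(z|x) = 𝒩(μ, diag(σ²)) and standard normal prior, the first term of the ELBO in equation (1) has a closed form, and the expectation is estimated via the reparameterization trick. A minimal numpy sketch under these assumptions:</p>

```python
import numpy as np

def gaussian_kld(mu, logvar):
    """Closed-form D_KL( N(mu, diag(sigma^2)) || N(0, I) ), the first ELBO term
    for a Gaussian encoder with a standard normal prior."""
    return -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1)

def reparameterize(mu, logvar, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I),
    so sampling stays differentiable w.r.t. mu and logvar."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps
```

The second ELBO term, E[log p_θ(x|z)], is then estimated by decoding such a sample z and evaluating the reconstruction log-likelihood.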
        <p>
          By using neural networks for the encoder and the decoder together with gradient-based optimization (e.g. Adam), [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] leverage the power of neural network optimization for auto-encoding variational Bayes.
        </p>
        <p>[Figure: VAE architecture with input layer, parameter layer (μ, σ), reparameterization z = μ + σ ⊙ ε with ε ∼ 𝒩(0, I), hidden layers, and output layer.]</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Integration of CE-objectives into the VAE training</title>
        <p>In the current literature we can see modifications to the VAE architecture to produce CE. There are modifications used to increase the interpretability of the network architecture, for example the use of invertible mappings from feature space to latent space (see [6] or [7]), or encoding a hierarchical latent sequence z⁽ⁱ⁾ to capture more complex causal structures in latent space (see [8]). There are modifications to the VAE-objective to include CE-objectives. These can be included by regularization (see [7]) or by conditioning the prior distribution to control the CE candidate generation (see [9]). If the model is not explicitly trained to generate counterfactuals, the training objective aims to separate the different classes, and some perturbation (see [10]), projection (see [6]), interpolation (see [8]) or sampling technique (see [11]) is used to generate CEs. Due to reference space limitations, the overview of current literature is not comprehensive, but all current methods can be categorised into one of the following three categories:
• Include the CE-objectives in the VAE-objective and train the model to produce a CE.
• Train the VAE to separate the classes. Use a perturbation or projection technique to produce valid candidate(s). Select the candidate optimal under the CE-objectives.
• Train different models on partitioned data. Use an interpolation or sampling technique to produce valid candidates. Select the candidate optimal under the CE-objectives.
Since the methods are designed for different data types, use data-specific model modifications and try to meet different CE-objectives, a comprehensive comparison using the literature so far is not possible. None of the methods is designed or evaluated for anomaly detection use cases with rare anomaly data.</p>
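        <p>The second and third categories share a common pattern: search the latent space for a candidate whose decoded instance flips the prediction. A minimal sketch of this pattern, with encode, decode and classify as hypothetical stand-ins for trained components (not taken from any one of the cited methods):</p>

```python
import numpy as np

def latent_counterfactual(x, encode, decode, classify, z_target, steps=50):
    """Interpolate the latent code of x toward a target-class latent point
    z_target and return the first decoded candidate whose predicted class
    flips (validity); later CE-objectives would rank such candidates."""
    z = encode(x)
    y0 = classify(x)
    for t in np.linspace(0.0, 1.0, steps):
        cand = decode((1 - t) * z + t * z_target)  # move along the latent path
        if classify(cand) != y0:
            return cand  # first valid CE candidate along the path
    return None
```

Because the path stays in the learned latent space, decoded candidates tend to remain on the data manifold, which serves the plausibility objective.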
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Preliminary results</title>
      <p>After reviewing and grouping current approaches to counterfactual generation with VAE, we develop ideas on how to integrate the CE-objectives specifically for the anomaly detection use case. Currently we are investigating the complementary supervised VAE approach introduced in [12]. We want to investigate its ability to separate anomalies from normal data and to use it to generate CE candidates. The concept in [12] is to train the VAE with a normal prior and a loss function with the so-called standard KLD on a dataset of normal data. For training on seen anomaly data, a complementary prior and a corresponding KLD are chosen. The complementary set VAE approach follows the assumptions that anomalies are regarded as the complementary set of the normal set, and that the normal region and the anomalous region are mutually exclusive and collectively exhaustive. With p(z) defined as the PDF of the latent representation of normal data, [12] construct p̄(z) as the PDF of the representation of anomalous data. It is constructed to satisfy the relationship
p̄(z) = (1/A′) (max_{z′} p(z′) − p(z)),   (2)
where A′ is a norming constant such that p̄(·) is a PDF. This construction satisfies the property of the complementary set, but A′ is infinite because the mass of max_{z′} p(z′) − p(z) diverges. To ensure that p̄(z) is a PDF, we multiply by a bounding density g(z; 0, s²) that is wide enough in each dimension. The density function is then
p̄(z) = (1/A) g(z; 0, s²) (max_{z′} p(z′) − p(z)) =: (1/A) p*(z),   (3)
where A = ∫ p*(z) dz is a finite normalizing constant.</p>
      <p>Using this as a prior, [12] expand the conventional unsupervised VAE into a supervised one to distinguish anomalies in the latent space. We choose the standard Gaussian distribution 𝒩(z; 0, 1) as the prior for the representation of normal samples and construct the one-dimensional PDF p̄(z) of the representation of anomalous samples using the bounding Gaussian density g(z; 0, s²) = 𝒩(z; 0, s²), whose parameter s² determines the width of the distribution. The multi-dimensional version is derived as a product over the dimensions of the one-dimensional version. With
max_{z′} 𝒩(z′; 0, 1) = 1/√(2π)   (4)
the normalizing constant evaluates to
A = ∫ 𝒩(z; 0, s²) (1/√(2π) − 𝒩(z; 0, 1)) dz = (1/√(2π)) (1 − 1/√(s² + 1)).   (5)</p>
      <p>For a Gaussian encoder output q(z|x; φ) = 𝒩(z; μ, σ²), the complementary KLD can be approximately calculated as
D_KL(q(z|x; φ) || p̄(z)) ≃ (1/√(σ² + 1)) exp(−μ²/(2(σ² + 1))) + (μ² + σ²)/(2s²) − (1/2) log(s² + 1) − log σ + log s + log(√(s² + 1) − 1) − 1/2.   (6)</p>
      <p>[Figure: (a) Standard KLD, (b) complementary KLD for s = 5.]</p>
      <p>This VAE is trained alternately on a batch of normal data with the standard prior and KLD and a batch of anomalous data with the complementary prior and complementary KLD, with both models sharing parameters. We want to investigate this training setup because it can also be used to detect unseen anomalies. We have implemented the training procedure and want to test it on synthetic data with given distributions to validate this training setup.</p>
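      <p>The construction of the complementary prior can be checked numerically in the one-dimensional standard Gaussian case. The following sketch implements p̄(z) with the closed-form normalizing constant A; integrating it numerically should give mass close to one (the helper names are ours, not from [12]):</p>

```python
import numpy as np

SQRT_2PI = np.sqrt(2 * np.pi)

def normal_pdf(z, var):
    """PDF of a zero-mean Gaussian with variance var."""
    return np.exp(-z**2 / (2 * var)) / (SQRT_2PI * np.sqrt(var))

def complementary_pdf(z, s2):
    """Complementary prior for a standard-normal latent prior:
    p_bar(z) = g(z; 0, s^2) * (max p - p(z)) / A,
    with the closed-form A = (1/sqrt(2*pi)) * (1 - 1/sqrt(s^2 + 1))."""
    p_max = 1.0 / SQRT_2PI                                  # max of N(z; 0, 1)
    A = (1.0 / SQRT_2PI) * (1.0 - 1.0 / np.sqrt(s2 + 1.0))  # normalizing constant
    return normal_pdf(z, s2) * (p_max - normal_pdf(z, 1.0)) / A
```

The resulting density is zero at the origin (where the normal prior peaks) and places its mass in a ring around it, which is exactly the complementary-set behaviour the training relies on.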
    </sec>
    <sec id="sec-6">
      <title>6. Next research steps and expected final contribution</title>
      <p>We aim to give an overview of and discuss different evaluation metrics for current approaches and for our model extensions. In the next research steps we evaluate the training procedure for the complementary set VAE, produce and evaluate CE candidates, and develop and evaluate different approaches to integrate CE-objectives into the VAE-objective. Specifically, we aim to investigate modifications in conditioning the prior distribution and the causal model of the VAE, investigate modifications in optimization, e.g. multi-criteria optimization, investigate different training settings, and develop and discuss an evaluation scheme for synthetically generated data and for real world use case data.</p>
      <p>[2] J. An, S. Cho, Variational autoencoder based anomaly detection using reconstruction probability, 2015. URL: http://dm.snu.ac.kr/static/docs/tr/snudm-tr-2015-03.pdf.</p>
      <p>[3] S. Wachter, B. Mittelstadt, C. Russell, Counterfactual explanations without opening the black box: Automated decisions and the GDPR, SSRN Electronic Journal (2017). doi:10.2139/ssrn.3063289.</p>
      <p>[4] N. Burkart, M. F. Huber, A survey on the explainability of supervised machine learning, Journal of Artificial Intelligence Research 70 (2021) 245–317. URL: https://www.jair.org/index.php/jair/article/view/12228. doi:10.1613/jair.1.12228.</p>
      <p>[5] R. Guidotti, Counterfactual explanations and how to find them: literature review and benchmarking, Data Mining and Knowledge Discovery (2022) 1–55. doi:10.1007/s10618-022-00831-6.</p>
      <p>[6] W. Zhang, B. Barr, J. Paisley, An interpretable deep classifier for counterfactual generation, in: D. Magazzeni, S. Kumar, R. Savani, R. Xu, C. Ventre, B. Horvath, R. Hu, T. Balch, F. Toni, S. T. Kumar (Eds.), Proceedings of the 3rd ACM International Conference on AI in Finance (ICAIF’22), Association for Computing Machinery, New York, NY, 2022, pp. 36–43. doi:10.1145/3533271.3561722.</p>
      <p>[7] Deep structural causal models for tractable counterfactual inference, 2020.</p>
      <p>[8] B. Barr, M. R. Harrington, S. Sharpe, C. B. Bruss, Counterfactual explanations via latent space projection and interpolation, 2021. URL: https://arxiv.org/abs/2112.00890. arXiv:2112.00890.</p>
      <p>[9] M. Pawelczyk, K. Broelemann, G. Kasneci, Learning model-agnostic counterfactual explanations for tabular data, in: Proceedings of The Web Conference 2020, WWW ’20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 3126–3132. URL: https://doi.org/10.1145/3366423.3380087. doi:10.1145/3366423.3380087.</p>
      <p>[10] R. Balasubramanian, S. Sharpe, B. Barr, J. Wittenbach, C. B. Bruss, Latent-CF: A simple baseline for reverse counterfactual explanations, 2021. URL: https://arxiv.org/abs/2012.09301. arXiv:2012.09301.</p>
      <p>[11] X. Xiang, A. Lenskiy, Realistic counterfactual explanations with learned relations, 2022. URL: https://arxiv.org/abs/2202.07356. arXiv:2202.07356.</p>
      <p>[12] Y. Kawachi, Y. Koizumi, N. Harada, Complementary set variational autoencoder for supervised anomaly detection, in: 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 2366–2370.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          , Auto-encoding variational bayes,
          <year>2022</year>
          . URL: https://arxiv.org/abs/1312.6114. arXiv:1312.6114.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>