=Paper=
{{Paper
|id=Vol-3087/paper_32
|storemode=property
|title=Quantifying the Importance of Latent Features in Neural Networks
|pdfUrl=https://ceur-ws.org/Vol-3087/paper_32.pdf
|volume=Vol-3087
|authors=Amany Alshareef,Nicolas Berthier,Sven Schewe,Xiaowei Huang
|dblpUrl=https://dblp.org/rec/conf/aaai/AlshareefBSH22
}}
==Quantifying the Importance of Latent Features in Neural Networks==
<pdf width="1500px">https://ceur-ws.org/Vol-3087/paper_32.pdf</pdf>
<pre>
             Quantifying the Importance of Latent Features in Neural Networks

                          Amany Alshareef, Nicolas Berthier, Sven Schewe, Xiaowei Huang
                                                  Department of Computer Science
                                                    University of Liverpool, UK
                          {amany.alshareef, nicolas.berthier, sven.schewe, xiaowei.huang}@liverpool.ac.uk


                             Abstract                                    gence (XAI) (Miller 2019), semantic-level robustness (Hamdi and
                                                                         Ghanem 2020; Xu et al. 2021), and exhibition of internal working
  The susceptibility of deep learning models to adversarial              mechanism through test cases (Huang et al. 2021).
  examples raises serious concerns over their application in                We build on the work of Berthier et al. (2021), which elicited
  safety-critical contexts. In particular, the level of understand-      semantic assumptions by advancing an approach that relies on a
  ing of the underlying decision processes often lies far below          Bayesian Network (BN) abstraction to examine whether latent fea-
  what can reasonably be accepted for standard safety assur-             tures are adequately exercised by a set of inputs. The Bayesian
  ance. In this work, we provide insights into the high-level            view of statistics treats the latent parameters as random variables
  representations learned by neural network models. We specif-           and seeks to learn a distribution of these parameters conditional on
  ically investigate how the distribution of features in their la-       what is observed in the training data.
  tent space changes in the presence of distortions. To achieve
  this, we first abstract a given neural network model into a            Contribution. Our key contribution is to propose a method that
  Bayesian Network, where each random variable represents                estimates the importance of a neural network’s latent features by
  the value of a hidden feature. We then estimate the importance         analysing an associated Bayesian network’s sensitivity to distribu-
  of each feature by analysing the sensitivity of the abstrac-           tional shifts. This allows us to define semantic testing metrics and
  tion to targeted perturbations. An importance value indicates          to identify distributional shifts in the feature space through the ef-
  the role of the corresponding feature in underlying decision           fect they have on the random variables in BNs.
  process. Our empirical results suggest that obtained feature               This provides us with a separation of concern: we can study the
  importance measures provide valuable insights for validating           effect of a distributional shift in the latent feature space, which is
  and explaining neural network decisions.                               typically low-dimensional, independent of potential shifts in the
                                                                         input distributions. This provides insight on the semantic mecha-
  Keywords— Neural network latent representation, Bayesian net-          nisms of decision making in the DNN as well as information for
work, Feature importance, Sensitivity analysis.                          testing the sensitivity of features to distributional shifts.
                                                                             A weighted scoring model is commonly applied in statistics
                      1     Introduction                                 when certain selected criteria are assigned more importance than
When neural networks are used in critical applications, the reli-        others. The feature importance (FI) describes how much each
ability of their decision making becomes a major concern. Vari-          feature influences the classifier’s decision, and thus indicates the
ous techniques have been developed to verify, falsify, enhance, or       importance of the feature for the classification. This is to be con-
explain the neural networks, see (Huang et al. 2020) for a recent        tracted with the existing notion of feature importance in explana-
survey. In this paper, we focus on gaining insight on the decision       tion models, which assigns the importance value to the features
mechanisms based on the main features of the network.                    that belong to the input space, e.g., age, sex, education. Instead,
   Deep neuronal networks (DNNs) learn their decision rule               we investigate the learning models’ latent feature space and exam-
through training on a large dataset by gradually optimising parame-      ine how much their deep representation relies on a specific hidden
ters until they achieve the required accuracy. Therefore, they do not    feature to change their prediction.
have a specific control-flow structure, which makes it difficult to          We seek to evaluate the learning models’ semantic robustness
precisely define suitable test criteria. Neuron activation (Pei et al.   by developing a weight-based test metric that utilises the Bayesian
2017) and other structural coverage techniques, such as MC/DC            Network model from Berthier et al. (2021). However, instead of di-
(Sun et al. 2019), that are defined based on the syntactic model         rectly using the extracted hidden features to measure some test cov-
components have proven to be less effective in validating the safety     erage metric, we first compute a weight value, wi , for the i-th latent
behaviour of the intelligent systems (Sun et al. 2018a). This paper      feature, by analysing the BN’s sensitivity to a controlled noise ap-
analyses the internal representation of a DNN built from the train-      plied to this feature. In this paper, we develop several analyses that
ing dataset, together with the training data itself, toward defining a   rely on the BN abstraction to estimate the relative impacts and sen-
testing approach that uses semantic aspects.                             sitivity of the latent features. For example, we measure the relative
   This is a contribution to a recent trend to exploring the internal    impact one feature has on another for all feature pairs by estimat-
logic of the learning model, such as eXplainable Artificial Intelli-     ing how a controlled noise impacts the BN’s probability distribu-
                                                                         tions. As an alternative approach, we also estimate the sensitiv-
Copyright © 2022, for this paper by its authors. Use permitted un-       ity to a given latent feature by comparing the probability distribu-
der Creative Commons License Attribution 4.0 International (CC           tions of training samples before and after the feature has been per-
BY 4.0).                                                                 turbed. Figure 1 outlines the proposed feature sensitivity analysis
       Figure 1: Illustration of the proposed BN analysis technique to compute the sensitivity of extracted latent features.


approach.                                                                    for the classification of regular inputs in the standard setting. They
    This allows us to monitor the behaviour of a DNN via its as-             further argue that ML models are highly vulnerable to adversarial
sociated BN. The structure of this BN is built based on a param-             examples due to the presence of these useful non-robust features.
eterisable abstraction scheme that defines a series of DNN layers            This was further emphasised by Madaan, Shin, and Hwang (2020),
to consider (conv2d, dense and dense 1 in the Figure), a feature             who showed that the factor causing the adversarial vulnerability is
extraction technique to identify a given number of latent features           the distortion in the latent feature space. Going beyond these works
for each one of these layers (2 in the example), and a discretisation        which justify the needs of considering features (instead of pixels or
strategy that determines the granularity at which values of latent           neurons), we study the causality relation between features by con-
features are aggregated into distinct intervals of indistinguishable         structing a formal model – Bayesian network.
values. Combined with the feed-forward nature of the DNNs we
                                                                             Bayesian Networks (BNs) and Neural Networks (NNs).
consider, this scheme allows us to derive the structure of a BN, as
                                                                             Current trends towards using Bayesian modelling to solve chal-
shown in Figure 1.
                                                                             lenging issues of neural networks have seen a growing recogni-
    In addition, this scheme provides us with a discretisation
                                                                             tion of the vital links between them. Daxberger et al. (2021) de-
function Discr] , that transforms a set of inputs X into a low-
                                                                             veloped a framework for scaling Bayesian inference to NNs to be
dimensional, discretised version FX . In the Figure, the vector of in-
                                                                             able to quantify the uncertainty in NN predictions. Furthermore,
puts X is transformed into FX , which associates each input x ∈ X
                                                                             due to the scalability problem that arises when analysing neural
with six feature intervals, one for each latent feature represented by
                                                                             networks, Berthier et al. (2021) uses a statistical analysis of acti-
the BN. As the BN assigns a probability to an input sample that be-
                                                                             vations at network layers, and abstracts the behaviours of the DNN
longs to the distribution it represents, we can compare the probabil-
                                                                             using a Bayesian Network. They identify hidden features that have
ities of a sample under a given BN before and after a perturbation.
                                                                             been learned by hidden layers of the DNN and associate each fea-
To do so, we conduct the interior analysis on FX by calculating the
                                                                             ture with a node of the BN. Their Bayesian network approximation
probability of each sample under the BN B. After that, we iterate
                                                                             model is therefore defined based on high-level features, rather than
over all considered latent feature f , and shift the associated inter-
                                       0                                     on low-level neurons. These extracted features are minimal seman-
vals in FX to produce a modified FX        w.r.t. the feature f , and cal-
                                         f                                   tic components that can be analysed to understand the behaviour
culate its probability belonging to the BN B distribution. The term          of the feature space and the internal logic of the analysed DNN.
intervals shifting refers to a technique used to artificially simulate       This paper is, based on the BN in Berthier et al. (2021), to consider
a controlled distribution shift by randomly shifting intervals in the        different methods to quantify the importance of latent features.
selected feature space. To identify the impact of a perturbation, we
compute a distance between the original probability vector and the           Bayesian Networks Sensitivity Analysis. Sensitivity anal-
probability vector obtained from the perturbed features.                     ysis in Bayesian networks is concerned with understanding how
                                                                             a small change in local network parameters may affect the global
                                                                             conclusions drawn based on the network (Castillo, Gutiérrez, and
                      2     Related Works                                    Hadi 1997). The key aspect of performing a sensitivity analysis on
Robustness of Neural Networks latent features. Features                      a Bayesian network can be listed as follows:
play an essential role in the field of image processing and classi-            • Quantify the impact of different nodes on a target node;
fication. They are considered as the basic conceptual components
of the semantics of an image. Similar as this paper, there are re-             • Discover important features that have significant influence on
cent studies concentrated on the features space to study the hidden              the classifier decision;
semantic representation of intelligent models. Ilyas et al. (2019)             • Determine sensitive parts of the network that might cause net-
categorised useful features in the input-space into robust and non-              work vulnerability.
robust features. They demonstrate that adversarial perturbation can          However, we cannot directly apply the sensitivity analysis in the
arise from flipping non-robust features in the data that are useful          traditional sense with Bayesian Networks, where the sensitivity is
                                                                               Figure 3: Illustration of probability tables and feature inter-
                                                                               vals with a Bayesian Network node.


                                                                               node in the BN model is associated with either a marginal proba-
                                                                               bility table for hidden features of layer l1 , or a conditional proba-
                                                                               bility table for hidden or output layers. In Figure 2, the conditional
                                                                               probability table for the feature component λ3,1 is defined for each
                                                                               feature interval {(−∞, 3[, [3, +∞)} for layer l3 , w.r.t. each com-
Figure 2: Structure of the Bayesian Network abstraction af-                    bination of the parent feature intervals from previous layer l2 .
ter reducing each h1 , h2 , h3 into two features λi,1 ◦ hi and                 Example 2 Figure 3 illustrates an example node in a BN, which
λi,2 ◦ hi with two intervals each. The conditional probability                 corresponds to the second extracted feature from the first NN layer,
tables are shown for features λ3,1 and λ3,2 .                                  i.e., i = 1, j = 2. The set F1,2  ]                             ]1
                                                                                                                     contains two intervals, f1,2 and
                                                                                 ]2
                                                                               f1,2 , which partition the real line. The node is denoted as a random
                                                                               variable named λ1,2 ◦ h1 , which is associated with a probability
performed by changing the BN parameter from the input space and                table. The probability table is a marginal probability table because
observing how that influences the final decision. Instead, since our           the features on the first layer do not have parent features. The table
analysis targets the latent features in the low-dimensional space,             says that this feature has probability 0.7 to have a value smaller
we measure how sensitive are the BN probability distributions to               than 2 and probability 0.3 to have a value no less than 2.
changes in the values of hidden features. This process gives insight
into how such perturbations impact the inner Bayesian network dis-                The previous demonstrating examples were simple, which illus-
tribution and hence reflects the ground truth of the neural networks’          trated a BN built using each layer of the NN. In practice, we can
behaviour in the presence of adversarial inputs.                               select specific NN layers to be abstracted, as we did in our analysis
   Throughout this work, we use the Bayesian network probability               experiments.
distributions to study the neural networks latent features and anal-           Using Bayesian Network Abstractions. We can fit the set
yse their deep representations.                                                of probability tables in a BN abstraction B by using a training sam-
                                                                               ple X. This process first transforms X by means of the discreti-
                        3    Preliminaries                                     sation function to obtain a vector of elements from the discretised
                                                                               latent feature space FX = Discr] (X). It then updates the probabil-
At the core of our approach lies the proposed BN-based latent fea-             ity tables in B in such a way that the joint probability distribution
ture analysis algorithms. Before that, we introduce the scheme and             it represents fits the distribution of FX .
Bayesian Network that have been used by Berthier et al. (2021) as                  We can query the fitted BN B for the probabilities of the
an explainable abstraction of DNNs’ latent features.                           discretised input sample FX 0 . We denote this query opera-
   Let N be a trained deep neural network with sequential layers               tion Pr(FX 0 ∈ B), and may abuse
L = (l1 , . . . , lK ) and X a training dataset. As an abstract model                                                   this notation by defining
                                                                               Pr(X 0 ∈ B) = Pr Discr] (X 0 ) ∈ B .
of N and X, a Bayesian Network (BN) is a directed acyclic graph
B = (V, E, P ), where V are nodes, E are edges that indicate de-               Perturbation of Latent Features. Our developments in the
pendencies between features in successive layers, and P maps each              next Section rely on the application of a controlled change of a
node in V to a probability table representing the conditional prob-            feature (i, j) in an element Fx of the latent feature space. This
ability of the current feature over its parent features w.r.t. X.              operation simulates a distortion in single targeted component of
                                                                               the latent feature space by substituting its associated interval with
Example 1 Figure 2 gives a simple neural network of 2 hid-
                                                                               an adjacent one. (We assume that each latent feature component
den layers and its Bayesian Network abstraction. hi is a func-
                                                                               is partitioned into at least two intervals). When an interval has
tion that gives the neuron activations at layer li from any given
                                                                               two neighbours, we chose uniformly at random between them. We
input sample, and λi,j is a feature mapping from the set Λi =
                                                                               denote this operation with the function random shift(Fx , i, j),
{λi,j }j∈{1,...,|Λi |} . Each random variable λi,j ◦ hi in the BN rep-                                                          ]k                        ]k−1
                                                                               which replaces the feature interval fi,j             of Fx with either fi,j
resents the j-th component of the value obtained after mapping hi                    ]k+1
into the latent feature space. Since each function λi,j ◦ hi ranges            or fi,j . For instance, assuming two hidden feature components
over a continuous space, the respective feature components—which               extracted from activations at two layers of a NN, each compo-
are the codomains of the λi,j ’s—are discretised into a finite set of          nent being discretised into small-enough intervals, i.e., 10 in-
                                                                                                                ]4     ]7     ]1     ]9
feature intervals.                                                             tervals, random shift((f0,0         , f0,1 , f1,0 , f1,1 ), 1, 0) returns either
                                                                                  ]4    ]7     ]0     ]9         ]4     ]7     ]2     ]9
   Each node in BN abstractions represents an extracted feature,               (f0,0 , f0,1 , f1,0 , f1,1 ) or (f0,0 , f0,1 , f1,0 , f1,1 ).
               ]         ]1            ]m
and we let Fi,j   = {fi,j   , . . . , fi,j }, for the j-th extracted feature
from layer li , be a finite set of m intervals that partition the value               4     BN-based Latent Feature Analysis
range of the feature. We formally define a feature as a pair (i, j),           In this section, we develop several BN-based analysis approaches
where i indexes a layer li in L, and j identifies a component of the           we employ to gain insights on latent features. The first approach
extracted feature space for layer li , i.e., j ∈ {1, . . . , |Λi |}. Each      produces a pairwise comparison matrix that exhibits the relative
                                                                           Algorithm 1: BN-based Feature Sensitivity Analysis
                                                                           Input: Bayesian network B and associated feature mapping
                                                                           & discretisation function Discr] , training dataset X, distance
                                                                           metric dp .
                                                                           Output: Mapping associating a distance measure with each
                                                                           considered latent feature
                                                                            1: Compute the feature intervals w.r.t. X:
                                                                               FX = Discr] (X)
                                                                            2: Compute the reference probabilities of FX w.r.t. B:
                                                                               Pref = Pr(FX ∈ B)
                                                                            3: for each considered feature f = (i, j) do
                                                                            4:    Pf0 = hPr(random shift(Fx , i, j) ∈ B)iFx ∈FX
                                                                                                    
Figure 4: A toy example, with only three intervals for each                 5:    df = dp Pref , Pf0
feature, illustrates the conditional probability table for the
first extracted feature from layer dense 1 before and after                 6: end for
shifting intervals of feature (2, 0) in the dataset used to fit             7: return distances df , for all f
the BN.

                                                                              The feature sensitivity analysis is given in Algorithm 1. This
impact the latent features have on each other. Next, we leverage           procedure receives an input sample X, taken from the training
the BN to estimate the sensitivity of individual features to a con-        dataset, and first performs the feature projection and discretisation
trolled distribution shift. We then describe how the sensitivity anal-     step with Discr] to obtain the associated feature intervals FX . It
ysis technique can be applied to define feature importance based on        then calculates the probability of each element of FX w.r.t. the BN
a generic definition of weights on features. Finally, we formalise a       B; this gives the vector of reference probabilities Pref , that asso-
concrete definition of weights based on our BN-based feature sen-          ciates a probability with each set of abstracted latent features that
sitivity.                                                                  are elicited by each x in X. Then, for each extracted feature f ,
                                                                           a random perturbation is performed in FX via the random shift
4.1    Pairwise Comparison                                                 function introduced in the previous Section. This leads to a second
                                                                                                                                   0
This particular study is to assess the degree to which the extracted       vector, that holds the probabilities of the resulting FX  f
                                                                                                                                       w.r.t. the
features can affect each other by comparing the parallelised Con-          BN B. The given distance dp between these two probability vec-
ditional Probability Tables (CPTs) of a sample, under a BN, with           tors for the perturbed feature f is eventually computed.
the CPTs of the same sample after perturbing the features intervals           We chose to make the feature sensitivity analysis algorithm para-
of the BN. The pairwise comparison method is used to make a re-            metric in the distance metric p for the purposes of easing further
cursive comparison. It begins by extracting a set of inputs X from         experimental use of the FI measure. The considered distances are:
training data, and computing its feature intervals with Discr] . This
produces a sample FX of intervals w.r.t. X. To generate the proba-          • Lp ’s with different norms, typically 1, 2, or ∞;
bility tables, we fit the Bayesian network with FX , which gives the
                                                                            • JS is the Jensen-Shannon distance, that is a metric that mea-
clean reference probability tables CPTs(FX ). Figure 4-(a) shows
                                                                              sures the similarity between two probability distributions based
the CPT for feature (3, 0), which is the first extracted feature from
                                                                              on entropy computations;
the third NN layer, named dense 1 in the BN from Figure 1.
     To extract knowledge about a given feature’s independence and          • corr is the correlation distance;
robustness, we apply a controlled change to the targeted feature
f , by using the random shift operation to shift f ’s intervals             • cos is the cosine distance;
                       0
in FX to obtain FX       . We then re-fit the BN’s probabilities with       • MSE is the mean squared error;
   0                                                          0
FX , which gives the modified probability tables CPTs(FX        ) w.r.t.
the perturbed feature f , examplified in Figure 4-(b). To identify          • RMSE is the root mean squared error;
the impact, we use the mean squared error (MSE) between each
corresponding table in the reference CPTs(FX ) and generated                • MAE is the mean absolute error;
            0
CPTs(FX       ).                                                            • AF is a special purpose anti-fit divergence, which we define
     We illustrate and give an example of pairwise comparison in              based on the coefficient of determination R2 . R2 is a score
Section 5 below.                                                              that is typically used as a “goodness-of-fit” measure for regres-
                                                                              sion models, and we refer to it as scoreR2 . While the maximal
4.2    Feature Sensitivity Analysis                                           score is 1 (indicating a perfect fit), the score decreases with the
The core benefit of relying on a Bayesian Network is to have a                amount of variance in P that is not in Q and can take negative
model that exhibits the relevant theoretical aspects of Bayesian              values. With this we define dAF(P, Q) = 1 − scoreR2(P, Q).
analysis. To estimate the sensitivity of the abstraction scheme on            The rationale of using scoreR2 as a basis for measuring the di-
a given latent feature, we measure the impact of artificially per-            vergence is that we can view the probability vectors for per-
turbing the intervals representing the selected feature on the prob-          turbed features as output by a model. Divergence will be large
ability distribution represented by the BN. In this algorithm, the            when the effect of the perturbation is significant, and small
BN is already fitted using a training dataset, and the distribution it        when the model is not (very) sensitive to the perturbation.
represents does not change.
         distance       dL1      dL2           dL∞        dJS     dcorr    dcos              dMSE                 dRMSE            dMAE          dAF
 perturbed feature
             (1, 0)   150        0.726         0.009 56   0.224    0.142   0.114               0.000 000 879        0.000 937        0.000 249    0.278
             (1, 1)   340        1.18          0.009 89   0.353    0.448   0.361               0.000 002 32         0.001 52         0.000 567    0.735
             (2, 0)   325        1.09          0.009 46   0.365    0.332   0.267               0.000 001 98         0.001 41         0.000 541    0.625
             (2, 1)   360        1.16          0.0103     0.393    0.395   0.323               0.000 002 24         0.001 50         0.000 600    0.710
             (3, 0)   276        0.880         0.008 89   0.258    0.170   0.137               0.000 001 29         0.001 14         0.000 460    0.408
             (3, 1)   315        1.07          0.009 60   0.324    0.318   0.264               0.000 001 92         0.001 39         0.000 525    0.608


                                                     Table 1: Example distance measures.


4.3    Feature Importance                                                         on feature (3, 1) when perturbing feature (2, 1). Although this im-
We associate each extracted feature f with a weight wf based on                   pact is relatively small, we can (as expected) observe the depen-
the set of measured sensitivity distances as follows:                             dencies between latent feature values of the BN model. However,
                                                                                  the perturbations do not change the features’ probability for deeper
                                       edf                                        layers, e.g., features of Layer 3 are not affected by the perturbation
                           wf = P         df
                                                                    (1)           made on features of Layer 1, which is surprising.
                                    f ∈T e

where T is the set of considered latent features. The soft-max                    5.2    Sensitivity Analysis
weighting in Eq. (1) acts as a normalisation function, i.e., it en-               Let us now turn to our empirical assessment of the effectiveness of
sures the sum of the feature components’ weights equals one. The                  the BN sensitivity analysis method in examining the behaviour of
normalised importance weight for each feature is usually positively               the latent features under perturbation.
correlated with the respective probabilities distances.
                                                                                  Datasets and Experimental Setup. We have selected two
Example 3 Table 1 shows selected distance measures computed                       trained CNN models for our experiments: the first one targets the
based on one experiment detailed in the next Section. Assuming the                MNIST classification problem with 99.38% validation accuracy,
dcorr distance is chosen to determine feature importance,                         and the second model targets the CIFAR-10 dataset with 81.00%
   feature (1, 1) is assigned the largest weight at 0.192, followed               validation accuracy. The models are reasonably sized, with more
by feature (2, 1) at 0.182, etc.                                                  than 15 layers including blocks of convolutional and max-pooling
                                                                                  layers, followed by a series of dense layers. They have 312 000 and
The importance weight for an extracted latent feature of a DNN’s                  890 000 trainable parameters, respectively.
layer may reflect some relevant amount of information/variance/                      The Bayesian Network abstraction scheme accepts a wide range
that the abstracted DNN uses at the considered layer. The current                 of feature extraction techniques and discretisation strategies. To
abstraction scheme, however, does not relate latent features with                 explore their impact on our approach, we use a wide set of BN
the DNNs’ decisions. Still, perturbing a specific part of the latent              abstractions. We have selected two linear feature extraction tech-
space and observing the implicit changes of the learning models’                  niques: Principal Component Analysis (PCA) and Independent
distribution contributes to understanding their internal decisions.               Component Analysis (ICA), and one non-linear technique: radial
                                                                                  basis functions (RBF) kernel-PCA. We also decided to fix the num-
                       5      Experiments                                         ber of extracted features at three features per layer; this choice
In this Section, we first illustrate the results of pairwise comparison           of a relatively small number of hidden features enables us to use
of latent features, and then turn to an empirical evaluation of the               many intervals (5 or 10) for their discretisation while still obtain-
sensitivity of BN abstractions at detecting distribution shift induced            ing reasonably-sized probability tables. We applied both uniform-
by adversarial examples.                                                          and quantile-based discretisation strategies, with or without the ad-
                                                                                  dition of two left- and right-most intervals that do not contain any
                                                                                  element of the training sample. Finally, we considered three hidden
5.1    Illustration of Pairwise Comparison                                        layers to construct the BN abstractions: for the two models, the first
We first concentrate on the BN given in examples so far.                          two selected layers directly follow a block of convolutions, while
   We report in Table 2 a pairwise comparison matrix for this ex-                 the last is a dense ReLU layer that is situated few layers before the
ample, where we arrange the perturbed features in the first column                NN’s output layer. The layers chosen criteria is based on a belief
and compute their impact on each feature (i, j)’s probability ta-                 that the activation values at these layers capture relevant patterns
bles. The numbers reported in this matrix represent the change in                 w.r.t the NN decisions.
the probability values. For instance, our controlled perturbation of
feature (2, 0) intervals has an impact on features (3, 0) and (3, 1)              Example Distributions and Distances. We plot in Figure 5
values. More specifically, the MSE between the (3, 0) probability                 example distributions of probabilities in vectors obtained from a
tables for feature (3, 0), given in Figure 4 (a) before and (b) after             BN abstraction of the MNIST model. We have annotated each
perturbing feature (2, 0), is 0.0113.                                             one of these plots with various measures of distances between
                                                                                  the reference probabilities Pref that is generated using a sample
Discussion. Suppose we set the diagonal line to zeros since the                   from the training data set, and the respective six perturbed fea-
change is made from the feature itself. In that case, we can observe              tures probabilities Pf0 . The shown difference between these two
that the perturbations are not affecting the probability of features              probability distributions illustrates the internal change in the dis-
from the previous layer (parent features) or the same layer as ex-                tribution represented by the BN. For instance, when applying the
pected. On the other hand, random shifting only influenced the im-                random shift on the first feature that is extracted from the first se-
mediate features in the next layer. The largest difference occurred               lected layer i.e., perturbed feature (1,0), the calculated probability
                                              (1, 0)        (1, 1)      (2, 0)      (2, 1)        (3, 0)        (3, 1)
                      perturbed feature
                                 (1, 0)        0.003 01      0.          0.008 58     0.008 08     0.            0.
                                 (1, 1)        0.            0.002 57    0.007 83     0.008 65     0.            0.
                                 (2, 0)        0.            0.          0.0143       0.           0.0113        0.008 07
                                 (2, 1)        0.            0.          0.           0.0102       0.008 89      0.0114
                                 (3, 0)        0.            0.          0.           0.           0.0229        0.
                                 (3, 1)        0.            0.          0.           0.           0.            0.0161

Table 2: Example pairwise comparison matrix for six extracted features. Each cell describes the extent to which a feature (rows)
affects the others (columns).


Figure 5: Distributions of probabilities obtained from one example BN abstraction for each perturbed feature, for the MNIST
model. Each plot shows respective distance measures w.r.t. the probabilities obtained from the BN for the clean (unperturbed)
features (Pref ).


distribution P 0 (1, 0), coloured with blue, shows a change on prob-      cies) with sufficient precision to associate diverging probabilities
abilistic causal relation that implies the change on the probability      between “legitimate” inputs and adversarially perturbed ones. If
represented by the BN. Hence, we can determine the safety viola-          such is the case, we shall conclude that our abstraction scheme and
tion risk by comparing an input probability belonging to the BN           the associated BN are sufficiently precise to capture relevant de-
probability distribution.                                                 pendencies in latent feature values that may not be matched (or
                                                                          matched too well, depending on the sign of the actual difference in
Sensitivity to Adversarial Distribution Shift. We have                    probabilities) by some adversarial inputs.
carried out a set of experiments to assess whether the set of three          To carry out these experiments, we have selected the following
features extracted for each considered hidden layer allows us to          adversarial attacks:
capture relevant properties of the learnt representations. In partic-     fgsm is the Fast Gradient Sign Method of Goodfellow, Shlens, and
ular, we wanted to check whether the BN abstraction allows us to             Szegedy (2015);
detect the shift in the distribution of inputs that occurs when the
NN is subject to adversarial examples. In other words, we want            pgdlinf and pgdl2 are the Projected Gradient Descent approach of
to discover whether some distance measures indicate that the BN              Madry et al. (2017) with L∞ and L2 norm, respectively;
abstractions capture relevant latent features (and their dependen-        cwlinf and cwl2 are Carlini and Wagner (2017)’s attack with L∞
                                               p = L2 — fgsm          p = L2 — pgdlinf        p = L2 — pgdl2            p = L2 — cwlinf            p = L2 — cwl2          p = L2 — deepfool

                                                                                                                          uniform5
                            0.50
                                                                                                                          quantile5
                                                                                                                          uniform10
                            0.25
 dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                                                                                          quantile10

                            0.00
                                              p = cos — fgsm          p = cos — pgdlinf       p = cos — pgdl2           p = cos — cwlinf          p = cos — cwl2          p = cos — deepfool


                                        0.5


                                        0.0
                                              p = AF — fgsm           p = AF — pgdlinf        p = AF — pgdl2            p = AF — cwlinf           p = AF — cwl2           p = AF — deepfool


                                        1.0

                                        0.5

                                        0.0
                                              pca    ica   rbf kpca   pca    ica   rbf kpca   pca    ica   rbf kpca     pca     ica    rbf kpca   pca    ica   rbf kpca    pca    ica   rbf kpca

                                               p = L2 — fgsm          p = L2 — pgdlinf        p = L2 — pgdl2            p = L2 — cwlinf            p = L2 — cwl2          p = L2 — deepfool


                                         2
 dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                         0
                                              p = cos — fgsm          p = cos — pgdlinf       p = cos — pgdl2           p = cos — cwlinf          p = cos — cwl2          p = cos — deepfool
                                        1.0


                                        0.5


                                        0.0
                                              p = AF — fgsm           p = AF — pgdlinf        p = AF — pgdl2            p = AF — cwlinf           p = AF — cwl2           p = AF — deepfool


                                        10


                                         0
                                              pca    ica   rbf kpca   pca    ica   rbf kpca   pca    ica   rbf kpca     pca     ica    rbf kpca   pca    ica   rbf kpca    pca    ica   rbf kpca
                                              feature extraction      feature extraction      feature extraction        feature extraction        feature extraction       feature extraction

Figure 6: Selected distances (vertical axes) between probability vectors obtained for the validation dataset (Pr(Xtest ∈ B)) and
probability vectors (Pr(Xattack ∈ B)) obtained for datasets generated by selected adversarial attacks (shown on each column),
for a range of BN abstractions B. The top (resp. bottom) three rows show results for the MNIST (resp. CIFAR10) model. Hue
indicates the discretisation strategy and the number of intervals. The grey vertical lines show confidence intervals.


                                        and L2 norm, respectively, both targeting 0.1 confidence;                     Results and Discussion. Figure 6 shows our results for three
                                                                                                                      selected distances L2 , cos, and AF . We give more detailed results
deepfool is the DeepFool attack by Moosavi-Dezfooli, Fawzi, and                                                       in Appendix A. Each chart in the figure illustrates the calculated
   Frossard (2016).                                                                                                   distances with four colours according to the discretisation method
                                                                                                                      and the number of intervals in the vertical axis, using three sets
Attacks involving the L∞ norm target a maximum of ε = 0.1                                                             of feature extraction techniques (pca, ica, and rbf.kpca) in the hor-
perturbation in the input images, whereas pgdl2 targets a maximum                                                     izontal axis. The used distance metric and attack type are shown
perturbation ε = 10.                                                                                                  at each chart’s top. First of all, we can observe that some combi-
   For each attack, we have generated an adversarial dataset                                                          nations of abstractions and distance measures exhibit notable dif-
Xattack from the validation dataset Xtest for both the MNIST                                                          ferences between the validation dataset and the adversarial one for
and CIFAR10 models, where each dataset consists of 10 000                                                             some attacks. For instance, every distance shown allows us to mea-
inputs. Then, for each attack and BN abstraction B built and                                                          sure a shift in input distribution for every attack, except Carlini and
fit using 20 000 elements drawn from the respective training                                                          Wagner (2017)’s in some cases. Next, although the feature extrac-
datasets, we measured a set of distances p between the vec-                                                           tion technique does not have a noticeable impact on any measured
tors of probabilities Pr(Xtest ∈ B) and Pr(Xattack ∈ B), denoted                                                      distance, the discretisation strategy certainly plays a role in the abil-
dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B)).                                                                                 ity of the BN to model each abstracted latent feature and their de-
pendencies with sufficient precision. For example, in the first row      Utility of Bayesian Network. As suggested, BN can be seen
of the CIFAR-10 experiment (L2 distance), the distribution shift is      as an abstraction of the original DNN. It is therefore imperative
detected when using the uniform-based discretisation method with         to understand how this abstraction may help in either analysing or
five intervals (distance with blue color).                               enhancing the original DNN. In Berthier et al. (2021), test metrics
   Overall, the experimental results show that computing distances       are designed over the BN by extending the MC/DC metrics pro-
between two BN probability distributions, clean and perturbed by         posed by Sun et al. (2019). As the next step, it would be interesting
intervals-shift or adversarial attacks, can detect the distribution      to understand if test case generation methods (Sun et al. 2018b),
shift where it exists. We emphasise that, in the case of adversar-       in particular the one based on symbolic computation (Sun, Huang,
ial shift, this is measured based on the latent features only. Given     and Kroening 2018), can also be extended to work with BNs. More-
this empirically confirmed property, BN-based computation of fea-        over, it will be useful to see if the generated test cases can be more
ture importance appears to be one tool, which adds to the growing        nature and diverse when comparing with those generated directly
set of useful techniques for the detection of important features as      on DNNs, as done in (Huang et al. 2021).
well as of adversarial examples. What is more, it adds a semantic            In addition to testing, it would also be interesting to see if such
twist to this analysis and allows for explaining in which way the        abstraction may bring any benefit to e.g., verification (Huang et al.
changes in the features contribute to the distribution shift.            2017), interpretation of DNN training (Jin et al. 2020), explainable
                                                                         AI (Zhao et al. 2021c), and safety case (Zhao et al. 2020). For ex-
                                                                         ample, scalability is the key obstacle of DNN verification due to its
                       6     Discussions                                 complexity (Ruan, Huang, and Kwiatkowska 2018). Considering
In this section, we discuss a few aspects related to either the method   that BN is significantly smaller than the original DNN, it will be
we take or the potential application of the method.                      interesting to understand if BN can be used to alleviate the prob-
                                                                         lem without losing the provable guarantee. A potential difficulty
Hyper-parameters in BN Construction. The parametric                      may be whether and how the verification result on the BN can be
nature of the scheme advanced by Berthier et al. (2021) enables          transferred to the DNN.
the exploration of a wide range of DNN abstractions. For instance,           Similar as the above discussion for testing and verification, the
in our experiments, the sensitivity to adversarial distribution shift    potential for the BN to be used as an intermediate step for the reli-
is relied most on the linear dimensionality reduction techniques         ability assessment (Zhao et al. 2021a) and safety case (Zhao et al.
to extract latent features. We plan to conduct further experiments       2021b) is worthy of exploration. This may probably require a quan-
with more non-linear feature extraction techniques, like manifold        tification of the error, or the loss of information, when using BN as
learning (Lee 2000), to assess the properties of extracted features in   an abstraction of the DNN.
extended cases. The effect of more advanced discretisation strate-
gies can also be explored, for instance by relying on kernel density                            7    Conclusions
estimations to partition each latent feature component into inter-
vals that span across ranges of the real line that are either densely    In this study, we have advanced a novel technique that employs a
or non-densely exercised by the training sample.                         BN abstraction to investigate how to measure the importance of
                                                                         high level features when they are used by the neural network to
Hyper-parameters in Weight Quantification. There are a                   make classification decisions. In addition to the observed ability of
number of building blocks in the weight quantification method (Al-       detecting the distribution shifts before and after perturbation, this
gorithm 1), including e.g., the perturbation made to generate new        will open many doors for future exploration. For example, it will
CPTs, the random shifting function, and the distance metrics for         certainly be interesting to understand if the generated importance
probabilities (Pref ) and (Pf0 ). In this paper, we have explored sev-   values can support the explanation of the black-box learning model.
eral different options of the distance metrics for a comparison. It      It will also be useful if such importance values can be utilised to
would also be useful to study if and how the other hyper-parameters      improve the training process.
may affect the overall results.

Utility of Feature Weights. Quantifying the importance of
                                                                                                    References
the hidden features provides three advantages. First, visualising the    Bai, Y.; Zeng, Y.; Jiang, Y.; Xia, S.-T.; Ma, X.; and Wang, Y.
most important features provides insight into the model’s internal       2021. Improving Adversarial Robustness via Channel-wise Acti-
decisions by highlighting dominating regions in the feature space.       vation Suppressing. In International Conference on Learning Rep-
   Second, we can use the importance measurement to design high-         resentations.
level testing metrics that evaluate the robustness of the DNN. Some      Berthier, N.; Alshareef, A.; Sharp, J.; Schewe, S.; and Huang, X.
attempts have been made in Berthier et al. (2021), where no feature      2021. Abstraction and Symbolic Execution of Deep Neural Net-
weight is taken into consideration.                                      works with Bayesian Approximation of Hidden Features. arXiv
   Third, with FI as a defence, we can utilise the obtained impor-       preprint arXiv:2103.03704.
tance in the training process and force the DNN to adjust its param-     Carlini, N.; and Wagner, D. 2017. Towards Evaluating the Robust-
eters according to the features that are most relevant for the predic-   ness of Neural Networks. In 2017 IEEE Symposium on Security
tion. This direction is the most widely adopted strategy. For exam-      and Privacy (SP), 39–57. IEEE Computer Society.
ple, Zhang et al. (2021) propose a hierarchical feature alignment
method that computes the difference between clean and adversarial        Castillo, E.; Gutiérrez, J. M.; and Hadi, A. S. 1997. Sensitivity
feature representations and utilises it as a loss function when opti-    analysis in discrete Bayesian networks. IEEE Transactions on Sys-
mising network parameters, while Bai et al. (2021) suggest that          tems, Man, and Cybernetics-Part A: Systems and Humans, 27(4):
different channels of a DNN’s intermediate layers contribute dif-        412–423.
ferently to a specific class prediction and propose a Channel-wise       Daxberger, E.; Nalisnick, E.; Allingham, J. U.; Antorán, J.; and
Activation Suppressing training technique that learns the channel        Hernández-Lobato, J. M. 2021. Bayesian deep learning via subnet-
importance, and leverages them to suppress the channel activation        work inference. In International Conference on Machine Learning,
while training the network.                                              2510–2521. PMLR.
Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2015. Explaining and      Zhang, X.; Wang, J.; Wang, T.; Jiang, R.; Xu, J.; and Zhao, L. 2021.
Harnessing Adversarial Examples. arXiv:1412.6572.                        Robust feature learning for adversarial defense via hierarchical fea-
Hamdi, A.; and Ghanem, B. 2020. Towards analyzing semantic               ture alignment. Information Sciences, 560: 256–270.
robustness of deep neural networks. In European Conference on            Zhao, X.; Banks, A.; Sharp, J.; Robu, V.; Flynn, D.; Fisher, M.;
Computer Vision, 22–38. Springer.                                        and Huang, X. 2020. A Safety Framework for Critical Systems
Huang, W.; Sun, Y.; Zhao, X.; Sharp, J.; Ruan, W.; Meng, J.; and         Utilising Deep Neural Networks. In SafeComp2020, 244–259.
Huang, X. 2021. Coverage-Guided Testing for Recurrent Neural             Zhao, X.; Huang, W.; Banks, A.; Cox, V.; Flynn, D.; Schewe, S.;
Networks. IEEE Transactions on Reliability, 1–16.                        and Huang, X. 2021a. Assessing the Reliability of Deep Learning
Huang, X.; Kroening, D.; Ruan, W.; Sharp, J.; Sun, Y.; Thamo, E.;        Classifiers Through Robustness Evaluation and Operational Pro-
Wu, M.; and Yi, X. 2020. A survey of safety and trustworthiness of       files. In AISafety.
deep neural networks: Verification, testing, adversarial attack and      Zhao, X.; Huang, W.; Bharti, V.; Dong, Y.; Cox, V.; Banks,
defence, and interpretability. Computer Science Review, 37.              A.; Wang, S.; Schewe, S.; and Huang, X. 2021b. Reliabil-
Huang, X.; Kwiatkowska, M.; Wang, S.; and Wu, M. 2017. Safety            ity Assessment and Safety Arguments for Machine Learning
Verification of Deep Neural Networks. In International Conference        Components in Assuring Learning-Enabled Autonomous Systems.
on Computer Aided Verification, 3–29. Springer.                          arXiv:2112.00646.
Ilyas, A.; Santurkar, S.; Tsipras, D.; Engstrom, L.; Tran, B.; and       Zhao, X.; Huang, X.; Robu, V.; and Flynn, D. 2021c. BayLIME:
Madry, A. 2019. Adversarial Examples Are Not Bugs, They Are              Bayesian Local Interpretable Model-Agnostic Explanations. In
Features. In Wallach, H.; Larochelle, H.; Beygelzimer, A.; d'Alché-     UAI.
Buc, F.; Fox, E.; and Garnett, R., eds., Advances in Neural Infor-
mation Processing Systems, volume 32. Curran Associates, Inc.                   A     Detailed Results for Sensitivity to
Jin, G.; Yi, X.; Zhang, L.; Zhang, L.; Schewe, S.; and Huang, X.                       Adversarial Shift Experiments
2020. How does Weight Correlation Affect the Generalisation
Ability of Deep Neural Networks. Advances in Neural Informa-             We have plotted in Figure 6 some statistics for a subset of the dis-
tion Processing Systems.                                                 tances we have considered for comparing probability vectors. Fig-
                                                                         ures 7, 8, and 9 show the distances computed for the MNIST model,
Lee, J. 2000. A global geometric framework for non-linear dimen-         and Figures 10, 11, and 12 show the results for the CIFAR10 model.
sionality reduction. In Proceedings of the 8th European symposium        In these plots, hue still indicates the discretisation strategy. How-
on artificial neural networks, 2000, volume 1, 13–20.                    ever, we have discriminated between extended and non-extended
Madaan, D.; Shin, J.; and Hwang, S. J. 2020. Adversarial neu-            strategies: the prefix ‘-X’ denotes that latent features are discretised
ral pruning with latent vulnerability suppression. In International      in such a way that left- and right-most intervals do not contain any
Conference on Machine Learning, 6575–6585. PMLR.                         (projected) training sample.
Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A.
2017. Towards Deep Learning Models Resistant to Adversarial
Attacks. arXiv:1706.06083.
Miller, T. 2019. Explanation in artificial intelligence: Insights from
the social sciences. Artificial intelligence, 267: 1–38.
Moosavi-Dezfooli, S.-M.; Fawzi, A.; and Frossard, P. 2016. Deep-
fool: a simple and accurate method to fool deep neural networks. In
Proceedings of the IEEE conference on computer vision and pat-
tern recognition, 2574–2582.
Pei, K.; Cao, Y.; Yang, J.; and Jana, S. 2017. Deepxplore: Auto-
mated whitebox testing of deep learning systems. In proceedings
of the 26th Symposium on Operating Systems Principles, 1–18.
Ruan, W.; Huang, X.; and Kwiatkowska, M. 2018. Reachability
Analysis of Deep Neural Networks with Provable Guarantees. In
IJCAI, 2651–2659.
Sun, Y.; Huang, X.; and Kroening, D. 2018. Testing Deep Neural
Networks. CoRR, abs/1803.04792.
Sun, Y.; Huang, X.; Kroening, D.; Sharp, J.; Hill, M.; and Ash-
more, R. 2018a. Testing deep neural networks. arXiv preprint
arXiv:1803.04792.
Sun, Y.; Huang, X.; Kroening, D.; Sharp, J.; Hill, M.; and Ash-
more, R. 2019. Structural Test Coverage Criteria for Deep Neural
Networks. ACM Trans. Embed. Comput. Syst., 18(5s).
Sun, Y.; Wu, M.; Ruan, W.; Huang, X.; Kwiatkowska, M.; and
Kroening, D. 2018b. Concolic Testing for Deep Neural Networks.
In ASE, 109–119.
Xu, Q.; Tao, G.; Cheng, S.; and Zhang, X. 2021. Towards Feature
Space Adversarial Attack by Style Perturbation. In Proceedings of
the AAAI Conference on Artificial Intelligence, volume 35, 10523–
10531.
                                                                                                        measure p = L1 — feat. extr.: pca                             measure p = L1 — feat. extr.: ica                  measure p = L1 — feat. extr.: rbf kpca

                                                                                                                                         discretization
                                                     dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                                                            40                                             uniform5
                                                                                                                                           uniform5-X
                                                                                                                                           quantile5
                                                                                                                                           quantile5-X
                                                                                            30
                                                                                                                                           uniform10
                                                                                                                                           uniform10-X
                                                                                                                                           quantile10
                                                                                            20
                                                                                                                                           quantile10-X


                                                                                            10


                                                                                             0
                                                                                                        measure p = L2 — feat. extr.: pca                             measure p = L2 — feat. extr.: ica                  measure p = L2 — feat. extr.: rbf kpca
                                                                                       0.8
                                          dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                                                       0.7

                                                                                       0.6

                                                                                       0.5

                                                                                       0.4

                                                                                       0.3

                                                                                       0.2

                                                                                       0.1

                                                                                       0.0
                                                                                                    measure p = L∞ — feat. extr.: pca                                 measure p = L∞ — feat. extr.: ica                  measure p = L∞ — feat. extr.: rbf kpca
   dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                   0.035

                                                   0.030

                                                   0.025

                                                   0.020

                                                   0.015

                                                   0.010

                                                   0.005

                                                   0.000
                                                                                                 fgsm


                                                                                                                                                               fgsm


                                                                                                                                                                                                                         fgsm
                                                                                                                                            cwl2


                                                                                                                                                                                                       cwl2


                                                                                                                                                                                                                                                            cwl2
                                                                                                                       pgdl2


                                                                                                                                                                                     pgdl2


                                                                                                                                                                                                                                          pgdl2
                                                                                                                                                    deepfool


                                                                                                                                                                                                              deepfool


                                                                                                                                                                                                                                                                   deepfool
                                                                                                                                cwlinf


                                                                                                                                                                                              cwlinf


                                                                                                                                                                                                                                                   cwlinf
                                                                                                             pgdlinf


                                                                                                                                                                           pgdlinf


                                                                                                                                                                                                                                pgdlinf


                                                                                                                           attack                                                        attack                                               attack

Figure 7: Distances (vertical axes) between probability vectors obtained for the MNIST validation dataset (Pr(Xtest ∈ B))
and probability vectors (Pr(Xattack ∈ B)) obtained for datasets generated by selected adversarial attacks (attack, shown on the
horizontal axes), for a range of BN abstractions B. Every abstraction involves 3 layers for which 3 features have been extracted
using PCA (left-hand side column), ICA (middle), or radial basis functions (RBF) kernel-PCA (right). Plotted data aggregates
five independent runs, and shows confidence intervals.
                                                measure p = corr — feat. extr.: pca                               measure p = corr — feat. extr.: ica                measure p = corr — feat. extr.: rbf kpca

                                                                                     discretization
dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                       1.0
                                                                                       uniform5
                                                                                       uniform5-X
                                       0.8                                             quantile5
                                                                                       quantile5-X
                                                                                       uniform10
                                       0.6                                             uniform10-X
                                                                                       quantile10
                                                                                       quantile10-X
                                       0.4


                                       0.2


                                       0.0
                                                    measure p = cos — feat. extr.: pca                            measure p = cos — feat. extr.: ica                 measure p = cos — feat. extr.: rbf kpca
dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                       0.8


                                       0.6


                                       0.4


                                       0.2


                                       0.0
                                                    measure p = JS — feat. extr.: pca                             measure p = JS — feat. extr.: ica                  measure p = JS — feat. extr.: rbf kpca
dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                       0.7

                                       0.6

                                       0.5

                                       0.4

                                       0.3

                                       0.2

                                       0.1

                                       0.0
                                             fgsm


                                                                                                           fgsm


                                                                                                                                                                     fgsm
                                                                                        cwl2


                                                                                                                                                   cwl2


                                                                                                                                                                                                         cwl2
                                                                   pgdl2


                                                                                                                                 pgdl2


                                                                                                                                                                                       pgdl2
                                                                                                deepfool


                                                                                                                                                          deepfool


                                                                                                                                                                                                                deepfool
                                                                            cwlinf


                                                                                                                                          cwlinf


                                                                                                                                                                                                cwlinf
                                                         pgdlinf


                                                                                                                       pgdlinf


                                                                                                                                                                             pgdlinf


                                                                       attack                                                        attack                                                attack

                                                                                                                  Figure 8: See Figure 7.
                                                                                                               ×10
                                                                                                                  measure
                                                                                                                  −5      p = MSE — feat. extr.: pca                          measure p = MSE — feat. extr.: ica                      measure p = MSE — feat. extr.: rbf kpca
                                                                                                           6
                                                                                                                                                      discretization
                                                                    dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))                                                uniform5
                                                                                                           5                                            uniform5-X
                                                                                                                                                        quantile5
                                                                                                           4
                                                                                                                                                        quantile5-X
                                                                                                                                                        uniform10
                                                                                                                                                        uniform10-X
                                                                                                           3                                            quantile10
                                                                                                                                                        quantile10-X
                                                                                                           2


                                                                                                           1


                                                                                                           0
                                                                                                                measure p = RMSE — feat. extr.: pca                         measure p = RMSE — feat. extr.: ica                       measure p = RMSE — feat. extr.: rbf kpca
                                                0.008
dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                0.007

                                                0.006

                                                0.005

                                                0.004

                                                0.003

                                                0.002

                                                0.001

                                                0.000
                                                                                                                 measure p = MAE — feat. extr.: pca                          measure p = MAE — feat. extr.: ica                       measure p = MAE — feat. extr.: rbf kpca
dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                0.004


                                                0.003


                                                0.002


                                                0.001


                                                0.000
                                                                                                                     measure p = AF — feat. extr.: pca                             measure p = AF — feat. extr.: ica                   measure p = AF — feat. extr.: rbf kpca
                                                                                    1.4
                                       dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                                                    1.2


                                                                                    1.0


                                                                                    0.8


                                                                                    0.6


                                                                                    0.4


                                                                                    0.2


                                                                                    0.0
                                                                                                                 fgsm


                                                                                                                                                                            fgsm


                                                                                                                                                                                                                                        fgsm
                                                                                                                                                         cwl2


                                                                                                                                                                                                                    cwl2


                                                                                                                                                                                                                                                                           cwl2
                                                                                                                                    pgdl2


                                                                                                                                                                                                  pgdl2


                                                                                                                                                                                                                                                         pgdl2
                                                                                                                                                                 deepfool


                                                                                                                                                                                                                           deepfool


                                                                                                                                                                                                                                                                                  deepfool
                                                                                                                                             cwlinf


                                                                                                                                                                                                           cwlinf


                                                                                                                                                                                                                                                                  cwlinf
                                                                                                                          pgdlinf


                                                                                                                                                                                        pgdlinf


                                                                                                                                                                                                                                               pgdlinf


                                                                                                                                        attack                                                        attack                                                 attack

                                                                                                                                                                            Figure 9: See Figure 7.
                                                                         measure p = L1 — feat. extr.: pca                             measure p = L1 — feat. extr.: ica                  measure p = L1 — feat. extr.: rbf kpca

                                                                                                          discretization
              dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                                                                            uniform5
                                                            250
                                                                                                            uniform5-X
                                                                                                            quantile5
                                                            200                                             quantile5-X
                                                                                                            uniform10-X
                                                                                                            uniform10
                                                            150
                                                                                                            quantile10
                                                                                                            quantile10-X
                                                            100


                                                             50


                                                              0
                                                                         measure p = L2 — feat. extr.: pca                             measure p = L2 — feat. extr.: ica                  measure p = L2 — feat. extr.: rbf kpca
                                                            4.0
                     dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                            3.5

                                                            3.0

                                                            2.5

                                                            2.0

                                                            1.5

                                                            1.0

                                                            0.5

                                                            0.0
                                                                     measure p = L∞ — feat. extr.: pca                                 measure p = L∞ — feat. extr.: ica                  measure p = L∞ — feat. extr.: rbf kpca
   dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                0.07

                                                0.06

                                                0.05

                                                0.04

                                                0.03

                                                0.02

                                                0.01

                                                0.00
                                                                  fgsm


                                                                                                                                fgsm


                                                                                                                                                                                          fgsm
                                                                                                             cwl2


                                                                                                                                                                        cwl2


                                                                                                                                                                                                                             cwl2
                                                                                        pgdl2


                                                                                                                                                      pgdl2


                                                                                                                                                                                                           pgdl2
                                                                                                                     deepfool


                                                                                                                                                                               deepfool


                                                                                                                                                                                                                                    deepfool
                                                                                                 cwlinf


                                                                                                                                                               cwlinf


                                                                                                                                                                                                                    cwlinf
                                                                              pgdlinf


                                                                                                                                            pgdlinf


                                                                                                                                                                                                 pgdlinf


                                                                                            attack                                                        attack                                               attack

Figure 10: Distances (vertical axes) between probabilities obtained for the CIFAR10 validation dataset and datasets generated
by selected adversarial attacks (horizontal axes). See Figure 7 for further details.
                                                measure p = corr — feat. extr.: pca                            measure p = corr — feat. extr.: ica                measure p = corr — feat. extr.: rbf kpca
dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                       1.0

                                                                   discretization
                                       0.8                           uniform5
                                                                     uniform5-X
                                                                     quantile5
                                       0.6
                                                                     quantile5-X
                                                                     uniform10-X
                                       0.4                           uniform10
                                                                     quantile10
                                                                     quantile10-X
                                       0.2


                                       0.0
                                                    measure p = cos — feat. extr.: pca                         measure p = cos — feat. extr.: ica                 measure p = cos — feat. extr.: rbf kpca
                                       1.0
dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                       0.8


                                       0.6


                                       0.4


                                       0.2


                                       0.0
                                                    measure p = JS — feat. extr.: pca                          measure p = JS — feat. extr.: ica                  measure p = JS — feat. extr.: rbf kpca
dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                       0.8

                                       0.7

                                       0.6

                                       0.5

                                       0.4

                                       0.3

                                       0.2

                                       0.1

                                       0.0
                                             fgsm


                                                                                                        fgsm


                                                                                                                                                                  fgsm
                                                                                      cwl2


                                                                                                                                                cwl2


                                                                                                                                                                                                      cwl2
                                                                    pgdl2


                                                                                                                              pgdl2


                                                                                                                                                                                    pgdl2
                                                                                             deepfool


                                                                                                                                                       deepfool


                                                                                                                                                                                                             deepfool
                                                                             cwlinf


                                                                                                                                       cwlinf


                                                                                                                                                                                             cwlinf
                                                         pgdlinf


                                                                                                                    pgdlinf


                                                                                                                                                                          pgdlinf


                                                                        attack                                                    attack                                                attack

                                                                                                        Figure 11: See Figure 10.
                                                                                                                          measure p = MSE — feat. extr.: pca                          measure p = MSE — feat. extr.: ica                   measure p = MSE — feat. extr.: rbf kpca

dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))            0.00175                                                                                                      discretization
                                                                                                                                                               uniform5
                                                0.00150                                                                                                        uniform5-X
                                                                                                                                                               quantile5
                                                0.00125                                                                                                        quantile5-X
                                                                                                                                                               uniform10-X
                                                0.00100                                                                                                        uniform10
                                                                                                                                                               quantile10
                                                0.00075                                                                                                        quantile10-X


                                                0.00050

                                                0.00025

                                                0.00000
                                                                                                                          measure p = RMSE — feat. extr.: pca                       measure p = RMSE — feat. extr.: ica                    measure p = RMSE — feat. extr.: rbf kpca
                                                                              0.040
                                       dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                                              0.035

                                                                              0.030

                                                                              0.025

                                                                              0.020

                                                                              0.015

                                                                              0.010

                                                                              0.005

                                                                              0.000
                                                                                                                          measure p = MAE — feat. extr.: pca                          measure p = MAE — feat. extr.: ica                   measure p = MAE — feat. extr.: rbf kpca
                                       dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                                              0.025


                                                                              0.020


                                                                              0.015


                                                                              0.010


                                                                              0.005


                                                                              0.000
                                                                                                                            measure p = AF — feat. extr.: pca                           measure p = AF — feat. extr.: ica                   measure p = AF — feat. extr.: rbf kpca
                                                                              dp (Pr(Xtest ∈ B) , Pr(Xattack ∈ B))


                                                                                                                     20


                                                                                                                     15


                                                                                                                     10


                                                                                                                      5


                                                                                                                      0
                                                                                                                          fgsm


                                                                                                                                                                                     fgsm


                                                                                                                                                                                                                                             fgsm
                                                                                                                                                                cwl2


                                                                                                                                                                                                                         cwl2


                                                                                                                                                                                                                                                                                cwl2
                                                                                                                                           pgdl2


                                                                                                                                                                                                       pgdl2


                                                                                                                                                                                                                                                              pgdl2
                                                                                                                                                                        deepfool


                                                                                                                                                                                                                                deepfool


                                                                                                                                                                                                                                                                                       deepfool
                                                                                                                                                    cwlinf


                                                                                                                                                                                                                cwlinf


                                                                                                                                                                                                                                                                       cwlinf
                                                                                                                                 pgdlinf


                                                                                                                                                                                             pgdlinf


                                                                                                                                                                                                                                                    pgdlinf


                                                                                                                                               attack                                                      attack                                                 attack

                                                                                                                                                                                   Figure 12: See Figure 10.

</pre>