=Paper=
{{Paper
|id=Vol-3087/paper_32
|storemode=property
|title=Quantifying the Importance of Latent Features in Neural Networks
|pdfUrl=https://ceur-ws.org/Vol-3087/paper_32.pdf
|volume=Vol-3087
|authors=Amany Alshareef,Nicolas Berthier,Sven Schewe,Xiaowei Huang
|dblpUrl=https://dblp.org/rec/conf/aaai/AlshareefBSH22
}}
==Quantifying the Importance of Latent Features in Neural Networks==
Quantifying the Importance of Latent Features in Neural Networks

Amany Alshareef, Nicolas Berthier, Sven Schewe, Xiaowei Huang
Department of Computer Science, University of Liverpool, UK
{amany.alshareef, nicolas.berthier, sven.schewe, xiaowei.huang}@liverpool.ac.uk

Abstract

The susceptibility of deep learning models to adversarial examples raises serious concerns over their application in safety-critical contexts. In particular, the level of understanding of the underlying decision processes often lies far below what can reasonably be accepted for standard safety assurance. In this work, we provide insights into the high-level representations learned by neural network models. We specifically investigate how the distribution of features in their latent space changes in the presence of distortions. To achieve this, we first abstract a given neural network model into a Bayesian Network, where each random variable represents the value of a hidden feature. We then estimate the importance of each feature by analysing the sensitivity of the abstraction to targeted perturbations. An importance value indicates the role of the corresponding feature in the underlying decision process. Our empirical results suggest that the obtained feature importance measures provide valuable insights for validating and explaining neural network decisions.

Keywords: Neural network latent representation, Bayesian network, Feature importance, Sensitivity analysis.

Copyright © 2022, for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

When neural networks are used in critical applications, the reliability of their decision making becomes a major concern. Various techniques have been developed to verify, falsify, enhance, or explain neural networks; see (Huang et al. 2020) for a recent survey.

In this paper, we focus on gaining insight into the decision mechanisms based on the main features of the network. Deep neural networks (DNNs) learn their decision rule through training on a large dataset by gradually optimising parameters until they achieve the required accuracy. Therefore, they do not have a specific control-flow structure, which makes it difficult to precisely define suitable test criteria. Neuron activation (Pei et al. 2017) and other structural coverage techniques, such as MC/DC (Sun et al. 2019), that are defined based on syntactic model components have proven to be less effective in validating the safety behaviour of intelligent systems (Sun et al. 2018a). This paper analyses the internal representation of a DNN built from the training dataset, together with the training data itself, toward defining a testing approach that uses semantic aspects.
This is a contribution to a recent trend of exploring the internal logic of learning models, such as eXplainable Artificial Intelligence (XAI) (Miller 2019), semantic-level robustness (Hamdi and Ghanem 2020; Xu et al. 2021), and the exhibition of internal working mechanisms through test cases (Huang et al. 2021).

We build on the work of Berthier et al. (2021), which elicited semantic assumptions by advancing an approach that relies on a Bayesian Network (BN) abstraction to examine whether latent features are adequately exercised by a set of inputs. The Bayesian view of statistics treats the latent parameters as random variables and seeks to learn a distribution of these parameters conditional on what is observed in the training data.

Contribution. Our key contribution is to propose a method that estimates the importance of a neural network's latent features by analysing an associated Bayesian network's sensitivity to distributional shifts. This allows us to define semantic testing metrics and to identify distributional shifts in the feature space through the effect they have on the random variables in BNs.

This provides us with a separation of concerns: we can study the effect of a distributional shift in the latent feature space, which is typically low-dimensional, independently of potential shifts in the input distributions. This provides insight into the semantic mechanisms of decision making in the DNN as well as information for testing the sensitivity of features to distributional shifts.

A weighted scoring model is commonly applied in statistics when certain selected criteria are assigned more importance than others. The feature importance (FI) describes how much each feature influences the classifier's decision, and thus indicates the importance of the feature for the classification. This is to be contrasted with the existing notion of feature importance in explanation models, which assigns the importance value to features that belong to the input space, e.g., age, sex, education. Instead, we investigate the learning models' latent feature space and examine how much their deep representation relies on a specific hidden feature to change their prediction.

We seek to evaluate the learning models' semantic robustness by developing a weight-based test metric that utilises the Bayesian Network model from Berthier et al. (2021). However, instead of directly using the extracted hidden features to measure some test coverage metric, we first compute a weight value, w_i, for the i-th latent feature, by analysing the BN's sensitivity to a controlled noise applied to this feature. In this paper, we develop several analyses that rely on the BN abstraction to estimate the relative impacts and sensitivity of the latent features. For example, we measure the relative impact one feature has on another, for all feature pairs, by estimating how a controlled noise impacts the BN's probability distributions. As an alternative approach, we also estimate the sensitivity to a given latent feature by comparing the probability distributions of training samples before and after the feature has been perturbed. Figure 1 outlines the proposed feature sensitivity analysis approach.

Figure 1: Illustration of the proposed BN analysis technique to compute the sensitivity of extracted latent features.

This allows us to monitor the behaviour of a DNN via its associated BN. The structure of this BN is built based on a parameterisable abstraction scheme that defines a series of DNN layers to consider (conv2d, dense and dense_1 in the Figure), a feature extraction technique to identify a given number of latent features for each one of these layers (2 in the example), and a discretisation strategy that determines the granularity at which values of latent features are aggregated into distinct intervals of indistinguishable values. Combined with the feed-forward nature of the DNNs we consider, this scheme allows us to derive the structure of a BN, as shown in Figure 1.

In addition, this scheme provides us with a discretisation function Discr♯ that transforms a set of inputs X into a low-dimensional, discretised version F_X. In the Figure, the vector of inputs X is transformed into F_X, which associates each input x ∈ X with six feature intervals, one for each latent feature represented by the BN. As the BN assigns a probability to an input sample that belongs to the distribution it represents, we can compare the probabilities of a sample under a given BN before and after a perturbation. To do so, we conduct the interior analysis on F_X by calculating the probability of each sample under the BN B. After that, we iterate over all considered latent features f, shift the associated intervals in F_X to produce a modified F′_X^f w.r.t. the feature f, and calculate its probability of belonging to the distribution of the BN B. The term interval shifting refers to a technique used to artificially simulate a controlled distribution shift by randomly shifting intervals in the selected feature space. To identify the impact of a perturbation, we compute a distance between the original probability vector and the probability vector obtained from the perturbed features.
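For concreteness, the sketch below shows one possible realisation of such a Discr♯-style pipeline: per-layer activations are projected onto a small number of latent features (here with PCA) and each feature component is binned into uniform- or quantile-based intervals. The function names (fit_discretiser, discretise) and the specific choices are ours for illustration only; this is not the authors' implementation.

```python
# Hedged sketch of a Discr#-style discretisation pipeline (illustrative, not the paper's tool):
# project per-layer activations onto a few latent features, then bin each feature component.
import numpy as np
from sklearn.decomposition import PCA

def fit_discretiser(activations: np.ndarray, n_features: int = 2, n_bins: int = 5,
                    strategy: str = "quantile"):
    """Fit a feature extractor and per-feature interval edges on training activations."""
    extractor = PCA(n_components=n_features).fit(activations)
    projected = extractor.transform(activations)
    if strategy == "quantile":
        edges = [np.quantile(projected[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
                 for j in range(n_features)]
    else:  # uniform-width intervals over the observed range
        edges = [np.linspace(projected[:, j].min(), projected[:, j].max(), n_bins + 1)[1:-1]
                 for j in range(n_features)]
    return extractor, edges

def discretise(extractor, edges, activations: np.ndarray) -> np.ndarray:
    """Map activations to interval indices, one column per latent feature (i.e., F_X)."""
    projected = extractor.transform(activations)
    return np.stack([np.digitize(projected[:, j], edges[j]) for j in range(len(edges))], axis=1)
```

With such a function, F_X is simply the matrix of interval indices obtained for a sample, and the probability tables of the BN can then be fitted by counting interval co-occurrences across the successive abstracted layers.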
2 Related Works

Robustness of Neural Networks' latent features. Features play an essential role in the field of image processing and classification. They are considered the basic conceptual components of the semantics of an image. Similar to this paper, recent studies have concentrated on the feature space to study the hidden semantic representation of intelligent models. Ilyas et al. (2019) categorised useful features in the input space into robust and non-robust features. They demonstrate that adversarial perturbations can arise from flipping non-robust features in the data that are useful for the classification of regular inputs in the standard setting. They further argue that ML models are highly vulnerable to adversarial examples due to the presence of these useful non-robust features. This was further emphasised by Madaan, Shin, and Hwang (2020), who showed that the factor causing the adversarial vulnerability is the distortion in the latent feature space. Going beyond these works, which justify the need to consider features (instead of pixels or neurons), we study the causality relation between features by constructing a formal model, a Bayesian network.

Bayesian Networks (BNs) and Neural Networks (NNs). Current trends towards using Bayesian modelling to solve challenging issues of neural networks have seen a growing recognition of the vital links between them. Daxberger et al. (2021) developed a framework for scaling Bayesian inference to NNs to be able to quantify the uncertainty in NN predictions. Furthermore, due to the scalability problem that arises when analysing neural networks, Berthier et al. (2021) use a statistical analysis of activations at network layers, and abstract the behaviours of the DNN using a Bayesian Network. They identify hidden features that have been learned by hidden layers of the DNN and associate each feature with a node of the BN. Their Bayesian network approximation model is therefore defined based on high-level features, rather than on low-level neurons. These extracted features are minimal semantic components that can be analysed to understand the behaviour of the feature space and the internal logic of the analysed DNN. This paper builds on the BN of Berthier et al. (2021) to consider different methods to quantify the importance of latent features.

Bayesian Networks Sensitivity Analysis. Sensitivity analysis in Bayesian networks is concerned with understanding how a small change in local network parameters may affect the global conclusions drawn based on the network (Castillo, Gutiérrez, and Hadi 1997). The key aspects of performing a sensitivity analysis on a Bayesian network can be listed as follows:
• Quantify the impact of different nodes on a target node;
• Discover important features that have a significant influence on the classifier decision;
• Determine sensitive parts of the network that might cause network vulnerability.

However, we cannot directly apply sensitivity analysis in the traditional sense to our Bayesian Networks, where the sensitivity is assessed by changing a BN parameter from the input space and observing how that influences the final decision. Instead, since our analysis targets the latent features in the low-dimensional space, we measure how sensitive the BN probability distributions are to changes in the values of hidden features. This process gives insight into how such perturbations impact the inner Bayesian network distribution and hence reflects the ground truth of the neural networks' behaviour in the presence of adversarial inputs.

Throughout this work, we use the Bayesian network probability distributions to study the neural networks' latent features and analyse their deep representations.
3 Preliminaries

At the core of our approach lie the proposed BN-based latent feature analysis algorithms. Before presenting them, we introduce the abstraction scheme and Bayesian Network that have been used by Berthier et al. (2021) as an explainable abstraction of DNNs' latent features.

Let N be a trained deep neural network with sequential layers L = (l_1, ..., l_K) and X a training dataset. As an abstract model of N and X, a Bayesian Network (BN) is a directed acyclic graph B = (V, E, P), where V are nodes, E are edges that indicate dependencies between features in successive layers, and P maps each node in V to a probability table representing the conditional probability of the current feature over its parent features w.r.t. X.

Example 1. Figure 2 gives a simple neural network of 2 hidden layers and its Bayesian Network abstraction. h_i is a function that gives the neuron activations at layer l_i from any given input sample, and λ_{i,j} is a feature mapping from the set Λ_i = {λ_{i,j}}_{j ∈ {1,...,|Λ_i|}}. Each random variable λ_{i,j} ∘ h_i in the BN represents the j-th component of the value obtained after mapping h_i into the latent feature space. Since each function λ_{i,j} ∘ h_i ranges over a continuous space, the respective feature components (the codomains of the λ_{i,j}'s) are discretised into a finite set of feature intervals.

Figure 2: Structure of the Bayesian Network abstraction after reducing each h_1, h_2, h_3 into two features λ_{i,1} ∘ h_i and λ_{i,2} ∘ h_i with two intervals each. The conditional probability tables are shown for features λ_{3,1} and λ_{3,2}.

Each node in BN abstractions represents an extracted feature, and we let F♯_{i,j} = {f♯1_{i,j}, ..., f♯m_{i,j}}, for the j-th extracted feature from layer l_i, be a finite set of m intervals that partition the value range of the feature. We formally define a feature as a pair (i, j), where i indexes a layer l_i in L, and j identifies a component of the extracted feature space for layer l_i, i.e., j ∈ {1, ..., |Λ_i|}. Each node in the BN model is associated with either a marginal probability table, for hidden features of layer l_1, or a conditional probability table, for hidden or output layers. In Figure 2, the conditional probability table for the feature component λ_{3,1} is defined for each feature interval {(−∞, 3), [3, +∞)} of layer l_3, w.r.t. each combination of the parent feature intervals from the previous layer l_2.

Figure 3: Illustration of probability tables and feature intervals with a Bayesian Network node.

Example 2. Figure 3 illustrates an example node in a BN, which corresponds to the second extracted feature from the first NN layer, i.e., i = 1, j = 2. The set F♯_{1,2} contains two intervals, f♯1_{1,2} and f♯2_{1,2}, which partition the real line. The node is denoted as a random variable named λ_{1,2} ∘ h_1, which is associated with a probability table. The probability table is a marginal probability table, because the features on the first layer do not have parent features. The table says that this feature has probability 0.7 of having a value smaller than 2 and probability 0.3 of having a value no less than 2.
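To make these probability tables concrete, the following minimal sketch encodes a marginal table and a conditional table for a two-node fragment of such a BN and evaluates the chain-rule probability of one discretised sample. The marginal values follow Example 2; the conditional entries are illustrative placeholders, since the paper does not list them in full, and a complete abstraction would condition each feature on all features extracted from the previous abstracted layer.

```python
# Two-node fragment of a BN abstraction: a first-layer feature with a marginal table
# (values from Example 2) and a deeper feature with a conditional table (placeholder values).
marginal_first_layer = {
    "(-inf, 2)": 0.7,   # P(lambda_{1,2} o h_1 < 2)
    "[2, +inf)": 0.3,   # P(lambda_{1,2} o h_1 >= 2)
}

conditional_deeper = {  # P(child interval | parent interval), made-up for illustration
    "(-inf, 2)": {"(-inf, 3)": 0.6, "[3, +inf)": 0.4},
    "[2, +inf)": {"(-inf, 3)": 0.2, "[3, +inf)": 0.8},
}

def joint_probability(parent_interval: str, child_interval: str) -> float:
    """Chain-rule probability of one discretised sample over this two-node fragment."""
    return marginal_first_layer[parent_interval] * conditional_deeper[parent_interval][child_interval]

print(joint_probability("(-inf, 2)", "[3, +inf)"))  # 0.7 * 0.4 = 0.28
```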
The previous examples were deliberately simple, and illustrated a BN built using every layer of the NN. In practice, we can select specific NN layers to be abstracted, as we do in our analysis experiments.

Using Bayesian Network Abstractions. We can fit the set of probability tables in a BN abstraction B by using a training sample X. This process first transforms X by means of the discretisation function to obtain a vector of elements from the discretised latent feature space F_X = Discr♯(X). It then updates the probability tables in B in such a way that the joint probability distribution it represents fits the distribution of F_X. We can then query the fitted BN B for the probabilities of a discretised input sample F_X′. We denote this query operation Pr(F_X′ ∈ B), and may abuse this notation by defining Pr(X′ ∈ B) = Pr(Discr♯(X′) ∈ B).

Perturbation of Latent Features. Our developments in the next Section rely on the application of a controlled change of a feature (i, j) in an element F_x of the latent feature space. This operation simulates a distortion in a single targeted component of the latent feature space by substituting its associated interval with an adjacent one. (We assume that each latent feature component is partitioned into at least two intervals.) When an interval has two neighbours, we choose uniformly at random between them. We denote this operation with the function random_shift(F_x, i, j), which replaces the feature interval f♯k_{i,j} of F_x with either f♯(k−1)_{i,j} or f♯(k+1)_{i,j}. For instance, assuming two hidden feature components extracted from activations at two layers of a NN, each component being discretised into small enough intervals, e.g., 10 intervals, random_shift((f♯4_{0,0}, f♯7_{0,1}, f♯1_{1,0}, f♯9_{1,1}), 1, 0) returns either (f♯4_{0,0}, f♯7_{0,1}, f♯0_{1,0}, f♯9_{1,1}) or (f♯4_{0,0}, f♯7_{0,1}, f♯2_{1,0}, f♯9_{1,1}).
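A minimal sketch of random_shift follows, assuming feature intervals are encoded as integer indices 0..m−1 per component keyed by (layer, component) pairs; this data layout is our own choice for illustration, not the authors' one.

```python
# Sketch of the random_shift operation described above.
import random

def random_shift(f_x: dict, i: int, j: int, n_intervals: dict) -> dict:
    """Return a copy of the discretised sample f_x in which the interval index of
    feature (i, j) is replaced by an adjacent one, chosen uniformly at random
    when two neighbours exist. f_x maps (layer, component) to an interval index."""
    shifted = dict(f_x)
    k, m = f_x[(i, j)], n_intervals[(i, j)]
    candidates = [c for c in (k - 1, k + 1) if 0 <= c < m]  # drop out-of-range neighbours
    shifted[(i, j)] = random.choice(candidates)
    return shifted

# Reproducing the paper's example with 10 intervals per component:
f_x = {(0, 0): 4, (0, 1): 7, (1, 0): 1, (1, 1): 9}
m = {key: 10 for key in f_x}
print(random_shift(f_x, 1, 0, m))  # interval of feature (1, 0) becomes either 0 or 2
```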
4 BN-based Latent Feature Analysis

In this section, we develop the several BN-based analysis approaches we employ to gain insights into latent features. The first approach produces a pairwise comparison matrix that exhibits the relative impact the latent features have on each other. Next, we leverage the BN to estimate the sensitivity of individual features to a controlled distribution shift. We then describe how the sensitivity analysis technique can be applied to define feature importance based on a generic definition of weights on features. Finally, we formalise a concrete definition of weights based on our BN-based feature sensitivity.

Algorithm 1: BN-based Feature Sensitivity Analysis
Input: Bayesian network B and associated feature mapping & discretisation function Discr♯; training dataset X; distance metric d_p.
Output: Mapping associating a distance measure with each considered latent feature.
1: Compute the feature intervals w.r.t. X: F_X = Discr♯(X)
2: Compute the reference probabilities of F_X w.r.t. B: P_ref = Pr(F_X ∈ B)
3: for each considered feature f = (i, j) do
4:     P′_f = ⟨Pr(random_shift(F_x, i, j) ∈ B)⟩_{F_x ∈ F_X}
5:     d_f = d_p(P_ref, P′_f)
6: end for
7: return distances d_f, for all f

4.1 Pairwise Comparison

This particular study assesses the degree to which the extracted features can affect each other, by comparing the Conditional Probability Tables (CPTs) obtained for a sample under a BN with the CPTs obtained for the same sample after perturbing the feature intervals of the BN. The pairwise comparison method repeats this comparison for every perturbed feature. It begins by extracting a set of inputs X from the training data and computing its feature intervals with Discr♯. This produces a sample F_X of intervals w.r.t. X. To generate the probability tables, we fit the Bayesian network with F_X, which gives the clean reference probability tables CPTs(F_X). Figure 4-(a) shows the CPT for feature (3, 0), which is the first extracted feature from the third NN layer, named dense_1 in the BN from Figure 1.

Figure 4: A toy example, with only three intervals for each feature, illustrates the conditional probability table for the first extracted feature from layer dense_1 before and after shifting intervals of feature (2, 0) in the dataset used to fit the BN.

To extract knowledge about a given feature's independence and robustness, we apply a controlled change to the targeted feature f, by using the random_shift operation to shift f's intervals in F_X to obtain F′_X. We then re-fit the BN's probabilities with F′_X, which gives the modified probability tables CPTs(F′_X) w.r.t. the perturbed feature f, exemplified in Figure 4-(b). To identify the impact, we use the mean squared error (MSE) between each corresponding table in the reference CPTs(F_X) and the generated CPTs(F′_X). We illustrate and give an example of pairwise comparison in Section 5 below.
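The sketch below illustrates the mechanics of this pairwise comparison under a simplifying assumption: the BN is reduced to a chain in which each feature's only parent is the preceding feature (the actual abstraction conditions each feature on all features of the previous abstracted layer), and the interval shift is applied column-wise with clipping at the boundary intervals. It is an illustration of the CPT re-fitting and MSE comparison, not the authors' code.

```python
import numpy as np

def fit_cpts(F_X: np.ndarray, n_bins: int):
    """Estimate one probability table per feature by counting: a marginal table for the
    first feature and P(child | parent) tables for the following ones (chain assumption).
    F_X is an integer array of shape (n_samples, n_features) of interval indices."""
    cpts = [np.bincount(F_X[:, 0], minlength=n_bins) / len(F_X)]
    for t in range(1, F_X.shape[1]):
        table = np.zeros((n_bins, n_bins))
        for parent, child in zip(F_X[:, t - 1], F_X[:, t]):
            table[parent, child] += 1
        table /= np.maximum(table.sum(axis=1, keepdims=True), 1)  # rows: P(child | parent)
        cpts.append(table)
    return cpts

def pairwise_impact(F_X: np.ndarray, n_bins: int, rng=np.random.default_rng(0)):
    """MSE between each feature's CPT before and after shifting one feature's intervals
    (rows: perturbed feature, columns: observed feature), in the spirit of Table 2."""
    reference = fit_cpts(F_X, n_bins)
    n_feat = F_X.shape[1]
    impact = np.zeros((n_feat, n_feat))
    for perturbed in range(n_feat):
        shifted = F_X.copy()
        moves = rng.choice([-1, 1], size=len(F_X))
        shifted[:, perturbed] = np.clip(shifted[:, perturbed] + moves, 0, n_bins - 1)
        refitted = fit_cpts(shifted, n_bins)
        for t in range(n_feat):
            impact[perturbed, t] = np.mean((reference[t] - refitted[t]) ** 2)
    return impact
```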
4.2 Feature Sensitivity Analysis

The core benefit of relying on a Bayesian Network is to have a model that exhibits the relevant theoretical aspects of Bayesian analysis. To estimate the sensitivity of the abstraction scheme to a given latent feature, we measure the impact that artificially perturbing the intervals representing the selected feature has on the probability distribution represented by the BN. In this algorithm, the BN is already fitted using a training dataset, and the distribution it represents does not change.

The feature sensitivity analysis is given in Algorithm 1. This procedure receives an input sample X, taken from the training dataset, and first performs the feature projection and discretisation step with Discr♯ to obtain the associated feature intervals F_X. It then calculates the probability of each element of F_X w.r.t. the BN B; this gives the vector of reference probabilities P_ref, which associates a probability with each set of abstracted latent features elicited by each x in X. Then, for each extracted feature f, a random perturbation is performed in F_X via the random_shift function introduced in the previous Section. This leads to a second vector that holds the probabilities of the resulting F′_X^f w.r.t. the BN B. The given distance d_p between these two probability vectors for the perturbed feature f is eventually computed.

We chose to make the feature sensitivity analysis algorithm parametric in the distance metric p for the purpose of easing further experimental use of the FI measure. The considered distances are:
• L_p's with different norms, typically 1, 2, or ∞;
• JS, the Jensen-Shannon distance, a metric that measures the similarity between two probability distributions based on entropy computations;
• corr, the correlation distance;
• cos, the cosine distance;
• MSE, the mean squared error;
• RMSE, the root mean squared error;
• MAE, the mean absolute error;
• AF, a special-purpose anti-fit divergence, which we define based on the coefficient of determination R². R² is a score that is typically used as a "goodness-of-fit" measure for regression models, and we refer to it as scoreR2. While the maximal score is 1 (indicating a perfect fit), the score decreases with the amount of variance in P that is not in Q and can take negative values. With this we define dAF(P, Q) = 1 − scoreR2(P, Q). The rationale of using scoreR2 as a basis for measuring the divergence is that we can view the probability vectors for perturbed features as the output of a model. The divergence will be large when the effect of the perturbation is significant, and small when the model is not (very) sensitive to the perturbation. A sketch combining Algorithm 1 with a few of these distance choices is given after this list.
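In the sketch below, the BN is assumed to be available as a callable bn_prob returning Pr(F_x ∈ B) for a single discretised sample, and the random_shift helper sketched in Section 3 is reused; the anti-fit divergence is implemented directly as 1 − scoreR2. These are hedged illustrations of the procedure, not the authors' implementation.

```python
# Sketch of Algorithm 1 with a few of the listed distance metrics.
import numpy as np
from scipy.spatial.distance import cosine, correlation, jensenshannon
from sklearn.metrics import r2_score

DISTANCES = {
    "L2":   lambda p, q: float(np.linalg.norm(p - q)),
    "cos":  cosine,
    "corr": correlation,
    "JS":   jensenshannon,
    "MSE":  lambda p, q: float(np.mean((p - q) ** 2)),
    "AF":   lambda p, q: 1.0 - r2_score(p, q),   # anti-fit divergence dAF(P, Q)
}

def feature_sensitivity(bn_prob, F_X, features, n_intervals, distance="AF"):
    """Return {feature: d_p(P_ref, P'_f)}, as in Algorithm 1.
    F_X is an iterable of discretised samples; features are (layer, component) pairs."""
    d_p = DISTANCES[distance]
    p_ref = np.array([bn_prob(f_x) for f_x in F_X])                 # reference probabilities
    sensitivities = {}
    for i, j in features:
        p_shift = np.array([bn_prob(random_shift(f_x, i, j, n_intervals)) for f_x in F_X])
        sensitivities[(i, j)] = d_p(p_ref, p_shift)
    return sensitivities
```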
4.3 Feature Importance

We associate each extracted feature f with a weight w_f based on the set of measured sensitivity distances as follows:

    w_f = exp(d_f) / Σ_{f′ ∈ T} exp(d_{f′})        (1)

where T is the set of considered latent features. The soft-max weighting in Eq. (1) acts as a normalisation function, i.e., it ensures that the sum of the feature components' weights equals one. The normalised importance weight for each feature is usually positively correlated with the respective probability distances.

Example 3. Table 1 shows selected distance measures computed based on one experiment detailed in the next Section. Assuming the dcorr distance is chosen to determine feature importance, feature (1, 1) is assigned the largest weight at 0.192, followed by feature (2, 1) at 0.182, etc.

Table 1: Example distance measures.
perturbed feature | dL1 | dL2 | dL∞ | dJS | dcorr | dcos | dMSE | dRMSE | dMAE | dAF
(1, 0) | 150 | 0.726 | 0.00956 | 0.224 | 0.142 | 0.114 | 0.000000879 | 0.000937 | 0.000249 | 0.278
(1, 1) | 340 | 1.18 | 0.00989 | 0.353 | 0.448 | 0.361 | 0.00000232 | 0.00152 | 0.000567 | 0.735
(2, 0) | 325 | 1.09 | 0.00946 | 0.365 | 0.332 | 0.267 | 0.00000198 | 0.00141 | 0.000541 | 0.625
(2, 1) | 360 | 1.16 | 0.0103 | 0.393 | 0.395 | 0.323 | 0.00000224 | 0.00150 | 0.000600 | 0.710
(3, 0) | 276 | 0.880 | 0.00889 | 0.258 | 0.170 | 0.137 | 0.00000129 | 0.00114 | 0.000460 | 0.408
(3, 1) | 315 | 1.07 | 0.00960 | 0.324 | 0.318 | 0.264 | 0.00000192 | 0.00139 | 0.000525 | 0.608

The importance weight for an extracted latent feature of a DNN's layer may reflect some relevant amount of information or variance that the abstracted DNN uses at the considered layer. The current abstraction scheme, however, does not relate latent features with the DNNs' decisions. Still, perturbing a specific part of the latent space and observing the implicit changes of the learning models' distribution contributes to understanding their internal decisions.
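Eq. (1) is a direct soft-max over the measured distances; the following sketch transcribes it and, using the dcorr column of Table 1, reproduces the weights 0.192 and 0.182 quoted in Example 3.

```python
# Soft-max weighting of Eq. (1): w_f = exp(d_f) / sum_{f' in T} exp(d_{f'}).
import numpy as np

def feature_weights(distances: dict) -> dict:
    """distances maps each considered feature f to its sensitivity distance d_f."""
    features = list(distances)
    d = np.array([distances[f] for f in features])
    w = np.exp(d) / np.exp(d).sum()
    return dict(zip(features, w))

# With the dcorr column of Table 1, feature (1, 1) receives weight ~0.192 and
# feature (2, 1) ~0.182, matching Example 3.
print(feature_weights({(1, 0): 0.142, (1, 1): 0.448, (2, 0): 0.332,
                       (2, 1): 0.395, (3, 0): 0.170, (3, 1): 0.318}))
```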
5 Experiments

In this Section, we first illustrate the results of pairwise comparison of latent features, and then turn to an empirical evaluation of the sensitivity of BN abstractions at detecting distribution shift induced by adversarial examples.

5.1 Illustration of Pairwise Comparison

We first concentrate on the BN given in the examples so far. We report in Table 2 a pairwise comparison matrix for this example, where we arrange the perturbed features in the first column and compute their impact on each feature (i, j)'s probability tables. The numbers reported in this matrix represent the change in the probability values. For instance, our controlled perturbation of feature (2, 0)'s intervals has an impact on the values of features (3, 0) and (3, 1). More specifically, the MSE between the probability tables for feature (3, 0), given in Figure 4 (a) before and (b) after perturbing feature (2, 0), is 0.0113.

Table 2: Example pairwise comparison matrix for six extracted features. Each cell describes the extent to which a feature (rows) affects the others (columns).
perturbed feature | (1, 0) | (1, 1) | (2, 0) | (2, 1) | (3, 0) | (3, 1)
(1, 0) | 0.00301 | 0. | 0.00858 | 0.00808 | 0. | 0.
(1, 1) | 0. | 0.00257 | 0.00783 | 0.00865 | 0. | 0.
(2, 0) | 0. | 0. | 0.0143 | 0. | 0.0113 | 0.00807
(2, 1) | 0. | 0. | 0. | 0.0102 | 0.00889 | 0.0114
(3, 0) | 0. | 0. | 0. | 0. | 0.0229 | 0.
(3, 1) | 0. | 0. | 0. | 0. | 0. | 0.0161

Discussion. If we set the diagonal to zero, since those cells reflect the change made to the perturbed feature itself, we can observe that the perturbations do not affect the probability of features from the previous layer (the parent features) or from the same layer, as expected. On the other hand, random shifting only influenced the immediate features in the next layer. The largest difference occurred on feature (3, 1) when perturbing feature (2, 1). Although this impact is relatively small, we can (as expected) observe the dependencies between latent feature values of the BN model. However, the perturbations do not change the features' probability for deeper layers, e.g., features of Layer 3 are not affected by a perturbation made on features of Layer 1, which is surprising.
5.2 Sensitivity Analysis

Let us now turn to our empirical assessment of the effectiveness of the BN sensitivity analysis method in examining the behaviour of the latent features under perturbation.

Datasets and Experimental Setup. We have selected two trained CNN models for our experiments: the first one targets the MNIST classification problem with 99.38% validation accuracy, and the second model targets the CIFAR-10 dataset with 81.00% validation accuracy. The models are reasonably sized, with more than 15 layers, including blocks of convolutional and max-pooling layers followed by a series of dense layers. They have 312 000 and 890 000 trainable parameters, respectively.

The Bayesian Network abstraction scheme accepts a wide range of feature extraction techniques and discretisation strategies. To explore their impact on our approach, we use a wide set of BN abstractions. We have selected two linear feature extraction techniques, Principal Component Analysis (PCA) and Independent Component Analysis (ICA), and one non-linear technique, radial basis function (RBF) kernel-PCA. We also decided to fix the number of extracted features at three features per layer; this choice of a relatively small number of hidden features enables us to use many intervals (5 or 10) for their discretisation while still obtaining reasonably sized probability tables. We applied both uniform- and quantile-based discretisation strategies, with or without the addition of two left- and right-most intervals that do not contain any element of the training sample. Finally, we considered three hidden layers to construct the BN abstractions: for the two models, the first two selected layers directly follow a block of convolutions, while the last is a dense ReLU layer that is situated a few layers before the NN's output layer. The criterion for choosing these layers is the belief that the activation values at these layers capture relevant patterns w.r.t. the NN decisions.

Example Distributions and Distances. We plot in Figure 5 example distributions of probabilities in vectors obtained from a BN abstraction of the MNIST model. We have annotated each one of these plots with various measures of distances between the reference probabilities P_ref, generated using a sample from the training data set, and the respective six perturbed-feature probabilities P′_f. The shown difference between these two probability distributions illustrates the internal change in the distribution represented by the BN. For instance, when applying the random shift on the first feature extracted from the first selected layer, i.e., perturbed feature (1, 0), the calculated probability distribution P′(1, 0), coloured in blue, shows a change in the probabilistic causal relations, which implies a change in the probability represented by the BN. Hence, we can determine the safety violation risk by comparing the probability of an input belonging to the BN probability distribution.

Figure 5: Distributions of probabilities obtained from one example BN abstraction for each perturbed feature, for the MNIST model. Each plot shows respective distance measures w.r.t. the probabilities obtained from the BN for the clean (unperturbed) features (P_ref).

Sensitivity to Adversarial Distribution Shift. We have carried out a set of experiments to assess whether the set of three features extracted for each considered hidden layer allows us to capture relevant properties of the learnt representations. In particular, we wanted to check whether the BN abstraction allows us to detect the shift in the distribution of inputs that occurs when the NN is subject to adversarial examples. In other words, we want to discover whether some distance measures indicate that the BN abstractions capture relevant latent features (and their dependencies) with sufficient precision to associate diverging probabilities with "legitimate" inputs and adversarially perturbed ones. If such is the case, we shall conclude that our abstraction scheme and the associated BN are sufficiently precise to capture relevant dependencies in latent feature values that may not be matched (or matched too well, depending on the sign of the actual difference in probabilities) by some adversarial inputs.

To carry out these experiments, we have selected the following adversarial attacks:
• fgsm is the Fast Gradient Sign Method of Goodfellow, Shlens, and Szegedy (2015);
• pgdlinf and pgdl2 are the Projected Gradient Descent approach of Madry et al. (2017) with L∞ and L2 norm, respectively;
• cwlinf and cwl2 are Carlini and Wagner (2017)'s attack with L∞ and L2 norm, respectively, both targeting 0.1 confidence;
• deepfool is the DeepFool attack by Moosavi-Dezfooli, Fawzi, and Frossard (2016).

Attacks involving the L∞ norm target a maximum perturbation of ε = 0.1 in the input images, whereas pgdl2 targets a maximum perturbation of ε = 10.

For each attack, we have generated an adversarial dataset Xattack from the validation dataset Xtest for both the MNIST and CIFAR10 models, where each dataset consists of 10 000 inputs. Then, for each attack and BN abstraction B built and fit using 20 000 elements drawn from the respective training datasets, we measured a set of distances p between the vectors of probabilities Pr(Xtest ∈ B) and Pr(Xattack ∈ B), denoted dp(Pr(Xtest ∈ B), Pr(Xattack ∈ B)).
Figure 6: Selected distances (vertical axes) between probability vectors obtained for the validation dataset (Pr(Xtest ∈ B)) and probability vectors (Pr(Xattack ∈ B)) obtained for datasets generated by selected adversarial attacks (shown on each column), for a range of BN abstractions B. The top (resp. bottom) three rows show results for the MNIST (resp. CIFAR10) model. Hue indicates the discretisation strategy and the number of intervals. The grey vertical lines show confidence intervals.

Results and Discussion. Figure 6 shows our results for three selected distances, L2, cos, and AF. We give more detailed results in Appendix A. Each chart in the figure illustrates the calculated distances, with four colours according to the discretisation method and the number of intervals, on the vertical axis, for three feature extraction techniques (pca, ica, and rbf.kpca) on the horizontal axis. The distance metric used and the attack type are shown at the top of each chart. First of all, we can observe that some combinations of abstractions and distance measures exhibit notable differences between the validation dataset and the adversarial one for some attacks. For instance, every distance shown allows us to measure a shift in input distribution for every attack, except Carlini and Wagner (2017)'s in some cases. Next, although the feature extraction technique does not have a noticeable impact on any measured distance, the discretisation strategy certainly plays a role in the ability of the BN to model each abstracted latent feature and their dependencies with sufficient precision. For example, in the first row of the CIFAR-10 experiment (L2 distance), the distribution shift is detected when using the uniform-based discretisation method with five intervals (distance shown in blue).

Overall, the experimental results show that computing distances between two BN probability distributions, clean and perturbed by interval shifts or adversarial attacks, can detect the distribution shift where it exists. We emphasise that, in the case of adversarial shift, this is measured based on the latent features only. Given this empirically confirmed property, BN-based computation of feature importance appears to be one tool which adds to the growing set of useful techniques for the detection of important features as well as of adversarial examples. What is more, it adds a semantic twist to this analysis and allows for explaining in which way the changes in the features contribute to the distribution shift.

6 Discussions

In this section, we discuss a few aspects related to either the method we take or the potential applications of the method.

Hyper-parameters in BN Construction. The parametric nature of the scheme advanced by Berthier et al. (2021) enables the exploration of a wide range of DNN abstractions. For instance, in our experiments, the sensitivity to adversarial distribution shift relied mostly on linear dimensionality reduction techniques to extract latent features. We plan to conduct further experiments with more non-linear feature extraction techniques, like manifold learning (Lee 2000), to assess the properties of extracted features in extended cases. The effect of more advanced discretisation strategies can also be explored, for instance by relying on kernel density estimations to partition each latent feature component into intervals that span ranges of the real line that are either densely or non-densely exercised by the training sample.
Hyper-parameters in Weight Quantification. There are a number of building blocks in the weight quantification method (Algorithm 1), including, e.g., the perturbation made to generate new CPTs, the random shifting function, and the distance metrics for the probabilities P_ref and P′_f. In this paper, we have explored several different options for the distance metrics for comparison. It would also be useful to study if and how the other hyper-parameters may affect the overall results.

Utility of Feature Weights. Quantifying the importance of the hidden features provides three advantages. First, visualising the most important features provides insight into the model's internal decisions by highlighting dominating regions in the feature space. Second, we can use the importance measurement to design high-level testing metrics that evaluate the robustness of the DNN. Some attempts have been made in Berthier et al. (2021), where no feature weight is taken into consideration. Third, with FI as a defence, we can utilise the obtained importance in the training process and force the DNN to adjust its parameters according to the features that are most relevant for the prediction. This direction is the most widely adopted strategy. For example, Zhang et al. (2021) propose a hierarchical feature alignment method that computes the difference between clean and adversarial feature representations and utilises it as a loss function when optimising network parameters, while Bai et al. (2021) suggest that different channels of a DNN's intermediate layers contribute differently to a specific class prediction and propose a Channel-wise Activation Suppressing training technique that learns the channel importances and leverages them to suppress the channel activations while training the network.

Utility of Bayesian Network. As suggested, the BN can be seen as an abstraction of the original DNN. It is therefore imperative to understand how this abstraction may help in either analysing or enhancing the original DNN. In Berthier et al. (2021), test metrics are designed over the BN by extending the MC/DC metrics proposed by Sun et al. (2019). As the next step, it would be interesting to understand if test case generation methods (Sun et al. 2018b), in particular the one based on symbolic computation (Sun, Huang, and Kroening 2018), can also be extended to work with BNs. Moreover, it will be useful to see if the generated test cases can be more natural and diverse when compared with those generated directly on DNNs, as done in (Huang et al. 2021).

In addition to testing, it would also be interesting to see if such an abstraction may bring any benefit to, e.g., verification (Huang et al. 2017), interpretation of DNN training (Jin et al. 2020), explainable AI (Zhao et al. 2021c), and safety cases (Zhao et al. 2020). For example, scalability is the key obstacle of DNN verification due to its complexity (Ruan, Huang, and Kwiatkowska 2018). Considering that the BN is significantly smaller than the original DNN, it will be interesting to understand if the BN can be used to alleviate the problem without losing the provable guarantee. A potential difficulty may be whether and how the verification result on the BN can be transferred to the DNN.

Similar to the above discussion for testing and verification, the potential for the BN to be used as an intermediate step for reliability assessment (Zhao et al. 2021a) and safety cases (Zhao et al. 2021b) is worthy of exploration. This may require a quantification of the error, or the loss of information, incurred when using the BN as an abstraction of the DNN.

7 Conclusions

In this study, we have advanced a novel technique that employs a BN abstraction to investigate how to measure the importance of high-level features when they are used by the neural network to make classification decisions. In addition to the observed ability to detect the distribution shifts before and after perturbation, this will open many doors for future exploration. For example, it will certainly be interesting to understand if the generated importance values can support the explanation of the black-box learning model. It will also be useful if such importance values can be utilised to improve the training process.
References

Bai, Y.; Zeng, Y.; Jiang, Y.; Xia, S.-T.; Ma, X.; and Wang, Y. 2021. Improving Adversarial Robustness via Channel-wise Activation Suppressing. In International Conference on Learning Representations.
Berthier, N.; Alshareef, A.; Sharp, J.; Schewe, S.; and Huang, X. 2021. Abstraction and Symbolic Execution of Deep Neural Networks with Bayesian Approximation of Hidden Features. arXiv preprint arXiv:2103.03704.
Carlini, N.; and Wagner, D. 2017. Towards Evaluating the Robustness of Neural Networks. In 2017 IEEE Symposium on Security and Privacy (SP), 39–57. IEEE Computer Society.
Castillo, E.; Gutiérrez, J. M.; and Hadi, A. S. 1997. Sensitivity analysis in discrete Bayesian networks. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 27(4): 412–423.
Daxberger, E.; Nalisnick, E.; Allingham, J. U.; Antorán, J.; and Hernández-Lobato, J. M. 2021. Bayesian deep learning via subnetwork inference. In International Conference on Machine Learning, 2510–2521. PMLR.
Goodfellow, I. J.; Shlens, J.; and Szegedy, C. 2015. Explaining and Harnessing Adversarial Examples. arXiv:1412.6572.
Hamdi, A.; and Ghanem, B. 2020. Towards analyzing semantic robustness of deep neural networks. In European Conference on Computer Vision, 22–38. Springer.
Huang, W.; Sun, Y.; Zhao, X.; Sharp, J.; Ruan, W.; Meng, J.; and Huang, X. 2021. Coverage-Guided Testing for Recurrent Neural Networks. IEEE Transactions on Reliability, 1–16.
Huang, X.; Kroening, D.; Ruan, W.; Sharp, J.; Sun, Y.; Thamo, E.; Wu, M.; and Yi, X. 2020. A survey of safety and trustworthiness of deep neural networks: Verification, testing, adversarial attack and defence, and interpretability. Computer Science Review, 37.
Huang, X.; Kwiatkowska, M.; Wang, S.; and Wu, M. 2017. Safety Verification of Deep Neural Networks. In International Conference on Computer Aided Verification, 3–29. Springer.
Ilyas, A.; Santurkar, S.; Tsipras, D.; Engstrom, L.; Tran, B.; and Madry, A. 2019. Adversarial Examples Are Not Bugs, They Are Features. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc.
Jin, G.; Yi, X.; Zhang, L.; Zhang, L.; Schewe, S.; and Huang, X. 2020. How does Weight Correlation Affect the Generalisation Ability of Deep Neural Networks. Advances in Neural Information Processing Systems.
Lee, J. 2000. A global geometric framework for non-linear dimensionality reduction. In Proceedings of the 8th European Symposium on Artificial Neural Networks, volume 1, 13–20.
Madaan, D.; Shin, J.; and Hwang, S. J. 2020. Adversarial neural pruning with latent vulnerability suppression. In International Conference on Machine Learning, 6575–6585. PMLR.
Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; and Vladu, A. 2017. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv:1706.06083.
Miller, T. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267: 1–38.
Moosavi-Dezfooli, S.-M.; Fawzi, A.; and Frossard, P. 2016. DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2574–2582.
Pei, K.; Cao, Y.; Yang, J.; and Jana, S. 2017. DeepXplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles, 1–18.
Ruan, W.; Huang, X.; and Kwiatkowska, M. 2018. Reachability Analysis of Deep Neural Networks with Provable Guarantees. In IJCAI, 2651–2659.
Sun, Y.; Huang, X.; and Kroening, D. 2018. Testing Deep Neural Networks. CoRR, abs/1803.04792.
Sun, Y.; Huang, X.; Kroening, D.; Sharp, J.; Hill, M.; and Ashmore, R. 2018a. Testing deep neural networks. arXiv preprint arXiv:1803.04792.
Sun, Y.; Huang, X.; Kroening, D.; Sharp, J.; Hill, M.; and Ashmore, R. 2019. Structural Test Coverage Criteria for Deep Neural Networks. ACM Trans. Embed. Comput. Syst., 18(5s).
Sun, Y.; Wu, M.; Ruan, W.; Huang, X.; Kwiatkowska, M.; and Kroening, D. 2018b. Concolic Testing for Deep Neural Networks. In ASE, 109–119.
Xu, Q.; Tao, G.; Cheng, S.; and Zhang, X. 2021. Towards Feature Space Adversarial Attack by Style Perturbation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, 10523–10531.
Zhang, X.; Wang, J.; Wang, T.; Jiang, R.; Xu, J.; and Zhao, L. 2021. Robust feature learning for adversarial defense via hierarchical feature alignment. Information Sciences, 560: 256–270.
Zhao, X.; Banks, A.; Sharp, J.; Robu, V.; Flynn, D.; Fisher, M.; and Huang, X. 2020. A Safety Framework for Critical Systems Utilising Deep Neural Networks. In SafeComp2020, 244–259.
Zhao, X.; Huang, W.; Banks, A.; Cox, V.; Flynn, D.; Schewe, S.; and Huang, X. 2021a. Assessing the Reliability of Deep Learning Classifiers Through Robustness Evaluation and Operational Profiles. In AISafety.
Zhao, X.; Huang, W.; Bharti, V.; Dong, Y.; Cox, V.; Banks, A.; Wang, S.; Schewe, S.; and Huang, X. 2021b. Reliability Assessment and Safety Arguments for Machine Learning Components in Assuring Learning-Enabled Autonomous Systems. arXiv:2112.00646.
Zhao, X.; Huang, X.; Robu, V.; and Flynn, D. 2021c. BayLIME: Bayesian Local Interpretable Model-Agnostic Explanations. In UAI.

A Detailed Results for Sensitivity to Adversarial Shift Experiments

We have plotted in Figure 6 some statistics for a subset of the distances we have considered for comparing probability vectors. Figures 7, 8, and 9 show the distances computed for the MNIST model, and Figures 10, 11, and 12 show the results for the CIFAR10 model. In these plots, hue still indicates the discretisation strategy. However, we have discriminated between extended and non-extended strategies: the suffix '-X' denotes that latent features are discretised in such a way that the left- and right-most intervals do not contain any (projected) training sample.

Figure 7: Distances (vertical axes) between probability vectors obtained for the MNIST validation dataset (Pr(Xtest ∈ B)) and probability vectors (Pr(Xattack ∈ B)) obtained for datasets generated by selected adversarial attacks (attack, shown on the horizontal axes), for a range of BN abstractions B. Every abstraction involves 3 layers for which 3 features have been extracted using PCA (left-hand side column), ICA (middle), or radial basis function (RBF) kernel-PCA (right). Plotted data aggregates five independent runs, and shows confidence intervals.
Figure 8: See Figure 7.

Figure 9: See Figure 7.

Figure 10: Distances (vertical axes) between probabilities obtained for the CIFAR10 validation dataset and datasets generated by selected adversarial attacks (horizontal axes). See Figure 7 for further details.

Figure 11: See Figure 10.
Figure 12: See Figure 10.