Bayesian Model for Trustworthiness Analysis of Deep Learning Classifiers

Andrey Morozov1*, Emil Valiev2, Michael Beyer2, Kai Ding3, Lydia Gauerhof4, Christoph Schorn4
1 Institute of Industrial Automation and Software Engineering, University of Stuttgart, Germany
2 Institute of Automation, Technische Universität Dresden, Germany
3 Bosch (China) Investment Ltd., Corporate Research, Shanghai, China
4 Robert Bosch GmbH, Corporate Research, Renningen, Germany
andrey.morozov@ias.uni-stuttgart.de, {emil.valiev, michael.beyer3}@mailbox.tu-dresden.de, kai.ding@cn.bosch.com, {lydia.gauerhof, christoph.schorn}@de.bosch.com

* Contact Author. Copyright (c) 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract

In the near future, Artificial Intelligence methods will inevitably enter safety-critical areas. Deep Learning software, deployed on standard computing hardware, is prone to random hardware faults such as bit flips that can result in silent data corruption. We have performed fault injection experiments on three Convolutional Neural Network (CNN) image classifiers, including VGG16 and VGG19. Besides the fact that the bit flips indeed drop the classification accuracy, we have observed that these faults result not in random misclassification but tend toward particular erroneous sets of classes. This fact shall be taken into account in the design of a reliable and safe system. For example, we might consider re-running the classifier if it yields a class from such an erroneous set. This paper discusses the results of our fault injection experiments and introduces a new Bayesian Network (BN) model that aggregates these results and enables numerical evaluation of the performance of the CNNs under the influence of random hardware faults. We demonstrate the application of the developed BN model for trustworthiness analysis. In particular, we show how to evaluate the misclassification probabilities for each resulting class for a varying probability of random bit flips.

1 Introduction

The majority of high-tech industrial areas already exploit Artificial Intelligence (AI) methods, including deep learning techniques. Presumably, in the next few years, the safety certification challenges of AI components will be overcome, and Deep Learning (DL) will enter safety-critical domains such as transportation, robotics, and healthcare.

A DL component is simply a piece of software deployed on a standard computing unit. For example, a traffic-sign recognition module of a car receives images from a front camera, detects, and classifies road signs [Beyer et al., 2019]. Such a system is prone to several types of random hardware faults, including bit flips that can occur in the RAM or CPU of the computing unit. Bit flips may result in silent data corruption and affect classification accuracy, as shown in [Beyer et al., 2020] and [Li et al., 2018]. There are even specific Bit-Flip Attack methods that intentionally cause misclassification by flipping a small number of bits in the RAM where the weights of the network are stored [Rakin et al., 2019; Liu et al., 2017].

This phenomenon can be investigated with Fault Injection (FI) experiments using the methods and tools discussed in Section 2. We have performed such experiments on three Convolutional Neural Network (CNN) image classifiers described in Section 3. Besides the fact that the bit flips indeed drop the classification accuracy, we have made another interesting observation: the injection of a random bit flip in the output of a particular CNN layer results not in random misclassification but tends toward a particular set of image classes. For some layers, especially the first several layers, these sets are very distinctive. Examples are shown in Figures 2 and 3. A similar observation was mentioned in [Liu et al., 2017], where the classes from such sets are called sink classes.

This fact has potential practical value and should be taken into account during the reliable and safe design of systems that include DL components.

• First, and most obviously, if the provided classification result belongs to such a sink set, then we might consider re-running the classifier.
• Second, since these sink sets are different for different CNN layers, we can estimate the possible fault location and, for example, re-run the network partially, starting from the potentially faulty layer, to reduce computational overhead.
• Third, if several classification results in a row belong to a sink set, then we can assume a "hard" error, e.g., a permanent stuck-at-one or stuck-at-zero fault in the RAM where the data of a particular CNN layer is stored.

Contribution: This paper presents the results of the discussed FI experiments. In particular, it shows several examples of sink sets for the layers of VGG16 and VGG19. The complete results of the FI experiments are available online. Based on these experiments, we have developed a Bayesian Network (BN) model, fed with their results, that enables numerical evaluation of the performance of the CNNs under the influence of random hardware faults. The paper provides a formal description of this BN model and demonstrates its application for the trustworthiness analysis of the classification results. The paper shows how to evaluate the misclassification probabilities for each resulting class for a varying probability of random bit flips.

2 State of the Art

A good overview of the current research effort on making deep learning neural networks safe and trustworthy is given in [Huang et al., 2018]. The authors surveyed methods for verification, testing, adversarial attack and defense, and interpretability.

In most cases, neural networks are treated as black boxes. Therefore, at the moment, the most straightforward analysis methods are based on fault injection campaigns; formal verification methods are less common.

Several tools enable automated fault injection into neural networks. For example, TensorFI [Li et al., 2018] and InjectTF [Beyer et al., 2019] support the first version of TensorFlow. The experiments discussed in this paper were carried out in the TensorFlow V2 environment. Therefore, we have used InjectTF2 [Beyer et al., 2020], which was developed for TensorFlow V2. Figure 1 shows the main working principle of InjectTF2. The tool allows layer-wise fault injection. InjectTF2 takes a trained neural network, a dataset, and a configuration file as inputs. The network and the dataset should be provided as an HDF5 model and a TensorFlow dataset. In the configuration file, the user can specify the fault type and fault injection probability for each layer of the neural network under test. Currently supported fault types are (i) a random bit flip or (ii) a specified bit flip of a random value of a layer's output.

Figure 1: Working principle of the fault injection framework InjectTF2. The layer selected for injection is shown in orange. Source code available at https://github.com/mbsa-tud/InjectTF2.

InjectTF2 performs fault injection experiments in an automated way and logs the results. The model splitting principle sketched in Figure 1 drastically reduces the execution time of the experiments, since the network is not executed from bottom to top each time, but only after the layer where the faults are injected.

In this paper, we focus on random faults. However, there are also methodologies to evaluate the impact of permanent faults, like the one presented in [Bosio et al., 2019]. Besides that, there are other methods for performance and reliability analysis of deep neural networks that help to improve fault tolerance: a specific fault injection method for neural networks deployed in FPGAs, combined with algorithm-based fault tolerance and selective triplication of the most critical layers [Libano, 2018], and an efficient bit-flip resilience optimization method for deep neural networks [Schorn et al., 2019].

3 Fault Injection Experiments

3.1 CNNs and Datasets

We performed experiments on three different neural networks. The architectures and layer output dimensions of the networks are listed in Table 1.

The first is a self-developed simple CNN, which consists of 12 layers and follows common design principles. The ReLU activation function is used throughout the network, excluding the last layer, which uses the Softmax activation function. This CNN has been trained on an augmented German Traffic Sign Recognition Benchmark (GTSRB) [Stallkamp et al., 2012] dataset and can classify road signs with an accuracy of approximately 96%. The dataset is split into three subsets for training, testing, and validation, containing 34 799, 12 630, and 4 410 images, respectively. Each image has 32 × 32 RGB pixels and belongs to one of 43 classes of road signs. In order to ensure uniform classification performance across all classes, the training dataset has been normalized, augmented, and balanced. The augmentation is done by adding copies of images with zoom, rotation, shear, brightness disturbance, and Gaussian noise to the dataset. The augmented training subset contains 129 100 images.

(a) VGG16: fault-free run. (b) VGG16: fault injection in Layer 3; the sink classes are highlighted in red. (c) VGG16: fault injection in Layer 7; the sink classes are highlighted in red.
Figure 2: Results of the fault injection experiments on VGG16 for the ImageNet dataset (1000 classes).
(a) VGG19: fault-free run. (b) VGG19: fault injection in Layer 3; the sink classes are highlighted in red. (c) VGG19: fault injection in Layer 10; the sink classes are highlighted in red.
Figure 3: Results of the fault injection experiments on VGG19 for the ImageNet dataset (1000 classes).

The second and third networks are the pre-trained TensorFlow VGG16 and VGG19 [Simonyan and Zisserman, 2014]. They are trained on the ImageNet dataset [Russakovsky et al., 2015]. In the experiments, a random sample of 5000 images from the 2012 ImageNet testing subset has been used. The images belong to 1000 different classes and consist of 224 × 224 RGB pixels.

Table 1: The layer structure of the CNNs with output dimensions. The layers selected for the fault injections shown in Figures 2 and 3 are marked with an asterisk.

#   Custom CNN           VGG16                  VGG19
1   Conv (32×32×32)      Conv (224×224×64)      Conv (224×224×64)
2   Conv (32×32×32)      Conv (224×224×64)      Conv (224×224×64)
3   MaxPool (16×16×32)   MaxPool (112×112×64)*  MaxPool (112×112×64)*
4   Dropout (16×16×32)   Conv (112×112×128)     Conv (112×112×128)
5   Conv (16×16×64)      Conv (112×112×128)     Conv (112×112×128)
6   Conv (16×16×64)      MaxPool (56×56×128)    MaxPool (56×56×128)
7   MaxPool (8×8×64)     Conv (56×56×256)*      Conv (56×56×256)
8   Dropout (8×8×64)     Conv (56×56×256)       Conv (56×56×256)
9   Flatten (4096)       Conv (56×56×256)       Conv (56×56×256)
10  Dense (256)          MaxPool (28×28×256)    Conv (56×56×256)*
11  Dropout (256)        Conv (28×28×512)       MaxPool (28×28×256)
12  Dense (43)           Conv (28×28×512)       Conv (28×28×512)
13                       Conv (28×28×512)       Conv (28×28×512)
14                       MaxPool (14×14×512)    Conv (28×28×512)
15                       Conv (14×14×512)       Conv (28×28×512)
16                       Conv (14×14×512)       MaxPool (14×14×512)
17                       Conv (14×14×512)       Conv (14×14×512)
18                       MaxPool (7×7×512)      Conv (14×14×512)
19                       Flatten (25088)        Conv (14×14×512)
20                       Dense (4096)           Conv (14×14×512)
21                       Dense (4096)           MaxPool (7×7×512)
22                       Dense (1000)           Flatten (25088)
23                                              Dense (4096)
24                                              Dense (4096)
25                                              Dense (1000)

3.2 Results

Six exemplary bar plots in Figures 2 and 3 describe the classification results for VGG16 and VGG19. The plots display how many images from the input datasets are classified into each of the 1000 classes.

The first (top) plots in both figures show the distributions without faults. The images are distributed more or less uniformly over the classes. The other two plots in each figure show the distributions after faults were injected into specific layers: layers three and seven of VGG16 and layers three and ten of VGG19. These layers are marked in Table 1. Similar plots for other layers, together with the raw data, are available at https://github.com/mbsa-tud/InjectTF2.

For each layer, we have carried out 100 fault injection experiments. In each experiment, we flip a random bit of a random output value of the corresponding layer. The bar plots represent the average over these 100 experiments.

In the plots, we can observe several distinctive peaks. These peaks reveal that the VGGs tend to erroneously classify images into these sink classes after the fault injections. The peaks are located differently for the presented layers. Note that the peaks also differ between the third layers of VGG16 and VGG19. However, for several layers, especially from the same VGG blocks, the peaks are quite similar. We also observed that such peaks appear only in the first layers; the misclassification becomes more random if we inject faults into the deeper layers. Note that the peaks for the seventh and tenth layers are lower than the peaks of the third layers. The peaks are distinctive for the first 11 layers of VGG16 and the first 12 layers of VGG19. After that, the distribution becomes more or less uniform.

4 Trustworthiness analysis

The experimental results discussed above enable numerical evaluation of the performance of the CNNs under the influence of random hardware faults. For instance, we can statistically evaluate the probability of misclassification for each resulting image class. This probability is higher for the sink classes than for other classes.
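As a minimal sketch of such a statistical evaluation, the function below estimates, for each resulting class, the fraction of fault injection runs in which that class was reported although it was not the true label. The array names and the toy log are illustrative assumptions, not InjectTF2's actual log format.

```python
import numpy as np

def per_class_error_rate(true_labels, predicted, n_classes):
    """For each resulting class c, estimate the fraction of runs in
    which c was predicted although it was not the true label.
    Illustrative post-processing of FI logs; the array names and
    layout are assumptions, not InjectTF2's actual format."""
    rates = np.zeros(n_classes)
    for c in range(n_classes):
        hits = predicted == c          # runs that reported class c
        if hits.any():
            rates[c] = np.mean(true_labels[hits] != c)
    return rates

# toy log: true and predicted labels of six fault injection runs
true_labels = np.array([0, 1, 2, 0, 1, 2])
predicted = np.array([0, 1, 1, 2, 1, 2])
rates = per_class_error_rate(true_labels, predicted, 3)
```

A sink class would show up here as a class with both many hits and a high conditional error rate.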
For this purpose, we use a Bayesian Network (BN) model fed with the statistical results of the fault injection experiments.

4.1 Formal Model of the Classifier

The BN is defined using a formal set-based model of a classifier that is shown in Figure 4. This model is based on three sets, two functions, and three random variables.

Figure 4: Formal set-based model of a classifier (the sets of images, layers, and classes; the input image and the resulting class are random variables).

Sets:
I = {i1, i2, ..., i_NI} - input images.
C = {c1, c2, ..., c_NC} - result classes.
L = {l1, l2, ..., l_NL} - layers of the CNN.

Functions:
g : I × C → {1, 0} - formalization of the results of the fault-free run: g(i, c) = 1 if image i is classified as class c, and g(i, c) = 0 otherwise. For simplicity, we assume that the fault-free classification is always correct.
f_k^l : I × C → {1, 0} - formalization of the results of the FI experiments: f_k^l(i, c) = 1 if image i is classified as class c in the k-th FI experiment with faults injected in layer l ∈ L, and f_k^l(i, c) = 0 otherwise.

Random variables:
I ∈ I - the current input image. An independent discrete random variable. For simplicity, we assume that the probability of each image from I being the input image is equal: P(I = i) = 1/NI, ∀i ∈ I. Otherwise, the distribution should be specified statistically.
B ∈ {'none'} ∪ L - no bit flip, or a bit flip in a particular layer. An independent discrete random variable. The value 'none' means that there was no bit flip during the run; a value l ∈ L means that there was a bit flip in layer l. We assume that only a single bit flip can happen during a run. The distribution is defined by variables p_lk that specify the probability of a bit flip in layer lk. For simplicity, we apply the same probability p to each layer. The outcome 'none' is defined as the complement of all other events: P(B = 'none') = 1 - Σ_{k=1}^{NL} p_lk.
C ∈ C - the resulting class. A discrete random variable that depends on I and B.

4.2 Bayesian Network

A BN is a graphical formalism for representing joint probability distributions [Pearl, 1985]. It is a probabilistic directed acyclic graph that represents a set of variables and their conditional dependencies. Each node is defined with a Conditional Probability Table (CPT). Our BN describes the conditional probabilities of C. Figure 5 shows the BN and the CPTs of the three random variables. The CPTs of the independent variables I and B define constant probabilities for each outcome. The outcome of C depends on the outcomes of I and B, so its probabilities are defined for each combination of the outcomes of I and B.

Figure 5: The Bayesian network and conditional probability tables.

The CPT of C is divided into two parts. The upper part describes the situation without bit flips. We assumed perfect classification; thus, this part consists only of zeroes and ones, where the ones indicate the correct class for each image. Mathematically, we represent this using the function g: p_{i,'none',c} = g(i, c). The bottom part describes the situations when bit flips occur in the corresponding layers. Here, we statistically approximate the probabilities using the results of our fault injection experiments, represented mathematically using the functions f_k^l. Each probability is estimated as the number of times image i was classified as class c, divided by the total number K of fault injection experiments for layer l: p_{i,l,c} = Σ_k f_k^l(i, c) / K.

4.3 Quantification

The BN stores the results of the fault injection experiments in a structured way. This allows the analysis of various reliability-related properties.
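The two parts of the CPT of C can be assembled directly from the arrays of fault-free results g and FI results f defined above. The sketch below is illustrative: the NumPy array layout (the 'none' outcome as slice 0) is our choice, not something prescribed by the paper.

```python
import numpy as np

def build_cpt(g, f):
    """Assemble the CPT of C.

    g : (N_I, N_C) 0/1 array, g[i, c] = 1 iff image i is classified
        as class c in the fault-free run.
    f : (N_L, K, N_I, N_C) 0/1 array, f[l, k, i, c] = 1 iff image i
        is classified as class c in the k-th experiment with a fault
        injected in layer l.
    Returns p of shape (N_L + 1, N_I, N_C): slice 0 is the 'none'
    part (p[0, i, c] = g(i, c)); slice l + 1 holds the estimates
    p_{i,l,c} = sum_k f_k^l(i, c) / K. The layout is an illustrative
    assumption.
    """
    p = np.empty((f.shape[0] + 1,) + g.shape)
    p[0] = g                # upper part: zeroes and ones
    p[1:] = f.mean(axis=1)  # lower part: relative frequencies over K runs
    return p

# toy setup: 1 layer, K = 2 experiments, 2 images, 2 classes
g = np.array([[1, 0], [0, 1]])
f = np.array([[[[1, 0], [0, 1]],
               [[0, 1], [0, 1]]]])
p = build_cpt(g, f)
```

In this toy example, image 0 under a fault in layer 1 is classified as class 0 in one of the two runs, so p[1, 0] = [0.5, 0.5].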
These range from the general cumulative probability of misclassification to the specific probability of wrong classification of a particular input image because of a bit flip in a particular layer. Moreover, other kinds of random faults and their combinations can be taken into account, as shown in Figure 5 with the dashed lines.

As an example, we show how to quantify the trustworthiness of each resulting class. We define the trustworthiness as a kind of inverse probability: the probability that the resulting class c is the correct class for the input image i, taking into account the possibility of a bit flip in any layer. Let I_c be the subset of images that belong to class c: I_c ⊂ I, where i ∈ I_c iff g(i, c) = 1, i ∈ I, c ∈ C. Then, the trustworthiness of class c is the conditional probability P(I ∈ I_c ∩ B ∈ B | C = c). Applying first the formula for conditional probability (Kolmogorov definition) and then the law of total probability, we obtain the following expression:

P(I ∈ I_c ∩ B ∈ B | C = c) = P(C = c ∩ I ∈ I_c ∩ B ∈ B) / P(C = c) =

= [ Σ_{i ∈ I_c} Σ_{b ∈ B} P(C = c | B = b ∩ I = i) P(B = b) P(I = i) ] / [ Σ_{i' ∈ I} Σ_{b' ∈ B} P(C = c | B = b' ∩ I = i') P(B = b') P(I = i') ]

where P(I = i) is taken from the CPT of I, P(B = b) from the CPT of B, and P(C = c | B = b ∩ I = i) from the CPT of C. In the numerator of the fraction, we sum only over the images i from I_c, and in the denominator over all images from I. Basically, we compute the ratio of correct classifications to all classifications.

In our experiments, we computed the probabilities with our self-developed scripts. However, probabilistic analytical software libraries, like pomegranate [Schreiber, 2018], provide efficient and scalable methods for the computation of Bayesian networks.

Figure 6: Misclassification probabilities for the varying probability of a bit flip for the image classes of the custom CNN.

4.4 Results

Figures 6, 7, and 8 show the misclassification probabilities for each class.
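The expression above translates directly into a few lines of NumPy. The following is a sketch of the computation under the paper's assumptions (uniform P(I = i), the CPT of C laid out with the 'none' outcome as slice 0), not the authors' actual script.

```python
import numpy as np

def trustworthiness(p, p_b, c):
    """Trustworthiness of class c, P(I in I_c, any B | C = c).

    p   : CPT of C, shape (N_B, N_I, N_C); slice 0 is the fault-free
          ('none') part, so I_c = {i : p[0, i, c] == 1}.
    p_b : probabilities of the outcomes of B, shape (N_B,), sums to 1.
    Uniform P(I = i) = 1 / N_I is assumed, as in the paper.
    """
    n_b, n_i, n_c = p.shape
    # joint weight P(C=c | B=b, I=i) * P(B=b) * P(I=i) for every (b, i)
    joint = p[:, :, c] * p_b[:, None] / n_i
    in_ic = p[0, :, c] == 1            # images whose correct class is c
    denom = joint.sum()                # sum over all images (denominator)
    return joint[:, in_ic].sum() / denom if denom > 0 else 0.0

# toy CPT: 2 images, 2 classes, B in {'none', bit flip in layer 1}
p_c = np.array([[[1.0, 0.0], [0.0, 1.0]],    # 'none': perfect classification
                [[0.5, 0.5], [0.0, 1.0]]])   # bit flip in layer 1
p_b = np.array([0.9, 0.1])
t = trustworthiness(p_c, p_b, 1)   # t = 0.5 / 0.525, about 0.9524
```

The per-class misclassification probability used in the next subsection is then simply 1 - t.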
These probabilities are computed as one minus the trustworthiness. The probabilities of bit flips vary from 10^-7 to 1 for the custom CNN and from 10^-11 to 10^-4 for VGG16 and VGG19. The ten classes with the highest misclassification probabilities are highlighted with colors (sorted using the results obtained for a bit-flip probability of 10^-9); the remaining classes are shown in grey.

Based on the estimated bit-flip probabilities, we can decide whether we trust the classification result or not. Moreover, from the safety point of view, the misclassification of some classes might be more hazardous than of others. For instance, it might be more critical to confuse the stop sign with the main road sign than to confuse the speed limits 30 and 50. Such cases can be easily quantified using the proposed Bayesian model. They could also lead to re-training with regard to the classes with the lowest trustworthiness.

Figure 7: Misclassification probabilities for the varying probability of a bit flip for the image classes of the VGG16.

Figure 8: Misclassification probabilities for the varying probability of a bit flip for the image classes of the VGG19.

5 Conclusion

A series of fault injection experiments on several CNN-based classifiers have shown that random hardware faults result not in random misclassification but tend to misclassify the input images into specific distinctive sets of classes. These sets are different for functionally equivalent CNNs. Also, these sets depend on the layer where a fault is injected. This information has to be taken into account during the reliability and safety analysis of such classifiers if they shall be integrated into a safety-critical system. In this paper, we proposed the application of a Bayesian network model fed with the results of such fault injection experiments. This model allows a broad range of numerical reliability- and safety-related analyses of the classifier under test. As an application example, we have demonstrated how the proposed Bayesian model helps to estimate the level of trustworthiness for each resulting image class.

References

[Beyer et al., 2019] M. Beyer, A. Morozov, K. Ding, S. Ding, and K. Janschek. Quantification of the impact of random hardware faults on safety-critical AI applications: CNN-based traffic sign recognition case study. In 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 118-119, Oct 2019.

[Beyer et al., 2020] Michael Beyer, Andrey Morozov, Emil Valiev, Christoph Schorn, Lydia Gauerhof, Kai Ding, and Klaus Janschek. Two fault injectors for TensorFlow: Evaluation of the impact of random hardware faults on VGGs. Submitted to EDCC 2020; under evaluation, to be updated in the final version of the paper.

[Bosio et al., 2019] A. Bosio, P. Bernardi, A. Ruospo, and E. Sanchez. A reliability analysis of a deep neural network. In 2019 IEEE Latin American Test Symposium (LATS), pages 1-6, 2019.

[Huang et al., 2018] Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thamo, Min Wu, and Xinping Yi. A survey of safety and trustworthiness of deep neural networks. arXiv preprint arXiv:1812.08342, 2018.

[Li et al., 2018] Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben. TensorFI: A configurable fault injector for TensorFlow applications. In 2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 313-320. IEEE, 2018.

[Libano, 2018] Fabiano Pereira Libano. Reliability analysis of neural networks in FPGAs, 2018.

[Liu et al., 2017] Y. Liu, L. Wei, B. Luo, and Q. Xu. Fault injection attack on deep neural network. In 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 131-138, 2017.

[Pearl, 1985] Judea Pearl. Bayesian networks: A model of self-activated memory for evidential reasoning. In Proceedings of the 7th Conference of the Cognitive Science Society, University of California, Irvine, CA, USA, pages 15-17, 1985.

[Rakin et al., 2019] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. Bit-flip attack: Crushing neural network with progressive bit search. In Proceedings of the IEEE International Conference on Computer Vision, pages 1211-1220, 2019.

[Russakovsky et al., 2015] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211-252, 2015.

[Schorn et al., 2019] Christoph Schorn, Andre Guntoro, and Gerd Ascheid. An efficient bit-flip resilience optimization method for deep neural networks. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1507-1512. IEEE, 2019.

[Schreiber, 2018] Jacob Schreiber. Pomegranate: Fast and flexible probabilistic modeling in Python. Journal of Machine Learning Research, 18(164):1-6, 2018.

[Simonyan and Zisserman, 2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[Stallkamp et al., 2012] Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323-332, 2012.