=Paper= {{Paper |id=Vol-2640/paper_6 |storemode=property |title=Bayesian Model for Trustworthiness Analysis of Deep Learning Classifiers |pdfUrl=https://ceur-ws.org/Vol-2640/paper_6.pdf |volume=Vol-2640 |authors=Andrey Morozov,Emil Valiev,Michael Beyer,Kai Ding,Lydia Gauerhof,Christoph Schorn |dblpUrl=https://dblp.org/rec/conf/ijcai/MorozovVBDGS20 }} ==Bayesian Model for Trustworthiness Analysis of Deep Learning Classifiers== https://ceur-ws.org/Vol-2640/paper_6.pdf
         Bayesian Model for Trustworthiness Analysis of Deep Learning Classifiers
                                   Andrey Morozov1∗ , Emil Valiev2 , Michael Beyer2
                                   Kai Ding3 , Lydia Gauerhof4 , Christoph Schorn4
         1
           Institute of Industrial Automation and Software Engineering, University of Stuttgart, Germany
                          2
                            Institute of Automation, Technische Universität Dresden, Germany
                        3
                          Bosch (China) Investment Ltd., Corporate Research, Shanghai, China
                            4
                              Robert Bosch GmbH, Corporate Research, Renningen, Germany
           andrey.morozov@ias.uni-stuttgart.de, {emil.valiev, michael.beyer3}@mailbox.tu-dresden.de,
                     kai.ding@cn.bosch.com, {lydia.gauerhof, christoph.schorn}@de.bosch.com

Abstract

In the near future, Artificial Intelligence methods will inevitably enter safety-critical areas. Deep Learning software, deployed on standard computing hardware, is prone to random hardware faults such as bit flips that can result in silent data corruption. We have performed fault injection experiments on three Convolution Neural Network (CNN) image classifiers, including VGG16 and VGG19. Besides the fact that the bit flips indeed drop the classification accuracy, we have observed that these faults result not in random misclassification but tend toward particular erroneous sets of classes. This fact shall be taken into account to design a reliable and safe system. For example, we might consider re-running the classifier if it yields a class from such an erroneous set. This paper discusses the results of our fault injection experiments and introduces a new Bayesian Network (BN) model that aggregates these results and enables numerical evaluation of the performance of the CNNs under the influence of random hardware faults. We demonstrate the application of the developed BN model for trustworthiness analysis. In particular, we show how to evaluate the misclassification probabilities for each resulting class, for a varying probability of random bit flips.

1   Introduction

The majority of the high-tech industrial areas already exploit Artificial Intelligence (AI) methods, including deep learning techniques. Presumably, in the next few years, the safety certification challenges of AI components will be overcome, and Deep Learning (DL) will enter safety-critical domains such as transportation, robotics, and healthcare.

A DL component is simply a piece of software deployed on a standard computing unit. For example, a traffic-sign recognition module of a car receives images from a front camera, detects, and classifies road signs [Beyer et al., 2019]. Such a system is prone to several types of random hardware faults, including bit flips that can occur in the RAM or CPU of the computing unit. Bit flips may result in silent data corruption and affect classification accuracy, as shown in [Beyer et al., 2020], [Li et al., 2018]. There are even specific Bit-Flip Attack methods that intentionally cause misclassification by flipping a small number of bits in the RAM where the weights of the network are stored [Rakin et al., 2019], [Liu et al., 2017].

This phenomenon can be investigated with Fault Injection (FI) experiments using methods and tools discussed in Section 2. We have performed such experiments on three Convolution Neural Network (CNN) image classifiers described in Section 3. Besides the fact that the bit flips indeed drop the classification accuracy, we have made another interesting observation: the injection of a random bit flip into the output of a particular CNN layer results not in random misclassification, but tends toward a particular set of image classes. For some layers, especially the first several layers, these sets are very distinctive. Examples are shown in Figures 2 and 3. A similar observation was mentioned in [Liu et al., 2017], where the classes from such sets are called the sink classes.

This fact has potential practical value and should be taken into account during the reliable and safe design of systems that include DL components.

  • First and most obvious, if the provided classification result belongs to such a sink set, then we might consider re-running the classifier.

  • Second, since these sink sets are different for different CNN layers, we can estimate the possible fault location and, for example, re-run the network partially, starting from the potentially faulty layer, to reduce computational overhead.

  • Third, if several classification results in a row belong to a sink set, then we can assume a "hard" error, e.g., a permanent stuck-at-one or stuck-at-zero fault in the RAM where the data of a particular CNN layer is stored.

∗ Contact Author. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Figure 1: Working principle of the fault injection framework InjectTF2. The layer selected for injection is shown in orange. Source code
available at https://github.com/mbsa-tud/InjectTF2.


Contribution: This paper presents the results of the discussed FI experiments. In particular, it shows several examples of sink sets for the layers of VGG16 and VGG19. The complete results of the FI experiments are available online. Based on these experiments, we have developed and fed a Bayesian Network (BN) model that enables numerical evaluation of the performance of the CNNs under the influence of random hardware faults. The paper provides a formal description of this BN model and demonstrates its application for the trustworthiness analysis of the classification results. The paper shows how to evaluate the misclassification probabilities for each resulting class, for a varying probability of random bit flips.

2   State of the Art

A good overview of the current research effort on making deep learning neural networks safe and trustworthy is given in [Huang et al., 2018]. The authors surveyed methods for verification, testing, adversarial attack and defense, and interpretability.

In most cases, neural networks are treated as black boxes. Therefore, at the moment, the most straightforward analysis methods are based on fault injection campaigns. Formal verification methods are less common.

Several tools enable automated fault injection into neural networks. For example, TensorFI [Li et al., 2018] and InjectTF [Beyer et al., 2019] support the first version of TensorFlow. The experiments discussed in this paper were carried out in the TensorFlow V2 environment. Therefore, we have used InjectTF2 [Beyer et al., 2020], which was developed for TensorFlow V2. Figure 1 shows the main working principle of InjectTF2. The tool allows layer-wise fault injection. InjectTF2 takes a trained neural network, a dataset, and a configuration file as inputs. The network and the dataset should be provided as an HDF5 model and a TensorFlow dataset. In the configuration file, the user can specify the fault type and fault injection probability for each layer of the neural network under test. Currently supported fault types are (i) a random bit flip or (ii) a specified bit flip of a random value of a layer's output.

InjectTF2 performs fault injection experiments in an automated way and logs the results. The model splitting principle, sketched in Figure 1, drastically reduces the execution time of the experiments, since the network is not executed from bottom to top each time, but only from the layer where the faults are injected.

In this paper, we focus on random faults. However, there are also methodologies to evaluate the impact of permanent faults, like the one presented in [Bosio et al., 2019]. Besides that, there are other methods for performance and reliability analysis of deep neural networks that help to improve fault tolerance. A specific fault injection method for neural networks deployed on FPGAs, together with algorithm-based fault tolerance and selective triplication of the most critical layers, is presented in [Libano, 2018]. An efficient bit-flip resilience optimization method for deep neural networks is presented in [Schorn et al., 2019].

3   Fault Injection Experiments

3.1   CNNs and Datasets

We performed experiments on three different neural networks. The architectures and layer output dimensions of the networks are listed in Table 1.

The first is a self-developed simple CNN, which consists of 12 layers and follows common design principles. The ReLU activation function is used throughout the network, except for the last layer, which uses the Softmax activation function.

This CNN has been trained on an augmented German Traffic Sign Recognition Benchmark (GTSRB) [Stallkamp et al., 2012] dataset and can classify road signs with an accuracy of approximately 96 %. The dataset is split into three subsets for training, testing, and validation. The subsets contain 34 799, 12 630, and 4 410 images. Each image has 32 × 32 RGB pixels and belongs to one of 43 classes of road signs. In order to ensure a uniform classification performance across all classes, the training dataset has been normalized, augmented, and balanced. The augmentation is done by adding copies of images with zoom, rotation, shear, brightness disturbance, and Gaussian noise to the dataset. The augmented training subset contains 129 100 images.
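The random bit-flip fault type (i) can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions (reinterpreting one float32 activation as a 32-bit word and toggling one bit), not InjectTF2's actual implementation:

```python
import numpy as np

def flip_random_bit(layer_output: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a copy of `layer_output` with one random bit of one random
    float32 value flipped. Sketch of fault type (i); InjectTF2's actual
    implementation may differ."""
    out = layer_output.astype(np.float32)       # work on a float32 copy
    flat = out.reshape(-1)                      # flat view into `out`
    idx = int(rng.integers(flat.size))          # pick a random value
    bit = int(rng.integers(32))                 # pick a random bit of its word
    word = flat[idx:idx + 1].view(np.uint32)    # reinterpret the raw bytes
    word ^= np.uint32(1) << np.uint32(bit)      # XOR toggles the chosen bit
    return out

rng = np.random.default_rng(42)
activations = np.ones((4, 4), dtype=np.float32)  # stand-in for a layer output
faulty = flip_random_bit(activations, rng)
assert int((faulty != activations).sum()) == 1   # exactly one value corrupted
```

Depending on which bit is hit (sign, exponent, or mantissa), the corrupted value can deviate from the original by many orders of magnitude, which is why a single flip can change the classification result.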
Figure 2: Results of the fault injection experiments on VGG16 for the ImageNet dataset (1000 classes). (a) Fault-free run. (b) Fault injection in Layer 3; the sink classes are highlighted in red. (c) Fault injection in Layer 7; the sink classes are highlighted in red.

Figure 3: Results of the fault injection experiments on VGG19 for the ImageNet dataset (1000 classes). (a) Fault-free run. (b) Fault injection in Layer 3; the sink classes are highlighted in red. (c) Fault injection in Layer 10; the sink classes are highlighted in red.
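Sink-class peaks like those highlighted in these plots can be located automatically from the logged predictions of the FI runs. The following sketch flags classes that occur far more often than the uniform expectation; the threshold factor and the data are hypothetical, not taken from the experiments:

```python
from collections import Counter

def sink_classes(predictions, n_classes, factor=5.0):
    """Flag candidate sink classes: classes predicted far more often than
    the uniform expectation over all fault-injection runs.
    `factor` is an illustrative threshold, not a value from the paper."""
    counts = Counter(predictions)
    expected = len(predictions) / n_classes   # uniform share per class
    return sorted(c for c, n in counts.items() if n > factor * expected)

# 100 simulated runs over 10 classes: class 7 dominates like a sink class
preds = [7] * 60 + list(range(10)) * 4
print(sink_classes(preds, n_classes=10))  # → [7]
```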
Table 1: The layer structure of the CNNs with output dimensions.

  #  | Custom CNN         | VGG16                  | VGG19
 ----|--------------------|------------------------|-----------------------
  1  | Conv (32×32×32)    | Conv (224×224×64)      | Conv (224×224×64)
  2  | Conv (32×32×32)    | Conv (224×224×64)      | Conv (224×224×64)
  3  | MaxPool (16×16×32) | MaxPool (112×112×64) * | MaxPool (112×112×64) *
  4  | Dropout (16×16×32) | Conv (112×112×128)     | Conv (112×112×128)
  5  | Conv (16×16×64)    | Conv (112×112×128)     | Conv (112×112×128)
  6  | Conv (16×16×64)    | MaxPool (56×56×128)    | MaxPool (56×56×128)
  7  | MaxPool (8×8×64)   | Conv (56×56×256) *     | Conv (56×56×256)
  8  | Dropout (8×8×64)   | Conv (56×56×256)       | Conv (56×56×256)
  9  | Flatten (4096)     | Conv (56×56×256)       | Conv (56×56×256)
  10 | Dense (256)        | MaxPool (28×28×256)    | Conv (56×56×256) *
  11 | Dropout (256)      | Conv (28×28×512)       | MaxPool (28×28×256)
  12 | Dense (43)         | Conv (28×28×512)       | Conv (28×28×512)
  13 |                    | Conv (28×28×512)       | Conv (28×28×512)
  14 |                    | MaxPool (14×14×512)    | Conv (28×28×512)
  15 |                    | Conv (14×14×512)       | Conv (28×28×512)
  16 |                    | Conv (14×14×512)       | MaxPool (14×14×512)
  17 |                    | Conv (14×14×512)       | Conv (14×14×512)
  18 |                    | MaxPool (7×7×512)      | Conv (14×14×512)
  19 |                    | Flatten (25088)        | Conv (14×14×512)
  20 |                    | Dense (4096)           | Conv (14×14×512)
  21 |                    | Dense (4096)           | MaxPool (7×7×512)
  22 |                    | Dense (1000)           | Flatten (25088)
  23 |                    |                        | Dense (4096)
  24 |                    |                        | Dense (4096)
  25 |                    |                        | Dense (1000)

The second and third networks are the pre-trained TensorFlow VGG16 and VGG19 [Simonyan and Zisserman, 2014]. They are trained on the ImageNet dataset [Russakovsky et al., 2015]. In the experiments, a random sample of 5000 images from the 2012 ImageNet testing subset has been used. The images belong to 1000 different classes and consist of 224 × 224 RGB pixels.

3.2   Results

Six exemplary bar plots in Figures 2 and 3 describe the classification results for VGG16 and VGG19. The plots display how many images from the input datasets are classified into each of the 1000 classes.

The first (top) plot in each figure shows the distribution without faults. The images are distributed more or less uniformly over the classes. The other two plots in each figure show the distributions after faults are injected into specific layers, namely into layers three and seven of VGG16 and layers three and ten of VGG19. These layers are marked with an asterisk in Table 1. Similar plots for other layers, together with the raw data, are available at https://github.com/mbsa-tud/InjectTF2.

For each layer, we have carried out 100 fault injection experiments. In each experiment, we flip a random bit of a random output value of the corresponding layer. The bar plots represent the average over these 100 experiments.

In the plots, we can observe several distinctive peaks. These peaks reveal that the VGGs tend to erroneously classify images into these sink classes after the fault injections. The peaks are located differently for the presented layers. Note that the peaks also differ between the third layers of VGG16 and VGG19.

However, for several layers, especially those from the same VGG blocks, the peaks are quite similar. We also observed that such peaks appear only in the first layers, and the misclassification becomes more random when faults are injected into deeper layers. Note that the peaks for the seventh and tenth layers are lower than the peaks of the third layer. The peaks are distinctive for the first 11 layers of VGG16 and the first 12 layers of VGG19. After that, the distribution becomes more or less uniform.

4   Trustworthiness Analysis

The experimental results discussed above enable numerical evaluation of the performance of the CNNs under the influence of random hardware faults. For instance, we can statistically evaluate the probability of misclassification for each resulting image class. This probability is higher for the sink classes than for other classes. For this purpose, we use a Bayesian Network (BN) model fed with the statistical results of the fault injection experiments.

4.1   Formal Model of the Classifier

The BN is defined using a formal set-based model of a classifier, shown in Figure 4. This model is based on three sets, two functions, and three random variables.
Figure 4: Formal set-based model of a classifier. (The CNN classifier maps the input image, a random variable over the set of images, and a bit flip in one layer or no bit flip, a random variable over the set of layers, to the resulting class, a random variable over the set of classes.)

Sets:

I = {i_1, i_2, ..., i_{N_I}} - input images.
C = {c_1, c_2, ..., c_{N_C}} - result classes.
L = {l_1, l_2, ..., l_{N_L}} - layers of the CNN.

Functions:

g : I × C → {1, 0} - formalization of the results of the fault-free run: g(i, c) = 1 if image i is classified as class c, and g(i, c) = 0 otherwise. For simplicity, we assume that the fault-free classification is always correct.

f_k^l : I × C → {1, 0} - formalization of the results of the FI experiments: f_k^l(i, c) = 1 if image i is classified as class c in the k-th FI experiment with faults injected in layer l ∈ L, and f_k^l(i, c) = 0 otherwise.

Random variables:

I ∈ I - current input image. An independent discrete random variable. For simplicity, we assume that the probability that an image from I is the input image is equal for all images: P(I = i) = 1/N_I, ∀i ∈ I. Otherwise, the distribution should be specified statistically.

B ∈ {'none'} ∪ L - no bit flip, or a bit flip in a particular layer. An independent discrete random variable. The value 'none' means that there was no bit flip during the run. A value l ∈ L means that there was a bit flip in layer l. We assume that only a single bit flip can happen during a run. The distribution is defined by the variables p_{l_k} that specify the probability of a bit flip in layer l_k. For simplicity, we apply the same probability p for each layer. The outcome 'none' is defined as the complement of all other events: P(B = 'none') = 1 − Σ_{k=1}^{N_L} p_{l_k}.

C ∈ C - resulting class. A discrete random variable that depends on I and B.

4.2   Bayesian Network

Figure 5: The Bayesian network and conditional probability tables. (Nodes: the input image, the bit flip, and, optionally, other faults, all pointing to the resulting class.)

A BN is a graphical formalism for representing joint probability distributions [Pearl, 1985]. It is a probabilistic directed acyclic graph that represents a set of variables and their conditional dependencies. Each node is defined with a Conditional Probability Table (CPT). Our BN describes the conditional probabilities of C. Figure 5 shows the BN and the CPTs of the three random variables. The CPTs of the independent variables I and B define constant probabilities for each outcome. The outcome of C depends on the outcomes of I and B, so its probabilities are defined for each combination of the outcomes of I and B.

The CPT of C is divided into two parts. The upper part describes the situation without bit flips. We assume perfect classification; thus, this part consists only of zeroes and ones, where the ones indicate the correct class for each image. Mathematically, we represent this using the function g: p_{i,'none',c} = g(i, c). The bottom part describes the situation when bit flips occur in the corresponding layers. Here, we statistically approximate the probabilities using the results of our fault injection experiments. Mathematically, we represent this using the function f: each probability is estimated as the number of times image i was classified as class c divided by the total number K of fault injection experiments for layer l: p_{i,l,c} = (1/K) Σ_{k=1}^{K} f_k^l(i, c).
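The two parts of the CPT of C can be assembled directly from arrays of g and f_k^l values. The following NumPy sketch uses toy dimensions and hypothetical data, not the paper's experimental results:

```python
import numpy as np

def build_cpt(g, f):
    """Estimate the CPT of C from the formal model above.

    g: (N_I, N_C) 0/1 array of fault-free results, g[i, c] = 1 iff image i
       is classified as class c.
    f: (N_L, K, N_I, N_C) 0/1 array of FI results, f[l, k, i, c] = 1 iff
       image i is classified as class c in experiment k with a fault in layer l.
    Returns a (1 + N_L, N_I, N_C) array: row 0 is the 'none' part (= g),
    and row 1 + l holds p[i, l, c] = (1/K) * sum_k f[l, k, i, c].
    """
    none_part = g.astype(float)    # perfect classification assumed
    fault_part = f.mean(axis=1)    # average over the K experiments per layer
    return np.concatenate([none_part[None], fault_part], axis=0)

# Toy check: 2 images, 2 classes, 1 layer, K = 4 experiments
g = np.array([[1, 0], [0, 1]])
f = np.array([[[[1, 0], [0, 1]],   # experiment 1: both images correct
               [[0, 1], [0, 1]],   # experiments 2-4: image 0 misclassified
               [[0, 1], [0, 1]],
               [[0, 1], [0, 1]]]])
cpt = build_cpt(g, f)
print(cpt[1, 0])  # → [0.25 0.75]: image 0 lands in class 1 in 3 of 4 runs
```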
4.3   Quantification

The BN stores the results of the fault injection experiments in a structured way. This allows the analysis of various reliability-related properties, from the general cumulative probability of misclassification to the specific probability of the wrong classification of a particular input image because of a bit flip in a particular layer. Moreover, other kinds of random faults and their combinations can be taken into account, as shown in Figure 5 with the dashed lines.

As an example, we show how to quantify the trustworthiness of each resulting class. We define the trustworthiness as a kind of inverse probability: the probability that the resulting class c is the correct class for the input image i, taking into account the possibility of a bit flip in any layer. Let I_c be the subset of images that belong to class c: I_c ⊂ I, with i ∈ I_c if g(i, c) = 1, i ∈ I, c ∈ C. Then, the trustworthiness of class c is the conditional probability P(I ∈ I_c ∩ B ∈ B | C = c). Applying first the formula for conditional probability (Kolmogorov definition) and then the law of total probability, we obtain the following expression:

P(I ∈ I_c ∩ B ∈ B | C = c)
  = P(C = c ∩ I ∈ I_c ∩ B ∈ B) / P(C = c)
  = [ Σ_{i ∈ I_c} Σ_{b ∈ B} P(C = c | B = b ∩ I = i) P(B = b) P(I = i) ]
    / [ Σ_{i′ ∈ I} Σ_{b′ ∈ B} P(C = c | B = b′ ∩ I = i′) P(B = b′) P(I = i′) ]

where P(I = i) is taken from the CPT of I, P(B = b) from the CPT of B, and P(C = c | B = b ∩ I = i) from the CPT of C. In the numerator of the fraction, we sum only over i from I_c, and in the denominator over all i from I. Basically, we compute the ratio of correct classifications to all classifications.

Figure 6: Misclassification probabilities for the varying probability of a bit flip for the image classes of the custom CNN.
In our experiments, we computed the probabilities with our self-developed scripts. However, probabilistic analysis software libraries, like pomegranate [Schreiber, 2018], provide efficient and scalable methods for the computation of Bayesian networks.
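The trustworthiness expression above can be evaluated directly for a toy single-layer model. All numbers below are hypothetical; the CPT of C would normally come from the fault injection statistics:

```python
import numpy as np

def trustworthiness(c, cpt, p_I, p_B, image_classes):
    """P(I in I_c, B in B | C = c) via the total-probability expansion above.

    cpt[b, i, c]     -- P(C = c | B = b, I = i), with b = 0 meaning 'none'
    p_I[i], p_B[b]   -- the CPTs of I and B
    image_classes[i] -- true class of image i (defines the subset I_c)
    """
    joint = cpt[:, :, c] * p_B[:, None] * p_I[None, :]  # joint over all (b, i)
    denom = joint.sum()                                  # P(C = c)
    numer = joint[:, image_classes == c].sum()           # restrict i to I_c
    return numer / denom

# Toy model: 2 images, 2 classes, one layer with bit-flip probability p = 0.1
p = 0.1
p_I = np.array([0.5, 0.5])
p_B = np.array([1 - p, p])
cpt = np.array([[[1, 0], [0, 1]],          # no bit flip: perfect classification
                [[0.25, 0.75], [0, 1]]])   # bit flip in the layer (FI stats)
classes = np.array([0, 1])
t = trustworthiness(1, cpt, p_I, p_B, classes)
print(round(t, 4))  # → 0.9302: class 1 absorbs misclassified images
```

Class 1 acts as a sink class in this toy CPT, so its trustworthiness drops below one even for a modest bit-flip probability.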
4.4   Results

Figures 6, 7, and 8 show the misclassification probabilities for each class. These probabilities are computed as one minus the trustworthiness. The probabilities of bit flips vary from 10^-7 to 1 for the custom CNN and from 10^-11 to 10^-4 for VGG16 and VGG19. The ten classes with the highest misclassification probabilities are highlighted with colors (sorted using the results obtained for a bit-flip probability of 10^-9). The remaining classes are shown in grey.

Based on the estimated bit-flip probabilities, we can decide whether we trust the classification result or not. Moreover, from the safety point of view, the misclassification of some classes might be more hazardous than of others. For instance, it might be more critical to confuse the stop sign with the main road sign than to confuse the speed limits 30 and 50. Such cases can be easily quantified using the proposed Bayesian model. They could also lead to re-training with regard to the classes with the lowest trustworthiness.

Figure 7: Misclassification probabilities for the varying probability of a bit flip for the image classes of the VGG16.
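Curves like those in Figures 6, 7, and 8 can be produced by sweeping the bit-flip probability p in the trustworthiness formula. The following sketch uses a single-layer toy model with hypothetical CPT values, not the paper's experimental data:

```python
import numpy as np

def misclassification_curve(c, cpt_none, cpt_fault, image_classes, ps):
    """1 - trustworthiness of class c for a varying bit-flip probability p.

    Single-layer toy model: P(B = 'none') = 1 - p, P(B = layer) = p.
    cpt_none[i, c] / cpt_fault[i, c] are P(C = c | B, I = i) for the two cases.
    """
    n_images = len(image_classes)
    out = []
    for p in ps:
        # total probability over B, with uniform P(I = i) = 1 / n_images
        joint_c = ((1 - p) * cpt_none[:, c] + p * cpt_fault[:, c]) / n_images
        numer = joint_c[image_classes == c].sum()
        out.append(1 - numer / joint_c.sum())
    return out

cpt_none = np.array([[1, 0], [0, 1]], dtype=float)  # perfect classification
cpt_fault = np.array([[0.25, 0.75], [0, 1]])        # hypothetical FI statistics
classes = np.array([0, 1])
ps = [1e-7, 1e-3, 1e-1]
curve = misclassification_curve(1, cpt_none, cpt_fault, classes, ps)
assert curve[0] < curve[1] < curve[2]  # misclassification grows with p
```

Repeating the sweep for every class and sorting by the values at a fixed p reproduces the ranking used to highlight the ten worst classes in the figures.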
Figure 8: Misclassification probabilities for the varying probability of a bit flip for the image classes of the VGG19.

5   Conclusion

A series of fault injection experiments on several CNN-based classifiers has shown that random hardware faults result not in random misclassification but tend to misclassify the input images into specific, distinctive sets of classes. These sets are different for functionally equivalent CNNs. Also, these sets depend on the layer where a fault is injected. This information has to be taken into account during the reliability and safety analysis of such classifiers if they shall be integrated into a safety-critical system. In this paper, we proposed the application of a Bayesian network model fed with the results of such fault injection experiments. This model allows a broad range of numerical reliability- and safety-related analyses of the classifier under test. As an application example, we have demonstrated how the proposed Bayesian model helps to estimate the level of trustworthiness for each resulting image class.

References

[Beyer et al., 2019] M. Beyer, A. Morozov, K. Ding, S. Ding, and K. Janschek. Quantification of the impact of random hardware faults on safety-critical AI applications: CNN-based traffic sign recognition case study. In 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 118–119, Oct 2019.

[Beyer et al., 2020] Michael Beyer, Andrey Morozov, Emil Valiev, Christoph Schorn, Lydia Gauerhof, Kai Ding, and Klaus Janschek. Two fault injectors for TensorFlow: Evaluation of the impact of random hardware faults on VGGs. Submitted to EDCC 2020, under evaluation; this entry will be updated in the final version of the paper, 2020.

[Bosio et al., 2019] A. Bosio, P. Bernardi, A. Ruospo, and E. Sanchez. A reliability analysis of a deep neural network. In 2019 IEEE Latin American Test Symposium (LATS), pages 1–6, 2019.

[Huang et al., 2018] Xiaowei Huang, Daniel Kroening, Wenjie Ruan, James Sharp, Youcheng Sun, Emese Thamo, Min Wu, and Xinping Yi. A survey of safety and trustworthiness of deep neural networks. arXiv preprint arXiv:1812.08342, 2018.

[Li et al., 2018] Guanpeng Li, Karthik Pattabiraman, and Nathan DeBardeleben. TensorFI: A configurable fault injector for TensorFlow applications. In 2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 313–320. IEEE, 2018.

[Libano, 2018] Fabiano Pereira Libano. Reliability analysis of neural networks in FPGAs, 2018.

[Liu et al., 2017] Y. Liu, L. Wei, B. Luo, and Q. Xu. Fault injection attack on deep neural network. In 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pages 131–138, 2017.

[Pearl, 1985] Judea Pearl. Bayesian networks: A model of self-activated memory for evidential reasoning. In Proceedings of the 7th Conference of the Cognitive Science Society, University of California, Irvine, CA, USA, pages 15–17, 1985.

[Rakin et al., 2019] Adnan Siraj Rakin, Zhezhi He, and Deliang Fan. Bit-flip attack: Crushing neural network with progressive bit search. In Proceedings of the IEEE International Conference on Computer Vision, pages 1211–1220, 2019.

[Russakovsky et al., 2015] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.

[Schorn et al., 2019] Christoph Schorn, Andre Guntoro, and Gerd Ascheid. An efficient bit-flip resilience optimization method for deep neural networks. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE), pages 1507–1512. IEEE, 2019.

[Schreiber, 2018] Jacob Schreiber. pomegranate: Fast and flexible probabilistic modeling in Python. Journal of Machine Learning Research, 18(164):1–6, 2018.

[Simonyan and Zisserman, 2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.

[Stallkamp et al., 2012] Johannes Stallkamp, Marc Schlipsing, Jan Salmen, and Christian Igel. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323–332, 2012.