     Towards a Safety Case for Hardware Fault Tolerance in Convolutional Neural
                    Networks Using Activation Range Supervision

    Florian Geissler1∗ , Syed Qutub1 , Sayanta Roychowdhury1 , Ali Asgari2 , Yang Peng1 , Akash
             Dhamasia1 , Ralf Graefe1 , Karthik Pattabiraman2 and Michael Paulitsch1
                                             1 Intel, Germany
                                2 University of British Columbia, Canada



                           Abstract

      Convolutional neural networks (CNNs) have become an established part of
      numerous safety-critical computer vision applications, including human
      robot interactions and automated driving. Real-world implementations
      will need to guarantee their robustness against hardware soft errors
      corrupting the underlying platform memory. Based on the previously
      observed efficacy of activation clipping techniques, we build a
      prototypical safety case for classifier CNNs by demonstrating that range
      supervision represents a highly reliable fault detector and mitigator
      with respect to relevant bit flips, adopting an eight-exponent floating
      point data representation. We further explore novel, non-uniform range
      restriction methods that effectively suppress the probability of silent
      data corruptions and uncorrectable errors. As a safety-relevant
      end-to-end use case, we showcase the benefit of our approach in a
      vehicle classification scenario, using ResNet-50 and the traffic camera
      data set MIOVision. The quantitative evidence provided in this work can
      be leveraged to inspire further and possibly more complex CNN safety
      arguments.

[Figure 1 diagram:
  G1: System is sufficiently safe in the presence of soft errors.
  Context:
    C1: Operational design domain: inference of pretrained classifier
        networks with protection layers, input represented by given dataset.
    C2: An appropriate independent dataset for bound extraction exists.
    C3: "Sufficiently safe" is well defined by the end user and is
        proportional to the overall risk.
    C4: The chance of a soft error event to occur can be given.
    C5: The simulated weight/neuron fault model appropriately represents
        realistic soft errors.
    C6: The data representation has eight exponent bits (FP32, BF16).
    C7: A fallback system/re-execution can be used for uncorrectable errors.
  Sub-goals and evidence:
    G2: System detects critical soft errors.
      E2a: SDC/DUE events appear in conjunction with oob events with a high
           conditional probability.
      E2b: Oob events are detected by threshold-based protection layers.
      E2c: DUE events can further be detected by NaN/Inf monitoring.
    G3: System mitigates soft errors.
      E3a: The probability of SDC/DUE events is significantly reduced by
           restricting oob activations in protection layers.
      E3b: DUE events can further be mitigated by referring to a fallback
           system or via re-execution.
    G4: System does not increase the error severity.
      E4a: DUE events can be handled with negligible risk for any error
           severity.
      E4b: The severity of residual SDC depends on the application. As an
           example, we study the scenario of MIOVision and ResNet50 and find
           that the average severity of errors is comparable or lower.]

Figure 1: Structured safety argument for the fault tolerance of a CNN in the
presence of soft errors, using range restrictions. The notation follows [10]
including goals (G), context (C), and evidence (E). "Oob" denotes
"out-of-bounds".

1    Motivation
With the widespread use of convolutional neural networks (CNN) across many
safety-critical domains such as automated robots and cars, one of the most
prevailing challenges is the establishment of a safety certification for such
artificial intelligence (AI) components, e.g., with respect to ISO 26262 [1]
or ISO/PAS 21448 (SOTIF) [2]. This certification requires not only a high
fault tolerance of the trained network against unknown or adversarial input,
but also efficient protection against hardware faults of the underlying
platform [3, 4]. Importantly, this includes transient soft errors, meaning
disturbances originating from events such as cosmic neutron radiation,
isotopes emitting alpha particles, or electromagnetic leakage on the computer
circuitry itself.

   Soft errors typically manifest as single or multiple bit upsets in the
platform's memory elements [5]. As a consequence, network parameters (weight
faults) or local computational states (neuron faults) can be altered during
inference time, and invalidate the network prediction in a safety-critical
way, for example, by misclassifying a person as a background image in an
automated driving context [6–8]. This has led to a search for strategies to
verify CNN-based systems against hardware faults at the inference stage [9].
With chip technology nodes scaling to smaller sizes and larger memory density
per area, future platforms are expected to be even more susceptible to soft
errors [5].

   In this paper, we evaluate range restriction techniques in CNNs exposed to
platform soft errors with respect to the key elements of a prototypical
safety case. This means that we formulate arguments (in the form of "goals")
that constitute essential parts of a complete safety case, and provide
quantitative evidence to support these goals in the studied context (see
Fig. 1). Individual safety arguments can be reused as building blocks of more
complex safety cases. The structure of our goals is based on the
probabilistic, high-level safety

   ∗ corresponding author, Email: florian.geissler@intel.com.
Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
objective of minimizing the overall risk [10], expressed as:

   P_loss(i) = P_failure(i) · [ (1 − P_detection(i)) + (1 − P_mitigation(i)) ],
   Risk = Σ_i P_loss(i) · Severity(i).                                    (1)

Explicitly, for a fault type i, this includes the sub-goals of efficient
error detection and mitigation, as well as a consideration of the fault
severity in a given use case. On the other hand, the probability of
occurrence of a soft error (i.e., P_failure in Eq. 1) is assumed to be a
constant system property that cannot be controlled by run-time monitoring
methods such as activation range supervision.

   In a nutshell, range restriction builds on the observation that silent
data corruption (SDC) and detected uncorrectable errors (DUE, e.g., NaN and
Inf occurrences) stem primarily from those bit flips that cause very large
values, for example in high exponential bits [6]. Those events result in
large activation peaks that typically grow even more during forward
propagation due to the monotonicity of most neural network operations [11].
To suppress the propagation of such corrupted values, additional range
restriction layers are inserted in the network at strategic positions
following the approach of Chen et al. [8] (see Fig. 2 for an example). At
inference time, the protection layers then compare the intermediate
activations against previously extracted interval thresholds in order to
detect and reset anomalously large values. Derivative approaches have been
shown to be efficient in recovering network performance [6–8, 12] and,
advantageously, do not require the retraining of CNN parameters nor
computationally expensive functional duplications.

   The focus of this paper is to examine alternative restriction schemes for
optimized soft error mitigation. In a CNN, the output of every kernel is
represented as a two-dimensional (2D) feature map, where the activation
magnitudes encode specific features on which the network bases its
prediction. Soft errors will manifest as distortions of feature maps in all
subsequent layers that make use of the corrupted value, as shown in
Fig. 2(a)-(b). The problem of mitigating soft errors in a CNN can therefore
be rephrased as restoring the fault-free topology of feature maps.

   Previous analyses have adopted uniform range restriction schemes that
truncate out-of-bound values to a finite threshold [7, 8], e.g.,
Fig. 2(c)-(d). We instead follow the intuition that optimized, non-uniform
range restriction methods that attempt to reconstruct feature maps (see
Fig. 2(e)-(g), and details in Sec. 5) can not only reduce SDC to a comparable
or even lower level, but may also lead to less critical misclassifications in
the case of an SDC. This is because classes with more similar attributes will
display more similar high-level features (e.g., pedestrian and biker will
both exhibit an upright silhouette, in contrast to the car and truck
classes).

   Finally, a safety analysis has to consider that not all SDC events pose an
equal risk to the user. We study a safety-critical use case evaluating
cluster-wise class confusions in a vehicle classification scenario (Sec. 6).
The example shows that range supervision reduces the severe confusions
proportionally with the overall number of confusions, meaning that the total
risk is indeed mitigated.

   In summary, this paper makes the following contributions:

   • Fault detection: We quantify the correlation between SDC events and the
     occurrence of out-of-bound activations to demonstrate the high
     efficiency of fault detection by monitoring intermediate activations,

   • Fault mitigation: We explore three novel range restriction methods that
     build on the preservation of the feature map topologies instead of mere
     value truncation,

   • Fault severity: We demonstrate the benefit of range supervision in an
     end-to-end use case of vehicle classification where high and low
     severities are estimated by the generic safety-criticality of class
     confusions.

The article is structured as follows: Section 2 reviews relevant previous
work while Section 3 describes the setup used in this paper. Subsequently,
Sections 4, 5, and 6 discuss error detection, mitigation, and an exemplary
risk analysis, respectively, before Section 7 concludes the paper.

[Figure 2 image: panels (a) no protection (no fault), (b) no protection
(weight fault), (c) Ranger, (d) Clipping, (e) Rescaling, (f) Backflip,
(g) Fmap average; rows show the layer sequence Conv1, ReLU, Ranger, MaxPool,
Ranger, Conv2, ReLU, Ranger, MaxPool, Ranger, Reshape, Ranger, FC1, ReLU,
Ranger, FC2, ReLU, Ranger, FC3, classes; color scale encodes activation
magnitudes.]

Figure 2: Visualization example of the impact of a weight fault using LeNet-5
and the MNIST data set. Range restriction layers ("Ranger") are inserted
following [8] (top left). The rows represent the feature maps of the
individual network layers after range restriction was applied, where linear
layers (FC1-FC3) were reshaped to a 2D feature map as well for visualization
purposes. In (b)-(g), a large weight fault value is injected in the second
filter of the first convolutional layer. For the unprotected model (b), this
leads to an SDC event ("0" gets changed to "7"). The columns (c)-(g) then
illustrate the effect of the different investigated range restriction
methods.
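To make the detect-and-reset mechanism concrete, the following is a minimal Python/NumPy sketch of a clipping-based protection layer in the spirit of Ranger [8]. The class name, the default bounds, and the example activation values are illustrative assumptions, not the implementation evaluated in this work.

```python
import numpy as np

class ProtectionLayer:
    """Minimal clipping-based range restriction ("Ranger"-style, after [8]).

    The bounds [t_low, t_up] would be extracted beforehand by monitoring
    fault-free activations; the defaults here are illustrative only.
    """

    def __init__(self, t_low: float = 0.0, t_up: float = 45.0):
        self.t_low = t_low
        self.t_up = t_up
        self.oob_detected = False  # detection flag, updated per inference

    def __call__(self, activations):
        acts = np.asarray(activations, dtype=np.float32)
        # Detection: register an out-of-bound (oob) event whenever any
        # activation leaves the fault-free interval [t_low, t_up].
        self.oob_detected = bool(((acts < self.t_low) | (acts > self.t_up)).any())
        # Mitigation: truncate out-of-bound values to the thresholds.
        return np.clip(acts, self.t_low, self.t_up)

# A soft error in a high exponent bit inflates a single activation;
# the protection layer flags the event and suppresses the peak.
ranger = ProtectionLayer(t_low=0.0, t_up=45.0)
restored = ranger(np.array([0.3, 1.7e12, 2.1], dtype=np.float32))
```

The non-uniform schemes examined later (Sec. 5) would replace the `np.clip` step by a feature-map-aware reconstruction, while keeping the same detection logic.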
2    Related work
Parity or error-correcting code (ECC) can protect memory elements against
single soft errors [5, 13]. However, due to the high compute and area
overhead, this is typically done only for selected critical memory blocks.
Component replication techniques such as triple modular redundancy can be
used for the full CNN execution at the cost of a large overhead. Selective
hardening of hardware elements with the most salient parameters can improve
the robustness of program execution in the presence of underlying faults
[6, 14]. On a software level, the estimation of the CNN's vulnerable feature
maps (fmaps) and the selective protection by duplicated computations [15], or
the assertive re-execution with stored, healthy reference values [16] has
been investigated. Approaches using algorithm-based fault tolerance (ABFT)
[17] seek to protect networks against soft errors by checking invariants that
are characteristic for a specific operation (e.g., matrix multiplication).
Symptom-based error detection may for example include the interpretation of
feature map traces by a secondary companion network [18]. The restriction of
intermediate ranges was explored [6, 12] in the form of modified
(layer-insensitive) activation functions such as tanh or ReLU6. This concept
was extended to find specific uniform protection thresholds for neuron faults
[8] or clipping bounds for weight faults [7]. An alternative line of research
is centered around fault-aware retraining [19].

[Figure 3 diagram: filtering stages "No faults" → "Inject faults" → "Faults",
with the faulty samples splitting into DUE and SDC outcomes.]

Figure 3: Illustration of SDC and DUE events. Errors are detected or missed
in the case of out-of-bound (oob) or in-bound (ib) events, respectively.
(Green) Samples of the data set that form the subset of a given filtering
stage, (Yellow) samples of the data set that are discarded at the given
stage, (White) samples that were filtered out at a previous stage.

3    Experimental setup
3.1    Models, data sets, and system
CNNs are the most commonly used network variant for computer vision tasks
such as object classification and detection. We compare the three standard
classifier CNNs ResNet-50 [20], VGG-16 [21], and AlexNet [22] together with
the test data set ImageNet [23] and MIOVision [24] for the investigation of a
safety-critical example use case. Since fault injection is compute-intensive,
we rescale our test data set for ImageNet to a subset of 1000 images
representing 20 randomly selected classes. For MIOVision, a subset of 1100
images (100 per class) that were correctly classified in the absence of
faults was chosen. All experiments adopt a single-precision floating point
format (FP32) according to the IEEE 754 standard [25]. Our conclusions apply
as well to other floating point formats with the same number of exponent
bits, such as BF16 [26], since no relevant effect was observed from fault
injections in mantissa bits (Sec. 4).

   Experiments were performed in PyTorch (version 1.8.0) deploying
torchvision models (version 0.9.0). For MIOVision, the ResNet-50 model was
retrained [27]. We used Intel® Core™ i9 CPUs, with inferences running on
GeForce RTX 2080, Titan RTX, and RTX 3090 GPUs.

3.2    Protection layers and bound extraction
We insert protection layers at strategic positions in the network, such as
after activation, pooling, reshape, or concatenate layers, according to the
model of Chen et al. [8]. Each protection layer requires specific bound
values for the expected activation ranges as a parameter. We extract those by
monitoring the minimal and maximal activations from a separate test input,
which is taken from the training data sets of ImageNet (143K images used) and
MIOVision (83K images used), respectively. This step has to be performed only
once. Bound extraction depends on the data set and will in general impact the
safety argument (see Fig. 1). To check the suitability of the bounds, we
verify that no out-of-bound events were detected during the test phase in the
absence of faults, so the baseline accuracy is the same with and without
protection. While all minimum bounds are zero in the studied setup, the
maximum activation values for ImageNet vary by layer in a range of (see also
Sec. 5) 1 < Tup < 45 for ResNet-50, 20 < Tup < 360 for VGG-16, and
65 < Tup < 170 for AlexNet. For MIOVision and ResNet-50, we find maximum
bounds between 1 < Tup < 19.

3.3    Fault model and injection
In line with previous investigations, we distinguish two different
manifestations of memory bit flips, referred to here as weight faults and
neuron faults. The former represent soft errors affecting memory elements
that store the learned network parameters, while the latter refer to errors
in memory that holds temporary states such as intermediate network layer
outputs. While neuron faults may also impact states used for logical
instructions, it was demonstrated that bit flip injections in the output of
the affected layer are generally a good model approximation [28]. Memory
elements can be protected against single bit flips by mechanisms such as
parity and ECC [5, 13]. However, this kind of protection is not always
available due to the associated compute and area overhead. Further, ECC
typically cannot correct multi-bit flips.

   We inject faults either directly in the weights of CNN layers (weight
faults) or in the output of the latter (neuron faults), using a customized
fault injection framework based on PytorchFI [29]. To speed up the
experiments we focus on bit flips in the most relevant bit positions 0 − 8
(sign bit and exponential bits, neglecting mantissa) unless stated otherwise.
Fault locations (i.e., layer index, kernel index, channel, etc.) in the
network are chosen uniformly at random, without further constraints on the
selection process, to reflect
ically stored in the main memory and loaded only once for
a given application, we keep the same weight faults for one
entire epoch, running all tested input images. In total, we run
500 epochs, i.e., fault configurations, each one applied to 1K
images. Neuron faults, on the other hand, apply to memory
representing temporary states that are overwritten for each
new input. Therefore, we inject new neuron faults for each
new input and run 100 epochs resulting in 100K fault config-
urations, each one applied to a single image.
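As a minimal illustration of this fault model (a sketch, not the PytorchFI-based framework itself; the helper name and the example values are our own), a single memory bit flip in an FP32 value can be emulated with Python's struct module, using the bit convention above (position 0 = sign, positions 1–8 = exponent, positions 9–31 = mantissa):

```python
import struct

def flip_bit_fp32(value: float, bit: int) -> float:
    """Flip one bit of an FP32 number, emulating a single memory soft error.

    Bit positions follow the convention used above: 0 is the sign bit,
    1-8 are the exponent bits (1 = MSB), and 9-31 are the mantissa bits.
    """
    # Reinterpret the float as a 32-bit unsigned integer.
    (as_int,) = struct.unpack("I", struct.pack("f", value))
    # Toggle the requested bit, counting from the most significant side.
    as_int ^= 1 << (31 - bit)
    # Reinterpret the corrupted bit pattern as a float again.
    (flipped,) = struct.unpack("f", struct.pack("I", as_int))
    return flipped

# A flip of the exponent MSB (position 1) turns a small weight into a huge
# out-of-bound value, while a mantissa flip is barely noticeable:
corrupted = flip_bit_fp32(0.05, 1)   # jumps to roughly 1.7e37
benign = flip_bit_fp32(0.05, 31)     # still approximately 0.05
```

This makes concrete why the experiments concentrate on positions 0–8: only flips in the sign and high exponent bits can produce the large activation peaks that range supervision targets.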

3.4   Evaluation
To quantify the impact of faults on the system safety, we
measure the rate of SDC events. Throughout, we consider
the Top-1 prediction to determine SDC. In line with previous          Figure 4: Bit-distribution across all weight parameters in conv2d
                                                                      layers. Values are represented in FP32, where only the sign bit (0)
work [6, 8], SDC is defined as the ratio of images that are           and the exponent bits (1 − 8) are shown.
misclassified in the presence of faults (without exceptions)
but correctly classified in the absence of faults and the overall
number of images, p(sdc) = Nincorrect /Ntest , correct (Fig. 3).      4   Error detection coverage
   During the forward pass, non-numerical exceptions in the           To effectively protect the network against faults, we first ver-
form of Inf and NaN values can be encountered, due to                 ify the error detection coverage for silent errors. Those er-
the following reasons: i) Inf values occur if large activa-           rors are detected by a given protection layer if the activation
tion values accumulate (for example during conv2d, linear,            values exceed (fall short of) the upper (lower) bound. If at
avgpool2d operations) until they exceed the maximum of the            least one protection layer is triggered per inference run, we
data representation. This effect becomes particularly appar-          register an out-of-bound (oob) event. Otherwise, we have an
ent when flips of the most significant bit (MSB, position index 1) are injected. ii) NaN values are found when denominators are undetermined or multiple Inf values get added, e.g., in BatchNorm2d layers. iii) NaN values can be generated directly via bit flips in conv2d layers, since FP32 encodes NaN as all eight exponent bits being in state "1". In the studied classifier networks, the latter effect is very rare for single bit flips in weights (see Sec. 4), but not necessarily for single neuron bit flips or multiple flips of either type.
   The creation of the above exceptions differs slightly between CPU and GPU executions, as well as between experiments with different batch sizes on the accelerator. We attribute this observation to algorithmic optimizations on the GPU that are not necessarily IEEE754-compliant and thus affect the floating point precision [30]. To mitigate the effect of exception handling, we monitor the occurrences of Inf and NaN in the output of every network layer. All forward passes with an exception are separated and define the detected uncorrectable error (DUE) rate, p(due) = Nexceptions / Ntest,correct, see Fig. 3.
   In a real system, DUE events can be readily monitored and the execution is typically halted on detection. However, due to the non-numerical nature of these errors, we cannot apply the same mitigation strategy that is adopted for SDC events. We therefore assume that either a fallback system (e.g., an alternative classifier or an emergency stop of the vehicle) can be leveraged, or that a timely re-execution is possible to recover from transient DUE events. This in turn assumes that DUEs do not impact the system safety but may compromise the system availability when occurring frequently.

in-bound (ib) event. In addition, we quantify the probabilities of SDC and regular correct classification (cl) events, as well as the respective conditional probabilities that correct and incorrect classifications occur given that oob or ib events were detected. This allows us to define true positive (Tp), false positive (Fp), and false negative (Fn) SDC detection rates as

      Tp = p(sdc|oob) · p(oob),
      Fp = p(cl|oob) · p(oob),          (2)
      Fn = p(sdc|ib) · p(ib).

The fault detector is then characterized by its precision, P = Tp/(Tp + Fp), and recall, R = Tp/(Tp + Fn).
   Tab. 1 displays the chances of oob and sdc events resulting from a single fault per image in the absence of range protection. For weight faults, we find that all three CNNs showcase a high correlation between oob situations and either SDC or DUE events (p(sdc|oob) + p(due|oob) > 0.99), which can be associated with the chance of a successful error detection, Pdetection (see Eq. 1). The chance of finding SDC after ib events is very small (< 1e−3), leading to a very high precision and recall performance (> 0.99). For neuron faults, while the recall remains very high, the precision is reduced (in particular for VGG-16 and AlexNet) due to additional Fp events where non-MSB oob events still get classified correctly.
   We further verify that SDC events from single weight faults are attributed almost exclusively to flips of the MSB. This can be explained with the distribution of parameters in the studied networks (Fig. 4). The weight values are closely centered around zero, and thus exhibit characteristic properties when
represented in an eight-exponent data format. In the fault-free case, the MSB always has state "0", while the exponent bits 2 to 4 are almost always in state "1". This means that, among the relevant exponent bits, all single bit flips of the MSB will produce large values, while those of the other exponent bits will either be flips from "1" → "0" or will be too small to have a significant effect.
   For neuron faults, on the other hand, the distribution of fault-free values is input-dependent and broader, leading in general to a smaller quota of MSB flips among the SDC events, in favor of flips of other exponent bits and the sign bit. No SDC due to mantissa bit flips was observed for either weight or neuron faults. DUE events are unlikely (< 0.01) for a single bit flip, as there are no multiple large values to add up. Further, network weights are usually < 1, meaning that at least two exponent bits are in state "0", and hence at least two bit flips are needed to directly generate a NaN value.

                   Weight faults        Neuron faults
   ResNet-50:
   p(sdc)          0.018 ± 0.001        0.013 ± 6e−4
   p(oob)          0.019 ± 0.001        0.013 ± 6e−4
   p(sdc|oob)      0.981 ± 0.008        0.974 ± 0.008
   p(sdc|ib)       5e−5 ± 4e−5          0.0 ± 0.0
   p(MSB|sdc)      0.998 ± 0.002        0.961 ± 0.012
   P               0.997 ± 0.002        0.980 ± 0.006
   R               0.997 ± 0.002        1.0 ± 0.0
   p(due)          3e−4 ± 1e−4          5e−4 ± 1e−4
   p(due|oob)      0.016 ± 0.008        0.006 ± 0.005
   p(MSB|due)      1.0 ± 0.0            1.0 ± 0.0
   VGG-16:
   p(sdc)          0.024 ± 0.001        0.016 ± 9e−4
   p(oob)          0.027 ± 0.001        0.020 ± 0.001
   p(sdc|oob)      0.893 ± 0.010        0.778 ± 0.016
   p(sdc|ib)       7e−5 ± 7e−5          0.0 ± 0.0
   p(MSB|sdc)      0.997 ± 0.003        0.397 ± 0.017
   P               0.999 ± 0.001        0.820 ± 0.014
   R               0.997 ± 0.003        1.0 ± 0.0
   p(due)          0.003 ± 4e−4         0.006 ± 4e−4
   p(due|oob)      0.106 ± 0.011        0.051 ± 0.012
   p(MSB|due)      1.0 ± 0.0            1.0 ± 0.0
   AlexNet:
   p(sdc)          0.022 ± 0.001        0.013 ± 0.001
   p(oob)          0.024 ± 0.001        0.015 ± 0.001
   p(sdc|oob)      0.907 ± 0.012        0.877 ± 0.023
   p(sdc|ib)       2e−4 ± 1e−4          9e−5 ± 5e−5
   p(MSB|sdc)      0.995 ± 0.003        0.245 ± 0.031
   P               1.0 ± 0.0            0.913 ± 0.025
   R               0.989 ± 0.005        0.994 ± 0.004
   p(due)          0.003 ± 3e−4         0.005 ± 3e−4
   p(due|oob)      0.093 ± 0.012        0.040 ± 0.011
   p(MSB|due)      1.0 ± 0.0            1.0 ± 0.0

Table 1: Statistical absolute and conditional probabilities of SDC and DUE events and the related precision and recall of the fault detector. Experiments of 10K fault injections were repeated 10 times, where a single fault per image was injected in any of the 32 bits for each image (from ImageNet, using a batch size of one). We further list what proportion of SDC or DUE events were caused by MSB flips.

5     Range restriction methods for error mitigation

5.1    Model
We refer to a subset of the tensor given by a specific index in the batch and channel dimensions as a 2D feature map, denoted by f. Let x be an activation value from a given feature map tensor f ∈ {f1, f2, . . . , fCout}. Further, Tup and Tlow denote the upper and lower activation bounds assigned to the protection layer, respectively.
   Ranger: For a given set of (f, Tup, Tlow), Ranger [8] maps out-of-bound values to the expected interval (see Fig. 2c),

                     Tup    if x > Tup,
      rranger(x) =   Tlow   if x < Tlow,          (3)
                     x      otherwise.

   Clipper: In a similar way, clipping truncates activations that are out of bound to zero [7],

                       0   if x > Tup or x < Tlow,
      rclipping(x) =                                   (4)
                       x   otherwise.

The intuition is that it can be favorable to eliminate corrupted elements rather than to re-establish finite activations.
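The two rules of Eqs. (3) and (4) can be expressed as element-wise tensor operations. A minimal NumPy sketch (function names and the example bounds are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def ranger(x, t_low, t_up):
    """Eq. (3): saturate out-of-bound activations to the interval bounds."""
    return np.clip(x, t_low, t_up)

def clipper(x, t_low, t_up):
    """Eq. (4): truncate out-of-bound activations to zero."""
    return np.where((x > t_up) | (x < t_low), 0.0, x)

# Hypothetical activations with one large corrupted value and one negative value:
acts = np.array([0.3, -2.0, 5.0, 1.1])
ranger(acts, 0.0, 2.0)   # -> [0.3, 0.0, 2.0, 1.1]
clipper(acts, 0.0, 2.0)  # -> [0.3, 0.0, 0.0, 1.1]
```

For ReLU-based networks Tlow is typically zero, so Ranger reduces to a single clamp, which is why it maps to one graph operation in PyTorch.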
   FmapRescale: While uniform restriction methods help in eliminating large out-of-bound values, the information encoded in relative differences of activation magnitudes is lost when all out-of-bound values are flattened to the same value. The idea of rescaling is to linearly map all large out-of-bound values back onto the interval [Tlow, Tup], implying that smaller out-of-bound values are reduced more. This follows the intuition that the out-of-bound values can originate from the entire spectrum of in-bound values.

                      (x − min(f))(Tup − Tlow)/(max(f) − min(f)) + Tlow   if x > Tup,
      rrescale(x) =   Tlow                                                if x < Tlow,   (5)
                      x                                                   otherwise.

   Backflip: We analyze the underlying bit flips that may have caused out-of-bound values. This reasoning holds for neuronal faults, where we may assume that a specific activation value is bit-flipped directly. For weight faults, on the other hand, the observed out-of-bound output activation is the result of a multiply-and-accumulate operation of an input tensor with a bit-flipped weight value. However, we argue that the presented back-flip operation will recover a representative product, given that the input component is of the order of magnitude of one. To restore a flipped value, we distinguish the following cases:

                       0      if x > Tup · 2^64,
                       2      if Tup · 2^64 > x > Tup · 2,
      rbackflip(x) =   Tup    if Tup · 2 > x > Tup,          (6)
                       Tlow   if x < Tlow,
                       x      otherwise.
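The piecewise rules of Eqs. (5) and (6) can likewise be sketched as element-wise NumPy operations (an illustrative sketch under the paper's definitions; function names, boundary handling at the exact thresholds, and example values are assumptions):

```python
import numpy as np

def fmap_rescale(x, f, t_low, t_up):
    """Eq. (5): linearly map large out-of-bound values back onto [t_low, t_up]."""
    scaled = (x - f.min()) * (t_up - t_low) / (f.max() - f.min()) + t_low
    out = np.where(x > t_up, scaled, x)
    return np.where(x < t_low, t_low, out)

def backflip(x, t_low, t_up):
    """Eq. (6): revert the presumed bit flip based on the out-of-bound magnitude."""
    out = np.where(x > t_up * 2.0**64, 0.0, x)                          # MSB flip of a value in (0, 2)
    out = np.where((x <= t_up * 2.0**64) & (x > 2.0 * t_up), 2.0, out)  # original value was already > 2
    out = np.where((x <= 2.0 * t_up) & (x > t_up), t_up, out)           # small overshoot above the bound
    return np.where(x < t_low, t_low, out)

# Hypothetical bounds t_low = 0, t_up = 2:
backflip(np.array([2.0**70, 100.0, 3.0, -1.0, 1.0]), 0.0, 2.0)  # -> [0.0, 2.0, 2.0, 0.0, 1.0]
```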
The above thresholds are motivated by the following logic: Given appropriate bounds, an activation is < Tup before a bit flip. Any flip of an exponent bit i ∈ {1 . . . 8} effectively multiplies the value by a factor of 2^(2^(8−i)). Hence, any value beyond Tup · 2^64 must have originated from a flip "0" → "1" of the MSB, meaning that the original value was between 0 and 2. We then set back all out-of-bound values in this regime to zero, assuming that lower reset values represent a more conservative choice in eliminating faults. Next, flipped values in the range Tup · 2^64 > x > Tup · 2 can possibly originate from a flip of any exponent bit. Given that Tup is typically > 1, a bit flip has to produce a corrupted absolute value > 2 in this regime. This is possible only if either the MSB is flipped from "0" → "1", or the MSB is already at "1" and another exponent bit is flipped "0" → "1". In all variants of the latter case, the original value had to be > 2 already, and hence we conservatively reset out-of-bound values to 2. Finally, corrupted values in the range Tup · 2 > x > Tup may originate from any non-sign bit flip. Lower exponent or even fraction bit flips result from already large values close to Tup in this case, which is why we set back those values to the upper bound. As in Ranger, values that are too small are reset to Tlow.
   FmapAvg: The last proposed range restriction technique uses the remaining, healthy fmaps of a convolutional layer to reconstruct a corrupted fmap. The intuition behind this approach is as follows: Every filter in a given conv2d layer tries to establish characteristic features of the input image. Typically, there is a certain redundancy in the topology of fmaps, since not all features the network was trained to recognize may be strongly pronounced for a given image (instead, mixtures of potential features may form), or because multiple features resemble each other at the given processing stage. Therefore, replacing a corrupted fmap with a non-corrupted fmap from a different kernel can help to obtain an estimate of the original topology. We average all healthy (i.e., not containing out-of-bound activations) fmaps by

      ind = {i = 1 . . . Cout | max(fi) ≤ Tup, min(fi) ≥ Tlow},
      favg = (1/|ind|) Σ_{i∈ind} fi.          (7)

If there are no healthy feature maps, favg will be the zero tensor. Subsequently, we replace oob values in a corrupted fmap with their counterparts from the estimate of Eq. (7),

                   favg(x)   if x > Tup or x < Tlow,
      rfavg(x) =                                         (8)
                   x         otherwise.

Figure 5: SDC rates for weight (a) and neuron (b) faults using different range supervision techniques, comparing No_protection, Ranger, Clipper, FmapRescale, Backflip, and FmapAvg for 1 and 10 fault injections per image. Note that compared to Tab. 1, rates are around 4× higher since we inject only in the bits 0 − 8 here.

5.2    Results
In Fig. 5 we present results for the SDC mitigation experiments with different range supervision methods. Comparing 1 and 10 fault injections per input image, we note that the unprotected models are dramatically corrupted with an increasing fault rate (the SDC rate becomes ≥ 0.50 for weights and ≥ 0.32 for neurons in the presence of 10 faults). We can associate the SDC rate with the chance of unsuccessful mitigation, 1 − Pmitigation, in Eq. 1. Weight faults have a higher impact than neuron faults since they directly corrupt a multitude of activations in a layer's fmap output (in contrast to individual activations for neuron faults) and thus propagate faster.
   All the studied range restriction methods reduce the SDC rate by a significant margin, but perform differently for weight and neuron fault types: For weight faults, we observe that Clipper, Backflip, and FmapAvg are highly efficient in all three networks, with SDC rates suppressed to values of ≲ 0.01 (an SDC reduction of > 50×). Ranger provides a much weaker protection, in particular in the shallower networks VGG-16 and AlexNet. FmapRescale performs better than Ranger but worse than the aforementioned methods. The deepest studied network, ResNet-50, benefits the most from any type of range restriction in the presence of weight faults.
   When it comes to neuron faults (Fig. 5b), we see that Clipper and Backflip provide the best protection (the SDC rate is suppressed to < 0.005, a reduction of > 38×), followed by the also very effective Ranger (except for AlexNet). FmapAvg appears to be less efficient for higher fault rates in this scenario, while FmapRescale again falls behind all the above.
   Overall, we conclude that the pruning-inspired mitigation techniques Clipper and Backflip represent the best choices among the investigated range supervision methods, as they succeed in mitigating both weight and neuron faults to very small residual SDC rates.
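The FmapAvg scheme of Eqs. (7) and (8) can be sketched as follows (a minimal NumPy sketch; the function name, the per-sample (Cout, H, W) layout, and the example values are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def fmap_avg(fmaps, t_low, t_up):
    """Eqs. (7)-(8): replace oob activations by the mean of the healthy fmaps.

    fmaps: activations of one conv layer for one sample, shape (C_out, H, W).
    """
    # Eq. (7): index set of healthy fmaps, i.e., fully within [t_low, t_up].
    healthy = [f for f in fmaps if f.max() <= t_up and f.min() >= t_low]
    # Zero tensor if no healthy fmap survives.
    f_avg = np.mean(healthy, axis=0) if healthy else np.zeros_like(fmaps[0])
    # Eq. (8): substitute only the out-of-bound positions.
    oob = (fmaps > t_up) | (fmaps < t_low)
    return np.where(oob, f_avg, fmaps)
```

Note that `f_avg` has shape (H, W) and broadcasts over the channel dimension, so each corrupted position is patched with the spatially corresponding average value.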
(Figure 6 diagram: MIOVision classes grouped into a VRU cluster (Pedestrian, Bicycle), a non-VRU cluster (Articulated truck, Bus, Car, Motorcycle, Pickup truck, Single-unit truck, Work-van, Non-motorized vehicle), and a Background cluster; confusions from the VRU cluster toward less vulnerable clusters are marked as safety-critical faults, all others as non-critical faults.)

Figure 6: Formation of class clusters in MIOVision (VRU denotes vulnerable road user). We make the assumption here that confusions towards less vulnerable clusters are the most safety-critical ones.

(Figure: SDC rates with safety-critical confusions (in %) for 1 and 10 faults per epoch, comparing No_protection, Ranger, Clipper, FmapRescale, Backflip, and FmapAvg; panel (a) shows weight faults.)
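The cluster assumption of Fig. 6 can be encoded as a simple lookup that flags safety-critical confusions. The following sketch is one possible reading of the caption (the label strings, the dictionary layout, and the vulnerability ranking are illustrative assumptions):

```python
# Hypothetical encoding of the Fig. 6 clusters for MIOVision labels.
CLUSTERS = {
    "vru": {"pedestrian", "bicycle"},
    "non_vru": {"articulated_truck", "bus", "car", "motorcycle", "pickup_truck",
                "single_unit_truck", "work_van", "non_motorized_vehicle"},
    "background": {"background"},
}
# Assumed vulnerability ranking: VRU most vulnerable, background least.
RANK = {"vru": 2, "non_vru": 1, "background": 0}

def cluster_of(label):
    """Return the cluster a class label belongs to."""
    return next(c for c, members in CLUSTERS.items() if label in members)

def is_safety_critical(true_label, predicted_label):
    """Flag a confusion toward a less vulnerable cluster as safety-critical."""
    t, p = cluster_of(true_label), cluster_of(predicted_label)
    return t != p and RANK[p] < RANK[t]
```

Under this encoding, misclassifying a pedestrian as a car is safety-critical, while a confusion within the non-VRU cluster is not.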
   In the experiments of Fig. 5, the encountered DUE rates for 1 weight or neuron fault (0.003 for ResNet-50, 0.03 for VGG-16 or AlexNet) are only slightly reduced by range restrictions. However, for a fault rate of 10 we find the following trends: i) For weights, the DUE rate is significantly reduced in ResNet-50 (from 0.15 to 0.002), while the rates in VGG-16 (0.22) and AlexNet (0.26) remain. ii) For neurons, Ranger, Clipper, and Backflip suppress the DUE rate by a factor of up to 2× in all networks.
   The studied range restriction techniques incur different compute costs due to the different numbers of additional graph operations. In PyTorch, however, not all needed functions can be implemented with the same efficiency. For example, Ranger is executed with a single clamp operation, while no equivalent formulation is available for Clipper and instead
                                                                                                                                                                                                                                     0.1
                                                                                                                                                                                                                                     0.0   0.0
                                                                                                                                                                                                                                           0.0              0.0
                                                                                                                                                                                                                                                            0.0
three operations are necessary (two masks to select oob val-                                                                                               0.0                                 1                                                 10
                                                                                                                                                                                                                 Faults per image
ues greater and smaller than the threshold, and a masked-fill
operation to clip to zero). As a consequence, measured laten-                                                                                                                                            (b) Neuron faults
cies are framework-dependent and a fair comparison cannot
                                                                                                Figure 7: SDC rates for ResNet-50 and MIOvision. We inject 1 and
be made at this point. Given the complexity of the protection
                                                                                                10 faults targeting bits 0 − 8 in the network weights (a) and neurons
operations, we may instead give a qualitative performance                                       (b). The portion of safety-critical SDC events according to Fig. 6 is
ranking of the described methods: FmapRescale appears to                                        displayed as a dark-color overlay.
be the most expensive restriction method due to the needed
number of operations, followed by FmapAvg and Backflip.
Clipper and Ranger are the least complex, with the latter out-                                  see Fig. 6. Misclassifications that lead to the prediction of
performing the former in the used framework, due to its more                                    a class in a less vulnerable cluster are assumed to be safety-
efficient use of optimized built-in operations.                                                 critical (Severity ≈ 1 in Eq. 1, e.g., a pedestrian is misclassi-
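The difference in graph operations can be sketched roughly as follows. This is a minimal PyTorch illustration; the bounds `t_low` and `t_high` stand in for the per-layer activation thresholds and are illustrative values, not the profiled bounds used in the paper:

```python
import torch

t_low, t_high = -4.0, 4.0  # illustrative per-layer activation bounds

def ranger(x: torch.Tensor) -> torch.Tensor:
    # Ranger: a single built-in clamp saturates oob values to the interval edges.
    return torch.clamp(x, t_low, t_high)

def clipper(x: torch.Tensor) -> torch.Tensor:
    # Clipper: no single built-in equivalent; two masks select oob values
    # above and below the bounds, and a masked_fill clips them to zero.
    oob = (x > t_high) | (x < t_low)
    return x.masked_fill(oob, 0.0)
```

Both variants detect the same oob values, but Ranger saturates them to the bounds while Clipper zeroes them; the extra mask and fill operations are what make Clipper somewhat more costly in this framework.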
6 Analysis of traffic camera use case

As a selected safety-critical use case, we study object classification in the presence of soft errors with a retrained ResNet-50 and the MIOVision data set [24]. The data contains images of 11 classes, including for example pedestrian, bike, car, or background, that were taken by traffic cameras. The correct identification of an object type or category can be safety-critical, for example, to an automated vehicle that uses the support of infrastructure sensors for augmented perception [31].

   However, not every class confusion is equally harmful. To estimate the severity of an error-induced misclassification, we establish three clusters of vulnerable as well as non-vulnerable road users (VRU or non-VRU), and background, see Fig. 6. Misclassifications that lead to the prediction of a class in a less vulnerable cluster are assumed to be safety-critical (Severity ≈ 1 in Eq. 1, e.g., a pedestrian is misclassified as background), while confusions within the same cluster or towards a more vulnerable cluster are considered non-critical (Severity ≈ 0), as they typically lead only to a similar or more cautious behavior. This binary estimation allows us to quantify the overall risk as the portion of SDC events associated with the respective critical class confusions.

   From our results in Fig. 7 we make the following observations: i) The relative proportion of critical confusions is lower for weight than for neuron faults in the unprotected and most protected models. For weight faults, the most frequent confusions are from other classes to the class “car” (the most robust class of MIOVision, with the most images in the training set), which are statistically mostly non-critical. Neuron faults, on the other hand, distort feature maps in a way that most frequently induces misclassifications towards
the class “background”. Those events are all safety-critical (see Fig. 6), leading to a high critical-to-total SDC ratio. ii) Range supervision is not only effective in reducing the overall SDC count, but also suppresses the critical SDC count proportionally. For example, we observe that the most frequent critical class confusion caused by 1 or 10 weight faults is from the class “pedestrian” to “car” (≈ 0.2 of all critical SDC cases), where > 0.99 of those cases can be mitigated by Clipper or Backflip. For neuron faults, the largest critical SDC contribution is from “pedestrian” to “background” (1 fault) or “car” to “background” (10 faults), both in about 0.1 of all critical SDC cases. Clipper or Backflip are able to suppress > 0.91 of those events.

   As a consequence, all studied range-restricted models exhibit a critical-to-total SDC ratio that is similar to or lower than that of the unprotected network (< 0.41 for weight, < 0.78 for neuron faults), meaning that faults in the presence of range supervision have on average a similar or lower severity than faults that do not face range restrictions. A lower ratio can be interpreted as a better preservation of the feature map topology: if the reconstructed features are more similar to the original features, there is a higher chance of the incorrect class being similar to the original class and thus staying within the same cluster. The total probability of critical SDC events – and therefore the relative risk according to Eq. 1 – is negligible in the studied setup in the presence of Clipper or Backflip range protection.
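The cluster-based binary severity used above can be sketched as follows. This is a hypothetical Python sketch; the class-to-cluster assignment in `CLUSTER` is illustrative only and not the exact MIOVision clustering of Fig. 6:

```python
# Hypothetical vulnerability level per class (higher = more vulnerable);
# the paper's actual clustering is defined in its Fig. 6.
CLUSTER = {
    "pedestrian": 2, "bicycle": 2,   # VRU cluster
    "car": 1, "bus": 1,              # non-VRU cluster
    "background": 0,                 # background cluster
}

def severity(true_cls: str, pred_cls: str) -> int:
    # Severity ~ 1 if the prediction falls into a strictly less vulnerable
    # cluster (e.g., pedestrian -> background); same or more vulnerable
    # cluster counts as non-critical (~ 0).
    return 1 if CLUSTER[pred_cls] < CLUSTER[true_cls] else 0

def critical_sdc_ratio(confusions):
    # Risk proxy: portion of SDC events that are critical class confusions.
    return sum(severity(t, p) for t, p in confusions) / len(confusions)
```

For instance, a list of observed (true, predicted) confusions per fault campaign can be reduced to the critical-to-total SDC ratio reported in the text.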
   The mean DUE rates in the unprotected model are 0.0 (0.02) for 1 weight (neuron) fault and 0.11 (0.17) for 10 faults. Using any of the protection methods, the system’s availability increases, as DUE rates are negligible for 1 fault and are reduced to < 0.03 (< 0.05) for 10 weight (neuron) faults.
7 Conclusion

In this paper, we investigated the efficacy of range supervision techniques for constructing a safety case for computer vision AI applications that use Convolutional Neural Networks (CNNs) in the presence of platform soft errors. In the given experimental setup, we demonstrated that the implementation of activation bounds allows for a highly efficient detection of SDC-inducing faults, most importantly featuring a recall of > 0.99. Furthermore, we found that the range restriction layers can mitigate the detected faults effectively by mapping out-of-bound values back to the expected intervals. Exploring distinct restriction methods, we observed that Clipper and Backflip perform best for both weight and neuron faults and can reduce the residual SDC rate to ≲ 0.01 (a reduction by a factor of > 38×). Finally, we studied the selected use case of vehicle classification to quantify the impact of range restriction on the severity of SDC events (represented by cluster-wise class confusions). All discussed techniques reduce critical and non-critical events proportionally, meaning that the average severity of SDC is not increased. Therefore, we conclude that the presented approach reduces the overall risk and thus enhances the safety of the user in the presence of platform soft errors.
Acknowledgment

Our research was partially funded by the Federal Ministry of Transport and Digital Infrastructure of Germany in the project Providentia++ (01MM19008). Further, this research was partially supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC), and a research gift from Intel to UBC.

References
[1] International Organization for Standardization, “ISO 26262,” Tech. Rep., 2018. [Online]. Available: https://www.iso.org/standard/68383.html
[2] ——, “Road vehicles - Safety of the intended functionality,” Tech. Rep., 2019. [Online]. Available: https://www.iso.org/standard/70939.html
[3] J. Athavale, A. Baldovin, R. Graefe, M. Paulitsch, and R. Rosales, “AI and Reliability Trends in Safety-Critical Autonomous Systems on Ground and Air,” Proceedings - 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN-W 2020, pp. 74–77, 2020.
[4] H. D. Dixit, S. Pendharkar, M. Beadon, C. Mason, T. Chakravarthy, B. Muthiah, and S. Sankar, “Silent Data Corruptions at Scale,” Association for Computing Machinery, 2021, vol. 1, no. 1. [Online]. Available: arxiv.org/abs/2102.11245
[5] A. Neale and M. Sachdev, “Neutron Radiation Induced Soft Error Rates for an Adjacent-ECC Protected SRAM in 28 nm CMOS,” IEEE Transactions on Nuclear Science, vol. 63, no. 3, pp. 1912–1917, 2016.
[6] G. Li, S. K. S. Hari, M. Sullivan, T. Tsai, K. Pattabiraman, J. Emer, and S. W. Keckler, “Understanding error propagation in Deep Learning Neural Network (DNN) accelerators and applications,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2017, 2017.
[7] L.-H. Hoang, M. A. Hanif, and M. Shafique, “FT-ClipAct: Resilience Analysis of Deep Neural Networks and Improving their Fault Tolerance using Clipped Activation,” 2019. [Online]. Available: https://arxiv.org/abs/1912.00941
[8] Z. Chen, G. Li, and K. Pattabiraman, “Ranger: Boosting Error Resilience of Deep Neural Networks through Range Restriction,” 2020. [Online]. Available: https://arxiv.org/abs/2003.13874
[9] J. M. Cluzeau, X. Henriquel, G. Rebender, G. Soudain, L. van Dijk, A. Gronskiy, D. Haber, C. Perret-Gentil, and R. Polak, “Concepts of Design Assurance for Neural Networks (CoDANN),” Public Report Extract Version 1.0, pp. 1–104, 2020. [Online]. Available: https://www.easa.europa.eu/document-library/general-publications/concepts-design-assurance-neural-networks-codann
[10] P. Koopman and B. Osyk, “Safety argument considerations for public road testing of autonomous vehicles,” SAE Technical Papers, 2019.
[11] Z. Chen, G. Li, K. Pattabiraman, and N. Debardeleben, “BinFI: An efficient fault injector for safety-critical machine learning systems,” in International Conference for High Performance Computing, Networking, Storage and Analysis, SC, 2019.
[12] S. Hong, P. Frigo, Y. Kaya, C. Giuffrida, and T. Dumitras, “Terminal brain damage: Exposing the graceless degradation in deep neural networks under hardware fault attacks,” in Proceedings of the 28th USENIX Security Symposium, 2019.
[13] A. Lotfi, S. Hukerikar, K. Balasubramanian, P. Racunas, N. Saxena, R. Bramley, and Y. Huang, “Resiliency of automotive object detection networks on GPU architectures,” Proceedings - International Test Conference, 2019, pp. 1–9.
[14] M. A. Hanif and M. Shafique, “SalvageDNN: Salvaging deep neural network accelerators with permanent faults through saliency-driven fault-aware mapping,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 2020. [Online]. Available: https://royalsocietypublishing.org/doi/10.1098/rsta.2019.0164
[15] A. Mahmoud, S. K. Sastry Hari, C. W. Fletcher, S. V. Adve, C. Sakr, N. Shanbhag, P. Molchanov, M. B. Sullivan, T. Tsai, and S. W. Keckler, “HarDNN: Feature map vulnerability evaluation in CNNs,” 2020.
[16] J. Ponader, S. Kundu, and Y. Solihin, “MILR: Mathematically Induced Layer Recovery for Plaintext Space Error Correction of CNNs,” 2020. [Online]. Available: http://arxiv.org/abs/2010.14687
[17] K. Zhao, S. Di, S. Li, X. Liang, Y. Zhai, J. Chen, K. Ouyang, F. Cappello, and Z. Chen, “FT-CNN: Algorithm-Based Fault Tolerance for Convolutional Neural Networks,” IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 7, pp. 1677–1689, 2021.
[18] C. Schorn, A. Guntoro, and G. Ascheid, “Efficient On-Line Error Detection and Mitigation for Deep Neural Network Accelerators,” in SAFECOMP 2018, vol. 11093 LNCS, 2018.
[19] L. Yang and B. Murmann, “SRAM voltage scaling for energy-efficient convolutional neural networks,” in Proceedings - International Symposium on Quality Electronic Design, ISQED. IEEE Computer Society, May 2017, pp. 7–12.
[20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[21] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, 2015.
[22] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Advances in Neural Information Processing Systems, vol. 2, pp. 1097–1105, 2012.
[23] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[24] Z. Luo, F. B. Charron, C. Lemaire, J. Konrad, S. Li, A. Mishra, A. Achkar, J. Eichel, and P.-M. Jodoin, “MIO-TCD: A new benchmark dataset for vehicle classification and localization,” IEEE Transactions on Image Processing, 2018.
[25] IEEE, “754-2019 - IEEE Standard for Floating-Point Arithmetic,” Tech. Rep., 2019.
[26] Intel Corporation, “bfloat16 - Hardware Numerics Definition,” Tech. Rep., 2018. [Online]. Available: https://software.intel.com/content/www/us/en/develop/download/bfloat16-hardware-numerics-definition.html
[27] R. Theagarajan, F. Pala, and B. Bhanu, “EDeN: Ensemble of Deep Networks for Vehicle Classification,” IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2017.
[28] C. K. Chang, S. Lym, N. Kelly, M. B. Sullivan, and M. Erez, “Evaluating and accelerating high-fidelity error injection for HPC,” Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018, pp. 577–589, 2019.
[29] A. Mahmoud, N. Aggarwal, A. Nobbe, J. R. Sanchez Vicarte, S. V. Adve, C. W. Fletcher, I. Frosio, and S. K. S. Hari, “PyTorchFI: A Runtime Perturbation Tool for DNNs,” in DSN-DSML, 2020.
[30] Nvidia, “CUDA toolkit documentation,” 2021. [Online]. Available: https://docs.nvidia.com/cuda/floating-point/index.html
[31] A. Krämmer, C. Schöller, D. Gulati, and A. Knoll, “Providentia - A large scale sensing system for the assistance of autonomous vehicles,” arXiv, 2019. [Online]. Available: arxiv:1906.06789