=Paper= {{Paper |id=Vol-3087/paper_8 |storemode=property |title=A Practical Overview of Safety Concerns and Mitigation Methods for Visual Deep Learning Algorithms |pdfUrl=https://ceur-ws.org/Vol-3087/paper_8.pdf |volume=Vol-3087 |authors=Saeed Bakhshi Germi,Esa Rahtu |dblpUrl=https://dblp.org/rec/conf/aaai/GermiR22 }} ==A Practical Overview of Safety Concerns and Mitigation Methods for Visual Deep Learning Algorithms== https://ceur-ws.org/Vol-3087/paper_8.pdf
              A Practical Overview of Safety Concerns and Mitigation Methods
                           for Visual Deep Learning Algorithms
                                                Saeed Bakhshi Germi, Esa Rahtu
                                                          Tampere University
                                              Korkeakoulunkatu 7, 33720 Tampere, Finland
                                              saeed.bakhshigermi@tuni.fi, esa.rahtu@tuni.fi


Abstract

This paper proposes a practical list of safety concerns and mitigation methods for visual deep learning algorithms. The growing success of deep learning algorithms in solving non-linear and complex problems has recently attracted the attention of safety-critical applications. While the state-of-the-art methods achieve high performance in synthetic and real-case scenarios, it is impossible to verify/validate their reliability based on currently available safety standards. Recent works try to solve the issue by providing a list of safety concerns and mitigation methods for generic machine learning algorithms from the standards' perspective. However, these solutions are either vague and impractical when dealing with deep learning methods in real-case scenarios, or they are shallow and fail to address all potential safety concerns. This paper provides an in-depth look at the underlying cause of faults in a visual deep learning algorithm to find a practical and complete safety concern list with potential state-of-the-art mitigation strategies.

Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

Deep learning is a powerful tool that solves mathematically challenging tasks with high-dimensional inputs and multi-variable optimization requirements, such as human re-identification, optical character recognition, and object detection. The learning process involves heuristic and numerical methods, which are often hard to explain or interpret as the dimension grows (black-box behavior).

While state-of-the-art deep learning algorithms achieve high performance in various synthetic and real-life cases, there is no guarantee that they meet the reliability requirements safety-critical applications typically demand, since available safety standards do not provide a suitable verification/validation method for deep learning models.

Recent works have found another way of dealing with the problem. By explaining the potential safety concerns of a deep learning algorithm, it is possible to provide suitable mitigation methods around them. While the overall strategy sounds effective, most works fail to provide a practical list of safety concerns and mitigation methods. These lists are typically vague, impractical to implement, shallow, or incomplete.

This paper focuses on the underlying cause of faults in a visual deep learning algorithm to provide a list of safety concerns and potential state-of-the-art mitigation methods. The main contributions of this paper are:

• Providing a practical, complete, and categorical list of possible faults with their underlying cause for different visual deep learning algorithm components.
• Providing potential state-of-the-art mitigation methods to deal with the faults.

The rest of this paper is structured as follows. Section 2 covers related works. Next, Section 3 explains safety concerns related to a visual deep learning algorithm and provides existing mitigation methods to deal with them. Finally, Section 4 concludes the work.

2 Related Works

A visual deep learning algorithm is prone to different types of faults. Recent papers focus on either solving specific faults or providing an overview of all system-related safety concerns. Here we discuss some of the most important contemporary works.

Zhang's review of recent papers explains how violation of critical assumptions in the training stage would lead to faults and a non-robust system (Zhang, Liu, and Suen 2020). This review also categorically covers existing mitigation methods and discusses each technique's effectiveness. Song focuses on learning with noisy labels and discusses major strategies to overcome the challenges of this topic (Song et al. 2021). While these and similar works provide potential mitigation methods for specific faults, they do not offer a complete list of all safety concerns.

Kläs suggests using uncertainty wrappers on deep learning components to ensure the outcome is dependable (Kläs and Jöckel 2020). However, these wrappers rely on specific metrics that require prior knowledge of the data, which is considered impractical in the deep learning field.

Wozniak, Schwalbe, and Willers suggest different approaches to providing a safety concern list and mitigation methods for developing a deep learning algorithm (Wozniak et al. 2020; Schwalbe et al. 2020; Willers et al. 2020). The proposed strategies contain various goals related to the dataset, model, and training/inference stage. However, some goals are vague and impractical, with no explanation of
how to achieve them or what to do if the goal is not achievable. Moreover, the list is not complete in either work.

Houben provides an extensive list of practical methods to improve the safety of a deep learning algorithm (Houben et al. 2021). The work covers the current state-of-the-art methods to deal with specific problems. However, the provided safety concern list is neither complete nor adequately categorized.

Other similar works, such as (Heyn et al. 2021), also suffer from the same issues. The flaws of recent works can be listed as one or more of the following:

• Not covering the underlying causes of faults, which might lead to a poor choice of mitigation methods.
• Providing impractical and vague mitigation methods, which are not suitable for implementation.
• Overestimating the practical capabilities of mitigation methods in dealing with faults and not providing backup plans in case of failure.

3 Safety Concerns (SC) and Mitigation Methods (MM)

The development of a visual deep learning algorithm has three major stages: (1) training, (2) evaluation, and (3) inference. This section presents the list of possible faults within each stage.

3.1 Faults in the Training Stage

Visual data is one of the significant sources of information for deep learning algorithms. Extracting useful information from visual data is a complex task, which makes it prone to faults.

A deep learning algorithm approximates the relationship between the input data and the objects in the real world by reducing the empirical risk on training data. Thus, having a proper training dataset is essential to reach the desired quality in the algorithm. A training dataset should be:

• Complete: contain samples from the defined output space for the task.
• Adequate: contain samples whose distribution matches the real world.
• Ample: contain a sufficient amount of samples for convergence of the algorithm.
• Clean: contain well-labeled samples.

Moreover, different model structures come with specific sets of benefits and weaknesses. Choosing the correct model, setting up a suitable loss function and optimization algorithm, and finding the right hyperparameters are essential to achieve the best performance.

SC 1 – Incomplete Dataset: Due to the natural complexity of the real world, there is always a much larger open space than the defined output space for the task. Even with defined boundaries for the output space, known unknowns (e.g., outlier classes) and unknown unknowns (e.g., adversarial attacks) pose a significant issue for the algorithm by producing over-confident wrong predictions.

SC 2 – Inadequate Dataset: Due to the ever-changing nature of real-world conditions, the collected training data will not have a distribution identical to the real-world environment in the inference stage. Even a slight mismatch in the distribution can cause a significant drop in performance and result in poor generalization.

SC 3 – Insufficient/Noisy Dataset: The cost of manually labeling a dataset increases exponentially with its size. While having a small clean validation dataset is feasible, larger datasets tend to have noisy labels. A deep learning algorithm can memorize this noise, leading to poor generalization and low performance.

SC 4 – Ill-Matched Architecture: Manually comparing different models and hyperparameters to find the best match for the task is time-consuming and costly. Moreover, it requires an expert in the field to provide insight into the problem. An ill-matched architecture could result in unforeseen faults due to an inherent weakness against specific situations.

MM 1 – Learning with Unseen Data: Modern deep learning tools could be utilized to push the boundaries of the training dataset even further. Out-of-distribution detectors can be used in the algorithm to detect unseen samples in the inference stage and reject the over-confident results of the algorithm. These methods introduce uncertainty metrics to determine whether the algorithm should be trusted (Chen et al. 2020; Sastry and Oore 2020; Bakhshi Germi, Rahtu, and Huttunen 2021).

Also, open-world recognition systems can be used to extend the output space of the algorithm as it encounters outlier samples in the inference stage. These methods continue to learn new classes during the inference stage to reduce the chance of over-confident wrong predictions (Parmar, Chouhan, and Rathore 2021; Bendale and Boult 2015).

Moreover, the model could be trained to defend against adversarial attacks by including such patterns in the training dataset (Xu et al. 2020; Yuan et al. 2019).

Discussing MM 1: Out-of-distribution detectors typically result in lower accuracy, open-world recognition systems are slow and demanding, and adversarial attacks keep evolving every day. The mentioned methods all have their limitations. A suitable backup plan would involve utilizing several models with various mitigation methods to create an ensemble that votes on the final result.

MM 2 – Learning with Unequally Distributed Data: Modern deep learning tools could be utilized to reduce the distribution mismatch between the training and inference domains. Transfer learning and domain adaptation can be used to fine-tune the algorithm online during the inference stage. These methods help the model adapt to new environments quickly and achieve better generalization by using a small batch of data from the inference stage (Farahani et al. 2020; Zhuang et al. 2020).

On the other hand, the algorithm can achieve higher performance by utilizing multiple sources of information for a single task (e.g., person identification with face, iris,
voice, and fingerprint). Multimodal learning methods incorporate supplementary and complementary data from multiple modalities to improve the performance of a single task (Baltrušaitis, Ahuja, and Morency 2018; Guo, Wang, and Wang 2019).

Discussing MM 2: Transfer learning and domain adaptation methods typically rely on having a decent starting point (a trained network) and quality samples from the inference stage to fine-tune the model successfully. While these requirements are hard to meet, they are not impossible. Moreover, multimodal methods have already been used with sensor fusion in autonomous vehicles (LIDAR, GPS, IMU, and so on), making them a strong candidate for use in deep learning systems. A suitable backup plan would involve storing the input data during the inference stage to re-evaluate and re-calibrate the algorithm, replacing parts of the older and non-useful training dataset in an iterative cycle.

MM 3 – Learning with Noisy Labels and Small Datasets: Modern deep learning tools could be utilized to reduce the effect of label noise or eliminate the need for a large labeled dataset. Robust loss, sample selection, relabeling, and weighted training are all potential solutions to deal with noisy labels in the training dataset (Song et al. 2021; Cordeiro and Carneiro 2020; Adhikari et al. 2021). A combination of multiple methods usually leads to better results.

On the other hand, data augmentation methods can be used to create additional samples for the training dataset. These methods typically involve rotating, scaling, shifting, and flipping data (Wang, Wang, and Lian 2020; Shorten and Khoshgoftaar 2019). More advanced synthesizing techniques can lead to the creation of entire datasets (Raghunathan 2021; Nikolenko 2019). Additionally, existing public datasets can be utilized to extend the samples at a lower cost.

Moreover, the cost and time of manually labeling datasets can be drastically reduced by using iterative labeling methods (Adhikari and Huttunen 2021).

Finally, semi-supervised and unsupervised training techniques can be used to decrease the dependency on a clean training dataset (Van Engelen and Hoos 2020; Schmarje et al. 2021).

Figure 1: Samples of the same category in MNIST (top) (Lecun et al. 1998) and CIFAR-10 (bottom) (Krizhevsky 2009) datasets (taken from (Chen et al. 2021)). From left to right, the difficulty of classification increases for both manual and automatic label assignment, increasing the chance of noisy labels.

Discussing MM 3: Recent works show that label noise is instance-dependent, as illustrated in Figure 1. This discovery means most state-of-the-art methods for dealing with label noise need revision on how to mitigate its effects. Recent works focus on this topic and provide effective solutions. While these solutions do not have mathematical proof, they perform decently on public benchmarks.

Meanwhile, the research around synthesized data indicates that it may not represent the real world in every situation due to the limitations of simulation environments and the lack of involved experts in the process. Moreover, the existing public datasets might not suit the specific task or may have other inconsistencies, such as low-quality images and noisy labels.

A suitable backup plan would involve developing a more realistic simulation environment while including physical knowledge about the task in the training process.

MM 4 – Automated Architecture Selection: Modern deep learning tools could be utilized to select the optimal model and hyperparameters for a given task. Automated hyperparameter optimization (Yu and Zhu 2020; Luo 2016; Hutter, Lücke, and Schmidt-Thieme 2015) and neural architecture search (Wistuba, Rawat, and Pedapati 2019; Ren et al. 2021) methods can reduce manual labor while eliminating the need for an expert. These methods rely on different search algorithms to find the best model and hyperparameters within the working domain.

Discussing MM 4: Relying on search algorithms requires high computational power and proper comparison tools. While it costs money and time, the solution is neither impossible nor impractical in most safety-critical applications.

3.2 Faults in the Evaluation Stage

Evaluation of a trained deep learning algorithm requires prior knowledge about the task. A testing dataset should include samples from all scenarios, no matter how rare, to ensure the safety of the algorithm. Also, proper performance metrics should be selected during the tests to obtain comparable outputs.
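As an illustration of what "proper performance metrics" can mean in a safety-critical setting, here is a minimal pure-Python sketch of a weighted error score that penalizes missed hazards (false negatives) more heavily than false alarms (false positives). The function name and cost weights are hypothetical illustrations, not values taken from the cited works:

```python
def weighted_error(tp, fp, fn, tn, fp_cost=1.0, fn_cost=10.0):
    """Safety-aware error score: false negatives (missed hazards) are
    weighted more heavily than false positives (false alarms).
    The cost weights are illustrative placeholders, not standard values."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("empty confusion matrix")
    return (fp_cost * fp + fn_cost * fn) / total

# Two models with identical plain accuracy (90/100 correct samples):
score_a = weighted_error(tp=45, fp=8, fn=2, tn=45)  # few missed hazards
score_b = weighted_error(tp=45, fp=2, fn=8, tn=45)  # many missed hazards
assert score_a < score_b  # accuracy alone cannot separate the two models
```

Both models are 90% accurate, yet the weighted score ranks the one that misses fewer hazards as safer, which is exactly the kind of trade-off visibility a plain accuracy metric hides.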
Figure 2: Effects of camera faults on the input image (Taken from (TND6233-D)): (A) Faulty clocking system, (B) Faulty pipeline, and (C) Faulty row addressing logic.
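Gross camera faults like those in Figure 2 can often be screened out before a frame ever reaches the network by cheap sanity checks. A hedged pure-Python sketch, modeling a grayscale frame as a list of pixel rows; the function name and thresholds are illustrative assumptions, not taken from the cited report:

```python
def frame_looks_faulty(frame, min_mean=5, max_mean=250):
    """Heuristic pre-checks for a grayscale frame (list of rows of 0-255 ints).
    Flags stuck/dead sensors (near-constant brightness) and heavily repeated
    rows, which can hint at faulty row-addressing logic. Thresholds are
    illustrative placeholders."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    if mean < min_mean or mean > max_mean:  # frame stuck dark or saturated
        return True
    # Many identical consecutive rows -> possible row-addressing fault.
    repeats = sum(1 for a, b in zip(frame, frame[1:]) if a == b)
    return repeats > len(frame) // 2

dead = [[0] * 8 for _ in range(8)]  # all-black frame (dead sensor)
ok = [[(r * 31 + c * 7) % 256 for c in range(8)] for r in range(8)]
assert frame_looks_faulty(dead) and not frame_looks_faulty(ok)
```

Checks like these cannot replace the functional safety measures discussed below, but rejecting obviously broken frames early reduces the chance of the network producing confident outputs on garbage input.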


Moreover, formal verification/validation methods depend on having an interpretable algorithm, which is at odds with deep learning.

SC 5 – Incompatible Metrics and Benchmarks: The most common performance metric in deep learning algorithms is accuracy. However, other metrics might hold more value in safety-critical applications, as the importance of false positives and false negatives grows exponentially in this field. Moreover, gathering a proper dataset to use as a benchmark has similar challenges to gathering the training dataset.

MM 5 – Using Safety-Aware Metrics and Hazard-Aware Benchmarks: By including a weighted cost for each type of fault in the performance metric, the algorithm can be evaluated according to safety requirements (Zhou et al. 2021; Gharib and Bondavalli 2019; Salman et al. 2020). These new evaluation metrics would make the trade-off between performance and safety more visible.

On the other hand, a list of all hazardous scenarios can be prepared for every task, for inclusion in the testing dataset, by performing a risk analysis on the task (Zendel et al. 2018; Lambert et al. 2020). Such datasets could be treated as benchmarks for comparing different algorithms or validating their performance.

Discussing MM 5: While formulating a new cost function requires expert knowledge, it is within the scope of expectations in a safety-critical application. Various combinations of weighted metrics can be utilized and compared to find the most suitable one for the task. However, a bad decision could result in a non-converging algorithm; thus, there is a necessity for mathematical proof of the convergence of the algorithm.

Moreover, the competitive nature of industry typically prevents companies from sharing suitable benchmarks or cost functions publicly, which means each company has to spend time and resources on developing its own system. A suitable backup plan would involve third-party associations funded by multiple companies to handle the problem for the benefit of all members.

SC 6 – Black-Box Behavior: The large volume of parameters and non-linear functions in deep learning algorithms results in an uninterpretable system. With no clear relation between the input and output of this black-box system and the impossible task of testing the entire input domain, it is hard to verify/validate deep learning algorithms based on safety standards.

MM 6 – Opening the Black-Box: Representation learning enables the deep learning algorithm to discover the relation between input data and output in a presentable way by showing the process of feature selection (Zhang et al. 2018; Li, Yang, and Zhang 2018). Understanding this process helps to gain insight into how the network interprets input data and which parts of the data play a more significant role in deciding the outcome.

Another way to gain such insight is to present a map of pixel relevance for the algorithm. These heat maps illustrate the importance of each pixel when calculating the output (König et al. 2021; Bach et al. 2015). Such information can be about isolated pixels or the interconnection of different pixels. Studying these maps could show the effects of slight changes in the input on the output and help find potentially hazardous cases.

Discussing MM 6: This specific problem could be one of the most important ones with the fewest proper solutions so far. While it is possible to gain some insight into the operation of deep learning algorithms, the information cannot be used in any form to verify/validate the algorithm based on traditional standards. A suitable backup plan would involve using safety case arguments and other similar approaches to bypass the need for verification/validation for now.

3.3 Faults in the Inference Stage

In a typical case, a sensor similar to the one used for collecting the offline data provides the online data for the implemented algorithm. In addition, other hardware components are required for the algorithm to work correctly. These components can be summarized as:
Figure 3: Effects of environmental factors on the input image (Taken from (Bakhshi Germi, Rahtu, and Huttunen 2021)): (A) Original image, (B) Movement of camera/object (Motion blur), (C) Raindrop on the lens (Frosted-glass blur), (D) Out-of-focus object (Gaussian blur), (E) Low illumination (Gaussian noise), (F, G) Improper balance of light and darkness (Low/High brightness), and (H) Obscured object (Occlusion).
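Degradations of the kind shown in Figure 3 can also be approximated in software to stress-test a trained model before deployment. A minimal pure-Python sketch for two of them, brightness imbalance (cases F/G) and low-illumination noise (case E); the function names and parameter values are hypothetical:

```python
import random

def shift_brightness(frame, delta):
    """Global brightness shift with clipping to [0, 255] (cases F/G in Figure 3)."""
    return [[max(0, min(255, p + delta)) for p in row] for row in frame]

def add_gaussian_noise(frame, sigma=20.0, seed=0):
    """Approximate low-illumination sensor noise (case E) with Gaussian noise."""
    rng = random.Random(seed)
    return [[max(0, min(255, round(p + rng.gauss(0.0, sigma))))
             for p in row] for row in frame]

frame = [[100, 150], [200, 50]]
bright = shift_brightness(frame, 80)
assert bright == [[180, 230], [255, 130]]
noisy = add_gaussian_noise(frame)
assert all(0 <= p <= 255 for row in noisy for p in row)
```

Sweeping the severity of such synthetic perturbations and recording where the model's predictions start to degrade gives a rough, empirical operating envelope for the environmental factors listed in the figure.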


 • A camera to capture the input image.                         and ISO/PAS 21448 (ISO/PAS 21448) provide practical
 • A communication channel to transfer the captured image.      guidelines for verifying and validating hardware compo-
                                                                nents. Also, technical reports based on functional safety
 • A processing unit to host the deep learning algorithm.       standards can help develop or choose safe hardware com-
 • A power supply to keep the system running.                   ponents such as a camera (TND6233-D), communication
                                                                channel (Alanen, Hietikko, and Malm 2004), and operating
SC 7 – Defective Hardware: The first concern in deep
                                                                system (Slačka and Halás 2015).
learning algorithms is providing the necessary hardware
mentioned above. Hardware faults can have a wide range             Moreover, other precautions such as using redundant
of effects on the algorithm based on the faulty component,      hardware, proper noise shielding, and data fusion techniques
an example being the results of a faulty camera on the cap-     have already proved helpful in safety-critical applications
tured image, as shown in Figure 2. An implementation of         (Sklaroff 1976; Ciftcioglu and Turkcan 1996).
the algorithm might run into problems based on the defec-       Discussing MM 7.1: Assuming the hardware is chosen
tive hardware component:                                        based on the proper functional safety standards, it should op-
 • Camera faults that might result in various disturbances in   erate without significant safety concerns. However, this mit-
   the input image, such as pixel corruption or image distor-   igation method does not guarantee the complete removal of
   tion.                                                        any disturbance or corruption of data. Environmental factors
                                                                such as lousy illumination, movement, and obscured objects
 • Communication channel faults that might result in data
                                                                can affect input image quality without causing a hardware
   corruption or data loss.
                                                                failure, as seen in Figure 3. While some of these problems
 • Processing unit faults that might result in wrong calcula-   might not be recognizable by a human annotator, the deep
   tions, lagging, or freezing of the algorithm.                learning algorithm could run into faults based on the type
 • Power supply faults might result in breaking other hard-     and severity of corruption. Moreover, less severe levels of
   ware components or total system shutdown.                    hardware failure might cause noise variations on the input
                                                                data. A suitable backup plan would involve utilizing another
MM 7.1 – Following Functional Safety Standards: The             mitigation approach described as follows.
mentioned hardware components are not unique to deep
learning algorithms and have been used for decades in           MM 7.2 – Using Image Processing Techniques: Since
safety-critical applications. As a result, the current func-    the exact relation between the input image and the output
tional safety standards such as ISO 26262 (ISO 26262)           of the deep learning algorithm is not known, it is recom-
mended to have clean input data to reduce the change of           Adhikari, B.; and Huttunen, H. 2021. Iterative Bounding
unwanted outcomes. The current state-of-the-art image pro-        Box Annotation for Object Detection. In 25th International
cessing techniques such as denoising (Fan et al. 2019; Goyal      Conference on Pattern Recognition (ICPR), 4040–4046.
et al. 2020; Jebur, Der, and Hammood 2020), deblurring            Adhikari, B.; Peltomäki, J.; Bakhshi Germi, S.; Rahtu, E.;
(Sada and Goyani 2018; Nah et al. 2021; Abuolaim, Tim-            and Huttunen, H. 2021. Effect of Label Noise on Robust-
ofte, and Brown 2021), and enhancement (Putra, Purboyo,           ness of Deep Neural Network Object Detectors. In Com-
and Prasasti 2017) methods can improve the quality of the         puter Safety, Reliability, and Security. SAFECOMP Work-
input images and remove most of the disturbances not cov-         shops, 239–250.
ered by the previous mitigation method. Most image pro-
cessing techniques have solid mathematical foundations and        Alanen, J.; Hietikko, M.; and Malm, T. 2004. Safety of Dig-
passed extensive testing cycles to prove their effectiveness,     ital Communications in Machines. VTT Technical Research
making them easy to validate and verify for safety-critical       Centre of Finland. ISBN 951-38-6502-9.
applications.                                                     Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller,
                                                                  K.-R.; and Samek, W. 2015. On Pixel-Wise Explanations for
Discussing MM 7.2: Image processing techniques are                Non-Linear Classifier Decisions by Layer-Wise Relevance
only valid when it’s known that the image is corrupted. Oth-      Propagation. PLOS ONE, 10(7): 1–46.
erwise, such functions can negatively affect a clean image
during the operation (e.g., removing/fading edges, bright-        Bakhshi Germi, S.; Rahtu, E.; and Huttunen, H. 2021. Selec-
ening the image without necessity, etc.). Applying a filter       tive Probabilistic Classifier Based on Hypothesis Testing. In
without knowing the type of corruption is almost as dan-          9th European Workshop on Visual Information Processing
gerous as not utilizing any technique. So, it is safe to as-      (EUVIP).
sume that some form of corruption is inevitable. A suitable       Baltrušaitis, T.; Ahuja, C.; and Morency, L.-P. 2018. Multi-
backup plan would involve using the rejection option as de-       modal Machine Learning: A Survey and Taxonomy. IEEE
scribed before to reduce the amount of overconfident wrong        Transactions on Pattern Analysis and Machine Intelligence,
outputs.                                                          41(2): 423–443.

4 Conclusion
The research around using deep learning algorithms in safety-critical applications is growing rapidly, with the current state of the art only partially fulfilling the requirements of old standards. However, the nature of the problem demands a move away from the traditional broad-spectrum method of standardization, as it is not suitable for deep learning algorithms. There is a high demand for task-specific standards. Until such standards are developed, the research community focuses on alternative approaches and empirical analysis to provide practical solutions for specific cases.
This paper provides a practical list of safety concerns for a visual deep learning algorithm by explaining the underlying causes of faults and providing current state-of-the-art solutions to mitigate them. By presenting the limitations of existing mitigation methods, the need for further study is expressed. We hope this paper offers insight to those who want to utilize deep learning algorithms in their applications, or to those who want to develop proper standards or safety case arguments for such systems.

Acknowledgments
This research is done as part of a Ph.D. study co-funded by Tampere University and Forum for Intelligent Machines ry (FIMA).