=Paper=
{{Paper
|id=Vol-3087/paper_8
|storemode=property
|title=A Practical Overview of Safety Concerns and Mitigation Methods for Visual Deep Learning Algorithms
|pdfUrl=https://ceur-ws.org/Vol-3087/paper_8.pdf
|volume=Vol-3087
|authors=Saeed Bakhshi Germi,Esa Rahtu
|dblpUrl=https://dblp.org/rec/conf/aaai/GermiR22
}}
==A Practical Overview of Safety Concerns and Mitigation Methods for Visual Deep Learning Algorithms==
A Practical Overview of Safety Concerns and Mitigation Methods for Visual Deep Learning Algorithms

Saeed Bakhshi Germi, Esa Rahtu
Tampere University
Korkeakoulunkatu 7, 33720 Tampere, Finland
saeed.bakhshigermi@tuni.fi, esa.rahtu@tuni.fi

Abstract

This paper proposes a practical list of safety concerns and mitigation methods for visual deep learning algorithms. The growing success of deep learning algorithms in solving non-linear and complex problems has recently attracted the attention of safety-critical applications. While the state-of-the-art methods achieve high performance in synthetic and real-case scenarios, it is impossible to verify/validate their reliability based on currently available safety standards. Recent works try to solve the issue by providing a list of safety concerns and mitigation methods in generic machine learning algorithms from the standards' perspective. However, these solutions are either vague and non-practical when dealing with deep learning methods in real-case scenarios, or they are shallow and fail to address all potential safety concerns. This paper provides an in-depth look at the underlying cause of faults in a visual deep learning algorithm to find a practical and complete safety concern list with potential state-of-the-art mitigation strategies.
1 Introduction

Deep learning is a powerful tool that solves mathematically challenging tasks with high dimensional inputs and multi-variable optimization requirements such as human re-identification, optical character recognition, and object detection. The learning process involves using heuristic and numerical methods, which are often hard to explain or interpret as the dimension grows (black-box behavior).

While state-of-the-art deep learning algorithms achieve high performance in various synthetic and real-life cases, there is no guarantee for the reliability requirements that safety-critical applications typically demand, since available safety standards do not provide a suitable verification/validation method for deep learning models.

Recent works found another way of dealing with the problem. By explaining the potential safety concerns of a deep learning algorithm, it is possible to provide suitable mitigation methods around them. While the overall strategy sounds effective, most works fail to provide a practical list of safety concerns and mitigation methods. These lists are typically vague, impractical to implement, shallow, and incomplete.

This paper focuses on the underlying cause of faults in a visual deep learning algorithm to provide a list of safety concerns and potential state-of-the-art mitigation methods. The main contributions of this paper are:

• Providing a practical, complete, and categorical list of possible faults with their underlying cause for different visual deep learning algorithm components.
• Providing potential state-of-the-art mitigation methods to deal with the faults.

The rest of this paper is structured as follows. Section 2 covers related works. Next, Section 3 explains safety concerns related to a visual deep learning algorithm and provides existing mitigation methods to deal with them. Finally, Section 4 concludes the work.

Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2 Related Works

A visual deep learning algorithm is prone to different types of faults. Recent papers focus on either solving specific faults or providing an overview of all system-related safety concerns. Here we discuss some of the most important contemporary works.

Zhang's review of recent papers explains how violation of critical assumptions in the training stage would lead to faults and a non-robust system (Zhang, Liu, and Suen 2020). This review also categorically covers existing mitigation methods and discusses each technique's effectiveness. Song focuses on learning with noisy labels and discusses major strategies to overcome the challenges of this topic (Song et al. 2021). While these works and similar titles provide potential mitigation methods for specific faults, they do not offer a complete list of all safety concerns.

Kläs suggests using uncertainty wrappers on deep learning components to ensure the outcome is dependable (Kläs and Jöckel 2020). However, these wrappers rely on specific metrics that require prior knowledge of data, which is considered impractical in the deep learning field.

Wozniak, Schwalbe, and Willers suggest different approaches to providing a safety concern list and mitigation methods for developing a deep learning algorithm (Wozniak et al. 2020; Schwalbe et al. 2020; Willers et al. 2020). The proposed strategies contain various goals related to the dataset, model, and training/inference stage. However, some goals are vague and non-practical, with no explanation of how to achieve them or what to do if the goal is not achievable. Moreover, the list is not complete in either work.

Houben provides an extensive list of practical methods to improve the safety of a deep learning algorithm (Houben et al. 2021). The work covers the current state-of-the-art methods to deal with specific problems. However, the provided safety concern list is neither complete nor adequately categorized.

Other similar works, such as (Heyn et al. 2021), also suffer from the same issues. The flaws of recent works can be listed as one or more of the following:

• Not covering the underlying causes of faults, which might lead to poor choice of mitigation methods.
• Providing non-practical and vague mitigation methods, which are not suitable for implementation.
• Overestimating the practical capabilities of mitigation methods in dealing with faults and not providing backup plans in case of failure.
3 Safety Concerns (SC) and Mitigation Methods (MM)

The development of a visual deep learning algorithm has three major stages: (1) training, (2) evaluation, and (3) inference. This section presents the list of possible faults within each stage.

3.1 Faults in the Training Stage

Visual data is one of the significant sources of information for deep learning algorithms. Extracting useful information from visual data is a complex task that makes it prone to faults.

A deep learning algorithm approximates the relationship between the input data and the objects in the real world by reducing the empirical risk on training data. Thus, having a proper training dataset is essential to reach the desired quality in the algorithm. A training dataset should be:

• Complete: contain samples from the defined output space for the task.
• Adequate: contain samples with identical distribution to the real world.
• Ample: contain a sufficient amount of samples for convergence of the algorithm.
• Clean: contain well-labeled samples.

Moreover, different model structures come with specific sets of benefits and weaknesses. Choosing the correct model, setting up a suitable loss function and optimization algorithm, and finding the perfect hyperparameters are essential to achieve the best performance.

SC 1 – Incomplete Dataset: Due to the natural complexity of the real world, there is always a much larger open space than the defined output space for the task. Even with defined boundaries for the output space, known unknowns (e.g., outlier classes) and unknown unknowns (e.g., adversarial attacks) pose a significant issue for the algorithm by producing over-confident wrong predictions.

SC 2 – Inadequate Dataset: Due to the ever-changing nature of real-world conditions, the collected data for training will not have identical distribution with the real-world environment in the inference stage. Even a slight mismatch in the distribution can cause a significant drop in performance and result in poor generalization.

SC 3 – Insufficient/Noisy Dataset: The cost of manually labeling a dataset increases exponentially with its size. While having a small clean validation dataset is feasible, larger datasets tend to have noisy labels. A deep learning algorithm can memorize this noise, leading to poor generalization and low performance.

SC 4 – Ill-Matched Architecture: Manually comparing different models and hyperparameters to find the best match for the task is time-consuming and costly. Moreover, it requires an expert in the field to provide an insight into the problem. An ill-matched architecture could result in unforeseen faults due to inherent weakness against specific situations that might exist.
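The distribution mismatch behind SC 2 can be made measurable before deployment. As a rough illustration (a hypothetical helper, not a method from this paper; total-variation distance is one of several reasonable choices), the sketch below compares class-frequency vectors from the training set against a sample collected in the field:

```python
import numpy as np

def distribution_gap(train_counts, field_counts):
    """Total-variation distance between two class-frequency vectors:
    0.0 means identical distributions, 1.0 means completely disjoint."""
    p = np.asarray(train_counts, dtype=float)
    q = np.asarray(field_counts, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    return 0.5 * float(np.abs(p - q).sum())
```

A large gap flags an inadequate dataset (SC 2) before the model ever reaches the inference stage.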
MM 1 – Learning with Unseen Data: Modern deep learning tools could be utilized to push the boundaries of the training dataset even further. Out-of-distribution detectors can be used in the algorithm to detect unseen samples in the inference stage and reject the over-confident results of the algorithm. These methods introduce uncertainty metrics to determine whether the algorithm should be trusted or not (Chen et al. 2020; Sastry and Oore 2020; Bakhshi Germi, Rahtu, and Huttunen 2021).

Also, open-world recognition systems can be used to extend the output space of the algorithm as it encounters outlier samples in the inference stage. These methods continue to learn new classes during the inference stage to reduce the chance of over-confident wrong predictions (Parmar, Chouhan, and Rathore 2021; Bendale and Boult 2015).

Moreover, the model could be trained to defend against adversarial attacks by including such patterns in the training dataset (Xu et al. 2020; Yuan et al. 2019).

Discussing MM 1: Out-of-distribution detectors typically result in lower accuracy, open-world recognition systems are slow and demanding, and adversarial attacks keep evolving and changing every day. The mentioned methods all have their limitations. A suitable backup plan would involve utilizing several models with various mitigation methods to create an ensemble to vote for the final result.
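A minimal sketch of the rejection idea behind MM 1, using max-softmax confidence as the uncertainty metric (a far simpler stand-in for the cited detectors; the function name and threshold value are illustrative assumptions):

```python
import numpy as np

def predict_or_reject(logits, threshold=0.9):
    """Return class predictions plus an 'accepted' mask; samples whose
    top softmax probability falls below the threshold are rejected as
    possibly unseen instead of being reported over-confidently."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    confidence = probs.max(axis=1)
    return probs.argmax(axis=1), confidence >= threshold
```

In a deployed pipeline, rejected samples would be routed to a fallback (for instance, the ensemble vote mentioned above) rather than acted upon directly.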
MM 2 – Learning with Unequally Distributed Data: Modern deep learning tools could be utilized to reduce the distribution mismatch between the training and inference domain. Transfer learning and domain adaptation can be used to fine-tune the algorithm online during the inference stage. These methods help the model to adapt to new environments quickly and achieve better generalization by using a small batch of data in the inference stage (Farahani et al. 2020; Zhuang et al. 2020).

On the other hand, the algorithm can achieve higher performance by utilizing multiple sources of information for a single task (e.g., person identification with face, iris, voice, and fingerprint). Multimodal learning methods incorporate supplementary and complementary data from multiple modalities to improve the performance of a single task (Baltrušaitis, Ahuja, and Morency 2018; Guo, Wang, and Wang 2019).

Figure 1: Samples of the same category in MNIST (Top) (Lecun et al. 1998) and CIFAR-10 (Bottom) (Krizhevsky 2009) datasets (Taken from (Chen et al. 2021)). From left to right, the difficulty of classifying is increasing for both manual and automatic label assignment, thus resulting in the increased chance of noisy labels.

Discussing MM 2: Transfer learning and domain adaptation methods typically rely on having a decent starting point (trained network) and quality samples from the inference stage to fine-tune the model successfully. While the requirements are hard to achieve, they are not impossible. Moreover, multimodal methods have already been used with sensor fusion in autonomous vehicles (LIDAR, GPS, IMU, and so on), making them a strong candidate for use in deep learning systems. A suitable backup plan would involve storing the input data during the inference stage to re-evaluate and re-calibrate the algorithm by replacing parts of the older and non-useful training dataset in an iterative cycle.
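As a toy instance of the domain-adaptation idea in MM 2 (much simpler than the cited methods; the helper is hypothetical), first-order feature statistics of the inference domain can be aligned to the training domain using only a small batch of deployment data:

```python
import numpy as np

def align_to_source(source_feats, target_feats):
    """Shift target-domain feature vectors so their per-dimension mean
    matches the source (training) domain — a first-order correction of
    the distribution mismatch between the two domains."""
    return target_feats - target_feats.mean(axis=0) + source_feats.mean(axis=0)
```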
MM 3 – Learning with Noisy Labels and Small Dataset: Modern deep learning tools could be utilized to reduce the effect of label noise or eliminate the need for a large labeled dataset. Robust loss, sample selection, relabeling, and weighted training are all potential solutions to deal with noisy labels in the training dataset (Song et al. 2021; Cordeiro and Carneiro 2020; Adhikari et al. 2021). A combination of multiple methods usually leads to better results.

On the other hand, data augmentation methods can be used to create additional samples for the training dataset. These methods typically involve rotating, scaling, shifting, and flipping data (Wang, Wang, and Lian 2020; Shorten and Khoshgoftaar 2019). More advanced synthesizing techniques can lead to the creation of entire datasets (Raghunathan 2021; Nikolenko 2019). Additionally, existing public datasets can be utilized to extend the samples at a lower cost. Moreover, the cost and time for manually labeling datasets can be drastically reduced by using iterative labeling methods (Adhikari and Huttunen 2021).

Finally, semi-supervised and unsupervised training techniques can be used to decrease the dependency on a clean training dataset (Van Engelen and Hoos 2020; Schmarje et al. 2021).

Discussing MM 3: Recent works prove that the label noise is instance-dependent, as shown in Figure 1. This discovery means most state-of-the-art methods in dealing with label noise need revision on how to mitigate the effects of label noise. Recent works happen to focus on this topic and provide effective solutions. While these solutions do not have mathematical proof, they perform decently on public benchmarks.

Meanwhile, the research around synthesized data indicates that it may not represent the real world in every situation due to the limitations of simulation environments and lack of involved experts in the process. Moreover, the existing public datasets might not suit the specific task or have other inconsistencies, such as low-quality images and noisy labels.

A suitable backup plan would involve developing a more realistic simulation environment while including the physical knowledge about the task in the training process.
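One widely used sample-selection heuristic from this family is the small-loss trick: samples with clean labels tend to incur smaller loss early in training, so training proceeds on the lowest-loss fraction of the batch. A minimal sketch (the function and the assumption of a known noise rate are illustrative):

```python
import numpy as np

def small_loss_selection(losses, noise_rate):
    """Return indices of the (1 - noise_rate) fraction of samples with the
    smallest per-sample loss; the rest are treated as likely mislabeled."""
    keep = int(len(losses) * (1 - noise_rate))
    return np.argsort(np.asarray(losses))[:keep]
```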
MM 4 – Automated Architecture Selection: Modern deep learning tools could be utilized to select the optimum model and hyperparameters for a given task. Automated hyperparameter optimization (Yu and Zhu 2020; Luo 2016; Hutter, Lücke, and Schmidt-Thieme 2015) and neural architecture search (Wistuba, Rawat, and Pedapati 2019; Ren et al. 2021) methods can reduce manual labor while eliminating the need for an expert. These methods rely on different search algorithms to find the best model and hyperparameters within the working domain.

Discussing MM 4: Relying on search algorithms requires high computational power and proper comparison tools. While it will cost money and time, the solution is not impossible or impractical in most safety-critical applications.
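The simplest such search algorithm is an exhaustive sweep over a small configuration space; a sketch (the search space and the scoring callback are placeholders for a real train-and-validate loop):

```python
from itertools import product

def grid_search(train_and_score, space):
    """Try every combination in `space` (a dict of name -> candidate values)
    and return the best-scoring configuration with its score."""
    names = list(space)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        score = train_and_score(cfg)  # in practice: train the model, then validate
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Real hyperparameter-optimization and NAS methods replace this exhaustive loop with smarter search strategies (random, Bayesian, evolutionary), but the interface is the same.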
3.2 Faults in the Evaluation Stage

Evaluation of a trained deep learning algorithm requires prior knowledge about the task. A testing dataset should include samples from all scenarios, no matter how rare, to ensure the safety of the algorithm. Also, proper performance metrics should be selected during the tests to obtain comparable outputs. Moreover, formal verification/validation methods depend on having an interpretable algorithm, which contrasts with deep learning.

Figure 2: Effects of camera faults on the input image (Taken from (TND6233-D)): (A) Faulty clocking system, (B) Faulty pipeline, and (C) Faulty row addressing logic.

SC 5 – Incompatible Metrics and Benchmarks: The most common performance metric in deep learning algorithms is accuracy. However, other metrics might hold more value in safety-critical applications, as the importance of false positives and false negatives grows exponentially in this field. Moreover, gathering a proper dataset to use as a benchmark has similar challenges to the training dataset.

SC 6 – Black-Box Behavior: The large volume of parameters and non-linear functions in deep learning algorithms results in an uninterpretable system. With no clear relation between the input and output of this black-box system and the impossible task of testing the entire input domain, it is hard to verify/validate deep learning algorithms based on safety standards.

MM 5 – Using Safety-Aware Metrics and Hazard-Aware Benchmarks: By including a weighted cost for each type of fault in the performance metric, the algorithm can be evaluated according to safety requirements (Zhou et al. 2021; Gharib and Bondavalli 2019; Salman et al. 2020). These new evaluation metrics would make the trade-off between performance and safety more visible.

On the other hand, a list of all hazardous scenarios can be prepared for every task for inclusion in the testing dataset by performing a risk analysis on the task (Zendel et al. 2018; Lambert et al. 2020). Such datasets could be treated as benchmarks for comparing different algorithms or validating their performance.

Discussing MM 5: While formulating a new cost function requires expert knowledge, it is within the scope of expectations in a safety-critical application. Various combinations of weighted metrics can be utilized and compared to find the most suitable one for the task. However, a bad decision could result in a non-converging algorithm; thus, there is a necessity for mathematical proof about the convergence of the algorithm.

Moreover, the competitive nature of industry typically prevents companies from sharing any suitable benchmarks or cost functions publicly, which means each company has to spend time and resources on developing their own system. A suitable backup plan would involve third-party associations funded by multiple companies to handle the problem for the benefit of all members.
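A weighted cost of the kind MM 5 describes can be sketched as follows (the 10:1 cost ratio is an illustrative assumption, not a value from the cited works; in practice it would come from the risk analysis):

```python
def weighted_error_cost(tp, fp, fn, tn, fp_cost=1.0, fn_cost=10.0):
    """Average per-sample error cost over a confusion matrix, with false
    negatives (missed hazards) weighted more heavily than false positives.
    Lower is safer; plain accuracy would hide this asymmetry."""
    total = tp + fp + fn + tn
    return (fp * fp_cost + fn * fn_cost) / total
```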
MM 6 – Opening the Black-Box: Representation learning enables the deep learning algorithm to discover the relation between input data and output in a presentable way by showing the process of feature selection (Zhang et al. 2018; Li, Yang, and Zhang 2018). Understanding this process helps to gain an insight into how the network interprets input data, and which parts of data play a more significant role in deciding the outcome.

Another way to gain such insight is to present a map of pixel relevance for the algorithm. These heat maps illustrate the importance of each pixel when calculating the output (König et al. 2021; Bach et al. 2015). Such information can be about isolated pixels or the interconnection of different pixels. Studying these maps could show the effects of slight changes in input on the output and help find potentially hazardous cases.

Discussing MM 6: This specific problem could be one of the most important ones with the least proper solutions as of yet. While it is possible to gain some insight into the operation of deep learning algorithms, the information cannot be used in any form to verify/validate the algorithm based on traditional standards. A suitable backup plan would involve using safety case arguments and other similar approaches to bypass the need for verification/validation for now.
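A crude, model-agnostic way to build such a relevance map is occlusion sensitivity: mask one image patch at a time and record how much the model's score drops. The sketch below assumes a single-channel image and a scalar-scoring `predict` callback (both illustrative):

```python
import numpy as np

def occlusion_map(image, predict, patch=2):
    """Heat map of score drop per occluded patch: the larger the drop,
    the more that region mattered to the model's output."""
    base = predict(image)
    h, w = image.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = 0.0  # occlude one patch
            heat[i // patch, j // patch] = base - predict(masked)
    return heat
```

Gradient-based relevance methods such as those cited above are sharper, but this black-box probe needs no access to the model internals.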
3.3 Faults in the Inference Stage

In a typical case, a similar sensor used for collecting offline data provides the online data for the implemented algorithm. On top of it, other hardware components are required for the algorithm to work correctly. These components can be summarized as:

• A camera to capture the input image.
• A communication channel to transfer the captured image.
• A processing unit to host the deep learning algorithm.
• A power supply to keep the system running.

Figure 3: Effects of environmental factors on the input image (Taken from (Bakhshi Germi, Rahtu, and Huttunen 2021)): (A) Original image, (B) Movement of camera/object (Motion blur), (C) Raindrop on the lens (Frosted-glass blur), (D) Out-of-focus object (Gaussian blur), (E) Low illumination (Gaussian noise), (F,G) Improper balance of light and darkness (Low/High brightness), and (H) Obscured object (Occlusion).

SC 7 – Defective Hardware: The first concern in deep learning algorithms is providing the necessary hardware mentioned above. Hardware faults can have a wide range of effects on the algorithm based on the faulty component, an example being the results of a faulty camera on the captured image, as shown in Figure 2. An implementation of the algorithm might run into problems based on the defective hardware component:

• Camera faults that might result in various disturbances in the input image, such as pixel corruption or image distortion.
• Communication channel faults that might result in data corruption or data loss.
• Processing unit faults that might result in wrong calculations, lagging, or freezing of the algorithm.
• Power supply faults that might result in breaking other hardware components or total system shutdown.
MM 7.1 – Following Functional Safety Standards: The mentioned hardware components are not unique to deep learning algorithms and have been used for decades in safety-critical applications. As a result, the current functional safety standards such as ISO 26262 (ISO 26262) and ISO/PAS 21448 (ISO/PAS 21448) provide practical guidelines for verifying and validating hardware components. Also, technical reports based on functional safety standards can help develop or choose safe hardware components such as a camera (TND6233-D), communication channel (Alanen, Hietikko, and Malm 2004), and operating system (Slačka and Halás 2015).

Moreover, other precautions such as using redundant hardware, proper noise shielding, and data fusion techniques have already proved helpful in safety-critical applications (Sklaroff 1976; Ciftcioglu and Turkcan 1996).

Discussing MM 7.1: Assuming the hardware is chosen based on the proper functional safety standards, it should operate without significant safety concerns. However, this mitigation method does not guarantee the complete removal of any disturbance or corruption of data. Environmental factors such as lousy illumination, movement, and obscured objects can affect input image quality without causing a hardware failure, as seen in Figure 3. While some of these problems might not be recognizable by a human annotator, the deep learning algorithm could run into faults based on the type and severity of corruption. Moreover, less severe levels of hardware failure might cause noise variations on the input data. A suitable backup plan would involve utilizing another mitigation approach, described as follows.
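The redundant-hardware precaution usually ends in a voter; a minimal 2-out-of-3 style majority vote over redundant channels might look like this (a hypothetical helper; hashable channel outputs are assumed):

```python
from collections import Counter

def majority_vote(channel_outputs):
    """Return the value agreed on by a strict majority of redundant
    channels, or None when no majority exists (forcing a safe fallback)."""
    value, count = Counter(channel_outputs).most_common(1)[0]
    return value if count > len(channel_outputs) / 2 else None
```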
MM 7.2 – Using Image Processing Techniques: Since the exact relation between the input image and the output of the deep learning algorithm is not known, it is recommended to have clean input data to reduce the chance of unwanted outcomes. The current state-of-the-art image processing techniques such as denoising (Fan et al. 2019; Goyal et al. 2020; Jebur, Der, and Hammood 2020), deblurring (Sada and Goyani 2018; Nah et al. 2021; Abuolaim, Timofte, and Brown 2021), and enhancement (Putra, Purboyo, and Prasasti 2017) methods can improve the quality of the input images and remove most of the disturbances not covered by the previous mitigation method. Most image processing techniques have solid mathematical foundations and have passed extensive testing cycles to prove their effectiveness, making them easy to validate and verify for safety-critical applications.

Discussing MM 7.2: Image processing techniques are only valid when it is known that the image is corrupted. Otherwise, such functions can negatively affect a clean image during the operation (e.g., removing/fading edges, brightening the image without necessity, etc.). Applying a filter without knowing the type of corruption is almost as dangerous as not utilizing any technique. So, it is safe to assume that some form of corruption is inevitable. A suitable backup plan would involve using the rejection option as described before to reduce the amount of overconfident wrong outputs.
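Because blind filtering is risky, one defensive pattern is to filter only when corruption is actually detected. A toy sketch with a 3x3 mean filter and a crude noise estimate (both the filter and the threshold are illustrative assumptions, not methods from the cited surveys):

```python
import numpy as np

def cautious_denoise(image, noise_threshold=0.05):
    """Mean-filter the image only if its estimated noise level is high;
    a clean image is returned untouched to avoid fading real detail."""
    h, w = image.shape
    padded = np.pad(image, 1, mode="edge")
    # 3x3 mean filter computed via shifted sums over the padded copy.
    smooth = sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0
    # Rough noise estimate: deviation of the image from its smoothed copy.
    noise = float(np.std(image - smooth))
    if noise > noise_threshold:
        return smooth, True   # likely corrupted: filtered
    return image, False       # likely clean: untouched
```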
4 Conclusion

The research around using deep learning algorithms in safety-critical applications is growing rapidly, with the current state-of-the-art answers partially fulfilling the requirements of old standards. However, the nature of the problem demands moving away from the traditional broad-spectrum method of standardization, as it is not suitable for deep learning algorithms. There is a high demand for task-specific standards to be developed. Until such standards are developed, the research community focuses on alternative approaches and empirical analysis to provide practical solutions for specific cases.

This paper provides a practical list of safety concerns for a visual deep learning algorithm by explaining the underlying cause of faults and providing current state-of-the-art solutions to mitigate them. By presenting the limitations of existing mitigation methods, the need for further study is expressed. We hope this paper offers an insight to those who want to utilize deep learning algorithms in their applications or those who want to develop proper standards or safety case arguments for such systems.
Acknowledgments

This research is done as part of a Ph.D. study co-funded by Tampere University and Forum for Intelligent Machines ry (FIMA).

References

Abuolaim, A.; Timofte, R.; and Brown, M. S. 2021. NTIRE 2021 Challenge for Defocus Deblurring Using Dual-pixel Images: Methods and Results. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 578–587.

Adhikari, B.; and Huttunen, H. 2021. Iterative Bounding Box Annotation for Object Detection. In 25th International Conference on Pattern Recognition (ICPR), 4040–4046.

Adhikari, B.; Peltomäki, J.; Bakhshi Germi, S.; Rahtu, E.; and Huttunen, H. 2021. Effect of Label Noise on Robustness of Deep Neural Network Object Detectors. In Computer Safety, Reliability, and Security. SAFECOMP Workshops, 239–250.

Alanen, J.; Hietikko, M.; and Malm, T. 2004. Safety of Digital Communications in Machines. VTT Technical Research Centre of Finland. ISBN 951-38-6502-9.

Bach, S.; Binder, A.; Montavon, G.; Klauschen, F.; Müller, K.-R.; and Samek, W. 2015. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLOS ONE, 10(7): 1–46.

Bakhshi Germi, S.; Rahtu, E.; and Huttunen, H. 2021. Selective Probabilistic Classifier Based on Hypothesis Testing. In 9th European Workshop on Visual Information Processing (EUVIP).

Baltrušaitis, T.; Ahuja, C.; and Morency, L.-P. 2018. Multimodal Machine Learning: A Survey and Taxonomy. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(2): 423–443.

Bendale, A.; and Boult, T. 2015. Towards Open World Recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1893–1902.

Chen, J.; Li, Y.; Wu, X.; Liang, Y.; and Jha, S. 2020. Robust Out-of-distribution Detection for Neural Networks. arXiv:2003.09711.

Chen, P.; Ye, J.; Chen, G.; Zhao, J.; and Heng, P.-A. 2021. Beyond Class-Conditional Assumption: A Primary Attempt to Combat Instance-Dependent Label Noise. Proceedings of the AAAI Conference on Artificial Intelligence, 35(13): 11442–11450.

Ciftcioglu, O.; and Turkcan, E. 1996. Data fusion and sensor management for nuclear power plant safety.

Cordeiro, F. R.; and Carneiro, G. 2020. A Survey on Deep Learning with Noisy Labels: How to train your model when you cannot trust on the annotations? arXiv:2012.03061.

Fan, L.; Zhang, F.; Fan, H.; and Zhang, C. 2019. Brief review of image denoising techniques. Visual Computing for Industry, Biomedicine, and Art, 2(1): 1–12.

Farahani, A.; Voghoei, S.; Rasheed, K.; and Arabnia, H. R. 2020. A Brief Review of Domain Adaptation. arXiv:2010.03978.

Gharib, M.; and Bondavalli, A. 2019. On the evaluation measures for machine learning algorithms for safety-critical systems. In 15th European Dependable Computing Conference (EDCC), 141–144.

Goyal, B.; Dogra, A.; Agrawal, S.; Sohi, B.; and Sharma, A. 2020. Image denoising review: From classical to state-of-the-art approaches. Information Fusion, 55: 220–244.

Guo, W.; Wang, J.; and Wang, S. 2019. Deep Multimodal Representation Learning: A Survey. IEEE Access, 7: 63373–63394.
Heyn, H.-M.; Knauss, E.; Muhammad, A. P.; Eriksson, O.; Linder, J.; Subbiah, P.; Pradhan, S. K.; and Tungal, S. 2021. Requirement Engineering Challenges for AI-intense Systems Development. arXiv:2103.10270.

Houben, S.; Abrecht, S.; Akila, M.; Bär, A.; Brockherde, F.; Feifel, P.; Fingscheidt, T.; Gannamaneni, S. S.; Ghobadi, S. E.; Hammam, A.; Haselhoff, A.; Hauser, F.; Heinzemann, C.; Hoffmann, M.; Kapoor, N.; Kappel, F.; Klingner, M.; Kronenberger, J.; Küppers, F.; Löhdefink, J.; Mlynarski, M.; Mock, M.; Mualla, F.; Pavlitskaya, S.; Poretschkin, M.; Pohl, A.; Ravi-Kumar, V.; Rosenzweig, J.; Rottmann, M.; Rüping, S.; Sämann, T.; Schneider, J. D.; Schulz, E.; Schwalbe, G.; Sicking, J.; Srivastava, T.; Varghese, S.; Weber, M.; Wirkert, S.; Wirtz, T.; and Woehrle, M. 2021. Inspect, Understand, Overcome: A Survey of Practical Methods for AI Safety. arXiv:2104.14235.

Hutter, F.; Lücke, J.; and Schmidt-Thieme, L. 2015. Beyond manual tuning of hyperparameters. KI - Künstliche Intelligenz, 29(4): 329–337.

ISO 26262. 2018. Road vehicles – Functional safety. Standard, International Organization for Standardization.

ISO/PAS 21448. 2019. Road vehicles – Safety of the intended functionality. Standard, International Organization for Standardization.

Jebur, R. S.; Der, C. S.; and Hammood, D. A. 2020. A Review and Taxonomy of Image Denoising Techniques. In 6th International Conference on Interactive Digital Media (ICIDM).

Kläs, M.; and Jöckel, L. 2020. A Framework for Building Uncertainty Wrappers for AI/ML-Based Data-Driven Components. In Computer Safety, Reliability, and Security. SAFECOMP Workshops, 315–327.

Krizhevsky, A. 2009. Learning multiple layers of features from tiny images. Technical report.

König, G.; Molnar, C.; Bischl, B.; and Grosse-Wentrup, M. 2021. Relative Feature Importance. In 25th International Conference on Pattern Recognition (ICPR), 9318–9325.
Lambert, J.; Liu, Z.; Sener, O.; Hays, J.; and Koltun, V. 2020. MSeg: A Composite Dataset for Multi-Domain Semantic Segmentation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2879–2888.

Lecun, Y.; Bottou, L.; Bengio, Y.; and Haffner, P. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11): 2278–2324.

Li, Y.; Yang, M.; and Zhang, Z. 2018. A Survey of Multi-View Representation Learning. IEEE Transactions on Knowledge and Data Engineering, 31(10): 1863–1883.

Luo, G. 2016. A review of automatic selection methods for machine learning algorithms and hyper-parameter values. Network Modeling Analysis in Health Informatics and Bioinformatics, 5(1): 1–16.

Nah, S.; Son, S.; Lee, S.; Timofte, R.; and Lee, K. M. 2021. NTIRE 2021 Challenge on Image Deblurring. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 149–165.

Nikolenko, S. I. 2019. Synthetic Data for Deep Learning. arXiv:1909.11512.

Parmar, J.; Chouhan, S. S.; and Rathore, S. S. 2021. Open-world Machine Learning: Applications, Challenges, and Opportunities. arXiv:2105.13448.

Putra, R.; Purboyo, T.; and Prasasti, A. 2017. A Review of Image Enhancement Methods. International Journal of Applied Engineering Research, 12: 13596–13603.

Raghunathan, T. E. 2021. Synthetic Data. Annual Review of Statistics and Its Application, 8(1): 129–140.

Ren, P.; Xiao, Y.; Chang, X.; Huang, P.-Y.; Li, Z.; Chen, X.; and Wang, X. 2021. A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions. ACM Computing Surveys, 54(4): 1–34.
Sada, M. M.; and Goyani, M. M. 2018. Image Deblurring Techniques – A Detail Review. International Journal of Scientific Research in Science, Engineering and Technology, 4: 176–188.

Salman, T.; Ghubaish, A.; Unal, D.; and Jain, R. 2020. Safety Score as an Evaluation Metric for Machine Learning Models of Security Applications. IEEE Networking Letters, 2(4): 207–211.

Sastry, C. S.; and Oore, S. 2020. Detecting Out-of-Distribution Examples with Gram Matrices. In Proceedings of the 37th International Conference on Machine Learning, volume 119, 8491–8501.

Schmarje, L.; Santarossa, M.; Schröder, S.-M.; and Koch, R. 2021. A Survey on Semi-, Self- and Unsupervised Learning for Image Classification. IEEE Access, 9: 82146–82168.

Schwalbe, G.; Knie, B.; Sämann, T.; Dobberphul, T.; Gauerhof, L.; Raafatnia, S.; and Rocco, V. 2020. Structuring the Safety Argumentation for Deep Neural Network Based Perception in Automotive Applications. In Computer Safety, Reliability, and Security. SAFECOMP Workshops, 383–394.

Shorten, C.; and Khoshgoftaar, T. M. 2019. A survey on image data augmentation for deep learning. Journal of Big Data, 6(1): 1–48.

Sklaroff, J. R. 1976. Redundancy Management Technique for Space Shuttle Computers. IBM Journal of Research and Development, 20(1): 20–28.

Slačka, J.; and Halás, M. 2015. Safety critical RTOS for space satellites. In 20th International Conference on Process Control (PC), 250–254.

Song, H.; Kim, M.; Park, D.; Shin, Y.; and Lee, J.-G. 2021. Learning from Noisy Labels with Deep Neural Networks: A Survey. arXiv:2007.08199.

TND6233-D. 2018. Evaluating Functional Safety in Automotive Image Sensors. White paper, ON Semiconductor.

Van Engelen, J. E.; and Hoos, H. H. 2020. A survey on semi-supervised learning. Machine Learning, 109(2): 373–440.

Wang, X.; Wang, K.; and Lian, S. 2020. A survey on face data augmentation for the training of deep neural networks. Neural Computing and Applications, 1–29.

Willers, O.; Sudholt, S.; Raafatnia, S.; and Abrecht, S. 2020. Safety Concerns and Mitigation Approaches Regarding the Use of Deep Learning in Safety-Critical Perception Tasks. In Computer Safety, Reliability, and Security. SAFECOMP Workshops, 336–350.

Wistuba, M.; Rawat, A.; and Pedapati, T. 2019. A Survey on Neural Architecture Search. arXiv:1905.01392.

Wozniak, E.; Cârlan, C.; Acar-Celik, E.; and Putzer, H. J. 2020. A Safety Case Pattern for Systems with Machine Learning Components. In Computer Safety, Reliability, and Security. SAFECOMP Workshops, 370–382.
Xu, H.; Ma, Y.; Liu, H.-C.; Deb, D.; Liu, H.; Tang, J.-L.; and Jain, A. K. 2020. Adversarial attacks and defenses in images, graphs and text: A review. International Journal of Automation and Computing, 17(2): 151–178.

Yu, T.; and Zhu, H. 2020. Hyper-Parameter Optimization: A Review of Algorithms and Applications. arXiv:2003.05689.

Yuan, X.; He, P.; Zhu, Q.; and Li, X. 2019. Adversarial Examples: Attacks and Defenses for Deep Learning. IEEE Transactions on Neural Networks and Learning Systems, 30(9): 2805–2824.

Zendel, O.; Honauer, K.; Murschitz, M.; Steininger, D.; and Domínguez, G. F. 2018. WildDash – Creating Hazard-Aware Benchmarks. In Computer Vision – ECCV, 407–421.

Zhang, D.; Yin, J.; Zhu, X.; and Zhang, C. 2018. Network Representation Learning: A Survey. IEEE Transactions on Big Data, 6(1): 3–28.

Zhang, X.-Y.; Liu, C.-L.; and Suen, C. Y. 2020. Towards Robust Pattern Recognition: A Review. Proceedings of the IEEE, 108(6): 894–922.

Zhou, J.; Gandomi, A. H.; Chen, F.; and Holzinger, A. 2021. Evaluating the Quality of Machine Learning Explanations: A Survey on Methods and Metrics. Electronics, 10(5).

Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; and He, Q. 2020. A Comprehensive Survey on Transfer Learning. Proceedings of the IEEE, 109(1): 43–76.